secure

Title: Secure Internet Exams Despite Coercion. (arXiv:2207.12796v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.12796
Code URL: null
Copy Paste: [[2207.12796] Secure Internet Exams Despite Coercion](http://arxiv.org/abs/2207.12796)
Summary:
We study coercion-resistance for online exams. We propose two properties, Anonymous Submission and Single-Blindness, which, if they hold, preserve the anonymity of the links between tests, test takers, and examiners even when the parties coerce one another into revealing secrets. The properties are relevant: not even Remark!, a secure exam protocol that satisfies anonymous marking and anonymous examiners, turns out to be coercion-resistant. We then propose a coercion-resistant protocol which satisfies, in addition to known anonymity properties, the two novel properties we have introduced. We prove our claims formally in ProVerif. The paper also makes another contribution: it describes an attack on (and a fix for) an exponentiation mixnet that Remark! uses to ensure unlinkability. We use the secure version of the mixnet in our new protocol.

Title: Review of Advanced Monitoring Mechanisms in Peer-to-Peer (P2P) Botnets. (arXiv:2207.12936v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.12936
Code URL: null
Copy Paste: [[2207.12936] Review of Advanced Monitoring Mechanisms in Peer-to-Peer (P2P) Botnets](http://arxiv.org/abs/2207.12936)
Summary:
Internet security is weakening because of botnet threats. A plan to take down a botnet can only be drawn up after monitoring has revealed the botnet's behaviour. Nowadays, botnet architectures are built on Peer-to-Peer (P2P) connections, which makes them harder to monitor and track down. This paper is mainly about existing botnet monitoring tools. Its purpose is to study the ways to monitor a botnet and how each monitoring mechanism works. The monitoring tools are categorized into active and passive mechanisms. A crawler is an active mechanism, while sensors and honeypots are passive mechanisms. Previous work on each mechanism is also presented in this paper.

Title: Spatial data sharing with secure multi-party computation for exploratory spatial data analysis. (arXiv:2207.13069v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.13069
Code URL: null
Copy Paste: [[2207.13069] Spatial data sharing with secure multi-party computation for exploratory spatial data analysis](http://arxiv.org/abs/2207.13069)
Summary:
Spatial data sharing plays a significant role in opening data research and promoting government agency transparency. However, valuable spatial data, like high-precision geographic information and personal traffic records, cannot be made public because they may incur leakage risks such as intrusion, theft, and the unauthorised sale of proprietary information. When participants with confidential data distrust each other but want to use the other datasets for calculations, the most common solution is to provide their original data to a trusted third party. However, the trusted third party frequently risks being attacked and having the data leaked. To maintain data controllability, most companies and organisations refuse to share their data. In this study, we introduce secure multi-party computation to spatial data sharing to address the sharing problem. Additionally, we describe the design and implementation of the protocols of two exploratory spatial data analyses: global Moran's I and local Moran's I. Furthermore, we build a system to demonstrate process realisation and results visualisation. Compared with existing data-sharing schemes, our system identifies the correct result without incurring leakage risks during spatial data sharing.
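For readers unfamiliar with the statistic named in this summary, the following is a minimal plaintext sketch of global Moran's I, using the standard textbook formula I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2. It is not the paper's secure multi-party protocol: every value is visible to one party here, and the function name, the toy data, and the binary adjacency weights are illustrative assumptions.

```python
def global_morans_i(values, weights):
    """Plaintext global Moran's I (illustrative only, no secure computation).

    values:  list of n numbers, one attribute value per spatial unit
    weights: n x n list of lists; weights[i][j] is the spatial weight
             between units i and j (0 on the diagonal)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations over all ordered pairs (i, j).
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical toy data: four units on a line (0 - 1 - 2 - 3),
# with binary adjacency weights.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = global_morans_i([1.0, 1.0, 5.0, 5.0], line_weights)
alternating = global_morans_i([1.0, 5.0, 1.0, 5.0], line_weights)
```

Similar neighbouring values give a positive statistic and alternating values a negative one, which is the pattern the exploratory analysis looks for.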

security

Title: Towards Smart City Security: Violence and Weaponized Violence Detection using DCNN. (arXiv:2207.12850v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.12850
Code URL: https://github.com/Ti-Oluwanimi/Violence_Detection
Copy Paste: [[2207.12850] Towards Smart City Security: Violence and Weaponized Violence Detection using DCNN](http://arxiv.org/abs/2207.12850)
Summary:
In this ever-connected society, CCTVs have had a pivotal role in enforcing the safety and security of citizens by recording unlawful activities for the authorities to take action. In a smart city context, using Deep Convolutional Neural Networks (DCNN) to detect violence and weaponized violence from CCTV videos will provide an additional layer of security by ensuring real-time detection around the clock. In this work, we introduced a new specialised dataset by gathering real CCTV footage of both weaponized and non-weaponized violence as well as non-violence videos from YouTube. We also proposed a novel approach in merging consecutive video frames into a single salient image which will then be the input to the DCNN. Results from multiple DCNN architectures have proven the effectiveness of our method by having the highest accuracy of 99%. We also take into consideration the efficiency of our methods through several parameter trade-offs to ensure smart city sustainability.

Title: Scalable Cyber-Physical Testbed for Cybersecurity Evaluation of Synchrophasors in Power Systems. (arXiv:2207.12610v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.12610
Code URL: null
Copy Paste: [[2207.12610] Scalable Cyber-Physical Testbed for Cybersecurity Evaluation of Synchrophasors in Power Systems](http://arxiv.org/abs/2207.12610)
Summary:
This paper presents a real-time cyber-physical (CPS) testbed for power systems with different real attack scenarios on the synchrophasors-phasor measurement units (PMU). The testbed focuses on real-time cyber-security emulation with components including a digital real-time simulator, virtual machines (VM), a communication network emulator, and a package manipulation tool. The script-based VM deployment and the software-defined network emulation facilitate a highly-scalable cyber-physical testbed, which enables emulations of a real power system under different attack scenarios such as Address Resolution Protocol (ARP) poisoning attack, Man In The Middle (MITM) attack, False Data Injection Attack (FDIA), and Eavesdropping Attack. An implementation of the common synchrophasor protocol IEEE C37.118.2, named pySynphasor, has been developed and analyzed for its security vulnerabilities. The paper also presents an interactive framework for injecting false data into a realistic system utilizing the pySynphasor module. The framework can dissect and reconstruct the C37.118.2 packets, which expands the potential of testing and developing PMU-based systems and their security in detail and benefits the power industry and academia. A case demonstrating the FDIA attack on the linear state estimation, together with the bad-data detection procedure, is presented as an example of the testbed capability.

Title: Review of Peer-to-Peer Botnets and Detection Mechanisms. (arXiv:2207.12937v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.12937
Code URL: null
Copy Paste: [[2207.12937] Review of Peer-to-Peer Botnets and Detection Mechanisms](http://arxiv.org/abs/2207.12937)
Summary:
Cybercrimes are becoming a bigger menace to both people and corporations. They pose a serious challenge to the modern digital world. According to a 2019 press release from Cisco and Cybersecurity Ventures, Cisco stopped seven trillion threats in 2018, or 20 billion threats every day, on behalf of its clients. According to Cybersecurity Ventures, the global cost of cybercrime will reach $6 trillion annually by 2021, which is significantly more than the annual damage caused by all natural disasters and more profitable than the global trade in all major illegal narcotics put together. Malware, including viruses, worms, spyware, keyloggers, Trojan horses, and botnets, is frequently used in cybercrime. The most common malware employed by attackers to carry out cybercrimes is the botnet, which is available in a variety of forms and for a variety of purposes when attacking computer assets. However, the issue continues to exist and worsen, seriously harming both enterprises and people who conduct their business online. The detection of P2P (Peer to Peer) botnets, which have emerged as one of the primary hazards in network cyberspace by acting as the infrastructure for several cyber-crimes, has proven more difficult than the detection of regular botnets using the few existing approaches. As a result, this study will explore various P2P botnet detection algorithms by outlining their essential characteristics, advantages and disadvantages, obstacles, and future research.

Title: On the Security of IO-Link Wireless Communication in the Safety Domain. (arXiv:2207.12938v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.12938
Code URL: null
Copy Paste: [[2207.12938] On the Security of IO-Link Wireless Communication in the Safety Domain](http://arxiv.org/abs/2207.12938)
Summary:
Security is an essential requirement of Industrial Control System (ICS) environments and their underlying communication infrastructure. Especially the lowest communication level within Supervisory Control and Data Acquisition (SCADA) systems - the field level - commonly lacks security measures. Since emerging wireless technologies within the field level expose the lowest communication infrastructure to potential attackers, additional security measures above the prevalent concept of air-gapped communication must be considered. Therefore, this work analyzes security aspects for the wireless communication protocol IO-Link Wireless (IOLW), which is commonly used for sensor and actuator field level communication. A possible architecture for an IOLW safety layer has already been presented recently [1]. In this paper, the overall attack surface of IOLW within its typical environment is analyzed and attack preconditions are investigated to assess the effectiveness of different security measures. Additionally, enhanced security measures are evaluated for the communication systems and the results are summarized. Also, interference of security measures and functional safety principles within the communication are investigated, which do not necessarily complement one another but may also have contradictory requirements. This work is intended to discuss and propose enhancements of the IOLW standard with additional security considerations in future implementations.

Title: Reconciling Security and Communication Efficiency in Federated Learning. (arXiv:2207.12779v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.12779
Code URL: null
Copy Paste: [[2207.12779] Reconciling Security and Communication Efficiency in Federated Learning](http://arxiv.org/abs/2207.12779)
Summary:
Cross-device Federated Learning is an increasingly popular machine learning setting to train a model by leveraging a large population of client devices with high privacy and security guarantees. However, communication efficiency remains a major bottleneck when scaling federated learning to production environments, particularly due to bandwidth constraints during uplink communication. In this paper, we formalize and address the problem of compressing client-to-server model updates under the Secure Aggregation primitive, a core component of Federated Learning pipelines that allows the server to aggregate the client updates without accessing them individually. In particular, we adapt standard scalar quantization and pruning methods to Secure Aggregation and propose Secure Indexing, a variant of Secure Aggregation that supports quantization for extreme compression. We establish state-of-the-art results on LEAF benchmarks in a secure Federated Learning setup with up to 40× compression in uplink communication with no meaningful loss in utility compared to uncompressed baselines.
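As a rough illustration of the "standard scalar quantization" this summary mentions, the sketch below maps a client update to small integer codes and back. It is plain uniform quantization only: it does not implement Secure Aggregation or Secure Indexing, and the function names, the 4-bit setting, and the sample update are all assumptions made for the example.

```python
def quantize(update, num_bits):
    """Uniformly quantize a list of floats to integer codes in [0, 2**num_bits - 1]."""
    levels = (1 << num_bits) - 1
    lo, hi = min(update), max(update)
    # Guard against a constant update, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in update]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


# Hypothetical client update and bit width.
update = [-0.82, 0.10, 0.33, -0.05, 0.91, 0.47]
codes, lo, scale = quantize(update, num_bits=4)
restored = dequantize(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(update, restored))
```

Each value is sent as a 4-bit code instead of a 32-bit float, at the price of a per-coordinate error of at most half a quantization step.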

privacy

Title: AGAPECert: An Auditable, Generalized, Automated, Privacy-Enabling Certification Framework with Oblivious Smart Contracts. (arXiv:2207.12482v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.12482
Code URL: null
Copy Paste: [[2207.12482] AGAPECert: An Auditable, Generalized, Automated, Privacy-Enabling Certification Framework with Oblivious Smart Contracts](http://arxiv.org/abs/2207.12482)
Summary:
This paper introduces AGAPECert, an Auditable, Generalized, Automated, Privacy-Enabling, Certification framework capable of performing auditable computation on private data and reporting real-time aggregate certification status without disclosing underlying private data. AGAPECert utilizes a novel mix of trusted execution environments, blockchain technologies, and a real-time graph-based API standard to provide automated, oblivious, and auditable certification. Our technique allows a privacy-conscious data owner to run pre-approved Oblivious Smart Contract code in their own environment on their own private data to produce Private Automated Certifications. These certifications are verifiable, purely functional transformations of the available data, enabling a third party to trust that the private data must have the necessary properties to produce the resulting certification. Recently, a multitude of solutions for certification and traceability in supply chains have been proposed. These often suffer from significant privacy issues because they tend to take a "shared, replicated database" approach: every node in the network has access to a copy of all relevant data and contract code to guarantee the integrity and reach consensus, even in the presence of malicious nodes. In these contexts of certifications that require global coordination, AGAPECert can include a blockchain to guarantee ordering of events, while keeping a core privacy model where private data is not shared outside of the data owner's own platform. AGAPECert contributes an open-source certification framework that can be adopted in any regulated environment to keep sensitive data private while enabling a trusted automated workflow.

Title: Lifelong DP: Consistently Bounded Differential Privacy in Lifelong Machine Learning. (arXiv:2207.12831v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.12831
Code URL: https://github.com/haiphanNJIT/PrivateDeepLearning
Copy Paste: [[2207.12831] Lifelong DP: Consistently Bounded Differential Privacy in Lifelong Machine Learning](http://arxiv.org/abs/2207.12831)
Summary:
In this paper, we show that the process of continually learning new tasks and memorizing previous tasks introduces unknown privacy risks and challenges to bound the privacy loss. Based upon this, we introduce a formal definition of Lifelong DP, in which the participation of any data tuples in the training set of any tasks is protected, under a consistently bounded DP protection, given a growing stream of tasks. A consistently bounded DP means having only one fixed value of the DP privacy budget, regardless of the number of tasks. To preserve Lifelong DP, we propose a scalable and heterogeneous algorithm, called L2DP-ML with a streaming batch training, to efficiently train and continue releasing new versions of an L2M model, given the heterogeneity in terms of data sizes and the training order of tasks, without affecting DP protection of the private training set. An end-to-end theoretical analysis and thorough evaluations show that our mechanism is significantly better than baseline approaches in preserving Lifelong DP. The implementation of L2DP-ML is available at: https://github.com/haiphanNJIT/PrivateDeepLearning.

protect

defense

Title: Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization. (arXiv:2207.13036v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.13036
Code URL: null
Copy Paste: [[2207.13036] Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization](http://arxiv.org/abs/2207.13036)
Summary:
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples that are crafted with imperceptible perturbations, i.e., a small change in an input image can induce a misclassification, which threatens the reliability of deployed deep learning systems. Adversarial training (AT) is frequently used to improve the robustness of DNNs by training on a mixture of corrupted and clean data. However, existing AT-based methods are either computationally expensive in generating such adversarial examples, and thus cannot satisfy the real-time requirement of real-world scenarios, or cannot produce interpretable predictions for transferred adversarial examples generated to fool a wide spectrum of defense models. In this work, we propose an approach of Jacobian norm with Selective Input Gradient Regularization (J-SIGR), which selectively regularizes gradient-based saliency maps to imitate its interpretable prediction with respect to the input through Jacobian normalization. As such, we achieve the defense of DNNs with both high interpretability and computation efficiency. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks and show that the network predictions are easily interpretable.

attack

Title: Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning. (arXiv:2207.12535v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.12535
Code URL: https://github.com/xinleihe/semi-leak
Copy Paste: [[2207.12535] Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning](http://arxiv.org/abs/2207.12535)
Summary:
Semi-supervised learning (SSL) leverages both labeled and unlabeled data to train machine learning (ML) models. State-of-the-art SSL methods can achieve comparable performance to supervised learning by leveraging far fewer labeled data. However, most existing works focus on improving the performance of SSL. In this work, we take a different angle by studying the training data privacy of SSL. Specifically, we propose the first data augmentation-based membership inference attacks against ML models trained by SSL. Given a data sample and black-box access to a model, the goal of a membership inference attack is to determine whether the data sample belongs to the training dataset of the model. Our evaluation shows that the proposed attack can consistently outperform existing membership inference attacks and achieves the best performance against the model trained by SSL. Moreover, we uncover that the reason for membership leakage in SSL is different from the commonly believed one in supervised learning, i.e., overfitting (the gap between training and testing accuracy). We observe that the SSL model is well generalized to the testing data (with almost 0 overfitting) but "memorizes" the training data by giving a more confident prediction regardless of its correctness. We also explore early stopping as a countermeasure to prevent membership inference attacks against SSL. The results show that early stopping can mitigate the membership inference attack, but at the cost of degrading the model's utility.

Title: FRIB: Low-poisoning Rate Invisible Backdoor Attack based on Feature Repair. (arXiv:2207.12863v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.12863
Code URL: null
Copy Paste: [[2207.12863] FRIB: Low-poisoning Rate Invisible Backdoor Attack based on Feature Repair](http://arxiv.org/abs/2207.12863)
Summary:
During the generation of invisible backdoor attack poisoned data, the feature space transformation operation tends to cause the loss of some poisoned features and weakens the mapping relationship between source images with triggers and target labels, resulting in the need for a higher poisoning rate to achieve the corresponding backdoor attack success rate. To solve the above problems, we propose the idea of feature repair for the first time and introduce the blind watermark technique to repair the poisoned features lost during the generation of poisoned data. Under the premise of ensuring consistent labeling, we propose a low-poisoning rate invisible backdoor attack based on feature repair, named FRIB. Benefiting from the above design concept, the new method enhances the mapping relationship between the source images with triggers and the target labels, and increases the degree of misleading DNNs, thus achieving a high backdoor attack success rate with a very low poisoning rate. Ultimately, the detailed experimental results show that the goal of achieving a high success rate of backdoor attacks with a very low poisoning rate is achieved on all MNIST, CIFAR10, GTSRB, and ImageNet datasets.

Title: Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis. (arXiv:2207.13064v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.13064
Code URL: null
Copy Paste: [[2207.13064] Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis](http://arxiv.org/abs/2207.13064)
Summary:
As tools for content editing mature, and artificial intelligence (AI) based algorithms for synthesizing media grow, the presence of manipulated content across online media is increasing. This phenomenon causes the spread of misinformation, creating a greater need to distinguish between "real" and "manipulated" content. To this end, we present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated). Many of the existing deepfake datasets focus exclusively on two types of facial manipulations -- swapping with a different subject's face or altering the existing face. VideoSham, on the other hand, contains more diverse, context-rich, and human-centric, high-resolution videos manipulated using a combination of 6 different spatial and temporal attacks. Our analysis shows that state-of-the-art manipulation detection algorithms only work for a few specific attacks and do not scale well on VideoSham. We performed a user study on Amazon Mechanical Turk with 1200 participants to understand if they can differentiate between the real and manipulated videos in VideoSham. Finally, we dig deeper into the strengths and weaknesses of performances by humans and SOTA-algorithms to identify gaps that need to be filled with better AI algorithms.

Title: DeFakePro: Decentralized DeepFake Attacks Detection using ENF Authentication. (arXiv:2207.13070v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.13070
Code URL: null
Copy Paste: [[2207.13070] DeFakePro: Decentralized DeepFake Attacks Detection using ENF Authentication](http://arxiv.org/abs/2207.13070)
Summary:
Advances in generative models, such as Deepfake, allow users to imitate a targeted person and manipulate online interactions. It has been recognized that disinformation may cause disturbance in society and ruin the foundation of trust. This article presents DeFakePro, a decentralized consensus mechanism-based Deepfake detection technique for online video conferencing tools. Leveraging Electrical Network Frequency (ENF), an environmental fingerprint embedded in digital media recordings, affords a consensus mechanism design called the Proof-of-ENF (PoENF) algorithm. The similarity in ENF signal fluctuations is utilized in the PoENF algorithm to authenticate the media broadcast in conferencing tools. Using a video conferencing setup in which malicious participants broadcast Deepfake video recordings to other participants, the DeFakePro system verifies the authenticity of the incoming media in both audio and video channels.

Title: Versatile Weight Attack via Flipping Limited Bits. (arXiv:2207.12405v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.12405
Code URL: https://github.com/jiawangbai/versatile-weight-attack
Copy Paste: [[2207.12405] Versatile Weight Attack via Flipping Limited Bits](http://arxiv.org/abs/2207.12405)
Summary:
To explore the vulnerability of deep neural networks (DNNs), many attack paradigms have been well studied, such as the poisoning-based backdoor attack in the training stage and the adversarial attack in the inference stage. In this paper, we study a novel attack paradigm, which modifies model parameters in the deployment stage. Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack, where the effectiveness term could be customized depending on the attacker's purpose. Furthermore, we present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA). To this end, we formulate this problem as a mixed integer programming (MIP) problem to jointly determine the state of the binary bits (0 or 1) in the memory and learn the sample modification. Utilizing the latest technique in integer programming, we equivalently reformulate this MIP problem as a continuous optimization problem, which can be effectively and efficiently solved using the alternating direction method of multipliers (ADMM). Consequently, the flipped critical bits can be easily determined through optimization, rather than using a heuristic strategy. Extensive experiments demonstrate the superiority of SSA and TSA in attacking DNNs.
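To make the idea of a bit-flip on a stored weight concrete, here is a small sketch of what flipping one bit does to a single weight. It assumes, purely for illustration, that weights are stored as signed 8-bit integers in two's complement; the helper name is hypothetical, and nothing here reproduces the paper's MIP/ADMM optimization that chooses which bits to flip.

```python
def flip_bit(weight_int8, bit):
    """Flip one bit (0 = least significant, 7 = sign bit) of a signed
    8-bit integer stored in two's complement."""
    raw = weight_int8 & 0xFF      # reinterpret as an unsigned byte
    raw ^= 1 << bit               # toggle the chosen bit
    return raw - 256 if raw >= 128 else raw  # back to a signed value


original = 23                         # 0b00010111
flipped_low = flip_bit(original, 0)   # 0b00010110
flipped_sign = flip_bit(original, 7)  # 0b10010111
```

Flipping the lowest bit changes the weight by one (23 becomes 22), while flipping the sign bit moves it to -105, which shows why a few well-chosen flips can have a large effect.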

Title: Coronavirus disease situation analysis and prediction using machine learning: a study on Bangladeshi population. (arXiv:2207.13056v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.13056
Code URL: null
Copy Paste: [[2207.13056] Coronavirus disease situation analysis and prediction using machine learning: a study on Bangladeshi population](http://arxiv.org/abs/2207.13056)
Summary:
During a pandemic, early prediction of infection rates can reduce deaths by ensuring treatment facilities and proper resource allocation. In recent months, the numbers of deaths and infections in Bangladesh have risen more sharply than before. The country is struggling to provide moderate medical treatment to many patients. This study compares machine learning models and creates a prediction system to anticipate the infection and death rates for the coming days. Using a dataset covering March 1, 2020, to August 10, 2021, a multi-layer perceptron (MLP) model was trained. The data were collected from a trusted government website and compiled manually for training purposes. Several test cases determine the model's accuracy and prediction capability. The comparison between models indicates that the MLP model has more reliable prediction capability than the support vector regression (SVR) and linear regression models. The model presents a report about the risk situation and an impending coronavirus disease (COVID-19) attack. According to the prediction produced by the model, Bangladesh may suffer another COVID-19 attack, in which the number of infected cases may be between 929 and 2443 and the number of deaths between 19 and 57.

Title: $p$-DkNN: Out-of-Distribution Detection Through Statistical Testing of Deep Representations. (arXiv:2207.12545v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.12545
Code URL: null
Copy Paste: [[2207.12545] $p$-DkNN: Out-of-Distribution Detection Through Statistical Testing of Deep Representations](http://arxiv.org/abs/2207.12545)
Summary:
The lack of well-calibrated confidence estimates makes neural networks inadequate in safety-critical domains such as autonomous driving or healthcare. In these settings, having the ability to abstain from making a prediction on out-of-distribution (OOD) data can be as important as correctly classifying in-distribution data. We introduce $p$-DkNN, a novel inference procedure that takes a trained deep neural network and analyzes the similarity structures of its intermediate hidden representations to compute $p$-values associated with the end-to-end model prediction. The intuition is that statistical tests performed on latent representations can serve not only as a classifier, but also offer a statistically well-founded estimation of uncertainty. $p$-DkNN is scalable and leverages the composition of representations learned by hidden layers, which makes deep representation learning successful. Our theoretical analysis builds on Neyman-Pearson classification and connects it to recent advances in selective classification (reject option). We demonstrate advantageous trade-offs between abstaining from predicting on OOD inputs and maintaining high accuracy on in-distribution inputs. We find that $p$-DkNN forces adaptive attackers crafting adversarial examples, a form of worst-case OOD inputs, to introduce semantically meaningful changes to the inputs.
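The abstain-on-low-p-value idea in this summary can be sketched with a generic empirical p-value computed from held-out scores. This is not the $p$-DkNN procedure itself: the score definition (higher means more unusual), the calibration set, the function names, and the 0.05 threshold are all assumptions chosen for illustration.

```python
def empirical_p_value(calibration_scores, test_score):
    """Fraction of calibration scores at least as extreme as the test score,
    with the usual +1 correction so the p-value is never zero."""
    n = len(calibration_scores)
    at_least_as_extreme = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_extreme + 1) / (n + 1)


def predict_or_abstain(calibration_scores, test_score, alpha=0.05):
    """Abstain when the input looks too unusual relative to calibration data."""
    p = empirical_p_value(calibration_scores, test_score)
    return ("abstain" if p < alpha else "predict"), p


# Hypothetical calibration scores from in-distribution data: 0.01 .. 0.99.
calibration = [i / 100 for i in range(1, 100)]
typical_decision, typical_p = predict_or_abstain(calibration, 0.5)
unusual_decision, unusual_p = predict_or_abstain(calibration, 5.0)
```

An input whose score exceeds every calibration score receives the smallest possible p-value and is rejected, while a mid-range score is accepted.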

robust

Title: Large-displacement 3D Object Tracking with Hybrid Non-local Optimization. (arXiv:2207.12620v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.12620
Code URL: https://github.com/cvbubbles/nonlocal-3dtracking
Copy Paste: [[2207.12620] Large-displacement 3D Object Tracking with Hybrid Non-local Optimization](http://arxiv.org/abs/2207.12620)
Summary:
Optimization-based 3D object tracking is known to be precise and fast, but sensitive to large inter-frame displacements. In this paper we propose a fast and effective non-local 3D tracking method. Based on the observation that erroneous local minima are mostly due to the out-of-plane rotation, we propose a hybrid approach combining non-local and local optimizations for different parameters, resulting in efficient non-local search in the 6D pose space. In addition, a precomputed robust contour-based tracking method is proposed for the pose optimization. By using long search lines with multiple candidate correspondences, it can adapt to different frame displacements without the need for coarse-to-fine search. After the pre-computation, pose updates can be conducted very fast, enabling the non-local optimization to run in real time. Our method outperforms all previous methods for both small and large displacements. For large displacements, the accuracy is greatly improved (81.7% vs. 19.4%). At the same time, real-time speed (>50 fps) can be achieved with only a CPU. The source code is available at https://github.com/cvbubbles/nonlocal-3dtracking.

Title: Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering. (arXiv:2207.12647v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.12647
Code URL: null
Copy Paste: [[2207.12647] Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering](http://arxiv.org/abs/2207.12647)
Summary:
Existing visual question answering methods tend to capture the spurious correlations from visual and linguistic modalities, and fail to discover the true causal mechanism that facilitates reasoning truthfully based on the dominant visual evidence and the correct question intention. Additionally, the existing methods usually ignore the complex event-level understanding in multi-modal settings that requires a strong cognitive capability of causal inference to jointly model cross-modal event temporality, causality, and dynamics. In this work, we focus on event-level visual question answering from a new perspective, i.e., cross-modal causal relational reasoning, by introducing causal intervention methods to mitigate the spurious correlations and discover the true causal structures for the integration of visual and linguistic modalities. Specifically, we propose a novel event-level visual question answering framework named Cross-Modal Causal RelatIonal Reasoning (CMCIR), to achieve robust causality-aware visual-linguistic question answering. To uncover the causal structures for visual and linguistic modalities, the novel Causality-aware Visual-Linguistic Reasoning (CVLR) module is proposed to collaboratively disentangle the visual and linguistic spurious correlations via elaborately designed front-door and back-door causal intervention modules. To discover the fine-grained interactions between linguistic semantics and spatial-temporal representations, we build a novel Spatial-Temporal Transformer (STT) that builds the multi-modal co-occurrence interactions between visual and linguistic content. Extensive experiments on large-scale event-level urban dataset SUTD-TrafficQA and three benchmark real-world datasets TGIF-QA, MSVD-QA, and MSRVTT-QA demonstrate the effectiveness of our CMCIR for discovering visual-linguistic causal structures.

Title: ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection. (arXiv:2207.12654v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.12654
Code URL: https://github.com/yinjunbo/proposalcontrast
Copy Paste: [[2207.12654] ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection](http://arxiv.org/abs/2207.12654)
Summary:
Existing approaches for unsupervised point cloud pre-training are constrained to either scene-level or point/voxel-level instance discrimination. Scene-level methods tend to lose local details that are crucial for recognizing road objects, while point/voxel-level methods inherently suffer from a limited receptive field that is incapable of perceiving large objects or context environments. Considering that region-level representations are more suitable for 3D object detection, we devise a new unsupervised point cloud pre-training framework, called ProposalContrast, that learns robust 3D representations by contrasting region proposals. Specifically, with an exhaustive set of region proposals sampled from each point cloud, geometric point relations within each proposal are modeled for creating expressive proposal representations. To better accommodate 3D detection properties, ProposalContrast optimizes with both inter-cluster and inter-proposal separation, i.e., sharpening the discriminativeness of proposal representations across semantic classes and object instances. The generalizability and transferability of ProposalContrast are verified on various 3D detectors (i.e., PV-RCNN, CenterPoint, PointPillars and PointRCNN) and datasets (i.e., KITTI, Waymo and ONCE).

Title: A Kendall Shape Space Approach to 3D Shape Estimation from 2D Landmarks. (arXiv:2207.12687v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.12687
Code URL: null
Copy Paste: [[2207.12687] A Kendall Shape Space Approach to 3D Shape Estimation from 2D Landmarks](http://arxiv.org/abs/2207.12687)
Summary:
3D shapes provide substantially more information than 2D images. However, the acquisition of 3D shapes is sometimes very difficult or even impossible in comparison with acquiring 2D images, making it necessary to derive the 3D shape from 2D images. Although this is, in general, a mathematically ill-posed problem, it might be solved by constraining the problem formulation using prior information. Here, we present a new approach based on Kendall's shape space to reconstruct 3D shapes from single monocular 2D images. The work is motivated by an application to study the feeding behavior of the basking shark, an endangered species whose massive size and mobility render 3D shape data nearly impossible to obtain, hampering understanding of their feeding behaviors and ecology. 2D images of these animals in feeding position, however, are readily available. We compare our approach with state-of-the-art shape-based approaches, both on human stick models and on shark head skeletons. Using a small set of training shapes, we show that the Kendall shape space approach is substantially more robust than previous methods and results in plausible shapes. This is essential for the motivating application in which specimens are rare and therefore only few training shapes are available.

Title: Unsupervised Domain Adaptation for Video Transformers in Action Recognition. (arXiv:2207.12842v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.12842
Code URL: null
Copy Paste: [[2207.12842] Unsupervised Domain Adaptation for Video Transformers in Action Recognition](http://arxiv.org/abs/2207.12842)
Summary:
Over the last few years, Unsupervised Domain Adaptation (UDA) techniques have acquired remarkable importance and popularity in computer vision. However, when compared to the extensive literature available for images, the field of videos is still relatively unexplored. On the other hand, the performance of a model in action recognition is heavily affected by domain shift. In this paper, we propose a simple and novel UDA approach for video action recognition. Our approach leverages recent advances on spatio-temporal transformers to build a robust source model that better generalises to the target domain. Furthermore, our architecture learns domain invariant features thanks to the introduction of a novel alignment loss term derived from the Information Bottleneck principle. We report results on two video action recognition benchmarks for UDA, showing state-of-the-art performance on HMDB$\leftrightarrow$UCF, as well as on Kinetics$\rightarrow$NEC-Drone, which is more challenging. This demonstrates the effectiveness of our method in handling different levels of domain shift. The source code is available at https://github.com/vturrisi/UDAVT.

Title: Robust and Efficient Segmentation of Cross-domain Medical Images. (arXiv:2207.12995v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.12995
Code URL: null
Copy Paste: [[2207.12995] Robust and Efficient Segmentation of Cross-domain Medical Images](http://arxiv.org/abs/2207.12995)
Summary:
Efficient medical image segmentation aims to provide accurate pixel-wise prediction for medical images with a lightweight implementation framework. However, lightweight frameworks generally fail to achieve high performance, and suffer from poor generalization ability on cross-domain tasks. In this paper, we propose a generalizable knowledge distillation method for robust and efficient segmentation of cross-domain medical images. Primarily, we propose the Model-Specific Alignment Networks (MSAN) to provide the domain-invariant representations which are regularized by a Pre-trained Semantic AutoEncoder (P-SAE). Meanwhile, a customized Alignment Consistency Training (ACT) strategy is designed to promote the MSAN training. With the domain-invariant representative vectors in MSAN, we propose two generalizable knowledge distillation schemes, Dual Contrastive Graph Distillation (DCGD) and Domain-Invariant Cross Distillation (DICD). Specifically, in DCGD, two types of implicit contrastive graphs are designed to represent the intra-coupling and inter-coupling semantic correlations from the perspective of data distribution. In DICD, the domain-invariant semantic vectors from the two models (i.e., teacher and student) are leveraged to cross-reconstruct features by the header exchange of MSAN, which achieves generalizable improvement for both the encoder and decoder in the student model. Furthermore, a metric named Fréchet Semantic Distance (FSD) is tailored to verify the effectiveness of the regularized domain-invariant features. Extensive experiments conducted on the Liver and Retinal Vessel Segmentation datasets demonstrate the superiority of our method, in terms of performance and generalization on lightweight frameworks.

Title: NewsStories: Illustrating articles with visual summaries. (arXiv:2207.13061v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.13061
Code URL: https://github.com/newsstoriesdata/newsstories.github.io
Copy Paste: [[2207.13061] NewsStories: Illustrating articles with visual summaries](http://arxiv.org/abs/2207.13061)
Summary:
Recent self-supervised approaches have used large-scale image-text datasets to learn powerful representations that transfer to many tasks without finetuning. These methods often assume that there is a one-to-one correspondence between images and their (short) captions. However, many tasks require reasoning about multiple images and long text narratives, such as describing news articles with visual summaries. Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and the number of images. In addition, unlike prior work which assumed captions have a literal relation to the image, we assume images only contain loose illustrative correspondence with the text. To explore this problem, we introduce a large-scale multimodal dataset containing over 31M articles, 22M images and 1M videos. We show that state-of-the-art image-text alignment methods are not robust to longer narratives with multiple images. Finally, we introduce an intuitive baseline that outperforms these methods on zero-shot image-set retrieval by 10% on the GoodNews dataset.

Title: Controllable User Dialogue Act Augmentation for Dialogue State Tracking. (arXiv:2207.12757v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2207.12757
Code URL: https://github.com/miulab/cuda-dst
Copy Paste: [[2207.12757] Controllable User Dialogue Act Augmentation for Dialogue State Tracking](http://arxiv.org/abs/2207.12757)
Summary:
Prior work has demonstrated that data augmentation is useful for improving dialogue state tracking. However, there are many types of user utterances, while the prior method only considered the simplest one for augmentation, raising concerns about poor generalization capability. In order to better cover diverse dialogue acts and control the generation quality, this paper proposes controllable user dialogue act augmentation (CUDA-DST) to augment user utterances with diverse behaviors. With the augmented data, different state trackers gain improvement and show better robustness, achieving state-of-the-art performance on MultiWOZ 2.1.

Title: Machine Learning to Predict the Antimicrobial Activity of Cold Atmospheric Plasma-Activated Liquids. (arXiv:2207.12478v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.12478
Code URL: null
Copy Paste: [[2207.12478] Machine Learning to Predict the Antimicrobial Activity of Cold Atmospheric Plasma-Activated Liquids](http://arxiv.org/abs/2207.12478)
Summary:
Plasma is defined as the fourth state of matter and non-thermal plasma can be produced at atmospheric pressure under a high electrical field. The strong and broad-spectrum antimicrobial effect of plasma-activated liquids (PALs) is now well known. The proven applicability of machine learning (ML) in the medical field is encouraging for its application in the field of plasma medicine as well. Thus, ML applications on PALs could present a new perspective to better understand the influences of various parameters on their antimicrobial effects. In this paper, comparative supervised ML models are presented by using previously obtained data to qualitatively predict the in vitro antimicrobial activity of PALs. A literature search was performed and data were collected from 33 relevant articles. After the required preprocessing steps, two supervised ML methods, namely classification and regression, are applied to the data to obtain microbial inactivation (MI) predictions. For classification, MI is labeled in four categories and for regression, MI is used as a continuous variable. Two different robust cross-validation strategies are conducted for classification and regression models to evaluate the proposed method: repeated stratified k-fold cross-validation and k-fold cross-validation, respectively. We also investigate the effect of different features on models. The results demonstrated that the hyperparameter-optimized Random Forest Classifier (oRFC) and Random Forest Regressor (oRFR) provided better results than other models for the classification and regression, respectively. Finally, the best test accuracy of 82.68% for oRFC and R2 of 0.75 for the oRFR are obtained. ML techniques could contribute to a better understanding of plasma parameters that have a dominant role in the desired antimicrobial effect. Furthermore, such findings may contribute to the definition of a plasma dose in the future.
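
As a rough illustration of the repeated stratified k-fold cross-validation named in the summary above, the following is a minimal pure-Python sketch. It is not the authors' pipeline: the function names, the round-robin dealing strategy, and the toy four-category label list are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose class proportions roughly match the data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold receives a share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def repeated_stratified_k_fold(labels, k, repeats, seed=0):
    """Repeat the stratified split with a different shuffle on each repetition."""
    for r in range(repeats):
        yield from stratified_k_fold(labels, k, seed=seed + r)


# Hypothetical labels: four MI categories with unequal frequencies.
labels = [0] * 6 + [1] * 3 + [2] * 3 + [3] * 3
splits = list(repeated_stratified_k_fold(labels, k=3, repeats=2))
```

Each repetition reshuffles within classes, so averaging a metric over all splits reduces the dependence on any single partition. A classifier would be fitted on `train` and scored on `test` for each pair.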

Title: The Bearable Lightness of Big Data: Towards Massive Public Datasets in Scientific Machine Learning. (arXiv:2207.12546v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.12546
Code URL: https://github.com/ihmegroup/lossy_ml
Copy Paste: [[2207.12546] The Bearable Lightness of Big Data: Towards Massive Public Datasets in Scientific Machine Learning](http://arxiv.org/abs/2207.12546)
Summary:
In general, large datasets enable deep learning models to perform with good accuracy and generalizability. However, massive high-fidelity simulation datasets (from molecular chemistry, astrophysics, computational fluid dynamics (CFD), etc.) can be challenging to curate due to dimensionality and storage constraints. Lossy compression algorithms can help mitigate limitations from storage, as long as the overall data fidelity is preserved. To illustrate this point, we demonstrate that deep learning models, trained and tested on data from a petascale CFD simulation, are robust to errors introduced during lossy compression in a semantic segmentation problem. Our results demonstrate that lossy compression algorithms offer a realistic pathway for exposing high-fidelity scientific data to open-source data repositories for building community datasets. In this paper, we outline, construct, and evaluate the requirements for establishing a big data framework, demonstrated at https://blastnet.github.io/, for scientific machine learning.

Title: Exploring the Design of Adaptation Protocols for Improved Generalization and Machine Learning Safety. (arXiv:2207.12615v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.12615
Code URL: null
Copy Paste: [[2207.12615] Exploring the Design of Adaptation Protocols for Improved Generalization and Machine Learning Safety](http://arxiv.org/abs/2207.12615)
Summary:
While directly fine-tuning (FT) large-scale, pretrained models on task-specific data is well-known to induce strong in-distribution task performance, recent works have demonstrated that different adaptation protocols, such as linear probing (LP) prior to FT, can improve out-of-distribution generalization. However, the design space of such adaptation protocols remains under-explored and the evaluation of such protocols has primarily focused on distribution shifts. Therefore, in this work, we evaluate common adaptation protocols across distribution shifts and machine learning safety metrics (e.g., anomaly detection, calibration, robustness to corruptions). We find that protocols induce disparate trade-offs that were not apparent from prior evaluation. Further, we demonstrate that appropriate pairing of data augmentation and protocol can substantially mitigate this trade-off. Finally, we hypothesize and empirically see that using hardness-promoting augmentations during LP and then FT with augmentations may be particularly effective for trade-off mitigation.

Title: Variational multiscale reinforcement learning for discovering reduced order closure models of nonlinear spatiotemporal transport systems. (arXiv:2207.12854v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.12854
Code URL: null
Copy Paste: [[2207.12854] Variational multiscale reinforcement learning for discovering reduced order closure models of nonlinear spatiotemporal transport systems](http://arxiv.org/abs/2207.12854)
Summary:
A central challenge in the computational modeling and simulation of a multitude of science applications is to achieve robust and accurate closures for their coarse-grained representations due to underlying highly nonlinear multiscale interactions. These closure models are common in many nonlinear spatiotemporal systems to account for losses due to reduced order representations, including many transport phenomena in fluids. Previous data-driven closure modeling efforts have mostly focused on supervised learning approaches using high fidelity simulation data. On the other hand, reinforcement learning (RL) is a powerful yet relatively uncharted method in spatiotemporally extended systems. In this study, we put forth a modular dynamic closure modeling and discovery framework to stabilize the Galerkin projection based reduced order models that may arise in many nonlinear spatiotemporal dynamical systems with quadratic nonlinearity. However, a key element in creating a robust RL agent is to introduce a feasible reward function, which can be constituted of any difference metrics between the RL model and high fidelity simulation data. First, we introduce a multi-modal RL (MMRL) to discover mode-dependent closure policies that utilize the high fidelity data in rewarding our RL agent. We then formulate a variational multiscale RL (VMRL) approach to discover closure models without requiring access to the high fidelity data in designing the reward function. Specifically, our chief innovation is to leverage variational multiscale formalism to quantify the difference between modal interactions in Galerkin systems. Our results in simulating the viscous Burgers equation indicate that the proposed VMRL method leads to robust and accurate closure parameterizations, and it may potentially be used to discover scale-aware closure models for complex dynamical systems.

Title: Efficient Algorithms for Sparse Moment Problems without Separation. (arXiv:2207.13008v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.13008
Code URL: null
Copy Paste: [[2207.13008] Efficient Algorithms for Sparse Moment Problems without Separation](http://arxiv.org/abs/2207.13008)
Summary:
We consider the sparse moment problem of learning a $k$-spike mixture in high dimensional space from its noisy moment information in any dimension. We measure the accuracy of the learned mixtures using transportation distance. Previous algorithms either assume certain separation assumptions, use more recovery moments, or run in (super) exponential time. Our algorithm for the 1-dimension problem (also called the sparse Hausdorff moment problem) is a robust version of the classic Prony's method, and our contribution mainly lies in the analysis. We adopt a global and much tighter analysis than previous work (which analyzes the perturbation of the intermediate results of Prony's method). A useful technical ingredient is a connection between the linear system defined by the Vandermonde matrix and the Schur polynomial, which allows us to provide tight perturbation bound independent of the separation and may be useful in other contexts. To tackle the high dimensional problem, we first solve the 2-dimensional problem by extending the 1-dimension algorithm and analysis to complex numbers. Our algorithm for the high dimensional case determines the coordinates of each spike by aligning a 1-d projection of the mixture to a random vector and a set of 2d-projections of the mixture. Our results have applications to learning topic models and Gaussian mixtures, implying improved sample complexity results or running time over prior work.
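
For readers unfamiliar with the classic Prony's method mentioned in the summary above, here is a minimal sketch for the simplest case: two real spikes in one dimension with exact (noise-free) moments. This is the textbook method, not the robust variant the paper analyzes; the function name and the example spike locations and weights are assumptions chosen for illustration.

```python
import math


def prony_two_spikes(moments):
    """Recover two spike locations and weights from moments m_0..m_3.

    Assumes m_j = w1 * x1**j + w2 * x2**j with exact moments and x1 != x2.
    """
    m0, m1, m2, m3 = moments
    # The locations are the roots of x^2 + c1*x + c0, where (c0, c1) solves
    # the Hankel system [[m0, m1], [m1, m2]] @ [c0, c1] = -[m2, m3].
    det = m0 * m2 - m1 * m1  # equals w1 * w2 * (x1 - x2)**2
    if abs(det) < 1e-12:
        raise ValueError("Hankel matrix is singular: spikes coincide or a weight is zero")
    c0 = (m1 * m3 - m2 * m2) / det
    c1 = (m1 * m2 - m0 * m3) / det

    disc = c1 * c1 - 4.0 * c0
    if disc < 0:
        raise ValueError("moments are inconsistent with two real spikes")
    root = math.sqrt(disc)
    x1, x2 = (-c1 - root) / 2.0, (-c1 + root) / 2.0

    # Weights follow from the 2x2 Vandermonde system on m0 and m1.
    w2 = (m1 - m0 * x1) / (x2 - x1)
    w1 = m0 - w2
    return (x1, x2), (w1, w2)


# Hypothetical mixture used only to generate test moments.
true_x, true_w = (0.2, 0.7), (0.4, 0.6)
moments = [sum(w * x**j for w, x in zip(true_w, true_x)) for j in range(4)]
locations, weights = prony_two_spikes(moments)
```

The Hankel determinant is proportional to the squared gap between the spikes, so the linear solve becomes ill-conditioned as the spikes approach each other. That sensitivity is what makes a separation-free analysis under noisy moments non-trivial.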

biometric

steal

extraction

Title: TGCF: Texture guided color fusion for impressionism oil painting style rendering. (arXiv:2207.12585v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.12585
Code URL: null
Copy Paste: [[2207.12585] TGCF: Texture guided color fusion for impressionism oil painting style rendering](http://arxiv.org/abs/2207.12585)
Summary:
As a major branch of Non-Photorealistic Rendering (NPR), image stylization mainly uses the computer algorithms to render a photo into an artistic painting. Recent work has shown that the extraction of style information such as stroke texture and color of the target style image is the key to image stylization. Given its stroke texture and color characteristics, a new stroke rendering method is proposed, which fully considers the tonal characteristics and the representative color of the original oil painting, in order to fit the tone of the original oil painting image into the stylized image and make it close to the artist's creative effect. The experiments have validated the efficacy of the proposed model. This method would be more suitable for the works of pointillism painters with a relatively uniform sense of direction, especially for natural scenes. When the original painting brush strokes have a clearer sense of direction, using this method to simulate brushwork texture features can be less satisfactory.

Title: Generative Extraction of Audio Classifiers for Speaker Identification. (arXiv:2207.12816v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.12816
Code URL: null
Copy Paste: [[2207.12816] Generative Extraction of Audio Classifiers for Speaker Identification](http://arxiv.org/abs/2207.12816)
Summary:
It is perhaps no longer surprising that machine learning models, especially deep neural networks, are particularly vulnerable to attacks. One such vulnerability that has been well studied is model extraction: a phenomenon in which the attacker attempts to steal a victim's model by training a surrogate model to mimic the decision boundaries of the victim model. Previous works have demonstrated the effectiveness of such an attack and its devastating consequences, but much of this work has been done primarily for image and text processing tasks. Our work is the first attempt to perform model extraction on audio classification models. We are motivated by an attacker whose goal is to mimic the behavior of the victim's model trained to identify a speaker. This is particularly problematic in security-sensitive domains such as biometric authentication. We find that prior model extraction techniques, where the attacker naively uses a proxy dataset to attack a potential victim's model, fail. We therefore propose the use of a generative model to create a sufficiently large and diverse pool of synthetic attack queries. We find that our approach is able to extract a victim's model trained on LibriSpeech using queries synthesized with a proxy dataset based off of VoxCeleb; we achieve a test accuracy of 84.41% with a budget of 3 million queries.

Title: From Interpretable Filters to Predictions of Convolutional Neural Networks with Explainable Artificial Intelligence. (arXiv:2207.12958v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.12958
Code URL: null
Copy Paste: [[2207.12958] From Interpretable Filters to Predictions of Convolutional Neural Networks with Explainable Artificial Intelligence](http://arxiv.org/abs/2207.12958)
Summary:
Convolutional neural networks (CNN) are known for their excellent feature extraction capabilities to enable the learning of models from data, yet are used as black boxes. An interpretation of the convolutional filters and associated features can help to establish an understanding of how a CNN distinguishes various classes. In this work, we focus on the explainability of a CNN model called cnnexplain that is used for Covid-19 and non-Covid-19 classification with a focus on the interpretability of features by the convolutional filters, and how these features contribute to classification. Specifically, we have used various explainable artificial intelligence (XAI) methods, such as visualizations, SmoothGrad, Grad-CAM, and LIME to provide interpretation of convolutional filters, and relevant features, and their role in classification. We have analyzed the explanation of these methods for Covid-19 detection using dry cough spectrograms. Explanation results obtained from the LIME, SmoothGrad, and Grad-CAM highlight important features of different spectrograms and their relevance to classification.

membership infer

federate

fair

Title: Innovations in Neural Data-to-text Generation. (arXiv:2207.12571v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2207.12571
Code URL: null
Copy Paste: [[2207.12571] Innovations in Neural Data-to-text Generation](http://arxiv.org/abs/2207.12571)
Summary:
The neural boom that has sparked natural language processing (NLP) research through the last decade has similarly led to significant innovations in data-to-text generation (DTG). This survey offers a consolidated view into the neural DTG paradigm with a structured examination of the approaches, benchmark datasets, and evaluation protocols. This survey draws boundaries separating DTG from the rest of the natural language generation (NLG) landscape, encompassing an up-to-date synthesis of the literature, and highlighting the stages of technological adoption from within and outside the greater NLG umbrella. With this holistic view, we highlight promising avenues for DTG research that not only focus on the design of linguistically capable systems but also systems that exhibit fairness and accountability.

Title: Estimating and Controlling for Fairness via Sensitive Attribute Predictors. (arXiv:2207.12497v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.12497
Code URL: null
Copy Paste: [[2207.12497] Estimating and Controlling for Fairness via Sensitive Attribute Predictors](http://arxiv.org/abs/2207.12497)
Summary:
Although machine learning classifiers have been increasingly used in high-stakes decision making (e.g., cancer diagnosis, criminal prosecution decisions), they have demonstrated biases against underrepresented groups. Standard definitions of fairness require access to sensitive attributes of interest (e.g., gender and race), which are often unavailable. In this work we demonstrate that in these settings where sensitive attributes are unknown, one can still reliably estimate and ultimately control for fairness by using proxy sensitive attributes derived from a sensitive attribute predictor. Specifically, we first show that with just a little knowledge of the complete data distribution, one may use a sensitive attribute predictor to obtain upper and lower bounds of the classifier's true fairness metric. Second, we demonstrate how one can provably control for fairness with respect to the true sensitive attributes by controlling for fairness with respect to the proxy sensitive attributes. Our results hold under assumptions that are significantly milder than previous works. We illustrate our results on a series of synthetic and real datasets.

Title: Benchmark time series data sets for PyTorch -- the torchtime package. (arXiv:2207.12503v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.12503
Code URL: null
Copy Paste: [[2207.12503] Benchmark time series data sets for PyTorch -- the torchtime package](http://arxiv.org/abs/2207.12503)
Summary:
The development of models for Electronic Health Record data is an area of active research featuring a small number of public benchmark data sets. Researchers typically write custom data processing code but this hinders reproducibility and can introduce errors. The Python package torchtime provides reproducible implementations of commonly used PhysioNet and UEA & UCR time series classification repository data sets for PyTorch. Features are provided for working with irregularly sampled and partially observed time series of unequal length. It aims to simplify access to PhysioNet data and enable fair comparisons of models in this exciting area of research.

interpretability

Title: Explaining Deep Neural Networks for Point Clouds using Gradient-based Visualisations. (arXiv:2207.12984v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.12984
Code URL: null
Copy Paste: [[2207.12984] Explaining Deep Neural Networks for Point Clouds using Gradient-based Visualisations](http://arxiv.org/abs/2207.12984)
Summary:
Explaining decisions made by deep neural networks is a rapidly advancing research topic. In recent years, several approaches have attempted to provide visual explanations of decisions made by neural networks designed for structured 2D image input data. In this paper, we propose a novel approach to generate coarse visual explanations of networks designed to classify unstructured 3D data, namely point clouds. Our method uses gradients flowing back to the final feature map layers and maps these values as contributions of the corresponding points in the input point cloud. Due to dimensionality disagreement and lack of spatial consistency between input points and final feature maps, our approach combines gradients with points dropping to compute explanations of different parts of the point cloud iteratively. The generality of our approach is tested on various point cloud classification networks, including 'single object' networks PointNet, PointNet++, DGCNN, and a 'scene' network VoteNet. Our method generates symmetric explanation maps that highlight important regions and provide insight into the decision-making process of network architectures. We perform an exhaustive evaluation of trust and interpretability of our explanation method against comparative approaches using quantitative, qualitative and human studies. All our code is implemented in PyTorch and will be made publicly available.

Title: Advanced Conditional Variational Autoencoders (A-CVAE): Towards interpreting open-domain conversation generation via disentangling latent feature representation. (arXiv:2207.12696v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2207.12696
Code URL: null
Copy Paste: [[2207.12696] Advanced Conditional Variational Autoencoders (A-CVAE): Towards interpreting open-domain conversation generation via disentangling latent feature representation](http://arxiv.org/abs/2207.12696)
Summary:
Currently, end-to-end deep learning based open-domain dialogue systems remain black box models, making it easy to generate irrelevant content with data-driven models. Specifically, latent variables are highly entangled with different semantics in the latent space due to the lack of a priori knowledge to guide the training. To address this problem, this paper proposes to harness the generative model with a priori knowledge through a cognitive approach involving mesoscopic scale feature disentanglement. Particularly, the model integrates the macro-level guided-category knowledge and micro-level open-domain dialogue data for the training, leveraging the a priori knowledge into the latent space, which enables the model to disentangle the latent variables within the mesoscopic scale. Besides, we propose a new metric for open-domain dialogues, which can objectively evaluate the interpretability of the latent space distribution. Finally, we validate our model on different datasets and experimentally demonstrate that our model is able to generate higher quality and more interpretable dialogues than other models.

Title: Equivariant and Invariant Grounding for Video Question Answering. (arXiv:2207.12783v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2207.12783
Code URL: null
Copy Paste: [[2207.12783] Equivariant and Invariant Grounding for Video Question Answering](http://arxiv.org/abs/2207.12783)
Summary:
Video Question Answering (VideoQA) is the task of answering natural language questions about a video. Producing an answer requires understanding the interplay across visual scenes in the video and linguistic semantics in the question. However, most leading VideoQA models work as black boxes, which makes the visual-linguistic alignment behind the answering process obscure. Such black-box nature calls for visual explainability that reveals "What part of the video should the model look at to answer the question?". Only a few works present the visual explanations in a post-hoc fashion, which emulates the target model's answering process via an additional method. Nonetheless, the emulation struggles to faithfully exhibit the visual-linguistic alignment during answering.

Instead of post-hoc explainability, we focus on intrinsic interpretability to make the answering process transparent. At its core is grounding the question-critical cues as the causal scene to yield answers, while rolling out the question-irrelevant information as the environment scene. Taking a causal look at VideoQA, we devise a self-interpretable framework, Equivariant and Invariant Grounding for Interpretable VideoQA (EIGV). Specifically, the equivariant grounding encourages the answering to be sensitive to the semantic changes in the causal scene and question; in contrast, the invariant grounding enforces the answering to be insensitive to the changes in the environment scene. By imposing them on the answering process, EIGV is able to distinguish the causal scene from the environment information, and explicitly present the visual-linguistic alignment. Extensive experiments on three benchmark datasets justify the superiority of EIGV in terms of accuracy and visual interpretability over the leading baselines.

Title: Is Attention Interpretation? A Quantitative Assessment On Sets. (arXiv:2207.13018v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.13018
Code URL: null
Copy Paste: [[2207.13018] Is Attention Interpretation? A Quantitative Assessment On Sets](http://arxiv.org/abs/2207.13018)
Summary:
The debate around the interpretability of attention mechanisms is centered on whether attention scores can be used as a proxy for the relative amounts of signal carried by sub-components of data. We propose to study the interpretability of attention in the context of set machine learning, where each data point is composed of an unordered collection of instances with a global label. For classical multiple-instance-learning problems and simple extensions, there is a well-defined "importance" ground truth that can be leveraged to cast interpretation as a binary classification problem, which we can quantitatively evaluate. By building synthetic datasets over several data modalities, we perform a systematic assessment of attention-based interpretations. We find that attention distributions are indeed often reflective of the relative importance of individual instances, but that silent failures happen where a model will have high classification performance but attention patterns that do not align with expectations. Based on these observations, we propose to use ensembling to minimize the risk of misleading attention-based explanations.
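
The summary above describes casting interpretation as a binary classification problem against a ground-truth notion of instance importance. One way this might be scored, assuming AUROC as the metric (the abstract does not name one), is sketched below. The function name and the toy attention values are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random key instance outranks a random non-key one.

    `scores` are attention weights; `labels` are 1 for ground-truth key
    instances and 0 otherwise. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one key and one non-key instance")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# One bag of four instances; instances 1 and 3 are the ground-truth key instances.
key_instances = [0, 1, 0, 1]

# Attention that agrees with the ground truth.
aligned = auroc([0.05, 0.60, 0.10, 0.25], key_instances)

# Attention concentrated on the wrong instances: the kind of "silent failure"
# that bag-level accuracy alone would not reveal.
misaligned = auroc([0.60, 0.05, 0.25, 0.10], key_instances)
```

A score near 1 means attention ranks key instances above the rest, while a score near 0.5 or below means the attention pattern is uninformative or misleading even if the bag-level prediction is correct.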