secure

security

Title: On the Feasibility of Unclonable Encryption, and More. (arXiv:2207.06589v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.06589
Code URL: null
Copy Paste: [[2207.06589] On the Feasibility of Unclonable Encryption, and More](http://arxiv.org/abs/2207.06589)
Summary:
Unclonable encryption, first introduced by Broadbent and Lord (TQC'20), is a one-time encryption scheme with the following security guarantee: any non-local adversary (A, B, C) cannot simultaneously distinguish encryptions of two equal length messages. This notion is termed as unclonable indistinguishability. Prior works focused on achieving a weaker notion of unclonable encryption, where we required that any non-local adversary (A, B, C) cannot simultaneously recover the entire message m. Seemingly innocuous, understanding the feasibility of encryption schemes satisfying unclonable indistinguishability (even for 1-bit messages) has remained elusive.

We make progress towards establishing the feasibility of unclonable encryption.

- We show that encryption schemes satisfying unclonable indistinguishability exist unconditionally in the quantum random oracle model.

- Towards understanding the necessity of oracles, we present a negative result stipulating that a large class of encryption schemes cannot satisfy unclonable indistinguishability.

- Finally, we also establish the feasibility of another closely related primitive: copy-protection for single-bit output point functions. Prior works only established the feasibility of copy-protection for multi-bit output point functions or they achieved constant security error for single-bit output point functions.

Title: Artificial Dust Based Attack Modelling: A Threat to the Security of Next Generation WCN. (arXiv:2207.06683v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.06683
Code URL: null
Copy Paste: [[2207.06683] Artificial Dust Based Attack Modelling: A Threat to the Security of Next Generation WCN](http://arxiv.org/abs/2207.06683)
Summary:
This paper introduces a systematic and novel mechanism for devising a security attack in the WCN (Wireless Communication Network). The proposed model involves the implementation of the AD (Artificial Dust) by the intruder, followed by the execution of the HD (Half-Duplex) attack. The communication network is based on the deployment of urban and rural scenarios with an unknown CSI (Channel State Information). Depending on the achieved path loss based on the distance of the user from the BS, the user with the highest path loss is particularized for the attack. The formulation of AD divulges the increased susceptibilities of the secure network specifically for the selected legitimate user. The parameter of visibility defines the amount of AD present in the communication channel. Based on the enumerated attenuation created by the artificial dust, the parameter of secrecy rate is evaluated with varying distance of the user from the BS and the operating frequency. Furthermore, the proposed scheme of the HD attack is initiated by the intruder at the specified valid user. The strategy of the attack focuses on the continuous monitor of the uplink and attempts the spoofing attack on the downlink wherein the allocation of the resources takes place. The efficacy of the proposed approach is corroborated through the examination of simulation results. The assessment of the proposed mechanism highlights notable characteristics as compared to the conventional methodology of the FD (Full- Duplex) attack.

privacy

Title: Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank. (arXiv:2207.06944v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.06944
Code URL: null
Copy Paste: [[2207.06944] Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank](http://arxiv.org/abs/2207.06944)
Summary:
Personalized PageRank (PPR) is a fundamental tool in unsupervised learning of graph representations such as node ranking, labeling, and graph embedding. However, while data privacy is one of the most important recent concerns, existing PPR algorithms are not designed to protect user privacy. PPR is highly sensitive to the input graph edges: the difference of only one edge may cause a big change in the PPR vector, potentially leaking private user data.

In this work, we propose an algorithm which outputs an approximate PPR and has provably bounded sensitivity to input edges. In addition, we prove that our algorithm achieves similar accuracy to non-private algorithms when the input graph has large degrees. Our sensitivity-bounded PPR directly implies private algorithms for several tools of graph learning, such as, differentially private (DP) PPR ranking, DP node classification, and DP node embedding. To complement our theoretical analysis, we also empirically verify the practical performances of our algorithms.

protect

defense

Title: PIAT: Physics Informed Adversarial Training for Solving Partial Differential Equations. (arXiv:2207.06647v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.06647
Code URL: https://github.com/rohban-lab/piat
Copy Paste: [[2207.06647] PIAT: Physics Informed Adversarial Training for Solving Partial Differential Equations](http://arxiv.org/abs/2207.06647)
Summary:
In this paper, we propose the physics informed adversarial training (PIAT) of neural networks for solving nonlinear differential equations (NDE). It is well-known that the standard training of neural networks results in non-smooth functions. Adversarial training (AT) is an established defense mechanism against adversarial attacks, which could also help in making the solution smooth. AT include augmenting the training mini-batch with a perturbation that makes the network output mismatch the desired output adversarially. Unlike formal AT, which relies only on the training data, here we encode the governing physical laws in the form of nonlinear differential equations using automatic differentiation in the adversarial network architecture. We compare PIAT with PINN to indicate the effectiveness of our method in solving NDEs for up to 10 dimensions. Moreover, we propose weight decay and Gaussian smoothing to demonstrate the PIAT advantages. The code repository is available at https://github.com/rohban-lab/PIAT.

attack

Title: Adversarial Attacks on Monocular Pose Estimation. (arXiv:2207.07032v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.07032
Code URL: https://github.com/neurai-lab/mono-pose-attack
Copy Paste: [[2207.07032] Adversarial Attacks on Monocular Pose Estimation](http://arxiv.org/abs/2207.07032)
Summary:
Advances in deep learning have resulted in steady progress in computer vision with improved accuracy on tasks such as object detection and semantic segmentation. Nevertheless, deep neural networks are vulnerable to adversarial attacks, thus presenting a challenge in reliable deployment. Two of the prominent tasks in 3D scene-understanding for robotics and advanced drive assistance systems are monocular depth and pose estimation, often learned together in an unsupervised manner. While studies evaluating the impact of adversarial attacks on monocular depth estimation exist, a systematic demonstration and analysis of adversarial perturbations against pose estimation are lacking. We show how additive imperceptible perturbations can not only change predictions to increase the trajectory drift but also catastrophically alter its geometry. We also study the relation between adversarial perturbations targeting monocular depth and pose estimation networks, as well as the transferability of perturbations to other networks with different architectures and losses. Our experiments show how the generated perturbations lead to notable errors in relative rotation and translation predictions and elucidate vulnerabilities of the networks.

Title: Behavioral Model For Live Detection of Apps Based Attack. (arXiv:2207.06686v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2207.06686
Code URL: null
Copy Paste: [[2207.06686] Behavioral Model For Live Detection of Apps Based Attack](http://arxiv.org/abs/2207.06686)
Summary:
Smartphones with the platforms of applications are gaining extensive attention and popularity. The enormous use of different applications has paved the way to numerous security threats. The threats are in the form of attacks such as permission control attacks, phishing attacks, spyware attacks, botnets, malware attacks, privacy leakage attacks. Moreover, other vulnerabilities include invalid authorization of apps, compromise on the confidentiality of data, invalid access control. In this paper, an application-based attack modeling and attack detection is proposed. Due to A novel attack vulnerability is identified based on the app execution on the smartphone. The attack modeling involves an end-user vulnerable application to initiate an attack. The vulnerable application is installed at the background end on the smartphone with hidden visibility from the end-user. Thereby, accessing the confidential information. The detection model involves the proposed technique of an Application-based Behavioral Model Analysis (ABMA) scheme to address the attack model. The model incorporates application-based comparative parameter analysis to perform the process of intrusion detection. The ABMA is estimated by using the parameters of power, battery level, and the data usage. Based on the source internet accessibility, the analysis is performed using three different configurations as, WiFi, mobile data, and the combination of the two. The simulation results verify and demonstrates the effectiveness of the proposed model.

Title: Anomal-E: A Self-Supervised Network Intrusion Detection System based on Graph Neural Networks. (arXiv:2207.06819v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.06819
Code URL: null
Copy Paste: [[2207.06819] Anomal-E: A Self-Supervised Network Intrusion Detection System based on Graph Neural Networks](http://arxiv.org/abs/2207.06819)
Summary:
This paper investigates Graph Neural Networks (GNNs) application for self-supervised network intrusion and anomaly detection. GNNs are a deep learning approach for graph-based data that incorporate graph structures into learning to generalise graph representations and output embeddings. As network flows are naturally graph-based, GNNs are a suitable fit for analysing and learning network behaviour. The majority of current implementations of GNN-based Network Intrusion Detection Systems (NIDSs) rely heavily on labelled network traffic which can not only restrict the amount and structure of input traffic, but also the NIDSs potential to adapt to unseen attacks. To overcome these restrictions, we present Anomal-E, a GNN approach to intrusion and anomaly detection that leverages edge features and graph topological structure in a self-supervised process. This approach is, to the best our knowledge, the first successful and practical approach to network intrusion detection that utilises network flows in a self-supervised, edge leveraging GNN. Experimental results on two modern benchmark NIDS datasets not only clearly display the improvement of using Anomal-E embeddings rather than raw features, but also the potential Anomal-E has for detection on wild network traffic.

robust

Title: Lipschitz Continuity Retained Binary Neural Network. (arXiv:2207.06540v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.06540
Code URL: https://github.com/42shawn/lcr_bnn
Copy Paste: [[2207.06540] Lipschitz Continuity Retained Binary Neural Network](http://arxiv.org/abs/2207.06540)
Summary:
Relying on the premise that the performance of a binary neural network can be largely restored with eliminated quantization error between full-precision weight vectors and their corresponding binary vectors, existing works of network binarization frequently adopt the idea of model robustness to reach the aforementioned objective. However, robustness remains to be an ill-defined concept without solid theoretical support. In this work, we introduce the Lipschitz continuity, a well-defined functional property, as the rigorous criteria to define the model robustness for BNN. We then propose to retain the Lipschitz continuity as a regularization term to improve the model robustness. Particularly, while the popular Lipschitz-involved regularization methods often collapse in BNN due to its extreme sparsity, we design the Retention Matrices to approximate spectral norms of the targeted weight matrices, which can be deployed as the approximation for the Lipschitz constant of BNNs without the exact Lipschitz constant computation (NP-hard). Our experiments prove that our BNN-specific regularization method can effectively strengthen the robustness of BNN (testified on ImageNet-C), achieving state-of-the-art performance on CIFAR and ImageNet.

Title: Deepfake Video Detection with Spatiotemporal Dropout Transformer. (arXiv:2207.06612v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.06612
Code URL: null
Copy Paste: [[2207.06612] Deepfake Video Detection with Spatiotemporal Dropout Transformer](http://arxiv.org/abs/2207.06612)
Summary:
While the abuse of deepfake technology has caused serious concerns recently, how to detect deepfake videos is still a challenge due to the high photo-realistic synthesis of each frame. Existing image-level approaches often focus on single frame and ignore the spatiotemporal cues hidden in deepfake videos, resulting in poor generalization and robustness. The key of a video-level detector is to fully exploit the spatiotemporal inconsistency distributed in local facial regions across different frames in deepfake videos. Inspired by that, this paper proposes a simple yet effective patch-level approach to facilitate deepfake video detection via spatiotemporal dropout transformer. The approach reorganizes each input video into bag of patches that is then fed into a vision transformer to achieve robust representation. Specifically, a spatiotemporal dropout operation is proposed to fully explore patch-level spatiotemporal cues and serve as effective data augmentation to further enhance model's robustness and generalization ability. The operation is flexible and can be easily plugged into existing vision transformers. Extensive experiments demonstrate the effectiveness of our approach against 25 state-of-the-arts with impressive robustness, generalizability, and representation ability.

Title: Octuplet Loss: Make Face Recognition Robust to Image Resolution. (arXiv:2207.06726v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.06726
Code URL: https://github.com/martlgap/octuplet-loss
Copy Paste: [[2207.06726] Octuplet Loss: Make Face Recognition Robust to Image Resolution](http://arxiv.org/abs/2207.06726)
Summary:
Image resolution, or in general, image quality, plays an essential role in the performance of today's face recognition systems. To address this problem, we propose a novel combination of the popular triplet loss to improve robustness against image resolution via fine-tuning of existing face recognition models. With octuplet loss, we leverage the relationship between high-resolution images and their synthetically down-sampled variants jointly with their identity labels. Fine-tuning several state-of-the-art approaches with our method proves that we can significantly boost performance for cross-resolution (high-to-low resolution) face verification on various datasets without meaningfully exacerbating the performance on high-to-high resolution images. Our method applied on the FaceTransformer network achieves 95.12% face verification accuracy on the challenging XQLFW dataset while reaching 99.73% on the LFW database. Moreover, the low-to-low face verification accuracy benefits from our method. We release our code to allow seamless integration of the octuplet loss into existing frameworks.

Title: GeoSegNet: Point Cloud Semantic Segmentation via Geometric Encoder-Decoder Modeling. (arXiv:2207.06766v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.06766
Code URL: https://github.com/chen-yuiyui/geosegnet
Copy Paste: [[2207.06766] GeoSegNet: Point Cloud Semantic Segmentation via Geometric Encoder-Decoder Modeling](http://arxiv.org/abs/2207.06766)
Summary:
Semantic segmentation of point clouds, aiming to assign each point a semantic category, is critical to 3D scene understanding.Despite of significant advances in recent years, most of existing methods still suffer from either the object-level misclassification or the boundary-level ambiguity. In this paper, we present a robust semantic segmentation network by deeply exploring the geometry of point clouds, dubbed GeoSegNet. Our GeoSegNet consists of a multi-geometry based encoder and a boundary-guided decoder. In the encoder, we develop a new residual geometry module from multi-geometry perspectives to extract object-level features. In the decoder, we introduce a contrastive boundary learning module to enhance the geometric representation of boundary points. Benefiting from the geometric encoder-decoder modeling, our GeoSegNet can infer the segmentation of objects effectively while making the intersections (boundaries) of two or more objects clear. Experiments show obvious improvements of our method over its competitors in terms of the overall segmentation accuracy and object boundary clearness. Code is available at https://github.com/Chen-yuiyui/GeoSegNet.

Title: Pose-based Tremor Classification for Parkinson's Disease Diagnosis from Video. (arXiv:2207.06828v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.06828
Code URL: null
Copy Paste: [[2207.06828] Pose-based Tremor Classification for Parkinson's Disease Diagnosis from Video](http://arxiv.org/abs/2207.06828)
Summary:
Parkinson's disease (PD) is a progressive neurodegenerative disorder that results in a variety of motor dysfunction symptoms, including tremors, bradykinesia, rigidity and postural instability. The diagnosis of PD mainly relies on clinical experience rather than a definite medical test, and the diagnostic accuracy is only about 73-84% since it is challenged by the subjective opinions or experiences of different medical experts. Therefore, an efficient and interpretable automatic PD diagnosis system is valuable for supporting clinicians with more robust diagnostic decision-making. To this end, we propose to classify Parkinson's tremor since it is one of the most predominant symptoms of PD with strong generalizability. Different from other computer-aided time and resource-consuming Parkinson's Tremor (PT) classification systems that rely on wearable sensors, we propose SPAPNet, which only requires consumer-grade non-intrusive video recording of camera-facing human movements as input to provide undiagnosed patients with low-cost PT classification results as a PD warning sign. For the first time, we propose to use a novel attention module with a lightweight pyramidal channel-squeezing-fusion architecture to extract relevant PT information and filter the noise efficiently. This design aids in improving both classification performance and system interpretability. Experimental results show that our system outperforms state-of-the-arts by achieving a balanced accuracy of 90.9% and an F1-score of 90.6% in classifying PT with the non-PT class.

Title: Scene Text Recognition with Permuted Autoregressive Sequence Models. (arXiv:2207.06966v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.06966
Code URL: https://github.com/baudm/parseq
Copy Paste: [[2207.06966] Scene Text Recognition with Permuted Autoregressive Sequence Models](http://arxiv.org/abs/2207.06966)
Summary:
Context-aware STR methods typically use internal autoregressive (AR) language models (LM). Inherent limitations of AR models motivated two-stage methods which employ an external LM. The conditional independence of the external LM on the input image may cause it to erroneously rectify correct predictions, leading to significant inefficiencies. Our method, PARSeq, learns an ensemble of internal AR LMs with shared weights using Permutation Language Modeling. It unifies context-free non-AR and context-aware AR inference, and iterative refinement using bidirectional context. Using synthetic training data, PARSeq achieves state-of-the-art (SOTA) results in STR benchmarks (91.9% accuracy) and more challenging datasets. It establishes new SOTA results (96.0% accuracy) when trained on real data. PARSeq is optimal on accuracy vs parameter count, FLOPS, and latency because of its simple, unified structure and parallel token processing. Due to its extensive use of attention, it is robust on arbitrarily-oriented text which is common in real-world images. Code, pretrained weights, and data are available at: https://github.com/baudm/parseq.

Title: Language Modelling with Pixels. (arXiv:2207.06991v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2207.06991
Code URL: https://github.com/xplip/pixel
Copy Paste: [[2207.06991] Language Modelling with Pixels](http://arxiv.org/abs/2207.06991)
Summary:
Language models are defined over a finite set of inputs, which creates a vocabulary bottleneck when we attempt to scale the number of supported languages. Tackling this bottleneck results in a trade-off between what can be represented in the embedding matrix and computational issues in the output layer. This paper introduces PIXEL, the Pixel-based Encoder of Language, which suffers from neither of these issues. PIXEL is a pretrained language model that renders text as images, making it possible to transfer representations across languages based on orthographic similarity or the co-activation of pixels. PIXEL is trained to reconstruct the pixels of masked patches, instead of predicting a distribution over tokens. We pretrain the 86M parameter PIXEL model on the same English data as BERT and evaluate on syntactic and semantic tasks in typologically diverse languages, including various non-Latin scripts. We find that PIXEL substantially outperforms BERT on syntactic and semantic processing tasks on scripts that are not found in the pretraining data, but PIXEL is slightly weaker than BERT when working with Latin scripts. Furthermore, we find that PIXEL is more robust to noisy text inputs than BERT, further confirming the benefits of modelling language with pixels.

Title: A Single Self-Supervised Model for Many Speech Modalities Enables Zero-Shot Modality Transfer. (arXiv:2207.07036v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2207.07036
Code URL: null
Copy Paste: [[2207.07036] A Single Self-Supervised Model for Many Speech Modalities Enables Zero-Shot Modality Transfer](http://arxiv.org/abs/2207.07036)
Summary:
While audio-visual speech models can yield superior performance and robustness compared to audio-only models, their development and adoption are hindered by the lack of labeled and unlabeled audio-visual data and the cost to deploy one model per modality. In this paper, we present u-HuBERT, a self-supervised pre-training framework that can leverage both multimodal and unimodal speech with a unified masked cluster prediction objective. By utilizing modality dropout during pre-training, we demonstrate that a single fine-tuned model can achieve performance on par or better than the state-of-the-art modality-specific models. Moreover, our model fine-tuned only on audio can perform well with audio-visual and visual speech input, achieving zero-shot modality generalization for speech recognition and speaker verification. In particular, our single model yields 1.2%/1.4%/27.2% speech recognition word error rate on LRS3 with audio-visual/audio/visual input.

Title: Modeling Long-term Dependencies and Short-term Correlations in Patient Journey Data with Temporal Attention Networks for Health Prediction. (arXiv:2207.06414v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.06414
Code URL: null
Copy Paste: [[2207.06414] Modeling Long-term Dependencies and Short-term Correlations in Patient Journey Data with Temporal Attention Networks for Health Prediction](http://arxiv.org/abs/2207.06414)
Summary:
Building models for health prediction based on Electronic Health Records (EHR) has become an active research area. EHR patient journey data consists of patient time-ordered clinical events/visits from patients. Most existing studies focus on modeling long-term dependencies between visits, without explicitly taking short-term correlations between consecutive visits into account, where irregular time intervals, incorporated as auxiliary information, are fed into health prediction models to capture latent progressive patterns of patient journeys. We present a novel deep neural network with four modules to take into account the contributions of various variables for health prediction: i) the Stacked Attention module strengthens the deep semantics in clinical events within each patient journey and generates visit embeddings, ii) the Short-Term Temporal Attention module models short-term correlations between consecutive visit embeddings while capturing the impact of time intervals within those visit embeddings, iii) the Long-Term Temporal Attention module models long-term dependencies between visit embeddings while capturing the impact of time intervals within those visit embeddings, iv) and finally, the Coupled Attention module adaptively aggregates the outputs of Short-Term Temporal Attention and Long-Term Temporal Attention modules to make health predictions. Experimental results on MIMIC-III demonstrate superior predictive accuracy of our model compared to existing state-of-the-art methods, as well as the interpretability and robustness of this approach. Furthermore, we found that modeling short-term correlations contributes to local priors generation, leading to improved predictive modeling of patient journeys.

Title: A Robustly Optimized Long Text to Math Models for Numerical Reasoning On FinQA. (arXiv:2207.06490v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2207.06490
Code URL: null
Copy Paste: [[2207.06490] A Robustly Optimized Long Text to Math Models for Numerical Reasoning On FinQA](http://arxiv.org/abs/2207.06490)
Summary:
Numerical reasoning is required when solving most problems in our life, but it has been neglected in previous artificial intelligence researches. FinQA challenge has been organized to strengthen the study on numerical reasoning where the participants are asked to predict the numerical reasoning program to solve financial question. The result of FinQA will be evaluated by both execution accuracy and program accuracy. In this paper, we present our approach to tackle the task objective by developing models with different specialized capabilities and fusing their strength. Overall, our approach achieves the 1st place in FinQA challenge, with 71.93% execution accuracy and 67.03% program accuracy.

Title: Learning to translate by learning to communicate. (arXiv:2207.07025v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2207.07025
Code URL: https://github.com/clmbrs/communication-translation
Copy Paste: [[2207.07025] Learning to translate by learning to communicate](http://arxiv.org/abs/2207.07025)
Summary:
We formulate and test a technique to use Emergent Communication (EC) with a pretrained multilingual model to improve on modern Unsupervised NMT systems, especially for low-resource languages. It has been argued that the currently dominant paradigm in NLP of pretraining on text-only corpora will not yield robust natural language understanding systems, and the need for grounded, goal-oriented, and interactive language learning has been highlighted. In our approach, we embed a modern multilingual model (mBART, Liu et. al. 2020) into an EC image-reference game, in which the model is incentivized to use multilingual generations to accomplish a vision-grounded task, with the hypothesis that this will align multiple languages to a shared task space. We present two variants of EC Fine-Tuning (Steinert-Threlkeld et. al. 2022), one of which outperforms a backtranslation-based baseline in 6/8 translation settings, and proves especially beneficial for the very low-resource languages of Nepali and Sinhala.

Title: CoSCL: Cooperation of Small Continual Learners is Stronger than a Big One. (arXiv:2207.06543v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.06543
Code URL: https://github.com/lywang3081/coscl
Copy Paste: [[2207.06543] CoSCL: Cooperation of Small Continual Learners is Stronger than a Big One](http://arxiv.org/abs/2207.06543)
Summary:
Continual learning requires incremental compatibility with a sequence of tasks. However, the design of model architecture remains an open question: In general, learning all tasks with a shared set of parameters suffers from severe interference between tasks; while learning each task with a dedicated parameter subspace is limited by scalability. In this work, we theoretically analyze the generalization errors for learning plasticity and memory stability in continual learning, which can be uniformly upper-bounded by (1) discrepancy between task distributions, (2) flatness of loss landscape and (3) cover of parameter space. Then, inspired by the robust biological learning system that processes sequential experiences with multiple parallel compartments, we propose Cooperation of Small Continual Learners (CoSCL) as a general strategy for continual learning. Specifically, we present an architecture with a fixed number of narrower sub-networks to learn all incremental tasks in parallel, which can naturally reduce the two errors through improving the three components of the upper bound. To strengthen this advantage, we encourage to cooperate these sub-networks by penalizing the difference of predictions made by their feature representations. With a fixed parameter budget, CoSCL can improve a variety of representative continual learning approaches by a large margin (e.g., up to 10.64% on CIFAR-100-SC, 9.33% on CIFAR-100-RS, 11.45% on CUB-200-2011 and 6.72% on Tiny-ImageNet) and achieve the new state-of-the-art performance.

Title: DropNet: Reducing Neural Network Complexity via Iterative Pruning. (arXiv:2207.06646v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.06646
Code URL: https://github.com/tanchongmin/DropNet
Copy Paste: [[2207.06646] DropNet: Reducing Neural Network Complexity via Iterative Pruning](http://arxiv.org/abs/2207.06646)
Summary:
Modern deep neural networks require a significant amount of computing time and power to train and deploy, which limits their usage on edge devices. Inspired by the iterative weight pruning in the Lottery Ticket Hypothesis, we propose DropNet, an iterative pruning method which prunes nodes/filters to reduce network complexity. DropNet iteratively removes nodes/filters with the lowest average post-activation value across all training samples. Empirically, we show that DropNet is robust across diverse scenarios, including MLPs and CNNs using the MNIST, CIFAR-10 and Tiny ImageNet datasets. We show that up to 90% of the nodes/filters can be removed without any significant loss of accuracy. The final pruned network performs well even with reinitialization of the weights and biases. DropNet also has similar accuracy to an oracle which greedily removes nodes/filters one at a time to minimise training loss, highlighting its effectiveness.

Title: Work In Progress: Safety and Robustness Verification of Autoencoder-Based Regression Models using the NNV Tool. (arXiv:2207.06759v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.06759
Code URL: null
Copy Paste: [[2207.06759] Work In Progress: Safety and Robustness Verification of Autoencoder-Based Regression Models using the NNV Tool](http://arxiv.org/abs/2207.06759)
Summary:
This work in progress paper introduces robustness verification for autoencoder-based regression neural network (NN) models, following state-of-the-art approaches for robustness verification of image classification NNs. Despite the ongoing progress in developing verification methods for safety and robustness in various deep neural networks (DNNs), robustness checking of autoencoder models has not yet been considered. We explore this open space of research and check ways to bridge the gap between existing DNN verification methods by extending existing robustness analysis methods for such autoencoder networks. While classification models using autoencoders work more or less similar to image classification NNs, the functionality of regression models is distinctly different. We introduce two definitions of robustness evaluation metrics for autoencoder-based regression models, specifically the percentage robustness and un-robustness grade. We also modified the existing Imagestar approach, adjusting the variables to take care of the specific input types for regression networks. The approach is implemented as an extension of NNV, then applied and evaluated on a dataset, with a case study experiment shown using the same dataset. As per the authors' understanding, this work in progress paper is the first to show possible reachability analysis of autoencoder-based NNs.

Title: Distance Learner: Incorporating Manifold Prior to Model Training. (arXiv:2207.06888v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.06888
Code URL: https://github.com/microsoft/distance-learner
Copy Paste: [[2207.06888] Distance Learner: Incorporating Manifold Prior to Model Training](http://arxiv.org/abs/2207.06888)
Summary:
The manifold hypothesis (real world data concentrates near low-dimensional manifolds) is suggested as the principle behind the effectiveness of machine learning algorithms in very high dimensional problems that are common in domains such as vision and speech. Multiple methods have been proposed to explicitly incorporate the manifold hypothesis as a prior in modern Deep Neural Networks (DNNs), with varying success. In this paper, we propose a new method, Distance Learner, to incorporate this prior for DNN-based classifiers. Distance Learner is trained to predict the distance of a point from the underlying manifold of each class, rather than the class label. For classification, Distance Learner then chooses the class corresponding to the closest predicted class manifold. Distance Learner can also identify points as being out of distribution (belonging to neither class), if the distance to the closest manifold is higher than a threshold. We evaluate our method on multiple synthetic datasets and show that Distance Learner learns much more meaningful classification boundaries compared to a standard classifier. We also evaluate our method on the task of adversarial robustness, and find that it not only outperforms standard classifier by a large margin, but also performs at par with classifiers trained via state-of-the-art adversarial training.

Title: Early Detection of Ovarian Cancer by Wavelet Analysis of Protein Mass Spectra. (arXiv:2207.07028v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.07028
Code URL: null
Copy Paste: [[2207.07028] Early Detection of Ovarian Cancer by Wavelet Analysis of Protein Mass Spectra](http://arxiv.org/abs/2207.07028)
Summary:
Accurate and efficient detection of ovarian cancer at early stages is critical to ensure proper treatments for patients. Among the first-line modalities investigated in studies of early diagnosis are features distilled from protein mass spectra. This method, however, considers only a specific subset of spectral responses and ignores the interplay among protein expression levels, which can also contain diagnostic information. We propose a new modality that automatically searches protein mass spectra for discriminatory features by considering the self-similar nature of the spectra. Self-similarity is assessed by taking a wavelet decomposition of protein mass spectra and estimating the rate of level-wise decay in the energies of the resulting wavelet coefficients. Level-wise energies are estimated in a robust manner using distance variance, and rates are estimated locally via a rolling window approach. This results in a collection of rates that can be used to characterize the interplay among proteins, which can be indicative of cancer presence. Discriminatory descriptors are then selected from these evolutionary rates and used as classifying features. The proposed wavelet-based features are used in conjunction with features proposed in the existing literature for early stage diagnosis of ovarian cancer using two datasets published by the American National Cancer Institute. Including the wavelet-based features from the new modality results in improvements in diagnostic performance for early-stage ovarian cancer detection. This demonstrates the ability of the proposed modality to characterize new ovarian cancer diagnostic information.

biometric

steal

extraction

Title: Graph CNN for Moving Object Detection in Complex Environments from Unseen Videos. (arXiv:2207.06440v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.06440
Code URL: null
Copy Paste: [[2207.06440] Graph CNN for Moving Object Detection in Complex Environments from Unseen Videos](http://arxiv.org/abs/2207.06440)
Summary:
Moving Object Detection (MOD) is a fundamental step for many computer vision applications. MOD becomes very challenging when a video sequence captured from a static or moving camera suffers from the challenges: camouflage, shadow, dynamic backgrounds, and lighting variations, to name a few. Deep learning methods have been successfully applied to address MOD with competitive performance. However, in order to handle the overfitting problem, deep learning methods require a large amount of labeled data which is a laborious task as exhaustive annotations are always not available. Moreover, some MOD deep learning methods show performance degradation in the presence of unseen video sequences because the testing and training splits of the same sequences are involved during the network learning process. In this work, we pose the problem of MOD as a node classification problem using Graph Convolutional Neural Networks (GCNNs). Our algorithm, dubbed as GraphMOD-Net, encompasses instance segmentation, background initialization, feature extraction, and graph construction. GraphMOD-Net is tested on unseen videos and outperforms state-of-the-art methods in unsupervised, semi-supervised, and supervised learning in several challenges of the Change Detection 2014 (CDNet2014) and UCSD background subtraction datasets.

Title: TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents. (arXiv:2207.06744v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.06744
Code URL: null
Copy Paste: [[2207.06744] TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents](http://arxiv.org/abs/2207.06744)
Summary:
Recently, automatically extracting information from visually rich documents (e.g., tickets and resumes) has become a hot and vital research topic due to its widespread commercial value. Most existing methods divide this task into two subparts: the text reading part for obtaining the plain text from the original document images and the information extraction part for extracting key contents. These methods mainly focus on improving the second, while neglecting that the two parts are highly correlated. This paper proposes a unified end-to-end information extraction framework from visually rich documents, where text reading and information extraction can reinforce each other via a well-designed multi-modal context block. Specifically, the text reading part provides multi-modal features like visual, textual and layout features. The multi-modal context block is developed to fuse the generated multi-modal features and even the prior knowledge from the pre-trained language model for better semantic representation. The information extraction part is responsible for generating key contents with the fused context features. The framework can be trained in an end-to-end trainable manner, achieving global optimization. What is more, we define and group visually rich documents into four categories across two dimensions, the layout and text type. For each document category, we provide or recommend the corresponding benchmarks, experimental settings and strong baselines for remedying the problem that this research area lacks the uniform evaluation standard. Extensive experiments on four kinds of benchmarks (from fixed layout to variable layout, from full-structured text to semi-unstructured text) are reported, demonstrating the proposed method's effectiveness. Data, source code and models are available.

Title: DEXTER: An end-to-end system to extract table contents from electronic medical health documents. (arXiv:2207.06823v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.06823
Code URL: null
Copy Paste: [[2207.06823] DEXTER: An end-to-end system to extract table contents from electronic medical health documents](http://arxiv.org/abs/2207.06823)
Summary:
In this paper, we propose DEXTER, an end to end system to extract information from tables present in medical health documents, such as electronic health records (EHR) and explanation of benefits (EOB). DEXTER consists of four sub-system stages: i) table detection ii) table type classification iii) cell detection; and iv) cell content extraction. We propose a two-stage transfer learning-based approach using CDeC-Net architecture along with Non-Maximal suppression for table detection. We design a conventional computer vision-based approach for table type classification and cell detection using parameterized kernels based on image size for detecting rows and columns. Finally, we extract the text from the detected cells using pre-existing OCR engine Tessaract. To evaluate our system, we manually annotated a sample of the real-world medical dataset (referred to as Meddata) consisting of wide variations of documents (in terms of appearance) covering different table structures, such as bordered, partially bordered, borderless, or coloured tables. We experimentally show that DEXTER outperforms the commercially available Amazon Textract and Microsoft Azure Form Recognizer systems on the annotated real-world medical dataset

Title: Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration. (arXiv:2207.06717v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2207.06717
Code URL: null
Copy Paste: [[2207.06717] Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration](http://arxiv.org/abs/2207.06717)
Summary:
Building document-grounded dialogue systems have received growing interest as documents convey a wealth of human knowledge and commonly exist in enterprises. Wherein, how to comprehend and retrieve information from documents is a challenging research problem. Previous work ignores the visual property of documents and treats them as plain text, resulting in incomplete modality. In this paper, we propose a Layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents (VRDs), so as to generate accurate responses in dialogue systems. LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents, becoming the largest VRD-based information extraction dataset to the best of our knowledge. We also develop benchmark methods that extend the token-based language model to consider layout features like humans. Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.

Title: Open Terminology Management and Sharing Toolkit for Federation of Terminology Databases. (arXiv:2207.06729v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2207.06729
Code URL: null
Copy Paste: [[2207.06729] Open Terminology Management and Sharing Toolkit for Federation of Terminology Databases](http://arxiv.org/abs/2207.06729)
Summary:
Consolidated access to current and reliable terms from different subject fields and languages is necessary for content creators and translators. Terminology is also needed in AI applications such as machine translation, speech recognition, information extraction, and other natural language processing tools. In this work, we facilitate standards-based sharing and management of terminology resources by providing an open terminology management solution - the EuroTermBank Toolkit. It allows organisations to manage and search their terms, create term collections, and share them within and outside the organisation by participating in the network of federated databases. The data curated in the federated databases are automatically shared with EuroTermBank, the largest multilingual terminology resource in Europe, allowing translators and language service providers as well as researchers and students to access terminology resources in their most current version.

membership infer

federate

Title: Multi-Level Branched Regularization for Federated Learning. (arXiv:2207.06936v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.06936
Code URL: https://github.com/jinkyu032/FedMLB
Copy Paste: [[2207.06936] Multi-Level Branched Regularization for Federated Learning](http://arxiv.org/abs/2207.06936)
Summary:
A critical challenge of federated learning is data heterogeneity and imbalance across clients, which leads to inconsistency between local networks and unstable convergence of global models. To alleviate the limitations, we propose a novel architectural regularization technique that constructs multiple auxiliary branches in each local model by grafting local and global subnetworks at several different levels and that learns the representations of the main pathway in the local model congruent to the auxiliary hybrid pathways via online knowledge distillation. The proposed technique is effective to robustify the global model even in the non-iid setting and is applicable to various federated learning frameworks conveniently without incurring extra communication costs. We perform comprehensive empirical studies and demonstrate remarkable performance gains in terms of accuracy and efficiency compared to existing methods. The source code is available at our project page.

fair

Title: The Free Energy Principle for Perception and Action: A Deep Learning Perspective. (arXiv:2207.06415v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.06415
Code URL: null
Copy Paste: [[2207.06415] The Free Energy Principle for Perception and Action: A Deep Learning Perspective](http://arxiv.org/abs/2207.06415)
Summary:
The free energy principle, and its corollary active inference, constitute a bio-inspired theory that assumes biological agents act to remain in a restricted set of preferred states of the world, i.e., they minimize their free energy. Under this principle, biological agents learn a generative model of the world and plan actions in the future that will maintain the agent in an homeostatic state that satisfies its preferences. This framework lends itself to being realized in silico, as it comprehends important aspects that make it computationally affordable, such as variational inference and amortized planning. In this work, we investigate the tool of deep learning to design and realize artificial agents based on active inference, presenting a deep-learning oriented presentation of the free energy principle, surveying works that are relevant in both machine learning and active inference areas, and discussing the design choices that are involved in the implementation process. This manuscript probes newer perspectives for the active inference framework, grounding its theoretical aspects into more pragmatic affairs, offering a practical guide to active inference newcomers and a starting point for deep learning practitioners that would like to investigate implementations of the free energy principle.

Title: From Shapley back to Pearson: Hypothesis Testing via the Shapley Value. (arXiv:2207.07038v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.07038
Code URL: https://github.com/sulam-group/shaplit
Copy Paste: [[2207.07038] From Shapley back to Pearson: Hypothesis Testing via the Shapley Value](http://arxiv.org/abs/2207.07038)
Summary:
Machine learning models, in particular artificial neural networks, are increasingly used to inform decision making in high-stakes scenarios across a variety of fields--from financial services, to public safety, and healthcare. While neural networks have achieved remarkable performance in many settings, their complex nature raises concerns on their reliability, trustworthiness, and fairness in real-world scenarios. As a result, several a-posteriori explanation methods have been proposed to highlight the features that influence a model's prediction. Notably, the Shapley value--a game theoretic quantity that satisfies several desirable properties--has gained popularity in the machine learning explainability literature. More traditionally, however, feature importance in statistical learning has been formalized by conditional independence, and a standard way to test for it is via Conditional Randomization Tests (CRTs). So far, these two perspectives on interpretability and feature importance have been considered distinct and separate. In this work, we show that Shapley-based explanation methods and conditional independence testing for feature importance are closely related. More precisely, we prove that evaluating a Shapley coefficient amounts to performing a specific set of conditional independence tests, as implemented by a procedure similar to the CRT but for a different null hypothesis. Furthermore, the obtained game-theoretic values upper bound the $p$-values of such tests. As a result, we grant large Shapley coefficients with a precise statistical sense of importance with controlled type I error.

Title: Bia Mitigation for Machine Learning Classifiers: A Comprehensive Survey. (arXiv:2207.07068v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2207.07068
Code URL: null
Copy Paste: [[2207.07068] Bia Mitigation for Machine Learning Classifiers: A Comprehensive Survey](http://arxiv.org/abs/2207.07068)
Summary:
This paper provides a comprehensive survey of bias mitigation methods for achieving fairness in Machine Learning (ML) models. We collect a total of 234 publications concerning bias mitigation for ML classifiers. These methods can be distinguished based on their intervention procedure (i.e., pre-processing, in-processing, post-processing) and the technology they apply. We investigate how existing bias mitigation methods are evaluated in the literature. In particular, we consider datasets, metrics and benchmarking. Based on the gathered insights (e.g., what is the most popular fairness metric? How many datasets are used for evaluating bias mitigation methods?). We hope to support practitioners in making informed choices when developing and evaluating new bias mitigation methods.

interpretability

Title: Explaining Image Enhancement Black-Box Methods through a Path Planning Based Algorithm. (arXiv:2207.07092v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.07092
Code URL: null
Copy Paste: [[2207.07092] Explaining Image Enhancement Black-Box Methods through a Path Planning Based Algorithm](http://arxiv.org/abs/2207.07092)
Summary:
Nowadays, image-to-image translation methods, are the state of the art for the enhancement of natural images. Even if they usually show high performance in terms of accuracy, they often suffer from several limitations such as the generation of artifacts and the scalability to high resolutions. Moreover, their main drawback is the completely black-box approach that does not allow to provide the final user with any insight about the enhancement processes applied. In this paper we present a path planning algorithm which provides a step-by-step explanation of the output produced by state of the art enhancement methods, overcoming black-box limitation. This algorithm, called eXIE, uses a variant of the A* algorithm to emulate the enhancement process of another method through the application of an equivalent sequence of enhancing operators. We applied eXIE to explain the output of several state-of-the-art models trained on the Five-K dataset, obtaining sequences of enhancing operators able to produce very similar results in terms of performance and overcoming the huge limitation of poor interpretability of the best performing algorithms.

Title: Fine-grained Few-shot Recognition by Deep Object Parsing. (arXiv:2207.07110v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2207.07110
Code URL: null
Copy Paste: [[2207.07110] Fine-grained Few-shot Recognition by Deep Object Parsing](http://arxiv.org/abs/2207.07110)
Summary:
In our framework, an object is made up of K distinct parts or units, and we parse a test instance by inferring the K parts, where each part occupies a distinct location in the feature space, and the instance features at this location, manifest as an active subset of part templates shared across all instances. We recognize test instances by comparing its active templates and the relative geometry of its part locations against those of the presented few-shot instances. We propose an end-to-end training method to learn part templates on-top of a convolutional backbone. To combat visual distortions such as orientation, pose and size, we learn multi-scale templates, and at test-time parse and match instances across these scales. We show that our method is competitive with the state-of-the-art, and by virtue of parsing enjoys interpretability as well.