secure

security

Title: Analysis of Real-Time Hostile Activitiy Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms. (arXiv:2302.11027v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2302.11027
Code URL: null
Copy Paste: [[2302.11027] Analysis of Real-Time Hostile Activitiy Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms](http://arxiv.org/abs/2302.11027) #security
Summary:
Real-time video surveillance through CCTV camera systems has become essential for ensuring public safety, which is a priority today. Although CCTV cameras help a lot in increasing security, these systems require constant human interaction and monitoring. To address this issue, intelligent surveillance systems can be built using deep learning video classification techniques that can help us automate surveillance systems to detect violence as it happens. In this research, we explore deep learning video classification techniques to detect violence as it is happening. Traditional image classification techniques fall short when it comes to classifying videos, as they attempt to classify each frame separately, which causes the predictions to flicker. Therefore, many researchers are coming up with video classification techniques that consider spatiotemporal features while classifying. However, deploying these deep learning models with methods such as skeleton points obtained through pose estimation and optical flow obtained through depth sensors is not always practical in an IoT environment. Although these techniques ensure a higher accuracy score, they are computationally heavier. Keeping these constraints in mind, we experimented with various video classification and action recognition techniques such as ConvLSTM, LRCN (with both custom CNN layers and VGG-16 as feature extractor), CNN-Transformer, and C3D. We achieved a test accuracy of 80% on ConvLSTM, 83.33% on CNN-BiLSTM, 70% on VGG16-BiLSTM, 76.76% on CNN-Transformer, and 80% on C3D.

Title: A study on the invariance in security whatever the dimension of images for the steganalysis by deep-learning. (arXiv:2302.11527v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2302.11527
Code URL: https://github.com/Kevin-Planolles/steganalysis_with_CNN_dilated-Yedroudj-Net
Copy Paste: [[2302.11527] A study on the invariance in security whatever the dimension of images for the steganalysis by deep-learning](http://arxiv.org/abs/2302.11527) #security
Summary:
In this paper, we study the performance invariance of convolutional neural networks when confronted with variable image sizes in the context of a more "wild steganalysis". First, we propose two algorithms and definitions for a fine experimental protocol with datasets owning "similar difficulty" and "similar security". The "smart crop 2" algorithm allows the introduction of the Nearly Nested Image Datasets (NNID) that ensure "a similar difficulty" between various datasets, and a dichotomous research algorithm allows a "similar security". Second, we show that invariance does not exist in state-of-the-art architectures. We also exhibit a difference in behavior depending on whether we test on images larger or smaller than the training images. Finally, based on the experiments, we propose to use the dilated convolution which leads to an improvement of a state-of-the-art architecture.

Title: Counterfeit Chip Detection using Scattering Parameter Analysis. (arXiv:2302.11034v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2302.11034
Code URL: null
Copy Paste: [[2302.11034] Counterfeit Chip Detection using Scattering Parameter Analysis](http://arxiv.org/abs/2302.11034) #security
Summary:
The increase in the number of counterfeit and recycled microelectronic chips in recent years has created significant security and safety concerns in various applications. Hence, detecting such counterfeit chips in electronic systems is critical before deployment in the field. Unfortunately, the conventional verification tools using physical inspection and side-channel methods are costly, unscalable, error-prone, and often incompatible with legacy systems. This paper introduces a generic non-invasive and low-cost counterfeit chip detection based on characterizing the impedance of the system's power delivery network (PDN). Our method relies on the fact that the impedance of the counterfeit and recycled chips differs from the genuine ones. To sense such impedance variations confidently, we deploy scattering parameters, frequently used for impedance characterization of RF/microwave circuits. Our proposed approach can directly be applied to soldered chips on the system's PCB and does not require any modifications on the legacy systems. To validate our claims, we perform extensive measurements on genuine and aged samples from two families of STMicroelectronics chips to assess the effectiveness of the proposed approach.

Title: An End-To-End Encrypted Cache System with Time-Dependent Access Control. (arXiv:2302.11292v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2302.11292
Code URL: null
Copy Paste: [[2302.11292] An End-To-End Encrypted Cache System with Time-Dependent Access Control](http://arxiv.org/abs/2302.11292) #security
Summary:
Due to the increasing use of encrypted communication, such as Transport Layer Security (TLS), encrypted cache systems are a promising approach for providing communication efficiency and privacy. Cache-22 is an encrypted cache system (Emura et al. ISITA 2020) that makes it possible to significantly reduce communication between a cache server and a service provider. In the final procedure of Cache-22, the service provider sends the corresponding decryption key to the user via TLS and this procedure allows the service provider to control which users can access the contents. For example, if a user has downloaded ciphertexts of several episodes of a show, the service provider can decide to provide some of the contents (e.g., the first episode) available for free while requiring a fee for the remaining contents. However, no concrete access control method has been implemented in the original Cache-22 system. In this paper, we add a scalable access control protocol to Cache-22. Specifically, we propose a time-dependent access control that requires a communication cost of $O(\log T_{\sf max})$ where $T_{\sf max}$ is the maximum time period. Although the protocol is stateful, we can provide time-dependent access control with scalability at the expense of this key management. We present experimental results and demonstrate that the modified system is effective for controlling access rights. We also observe a relationship between cache capacity and network traffic because the number of duplicated contents is higher than that in the original Cache-22 system, due to time-dependent access control.
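
The abstract does not say how the $O(\log T_{\sf max})$ communication cost is obtained. One common way to reach a logarithmic cost is to treat time periods as leaves of a binary tree and to send one key per subtree that covers the permitted interval. The Python sketch below only illustrates that counting argument. The tree layout and the names `cover` and `T_MAX` are assumptions made for illustration, not the protocol from the paper.

```python
# Hypothetical illustration: cover a time interval with whole subtrees of a
# binary tree whose leaves are the time periods 0 .. T_MAX - 1.
T_MAX = 8  # assumed to be a power of two for simplicity


def cover(lo, hi, node_lo, node_hi):
    """Return the subtrees (as leaf ranges) that exactly cover [lo, hi]."""
    if hi < node_lo or node_hi < lo:
        return []  # this subtree lies outside the interval
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]  # whole subtree inside: one key suffices
    mid = (node_lo + node_hi) // 2
    return cover(lo, hi, node_lo, mid) + cover(lo, hi, mid + 1, node_hi)


# Periods 2..6 are covered by three subtrees instead of five separate leaves.
print(cover(2, 6, 0, T_MAX - 1))  # [(2, 3), (4, 5), (6, 6)]
```

In a tree with `T_MAX` leaves, any interval is covered by at most about `2 * log2(T_MAX)` subtrees, which is where a logarithmic number of keys would come from under this assumed layout.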

Title: BUAA_BIGSCity: Spatial-Temporal Graph Neural Network for Wind Power Forecasting in Baidu KDD CUP 2022. (arXiv:2302.11159v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11159
Code URL: https://github.com/buaabigscity/kddcup2022
Copy Paste: [[2302.11159] BUAA_BIGSCity: Spatial-Temporal Graph Neural Network for Wind Power Forecasting in Baidu KDD CUP 2022](http://arxiv.org/abs/2302.11159) #security
Summary:
In this technical report, we present our solution for the Baidu KDD Cup 2022 Spatial Dynamic Wind Power Forecasting Challenge. Wind power is a rapidly growing source of clean energy. Accurate wind power forecasting is essential for grid stability and the security of supply. Therefore, organizers provide a wind power dataset containing historical data from 134 wind turbines and launch the Baidu KDD Cup 2022 to examine the limitations of current methods for wind power forecasting. The average of RMSE (Root Mean Square Error) and MAE (Mean Absolute Error) is used as the evaluation score. We adopt two spatial-temporal graph neural network models, i.e., AGCRN and MTGNN, as our basic models. We train AGCRN by 5-fold cross-validation and additionally train MTGNN directly on the training and validation sets. Finally, we ensemble the two models based on the loss values of the validation set as our final submission. Using our method, our team, BUAA_BIGSCity, achieves -45.36026 on the test set. We release our code on GitHub (https://github.com/BUAABIGSCity/KDDCUP2022) for reproduction.
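
The evaluation score described above, the average of RMSE and MAE, can be sketched in a few lines of Python. This is a simplified illustration for a single series of predictions. The function names are hypothetical, and the challenge's own scoring (for example, how the 134 turbines are aggregated) is not described in the abstract and is not reproduced here.

```python
import math


def rmse(pred, true):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mae(pred, true):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def score(pred, true):
    """Average of RMSE and MAE, as named in the abstract (lower is better)."""
    return (rmse(pred, true) + mae(pred, true)) / 2


# Toy values, not data from the challenge.
print(score([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, -1.0]))
```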

privacy

Title: Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks. (arXiv:2302.11074v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2302.11074
Code URL: null
Copy Paste: [[2302.11074] Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks](http://arxiv.org/abs/2302.11074) #privacy
Summary:
Multi-Task Learning (MTL) is widely accepted in Natural Language Processing as a standard technique for learning multiple related tasks in one model. Training an MTL model requires having the training data for all tasks available at the same time. As systems usually evolve over time (e.g., to support new functionalities), adding a new task to an existing MTL model usually requires retraining the model from scratch on all the tasks, and this can be time-consuming and computationally expensive. Moreover, in some scenarios, the data used for the original training may no longer be available, for example, due to storage or privacy concerns. In this paper, we approach the problem of incrementally expanding MTL models' capability to solve new tasks over time by distilling the knowledge of an already trained model on n tasks into a new one for solving n+1 tasks. To avoid catastrophic forgetting, we propose to exploit unlabeled data from the same distributions of the old tasks. Our experiments on publicly available benchmarks show that such a technique dramatically benefits the distillation by preserving the already acquired knowledge (i.e., preventing up to 20% performance drops on old tasks) while obtaining good performance on the incrementally added tasks. Further, we also show that our approach is beneficial in practical settings by using data from a leading voice assistant.

Title: Deep Neural Networks for Encrypted Inference with TFHE. (arXiv:2302.10906v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10906
Code URL: null
Copy Paste: [[2302.10906] Deep Neural Networks for Encrypted Inference with TFHE](http://arxiv.org/abs/2302.10906) #privacy
Summary:
Fully homomorphic encryption (FHE) is an encryption method that allows computation to be performed on encrypted data without decryption. FHE preserves the privacy of the users of online services that handle sensitive data, such as health data, biometrics, credit scores and other personal information. A common way to provide a valuable service on such data is through machine learning and, at this time, Neural Networks are the dominant machine learning model for unstructured data. In this work we show how to construct Deep Neural Networks (DNN) that are compatible with the constraints of TFHE, an FHE scheme that allows arbitrary depth computation circuits. We discuss the constraints and show the architecture of DNNs for two computer vision tasks. We benchmark the architectures using the Concrete stack, an open-source implementation of TFHE.

Title: Multi-Message Shuffled Privacy in Federated Learning. (arXiv:2302.11152v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11152
Code URL: null
Copy Paste: [[2302.11152] Multi-Message Shuffled Privacy in Federated Learning](http://arxiv.org/abs/2302.11152) #privacy
Summary:
We study differentially private distributed optimization under communication constraints. A server using SGD for optimization aggregates the client-side local gradients for model updates using distributed mean estimation (DME). We develop a communication-efficient private DME, using the recently developed multi-message shuffled (MMS) privacy framework. We analyze our proposed DME scheme to show that it achieves the order-optimal privacy-communication-performance tradeoff resolving an open question in [1], whether the shuffled models can improve the tradeoff obtained in Secure Aggregation. This also resolves an open question on the optimal trade-off for private vector sum in the MMS model. We achieve it through a novel privacy mechanism that non-uniformly allocates privacy at different resolutions of the local gradient vectors. These results are directly applied to give guarantees on private distributed learning algorithms using this for private gradient aggregation iteratively. We also numerically evaluate the private DME algorithms.

Title: Learning to Simulate Daily Activities via Modeling Dynamic Human Needs. (arXiv:2302.10897v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10897
Code URL: null
Copy Paste: [[2302.10897] Learning to Simulate Daily Activities via Modeling Dynamic Human Needs](http://arxiv.org/abs/2302.10897) #privacy
Summary:
Daily activity data that records individuals' various types of activities in daily life are widely used in many applications such as activity scheduling, activity recommendation, and policymaking. Despite its high value, its accessibility is limited due to high collection costs and potential privacy issues. Therefore, simulating human activities to produce massive high-quality data is of great importance to benefit practical applications. However, existing solutions, including rule-based methods with simplified assumptions of human behavior and data-driven methods directly fitting real-world data, cannot fully match reality. In this paper, motivated by the classic psychological theory, Maslow's need theory describing human motivation, we propose a knowledge-driven simulation framework based on generative adversarial imitation learning. To enhance the fidelity and utility of the generated activity data, our core idea is to model the evolution of human needs as the underlying mechanism that drives activity generation in the simulation model. Specifically, this is achieved by a hierarchical model structure that disentangles different need levels, and the use of neural stochastic differential equations that successfully capture piecewise-continuous characteristics of need dynamics. Extensive experiments demonstrate that our framework outperforms the state-of-the-art baselines in terms of data fidelity and utility. Besides, we present the insightful interpretability of the need modeling. The code is available at https://github.com/tsinghua-fib-lab/SAND.

Title: Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment. (arXiv:2302.10908v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10908
Code URL: null
Copy Paste: [[2302.10908] Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment](http://arxiv.org/abs/2302.10908) #privacy
Summary:
The presence of decision-making algorithms in society is rapidly increasing nowadays, while concerns about their transparency and the possibility of these algorithms becoming new sources of discrimination are arising. There is a certain consensus about the need to develop AI applications with a Human-Centric approach. Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes. All these four Human-Centric requirements are closely related to each other. With the aim of studying how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data, we propose a fictitious case study focused on automated recruitment: FairCVtest. We train automatic recruitment algorithms using a set of multimodal synthetic profiles including image, text, and structured data, which are consciously scored with gender and racial biases. FairCVtest shows the capacity of the Artificial Intelligence (AI) behind automatic recruitment tools built this way (a common practice in many other application scenarios beyond recruitment) to extract sensitive information from unstructured data and exploit it in combination with data biases in undesirable (unfair) ways. We present an overview of recent works developing techniques capable of removing sensitive information and biases from the decision-making process of deep learning architectures, as well as commonly used databases for fairness research in AI. We demonstrate how learning approaches developed to guarantee privacy in latent spaces can lead to unbiased and fair automatic decision-making processes.

protect

defense

attack

Title: MultiRobustBench: Benchmarking Robustness Against Multiple Attacks. (arXiv:2302.10980v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10980
Code URL: null
Copy Paste: [[2302.10980] MultiRobustBench: Benchmarking Robustness Against Multiple Attacks](http://arxiv.org/abs/2302.10980) #attack
Summary:
The bulk of existing research in defending against adversarial examples focuses on defending against a single (typically bounded Lp-norm) attack, but for a practical setting, machine learning (ML) models should be robust to a wide variety of attacks. In this paper, we present the first unified framework for considering multiple attacks against ML models. Our framework is able to model different levels of learner's knowledge about the test-time adversary, allowing us to model robustness against unforeseen attacks and robustness against unions of attacks. Using our framework, we present the first leaderboard, MultiRobustBench, for benchmarking multiattack evaluation which captures performance across attack types and attack strengths. We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types, including Lp-based threat models, spatial transformations, and color changes, at 20 different attack strengths (180 attacks total). Additionally, we analyze the state of current defenses against multiple attacks. Our analysis shows that while existing defenses have made progress in terms of average robustness across the set of attacks used, robustness against the worst-case attack is still a big open problem as all existing models perform worse than random guessing.

Title: PAD: Towards Principled Adversarial Malware Detection Against Evasion Attacks. (arXiv:2302.11328v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2302.11328
Code URL: https://github.com/deqangss/pad4amd
Copy Paste: [[2302.11328] PAD: Towards Principled Adversarial Malware Detection Against Evasion Attacks](http://arxiv.org/abs/2302.11328) #attack
Summary:
Machine Learning (ML) techniques facilitate automating malicious software (malware for short) detection, but suffer from evasion attacks. Many researchers counter such attacks in heuristic manners that fall short of both theoretical guarantees and defense effectiveness. We hence propose a new adversarial training framework, termed Principled Adversarial Malware Detection (PAD), which encourages convergence guarantees for robust optimization methods. PAD builds on a learnable convex measurement that quantifies distribution-wise discrete perturbations and protects the malware detector from adversaries, by which for smooth detectors, adversarial training can be performed heuristically with theoretical treatments. To promote defense effectiveness, we propose a new mixture of attacks to instantiate PAD for enhancing the deep neural network-based measurement and malware detector. Experimental results on two Android malware datasets demonstrate: (i) the proposed method significantly outperforms the state-of-the-art defenses; (ii) it can harden the ML-based malware detection against 27 evasion attacks with detection accuracies greater than 83.45%, while suffering an accuracy decrease smaller than 2.16% in the absence of attacks; (iii) it matches or outperforms many anti-malware scanners in VirusTotal service against realistic adversarial malware.

robust

Title: IB-RAR: Information Bottleneck as Regularizer for Adversarial Robustness. (arXiv:2302.10896v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10896
Code URL: null
Copy Paste: [[2302.10896] IB-RAR: Information Bottleneck as Regularizer for Adversarial Robustness](http://arxiv.org/abs/2302.10896) #robust
Summary:
In this paper, we propose a novel method, IB-RAR, which uses Information Bottleneck (IB) to strengthen adversarial robustness for both adversarial training and non-adversarial-trained methods. We first use the IB theory to build regularizers as learning objectives in the loss function. Then, we filter out unnecessary features of intermediate representation according to their mutual information (MI) with labels, as the network trained with IB provides easily distinguishable MI for its features. Experimental results show that our method can be naturally combined with adversarial training and provides consistently better accuracy on new adversarial examples. Our method improves the accuracy by an average of 3.07% against five adversarial attacks for the VGG16 network, trained with three adversarial training benchmarks and the CIFAR-10 dataset. In addition, our method also provides good robustness for undefended methods, such as training with cross-entropy loss only. Finally, in the absence of adversarial training, the VGG16 network trained using our method and the CIFAR-10 dataset reaches an accuracy of 35.86% against PGD examples, while using all layers reaches 25.61% accuracy.

Title: Distribution Normalization: An "Effortless" Test-Time Augmentation for Contrastively Learned Visual-language Models. (arXiv:2302.11084v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11084
Code URL: https://github.com/fengyuli2002/distribution-normalization
Copy Paste: [[2302.11084] Distribution Normalization: An "Effortless" Test-Time Augmentation for Contrastively Learned Visual-language Models](http://arxiv.org/abs/2302.11084) #robust
Summary:
Advances in the field of visual-language contrastive learning have made it possible for many downstream applications to be carried out efficiently and accurately by simply taking the dot product between image and text representations. One of the most representative approaches proposed recently known as CLIP has quickly garnered widespread adoption due to its effectiveness. CLIP is trained with an InfoNCE loss that takes into account both positive and negative samples to help learn a much more robust representation space. This paper however reveals that the common downstream practice of taking a dot product is only a zeroth-order approximation of the optimization goal, resulting in a loss of information during test-time. Intuitively, since the model has been optimized based on the InfoNCE loss, test-time procedures should ideally also be in alignment. The question lies in how one can retrieve any semblance of negative samples information during inference. We propose Distribution Normalization (DN), where we approximate the mean representation of a batch of test samples and use such a mean to represent what would be analogous to negative samples in the InfoNCE loss. DN requires no retraining or fine-tuning and can be effortlessly applied during inference. Extensive experiments on a wide variety of downstream tasks exhibit a clear advantage of DN over the dot product.
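
The idea described above, replacing the plain dot product with one computed after subtracting an estimated mean of the test-batch representations, can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation. The function names and the `scale` factor applied to the mean are assumptions, and the embeddings are toy values.

```python
def mean_vector(vectors):
    """Element-wise mean of a batch of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a, b):
    """Plain dot product, the usual similarity score."""
    return sum(x * y for x, y in zip(a, b))


def dn_score(image, text, image_mean, text_mean, scale=0.5):
    """Dot product after subtracting a scaled batch mean from each side.

    `scale` is a hypothetical knob; scale=0 recovers the plain dot product.
    """
    shifted_image = [x - scale * m for x, m in zip(image, image_mean)]
    shifted_text = [y - scale * m for y, m in zip(text, text_mean)]
    return dot(shifted_image, shifted_text)


# Toy embeddings standing in for a batch of test samples.
images = [[1.0, 0.0], [3.0, 0.0]]
texts = [[1.0, 2.0], [3.0, 2.0]]
image_mean = mean_vector(images)
text_mean = mean_vector(texts)
print(dn_score(images[0], texts[0], image_mean, text_mean))
```

Because the means are estimated once from unlabeled test samples, a scheme of this shape needs no retraining, which matches the abstract's description of DN as applicable at inference time.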

Title: Invariant Target Detection in Images through the Normalized 2-D Correlation Technique. (arXiv:2302.11196v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2302.11196
Code URL: null
Copy Paste: [[2302.11196] Invariant Target Detection in Images through the Normalized 2-D Correlation Technique](http://arxiv.org/abs/2302.11196) #robust
Summary:
The normalized 2-D correlation technique is a robust method for detecting targets in images due to its ability to remain invariant under rotation, translation, and scaling. This paper examines the impact of translation and scaling on target identification in images. The results indicate a high level of accuracy in detecting targets, even when they exhibit variations in location and size. The results also indicate that the similarity between the image and the two targets used improves as the resize ratio increases. All statistical estimators demonstrate a strong similarity between the original and extracted targets. The elapsed time for all scenarios falls within the ranges of 44.75-44.85 seconds and 37.48-37.73 seconds for the bird and children targets, respectively, and the correlation coefficient displays stable relationships, with values that fall within the ranges of 0.90-0.98 and 0.87-0.93 for the bird and children targets, respectively.
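
For readers unfamiliar with the technique, a normalized 2-D correlation coefficient between a template and an image patch can be sketched as below, together with a brute-force search over patch positions. This is a generic textbook-style illustration in pure Python, assuming images are lists of rows of numbers. The names `corr2` and `best_match` are hypothetical and the code is not taken from the paper.

```python
import math


def corr2(a, b):
    """Normalized 2-D correlation coefficient of two same-sized 2-D arrays."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    if len(fa) != len(fb):
        raise ValueError("arrays must have the same number of elements")
    mean_a = sum(fa) / len(fa)
    mean_b = sum(fb) / len(fb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in fa)
                    * sum((y - mean_b) ** 2 for y in fb))
    if den == 0:
        raise ValueError("correlation is undefined for a constant array")
    return num / den


def best_match(image, template):
    """Slide the template over the image; return (best score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))  # any valid coefficient is at least -1
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            try:
                score = corr2(patch, template)
            except ValueError:
                continue  # skip flat patches with no variation
            if score > best[0]:
                best = (score, (r, c))
    return best


# Toy 4x4 image containing the 2x2 template at row 1, column 2.
template = [[1, 2], [3, 4]]
image = [[0, 0, 0, 0],
         [0, 0, 1, 2],
         [0, 0, 3, 4],
         [0, 0, 0, 0]]
print(best_match(image, template))
```

Subtracting the means and dividing by the norms makes the score insensitive to uniform brightness offsets and gains, and the sliding search handles translation. Handling changes in scale would require resizing the template or image first, which this sketch does not do.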

Title: Asynchronous Trajectory Matching-Based Multimodal Maritime Data Fusion for Vessel Traffic Surveillance in Inland Waterways. (arXiv:2302.11283v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2302.11283
Code URL: https://github.com/gy65896/fvessel
Copy Paste: [[2302.11283] Asynchronous Trajectory Matching-Based Multimodal Maritime Data Fusion for Vessel Traffic Surveillance in Inland Waterways](http://arxiv.org/abs/2302.11283) #robust
Summary:
The automatic identification system (AIS) and video cameras have been widely exploited for vessel traffic surveillance in inland waterways. The AIS data could provide the vessel identity and dynamic information on vessel position and movements. In contrast, the video data could describe the visual appearances of moving vessels, but without knowing the information on identity, position and movements, etc. To further improve vessel traffic surveillance, it becomes necessary to fuse the AIS and video data to simultaneously capture the visual features, identity and dynamic information for the vessels of interest. However, traditional data fusion methods easily suffer from several potential limitations, e.g., asynchronous messages, missing data, random outliers, etc. In this work, we first extract the AIS- and video-based vessel trajectories, and then propose a deep learning-enabled asynchronous trajectory matching method (named DeepSORVF) to fuse the AIS-based vessel information with the corresponding visual targets. In addition, by combining the AIS- and video-based movement features, we also present a prior knowledge-driven anti-occlusion method to yield accurate and robust vessel tracking results under occlusion conditions. To validate the efficacy of our DeepSORVF, we have also constructed a new benchmark dataset (termed FVessel) for vessel detection, tracking, and data fusion. It consists of many videos and the corresponding AIS data collected in various weather conditions and locations. The experimental results have demonstrated that our method is capable of guaranteeing highly reliable data fusion and anti-occlusion vessel tracking.

Title: Error Sensitivity Modulation based Experience Replay: Mitigating Abrupt Representation Drift in Continual Learning. (arXiv:2302.11344v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11344
Code URL: null
Copy Paste: [[2302.11344] Error Sensitivity Modulation based Experience Replay: Mitigating Abrupt Representation Drift in Continual Learning](http://arxiv.org/abs/2302.11344) #robust
Summary:
Humans excel at lifelong learning, as the brain has evolved to be robust to distribution shifts and noise in our ever-changing environment. Deep neural networks (DNNs), however, exhibit catastrophic forgetting and the learned representations drift drastically as they encounter a new task. This alludes to a different error-based learning mechanism in the brain. Unlike DNNs, where learning scales linearly with the magnitude of the error, the sensitivity to errors in the brain decreases as a function of their magnitude. To this end, we propose ESMER which employs a principled mechanism to modulate error sensitivity in a dual-memory rehearsal-based system. Concretely, it maintains a memory of past errors and uses it to modify the learning dynamics so that the model learns more from small consistent errors compared to large sudden errors. We also propose Error-Sensitive Reservoir Sampling to maintain episodic memory, which leverages the error history to pre-select low-loss samples as candidates for the buffer, which are better suited for retaining information. Empirical results show that ESMER effectively reduces forgetting and abrupt drift in representations at the task boundary by gradually adapting to the new task while consolidating knowledge. Remarkably, it also enables the model to learn under high levels of label noise, which is ubiquitous in real-world data streams.

Title: Steerable Equivariant Representation Learning. (arXiv:2302.11349v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2302.11349
Code URL: null
Copy Paste: [[2302.11349] Steerable Equivariant Representation Learning](http://arxiv.org/abs/2302.11349) #robust
Summary:
Pre-trained deep image representations are useful for post-training tasks such as classification through transfer learning, image retrieval, and object detection. Data augmentations are a crucial aspect of pre-training robust representations in both supervised and self-supervised settings. Data augmentations explicitly or implicitly promote invariance in the embedding space to the input image transformations. This invariance reduces generalization to those downstream tasks which rely on sensitivity to these particular data augmentations. In this paper, we propose a method of learning representations that are instead equivariant to data augmentations. We achieve this equivariance through the use of steerable representations. Our representations can be manipulated directly in embedding space via learned linear maps. We demonstrate that our resulting steerable and equivariant representations lead to better performance on transfer learning and robustness: e.g. we improve linear probe top-1 accuracy by between 1% and 3% for transfer; and ImageNet-C accuracy by up to 3.4%. We further show that the steerability of our representations provides significant speedup (nearly 50x) for test-time augmentations; by applying a large number of augmentations for out-of-distribution detection, we significantly improve OOD AUC on the ImageNet-C dataset over an invariant representation.

Title: ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning Paradigms. (arXiv:2302.11408v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11408
Code URL: https://github.com/ruoxi-jia-group/asset
Copy Paste: [[2302.11408] ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning Paradigms](http://arxiv.org/abs/2302.11408) #robust
Summary:
Backdoor data detection is traditionally studied in an end-to-end supervised learning (SL) setting. However, recent years have seen the proliferating adoption of self-supervised learning (SSL) and transfer learning (TL), due to their lesser need for labeled data. Successful backdoor attacks have also been demonstrated in these new settings. However, we lack a thorough understanding of the applicability of existing detection methods across a variety of learning settings. By evaluating 56 attack settings, we show that the performance of most existing detection methods varies significantly across different attacks and poison ratios, and all fail on the state-of-the-art clean-label attack. In addition, they either become inapplicable or suffer large performance losses when applied to SSL and TL. We propose a new detection method called Active Separation via Offset (ASSET), which actively induces different model behaviors between the backdoor and clean samples to promote their separation. We also provide procedures to adaptively select the number of suspicious points to remove. In the end-to-end SL setting, ASSET is superior to existing methods in terms of consistency of defensive performance across different attacks and robustness to changes in poison ratios; in particular, it is the only method that can detect the state-of-the-art clean-label attack. Moreover, ASSET's average detection rates are higher than the best existing methods in SSL and TL, respectively, by 69.3% and 33.2%, thus providing the first practical backdoor defense for these new DL settings. We open-source the project to drive further development and encourage engagement: https://github.com/ruoxi-jia-group/ASSET.

Title: Distilling Calibrated Student from an Uncalibrated Teacher. (arXiv:2302.11472v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2302.11472
Code URL: null
Copy Paste: [[2302.11472] Distilling Calibrated Student from an Uncalibrated Teacher](http://arxiv.org/abs/2302.11472) #robust
Summary:
Knowledge distillation is a common technique for improving the performance of a shallow student network by transferring information from a teacher network, which in general, is comparatively large and deep. These teacher networks are pre-trained and often uncalibrated, as no calibration technique is applied to the teacher model while training. Calibration of a network measures the probability of correctness for any of its predictions, which is critical in high-risk domains. In this paper, we study how to obtain a calibrated student from an uncalibrated teacher. Our approach relies on the fusion of the data-augmentation techniques, including but not limited to cutout, mixup, and CutMix, with knowledge distillation. We extend our approach beyond traditional knowledge distillation and find it suitable for Relational Knowledge Distillation and Contrastive Representation Distillation as well. The novelty of the work is that it provides a framework to distill a calibrated student from an uncalibrated teacher model without compromising the accuracy of the distilled student. We perform extensive experiments to validate our approach on various datasets, including CIFAR-10, CIFAR-100, CINIC-10 and TinyImageNet, and obtained calibrated student models. We also observe robust performance of our approach while evaluating it on corrupted CIFAR-100C data.

Title: Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition. (arXiv:2302.11566v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2302.11566
Code URL: null
Copy Paste: [[2302.11566] Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition](http://arxiv.org/abs/2302.11566) #robust
Summary:
We present Vid2Avatar, a method to learn human avatars from monocular in-the-wild videos. Reconstructing humans that move naturally from monocular in-the-wild videos is difficult. Solving it requires accurately separating humans from arbitrary backgrounds. Moreover, it requires reconstructing detailed 3D surface from short video sequences, making it even more challenging. Despite these challenges, our method does not require any groundtruth supervision or priors extracted from large datasets of clothed human scans, nor do we rely on any external segmentation modules. Instead, it solves the tasks of scene decomposition and surface reconstruction directly in 3D by modeling both the human and the background in the scene jointly, parameterized via two separate neural fields. Specifically, we define a temporally consistent human representation in canonical space and formulate a global optimization over the background model, the canonical human shape and texture, and per-frame human pose parameters. A coarse-to-fine sampling strategy for volume rendering and novel objectives are introduced for a clean separation of dynamic human and static background, yielding detailed and robust 3D human geometry reconstructions. We evaluate our methods on publicly available datasets and show improvements over prior art.

Title: The Impact of Subword Pooling Strategy for Cross-lingual Event Detection. (arXiv:2302.11365v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2302.11365
Code URL: https://github.com/isi-boston/ed-pooling
Copy Paste: [[2302.11365] The Impact of Subword Pooling Strategy for Cross-lingual Event Detection](http://arxiv.org/abs/2302.11365) #robust
Summary:
Pre-trained multilingual language models (e.g., mBERT, XLM-RoBERTa) have significantly advanced the state-of-the-art for zero-shot cross-lingual information extraction. These language models ubiquitously rely on word segmentation techniques that break a word into smaller constituent subwords. Therefore, all word labeling tasks (e.g. named entity recognition, event detection, etc.), necessitate a pooling strategy that takes the subword representations as input and outputs a representation for the entire word. Taking the task of cross-lingual event detection as a motivating example, we show that the choice of pooling strategy can have a significant impact on the target language performance. For example, the performance varies by up to 16 absolute $f_{1}$ points depending on the pooling strategy when training in English and testing in Arabic on the ACE task. We carry out our analysis with five different pooling strategies across nine languages in diverse multi-lingual datasets. Across configurations, we find that the canonical strategy of taking just the first subword to represent the entire word is usually sub-optimal. On the other hand, we show that attention pooling is robust to language and dataset variations by being either the best or close to the optimal strategy. For reproducibility, we make our code available at https://github.com/isi-boston/ed-pooling.
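To make the notion of a pooling strategy concrete, here is a minimal sketch of three ways to turn subword vectors into one word vector. It is not taken from the paper's code: the function names, the toy vectors, and the `query` vector (standing in for a learned attention parameter) are all assumptions made for illustration.

```python
import math

def first_pool(subword_vecs):
    """Represent the word by its first subword vector."""
    return list(subword_vecs[0])

def mean_pool(subword_vecs):
    """Average the subword vectors element-wise."""
    n = len(subword_vecs)
    return [sum(col) / n for col in zip(*subword_vecs)]

def attention_pool(subword_vecs, query):
    """Softmax-weighted sum of subword vectors.

    `query` is a hypothetical stand-in for a learned attention vector.
    """
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in subword_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * vec[d] for w, vec in zip(weights, subword_vecs))
            for d in range(len(query))]

# Toy word split into three subwords, each a 2-dimensional vector.
subwords = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(first_pool(subwords))
print(mean_pool(subwords))
print(attention_pool(subwords, query=[0.0, 0.0]))
```

With an all-zero query the attention weights are uniform, so attention pooling reduces to mean pooling; a trained query would instead emphasise some subwords over others.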

Title: Eigen-informed NeuralODEs: Dealing with stability and convergence issues of NeuralODEs. (arXiv:2302.10892v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10892
Code URL: null
Copy Paste: [[2302.10892] Eigen-informed NeuralODEs: Dealing with stability and convergence issues of NeuralODEs](http://arxiv.org/abs/2302.10892) #robust
Summary:
Using vanilla NeuralODEs to model large and/or complex systems often fails due to two reasons: stability and convergence. NeuralODEs are capable of describing stable as well as unstable dynamic systems. Selecting an appropriate numerical solver is not trivial, because NeuralODE properties change during training. If the NeuralODE becomes stiffer, a suboptimal solver may need to perform very small solver steps, which significantly slows down the training process. If the NeuralODE becomes too unstable, the numerical solver might not be able to solve it at all, which causes the training process to terminate. Often, this is tackled by choosing a computationally expensive solver that is robust to unstable and stiff ODEs, but at the cost of significantly decreased training performance. Our method, on the other hand, allows enforcing ODE properties that fit a specific solver or application-related boundary conditions. Concerning the convergence behavior, NeuralODEs often tend to run into local minima, especially if the system to be learned is highly dynamic and/or oscillating over multiple periods. Because of the vanishing gradient at a local minimum, the NeuralODE is often not capable of leaving it and converging to the right solution. We present a technique to add knowledge of ODE properties based on eigenvalues - like (partial) stability, oscillation capability, frequency, damping and/or stiffness - to the training objective of a NeuralODE. We exemplify our method on a linear as well as a nonlinear system model and show that the presented training process is far more robust against local minima, instabilities and sparse data samples and improves training convergence and performance.
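As a rough illustration of the eigenvalue-based properties listed in this abstract, the sketch below reads stability, oscillation, frequency, and a stiffness ratio off the eigenvalues of a 2x2 linear system dx/dt = A x. It is not the authors' training objective; the function names, the example matrices, and the choice of stiffness ratio (largest over smallest absolute real part) are assumptions.

```python
import cmath
import math

def eig2x2(a):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (p, q), (r, s) = a
    trace = p + s
    det = p * s - q * r
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def ode_properties(a):
    """Summarise dx/dt = A x using the eigenvalues of A."""
    lams = eig2x2(a)
    mags = [abs(lam.real) for lam in lams]
    return {
        # Stable when every eigenvalue has a negative real part.
        "stable": all(lam.real < 0 for lam in lams),
        # A nonzero imaginary part means the solution oscillates.
        "oscillating": any(abs(lam.imag) > 1e-12 for lam in lams),
        "frequency_hz": max(abs(lam.imag) for lam in lams) / (2 * math.pi),
        # Assumed definition: ratio of fastest to slowest decay rate.
        "stiffness_ratio": max(mags) / min(mags) if min(mags) > 0 else float("inf"),
    }

damped = [[0.0, 1.0], [-4.0, -0.4]]    # damped oscillator
stiff = [[-1.0, 0.0], [0.0, -1000.0]]  # widely separated time scales
print(ode_properties(damped))
print(ode_properties(stiff))
```

For a nonlinear model the same checks would apply to the Jacobian of the right-hand side at a given state, which is where a penalty on undesired eigenvalue positions could plausibly enter a loss function.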

Title: Multi-modal Machine Learning in Engineering Design: A Review and Future Directions. (arXiv:2302.10909v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10909
Code URL: null
Copy Paste: [[2302.10909] Multi-modal Machine Learning in Engineering Design: A Review and Future Directions](http://arxiv.org/abs/2302.10909) #robust
Summary:
Multi-modal machine learning (MMML), which involves integrating multiple modalities of data and their corresponding processing methods, has demonstrated promising results in various practical applications, such as text-to-image translation. This review paper summarizes the recent progress and challenges in using MMML for engineering design tasks. First, we introduce the different data modalities commonly used as design representations and involved in MMML, including text, 2D pixel data (e.g., images and sketches), and 3D shape data (e.g., voxels, point clouds, and meshes). We then provide an overview of the various approaches and techniques used for representing, fusing, aligning, synthesizing, and co-learning multi-modal data as five fundamental concepts of MMML. Next, we review the state-of-the-art capabilities of MMML that potentially apply to engineering design tasks, including design knowledge retrieval, design evaluation, and design synthesis. We also highlight the potential benefits and limitations of using MMML in these contexts. Finally, we discuss the challenges and future directions in using MMML for engineering design, such as the need for large labeled multi-modal design datasets, robust and scalable algorithms, integrating domain knowledge, and handling data heterogeneity and noise. Overall, this review paper provides a comprehensive overview of the current state and prospects of MMML for engineering design applications.

Title: Adversarial Model for Offline Reinforcement Learning. (arXiv:2302.11048v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11048
Code URL: null
Copy Paste: [[2302.11048] Adversarial Model for Offline Reinforcement Learning](http://arxiv.org/abs/2302.11048) #robust
Summary:
We propose a novel model-based offline Reinforcement Learning (RL) framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage. ARMOR is designed to optimize policies for the worst-case performance relative to the reference policy through adversarially training a Markov decision process model. In theory, we prove that ARMOR, with a well-tuned hyperparameter, can compete with the best policy within data coverage when the reference policy is supported by the data. At the same time, ARMOR is robust to hyperparameter choices: the policy learned by ARMOR, with "any" admissible hyperparameter, would never degrade the performance of the reference policy, even when the reference policy is not covered by the dataset. To validate these properties in practice, we design a scalable implementation of ARMOR, which by adversarial training, can optimize policies without using model ensembles in contrast to typical model-based methods. We show that ARMOR achieves competent performance with both state-of-the-art offline model-free and model-based RL algorithms and can robustly improve the reference policy over various hyperparameter choices.

Title: Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time. (arXiv:2302.11068v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11068
Code URL: null
Copy Paste: [[2302.11068] Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time](http://arxiv.org/abs/2302.11068) #robust
Summary:
Given a matrix $M\in \mathbb{R}^{m\times n}$, the low rank matrix completion problem asks us to find a rank-$k$ approximation of $M$ as $UV^\top$ for $U\in \mathbb{R}^{m\times k}$ and $V\in \mathbb{R}^{n\times k}$ by only observing a few entries masked by a binary matrix $P_{\Omega}\in \{0, 1\}^{m\times n}$. As a particular instance of the weighted low rank approximation problem, solving low rank matrix completion is known to be computationally hard even to find an approximate solution [RSW16]. However, due to its practical importance, many heuristics have been proposed for this problem. In their seminal work, Jain, Netrapalli, and Sanghavi [JNS13] show that the alternating minimization framework provides provable guarantees for the low rank matrix completion problem whenever $M$ admits an incoherent low rank factorization. Unfortunately, their algorithm requires solving two exact multiple response regressions per iteration and their analysis is non-robust as they exploit the structure of the exact solution.

In this paper, we take a major step towards a more efficient and robust alternating minimization framework for low rank matrix completion. Our main result is a robust alternating minimization algorithm that can tolerate moderate errors even though the regressions are solved approximately. Consequently, we also significantly improve the running time of [JNS13] from $\widetilde{O}(mnk^2 )$ to $\widetilde{O}(mnk )$ which is nearly linear in the problem size, as verifying the low rank approximation takes $O(mnk)$ time. Our core algorithmic building block is a high accuracy regression solver that solves the regression in nearly linear time per iteration.
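For readers unfamiliar with alternating minimization, the sketch below shows the basic idea in the simplest possible case: rank 1, with each regression solved exactly in closed form. It is neither the paper's nearly-linear-time algorithm nor the [JNS13] procedure; the function name, the use of `None` for unobserved entries, and the toy matrix are assumptions, and every row and column is assumed to contain at least one observed entry.

```python
def complete_rank1(m, iters=50):
    """Fit m[i][j] ~ u[i] * v[j] on observed entries, then fill the gaps.

    Unobserved entries are marked with None (an assumed convention).
    """
    rows, cols = len(m), len(m[0])
    u = [0.0] * rows
    v = [1.0] * cols  # simple positive initialisation
    for _ in range(iters):
        # Fix v: each u[i] is a one-variable least-squares problem.
        for i in range(rows):
            obs = [j for j in range(cols) if m[i][j] is not None]
            u[i] = sum(m[i][j] * v[j] for j in obs) / sum(v[j] ** 2 for j in obs)
        # Fix u: solve for each v[j] the same way.
        for j in range(cols):
            obs = [i for i in range(rows) if m[i][j] is not None]
            v[j] = sum(m[i][j] * u[i] for i in obs) / sum(u[i] ** 2 for i in obs)
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]

# Rank-1 matrix [1, 2, 3]^T [1, 2, 4] with three entries hidden.
observed = [
    [1.0, 2.0, None],
    [2.0, None, 8.0],
    [None, 6.0, 12.0],
]
for row in complete_rank1(observed):
    print([round(x, 4) for x in row])
```

For rank $k > 1$ each of the two inner steps becomes a least-squares regression per row or column, which is the step the abstract says can be solved approximately rather than exactly.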

Title: Semi-Supervised Approach for Early Stuck Sign Detection in Drilling Operations. (arXiv:2302.11135v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11135
Code URL: null
Copy Paste: [[2302.11135] Semi-Supervised Approach for Early Stuck Sign Detection in Drilling Operations](http://arxiv.org/abs/2302.11135) #robust
Summary:
A real-time stuck pipe prediction methodology is proposed in this paper. We assume early signs of stuck pipe to be apparent when the drilling data behavior deviates from that of normal drilling operations. The definition of normalcy changes with drill string configuration or geological conditions. Here, a depth-domain data representation is adopted to capture the localized normal behavior. Several models, based on auto-encoders and variational auto-encoders, are trained on regular drilling data extracted from actual drilling data. When the trained model is applied to data sets before stuck incidents, eight incidents showed large reconstruction errors. These results suggest better performance than the previously reported supervised approach. Inter-comparison of various models reveals the robustness of our approach. The model performance depends on the featured parameter, suggesting the need for multiple models in actual operation.

Title: What Are Effective Labels for Augmented Data? Improving Calibration and Robustness with AutoLabel. (arXiv:2302.11188v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11188
Code URL: null
Copy Paste: [[2302.11188] What Are Effective Labels for Augmented Data? Improving Calibration and Robustness with AutoLabel](http://arxiv.org/abs/2302.11188) #robust
Summary:
A wide breadth of research has devised data augmentation approaches that can improve both accuracy and generalization performance for neural networks. However, augmented data can end up being far from the clean training data, and what the appropriate label should be is less clear. Despite this, most existing work simply uses one-hot labels for augmented data. In this paper, we show that re-using one-hot labels for highly distorted data might run the risk of adding noise and degrading accuracy and calibration. To mitigate this, we propose a generic method, AutoLabel, to automatically learn the confidence in the labels for augmented data, based on the transformation distance between the clean distribution and augmented distribution. AutoLabel is built on label smoothing and is guided by the calibration performance over a hold-out validation set. We successfully apply AutoLabel to three different data augmentation techniques: the state-of-the-art RandAug, AugMix, and adversarial training. Experiments on CIFAR-10, CIFAR-100 and ImageNet show that AutoLabel significantly improves existing data augmentation techniques over models' calibration and accuracy, especially under distributional shift.
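The contrast between one-hot labels and confidence-weighted labels can be sketched as follows. This is only a hand-written label-smoothing variant, not AutoLabel itself: the function name is hypothetical, the confidence values are made up, and the rule that learns confidence from transformation distance is not reproduced here.

```python
def soft_label(true_class, num_classes, confidence):
    """Put `confidence` on the true class and spread the rest evenly."""
    off_value = (1.0 - confidence) / (num_classes - 1)
    return [confidence if k == true_class else off_value
            for k in range(num_classes)]

# A clean sample keeps a one-hot label; a heavily distorted one gets a
# lower, hand-picked confidence (illustrative values only).
print(soft_label(true_class=1, num_classes=4, confidence=1.0))
print(soft_label(true_class=1, num_classes=4, confidence=0.7))
```

Training against the softer target penalises over-confident predictions on distorted inputs, which is the intuition behind using it to improve calibration.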

Title: Distributionally Robust Recourse Action. (arXiv:2302.11211v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11211
Code URL: https://github.com/duykhuongnguyen/dirrac
Copy Paste: [[2302.11211] Distributionally Robust Recourse Action](http://arxiv.org/abs/2302.11211) #robust
Summary:
A recourse action aims to explain a particular algorithmic decision by showing one specific way in which the instance could be modified to receive an alternate outcome. Existing recourse generation methods often assume that the machine learning model does not change over time. However, this assumption does not always hold in practice because of data distribution shifts, and in this case, the recourse action may become invalid. To redress this shortcoming, we propose the Distributionally Robust Recourse Action (DiRRAc) framework, which generates a recourse action that has a high probability of being valid under a mixture of model shifts. We formulate the robustified recourse setup as a min-max optimization problem, where the max problem is specified by Gelbrich distance over an ambiguity set around the distribution of model parameters. Then we suggest a projected gradient descent algorithm to find a robust recourse according to the min-max objective. We show that our DiRRAc framework can be extended to hedge against the misspecification of the mixture weights. Numerical experiments with both synthetic and three real-world datasets demonstrate the benefits of our proposed framework over state-of-the-art recourse methods.

Title: Robust and Explainable Contextual Anomaly Detection using Quantile Regression Forests. (arXiv:2302.11239v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11239
Code URL: https://github.com/zhonglifr/qcad
Copy Paste: [[2302.11239] Robust and Explainable Contextual Anomaly Detection using Quantile Regression Forests](http://arxiv.org/abs/2302.11239) #robust
Summary:
Traditional anomaly detection methods aim to identify objects that deviate from most other objects by treating all features equally. In contrast, contextual anomaly detection methods aim to detect objects that deviate from other objects within a context of similar objects by dividing the features into contextual features and behavioral features. In this paper, we develop connections between dependency-based traditional anomaly detection methods and contextual anomaly detection methods. Based on resulting insights, we propose a novel approach to robust and inherently interpretable contextual anomaly detection that uses Quantile Regression Forests to model dependencies between features. Extensive experiments on various synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art anomaly detection methods in identifying contextual anomalies in terms of accuracy and robustness.

Title: Learning Dynamic Graph Embeddings with Neural Controlled Differential Equations. (arXiv:2302.11354v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11354
Code URL: null
Copy Paste: [[2302.11354] Learning Dynamic Graph Embeddings with Neural Controlled Differential Equations](http://arxiv.org/abs/2302.11354) #robust
Summary:
This paper focuses on representation learning for dynamic graphs with temporal interactions. A fundamental issue is that both the graph structure and the nodes have their own dynamics, and their blending induces intractable complexity in the temporal evolution over graphs. Drawing inspiration from the recent progress of physical dynamic models in deep neural networks, we propose the Graph Neural Controlled Differential Equation (GN-CDE) model, a generic differential model for dynamic graphs that characterises the continuously dynamic evolution of node embedding trajectories with a neural network parameterised vector field and the derivatives of interactions w.r.t. time. Our framework exhibits several desirable characteristics, including the ability to express dynamics on evolving graphs without integration by segments, the capability to calibrate trajectories with subsequent data, and robustness to missing observations. Empirical evaluation on a range of dynamic graph representation learning tasks demonstrates the superiority of our proposed approach compared to the baselines.

Title: Delving into Identify-Emphasize Paradigm for Combating Unknown Bias. (arXiv:2302.11414v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11414
Code URL: null
Copy Paste: [[2302.11414] Delving into Identify-Emphasize Paradigm for Combating Unknown Bias](http://arxiv.org/abs/2302.11414) #robust
Summary:
Dataset biases are notoriously detrimental to model robustness and generalization. The identify-emphasize paradigm appears to be effective in dealing with unknown biases. However, we discover that it is still plagued by two challenges: A, the quality of the identified bias-conflicting samples is far from satisfactory; B, the emphasizing strategies only produce suboptimal performance. In this paper, for challenge A, we propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy, along with two practical strategies -- peer-picking and epoch-ensemble. For challenge B, we point out that the gradient contribution statistics can be a reliable indicator to inspect whether the optimization is dominated by bias-aligned samples. Then, we propose gradient alignment (GA), which employs gradient statistics to balance the contributions of the mined bias-aligned and bias-conflicting samples dynamically throughout the learning process, forcing models to leverage intrinsic features to make fair decisions. Furthermore, we incorporate self-supervised (SS) pretext tasks into training, which enable models to exploit richer features rather than the simple shortcuts, resulting in more robust models. Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases and achieve state-of-the-art performance.

biometric

steal

extraction

Title: Deep Kernel Principal Component Analysis for Multi-level Feature Learning. (arXiv:2302.11220v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11220
Code URL: null
Copy Paste: [[2302.11220] Deep Kernel Principal Component Analysis for Multi-level Feature Learning](http://arxiv.org/abs/2302.11220) #extraction
Summary:
Principal Component Analysis (PCA) and its nonlinear extension Kernel PCA (KPCA) are widely used across science and industry for data analysis and dimensionality reduction. Modern deep learning tools have achieved great empirical success, but a framework for deep principal component analysis is still lacking. Here we develop a deep kernel PCA methodology (DKPCA) to extract multiple levels of the most informative components of the data. Our scheme can effectively identify new hierarchical variables, called deep principal components, capturing the main characteristics of high-dimensional data through a simple and interpretable numerical optimization. We couple the principal components of multiple KPCA levels, theoretically showing that DKPCA creates both forward and backward dependency across levels, which has not been explored in kernel methods and yet is crucial to extract more informative features. Various experimental evaluations on multiple data types show that DKPCA finds more efficient and disentangled representations with higher explained variance in fewer principal components, compared to the shallow KPCA. We demonstrate that our method allows for effective hierarchical data exploration, with the ability to separate the key generative factors of the input data both for large datasets and when few training samples are available. Overall, DKPCA can facilitate the extraction of useful patterns from high-dimensional data by learning more informative features organized in different levels, giving diversified aspects to explore the variation factors in the data, while maintaining a simple mathematical formulation.

membership infer

federate

Title: Semi-decentralized Federated Ego Graph Learning for Recommendation. (arXiv:2302.10900v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10900
Code URL: null
Copy Paste: [[2302.10900] Semi-decentralized Federated Ego Graph Learning for Recommendation](http://arxiv.org/abs/2302.10900) #federate
Summary:
Collaborative filtering (CF) based recommender systems are typically trained based on personal interaction data (e.g., clicks and purchases) that could be naturally represented as ego graphs. However, most existing recommendation methods collect these ego graphs from all users to compose a global graph to obtain high-order collaborative information between users and items, and these centralized CF recommendation methods inevitably lead to a high risk of user privacy leakage. Although recently proposed federated recommendation systems can mitigate the privacy problem, they either restrict the on-device local training to an isolated ego graph or rely on an additional third-party server to access other ego graphs resulting in a cumbersome pipeline, which is hard to work in practice. In addition, existing federated recommendation systems require resource-limited devices to maintain the entire embedding tables resulting in high communication costs.

In light of this, we propose a semi-decentralized federated ego graph learning framework for on-device recommendations, named SemiDFEGL, which introduces new device-to-device collaborations to improve scalability and reduce communication costs and innovatively utilizes predicted interacted item nodes to connect isolated ego graphs to augment local subgraphs such that the high-order user-item collaborative information could be used in a privacy-preserving manner. Furthermore, the proposed framework is model-agnostic, meaning that it could be seamlessly integrated with existing graph neural network-based recommendation methods and privacy protection techniques. To validate the effectiveness of the proposed SemiDFEGL, extensive experiments are conducted on three public datasets, and the results demonstrate the superiority of the proposed SemiDFEGL compared to other federated recommendation methods.

Title: Revisiting Weighted Aggregation in Federated Learning with Neural Networks. (arXiv:2302.10911v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10911
Code URL: null
Copy Paste: [[2302.10911] Revisiting Weighted Aggregation in Federated Learning with Neural Networks](http://arxiv.org/abs/2302.10911) #federate
Summary:
In federated learning (FL), weighted aggregation of local models is conducted to generate a global model, and the aggregation weights are normalized (the sum of weights is 1) and proportional to the local data sizes. In this paper, we revisit the weighted aggregation process and gain new insights into the training dynamics of FL. First, we find that the sum of weights can be smaller than 1, causing a global weight shrinking effect (analogous to weight decay) and improving generalization. We explore how the optimal shrinking factor is affected by clients' data heterogeneity and local epochs. Second, we dive into the relative aggregation weights among clients to depict the clients' importance. We develop client coherence to study the learning dynamics and find that a critical point exists. Before entering the critical point, more coherent clients play more essential roles in generalization. Based on the above insights, we propose an effective method for Federated Learning with Learnable Aggregation Weights, named FedLAW. Extensive experiments verify that our method can improve the generalization of the global model by a large margin on different datasets and models.
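The aggregation rule described here, and the effect of letting the weights sum to less than 1, can be sketched in a few lines. This is a fixed-weight illustration, not FedLAW (which learns the weights); the function name, the `shrink` parameter, and the toy numbers are assumptions.

```python
def aggregate(client_params, data_sizes, shrink=1.0):
    """Data-size-weighted average of client parameter vectors.

    With shrink=1.0 the weights sum to 1 (standard normalised weighting);
    shrink < 1.0 scales the global model towards zero.
    """
    total = sum(data_sizes)
    weights = [shrink * n / total for n in data_sizes]
    dim = len(client_params[0])
    return [sum(w * params[d] for w, params in zip(weights, client_params))
            for d in range(dim)]

clients = [[1.0, 2.0], [3.0, 4.0]]  # two clients, two parameters each
sizes = [1, 3]                      # local dataset sizes
print(aggregate(clients, sizes))              # weights 0.25 and 0.75
print(aggregate(clients, sizes, shrink=0.9))  # weights sum to 0.9
```

Multiplying every weight by the same factor below 1 is equivalent to scaling the averaged model, which is why the abstract compares the effect to weight decay.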

Title: Fusion of Global and Local Knowledge for Personalized Federated Learning. (arXiv:2302.11051v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11051
Code URL: https://github.com/huangtiansheng/fedslr
Copy Paste: [[2302.11051] Fusion of Global and Local Knowledge for Personalized Federated Learning](http://arxiv.org/abs/2302.11051) #federate
Summary:
Personalized federated learning, as a variant of federated learning, trains customized models for clients using their heterogeneously distributed data. However, it remains unclear how to design personalized models that better represent shared global knowledge and personalized patterns. To bridge the gap, in this paper we explore personalized models with low-rank and sparse decomposition. Specifically, we employ proper regularization to extract a low-rank global knowledge representation (GKR), so as to distill global knowledge into a compact representation. Subsequently, we employ a sparse component over the obtained GKR to fuse the personalized pattern into the global knowledge. As a solution, we propose a two-stage proximal-based algorithm named Federated learning with mixed Sparse and Low-Rank representation (FedSLR) to efficiently search for the mixed models. Theoretically, under proper assumptions, we show that the GKR trained by FedSLR can at least sub-linearly converge to a stationary point of the regularized problem, and that the sparse component being fused can converge to its stationary point under proper settings. Extensive experiments also demonstrate the superior empirical performance of FedSLR. Moreover, FedSLR reduces the number of parameters and lowers the down-link communication complexity, both of which are desirable for federated learning algorithms. Source code is available at https://github.com/huangtiansheng/fedslr.

Title: Efficient Training of Large-scale Industrial Fault Diagnostic Models through Federated Opportunistic Block Dropout. (arXiv:2302.11485v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11485
Code URL: null
Copy Paste: [[2302.11485] Efficient Training of Large-scale Industrial Fault Diagnostic Models through Federated Opportunistic Block Dropout](http://arxiv.org/abs/2302.11485) #federate
Summary:
Artificial intelligence (AI)-empowered industrial fault diagnostics is important in ensuring the safe operation of industrial applications. Since complex industrial systems often involve multiple industrial plants (possibly belonging to different companies or subsidiaries) with sensitive data collected and stored in a distributed manner, collaborative fault diagnostic model training often needs to leverage federated learning (FL). As industrial fault diagnostic models are often large and communication channels in such systems are often not exclusively used for FL model training, existing deployed FL model training frameworks cannot train such models efficiently across multiple institutions. In this paper, we report our experience developing and deploying the Federated Opportunistic Block Dropout (FEDOBD) approach for industrial fault diagnostic model training. By decomposing large-scale models into semantic blocks and enabling FL participants to opportunistically upload selected important blocks in a quantized manner, it significantly reduces the communication overhead while maintaining model performance. Since its deployment in ENN Group in February 2022, FEDOBD has served two coal chemical plants across two cities in China to build industrial fault prediction models. It helped the company reduce the training communication overhead by over 70% compared to its previous AI Engine, while maintaining model performance at over 85% test F1 score. To our knowledge, it is the first successfully deployed dropout-based FL approach.

fair

Title: Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness. (arXiv:2302.10893v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10893
Code URL: null
Copy Paste: [[2302.10893] Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness](http://arxiv.org/abs/2302.10893) #fair
Summary:
Generative AI models have recently achieved astonishing results in quality and are consequently employed in a fast-growing number of applications. However, since they are highly data-driven, relying on billion-sized datasets randomly scraped from the internet, they also suffer from degenerated and biased human behavior, as we demonstrate. In fact, they may even reinforce such biases. To not only uncover but also combat these undesired effects, we present a novel strategy, called Fair Diffusion, to attenuate biases after the deployment of generative text-to-image models. Specifically, we demonstrate shifting a bias, based on human instructions, in any direction yielding arbitrarily new proportions for, e.g., identity groups. As our empirical evaluation demonstrates, this introduced control enables instructing generative image models on fairness, with no data filtering and additional training required.

Title: Towards End-to-end Semi-supervised Learning for One-stage Object Detection. (arXiv:2302.11299v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2302.11299
Code URL: https://github.com/luogen1996/oneteacher
Copy Paste: [[2302.11299] Towards End-to-end Semi-supervised Learning for One-stage Object Detection](http://arxiv.org/abs/2302.11299) #fair
Summary:
Semi-supervised object detection (SSOD) is a research hot spot in computer vision, which can greatly reduce the requirement for expensive bounding-box annotations. Despite great success, existing progress mainly focuses on two-stage detection networks like Faster-RCNN, while the research on one-stage detectors is often ignored. In this paper, we focus on semi-supervised learning for the advanced and popular one-stage detection network YOLOv5. Compared with Faster-RCNN, the implementation of YOLOv5 is much more complex, and the various training techniques used in YOLOv5 can also reduce the benefit of SSOD. In addition to this challenge, we also reveal two key issues in one-stage SSOD: low-quality pseudo-labeling and multi-task optimization conflict. To address these issues, we propose a novel teacher-student learning recipe called OneTeacher with two innovative designs, namely Multi-view Pseudo-label Refinement (MPR) and Decoupled Semi-supervised Optimization (DSO). In particular, MPR improves the quality of pseudo-labels via augmented-view refinement and global-view filtering, and DSO handles the joint optimization conflicts via structure tweaks and task-specific pseudo-labeling. In addition, we also carefully revise the implementation of YOLOv5 to maximize the benefits of SSOD, which is also shared with the existing SSOD methods for fair comparison. To validate OneTeacher, we conduct extensive experiments on COCO and Pascal VOC. The experiments show that OneTeacher not only outperforms the compared methods, e.g., 15.0% relative AP gains over Unbiased Teacher, but also handles the key issues in one-stage SSOD well. Our source code is available at: https://github.com/luogen1996/OneTeacher.

Title: Uncovering Bias in Face Generation Models. (arXiv:2302.11562v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2302.11562
Code URL: null
Copy Paste: [[2302.11562] Uncovering Bias in Face Generation Models](http://arxiv.org/abs/2302.11562) #fair
Summary:
Recent advancements in GANs and diffusion models have enabled the creation of high-resolution, hyper-realistic images. However, these models may misrepresent certain social groups and present bias. Understanding bias in these models remains an important research question, especially for tasks that support critical decision-making and could affect minorities. The contribution of this work is a novel analysis covering architectures and embedding spaces for fine-grained understanding of bias over three approaches: generators, attribute modifiers, and post-processing bias mitigators. This work shows that generators suffer from bias across all social groups, with attribute preferences such as 75%-85% for whiteness and 60%-80% for the female gender (for all trained CelebA models) and low probabilities of generating children and older men. Modifiers and mitigators work as post-processors and change the generator performance. For instance, attribute channel perturbation strategies modify the embedding spaces. We quantify the influence of this change on group fairness by measuring the impact on image quality and group features. Specifically, we use the Fréchet Inception Distance (FID), the Face Matching Error and the Self-Similarity score. For InterFaceGAN, we analyze one and two attribute channel perturbations and examine the effect on the fairness distribution and the quality of the image. Finally, we analyze the post-processing bias mitigators, which are the fastest and most computationally efficient way to mitigate bias. We find that these mitigation techniques show similar results on KL divergence and FID score; however, self-similarity scores show a different feature concentration on the new groups of the data distribution. The weaknesses and ongoing challenges described in this work must be considered in the pursuit of creating fair and unbiased face generation models.

Title: Towards a responsible machine learning approach to identify forced labor in fisheries. (arXiv:2302.10987v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10987
Code URL: null
Copy Paste: [[2302.10987] Towards a responsible machine learning approach to identify forced labor in fisheries](http://arxiv.org/abs/2302.10987) #fair
Summary:
Many fishing vessels use forced labor, but identifying vessels that engage in this practice is challenging because few are regularly inspected. We developed a positive-unlabeled learning algorithm using vessel characteristics and movement patterns to estimate an upper bound of the number of positive cases of forced labor, with the goal of helping make accurate, responsible, and fair decisions. 89% of the reported cases of forced labor were correctly classified as positive (recall) while 98% of the vessels certified as having decent working conditions were correctly classified as negative. The recall was high for vessels from different regions using different gears, except for trawlers. We found that as much as ~28% of vessels may operate using forced labor, with the fraction much higher in squid jiggers and longlines. This model could inform risk-based port inspections as part of a broader monitoring, control, and surveillance regime to reduce forced labor.

* Translated versions of the English title and abstract are available in five languages in S1 Text: Spanish, French, Simplified Chinese, Traditional Chinese, and Indonesian.
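
The two rates quoted in the summary above are recall (on reported positive cases) and specificity (on vessels certified as negative). A minimal sketch of the arithmetic follows; the counts are hypothetical and chosen only so that the rates come out to 89% and 98%, they are not the paper's data.

```python
def recall(true_pos, false_neg):
    """Fraction of actual positives that were classified as positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of actual negatives that were classified as negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts for illustration only.
reported_cases_recall = recall(true_pos=89, false_neg=11)
certified_vessels_specificity = specificity(true_neg=98, false_pos=2)
```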

Title: Fair Correlation Clustering in Forests. (arXiv:2302.11295v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11295
Code URL: null
Copy Paste: [[2302.11295] Fair Correlation Clustering in Forests](http://arxiv.org/abs/2302.11295) #fair
Summary:
The study of algorithmic fairness has received growing attention recently. This stems from the awareness that bias in the input data for machine learning systems may result in discriminatory outputs. For clustering tasks, one of the most central notions of fairness is the formalization by Chierichetti, Kumar, Lattanzi, and Vassilvitskii [NeurIPS 2017]. A clustering is said to be fair if each cluster has the same distribution of manifestations of a sensitive attribute as the whole input set. This is motivated by various applications where the objects to be clustered have sensitive attributes that should not be over- or underrepresented.

We discuss the applicability of this fairness notion to Correlation Clustering. The existing literature on the resulting Fair Correlation Clustering problem either presents approximation algorithms with poor approximation guarantees or severely limits the possible distributions of the sensitive attribute (often only two manifestations with a 1:1 ratio are considered). Our goal is to understand if there is hope for better results in between these two extremes. To this end, we consider restricted graph classes which allow us to characterize the distributions of sensitive attributes for which this form of fairness is tractable from a complexity point of view.

While existing work on Fair Correlation Clustering gives approximation algorithms, we focus on exact solutions and investigate whether there are efficiently solvable instances. The unfair version of Correlation Clustering is trivial on forests, but adding fairness creates a surprisingly rich picture of complexities. We give an overview of the distributions and types of forests where Fair Correlation Clustering turns from tractable to intractable. The most surprising insight to us is the fact that the cause of the hardness of Fair Correlation Clustering is not the strictness of the fairness condition.
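
The fairness notion quoted in the summary above (each cluster has the same distribution of the sensitive attribute as the whole input set) can be checked directly. The sketch below only tests whether a given clustering is fair; it does not solve Fair Correlation Clustering. The function names and the data layout (lists of item ids plus an `attribute` mapping) are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction

def distribution(labels):
    """Exact relative frequency of each sensitive-attribute value."""
    total = len(labels)
    return {value: Fraction(count, total)
            for value, count in Counter(labels).items()}

def is_fair(clusters, attribute):
    """True if every cluster matches the attribute distribution of the whole set.

    clusters:  list of lists of item ids
    attribute: dict mapping item id -> sensitive-attribute value
    """
    everything = [attribute[i] for cluster in clusters for i in cluster]
    target = distribution(everything)
    return all(distribution([attribute[i] for i in cluster]) == target
               for cluster in clusters)
```

Using `Fraction` keeps the comparison exact, so a 1:2 ratio in the input is only matched by clusters with exactly that ratio.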

Title: Drop Edges and Adapt: a Fairness Enforcing Fine-tuning for Graph Neural Networks. (arXiv:2302.11479v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11479
Code URL: null
Copy Paste: [[2302.11479] Drop Edges and Adapt: a Fairness Enforcing Fine-tuning for Graph Neural Networks](http://arxiv.org/abs/2302.11479) #fair
Summary:
The rise of graph representation learning as the primary solution for many different network science tasks led to a surge of interest in the fairness of this family of methods. Link prediction, in particular, has a substantial social impact. However, link prediction algorithms tend to increase the segregation in social networks by disfavoring the links between individuals in specific demographic groups. This paper proposes a novel way to enforce fairness on graph neural networks with a fine-tuning strategy. We Drop the unfair Edges and, simultaneously, we Adapt the model's parameters to those modifications, DEA in short. We introduce two covariance-based constraints designed explicitly for the link prediction task. We use these constraints to guide the optimization process responsible for learning the new "fair" adjacency matrix. One novelty of DEA is that we can use a discrete yet learnable adjacency matrix in our fine-tuning. We demonstrate the effectiveness of our approach on five real-world datasets and show that we can improve both the accuracy and the fairness of the link prediction tasks. In addition, we present an in-depth ablation study demonstrating that our training algorithm for the adjacency matrix can be used to improve link prediction performances during training. Finally, we compute the relevance of each component of our framework to show that the combination of both the constraints and the training of the adjacency matrix leads to optimal performances.

interpretability

Title: Benchmarking Interpretability Tools for Deep Neural Networks. (arXiv:2302.10894v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10894
Code URL: null
Copy Paste: [[2302.10894] Benchmarking Interpretability Tools for Deep Neural Networks](http://arxiv.org/abs/2302.10894) #interpretability
Summary:
Interpreting deep neural networks is the topic of much current research in AI. However, few interpretability techniques have been shown to be competitive tools in practical applications. Inspired by how benchmarks tend to guide progress in AI, we make three contributions. First, we propose trojan rediscovery as a benchmarking task to evaluate how useful interpretability tools are for generating engineering-relevant insights. Second, we design two such approaches for benchmarking: one for feature attribution methods and one for feature synthesis methods. Third, we apply our benchmarks to evaluate 16 feature attribution/saliency methods and 9 feature synthesis methods. This approach finds large differences in the capabilities of these existing tools and shows significant room for improvement. Finally, we propose several directions for future work. Resources are available at https://github.com/thestephencasper/benchmarking_interpretability

Title: GLUECons: A Generic Benchmark for Learning Under Constraints. (arXiv:2302.10914v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10914
Code URL: null
Copy Paste: [[2302.10914] GLUECons: A Generic Benchmark for Learning Under Constraints](http://arxiv.org/abs/2302.10914) #interpretability
Summary:
Recent research has shown that integrating domain knowledge into deep learning architectures is effective -- it helps reduce the amount of required data, improves the accuracy of the models' decisions, and improves the interpretability of models. However, the research community is missing a convened benchmark for systematically evaluating knowledge integration methods. In this work, we create a benchmark that is a collection of nine tasks in the domains of natural language processing and computer vision. In all cases, we model external knowledge as constraints, specify the sources of the constraints for each task, and implement various models that use these constraints. We report the results of these models using a new set of extended evaluation criteria in addition to the task performances for a more in-depth analysis. This effort provides a framework for a more comprehensive and systematic comparison of constraint integration techniques and for identifying related research challenges. It will facilitate further research for alleviating some problems of state-of-the-art neural models.

explainability

Title: Framework for Certification of AI-Based Systems. (arXiv:2302.11049v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11049
Code URL: null
Copy Paste: [[2302.11049] Framework for Certification of AI-Based Systems](http://arxiv.org/abs/2302.11049) #explainability
Summary:
The current certification process for aerospace software is not adapted to "AI-based" algorithms such as deep neural networks. Unlike traditional aerospace software, the precise parameters optimized during neural network training are as important as (or more than) the code processing the network and they are not directly mathematically understandable. Despite their lack of explainability such algorithms are appealing because for some applications they can exhibit high performance unattainable with any traditional explicit line-by-line software methods.

This paper proposes a framework and principles that could be used to establish certification methods for neural network models for which the current certification processes such as DO-178 cannot be applied. While it is not a magic recipe, it is a set of common sense steps that will allow the applicant and the regulator to increase their confidence in the developed software, by demonstrating the capabilities to bring together, trace, and track the requirements, data, software, training process, and test results.

watermark

Title: Visual Watermark Removal Based on Deep Learning. (arXiv:2302.11338v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2302.11338
Code URL: null
Copy Paste: [[2302.11338] Visual Watermark Removal Based on Deep Learning](http://arxiv.org/abs/2302.11338) #watermark
Summary:
In recent years, as the internet age continues to grow, sharing images on social media has become a common occurrence. In certain cases, watermarks are used to protect the ownership of an image; more often, however, one may wish to remove these watermarks to recover the original, unobscured image. In this work, we propose a deep-learning-based technique for visual watermark removal. Inspired by the strong image translation performance of the U-structure, an end-to-end deep neural network model named AdvancedUnet is proposed to extract and remove the visual watermark simultaneously. In addition, we embed effective RSU modules instead of the common residual block used in UNet, which increases the depth of the whole architecture without significantly increasing the computational cost. The deep-supervised hybrid loss guides the network to learn the transformation between the input image and the ground truth in a multi-scale and three-level hierarchy. Comparison experiments demonstrate the effectiveness of our method.

Title: Saliency detection and quantization index modulation based high payload HDR image watermarking. (arXiv:2302.11361v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2302.11361
Code URL: null
Copy Paste: [[2302.11361] Saliency detection and quantization index modulation based high payload HDR image watermarking](http://arxiv.org/abs/2302.11361) #watermark
Summary:
High-dynamic range (HDR) images are circulated rapidly over the internet with risks of being exploited for unauthenticated usage. To protect these images, some HDR image based watermarking (HDR-IW) methods were put forward. However, they inherited the same problem faced by conventional IW methods for standard dynamic range (SDR) images, where only trade-offs among conflicting requirements are managed instead of simultaneous improvement. In this paper, a novel saliency (eye-catching object) detection based trade-off independent HDR-IW is proposed, to simultaneously improve robustness, imperceptibility and payload capacity. First, the host image goes through our proposed salient object detection model to produce a saliency map, which is, in turn, exploited to segment the foreground and background of the host image. Next, the binary watermark is partitioned into foreground and background segments using the same mask and scrambled using the random permutation algorithm. Finally, the watermark segments are embedded into the corresponding host segments (i.e., selected bit-plane) using quantization index modulation. Experimental results suggest that the proposed work outperforms state-of-the-art methods in terms of improving the conflicting requirements.
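
For readers unfamiliar with quantization index modulation (QIM), the sketch below shows the textbook scalar form: each bit selects one of two interleaved quantizers, and extraction picks the quantizer whose lattice lies closest to the received value. This is an illustrative assumption, not the paper's embedding scheme; the step size `delta` and the idea of embedding into a single scalar coefficient are chosen here for simplicity.

```python
def qim_embed(x, bit, delta):
    """Quantize x onto the lattice for `bit` (lattice 1 is shifted by delta/2)."""
    offset = bit * delta / 2.0
    return delta * round((x - offset) / delta) + offset

def qim_extract(y, delta):
    """Return the bit whose lattice has a point closest to y."""
    d0 = abs(y - qim_embed(y, 0, delta))
    d1 = abs(y - qim_embed(y, 1, delta))
    return 0 if d0 <= d1 else 1

# Embed the bits 1, 0, 1 into three host coefficients (hypothetical values).
host = [0.3, 7.9, -2.4]
bits = [1, 0, 1]
marked = [qim_embed(x, b, delta=4.0) for x, b in zip(host, bits)]
recovered = [qim_extract(y, delta=4.0) for y in marked]
```

A larger `delta` tolerates more distortion before a bit flips (anything below `delta / 4`) but changes the host value more, which is the robustness versus imperceptibility trade-off the summary refers to.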

diffusion

Title: Entity-Level Text-Guided Image Manipulation. (arXiv:2302.11383v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2302.11383
Code URL: null
Copy Paste: [[2302.11383] Entity-Level Text-Guided Image Manipulation](http://arxiv.org/abs/2302.11383) #diffusion
Summary:
Existing text-guided image manipulation methods aim to modify the appearance of the image or to edit a few objects in a virtual or simple scenario, which is far from practical applications. In this work, we study a novel task on text-guided image manipulation on the entity level in the real world (eL-TGIM). The task imposes three basic requirements: (1) to edit the entity consistent with the text descriptions, (2) to preserve the entity-irrelevant regions, and (3) to merge the manipulated entity into the image naturally. To this end, we propose an elegant framework, dubbed SeMani, forming the Semantic Manipulation of real-world images that can not only edit the appearance of entities but also generate new entities corresponding to the text guidance. To solve eL-TGIM, SeMani decomposes the task into two phases: the semantic alignment phase and the image manipulation phase. In the semantic alignment phase, SeMani incorporates a semantic alignment module to locate the entity-relevant region to be manipulated. In the image manipulation phase, SeMani adopts a generative model to synthesize new images conditioned on the entity-irrelevant regions and target text descriptions. We discuss and propose two popular generation processes that can be utilized in SeMani, the discrete auto-regressive generation with transformers and the continuous denoising generation with diffusion models, yielding SeMani-Trans and SeMani-Diff, respectively. We conduct extensive experiments on the real-world CUB, Oxford, and COCO datasets to verify that SeMani can distinguish the entity-relevant and -irrelevant regions and achieve more precise and flexible manipulation in a zero-shot manner compared with baseline methods. Our codes and models will be released at https://github.com/Yikai-Wang/SeMani.

Title: Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC. (arXiv:2302.11552v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11552
Code URL: null
Copy Paste: [[2302.11552] Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC](http://arxiv.org/abs/2302.11552) #diffusion
Summary:
Since their introduction, diffusion models have quickly become the prevailing approach to generative modeling in many domains. They can be interpreted as learning the gradients of a time-varying sequence of log-probability density functions. This interpretation has motivated classifier-based and classifier-free guidance as methods for post-hoc control of diffusion models. In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance. In particular, we investigate why certain types of composition fail using current techniques and present a number of solutions. We conclude that the sampler (not the model) is responsible for this failure and propose new samplers, inspired by MCMC, which enable successful compositional generation. Further, we propose an energy-based parameterization of diffusion models which enables the use of new compositional operators and more sophisticated, Metropolis-corrected samplers. Intriguingly, we find these samplers lead to notable improvements in compositional generation across a wide set of problems such as classifier-guided ImageNet modeling and compositional text-to-image generation.
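
The score-based view in the summary above can be illustrated on a toy problem: the score (gradient of the log-density) of a product of two densities is the sum of their scores, and an MCMC sampler such as unadjusted Langevin dynamics can then draw samples from the composition. The sketch is a one-dimensional, single-noise-level toy with analytic Gaussian scores; it is not the paper's sampler, and the step size, chain count, and seed are arbitrary assumptions.

```python
import math
import random

def gaussian_score(x, mean, std):
    """d/dx log N(x; mean, std^2)."""
    return (mean - x) / (std * std)

def composed_score(x):
    """Score of the product N(0, 1) * N(4, 1): the two scores simply add."""
    return gaussian_score(x, 0.0, 1.0) + gaussian_score(x, 4.0, 1.0)

def langevin_sample(score, steps=500, step_size=0.01, rng=None):
    """Unadjusted Langevin dynamics: gradient step on log-density plus noise."""
    rng = rng or random.Random()
    x = rng.gauss(0.0, 1.0)
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step_size * score(x) + math.sqrt(2.0 * step_size) * noise
    return x

rng = random.Random(0)
samples = [langevin_sample(composed_score, rng=rng) for _ in range(500)]
sample_mean = sum(samples) / len(samples)
```

For this toy the product is proportional to a Gaussian with mean 2 and variance 0.5, so `sample_mean` should land near 2. A Metropolis correction, as mentioned in the summary, would additionally need the energy (log-density) itself rather than only its gradient.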

Title: Diffusion Models in Bioinformatics: A New Wave of Deep Learning Revolution in Action. (arXiv:2302.10907v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10907
Code URL: null
Copy Paste: [[2302.10907] Diffusion Models in Bioinformatics: A New Wave of Deep Learning Revolution in Action](http://arxiv.org/abs/2302.10907) #diffusion
Summary:
Denoising diffusion models have emerged as one of the most powerful generative models in recent years. They have achieved remarkable success in many fields, such as computer vision, natural language processing (NLP), and bioinformatics. Although there are a few excellent reviews on diffusion models and their applications in computer vision and NLP, there is a lack of an overview of their applications in bioinformatics. This review aims to provide a rather thorough overview of the applications of diffusion models in bioinformatics to aid their further development in bioinformatics and computational biology. We start with an introduction of the key concepts and theoretical foundations of three cornerstone diffusion modeling frameworks (denoising diffusion probabilistic models, noise-conditioned scoring networks, and stochastic differential equations), followed by a comprehensive description of diffusion models employed in the different domains of bioinformatics, including cryo-EM data enhancement, single-cell data analysis, protein design and generation, drug and small molecule design, and protein-ligand interaction. The review is concluded with a summary of the potential new development and applications of diffusion models in bioinformatics.

Title: From paintbrush to pixel: A review of deep neural networks in AI-generated art. (arXiv:2302.10913v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.10913
Code URL: null
Copy Paste: [[2302.10913] From paintbrush to pixel: A review of deep neural networks in AI-generated art](http://arxiv.org/abs/2302.10913) #diffusion
Summary:
This paper delves into the fascinating field of AI-generated art and explores the various deep neural network architectures and models that have been utilized to create it. From the classic convolutional networks to the cutting-edge diffusion models, we examine the key players in the field. We explain the general structures and working principles of these neural networks. Then, we showcase examples of milestones, starting with the dreamy landscapes of DeepDream and moving on to the most recent developments, including Stable Diffusion and DALL-E 2, which produce mesmerizing images. A detailed comparison of these models is provided, highlighting their strengths and limitations. Thus, we examine the remarkable progress that deep neural networks have made so far in a short period of time. With a unique blend of technical explanations and insights into the current state of AI-generated art, this paper exemplifies how art and computer science interact.

Title: Aligned Diffusion Schrödinger Bridges. (arXiv:2302.11419v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2302.11419
Code URL: null
Copy Paste: [[2302.11419] Aligned Diffusion Schrödinger Bridges](http://arxiv.org/abs/2302.11419) #diffusion
Summary:
Diffusion Schrödinger bridges (DSB) have recently emerged as a powerful framework for recovering stochastic dynamics via their marginal observations at different time points. Despite numerous successful applications, existing algorithms for solving DSBs have so far failed to utilize the structure of aligned data, which naturally arises in many biological phenomena. In this paper, we propose a novel algorithmic framework that, for the first time, solves DSBs while respecting the data alignment. Our approach hinges on a combination of two decades-old ideas: The classical Schrödinger bridge theory and Doob's $h$-transform. Compared to prior methods, our approach leads to a simpler training procedure with lower variance, which we further augment with principled regularization schemes. This ultimately leads to sizeable improvements across experiments on synthetic and real data, including the tasks of rigid protein docking and temporal evolution of cellular differentiation processes.