secure

Title: A detailed review of blockchain and cryptocurrency. (arXiv:2303.06008v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.06008
Code URL: null
Copy Paste: [[2303.06008] A detailed review of blockchain and cryptocurrency](http://arxiv.org/abs/2303.06008) #secure
Summary:
Cryptocurrency is something we have all heard about recently, most likely through Bitcoin and how much its price has boomed over the past decade. These cryptocurrencies are based on blockchain, a secure datatype and a recently popular form of technology. This paper gives a detailed review of the concept of blockchain and its potential applications, elaborating especially on cryptocurrency, and it also contains a detailed case study of blockchain in Dubai.

security

Title: ICStega: Image Captioning-based Semantically Controllable Linguistic Steganography. (arXiv:2303.05830v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.05830
Code URL: null
Copy Paste: [[2303.05830] ICStega: Image Captioning-based Semantically Controllable Linguistic Steganography](http://arxiv.org/abs/2303.05830) #security
Summary:
Nowadays, social media has become the preferred communication platform for web users but has also brought security threats. Linguistic steganography hides secret data in text and sends it to the intended recipient to realize covert communication. Compared to edit-based linguistic steganography, generation-based approaches largely improve the payload capacity. However, existing methods can only generate stego text alone. Another common behavior in social media is sending semantically related image-text pairs. In this paper, we put forward a novel image captioning-based stegosystem, where the secret messages are embedded into the generated captions. Thus, the semantics of the stego text can be controlled and the secret data can be transmitted by sending semantically related image-text pairs. To balance the conflict between payload capacity and semantic preservation, we propose a new sampling method called Two-Parameter Semantic Control Sampling to cut off low-probability words. Experimental results have shown that our method can control diversity, payload capacity, security, and semantic accuracy at the same time.

privacy

Title: Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers. (arXiv:2303.05691v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05691
Code URL: null
Copy Paste: [[2303.05691] Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers](http://arxiv.org/abs/2303.05691) #privacy
Summary:
Despite the impressive performance of vision-based pose estimators, they generally fail to perform well under adverse vision conditions and often don't satisfy the privacy demands of customers. As a result, researchers have begun to study tactile sensing systems as an alternative. However, these systems suffer from noisy and ambiguous recordings. To tackle this problem, we propose a novel solution for pose estimation from ambiguous pressure data. Our method comprises a spatio-temporal vision transformer with an encoder-decoder architecture. Detailed experiments on two popular public datasets reveal that our model outperforms existing solutions in the area. Moreover, we observe that increasing the number of temporal crops in the early stages of the network positively impacts the performance while pre-training the network in a self-supervised setting using a masked auto-encoder approach also further improves the results.

protect

defense

attack

Title: Boosting Adversarial Attacks by Leveraging Decision Boundary Information. (arXiv:2303.05719v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05719
Code URL: null
Copy Paste: [[2303.05719] Boosting Adversarial Attacks by Leveraging Decision Boundary Information](http://arxiv.org/abs/2303.05719) #attack
Summary:
Due to the gap between a substitute model and a victim model, the gradient-based noise generated from a substitute model may have low transferability for a victim model since their gradients are different. Inspired by the fact that the decision boundaries of different models do not differ much, we conduct experiments and discover that the gradients of different models are more similar on the decision boundary than in the original position. Moreover, since the decision boundary in the vicinity of an input image is flat along most directions, we conjecture that the boundary gradients can help find an effective direction to cross the decision boundary of the victim models. Based on this, we propose a Boundary Fitting Attack to improve transferability. Specifically, we introduce a method to obtain a set of boundary points and leverage the gradient information of these points to update the adversarial examples. Notably, our method can be combined with existing gradient-based methods. Extensive experiments prove the effectiveness of our method, i.e., improving the success rate by 5.6% against normally trained CNNs and 14.9% against defense CNNs on average compared to state-of-the-art transfer-based attacks. Furthermore, we compare transformers with CNNs; the results indicate that transformers are more robust than CNNs. However, our method still outperforms existing methods when attacking transformers. Specifically, when using CNNs as substitute models, our method obtains an average attack success rate of 58.2%, which is 10.8% higher than other state-of-the-art transfer-based attacks.
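
The transfer setting described above can be illustrated with a minimal sketch. It is not the Boundary Fitting Attack: it uses the well-known one-step sign-gradient attack (FGSM) on a toy logistic model, and the two weight vectors standing in for the substitute and victim models are hypothetical values chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(model, x):
    """Return the hard label (0 or 1) of a logistic model given as (weights, bias)."""
    w, b = model
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

def loss_gradient_wrt_input(model, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    w, b = model
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def fgsm(model, x, y, eps):
    """One-step sign-gradient perturbation, bounded by eps in the L-infinity norm."""
    grad = loss_gradient_wrt_input(model, x, y)
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical models: similar but not identical weights (assumption for illustration).
substitute = ([2.0, -1.0], 0.0)
victim = ([1.8, -1.2], 0.1)

x, y = [0.3, 0.1], 1
# The noise is computed on the substitute only, then evaluated on the victim.
x_adv = fgsm(substitute, x, y, eps=0.4)
transferred = predict(victim, x) == y and predict(victim, x_adv) != y
```

In this toy case the perturbation transfers because the two models have nearly parallel gradients; the abstract's point is that for real networks this similarity is weaker at the original input than on the decision boundary.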

Title: TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets. (arXiv:2303.05762v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.05762
Code URL: https://github.com/chenweixin107/trojdiff
Copy Paste: [[2303.05762] TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets](http://arxiv.org/abs/2303.05762) #attack
Summary:
Diffusion models have achieved great success in a range of tasks, such as image synthesis and molecule design. As such successes hinge on large-scale training data collected from diverse sources, the trustworthiness of these collected data is hard to control or audit. In this work, we aim to explore the vulnerabilities of diffusion models under potential training data manipulations and try to answer: How hard is it to perform Trojan attacks on well-trained diffusion models? What are the adversarial targets that such Trojan attacks can achieve? To answer these questions, we propose an effective Trojan attack against diffusion models, TrojDiff, which optimizes the Trojan diffusion and generative processes during training. In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution and propose a new parameterization of the Trojan generative process that leads to an effective training objective for the attack. In addition, we consider three types of adversarial targets: the Trojaned diffusion models will always output instances belonging to a certain class from the in-domain distribution (In-D2D attack), out-of-domain distribution (Out-D2D attack), and one specific instance (D2I attack). We evaluate TrojDiff on CIFAR-10 and CelebA datasets against both DDPM and DDIM diffusion models. We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers, while the performance in benign environments is preserved. The code is available at https://github.com/chenweixin107/TrojDiff.

Title: Exploring Adversarial Attacks on Neural Networks: An Explainable Approach. (arXiv:2303.06032v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.06032
Code URL: https://github.com/justusren/exploring-adversarial-attacks-on-neural-networks
Copy Paste: [[2303.06032] Exploring Adversarial Attacks on Neural Networks: An Explainable Approach](http://arxiv.org/abs/2303.06032) #attack
Summary:
Deep Learning (DL) is being applied in various domains, especially in safety-critical applications such as autonomous driving. Consequently, it is of great significance to ensure the robustness of these methods and thus counteract uncertain behaviors caused by adversarial attacks. In this paper, we use gradient heatmaps to analyze the response characteristics of the VGG-16 model when the input images are mixed with adversarial noise and statistically similar Gaussian random noise. In particular, we compare the network response layer by layer to determine where errors occurred. Several interesting findings are derived. First, compared to Gaussian random noise, intentionally generated adversarial noise causes severe behavior deviation by distracting the area of concentration in the networks. Second, in many cases, adversarial examples only need to compromise a few intermediate blocks to mislead the final decision. Third, our experiments revealed that specific blocks are more vulnerable and easier to exploit by adversarial examples. Finally, we demonstrate that the layers $Block4_conv1$ and $Block5_conv1$ of the VGG-16 model are more susceptible to adversarial attacks. Our work could provide valuable insights into developing more reliable Deep Neural Network (DNN) models.

Title: Learning the Wrong Lessons: Inserting Trojans During Knowledge Distillation. (arXiv:2303.05593v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.05593
Code URL: null
Copy Paste: [[2303.05593] Learning the Wrong Lessons: Inserting Trojans During Knowledge Distillation](http://arxiv.org/abs/2303.05593) #attack
Summary:
In recent years, knowledge distillation has become a cornerstone of efficiently deployed machine learning, with labs and industries using knowledge distillation to train models that are inexpensive and resource-optimized. Trojan attacks have contemporaneously gained significant prominence, revealing fundamental vulnerabilities in deep learning models. Given the widespread use of knowledge distillation, in this work we seek to exploit the unlabelled data knowledge distillation process to embed Trojans in a student model without introducing conspicuous behavior in the teacher. We ultimately devise a Trojan attack that effectively reduces student accuracy, does not alter teacher performance, and is efficiently constructible in practice.

robust

Title: Semantic-Preserving Augmentation for Robust Image-Text Retrieval. (arXiv:2303.05692v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05692
Code URL: null
Copy Paste: [[2303.05692] Semantic-Preserving Augmentation for Robust Image-Text Retrieval](http://arxiv.org/abs/2303.05692) #robust
Summary:
Image text retrieval is a task to search for the proper textual descriptions of the visual world and vice versa. One challenge of this task is the vulnerability to input image and text corruptions. Such corruptions are often unobserved during the training, and degrade the retrieval model decision quality substantially. In this paper, we propose a novel image text retrieval technique, referred to as robust visual semantic embedding (RVSE), which consists of novel image-based and text-based augmentation techniques called semantic preserving augmentation for image (SPAugI) and text (SPAugT). Since SPAugI and SPAugT change the original data in a way that its semantic information is preserved, we enforce the feature extractors to generate semantic aware embedding vectors regardless of the corruption, improving the model robustness significantly. From extensive experiments using benchmark datasets, we show that RVSE outperforms conventional retrieval schemes in terms of image-text retrieval performance.

Title: Mode-locking Theory for Long-Range Interaction in Artificial Neural Networks. (arXiv:2303.05695v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05695
Code URL: null
Copy Paste: [[2303.05695] Mode-locking Theory for Long-Range Interaction in Artificial Neural Networks](http://arxiv.org/abs/2303.05695) #robust
Summary:
Visual long-range interaction refers to modeling dependencies between distant feature points or blocks within an image, which can significantly enhance the model's robustness. Both CNN and Transformer can establish long-range interactions through layering and patch calculations. However, the underlying mechanism of long-range interaction in visual space remains unclear. We propose the mode-locking theory as the underlying mechanism, which constrains the phase and wavelength relationship between waves to achieve mode-locked interference waveform. We verify this theory through simulation experiments and demonstrate the mode-locking pattern in real-world scene models. Our proposed theory of long-range interaction provides a comprehensive understanding of the mechanism behind this phenomenon in artificial neural networks. This theory can inspire the integration of the mode-locking pattern into models to enhance their robustness.

Title: Generative Model Based Noise Robust Training for Unsupervised Domain Adaptation. (arXiv:2303.05734v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05734
Code URL: null
Copy Paste: [[2303.05734] Generative Model Based Noise Robust Training for Unsupervised Domain Adaptation](http://arxiv.org/abs/2303.05734) #robust
Summary:
Target domain pseudo-labelling has shown effectiveness in unsupervised domain adaptation (UDA). However, pseudo-labels of unlabeled target domain data are inevitably noisy due to the distribution shift between source and target domains. This paper proposes a Generative model-based Noise-Robust Training method (GeNRT), which eliminates domain shift while mitigating label noise. GeNRT incorporates a Distribution-based Class-wise Feature Augmentation (D-CFA) and a Generative-Discriminative classifier Consistency (GDC), both based on the class-wise target distributions modelled by generative models. D-CFA minimizes the domain gap by augmenting the source data with distribution-sampled target features, and trains a noise-robust discriminative classifier by using target domain knowledge from the generative models. GDC regards all the class-wise generative models as generative classifiers and enforces a consistency regularization between the generative and discriminative classifiers. It exploits an ensemble of target knowledge from all the generative models to train a noise-robust discriminative classifier and eventually gets theoretically linked to the Ben-David domain adaptation theorem for reducing the domain gap. Extensive experiments on Office-Home, PACS, and Digit-Five show that our GeNRT achieves comparable performance to state-of-the-art methods under single-source and multi-source UDA settings.

Title: Boosting Semi-Supervised Few-Shot Object Detection with SoftER Teacher. (arXiv:2303.05739v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05739
Code URL: https://github.com/lexisnexis-risk-open-source/ledetection
Copy Paste: [[2303.05739] Boosting Semi-Supervised Few-Shot Object Detection with SoftER Teacher](http://arxiv.org/abs/2303.05739) #robust
Summary:
Few-shot object detection is an emerging problem aimed at detecting novel concepts from few exemplars. Existing approaches to few-shot detection assume abundant base labels to adapt to novel objects. This paper explores the task of semi-supervised few-shot detection by considering a realistic scenario which lacks abundant labels for both base and novel objects. Motivated by this unique problem, we introduce SoftER Teacher, a robust detector combining the advantages of pseudo-labeling with representation learning on region proposals. SoftER Teacher harnesses unlabeled data to jointly optimize for semi-supervised few-shot detection without explicitly relying on abundant base labels. Extensive experiments show that SoftER Teacher matches the novel class performance of a strong supervised detector using only 10% of base labels. Our work also sheds insight into a previously unknown relationship between semi-supervised and few-shot detection to suggest that a stronger semi-supervised detector leads to a more label-efficient few-shot detector. Code and models are available at https://github.com/lexisnexis-risk-open-source/ledetection

Title: Automatic Detection and Rectification of Paper Receipts on Smartphones. (arXiv:2303.05763v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05763
Code URL: null
Copy Paste: [[2303.05763] Automatic Detection and Rectification of Paper Receipts on Smartphones](http://arxiv.org/abs/2303.05763) #robust
Summary:
We describe the development of a real-time smartphone app that allows the user to digitize paper receipts in a novel way by "waving" their phone over the receipts and letting the app automatically detect and rectify the receipts for subsequent text recognition.

We show that traditional computer vision algorithms for edge and corner detection do not robustly detect the non-linear and discontinuous edges and corners of a typical paper receipt in real-world settings. This is particularly the case when the colors of the receipt and background are similar, or where other interfering rectangular objects are present. Inaccurate detection of a receipt's corner positions then results in distorted images when using an affine projective transformation to rectify the perspective.

We propose an innovative solution to receipt corner detection by treating each of the four corners as a unique "object", and training a Single Shot Detection MobileNet object detection model. We use a small amount of real data and a large amount of automatically generated synthetic data that is designed to be similar to real-world imaging scenarios.

We show that our proposed method robustly detects the four corners of a receipt, giving a receipt detection accuracy of 85.3% on real-world data, compared to only 36.9% with a traditional edge detection-based approach. Our method works even when the color of the receipt is virtually indistinguishable from the background.

Moreover, our method is trained to detect only the corners of the central target receipt and implicitly learns to ignore other receipts, and other rectangular objects. Including synthetic data allows us to train an even better model. These factors are a major advantage over traditional edge detection-based approaches, allowing us to deliver a much better experience to the user.
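
As a rough illustration of the rectification step that follows corner detection, the sketch below estimates a four-point projective transform (homography) that maps detected corners onto an upright rectangle. It is a minimal, assumption-laden example: the corner coordinates and output size are hypothetical, and the paper's app may compute or apply the transform differently.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography(src, dst):
    """3x3 projective transform mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, with the last entry fixed to 1.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def warp_point(h, x, y):
    """Apply the homography to one point, including the perspective divide."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

# Hypothetical detected corners (top-left, top-right, bottom-right, bottom-left).
detected = [(120, 80), (420, 110), (460, 700), (90, 650)]
# Hypothetical upright target rectangle of 300 x 600 pixels.
target = [(0, 0), (300, 0), (300, 600), (0, 600)]
H = homography(detected, target)
rectified = [warp_point(H, x, y) for x, y in detected]
```

Because all eight unknowns are determined by the four corners, an error in any single detected corner distorts the whole rectified image, which is why the abstract stresses accurate corner positions.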

Title: Self-NeRF: A Self-Training Pipeline for Few-Shot Neural Radiance Fields. (arXiv:2303.05775v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05775
Code URL: null
Copy Paste: [[2303.05775] Self-NeRF: A Self-Training Pipeline for Few-Shot Neural Radiance Fields](http://arxiv.org/abs/2303.05775) #robust
Summary:
Recently, Neural Radiance Fields (NeRF) have emerged as a potent method for synthesizing novel views from a dense set of images. Despite its impressive performance, NeRF is plagued by its necessity for numerous calibrated views and its accuracy diminishes significantly in a few-shot setting. To address this challenge, we propose Self-NeRF, a self-evolved NeRF that iteratively refines the radiance fields with very few input views, without incorporating additional priors. Basically, we train our model under the supervision of reference and unseen views simultaneously in an iterative procedure. In each iteration, we label unseen views with the predicted colors or warped pixels generated by the model from the preceding iteration. However, these expanded pseudo-views are afflicted by imprecision in color and warping artifacts, which degrades the performance of NeRF. To alleviate this issue, we construct an uncertainty-aware NeRF with specialized embeddings. Some techniques such as cone entropy regularization are further utilized to leverage the pseudo-views in the most efficient manner. Through experiments under various settings, we verified that our Self-NeRF is robust to input with uncertainty and surpasses existing methods when trained on limited training data.

Title: Marginalia and machine learning: Handwritten text recognition for Marginalia Collections. (arXiv:2303.05929v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05929
Code URL: https://github.com/ektavats/project-marginalia
Copy Paste: [[2303.05929] Marginalia and machine learning: Handwritten text recognition for Marginalia Collections](http://arxiv.org/abs/2303.05929) #robust
Summary:
The pressing need for digitization of historical document collections has led to a strong interest in designing computerised image processing methods for automatic handwritten text recognition (HTR). Handwritten text possesses high variability due to different writing styles, languages and scripts. Training an accurate and robust HTR system calls for data-efficient approaches due to the unavailability of sufficient amounts of annotated multi-writer text. A case study on an ongoing project "Marginalia and Machine Learning" is presented here that focuses on automatic detection and recognition of handwritten marginalia texts, i.e., text written in margins or handwritten notes. A Faster R-CNN network is used for detection of marginalia and AttentionHTR is used for word recognition. The data comes from early book collections (printed) found in the Uppsala University Library, with handwritten marginalia texts. Source code and pretrained models are available at https://github.com/ektavats/Project-Marginalia.

Title: Score-Based Generative Models for Medical Image Segmentation using Signed Distance Functions. (arXiv:2303.05966v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05966
Code URL: null
Copy Paste: [[2303.05966] Score-Based Generative Models for Medical Image Segmentation using Signed Distance Functions](http://arxiv.org/abs/2303.05966) #robust
Summary:
Medical image segmentation is a crucial task that relies on the ability to accurately identify and isolate regions of interest in images. Generative approaches make it possible to capture the statistical properties of segmentation masks that are dependent on the respective medical images. In this work we propose a conditional score-based generative modeling framework that leverages the signed distance function to represent an implicit and smoother distribution of segmentation masks. The score function of the conditional distribution of segmentation masks is learned in a conditional denoising process, which can be effectively used to generate accurate segmentation masks. Moreover, uncertainty maps can be generated, which can aid in further analysis and thus enhance the predictive robustness. We qualitatively and quantitatively illustrate competitive performance of the proposed method on a public nuclei and gland segmentation data set, highlighting its potential utility in medical image segmentation applications.

Title: Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception. (arXiv:2303.05970v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05970
Code URL: null
Copy Paste: [[2303.05970] Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception](http://arxiv.org/abs/2303.05970) #robust
Summary:
Long-term temporal fusion is a crucial but often overlooked technique in camera-based Bird's-Eye-View (BEV) 3D perception. Existing methods mostly fuse frames in a parallel manner. While parallel fusion can benefit from long-term information, it suffers from increasing computational and memory overheads as the fusion window size grows. Alternatively, BEVFormer adopts a recurrent fusion pipeline so that history information can be efficiently integrated, yet it fails to benefit from longer temporal frames. In this paper, we explore an embarrassingly simple long-term recurrent fusion strategy built upon the LSS-based methods and find it already able to enjoy the merits from both sides, i.e., rich long-term information and efficient fusion pipeline. A temporal embedding module is further proposed to improve the model's robustness against occasionally missed frames in practical scenarios. We name this simple but effective fusing pipeline VideoBEV. Experimental results on the nuScenes benchmark show that VideoBEV obtains leading performance on various camera-based 3D perception tasks, including object detection (55.4% mAP and 62.9% NDS), segmentation (48.6% vehicle mIoU), tracking (54.8% AMOTA), and motion prediction (0.80m minADE and 0.463 EPA). Code will be available.

Title: StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces. (arXiv:2303.06146v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.06146
Code URL: https://github.com/williamyang1991/styleganex
Copy Paste: [[2303.06146] StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces](http://arxiv.org/abs/2303.06146) #robust
Summary:
Recent advances in face manipulation using StyleGAN have produced impressive results. However, StyleGAN is inherently limited to cropped aligned faces at a fixed image resolution it is pre-trained on. In this paper, we propose a simple and effective solution to this limitation by using dilated convolutions to rescale the receptive fields of shallow layers in StyleGAN, without altering any model parameters. This allows fixed-size small features at shallow layers to be extended into larger ones that can accommodate variable resolutions, making them more robust in characterizing unaligned faces. To enable real face inversion and manipulation, we introduce a corresponding encoder that provides the first-layer feature of the extended StyleGAN in addition to the latent style code. We validate the effectiveness of our method using unaligned face inputs of various resolutions in a diverse set of face manipulation tasks, including facial attribute editing, super-resolution, sketch/mask-to-face translation, and face toonification.

Title: Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss. (arXiv:2303.05958v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.05958
Code URL: null
Copy Paste: [[2303.05958] Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss](http://arxiv.org/abs/2303.05958) #robust
Summary:
This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. Soft distillation is another popular KD method that distills the output logits of the teacher model. Due to the nature of RNN-T alignments, applying soft distillation between RNN-T architectures having different posterior distributions is challenging. In addition, bad teachers having high word-error-rate (WER) reduce the efficacy of KD. We investigate how to effectively distill knowledge from variable quality ASR teachers, which has not been studied before to the best of our knowledge. We show that a sequence-level KD, full-sum distillation, outperforms other distillation methods for RNN-T models, especially for bad teachers. We also propose a variant of full-sum distillation that distills the sequence discriminative knowledge of the teacher leading to further improvement in WER. We conduct experiments on public datasets namely SpeechStew and LibriSpeech, and on in-house production data.
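
For readers unfamiliar with soft distillation, here is a minimal sketch of the generic idea: the student is trained to match the teacher's softened output distribution through a KL-divergence term. It treats a single categorical output only. It does not model RNN-T alignments and is not the full-sum distillation studied in the paper; the logits and the temperature are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between the two softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
good_student = [3.5, 1.2, 0.4]
poor_student = [0.5, 4.0, 1.0]

loss_good = soft_distillation_loss(teacher, good_student)
loss_poor = soft_distillation_loss(teacher, poor_student)
```

A student whose logits rank the tokens like the teacher's gets a much smaller loss than one that prefers a different token. The same property means a poor teacher passes its errors on, which is the failure mode the abstract attributes to high-WER teachers.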

Title: On the Soundness of XAI in Prognostics and Health Management (PHM). (arXiv:2303.05517v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.05517
Code URL: null
Copy Paste: [[2303.05517] On the Soundness of XAI in Prognostics and Health Management (PHM)](http://arxiv.org/abs/2303.05517) #robust
Summary:
The aim of Predictive Maintenance, within the field of Prognostics and Health Management (PHM), is to identify and anticipate potential issues in the equipment before these become critical. The main challenge to be addressed is to assess the amount of time a piece of equipment will function effectively before it fails, which is known as Remaining Useful Life (RUL). Deep Learning (DL) models, such as Deep Convolutional Neural Networks (DCNN) and Long Short-Term Memory (LSTM) networks, have been widely adopted to address the task, with great success. However, it is well known that black-box models of this kind are opaque decision systems, and it may be hard to explain their outputs to stakeholders (experts in the industrial equipment). Due to the large number of parameters that determine the behavior of these complex models, understanding the reasoning behind the predictions is challenging. This work presents a critical and comparative review of a number of XAI methods applied to a time series regression model for PM. The aim is to explore XAI methods within time series regression, which have been less studied than those for time series classification. The model used during the experimentation is a DCNN trained to predict the RUL of an aircraft engine. The methods are reviewed and compared using a set of metrics that quantify a number of desirable properties that any XAI method should fulfill. The results show that GRAD-CAM is the most robust method, and that the best layer is not the bottom one, as is commonly seen within the context of Image Processing.

Title: Variance-aware robust reinforcement learning with linear function approximation with heavy-tailed rewards. (arXiv:2303.05606v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.05606
Code URL: null
Copy Paste: [[2303.05606] Variance-aware robust reinforcement learning with linear function approximation with heavy-tailed rewards](http://arxiv.org/abs/2303.05606) #robust
Summary:
This paper presents two algorithms, AdaOFUL and VARA, for online sequential decision-making in the presence of heavy-tailed rewards with only finite variances. For linear stochastic bandits, we address the issue of heavy-tailed rewards by modifying the adaptive Huber regression and proposing AdaOFUL. AdaOFUL achieves a state-of-the-art regret bound of $\widetilde{\mathcal{O}}\big(d\big(\sum_{t=1}^T \nu_{t}^2\big)^{1/2}+d\big)$ as if the rewards were uniformly bounded, where $\nu_{t}^2$ is the observed conditional variance of the reward at round $t$, $d$ is the feature dimension, and $\widetilde{\mathcal{O}}(\cdot)$ hides logarithmic dependence. Building upon AdaOFUL, we propose VARA for linear MDPs, which achieves a tighter variance-aware regret bound of $\widetilde{\mathcal{O}}(d\sqrt{H\mathcal{G}^*K})$. Here, $H$ is the length of episodes, $K$ is the number of episodes, and $\mathcal{G}^*$ is a smaller instance-dependent quantity that can be bounded by other instance-dependent quantities when additional structural conditions on the MDP are satisfied. Our regret bound is superior to the current state-of-the-art bounds in three ways: (1) it depends on a tighter instance-dependent quantity and has optimal dependence on $d$ and $H$, (2) we can obtain further instance-dependent bounds of $\mathcal{G}^*$ under additional structural conditions on the MDP, and (3) our regret bound is valid even when rewards have only finite variances, achieving a level of generality unmatched by previous works. Overall, our modified adaptive Huber regression algorithm may serve as a useful building block in the design of algorithms for online problems with heavy-tailed rewards.
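
To show why Huber-type losses help with heavy-tailed rewards, the following minimal sketch implements the standard Huber loss with a fixed threshold and uses it to estimate a location parameter from samples containing one extreme value. It is not AdaOFUL or the paper's adaptive variant; the threshold `tau`, the step size, and the sample values are hypothetical.

```python
def huber_loss(residual, tau):
    """Quadratic for small residuals, linear beyond the threshold tau."""
    a = abs(residual)
    return 0.5 * a * a if a <= tau else tau * a - 0.5 * tau * tau

def huber_grad(residual, tau):
    """Derivative of the Huber loss: the residual clipped to [-tau, tau]."""
    return max(-tau, min(tau, residual))

def huber_location(samples, tau, steps=200, lr=0.5):
    """Minimize the average Huber loss over a scalar location by gradient descent."""
    mu = sorted(samples)[len(samples) // 2]  # start from the median
    for _ in range(steps):
        grad = sum(huber_grad(mu - x, tau) for x in samples) / len(samples)
        mu -= lr * grad
    return mu

# Hypothetical rewards: four well-behaved observations and one heavy-tailed draw.
rewards = [0.9, 1.0, 1.1, 1.0, 50.0]
plain_mean = sum(rewards) / len(rewards)
robust_estimate = huber_location(rewards, tau=1.0)
```

Because the gradient is clipped, the extreme sample can pull the estimate by at most `tau` per observation, so the robust estimate stays near the bulk of the data while the plain mean is dragged far away.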

Title: Distributionally Robust Optimization with Probabilistic Group. (arXiv:2303.05809v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.05809
Code URL: null
Copy Paste: [[2303.05809] Distributionally Robust Optimization with Probabilistic Group](http://arxiv.org/abs/2303.05809) #robust
Summary:
Modern machine learning models may be susceptible to learning spurious correlations that hold on average but not for the atypical group of samples. To address the problem, previous approaches minimize the empirical worst-group risk. Despite the promise, they often assume that each sample belongs to one and only one group, which does not allow expressing the uncertainty in group labeling. In this paper, we propose a novel framework PG-DRO, which explores the idea of probabilistic group membership for distributionally robust optimization. Key to our framework, we consider soft group membership instead of hard group annotations. The group probabilities can be flexibly generated using either supervised learning or zero-shot approaches. Our framework accommodates samples with group membership ambiguity, offering stronger flexibility and generality than the prior art. We comprehensively evaluate PG-DRO on both image classification and natural language processing benchmarks, establishing superior performance.
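
One way to picture soft group membership is to weight each sample's loss by its probability of belonging to a group and then take the worst group. The sketch below does exactly that. The normalisation by total group mass is an assumption made for illustration, since the abstract does not state PG-DRO's objective, and all losses and probabilities are hypothetical.

```python
def soft_group_risks(losses, group_probs):
    """Per-group risk where sample i contributes to group g with weight group_probs[i][g]."""
    n_groups = len(group_probs[0])
    risks = []
    for g in range(n_groups):
        mass = sum(p[g] for p in group_probs)
        weighted = sum(p[g] * loss for p, loss in zip(group_probs, losses))
        risks.append(weighted / mass if mass > 0 else 0.0)
    return risks

def worst_group_risk(losses, group_probs):
    """Largest per-group risk, the quantity a worst-group objective minimizes."""
    return max(soft_group_risks(losses, group_probs))

# Hypothetical per-sample losses.
losses = [0.2, 0.4, 1.0, 2.0]

# Hard annotations are the special case of one-hot probabilities.
hard = [[1, 0], [1, 0], [0, 1], [0, 1]]
# Soft membership: the two middle samples are ambiguous between the groups.
soft = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]

hard_risks = soft_group_risks(losses, hard)
soft_risks = soft_group_risks(losses, soft)
worst = worst_group_risk(losses, soft)
```

With one-hot probabilities the function reduces to the usual per-group average, so hard group annotations are recovered as a special case of the soft formulation.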

Title: Forecasting Solar Irradiance without Direct Observation: An Empirical Analysis. (arXiv:2303.06010v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.06010
Code URL: null
Copy Paste: [[2303.06010] Forecasting Solar Irradiance without Direct Observation: An Empirical Analysis](http://arxiv.org/abs/2303.06010) #robust
Summary:
As the use of solar power increases, having accurate and timely forecasters will be essential for smooth grid operation. There are many proposed methods for forecasting solar irradiance / solar power production. However, many of these methods formulate the problem as a time-series, relying on near real-time access to observations at the location of interest to generate forecasts. This requires both access to a real-time stream of data and enough historical observations for these methods to be deployed. In this paper, we conduct a thorough analysis of effective ways to formulate the forecasting problem, comparing classical machine learning approaches to state-of-the-art deep learning. Using data from 20 locations distributed throughout the UK and commercially available weather data, we show that it is possible to build systems that do not require access to this data. Leveraging weather observations and measurements from other locations, we show it is possible to create models capable of accurately forecasting solar irradiance at new locations. We compare both satellite and ground observations (e.g. temperature, pressure) of weather data. This could facilitate use planning and optimisation for both newly deployed solar farms and domestic installations from the moment they come online. Additionally, we show that training a single global model for multiple locations can produce a more robust model with more consistent and accurate results across locations.

Title: Ignorance is Bliss: Robust Control via Information Gating. (arXiv:2303.06121v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.06121
Code URL: null
Copy Paste: [[2303.06121] Ignorance is Bliss: Robust Control via Information Gating](http://arxiv.org/abs/2303.06121) #robust
Summary:
Informational parsimony -- i.e., using the minimal information required for a task -- provides a useful inductive bias for learning representations that achieve better generalization by being robust to noise and spurious correlations. We propose information gating in the pixel space as a way to learn more parsimonious representations. Information gating works by learning masks that capture only the minimal information required to solve a given task. Intuitively, our models learn to identify which visual cues actually matter for a given task. We gate information using a differentiable parameterization of the signal-to-noise ratio, which can be applied to arbitrary values in a network, e.g., masking out pixels at the input layer. We apply our approach, which we call InfoGating, to various objectives such as: multi-step forward and inverse dynamics, Q-learning, behavior cloning, and standard self-supervised tasks. Our experiments show that learning to identify and use minimal information can improve generalization in downstream tasks -- e.g., policies based on info-gated images are considerably more robust to distracting/irrelevant visual features.

biometric

steal

extraction

Title: Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection. (arXiv:2303.05892v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05892
Code URL: null
Copy Paste: [[2303.05892] Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection](http://arxiv.org/abs/2303.05892) #extraction
Summary:
Open-vocabulary object detection aims to provide object detectors trained on a fixed set of object categories with the generalizability to detect objects described by arbitrary text queries. Previous methods adopt knowledge distillation to extract knowledge from Pretrained Vision-and-Language Models (PVLMs) and transfer it to detectors. However, due to the non-adaptive proposal cropping and single-level feature mimicking processes, they suffer from information destruction during knowledge extraction and inefficient knowledge transfer. To remedy these limitations, we propose an Object-Aware Distillation Pyramid (OADP) framework, including an Object-Aware Knowledge Extraction (OAKE) module and a Distillation Pyramid (DP) mechanism. When extracting object knowledge from PVLMs, the former adaptively transforms object proposals and adopts object-aware mask attention to obtain precise and complete knowledge of objects. The latter introduces global and block distillation for more comprehensive knowledge transfer to compensate for the missing relation information in object distillation. Extensive experiments show that our method achieves significant improvement compared to current methods. Especially on the MS-COCO dataset, our OADP framework reaches $35.6$ mAP$^{\text{N}}_{50}$, surpassing the current state-of-the-art method by $3.3$ mAP$^{\text{N}}_{50}$. Code is released at https://github.com/LutingWang/OADP.

Title: ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction. (arXiv:2303.05938v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05938
Code URL: https://github.com/zhengdiyu/arbitrary-hands-3d-reconstruction
Copy Paste: [[2303.05938] ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction](http://arxiv.org/abs/2303.05938) #extraction
Summary:
Reconstructing two hands from monocular RGB images is challenging due to frequent occlusion and mutual confusion. Existing methods mainly learn an entangled representation to encode two interacting hands, which are incredibly fragile to impaired interaction, such as truncated hands, separate hands, or external occlusion. This paper presents ACR (Attention Collaboration-based Regressor), which makes the first attempt to reconstruct hands in arbitrary scenarios. To achieve this, ACR explicitly mitigates interdependencies between hands and between parts by leveraging center and part-based attention for feature extraction. However, reducing interdependence helps release the input constraint while weakening the mutual reasoning about reconstructing the interacting hands. Thus, based on center attention, ACR also learns a cross-hand prior that handles the interacting hands better. We evaluate our method on various types of hand reconstruction datasets. Our method significantly outperforms the best interacting-hand approaches on the InterHand2.6M dataset while yielding comparable performance with the state-of-the-art single-hand methods on the FreiHand dataset. More qualitative results on in-the-wild and hand-object interaction datasets and web images/videos further demonstrate the effectiveness of our approach for arbitrary hand reconstruction. Our code is available at https://github.com/ZhengdiYu/Arbitrary-Hands-3D-Reconstruction.

membership infer

federate

Title: An Evaluation of Non-Contrastive Self-Supervised Learning for Federated Medical Image Analysis. (arXiv:2303.05556v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05556
Code URL: null
Copy Paste: [[2303.05556] An Evaluation of Non-Contrastive Self-Supervised Learning for Federated Medical Image Analysis](http://arxiv.org/abs/2303.05556) #federate
Summary:
Privacy and annotation bottlenecks are two major issues that profoundly affect the practicality of machine learning-based medical image analysis. Although significant progress has been made in these areas, these issues are not yet fully resolved. In this paper, we seek to tackle these concerns head-on and systematically explore the applicability of non-contrastive self-supervised learning (SSL) algorithms under federated learning (FL) simulations for medical image analysis. We conduct thorough experimentation of recently proposed state-of-the-art non-contrastive frameworks under standard FL setups. With the SoTA Contrastive Learning algorithm, SimCLR as our comparative baseline, we benchmark the performances of our 4 chosen non-contrastive algorithms under non-i.i.d. data conditions and with a varying number of clients. We present a holistic evaluation of these techniques on 6 standardized medical imaging datasets. We further analyse different trends inferred from the findings of our research, with the aim to find directions for further research based on ours. To the best of our knowledge, ours is the first to perform such a thorough analysis of federated self-supervised learning for medical imaging. All of our source code will be made public upon acceptance of the paper.

Title: Vertical Federated Graph Neural Network for Recommender System. (arXiv:2303.05786v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.05786
Code URL: null
Copy Paste: [[2303.05786] Vertical Federated Graph Neural Network for Recommender System](http://arxiv.org/abs/2303.05786) #federate
Summary:
Conventional recommender systems are required to train the recommendation model using a centralized database. However, due to data privacy concerns, this is often impractical when multiple parties are involved in recommender system training. Federated learning appears as an excellent solution to the data isolation and privacy problem. Recently, the graph neural network (GNN) has become a promising approach for federated recommender systems. However, a key challenge is to conduct embedding propagation while preserving the privacy of the graph structure. Few studies have been conducted on the federated GNN-based recommender system. Our study proposes the first vertical federated GNN-based recommender system, called VerFedGNN. We design a framework to transmit: (i) the summation of neighbor embeddings using random projection, and (ii) gradients of public parameters perturbed by a ternary quantization mechanism. Empirical studies show that VerFedGNN has competitive prediction accuracy with existing privacy-preserving GNN frameworks while enhancing privacy protection for users' interaction information.
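
The abstract does not specify its ternary quantization mechanism. As a rough illustration of the general idea only, the sketch below uses a common stochastic scheme in which each gradient entry is mapped to one of three values, `-s`, `0`, or `+s`, with `s` the largest absolute entry, chosen so that the quantized gradient equals the original in expectation. The function name `ternary_quantize` and this particular scheme are assumptions, not VerFedGNN's actual procedure.

```python
import numpy as np

def ternary_quantize(grad, rng):
    """Stochastically map each entry of grad to {-s, 0, +s}, s = max |grad|.

    Entry g is kept (as sign(g) * s) with probability |g| / s and zeroed
    otherwise, so the expected value of the output equals grad.
    """
    grad = np.asarray(grad, dtype=float)
    scale = np.abs(grad).max()
    if scale == 0.0:
        return np.zeros_like(grad)
    keep = rng.random(grad.shape) < np.abs(grad) / scale
    return scale * np.sign(grad) * keep

rng = np.random.default_rng(0)
grad = np.array([0.5, -2.0, 0.0, 1.0])
quantized = ternary_quantize(grad, rng)
```

Only the scale and one of three symbols per entry need to be transmitted, and the individual gradient magnitudes are hidden behind the random rounding.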

Title: On the Fusion Strategies for Federated Decision Making. (arXiv:2303.06109v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.06109
Code URL: null
Copy Paste: [[2303.06109] On the Fusion Strategies for Federated Decision Making](http://arxiv.org/abs/2303.06109) #federate
Summary:
We consider the problem of information aggregation in federated decision making, where a group of agents collaborate to infer the underlying state of nature without sharing their private data with the central processor or each other. We analyze the non-Bayesian social learning strategy in which agents incorporate their individual observations into their opinions (i.e., soft-decisions) with Bayes rule, and the central processor aggregates these opinions by arithmetic or geometric averaging. Building on our previous work, we establish that both pooling strategies result in asymptotic normality characterization of the system, which, for instance, can be utilized in order to give approximate expressions for the error probability. We verify the theoretical findings with simulations and compare both strategies.
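
The two fusion rules named above, arithmetic and geometric averaging of the agents' opinions, can be sketched as follows. This is a minimal illustration assuming each opinion is a probability vector over a finite set of hypotheses with strictly positive entries. The function names and the toy beliefs are hypothetical, and the agents' Bayesian update step is not shown.

```python
import numpy as np

def arithmetic_pool(beliefs):
    """Entrywise average of the agents' belief vectors."""
    beliefs = np.asarray(beliefs, dtype=float)
    return beliefs.mean(axis=0)

def geometric_pool(beliefs):
    """Average the log-beliefs, exponentiate, then renormalize to sum to 1."""
    beliefs = np.asarray(beliefs, dtype=float)
    log_mean = np.log(beliefs).mean(axis=0)
    unnormalized = np.exp(log_mean - log_mean.max())  # shift for numerical stability
    return unnormalized / unnormalized.sum()

# Three agents, two hypotheses (hypothetical numbers).
beliefs = [[0.9, 0.1],
           [0.6, 0.4],
           [0.2, 0.8]]
arith = arithmetic_pool(beliefs)  # approximately [0.567, 0.433]
geom = geometric_pool(beliefs)    # [0.6, 0.4]
```

Geometric pooling is more sensitive to any single agent assigning a hypothesis a probability near zero, while arithmetic pooling dilutes such an opinion among the others.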

fair

Title: Fairness-enhancing deep learning for ride-hailing demand prediction. (arXiv:2303.05698v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.05698
Code URL: null
Copy Paste: [[2303.05698] Fairness-enhancing deep learning for ride-hailing demand prediction](http://arxiv.org/abs/2303.05698) #fair
Summary:
Short-term demand forecasting for on-demand ride-hailing services is one of the fundamental issues in intelligent transportation systems. However, previous travel demand forecasting research predominantly focused on improving prediction accuracy, ignoring fairness issues such as systematic underestimations of travel demand in disadvantaged neighborhoods. This study investigates how to measure, evaluate, and enhance prediction fairness between disadvantaged and privileged communities in spatial-temporal demand forecasting of ride-hailing services. A two-pronged approach is taken to reduce the demand prediction bias. First, we develop a novel deep learning model architecture, named socially aware neural network (SA-Net), to integrate the socio-demographics and ridership information for fair demand prediction through an innovative socially-aware convolution operation. Second, we propose a bias-mitigation regularization method to mitigate the mean percentage prediction error gap between different groups. The experimental results, validated on the real-world Chicago Transportation Network Company (TNC) data, show that the de-biasing SA-Net can achieve better predictive performance in both prediction accuracy and fairness. Specifically, the SA-Net improves prediction accuracy for both the disadvantaged and privileged groups compared with the state-of-the-art models. When coupled with the bias mitigation regularization method, the de-biasing SA-Net effectively bridges the mean percentage prediction error gap between the disadvantaged and privileged groups, and also protects the disadvantaged regions against systematic underestimation of TNC demand. Our proposed de-biasing method can be adopted in many existing short-term travel demand estimation models, and can be utilized for various other spatial-temporal prediction tasks such as crime incident prediction.
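
The bias-mitigation idea, penalizing the gap in mean percentage error between two groups, can be illustrated with a small sketch. The exact regularizer is not stated in the abstract, so the absolute-gap penalty, the squared-error base loss, the weight `lam`, and the function names below are all assumptions. The sketch also assumes strictly positive true demand so the percentage error is defined.

```python
import numpy as np

def mean_percentage_error(y_true, y_pred):
    """Signed mean of (prediction - truth) / truth; negative means underestimation."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_pred - y_true) / y_true)

def debiased_loss(y_true, y_pred, is_disadvantaged, lam=1.0):
    """Mean squared error plus lam times the between-group gap in mean percentage error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.asarray(is_disadvantaged, dtype=bool)
    mse = np.mean((y_pred - y_true) ** 2)
    gap = abs(mean_percentage_error(y_true[mask], y_pred[mask])
              - mean_percentage_error(y_true[~mask], y_pred[~mask]))
    return mse + lam * gap

# Hypothetical demand for four zones; the first two are disadvantaged.
y_true = [10.0, 20.0, 10.0, 20.0]
y_pred = [8.0, 16.0, 11.0, 22.0]
is_disadvantaged = [True, True, False, False]
loss = debiased_loss(y_true, y_pred, is_disadvantaged, lam=1.0)
```

In this toy example the disadvantaged zones are underestimated by 20% and the privileged zones overestimated by 10%, so the penalty term contributes a gap of 0.3 on top of the squared error of 6.25.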

interpretability

explainability

watermark

diffusion

Title: EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models. (arXiv:2303.05656v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.05656
Code URL: https://github.com/sczzz3/ehrdiff
Copy Paste: [[2303.05656] EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models](http://arxiv.org/abs/2303.05656) #diffusion
Summary:
Electronic health records (EHR) contain vast biomedical knowledge and are rich resources for developing precise medicine systems. However, due to privacy concerns, there are limited high-quality EHR data accessible to researchers, hindering the advancement of methodologies. Recent research has explored using generative modelling methods to synthesize realistic EHR data, and most proposed methods are based on the generative adversarial network (GAN) and its variants for EHR synthesis. Although GAN-style methods achieved state-of-the-art performance in generating high-quality EHR data, such methods are hard to train and prone to mode collapse. Diffusion models are recently proposed generative modelling methods and set cutting-edge performance in image generation. The performance of diffusion models in realistic EHR synthesis is rarely explored. In this work, we explore whether the superior performance of diffusion models can translate to the domain of EHR synthesis and propose a novel EHR synthesis method named EHRDiff. Through comprehensive experiments, EHRDiff achieves new state-of-the-art performance for the quality of synthetic EHR data and, at the same time, can better protect private information in real training EHRs.

Title: Fast Diffusion Sampler for Inverse Problems by Geometric Decomposition. (arXiv:2303.05754v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.05754
Code URL: null
Copy Paste: [[2303.05754] Fast Diffusion Sampler for Inverse Problems by Geometric Decomposition](http://arxiv.org/abs/2303.05754) #diffusion
Summary:
Diffusion models have shown exceptional performance in solving inverse problems. However, one major limitation is the slow inference time. While faster diffusion samplers have been developed for unconditional sampling, there has been limited research on conditional sampling in the context of inverse problems. In this study, we propose a novel and efficient diffusion sampling strategy that employs the geometric decomposition of diffusion sampling. Specifically, we discover that the samples generated from diffusion models can be decomposed into two orthogonal components: a "denoised" component obtained by projecting the sample onto the clean data manifold, and a "noise" component that induces a transition to the next lower-level noisy manifold with the addition of stochastic noise. Furthermore, we prove that, under some conditions on the clean data manifold, the conjugate gradient update for imposing conditioning from the denoised signal belongs to the clean manifold, resulting in a much faster and more accurate diffusion sampling. Our method is applicable regardless of the parameterization and setting (i.e., VE, VP). Notably, we achieve state-of-the-art reconstruction quality on challenging real-world medical inverse imaging problems, including multi-coil MRI reconstruction and 3D CT reconstruction. Moreover, our proposed method achieves more than 80 times faster inference time than the previous state-of-the-art method.

Title: GECCO: Geometrically-Conditioned Point Diffusion Models. (arXiv:2303.05916v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.05916
Code URL: null
Copy Paste: [[2303.05916] GECCO: Geometrically-Conditioned Point Diffusion Models](http://arxiv.org/abs/2303.05916) #diffusion
Summary:
Diffusion models generating images conditionally on text, such as Dall-E 2 and Stable Diffusion, have recently made a splash far beyond the computer vision community. Here, we tackle the related problem of generating point clouds, both unconditionally, and conditionally with images. For the latter, we introduce a novel geometrically-motivated conditioning scheme based on projecting sparse image features into the point cloud and attaching them to each individual point, at every step in the denoising process. This approach improves geometric consistency and yields greater fidelity than current methods relying on unstructured, global latent codes. Additionally, we show how to apply recent continuous-time diffusion schemes. Our method performs on par with or above the state of the art on conditional and unconditional experiments on synthetic data, while being faster, lighter, and delivering tractable likelihoods. We show it can also scale to diverse indoor scenes.