secure
Title: How Much Privacy Does Federated Learning with Secure Aggregation Guarantee?. (arXiv:2208.02304v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2208.02304
- Code URL: null
- Copy Paste:
[[2208.02304] How Much Privacy Does Federated Learning with Secure Aggregation Guarantee?](http://arxiv.org/abs/2208.02304)
- Summary:
Federated learning (FL) has attracted growing interest for enabling privacy-preserving machine learning on data stored at multiple users while avoiding moving the data off-device. However, while data never leaves users' devices, privacy still cannot be guaranteed since significant computations on users' training data are shared in the form of trained local models. These local models have recently been shown to pose a substantial privacy threat through different privacy attacks such as model inversion attacks. As a remedy, Secure Aggregation (SA) has been developed as a framework to preserve privacy in FL, by guaranteeing the server can only learn the global aggregated model update but not the individual model updates. While SA ensures no additional information is leaked about the individual model update beyond the aggregated model update, there are no formal guarantees on how much privacy FL with SA can actually offer; as information about the individual dataset can still potentially leak through the aggregated model computed at the server. In this work, we perform a first analysis of the formal privacy guarantees for FL with SA. Specifically, we use Mutual Information (MI) as a quantification metric and derive upper bounds on how much information about each user's dataset can leak through the aggregated model update. When using the FedSGD aggregation algorithm, our theoretical bounds show that the amount of privacy leakage reduces linearly with the number of users participating in FL with SA. To validate our theoretical bounds, we use an MI Neural Estimator to empirically evaluate the privacy leakage under different FL setups on both the MNIST and CIFAR10 datasets. Our experiments verify our theoretical bounds for FedSGD, which show a reduction in privacy leakage as the number of users and local batch size grow, and an increase in privacy leakage with the number of training rounds.
Title: Design of secure and robust cognitive system for malware detection. (arXiv:2208.02310v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2208.02310
- Code URL: null
- Copy Paste:
[[2208.02310] Design of secure and robust cognitive system for malware detection](http://arxiv.org/abs/2208.02310)
- Summary:
Machine learning based malware detection techniques rely on grayscale images of malware and tends to classify malware based on the distribution of textures in graycale images. Albeit the advancement and promising results shown by machine learning techniques, attackers can exploit the vulnerabilities by generating adversarial samples. Adversarial samples are generated by intelligently crafting and adding perturbations to the input samples. There exists majority of the software based adversarial attacks and defenses. To defend against the adversaries, the existing malware detection based on machine learning and grayscale images needs a preprocessing for the adversarial data. This can cause an additional overhead and can prolong the real-time malware detection. So, as an alternative to this, we explore RRAM (Resistive Random Access Memory) based defense against adversaries. Therefore, the aim of this thesis is to address the above mentioned critical system security issues. The above mentioned challenges are addressed by demonstrating proposed techniques to design a secure and robust cognitive system. First, a novel technique to detect stealthy malware is proposed. The technique uses malware binary images and then extract different features from the same and then employ different ML-classifiers on the dataset thus obtained. Results demonstrate that this technique is successful in differentiating classes of malware based on the features extracted. Secondly, I demonstrate the effects of adversarial attacks on a reconfigurable RRAM-neuromorphic architecture with different learning algorithms and device characteristics. I also propose an integrated solution for mitigating the effects of the adversarial attack using the reconfigurable RRAM architecture.
Title: Design Considerations and Architecture for a Resilient Risk based Adaptive Authentication and Authorization (RAD-AA) Framework. (arXiv:2208.02592v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2208.02592
- Code URL: null
- Copy Paste:
[[2208.02592] Design Considerations and Architecture for a Resilient Risk based Adaptive Authentication and Authorization (RAD-AA) Framework](http://arxiv.org/abs/2208.02592)
- Summary:
A strong cyber attack is capable of degrading the performance of any Information Technology (IT) or Operational Technology (OT) system. In recent cyber attacks, credential theft emerged as one of the primary vectors of gaining entry into the system. Once, an attacker has a foothold in the system, they use token manipulation techniques to elevate the privileges and access protected resources. This makes authentication and authorization a critical component for a secure and resilient cyber system. In this paper we consider the design considerations for such a secure and resilient authentication and authorization framework capable of self-adapting based on the risk scores and trust profiles. We compare this design with the existing standards such as OAuth 2.0, OpenID Connect and SAML 2.0. We then study popular threat models such as STRIDE and PASTA and summarize the resilience of the proposed architecture against common and relevant threat vectors. We call this framework Resilient Risk-based Adaptive Authentication and Authorization (RAD-AA). The proposed framework excessively increases the cost for an adversary to launch any cyber attack and provides much-needed strength to critical infrastructure.
Title: Information Flow Control-by-Construction for an Object-Oriented Language Using Type Modifiers. (arXiv:2208.02672v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2208.02672
- Code URL: null
- Copy Paste:
[[2208.02672] Information Flow Control-by-Construction for an Object-Oriented Language Using Type Modifiers](http://arxiv.org/abs/2208.02672)
- Summary:
In security-critical software applications, confidential information must be prevented from leaking to unauthorized sinks. Static analysis techniques are widespread to enforce a secure information flow by checking a program after construction. A drawback of these systems is that incomplete programs during construction cannot be checked properly. The user is not guided to a secure program by most systems. We introduce IFbCOO, an approach that guides users incrementally to a secure implementation by using refinement rules. In each refinement step, confidentiality or integrity (or both) is guaranteed alongside the functional correctness of the program, such that insecure programs are declined by construction. In this work, we formalize IFbCOO and prove soundness of the refinement rules. We implement IFbCOO in the tool CorC and conduct a feasibility study by successfully implementing case studies.
security
Title: "Yeah, it does have a...Windows `98 Vibe'': Usability Study of Security Features in Programmable Logic Controllers. (arXiv:2208.02500v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2208.02500
- Code URL: null
- Copy Paste:
[[2208.02500] "Yeah, it does have a](http://arxiv.org/abs/2208.02500)
- Summary:
Programmable Logic Controllers (PLCs) drive industrial processes critical to society, e.g., water treatment and distribution, electricity and fuel networks. Search engines (e.g., Shodan) have highlighted that Programmable Logic Controllers (PLCs) are often left exposed to the Internet, one of the main reasons being the misconfigurations of security settings. This leads to the question -- why do these misconfigurations occur and, specifically, whether usability of security controls plays a part? To date, the usability of configuring PLC security mechanisms has not been studied. We present the first investigation through a task-based study and subsequent semi-structured interviews (N=19). We explore the usability of PLC connection configurations and two key security mechanisms (i.e., access levels and user administration). We find that the use of unfamiliar labels, layouts and misleading terminology exacerbates an already complex process of configuring security mechanisms. Our results uncover various (mis-) perceptions about the security controls and how design constraints, e.g., safety and lack of regular updates (due to long term nature of such systems), provide significant challenges to realization of modern HCI and usability principles. Based on these findings, we provide design recommendations to bring usable security in industrial settings at par with its IT counterpart.
Title: Deep VULMAN: A Deep Reinforcement Learning-Enabled Cyber Vulnerability Management Framework. (arXiv:2208.02369v1 [cs.AI])
- Paper URL: http://arxiv.org/abs/2208.02369
- Code URL: null
- Copy Paste:
[[2208.02369] Deep VULMAN: A Deep Reinforcement Learning-Enabled Cyber Vulnerability Management Framework](http://arxiv.org/abs/2208.02369)
- Summary:
Cyber vulnerability management is a critical function of a cybersecurity operations center (CSOC) that helps protect organizations against cyber-attacks on their computer and network systems. Adversaries hold an asymmetric advantage over the CSOC, as the number of deficiencies in these systems is increasing at a significantly higher rate compared to the expansion rate of the security teams to mitigate them in a resource-constrained environment. The current approaches are deterministic and one-time decision-making methods, which do not consider future uncertainties when prioritizing and selecting vulnerabilities for mitigation. These approaches are also constrained by the sub-optimal distribution of resources, providing no flexibility to adjust their response to fluctuations in vulnerability arrivals. We propose a novel framework, Deep VULMAN, consisting of a deep reinforcement learning agent and an integer programming method to fill this gap in the cyber vulnerability management process. Our sequential decision-making framework, first, determines the near-optimal amount of resources to be allocated for mitigation under uncertainty for a given system state and then determines the optimal set of prioritized vulnerability instances for mitigation. Our proposed framework outperforms the current methods in prioritizing the selection of important organization-specific vulnerabilities, on both simulated and real-world vulnerability data, observed over a one-year period.
Title: SROS2: Usable Cyber Security Tools for ROS 2. (arXiv:2208.02615v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2208.02615
- Code URL: null
- Copy Paste:
[[2208.02615] SROS2: Usable Cyber Security Tools for ROS 2](http://arxiv.org/abs/2208.02615)
- Summary:
ROS 2 is rapidly becoming a standard in the robotics industry. Built upon DDS as its default communication middleware and used in safety-critical scenarios, adding security to robots and ROS computational graphs is increasingly becoming a concern. The present work introduces SROS2, a series of developer tools and libraries that facilitate adding security to ROS 2 graphs. Focusing on a usability-centric approach in SROS2, we present a methodology for securing graphs systematically while following the DevSecOps model. We also demonstrate the use of our security tools by presenting an application case study that considers securing a graph using the popular Navigation2 and SLAM Toolbox stacks applied in a TurtleBot3 robot. We analyse the current capabilities of SROS2 and discuss the shortcomings, which provides insights for future contributions and extensions. Ultimately, we present SROS2 as usable security tools for ROS 2 and argue that without usability, security in robotics will be greatly impaired.
Title: Ellipsis: Towards Efficient System Auditing for Real-Time Systems. (arXiv:2208.02699v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2208.02699
- Code URL: https://bitbucket.org/sts-lab/ellipsis
- Copy Paste:
[[2208.02699] Ellipsis: Towards Efficient System Auditing for Real-Time Systems](http://arxiv.org/abs/2208.02699)
- Summary:
System auditing is a powerful tool that provides insight into the nature of suspicious events in computing systems, allowing machine operators to detect and subsequently investigate security incidents. While auditing has proven invaluable to the security of traditional computers, existing audit frameworks are rarely designed with consideration for Real-Time Systems (RTS). The transparency provided by system auditing would be of tremendous benefit in a variety of security-critical RTS domains, (e.g., autonomous vehicles); however, if audit mechanisms are not carefully integrated into RTS, auditing can be rendered ineffectual and violate the real-world temporal requirements of the RTS.
In this paper, we demonstrate how to adapt commodity audit frameworks to RTS. Using Linux Audit as a case study, we first demonstrate that the volume of audit events generated by commodity frameworks is unsustainable within the temporal and resource constraints of real-time (RT) applications. To address this, we present Ellipsis, a set of kernel-based reduction techniques that leverage the periodic repetitive nature of RT applications to aggressively reduce the costs of system-level auditing. Ellipsis generates succinct descriptions of RT applications' expected activity while retaining a detailed record of unexpected activities, enabling analysis of suspicious activity while meeting temporal constraints. Our evaluation of Ellipsis, using ArduPilot (an open-source autopilot application suite) demonstrates up to 93% reduction in audit log generation.
privacy
Title: Privacy-Preserving Action Recognition via Motion Difference Quantization. (arXiv:2208.02459v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02459
- Code URL: null
- Copy Paste:
[[2208.02459] Privacy-Preserving Action Recognition via Motion Difference Quantization](http://arxiv.org/abs/2208.02459)
- Summary:
The widespread use of smart computer vision systems in our personal spaces has led to an increased consciousness about the privacy and security risks that these systems pose. On the one hand, we want these systems to assist in our daily lives by understanding their surroundings, but on the other hand, we want them to do so without capturing any sensitive information. Towards this direction, this paper proposes a simple, yet robust privacy-preserving encoder called BDQ for the task of privacy-preserving human action recognition that is composed of three modules: Blur, Difference, and Quantization. First, the input scene is passed to the Blur module to smoothen the edges. This is followed by the Difference module to apply a pixel-wise intensity subtraction between consecutive frames to highlight motion features and suppress obvious high-level privacy attributes. Finally, the Quantization module is applied to the motion difference frames to remove the low-level privacy attributes. The BDQ parameters are optimized in an end-to-end fashion via adversarial training such that it learns to allow action recognition attributes while inhibiting privacy attributes. Our experiments on three benchmark datasets show that the proposed encoder design can achieve state-of-the-art trade-off when compared with previous works. Furthermore, we show that the trade-off achieved is at par with the DVS sensor-based event cameras. Code available at: https://github.com/suakaw/BDQ_PrivacyAR.
Title: Privacy Safe Representation Learning via Frequency Filtering Encoder. (arXiv:2208.02482v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02482
- Code URL: null
- Copy Paste:
[[2208.02482] Privacy Safe Representation Learning via Frequency Filtering Encoder](http://arxiv.org/abs/2208.02482)
- Summary:
Deep learning models are increasingly deployed in real-world applications. These models are often deployed on the server-side and receive user data in an information-rich representation to solve a specific task, such as image classification. Since images can contain sensitive information, which users might not be willing to share, privacy protection becomes increasingly important. Adversarial Representation Learning (ARL) is a common approach to train an encoder that runs on the client-side and obfuscates an image. It is assumed, that the obfuscated image can safely be transmitted and used for the task on the server without privacy concerns. However, in this work, we find that training a reconstruction attacker can successfully recover the original image of existing ARL methods. To this end, we introduce a novel ARL method enhanced through low-pass filtering, limiting the available information amount to be encoded in the frequency domain. Our experimental results reveal that our approach withstands reconstruction attacks while outperforming previous state-of-the-art methods regarding the privacy-utility trade-off. We further conduct a user study to qualitatively assess our defense of the reconstruction attack.
Title: Privacy-Preserving Image Classification Using ConvMixer with Adaptive Permutation Matrix. (arXiv:2208.02556v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02556
- Code URL: null
- Copy Paste:
[[2208.02556] Privacy-Preserving Image Classification Using ConvMixer with Adaptive Permutation Matrix](http://arxiv.org/abs/2208.02556)
- Summary:
In this paper, we propose a privacy-preserving image classification method using encrypted images under the use of the ConvMixer structure. Block-wise scrambled images, which are robust enough against various attacks, have been used for privacy-preserving image classification tasks, but the combined use of a classification network and an adaptation network is needed to reduce the influence of image encryption. However, images with a large size cannot be applied to the conventional method with an adaptation network because the adaptation network has so many parameters. Accordingly, we propose a novel method, which allows us not only to apply block-wise scrambled images to ConvMixer for both training and testing without the adaptation network, but also to provide a higher classification accuracy than conventional methods.
Title: Privacy-Preserving Chaotic Extreme Learning Machine with Fully Homomorphic Encryption. (arXiv:2208.02587v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2208.02587
- Code URL: null
- Copy Paste:
[[2208.02587] Privacy-Preserving Chaotic Extreme Learning Machine with Fully Homomorphic Encryption](http://arxiv.org/abs/2208.02587)
- Summary:
The Machine Learning and Deep Learning Models require a lot of data for the training process, and in some scenarios, there might be some sensitive data, such as customer information involved, which the organizations might be hesitant to outsource for model building. Some of the privacy-preserving techniques such as Differential Privacy, Homomorphic Encryption, and Secure Multi-Party Computation can be integrated with different Machine Learning and Deep Learning algorithms to provide security to the data as well as the model. In this paper, we propose a Chaotic Extreme Learning Machine and its encrypted form using Fully Homomorphic Encryption where the weights and biases are generated using a logistic map instead of uniform distribution. Our proposed method has performed either better or similar to the Traditional Extreme Learning Machine on most of the datasets.
protect
Title: Customs Import Declaration Datasets. (arXiv:2208.02484v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2208.02484
- Code URL: null
- Copy Paste:
[[2208.02484] Customs Import Declaration Datasets](http://arxiv.org/abs/2208.02484)
- Summary:
Given the huge volume of cross-border flows, effective and efficient control of trades becomes more crucial in protecting people and society from illicit trades while facilitating legitimate trades. However, limited accessibility of the transaction-level trade datasets hinders the progress of open research, and lots of customs administrations have not benefited from the recent progress in data-based risk management. In this paper, we introduce an import declarations dataset to facilitate the collaboration between the domain experts in customs administrations and data science researchers. The dataset contains 54,000 artificially generated trades with 22 key attributes, and it is synthesized with CTGAN while maintaining correlated features. Synthetic data has several advantages. First, releasing the dataset is free from restrictions that do not allow disclosing the original import data. Second, the fabrication step minimizes the possible identity risk which may exist in trade statistics. Lastly, the published data follow a similar distribution to the source data so that it can be used in various downstream tasks. With the provision of data and its generation process, we open baseline codes for fraud detection tasks, as we empirically show that more advanced algorithms can better detect frauds.
defense
attack
Title: A New Kind of Adversarial Example. (arXiv:2208.02430v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02430
- Code URL: null
- Copy Paste:
[[2208.02430] A New Kind of Adversarial Example](http://arxiv.org/abs/2208.02430)
- Summary:
Almost all adversarial attacks are formulated to add an imperceptible perturbation to an image in order to fool a model. Here, we consider the opposite which is adversarial examples that can fool a human but not a model. A large enough and perceptible perturbation is added to an image such that a model maintains its original decision, whereas a human will most likely make a mistake if forced to decide (or opt not to decide at all). Existing targeted attacks can be reformulated to synthesize such adversarial examples. Our proposed attack, dubbed NKE, is similar in essence to the fooling images, but is more efficient since it uses gradient descent instead of evolutionary algorithms. It also offers a new and unified perspective into the problem of adversarial vulnerability. Experimental results over MNIST and CIFAR-10 datasets show that our attack is quite efficient in fooling deep neural networks. Code is available at https://github.com/aliborji/NKE.
Title: Artificial Image Tampering Distorts Spatial Distribution of Texture Landmarks and Quality Characteristics. (arXiv:2208.02710v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02710
- Code URL: null
- Copy Paste:
[[2208.02710] Artificial Image Tampering Distorts Spatial Distribution of Texture Landmarks and Quality Characteristics](http://arxiv.org/abs/2208.02710)
- Summary:
Advances in AI based computer vision has led to a significant growth in synthetic image generation and artificial image tampering with serious implications for unethical exploitations that undermine person identification and could make render AI predictions less explainable.Morphing, Deepfake and other artificial generation of face photographs undermine the reliability of face biometrics authentication using different electronic ID documents.Morphed face photographs on e-passports can fool automated border control systems and human guards.This paper extends our previous work on using the persistent homology (PH) of texture landmarks to detect morphing attacks.We demonstrate that artificial image tampering distorts the spatial distribution of texture landmarks (i.e. their PH) as well as that of a set of image quality characteristics.We shall demonstrate that the tamper caused distortion of these two slim feature vectors provide significant potentials for building explainable (Handcrafted) tamper detectors with low error rates and suitable for implementation on constrained devices.
Title: Prompt Tuning for Generative Multimodal Pretrained Models. (arXiv:2208.02532v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2208.02532
- Code URL: https://github.com/ofa-sys/ofa
- Copy Paste:
[[2208.02532] Prompt Tuning for Generative Multimodal Pretrained Models](http://arxiv.org/abs/2208.02532)
- Summary:
Prompt tuning has become a new paradigm for model tuning and it has demonstrated success in natural language pretraining and even vision pretraining. In this work, we explore the transfer of prompt tuning to multimodal pretraining, with a focus on generative multimodal pretrained models, instead of contrastive ones. Specifically, we implement prompt tuning on the unified sequence-to-sequence pretrained model adaptive to both understanding and generation tasks. Experimental results demonstrate that the light-weight prompt tuning can achieve comparable performance with finetuning and surpass other light-weight tuning methods. Besides, in comparison with finetuned models, the prompt-tuned models demonstrate improved robustness against adversarial attacks. We further figure out that experimental factors, including the prompt length, prompt depth, and reparameteratization, have great impacts on the model performance, and thus we empirically provide a recommendation for the setups of prompt tuning. Despite the observed advantages, we still find some limitations in prompt tuning, and we correspondingly point out the directions for future studies. Codes are available at \url{https://github.com/OFA-Sys/OFA}
Title: On False Data Injection Attack against Building Automation Systems. (arXiv:2208.02733v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2208.02733
- Code URL: null
- Copy Paste:
[[2208.02733] On False Data Injection Attack against Building Automation Systems](http://arxiv.org/abs/2208.02733)
- Summary:
KNX is one of the most popular protocols for a building automation system (BAS). However, its lack of security makes it subject to a variety of attacks. In this paper, we perform the first study of false data injection attack against a KNX based BAS. We design a man-in-the-middle (MITM) attack to change the data from a temperature sensor and inject false data to the BAS. We model the BAS system and formally analyze the impact of the false data injection attack on the system in term of energy cost. We find a small amount of erroneous input can incur significant energy cost, but is very hard to detect based on sensor data such as temperature alone. Since the MITM attack may disturb the KNX traffic pattern, we design a machine learning (ML) based detection strategy to detect the false data injection attack based on sophisticated features of the KNX telegram inter-arrival time. We perform real-world experiments and validate the presented false data injection attacks and ML detection strategy. We also simulate a BAS system and show that our proposed attack strategies can have a huge impact on BAS power consumption.
robust
Title: Estimating Visual Information From Audio Through Manifold Learning. (arXiv:2208.02337v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02337
- Code URL: null
- Copy Paste:
[[2208.02337] Estimating Visual Information From Audio Through Manifold Learning](http://arxiv.org/abs/2208.02337)
- Summary:
We propose a new framework for extracting visual information about a scene only using audio signals. Audio-based methods can overcome some of the limitations of vision-based methods i.e., they do not require "line-of-sight", are robust to occlusions and changes in illumination, and can function as a backup in case vision/lidar sensors fail. Therefore, audio-based methods can be useful even for applications in which only visual information is of interest Our framework is based on Manifold Learning and consists of two steps. First, we train a Vector-Quantized Variational Auto-Encoder to learn the data manifold of the particular visual modality we are interested in. Second, we train an Audio Transformation network to map multi-channel audio signals to the latent representation of the corresponding visual sample. We show that our method is able to produce meaningful images from audio using a publicly available audio/visual dataset. In particular, we consider the prediction of the following visual modalities from audio: depth and semantic segmentation. We hope the findings of our work can facilitate further research in visual information extraction from audio. Code is available at: https://github.com/ubc-vision/audio_manifold.
Title: A Multibranch Convolutional Neural Network for Hyperspectral Unmixing. (arXiv:2208.02361v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02361
- Code URL: null
- Copy Paste:
[[2208.02361] A Multibranch Convolutional Neural Network for Hyperspectral Unmixing](http://arxiv.org/abs/2208.02361)
- Summary:
Hyperspectral unmixing remains one of the most challenging tasks in the analysis of such data. Deep learning has been blooming in the field and proved to outperform other classic unmixing techniques, and can be effectively deployed onboard Earth observation satellites equipped with hyperspectral imagers. In this letter, we follow this research pathway and propose a multi-branch convolutional neural network that benefits from fusing spectral, spatial, and spectral-spatial features in the unmixing process. The results of our experiments, backed up with the ablation study, revealed that our techniques outperform others from the literature and lead to higher-quality fractional abundance estimation. Also, we investigated the influence of reducing the training sets on the capabilities of all algorithms and their robustness against noise, as capturing large and representative ground-truth sets is time-consuming and costly in practice, especially in emerging Earth observation scenarios.
Title: Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing. (arXiv:2208.02397v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02397
- Code URL: null
- Copy Paste:
[[2208.02397] Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing](http://arxiv.org/abs/2208.02397)
- Summary:
This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents. First, a region proposal algorithm detects object candidates in the document page images. Next, deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations. Finally, candidate images are ranked by computing the feature similarity with a given input query. A robust experimental protocol evaluates the proposed approach considering each representation scheme (real-valued and binary code) on the DocExplore image database. The experimental results show that the proposed deep models compare favorably to the state-of-the-art image retrieval approaches for images of historical documents, outperforming other deep models by 2.56 percentage points using the same techniques for pattern spotting. Besides, the proposed approach also reduces the search time by up to 200x and the storage cost up to 6,000x when compared to related works based on real-valued representations.
Title: CFARnet: deep learning for target detection with constant false alarm rate. (arXiv:2208.02474v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2208.02474
- Code URL: null
- Copy Paste:
[[2208.02474] CFARnet: deep learning for target detection with constant false alarm rate](http://arxiv.org/abs/2208.02474)
- Summary:
We consider the problem of learning detectors with a Constant False Alarm Rate (CFAR). Classical model-based solutions to composite hypothesis testing are sensitive to imperfect models and are often computationally expensive. In contrast, data-driven machine learning is often more robust and yields classifiers with fixed computational complexity. Learned detectors usually do not have a CFAR as required in many applications. To close this gap, we introduce CFARnet where the loss function is penalized to promote similar distributions of the detector under any null hypothesis scenario. Asymptotic analysis in the case of linear models with general Gaussian noise reveals that the classical generalized likelihood ratio test (GLRT) is actually a minimizer of the CFAR constrained Bayes risk. Experiments in both synthetic data and real hyper-spectral images show that CFARnet leads to near CFAR detectors with similar accuracy as their competitors.
Title: Heart rate estimation in intense exercise videos. (arXiv:2208.02509v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02509
- Code URL: null
- Copy Paste:
[[2208.02509] Heart rate estimation in intense exercise videos](http://arxiv.org/abs/2208.02509)
- Summary:
Estimating heart rate from video allows non-contact health monitoring with applications in patient care, human interaction, and sports. Existing work can robustly measure heart rate under some degree of motion by face tracking. However, this is not always possible in unconstrained settings, as the face might be occluded or even outside the camera. Here, we present IntensePhysio: a challenging video heart rate estimation dataset with realistic face occlusions, severe subject motion, and ample heart rate variation. To ensure heart rate variation in a realistic setting we record each subject for around 1-2 hours. The subject is exercising (at a moderate to high intensity) on a cycling ergometer with an attached video camera and is given no instructions regarding positioning or movement. We have 11 subjects, and approximately 20 total hours of video. We show that the existing remote photo-plethysmography methods have difficulty in estimating heart rate in this setting. In addition, we present IBIS-CNN, a new baseline using spatio-temporal superpixels, which improves on existing models by eliminating the need for a visible face/face tracking. We will make the code and data publically available soon.
Title: MVSFormer: Learning Robust Image Representations via Transformers and Temperature-based Depth for Multi-View Stereo. (arXiv:2208.02541v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02541
- Code URL: null
- Copy Paste:
[[2208.02541] MVSFormer: Learning Robust Image Representations via Transformers and Temperature-based Depth for Multi-View Stereo](http://arxiv.org/abs/2208.02541)
- Summary:
Feature representation learning is the key recipe for learning-based Multi-View Stereo (MVS). As the common feature extractor of learning-based MVS, vanilla Feature Pyramid Networks (FPN) suffers from discouraged feature representations for reflection and texture-less areas, which limits the generalization of MVS. Even FPNs worked with pre-trained Convolutional Neural Networks (CNNs) fail to tackle these issues. On the other hand, Vision Transformers (ViTs) have achieved prominent success in many 2D vision tasks. Thus we ask whether ViTs can facilitate the feature learning in MVS? In this paper, we propose a pre-trained ViT enhanced MVS network called MVSFormer, which can learn more reliable feature representations benefited by informative priors from ViT. Then MVSFormer-P and MVSFormer-H are further proposed with fixed ViT weights and trainable ones respectively. MVSFormer-P is more efficient while MVSFormer-H can achieve superior performance. To make ViTs robust to arbitrary resolutions for MVS tasks, we propose to use an efficient multi-scale training with gradient accumulation. Moreover, we discuss the merits and drawbacks of classification and regression-based MVS methods, and further propose to unify them with a temperature-based strategy. MVSFormer achieves state-of-the-art performance on the DTU dataset. Particularly, our anonymous submission of MVSFormer is ranked in the Top-1 position on both intermediate and advanced sets of the highly competitive Tanks-and-Temples leaderboard on the day of submission compared with other published works. Codes and models will be released.
Title: Semantic Interleaving Global Channel Attention for Multilabel Remote Sensing Image Classification. (arXiv:2208.02613v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02613
- Code URL: null
- Copy Paste:
[[2208.02613] Semantic Interleaving Global Channel Attention for Multilabel Remote Sensing Image Classification](http://arxiv.org/abs/2208.02613)
- Summary:
Multi-Label Remote Sensing Image Classification (MLRSIC) has received increasing research interest. Taking the cooccurrence relationship of multiple labels as additional information helps to improve the performance of this task. Current methods focus on using it to constrain the final feature output of a Convolutional Neural Network (CNN). On the one hand, these methods do not make full use of label correlation to form feature representation. On the other hand, they increase the label noise sensitivity of the system, resulting in poor robustness. In this paper, a novel method called Semantic Interleaving Global Channel Attention (SIGNA) is proposed for MLRSIC. First, the label co-occurrence graph is obtained according to the statistical information of the data set. The label co-occurrence graph is used as the input of the Graph Neural Network (GNN) to generate optimal feature representations. Then, the semantic features and visual features are interleaved, to guide the feature expression of the image from the original feature space to the semantic feature space with embedded label relations. SIGNA triggers global attention of feature maps channels in a new semantic feature space to extract more important visual features. Multihead SIGNA based feature adaptive weighting networks are proposed to act on any layer of CNN in a plug-and-play manner. For remote sensing images, better classification performance can be achieved by inserting CNN into the shallow layer. We conduct extensive experimental comparisons on three data sets: UCM data set, AID data set, and DFC15 data set. Experimental results demonstrate that the proposed SIGNA achieves superior classification performance compared to state-of-the-art (SOTA) methods. It is worth mentioning that the codes of this paper will be open to the community for reproducibility research. Our codes are available at https://github.com/kyle-one/SIGNA.
Title: DropKey. (arXiv:2208.02646v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02646
- Code URL: null
- Copy Paste:
[[2208.02646] DropKey](http://arxiv.org/abs/2208.02646)
- Summary:
In this paper, we focus on analyzing and improving the dropout technique for self-attention layers of Vision Transformer, which is important while surprisingly ignored by prior works. In particular, we conduct researches on three core questions: First, what to drop in self-attention layers? Different from dropping attention weights in literature, we propose to move dropout operations forward ahead of attention matrix calculation and set the Key as the dropout unit, yielding a novel dropout-before-softmax scheme. We theoretically verify that this scheme helps keep both regularization and probability features of attention weights, alleviating the overfittings problem to specific patterns and enhancing the model to globally capture vital information; Second, how to schedule the drop ratio in consecutive layers? In contrast to exploit a constant drop ratio for all layers, we present a new decreasing schedule that gradually decreases the drop ratio along the stack of self-attention layers. We experimentally validate the proposed schedule can avoid overfittings in low-level features and missing in high-level semantics, thus improving the robustness and stableness of model training; Third, whether need to perform structured dropout operation as CNN? We attempt patch-based block-version of dropout operation and find that this useful trick for CNN is not essential for ViT. Given exploration on the above three questions, we present the novel DropKey method that regards Key as the drop unit and exploits decreasing schedule for drop ratio, improving ViTs in a general way. Comprehensive experiments demonstrate the effectiveness of DropKey for various ViT architectures, \emph{e.g.} T2T and VOLO, as well as for various vision tasks, \emph{e.g.}, image classification, object detection, human-object interaction detection and human body shape recovery. Codes will be released upon acceptance.
Title: Globally Consistent Video Depth and Pose Estimation with Efficient Test-Time Training. (arXiv:2208.02709v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02709
- Code URL: null
- Copy Paste:
[[2208.02709] Globally Consistent Video Depth and Pose Estimation with Efficient Test-Time Training](http://arxiv.org/abs/2208.02709)
- Summary:
Dense depth and pose estimation is a vital prerequisite for various video applications. Traditional solutions suffer from the robustness of sparse feature tracking and insufficient camera baselines in videos. Therefore, recent methods utilize learning-based optical flow and depth prior to estimate dense depth. However, previous works require heavy computation time or yield sub-optimal depth results. We present GCVD, a globally consistent method for learning-based video structure from motion (SfM) in this paper. GCVD integrates a compact pose graph into the CNN-based optimization to achieve globally consistent estimation from an effective keyframe selection mechanism. It can improve the robustness of learning-based methods with flow-guided keyframes and well-established depth prior. Experimental results show that GCVD outperforms the state-of-the-art methods on both depth and pose estimation. Besides, the runtime experiments reveal that it provides strong efficiency in both short- and long-term videos with global consistency provided.
Title: Bayesian regularization of empirical MDPs. (arXiv:2208.02362v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2208.02362
- Code URL: null
- Copy Paste:
[[2208.02362] Bayesian regularization of empirical MDPs](http://arxiv.org/abs/2208.02362)
- Summary:
In most applications of model-based Markov decision processes, the parameters for the unknown underlying model are often estimated from the empirical data. Due to noise, the policy learnedfrom the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling for solutions with better generalization performance. In this work we take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information in order to obtain more robust policies. Two approaches are proposed, one based on $L^1$ regularization and the other on relative entropic regularization. We evaluate our proposed algorithms on synthetic simulations and on real-world search logs of a large scale online shopping store. Our results demonstrate the robustness of regularized MDP policies against the noise present in the models.
biometric
Title: OCFR 2022: Competition on Occluded Face Recognition From Synthetically Generated Structure-Aware Occlusions. (arXiv:2208.02760v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2208.02760
- Code URL: null
- Copy Paste:
[[2208.02760] OCFR 2022: Competition on Occluded Face Recognition From Synthetically Generated Structure-Aware Occlusions](http://arxiv.org/abs/2208.02760)
- Summary:
This work summarizes the IJCB Occluded Face Recognition Competition 2022 (IJCB-OCFR-2022) embraced by the 2022 International Joint Conference on Biometrics (IJCB 2022). OCFR-2022 attracted a total of 3 participating teams, from academia. Eventually, six valid submissions were submitted and then evaluated by the organizers. The competition was held to address the challenge of face recognition in the presence of severe face occlusions. The participants were free to use any training data and the testing data was built by the organisers by synthetically occluding parts of the face images using a well-known dataset. The submitted solutions presented innovations and performed very competitively with the considered baseline. A major output of this competition is a challenging, realistic, and diverse, and publicly available occluded face recognition benchmark with well defined evaluation protocols.
steal
extraction
Title: GROWN+UP: A Graph Representation Of a Webpage Network Utilizing Pre-training. (arXiv:2208.02252v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2208.02252
- Code URL: null
- Copy Paste:
[[2208.02252] GROWN+UP: A Graph Representation Of a Webpage Network Utilizing Pre-training](http://arxiv.org/abs/2208.02252)
- Summary:
Large pre-trained neural networks are ubiquitous and critical to the success of many downstream tasks in natural language processing and computer vision. However, within the field of web information retrieval, there is a stark contrast in the lack of similarly flexible and powerful pre-trained models that can properly parse webpages. Consequently, we believe that common machine learning tasks like content extraction and information mining from webpages have low-hanging gains that yet remain untapped.
We aim to close the gap by introducing an agnostic deep graph neural network feature extractor that can ingest webpage structures, pre-train self-supervised on massive unlabeled data, and fine-tune to arbitrary tasks on webpages effectually.
Finally, we show that our pre-trained model achieves state-of-the-art results using multiple datasets on two very different benchmarks: webpage boilerplate removal and genre classification, thus lending support to its potential application in diverse downstream tasks.
membership infer
federate
Title: FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning. (arXiv:2208.02442v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2208.02442
- Code URL: null
- Copy Paste:
[[2208.02442] FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning](http://arxiv.org/abs/2208.02442)
- Summary:
The uneven distribution of local data across different edge devices (clients) results in slow model training and accuracy reduction in federated learning. Naive federated learning (FL) strategy and most alternative solutions attempted to achieve more fairness by weighted aggregating deep learning models across clients. This work introduces a novel non-IID type encountered in real-world datasets, namely cluster-skew, in which groups of clients have local data with similar distributions, causing the global model to converge to an over-fitted solution. To deal with non-IID data, particularly the cluster-skewed data, we propose FedDRL, a novel FL model that employs deep reinforcement learning to adaptively determine each client's impact factor (which will be used as the weights in the aggregation process). Extensive experiments on a suite of federated datasets confirm that the proposed FedDRL improves favorably against FedAvg and FedProx methods, e.g., up to 4.05% and 2.17% on average for the CIFAR-100 dataset, respectively.
Title: ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity. (arXiv:2208.02507v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2208.02507
- Code URL: null
- Copy Paste:
[[2208.02507] ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity](http://arxiv.org/abs/2208.02507)
- Summary:
When the available hardware cannot meet the memory and compute requirements to efficiently train high performing machine learning models, a compromise in either the training quality or the model complexity is needed. In Federated Learning (FL), nodes are orders of magnitude more constrained than traditional server-grade hardware and are often battery powered, severely limiting the sophistication of models that can be trained under this paradigm. While most research has focused on designing better aggregation strategies to improve convergence rates and in alleviating the communication costs of FL, fewer efforts have been devoted to accelerating on-device training. Such stage, which repeats hundreds of times (i.e. every round) and can involve thousands of devices, accounts for the majority of the time required to train federated models and, the totality of the energy consumption at the client side. In this work, we present the first study on the unique aspects that arise when introducing sparsity at training time in FL workloads. We then propose ZeroFL, a framework that relies on highly sparse operations to accelerate on-device training. Models trained with ZeroFL and 95% sparsity achieve up to 2.3% higher accuracy compared to competitive baselines obtained from adapting a state-of-the-art sparse training framework to the FL setting.
fair
Title: Invariant Representations with Stochastically Quantized Neural Networks. (arXiv:2208.02656v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2208.02656
- Code URL: null
- Copy Paste:
[[2208.02656] Invariant Representations with Stochastically Quantized Neural Networks](http://arxiv.org/abs/2208.02656)
- Summary:
Representation learning algorithms offer the opportunity to learn invariant representations of the input data with regard to nuisance factors. Many authors have leveraged such strategies to learn fair representations, i.e., vectors where information about sensitive attributes is removed. These methods are attractive as they may be interpreted as minimizing the mutual information between a neural layer's activations and a sensitive attribute. However, the theoretical grounding of such methods relies either on the computation of infinitely accurate adversaries or on minimizing a variational upper bound of a mutual information estimate. In this paper, we propose a methodology for direct computation of the mutual information between a neural layer and a sensitive attribute. We employ stochastically-activated binary neural networks, which lets us treat neurons as random variables. We are then able to compute (not bound) the mutual information between a layer and a sensitive attribute and use this information as a regularization factor during gradient descent. We show that this method compares favorably with the state of the art in fair representation learning and that the learned representations display a higher level of invariance compared to full-precision neural networks.