secure

Title: Light Weight Cryptographic Address Generation Using System State Entropy Gathering for IPv6 Based MANETs. (arXiv:2303.17914v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.17914
Code URL: null
Copy Paste: [[2303.17914] Light Weight Cryptographic Address Generation Using System State Entropy Gathering for IPv6 Based MANETs](http://arxiv.org/abs/2303.17914) #secure
Summary:
In IPv6-based MANETs, neighbor discovery enables nodes to self-configure and communicate with neighbor nodes through autoconfiguration. Stateless address autoconfiguration (SLAAC) has been shown to face several security issues. Although Secure Neighbor Discovery (SeND) uses Cryptographically Generated Addresses (CGA) to address these issues, it creates other concerns, such as the need for a CA to authenticate hosts, exposure to CPU exhaustion attacks, and high computational intensity. These issues are a major concern for MANETs, which possess limited bandwidth and processing power. The paper proposes an empirically strong Light Weight Cryptographic Address Generation (LW-CGA) scheme using entropy gathered from system states. Even the system users cannot monitor these system states; hence LW-CGA provides high security with minimal computational complexity and proves to be more suitable for MANETs. LW-CGA and SeND are implemented and tested to study their performance. The evaluation shows that LW-CGA achieves good runtime throughput with minimal address generation latency.
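
The general idea of deriving an address from gathered system state can be illustrated with a minimal sketch. This is not the paper's LW-CGA algorithm: the entropy sources, the use of SHA-256, and the names `gather_system_state` and `derive_address` are all assumptions made for illustration, and the public-key binding that a real CGA carries is omitted.

```python
import hashlib
import ipaddress
import os
import time


def gather_system_state() -> bytes:
    """Collect a few volatile values (hypothetical stand-ins for real entropy sources)."""
    parts = [
        time.perf_counter_ns().to_bytes(16, "big"),  # high-resolution monotonic timer
        time.time_ns().to_bytes(16, "big"),          # wall-clock time in nanoseconds
        os.getpid().to_bytes(8, "big"),              # process identifier
        os.urandom(16),                              # bytes from the OS random source
    ]
    return b"".join(parts)


def derive_address(prefix: str, state: bytes) -> ipaddress.IPv6Address:
    """Hash the gathered state into a 64-bit interface identifier under a /64 prefix."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    digest = hashlib.sha256(state).digest()
    interface_id = int.from_bytes(digest[:8], "big")  # keep the first 64 bits
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


# 2001:db8::/64 is the documentation prefix, used here only as an example.
addr = derive_address("2001:db8::/64", gather_system_state())
print(addr)
```

A single hash per address is what keeps a scheme of this shape cheap; how the paper actually selects and mixes system states is not specified in the abstract.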

Title: MARTSIA: Enabling Data Confidentiality for Blockchain-based Process Execution. (arXiv:2303.17977v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.17977
Code URL: null
Copy Paste: [[2303.17977] MARTSIA: Enabling Data Confidentiality for Blockchain-based Process Execution](http://arxiv.org/abs/2303.17977) #secure
Summary:
Multi-party business processes rely on the collaboration of various players in a decentralized setting. Blockchain technology can facilitate the automation of these processes, even in cases where trust among participants is limited. Transactions are stored in a ledger, a replica of which is retained by every node of the blockchain network. The operations saved thereby are thus publicly accessible. While this enhances transparency, reliability, and persistence, it hinders the utilization of public blockchains for process automation as it violates typical confidentiality requirements in corporate settings. In this paper, we propose MARTSIA: A Multi-Authority Approach to Transaction Systems for Interoperating Applications. MARTSIA enables precise control over process data at the level of message parts. Based on Multi-Authority Attribute-Based Encryption (MA-ABE), MARTSIA realizes a number of desirable properties, including confidentiality, transparency, and auditability. We implemented our approach in proof-of-concept prototypes, with which we conduct a case study in the area of supply chain management. Also, we show the integration of MARTSIA with a state-of-the-art blockchain-based process execution engine to secure the data flow.

Title: Towards A Sustainable and Ethical Supply Chain Management: The Potential of IoT Solutions. (arXiv:2303.18135v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.18135
Code URL: null
Copy Paste: [[2303.18135] Towards A Sustainable and Ethical Supply Chain Management: The Potential of IoT Solutions](http://arxiv.org/abs/2303.18135) #secure
Summary:
Globalization has introduced many new challenges that make supply chain management (SCM) large and complex, and many industries need to improve it. The Internet of Things (IoT) has solved many problems by providing security and traceability, making it a promising solution for supply chain management. SCM is divided into different processes, each requiring different types of solutions. IoT devices can solve distributed system problems by creating trustful relationships. Since the whole business industry depends on the trust between different supply chain actors, IoT can provide this trust by making the entire ecosystem much more secure, reliable, and traceable. This paper discusses how IoT technology has solved problems related to SCM in different areas. Supply chains in different industries, from pharmaceuticals to agriculture, have different issues and require different solutions. We discuss problems such as security, tracking, traceability, and warehouse issues. We also cover the supply chain challenges faced by individual industries and how combining IoT with other technologies can provide solutions.

security

Title: Benchmarking FedAvg and FedCurv for Image Classification Tasks. (arXiv:2303.17942v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.17942
Code URL: null
Copy Paste: [[2303.17942] Benchmarking FedAvg and FedCurv for Image Classification Tasks](http://arxiv.org/abs/2303.17942) #security
Summary:
Classic Machine Learning techniques require training on data available in a single data lake. However, aggregating data from different owners is not always convenient for different reasons, including security, privacy and secrecy. Data carry a value that might vanish when shared with others; the ability to avoid sharing the data enables industrial applications where security and privacy are of paramount importance, making it possible to train global models by implementing only local policies which can be run independently and even on air-gapped data centres. Federated Learning (FL) is a distributed machine learning approach which has emerged as an effective way to address privacy concerns by only sharing local AI models while keeping the data decentralized. Two critical challenges of Federated Learning are managing the heterogeneous systems in the same federated network and dealing with real data, which are often not independently and identically distributed (non-IID) among the clients. In this paper, we focus on the second problem, i.e., the problem of statistical heterogeneity of the data in the same federated network. In this setting, local models might stray far from the local optimum of the complete dataset, thus possibly hindering the convergence of the federated model. Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv), aiming at tackling the non-IID setting, have already been proposed. This work provides an empirical assessment of the behaviour of FedAvg and FedCurv in common non-IID scenarios. Results show that the number of epochs per round is an important hyper-parameter that, when tuned appropriately, can lead to significant performance gains while reducing the communication cost. As a side product of this work, we release the non-IID version of the datasets we used so as to facilitate further comparisons from the FL community.
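
For readers unfamiliar with FedAvg, its server-side aggregation step is a weighted average of client parameters, with each client weighted by its number of local samples. The sketch below shows only that step, using plain Python lists; the `fedavg` function name, the dictionary layout, and the toy values are assumptions for illustration and are not taken from the paper's benchmark code.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]  # parameter name -> flat list of values


def fedavg(client_updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    first_weights = client_updates[0][0]
    averaged = {name: [0.0] * len(values) for name, values in first_weights.items()}
    for weights, n in client_updates:
        share = n / total  # this client's fraction of all training samples
        for name, values in weights.items():
            for i, v in enumerate(values):
                averaged[name][i] += share * v
    return averaged


# Two hypothetical clients holding 100 and 300 samples respectively.
clients = [
    ({"w": [1.0, 2.0], "b": [0.0]}, 100),
    ({"w": [3.0, 4.0], "b": [1.0]}, 300),
]
global_weights = fedavg(clients)
print(global_weights)
```

Under non-IID data, the client parameters being averaged can differ widely, which is the setting the abstract examines.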

Title: AdvCheck: Characterizing Adversarial Examples via Local Gradient Checking. (arXiv:2303.18131v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.18131
Code URL: null
Copy Paste: [[2303.18131] AdvCheck: Characterizing Adversarial Examples via Local Gradient Checking](http://arxiv.org/abs/2303.18131) #security
Summary:
Deep neural networks (DNNs) are vulnerable to adversarial examples, which may lead to catastrophe in security-critical domains. Numerous detection methods have been proposed to characterize the feature uniqueness of adversarial examples, or to distinguish the DNN's behavior activated by the adversarial examples. Feature-based detection cannot handle adversarial examples with large perturbations, and it requires a large number of specific adversarial examples. The other mainstream approach, model-based detection, which characterizes input properties by model behaviors, suffers from heavy computation cost. To address these issues, we introduce the concept of local gradient, and reveal that adversarial examples have a considerably larger bound of local gradient than benign ones. Inspired by this observation, we leverage local gradient for detecting adversarial examples, and propose a general framework, AdvCheck. Specifically, by calculating the local gradient from a few benign examples and noise-added misclassified examples to train a detector, adversarial examples and even misclassified natural inputs can be precisely distinguished from benign ones. Through extensive experiments, we have validated AdvCheck's superior performance over the state-of-the-art (SOTA) baselines, with a detection rate of $\sim 1.2\times$ on general adversarial attacks and $\sim 1.4\times$ on misclassified natural inputs on average, at 1/500 of the time cost on average. We also provide interpretable results for successful detection.

Title: A Comparative Analysis on Volatility and Scalability Properties of Blockchain Compression Protocols. (arXiv:2303.17643v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.17643
Code URL: null
Copy Paste: [[2303.17643] A Comparative Analysis on Volatility and Scalability Properties of Blockchain Compression Protocols](http://arxiv.org/abs/2303.17643) #security
Summary:
The increasing popularity of trading digital assets can lead to significant delays in Blockchain networks when processing transactions. When transaction fees become miners' primary revenue, an imbalance in reward may lead to miners adopting deviant mining strategies. Scaling the block capacity is one of the potential approaches to alleviate the problem. To address this issue, this paper reviews and evaluates six state-of-the-art compression protocols for Blockchains. Specifically, we designed a Monte Carlo simulation to simulate two of the six protocols to observe their compression performance under larger block capacities. Furthermore, extensive simulation experiments were conducted to observe the mining behaviour when the block capacity is increased. Experimental results reveal an interesting trade-off between volatility and scalability. When the throughput is higher than a critical point, it worsens the volatility and threatens Blockchain security. In the experiments, we further analyzed the relationship between volatility and scalability properties with respect to the distribution of transaction values. Based on the analysis results, we propose a recommended maximum block size for each protocol. Finally, we discuss further improvements to the compression protocols.

Title: Pentimento: Data Remanence in Cloud FPGAs. (arXiv:2303.17881v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.17881
Code URL: null
Copy Paste: [[2303.17881] Pentimento: Data Remanence in Cloud FPGAs](http://arxiv.org/abs/2303.17881) #security
Summary:
Cloud FPGAs strike an alluring balance between computational efficiency, energy efficiency, and cost. It is the flexibility of the FPGA architecture that enables these benefits, but that very same flexibility exposes new security vulnerabilities. We show that a remote attacker can recover "FPGA pentimenti" - long-removed secret data belonging to a prior user of a cloud FPGA. The sensitive data constituting an FPGA pentimento is an analog imprint from bias temperature instability (BTI) effects on the underlying transistors. We demonstrate how this slight degradation can be measured using a time-to-digital converter (TDC) when an adversary programs one into the target cloud FPGA.

This technique allows an attacker to ascertain previously safe information on cloud FPGAs, even after it is no longer explicitly present. Notably, it can allow an attacker who knows a non-secret "skeleton" (the physical structure, but not the contents) of the victim's design to (1) extract proprietary details from an encrypted FPGA design image available on the AWS marketplace and (2) recover data loaded at runtime by a previous user of a cloud FPGA using a known design. Our experiments show that BTI degradation (burn-in) and recovery are measurable and constitute a security threat to commercial cloud FPGAs.

Title: Machine learning for discovering laws of nature. (arXiv:2303.17607v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.17607
Code URL: null
Copy Paste: [[2303.17607] Machine learning for discovering laws of nature](http://arxiv.org/abs/2303.17607) #security
Summary:
A microscopic particle obeys the principles of quantum mechanics -- so where is the sharp boundary between the macroscopic and microscopic worlds? It was this "interpretation problem" that prompted Schrödinger to propose his famous thought experiment (a cat that is simultaneously both dead and alive) and sparked a great debate about the quantum measurement problem, for which there is still no satisfactory answer. This is precisely the inadequacy of rigorous mathematical models in describing the laws of nature. We propose a computational model to describe and understand the laws of nature based on Darwin's natural selection. In fact, whether it is a macro particle, a micro electron or a security, each can be considered an entity whose change over time can be described by a data series composed of states and values. An observer can learn from this data series to construct theories (usually consisting of functions and differential equations). We don't model with the usual functions or differential equations, but with a state Decision Tree (which determines the state of an entity) and a value Function Tree (which determines the distance between two points of an entity). A state Decision Tree and a value Function Tree together can reconstruct an entity's trajectory and make predictions about its future trajectory. Our proposed algorithmic model discovers laws of nature by only learning observed historical data (sequential measurement of observables) based on maximizing the observer's expected value. There is no differential equation in our model; our model has an emphasis on machine learning, where the observer builds up his/her experience by being rewarded or punished for each decision he/she makes, which eventually leads to rediscovering Newton's law, the Born rule (quantum mechanics) and the efficient market hypothesis (financial market).

privacy

Title: SOSR: Source-Free Image Super-Resolution with Wavelet Augmentation Transformer. (arXiv:2303.17783v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17783
Code URL: null
Copy Paste: [[2303.17783] SOSR: Source-Free Image Super-Resolution with Wavelet Augmentation Transformer](http://arxiv.org/abs/2303.17783) #privacy
Summary:
Real-world images taken by different cameras with different degradation kernels often result in a cross-device domain gap in image super-resolution. A prevalent approach to this issue is unsupervised domain adaptation (UDA), which needs access to source data. Considering privacy policies or transmission restrictions of data in many practical applications, we propose a SOurce-free image Super-Resolution framework (SOSR) to address this issue, i.e., adapt a model pre-trained on labeled source data to a target domain with only unlabeled target data. SOSR leverages the source model to generate refined pseudo-labels for teacher-student learning. To better utilize the pseudo-labels, this paper proposes a novel wavelet-based augmentation method, named Wavelet Augmentation Transformer (WAT), which can be flexibly incorporated with existing networks, to implicitly produce useful augmented data. WAT learns low-frequency information of varying levels across diverse samples, which is aggregated efficiently via deformable attention. Furthermore, an uncertainty-aware self-training mechanism is proposed to improve the accuracy of pseudo-labels, with inaccurate predictions being rectified by uncertainty estimation. To acquire better SR results and avoid overfitting pseudo-labels, several regularization losses are proposed to constrain the frequency information between target LR and SR images. Experiments show that without accessing source data, SOSR achieves superior results to the state-of-the-art UDA methods.

Title: Automatic Detection of Out-of-body Frames in Surgical Videos for Privacy Protection Using Self-supervised Learning and Minimal Labels. (arXiv:2303.18106v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.18106
Code URL: null
Copy Paste: [[2303.18106] Automatic Detection of Out-of-body Frames in Surgical Videos for Privacy Protection Using Self-supervised Learning and Minimal Labels](http://arxiv.org/abs/2303.18106) #privacy
Summary:
Endoscopic video recordings are widely used in minimally invasive robot-assisted surgery, but when the endoscope is outside the patient's body, it can capture irrelevant segments that may contain sensitive information. To address this, we propose a framework that accurately detects out-of-body frames in surgical videos by leveraging self-supervision with minimal data labels. We use a massive amount of unlabeled endoscopic images to learn meaningful representations in a self-supervised manner. Our approach, which involves pre-training on an auxiliary task and fine-tuning with limited supervision, outperforms previous methods for detecting out-of-body frames in surgical videos captured from da Vinci X and Xi surgical systems. The average F1 scores range from 96.00 to 98.02. Remarkably, using only 5% of the training labels, our approach still maintains an average F1 score performance above 97, outperforming fully-supervised methods with 95% fewer labels. These results demonstrate the potential of our framework to facilitate the safe handling of surgical video recordings and enhance data privacy protection in minimally invasive surgery.

Title: A CI-based Auditing Framework for Data Collection Practices. (arXiv:2303.17740v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.17740
Code URL: null
Copy Paste: [[2303.17740] A CI-based Auditing Framework for Data Collection Practices](http://arxiv.org/abs/2303.17740) #privacy
Summary:
Apps and devices (mobile devices, web browsers, IoT, VR, voice assistants, etc.) routinely collect user data and send it to first- and third-party servers through the network. Recently, there has been a lot of interest in (1) auditing the actual data collection practices of those systems; and also in (2) checking the consistency of those practices against the statements made in the corresponding privacy policies. In this paper, we argue that the contextual integrity (CI) tuple can be the basic building block for defining and implementing such an auditing framework. We elaborate on the special case where the tuple is partially extracted from the network traffic generated by the end-device of interest, and partially from the corresponding privacy policies using natural language processing (NLP) techniques. Along the way, we discuss related bodies of work and representative examples that fit into that framework. More generally, we believe that CI can be the building block not only for auditing at the edge, but also for specifying privacy policies and system APIs. We also discuss limitations and directions for future work.

Title: On Rényi Differential Privacy in Statistics-Based Synthetic Data Generation. (arXiv:2303.17849v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.17849
Code URL: null
Copy Paste: [[2303.17849] On Rényi Differential Privacy in Statistics-Based Synthetic Data Generation](http://arxiv.org/abs/2303.17849) #privacy
Summary:
Privacy protection with synthetic data generation often uses differentially private statistics and model parameters to quantitatively express theoretical security. However, these methods do not take into account privacy protection due to the randomness of data generation. In this paper, we theoretically evaluate Rényi differential privacy of the randomness in data generation of a synthetic data generation method that uses the mean vector and the covariance matrix of an original dataset. Specifically, for a fixed $\alpha > 1$, we show the condition of $\varepsilon$ such that the synthetic data generation satisfies $(\alpha, \varepsilon)$-Rényi differential privacy under a bounded neighboring condition and an unbounded neighboring condition, respectively. In particular, under the unbounded condition, when the size of the original dataset and synthetic dataset is 10 million, the mechanism satisfies $(4, 0.576)$-Rényi differential privacy. We also show that when we translate it into the traditional $(\varepsilon, \delta)$-differential privacy, the mechanism satisfies $(4.00, 10^{-10})$-differential privacy.

Title: Differentially Private Stream Processing at Scale. (arXiv:2303.18086v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.18086
Code URL: null
Copy Paste: [[2303.18086] Differentially Private Stream Processing at Scale](http://arxiv.org/abs/2303.18086) #privacy
Summary:
We design, to the best of our knowledge, the first differentially private (DP) stream processing system at scale. Our system, Differential Privacy SQL Pipelines (DP-SQLP), is built using a streaming framework similar to Spark streaming, and is built on top of the Spanner database and the F1 query engine from Google.

Towards designing DP-SQLP we make both algorithmic and systemic advances, namely, we (i) design a novel DP key selection algorithm that can operate on an unbounded set of possible keys, and can scale to one billion keys that users have contributed, (ii) design a preemptive execution scheme for DP key selection that avoids enumerating all the keys at each triggering time, and (iii) use algorithmic techniques from DP continual observation to release a continual DP histogram of user contributions to different keys over the stream length. We empirically demonstrate the efficacy by obtaining at least $16\times$ reduction in error over meaningful baselines we consider.

Title: PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences. (arXiv:2303.18200v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.18200
Code URL: null
Copy Paste: [[2303.18200] PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences](http://arxiv.org/abs/2303.18200) #privacy
Summary:
Data privacy and ownership are significant in social data science, raising legal and ethical concerns. Sharing and analyzing data is difficult when different parties own different parts of it. An approach to this challenge is to apply de-identification or anonymization techniques to the data before collecting it for analysis. However, this can reduce data utility and increase the risk of re-identification. To address these limitations, we present PADME, a distributed analytics tool that federates model implementation and training. PADME uses a federated approach where the model is implemented and deployed by all parties and visits each data location incrementally for training. This enables the analysis of data across locations while still allowing the model to be trained as if all data were in a single location. Training the model on data in its original location preserves data ownership. Furthermore, the results are not provided until the analysis is completed on all data locations to ensure privacy and avoid bias in the results.
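
The "model visits each data location" pattern described above can be illustrated with a toy sketch. It is not PADME's implementation: the one-parameter linear model, the SGD settings, and the names `train_locally` and `visit_sites` are assumptions chosen only to show that the model state travels while the raw records stay at each site.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair held privately at one site


def train_locally(w: float, data: List[Sample], lr: float = 0.05, epochs: int = 50) -> float:
    """Run plain SGD on squared error for the model y = w * x using one site's data."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
            w -= lr * grad
    return w


def visit_sites(sites: List[List[Sample]], w0: float = 0.0) -> float:
    """Carry the model from site to site; only the parameter w leaves each location."""
    w = w0
    for local_data in sites:
        w = train_locally(w, local_data)
    return w  # the result is released only after every site has been visited


# Three hypothetical sites whose data all follow y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0)],
    [(3.0, 6.0)],
]
final_w = visit_sites(sites)
print(final_w)
```

Returning the parameter only after the last site mirrors the abstract's point that results are withheld until the analysis has completed on all data locations.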

protect

Title: A Desynchronization-Based Countermeasure Against Side-Channel Analysis of Neural Networks. (arXiv:2303.18132v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.18132
Code URL: null
Copy Paste: [[2303.18132] A Desynchronization-Based Countermeasure Against Side-Channel Analysis of Neural Networks](http://arxiv.org/abs/2303.18132) #protect
Summary:
Model extraction attacks have been widely applied and can normally be used to recover confidential parameters of neural networks for multiple layers. Recently, side-channel analysis of neural networks has allowed parameter extraction with high effectiveness, even for networks with multiple deep layers. It is therefore of interest to implement a certain level of protection against these attacks. In this paper, we propose a desynchronization-based countermeasure that makes the timing analysis of activation functions harder. We analyze the timing properties of several activation functions and design the desynchronization in a way that the dependency on the input and the activation type is hidden. We experimentally verify the effectiveness of the countermeasure on a 32-bit ARM Cortex-M4 microcontroller and employ a t-test to show the side-channel information leakage. The overhead ultimately depends on the number of neurons in the fully-connected layer; for example, in the case of 4096 neurons in VGG-19, the overheads are between 2.8% and 11%.

Title: BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection. (arXiv:2303.18138v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.18138
Code URL: https://github.com/git-disl/bert4eth
Copy Paste: [[2303.18138] BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection](http://arxiv.org/abs/2303.18138) #protect
Summary:
As various forms of fraud proliferate on Ethereum, it is imperative to safeguard against these malicious activities to protect susceptible users from being victimized. While current studies solely rely on graph-based fraud detection approaches, it is argued that they may not be well-suited for dealing with highly repetitive, skew-distributed and heterogeneous Ethereum transactions. To address these challenges, we propose BERT4ETH, a universal pre-trained Transformer encoder that serves as an account representation extractor for detecting various fraud behaviors on Ethereum. BERT4ETH features the superior modeling capability of Transformer to capture the dynamic sequential patterns inherent in Ethereum transactions, and addresses the challenges of pre-training a BERT model for Ethereum with three practical and effective strategies, namely repetitiveness reduction, skew alleviation and heterogeneity modeling. Our empirical evaluation demonstrates that BERT4ETH outperforms state-of-the-art methods with significant enhancements in terms of the phishing account detection and de-anonymization tasks. The code for BERT4ETH is available at: https://github.com/git-disl/BERT4ETH.

Title: Robust and IP-Protecting Vertical Federated Learning against Unexpected Quitting of Parties. (arXiv:2303.18178v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.18178
Code URL: null
Copy Paste: [[2303.18178] Robust and IP-Protecting Vertical Federated Learning against Unexpected Quitting of Parties](http://arxiv.org/abs/2303.18178) #protect
Summary:
Vertical federated learning (VFL) enables a service provider (i.e., active party) who owns labeled features to collaborate with passive parties who possess auxiliary features to improve model performance. Existing VFL approaches, however, have two major vulnerabilities when passive parties unexpectedly quit in the deployment phase of VFL - severe performance degradation and intellectual property (IP) leakage of the active party's labels. In this paper, we propose Party-wise Dropout to improve the VFL model's robustness against the unexpected exit of passive parties and a defense method called DIMIP to protect the active party's IP in the deployment phase. We evaluate our proposed methods on multiple datasets against different inference attacks. The results show that Party-wise Dropout effectively maintains model performance after the passive party quits, and DIMIP successfully disguises label information from the passive party's feature extractor, thereby mitigating IP leakage.

defense

attack

Title: Fooling Polarization-based Vision using Locally Controllable Polarizing Projection. (arXiv:2303.17890v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17890
Code URL: null
Copy Paste: [[2303.17890] Fooling Polarization-based Vision using Locally Controllable Polarizing Projection](http://arxiv.org/abs/2303.17890) #attack
Summary:
Polarization is a fundamental property of light that encodes abundant information regarding surface shape, material, illumination and viewing geometry. The computer vision community has witnessed a blossoming of polarization-based vision applications, such as reflection removal, shape-from-polarization, transparent object segmentation and color constancy, partially due to the emergence of single-chip mono/color polarization sensors that make polarization data acquisition easier than ever. However, is polarization-based vision vulnerable to adversarial attacks? If so, is it possible to realize these adversarial attacks in the physical world, without being perceived by human eyes? In this paper, we warn the community of the vulnerability of polarization-based vision, which can be more serious than that of RGB-based vision. By adapting a commercial LCD projector, we achieve locally controllable polarizing projection, which is successfully utilized to fool state-of-the-art polarization-based vision algorithms for glass segmentation and color constancy. Compared with existing physical attacks on RGB-based vision, which always suffer from the trade-off between attack efficacy and eye conceivability, the adversarial attacks based on polarizing projection are contact-free and visually imperceptible, since naked human eyes can rarely perceive the difference between viciously manipulated polarizing light and ordinary illumination. This poses unprecedented risks to polarization-based vision, both in the monochromatic and trichromatic domains, for which due attention should be paid and countermeasures considered.

Title: The Blockchain Imitation Game. (arXiv:2303.17877v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.17877
Code URL: null
Copy Paste: [[2303.17877] The Blockchain Imitation Game](http://arxiv.org/abs/2303.17877) #attack
Summary:
The use of blockchains for automated and adversarial trading has become commonplace. However, due to the transparent nature of blockchains, an adversary is able to observe any pending, not-yet-mined transactions, along with their execution logic. This transparency further enables a new type of adversary, which copies and front-runs profitable pending transactions in real-time, yielding significant financial gains.

Shedding light on such "copy-paste" malpractice, this paper introduces the Blockchain Imitation Game and proposes a generalized imitation attack methodology called Ape. Leveraging dynamic program analysis techniques, Ape supports the automatic synthesis of adversarial smart contracts. Over a timeframe of one year (1st of August, 2021 to 31st of July, 2022), Ape could have yielded 148.96M USD in profit on Ethereum, and 42.70M USD on BNB Smart Chain (BSC).

Beyond its use as a malicious attack, we further show the potential of transaction and contract imitation as a defensive strategy. Within one year, we find that Ape could have successfully imitated 13 and 22 known Decentralized Finance (DeFi) attacks on Ethereum and BSC, respectively. Our findings suggest that blockchain validators can imitate attacks in real-time to prevent intrusions in DeFi.

Title: Machine-learned Adversarial Attacks against Fault Prediction Systems in Smart Electrical Grids. (arXiv:2303.18136v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.18136
Code URL: null
Copy Paste: [[2303.18136] Machine-learned Adversarial Attacks against Fault Prediction Systems in Smart Electrical Grids](http://arxiv.org/abs/2303.18136) #attack
Summary:
In smart electrical grids, fault detection tasks may have a high impact on society due to their economic and critical implications. In recent years, numerous smart grid applications, such as defect detection and load forecasting, have embraced data-driven methodologies. The purpose of this study is to investigate the challenges associated with the security of machine learning (ML) applications in the smart grid scenario. Indeed, the robustness and security of these data-driven algorithms have not been extensively studied in relation to all power grid applications. We demonstrate first that the deep neural network method used in the smart grid is susceptible to adversarial perturbation. Then, we highlight how studies on fault localization and type classification illustrate the weaknesses of present ML algorithms in smart grids to various adversarial attacks.

robust

Title: Establishing baselines and introducing TernaryMixOE for fine-grained out-of-distribution detection. (arXiv:2303.17658v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17658
Code URL: null
Copy Paste: [[2303.17658] Establishing baselines and introducing TernaryMixOE for fine-grained out-of-distribution detection](http://arxiv.org/abs/2303.17658) #robust
Summary:
Machine learning models deployed in the open world may encounter observations that they were not trained to recognize, and they risk misclassifying such observations with high confidence. Therefore, it is essential that these models are able to ascertain what is in-distribution (ID) and out-of-distribution (OOD), to avoid this misclassification. In recent years, huge strides have been made in creating models that are robust to this distinction. As a result, the current state-of-the-art has reached near perfect performance on relatively coarse-grained OOD detection tasks, such as distinguishing horses from trucks, while struggling with finer-grained classification, like differentiating models of commercial aircraft. In this paper, we describe a new theoretical framework for understanding fine- and coarse-grained OOD detection, we re-conceptualize fine grained classification into a three part problem, and we propose a new baseline task for OOD models on two fine-grained hierarchical data sets, two new evaluation methods to differentiate fine- and coarse-grained OOD performance, along with a new loss function for models in this task.

Title: Learning Garment DensePose for Robust Warping in Virtual Try-On. (arXiv:2303.17688v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17688
Code URL: null
Copy Paste: [[2303.17688] Learning Garment DensePose for Robust Warping in Virtual Try-On](http://arxiv.org/abs/2303.17688) #robust
Summary:
Virtual try-on, i.e., making people virtually try new garments, is an active research area in computer vision with great commercial applications. Current virtual try-on methods usually work in a two-stage pipeline. First, the garment image is warped on the person's pose using a flow estimation network. Then in the second stage, the warped garment is fused with the person image to render a new try-on image. Unfortunately, such methods are heavily dependent on the quality of the garment warping, which often fails when dealing with hard poses (e.g., a person lifting or crossing arms). In this work, we propose a robust warping method for virtual try-on based on a learned garment DensePose which has a direct correspondence with the person's DensePose. Due to the lack of annotated data, we show how to leverage an off-the-shelf person DensePose model and a pretrained flow model to learn the garment DensePose in a weakly supervised manner. The garment DensePose allows a robust warping to any person's pose without any additional computation. Our method achieves performance equivalent to the state of the art on virtual try-on benchmarks and shows warping robustness on in-the-wild person images with hard poses, making it more suited for real-world virtual try-on applications.

Title: Generating Adversarial Samples in Mini-Batches May Be Detrimental To Adversarial Robustness. (arXiv:2303.17720v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.17720
Code URL: null
Copy Paste: [[2303.17720] Generating Adversarial Samples in Mini-Batches May Be Detrimental To Adversarial Robustness](http://arxiv.org/abs/2303.17720) #robust
Summary:
Neural networks have been proven to be both highly effective within computer vision, and highly vulnerable to adversarial attacks. Consequently, as the use of neural networks increases due to their unrivaled performance, so too does the threat posed by adversarial attacks. In this work, we build towards addressing the challenge of adversarial robustness by exploring the relationship between the mini-batch size used during adversarial sample generation and the strength of the adversarial samples produced. We demonstrate that an increase in mini-batch size results in a decrease in the efficacy of the samples produced, and we draw connections between these observations and the phenomenon of vanishing gradients. Next, we formulate loss functions such that adversarial sample strength is not degraded by mini-batch size. Our findings highlight a potential risk for underestimating the true (practical) strength of adversarial attacks, and a risk of overestimating a model's robustness. We share our codes to let others replicate our experiments and to facilitate further exploration of the connections between batch size and adversarial sample strength.

Title: Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning. (arXiv:2303.17842v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17842
Code URL: https://github.com/object-understanding/slash
Copy Paste: [[2303.17842] Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning](http://arxiv.org/abs/2303.17842) #robust
Summary:
Object-centric learning (OCL) aspires to a general and compositional understanding of scenes by representing a scene as a collection of object-centric representations. OCL has also been extended to multi-view image and video datasets to apply various data-driven inductive biases by utilizing geometric or temporal information in the multi-image data. Single-view images carry less information about how to disentangle a given scene than videos or multi-view images do. Hence, owing to the difficulty of applying inductive biases, OCL for single-view images remains challenging, resulting in inconsistent learning of object-centric representation. To this end, we introduce a novel OCL framework for single-view images, SLot Attention via SHepherding (SLASH), which consists of two simple-yet-effective modules on top of Slot Attention. The new modules, Attention Refining Kernel (ARK) and Intermediate Point Predictor and Encoder (IPPE), respectively, prevent slots from being distracted by the background noise and indicate locations for slots to focus on to facilitate learning of object-centric representation. We also propose a weak semi-supervision approach for OCL, whilst our proposed framework can be used without any assistant annotation during the inference. Experiments show that our proposed method enables consistent learning of object-centric representation and achieves strong performance across four datasets. Code is available at https://github.com/object-understanding/SLASH.

Title: WSense: A Robust Feature Learning Module for Lightweight Human Activity Recognition. (arXiv:2303.17845v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17845
Code URL: null
Copy Paste: [[2303.17845] WSense: A Robust Feature Learning Module for Lightweight Human Activity Recognition](http://arxiv.org/abs/2303.17845) #robust
Summary:
In recent times, various modules such as squeeze-and-excitation, and others have been proposed to improve the quality of features learned from wearable sensor signals. However, these modules often cause the number of parameters to be large, which is not suitable for building lightweight human activity recognition models which can be easily deployed on end devices. In this research, we propose a feature learning module, termed WSense, which uses two 1D CNN and global max pooling layers to extract similar quality features from wearable sensor data while ignoring the difference in activity recognition models caused by the size of the sliding window. Experiments were carried out using CNN and ConvLSTM feature learning pipelines on a dataset obtained with a single accelerometer (WISDM) and another obtained using the fusion of accelerometers, gyroscopes, and magnetometers (PAMAP2) under various sliding window sizes. A total of nine hundred sixty (960) experiments were conducted to validate the WSense module against baselines and existing methods on the two datasets. The results showed that the WSense module aided pipelines in learning similar quality features and outperformed the baselines and existing models with a minimal and uniform model size across all sliding window segmentations. The code is available at https://github.com/AOige/WSense.

Title: MapFormer: Boosting Change Detection by Using Pre-change Information. (arXiv:2303.17859v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17859
Code URL: null
Copy Paste: [[2303.17859] MapFormer: Boosting Change Detection by Using Pre-change Information](http://arxiv.org/abs/2303.17859) #robust
Summary:
Change detection in remote sensing imagery is essential for a variety of applications such as urban planning, disaster management, and climate research. However, existing methods for identifying semantically changed areas overlook the availability of semantic information in the form of existing maps describing features of the earth's surface. In this paper, we leverage this information for change detection in bi-temporal images. We show that the simple integration of the additional information via concatenation of latent representations suffices to significantly outperform state-of-the-art change detection methods. Motivated by this observation, we propose the new task of Conditional Change Detection, where pre-change semantic information is used as input next to bi-temporal images. To fully exploit the extra information, we propose MapFormer, a novel architecture based on a multi-modal feature fusion module that allows for feature processing conditioned on the available semantic information. We further employ a supervised, cross-modal contrastive loss to guide the learning of visual representations. Our approach outperforms existing change detection methods by an absolute 11.7% and 18.4% in terms of binary change IoU on DynamicEarthNet and HRSCD, respectively. Furthermore, we demonstrate the robustness of our approach to the quality of the pre-change semantic information and the absence of pre-change imagery. The code will be made publicly available.
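
For readers unfamiliar with the metric quoted in this summary, binary change IoU can be sketched as follows. This is a minimal illustration on flat 0/1 masks, not the authors' evaluation code; the function name `binary_change_iou` and the convention of returning 1.0 when both masks are empty are assumptions made here.

```python
def binary_change_iou(pred, target):
    """Intersection over union of the 'changed' class for two flat 0/1 masks.

    Hypothetical helper for illustration; real pipelines usually accumulate
    the counts over a whole dataset before dividing.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Pixels marked as changed in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Pixels marked as changed in at least one mask.
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


score = binary_change_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

An "absolute" gain, as reported above, is a difference in percentage points between two such scores rather than a relative improvement.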

Title: STFAR: Improving Object Detection Robustness at Test-Time by Self-Training with Feature Alignment Regularization. (arXiv:2303.17937v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17937
Code URL: null
Copy Paste: [[2303.17937] STFAR: Improving Object Detection Robustness at Test-Time by Self-Training with Feature Alignment Regularization](http://arxiv.org/abs/2303.17937) #robust
Summary:
Domain adaptation helps generalize object detection models to target domain data with distribution shift. It is often achieved by adapting with access to the whole target domain data. In a more realistic scenario, the target distribution is often unpredictable until the inference stage. This motivates us to explore adapting an object detection model at test-time, a.k.a. test-time adaptation (TTA). In this work, we approach test-time adaptive object detection (TTAOD) from two perspectives. First, we adopt a self-training paradigm to generate pseudo labeled objects with an exponential moving average model. The pseudo labels are further used to supervise adapting the source domain model. As self-training is prone to incorrect pseudo labels, we further incorporate aligning feature distributions at two output levels as regularizations to self-training. To validate the performance on TTAOD, we create benchmarks based on three standard object detection datasets and adapt generic TTA methods to the object detection task. Extensive evaluations suggest our proposed method sets the state-of-the-art on the test-time adaptive object detection task.
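
The exponential moving average model mentioned in this summary can be sketched as below. This is a generic EMA weight update under stated assumptions, not STFAR's implementation: parameters are plain floats in a dict, and the name `ema_update` and the `momentum` value are hypothetical.

```python
def ema_update(teacher, student, momentum=0.99):
    """Return new teacher weights as an exponential moving average.

    teacher, student: dicts mapping parameter names to floats (a stand-in
    for real tensors). momentum is an assumed hyperparameter; values close
    to 1 make the teacher change slowly.
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, momentum=0.9)
```

In self-training setups of this kind, the slowly moving teacher typically produces the pseudo labels while the student is the model being adapted.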

Title: RDMNet: Reliable Dense Matching Based Point Cloud Registration for Autonomous Driving. (arXiv:2303.18084v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.18084
Code URL: null
Copy Paste: [[2303.18084] RDMNet: Reliable Dense Matching Based Point Cloud Registration for Autonomous Driving](http://arxiv.org/abs/2303.18084) #robust
Summary:
Point cloud registration is an important task in robotics and autonomous driving to estimate the ego-motion of the vehicle. Recent advances following the coarse-to-fine manner show promising potential in point cloud registration. However, existing methods rely on good superpoint correspondences, which are hard to obtain reliably and efficiently, thus resulting in less robust and accurate point cloud registration. In this paper, we propose a novel network, named RDMNet, to find dense point correspondences coarse-to-fine and improve final pose estimation based on such reliable correspondences. Our RDMNet uses a devised 3D-RoFormer mechanism to first extract distinctive superpoints and generate reliable superpoint matches between two point clouds. The proposed 3D-RoFormer fuses 3D position information into the transformer network, efficiently exploiting point clouds' contextual and geometric information to generate robust superpoint correspondences. RDMNet then propagates the sparse superpoint matches to dense point matches using the neighborhood information for accurate point cloud registration. We extensively evaluate our method on multiple datasets from different environments. The experimental results demonstrate that our method outperforms existing state-of-the-art approaches in all tested datasets with a strong generalization ability.

Title: Markerless 3D human pose tracking through multiple cameras and AI: Enabling high accuracy, robustness, and real-time performance. (arXiv:2303.18119v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.18119
Code URL: null
Copy Paste: [[2303.18119] Markerless 3D human pose tracking through multiple cameras and AI: Enabling high accuracy, robustness, and real-time performance](http://arxiv.org/abs/2303.18119) #robust
Summary:
Tracking 3D human motion in real-time is crucial for numerous applications across many fields. Traditional approaches involve attaching artificial fiducial objects or sensors to the body, limiting their usability and comfort-of-use and consequently narrowing their application fields. Recent advances in Artificial Intelligence (AI) have allowed for markerless solutions. However, most of these methods operate in 2D, while those providing 3D solutions compromise accuracy and real-time performance. To address this challenge and unlock the potential of visual pose estimation methods in real-world scenarios, we propose a markerless framework that combines multi-camera views and 2D AI-based pose estimation methods to track 3D human motion. Our approach integrates a Weighted Least Square (WLS) algorithm that computes 3D human motion from multiple 2D pose estimations provided by an AI-driven method. The method is integrated within the Open-VICO framework allowing simulation and real-world execution. Several experiments have been conducted, which have shown high accuracy and real-time performance, demonstrating the high level of readiness for real-world applications and the potential to revolutionize human motion capture.
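
To make the weighted least squares step concrete, here is a minimal sketch of weighted linear triangulation of a single joint from several calibrated views. It is one common formulation and is not taken from the paper: the function names, the use of one confidence weight per camera, and the 3x4 projection matrices given as nested lists are all assumptions.

```python
def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate camera configuration")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for i in (2, 1, 0):
        s = sum(a[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (a[i][3] - s) / a[i][i]
    return x


def triangulate_wls(projections, points_2d, weights):
    """Weighted linear triangulation of one 3D point.

    projections: 3x4 camera matrices, one per view (assumed known).
    points_2d:   (u, v) detections of the same joint, one per view.
    weights:     non-negative confidences, one per view (assumed here to
                 come from the 2D pose estimator).
    """
    ata = [[0.0] * 3 for _ in range(3)]  # accumulates A^T W A
    atb = [0.0] * 3                      # accumulates A^T W b
    for P, (u, v), w in zip(projections, points_2d, weights):
        for coord, row in ((u, P[0]), (v, P[1])):
            # Each view gives coord * (P[2] . Xh) - row . Xh = 0, Xh = (X, Y, Z, 1).
            a = [coord * P[2][k] - row[k] for k in range(3)]
            b = -(coord * P[2][3] - row[3])
            for i in range(3):
                atb[i] += w * a[i] * b
                for j in range(3):
                    ata[i][j] += w * a[i] * a[j]
    return solve3(ata, atb)


# Two toy cameras: one at the origin, one shifted by one unit along x.
P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
P2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]
joint = triangulate_wls([P1, P2], [(0.25, 0.1), (-0.25, 0.1)], [1.0, 0.5])
```

Down-weighting a view reduces the influence of an unreliable 2D detection on the recovered 3D position, which is the usual motivation for weighting the least squares problem.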

Title: Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction. (arXiv:2303.18125v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.18125
Code URL: null
Copy Paste: [[2303.18125] Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction](http://arxiv.org/abs/2303.18125) #robust
Summary:
This paper addresses the problem of rolling shutter correction in complex nonlinear and dynamic scenes with extreme occlusion. Existing methods suffer from two main drawbacks. Firstly, they face challenges in estimating the accurate correction field due to the uniform velocity assumption, leading to significant image correction errors under complex motion. Secondly, the drastic occlusion in dynamic scenes prevents current solutions from achieving better image quality because of the inherent difficulties in aligning and aggregating multiple frames. To tackle these challenges, we model the curvilinear trajectory of pixels analytically and propose a geometry-based Quadratic Rolling Shutter (QRS) motion solver, which precisely estimates the high-order correction field of individual pixels. Besides, to reconstruct high-quality occlusion frames in dynamic scenes, we present a 3D video architecture that effectively Aligns and Aggregates multi-frame context, namely, RSA^2-Net. We evaluate our method across a broad range of cameras and video sequences, demonstrating its significant superiority. Specifically, our method surpasses the state of the art by +4.98, +0.77, and +4.33 in PSNR on the Carla-RS, Fastec-RS, and BS-RSC datasets, respectively.

Title: Diff-ID: An Explainable Identity Difference Quantification Framework for DeepFake Detection. (arXiv:2303.18174v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.18174
Code URL: null
Copy Paste: [[2303.18174] Diff-ID: An Explainable Identity Difference Quantification Framework for DeepFake Detection](http://arxiv.org/abs/2303.18174) #robust
Summary:
Despite the fact that DeepFake forgery detection algorithms have achieved impressive performance on known manipulations, they often face disastrous performance degradation when generalized to an unseen manipulation. Some recent works show improvement in generalization but rely on features fragile to image distortions such as compression. To this end, we propose Diff-ID, a concise and effective approach that explains and measures the identity loss induced by facial manipulations. When testing on an image of a specific person, Diff-ID utilizes an authentic image of that person as a reference and aligns them to the same identity-insensitive attribute feature space by applying a face-swapping generator. We then visualize the identity loss between the test and the reference image from the image differences of the aligned pairs, and design a custom metric to quantify the identity loss. The metric is then proved to be effective in distinguishing the forgery images from the real ones. Extensive experiments show that our approach achieves high detection performance on DeepFake images and state-of-the-art generalization ability to unknown forgery methods, while also being robust to image distortions.

Title: DIME-FM: DIstilling Multimodal and Efficient Foundation Models. (arXiv:2303.18232v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.18232
Code URL: null
Copy Paste: [[2303.18232] DIME-FM: DIstilling Multimodal and Efficient Foundation Models](http://arxiv.org/abs/2303.18232) #robust
Summary:
Large Vision-Language Foundation Models (VLFM), such as CLIP, ALIGN and Florence, are trained on large-scale datasets of image-caption pairs and achieve superior transferability and robustness on downstream tasks, but they are difficult to use in many practical applications due to their large size, high latency and fixed architectures. Unfortunately, recent work shows training a small custom VLFM for resource-limited applications is currently very difficult using public and smaller-scale data. In this paper, we introduce a new distillation mechanism (DIME-FM) that allows us to transfer the knowledge contained in large VLFMs to smaller, customized foundation models using a relatively small amount of inexpensive, unpaired images and sentences. We transfer the knowledge from the pre-trained CLIP-ViTL/14 model to a ViT-B/32 model, with only 40M public images and 28.4M unpaired public sentences. The resulting model "Distill-ViT-B/32" rivals the CLIP-ViT-B/32 model pre-trained on its private WiT dataset (400M image-text pairs): Distill-ViT-B/32 achieves similar results in terms of zero-shot and linear-probing performance on both ImageNet and the ELEVATER (20 image classification tasks) benchmarks. It also displays comparable robustness when evaluated on five datasets with natural distribution shifts from ImageNet.

Title: Exploiting Multilingualism in Low-resource Neural Machine Translation via Adversarial Learning. (arXiv:2303.18011v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.18011
Code URL: null
Copy Paste: [[2303.18011] Exploiting Multilingualism in Low-resource Neural Machine Translation via Adversarial Learning](http://arxiv.org/abs/2303.18011) #robust
Summary:
Generative Adversarial Networks (GAN) offer a promising approach for Neural Machine Translation (NMT). However, feeding multiple morphologically languages into a single model during training reduces the NMT's performance. In GAN, similar to bilingual models, multilingual NMT only considers one reference translation for each sentence during model training. This single reference translation limits the GAN model from learning sufficient information about the source sentence representation. Thus, in this article, we propose Denoising Adversarial Auto-encoder-based Sentence Interpolation (DAASI) approach to perform sentence interpolation by learning the intermediate latent representation of the source and target sentences of multilingual language pairs. Apart from latent representation, we also use the Wasserstein-GAN approach for the multilingual NMT model by incorporating the model generated sentences of multiple languages for reward computation. This computed reward optimizes the performance of the GAN-based multilingual model in an effective manner. We demonstrate the experiments on low-resource language pairs and find that our approach outperforms the existing state-of-the-art approaches for multilingual NMT with a performance gain of up to 4 BLEU points. Moreover, we use our trained model on zero-shot language pairs under an unsupervised scenario and show the robustness of the proposed approach.

Title: Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency. (arXiv:2303.18191v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.18191
Code URL: https://github.com/cgcl-codes/teco
Copy Paste: [[2303.18191] Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency](http://arxiv.org/abs/2303.18191) #robust
Summary:
Deep neural networks are proven to be vulnerable to backdoor attacks. Detecting the trigger samples during the inference stage, i.e., the test-time trigger sample detection, can prevent the backdoor from being triggered. However, existing detection methods often require the defenders to have high accessibility to victim models, extra clean data, or knowledge about the appearance of backdoor triggers, limiting their practicality. In this paper, we propose the test-time corruption robustness consistency evaluation (TeCo), a novel test-time trigger sample detection method that only needs the hard-label outputs of the victim models without any extra information. Our journey begins with the intriguing observation that the backdoor-infected models have similar performance across different image corruptions for the clean images, but perform discrepantly for the trigger samples. Based on this phenomenon, we design TeCo to evaluate test-time robustness consistency by calculating the deviation of severity that leads to predictions' transition across different corruptions. Extensive experiments demonstrate that TeCo outperforms state-of-the-art defenses, even those that require certain information about the trigger types or access to clean data, on different backdoor attacks, datasets, and model architectures, achieving a 10% higher AUROC and 5 times the stability.
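
The scoring idea described in this summary (the deviation, across corruption types, of the severity at which the hard label flips) can be sketched roughly as follows. This is an illustrative reading of the abstract, not the released TeCo code: `predict`, the corruption callables, the severity range, and the use of the population standard deviation are all assumptions.

```python
from statistics import pstdev


def first_flip_severity(predict, corrupt, image, max_severity=5):
    """Lowest severity at which the hard label changes.

    Returns max_severity + 1 if the label never changes (assumed convention).
    predict(image) -> label and corrupt(image, severity) -> image are
    hypothetical callables supplied by the caller.
    """
    clean_label = predict(image)
    for severity in range(1, max_severity + 1):
        if predict(corrupt(image, severity)) != clean_label:
            return severity
    return max_severity + 1


def consistency_score(predict, corruptions, image, max_severity=5):
    """Spread of flip severities across corruption types.

    Following the abstract, a larger spread is treated as more suspicious,
    since clean inputs are reported to behave consistently across corruptions.
    """
    flips = [
        first_flip_severity(predict, corrupt, image, max_severity)
        for corrupt in corruptions
    ]
    return pstdev(flips)


# Toy stand-ins: the "image" is an integer and the "model" thresholds it.
toy_predict = lambda x: x > 5
mild = lambda x, s: x - 2 * s    # label flips at severity 3 for x = 10
harsh = lambda x, s: x - 6 * s   # label flips at severity 1 for x = 10
score = consistency_score(toy_predict, [mild, harsh], 10)
```

Only hard labels are queried, which matches the black-box setting the summary describes; a detection threshold on the score would still need to be chosen.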

Title: Towards Adversarially Robust Continual Learning. (arXiv:2303.17764v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.17764
Code URL: null
Copy Paste: [[2303.17764] Towards Adversarially Robust Continual Learning](http://arxiv.org/abs/2303.17764) #robust
Summary:
Recent studies show that models trained by continual learning can achieve performance comparable to standard supervised learning, and the learning flexibility of continual learning models enables their wide application in the real world. Deep learning models, however, are shown to be vulnerable to adversarial attacks. Though there are many studies on model robustness in the context of standard supervised learning, protecting continual learning from adversarial attacks has not yet been investigated. To fill this research gap, we are the first to study adversarial robustness in continual learning and propose a novel method called Task-Aware Boundary Augmentation (TABA) to boost the robustness of continual learning models. With extensive experiments on CIFAR-10 and CIFAR-100, we show the efficacy of adversarial training and TABA in defending against adversarial attacks.

Title: Conflict-Averse Gradient Optimization of Ensembles for Effective Offline Model-Based Optimization. (arXiv:2303.17934v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.17934
Code URL: null
Copy Paste: [[2303.17934] Conflict-Averse Gradient Optimization of Ensembles for Effective Offline Model-Based Optimization](http://arxiv.org/abs/2303.17934) #robust
Summary:
Data-driven offline model-based optimization (MBO) is an established practical approach to black-box computational design problems for which the true objective function is unknown and expensive to query. However, the standard approach which optimizes designs against a learned proxy model of the ground truth objective can suffer from distributional shift. Specifically, in high-dimensional design spaces where valid designs lie on a narrow manifold, the standard approach is susceptible to producing out-of-distribution, invalid designs that "fool" the learned proxy model into outputting a high value. Using an ensemble rather than a single model as the learned proxy can help mitigate distribution shift, but naive formulations for combining gradient information from the ensemble, such as minimum or mean gradient, are still suboptimal and often hampered by non-convergent behavior.

In this work, we explore alternate approaches for combining gradient information from the ensemble that are robust to distribution shift without compromising optimality of the produced designs. More specifically, we explore two functions, formulated as convex optimization problems, for combining gradient information: multiple gradient descent algorithm (MGDA) and conflict-averse gradient descent (CAGrad). We evaluate these algorithms on a diverse set of five computational design tasks. We compare performance of ensemble MBO with MGDA and ensemble MBO with CAGrad with three naive baseline algorithms: (a) standard single-model MBO, (b) ensemble MBO with mean gradient, and (c) ensemble MBO with minimum gradient.

Our results suggest that MGDA and CAGrad strike a desirable balance between conservatism and optimality and can help robustify data-driven offline MBO without compromising optimality of designs.
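
As a rough illustration of the gradient-combination step discussed in this summary, the sketch below contrasts the mean-gradient baseline with the closed-form MGDA solution for the special case of two ensemble members, i.e. the minimum-norm point on the segment between the two gradients. It is not the paper's implementation: the general case with more members requires solving a small convex program, CAGrad is not shown, and the function names are hypothetical.

```python
def mean_gradient(g1, g2):
    """Baseline: element-wise average of the two gradients."""
    return [(a + b) / 2.0 for a, b in zip(g1, g2)]


def mgda_two(g1, g2):
    """Minimum-norm convex combination gamma * g1 + (1 - gamma) * g2.

    Closed form for two gradients: minimize the squared norm over
    gamma in [0, 1], then clip the unconstrained optimum to that interval.
    """
    diff_sq = sum((a - b) ** 2 for a, b in zip(g1, g2))
    if diff_sq == 0.0:
        return list(g1)  # identical gradients, nothing to balance
    gamma = sum((b - a) * b for a, b in zip(g1, g2)) / diff_sq
    gamma = min(1.0, max(0.0, gamma))
    return [gamma * a + (1.0 - gamma) * b for a, b in zip(g1, g2)]


balanced = mgda_two([1.0, 0.0], [0.0, 1.0])   # orthogonal gradients
dominated = mgda_two([1.0, 0.0], [2.0, 0.0])  # same direction, different scale
```

When the two gradients point the same way, the minimum-norm combination falls back to the smaller one, which gives some intuition for why such schemes behave more conservatively than plain averaging.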

Title: Deep neural operator for learning transient response of interpenetrating phase composites subject to dynamic loading. (arXiv:2303.18055v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.18055
Code URL: null
Copy Paste: [[2303.18055] Deep neural operator for learning transient response of interpenetrating phase composites subject to dynamic loading](http://arxiv.org/abs/2303.18055) #robust
Summary:
Additive manufacturing has been recognized as an industrial technological revolution for manufacturing, which allows fabrication of materials with complex three-dimensional (3D) structures directly from computer-aided design models. The mechanical properties of interpenetrating phase composites (IPCs), especially their response to dynamic loading, highly depend on their 3D structures. In general, for each specified structural design, it could take hours or days to perform either finite element analysis (FEA) or experiments to test the mechanical response of IPCs to a given dynamic load. To accelerate the physics-based prediction of mechanical properties of IPCs for various structural designs, we employ a deep neural operator (DNO) to learn the transient response of IPCs under dynamic loading as a surrogate of physics-based FEA models. We consider a 3D IPC beam formed by two metals with a ratio of Young's modulus of 2.7, wherein random blocks of constituent materials are used to demonstrate the generality and robustness of the DNO model. To obtain FEA results of IPC properties, 5,000 random time-dependent strain loads generated by a Gaussian process kernel are applied to the 3D IPC beam, and the reaction forces and stress fields inside the IPC beam under various loading are collected. Subsequently, the DNO model is trained using an incremental learning method with sequence-to-sequence training implemented in JAX, leading to a 100X speedup compared to widely used vanilla deep operator network models. After offline training, the DNO model can act as a surrogate of physics-based FEA to predict the transient mechanical response in terms of reaction force and stress distribution of the IPCs to various strain loads in one second at an accuracy of 98%. Also, the learned operator is able to provide extended prediction of the IPC beam subject to longer random strain loads with reasonably good accuracy.

Title: Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks. (arXiv:2303.18083v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.18083
Code URL: null
Copy Paste: [[2303.18083] Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks](http://arxiv.org/abs/2303.18083) #robust
Summary:
As a second-order method, the Natural Gradient Descent (NGD) has the ability to accelerate training of neural networks. However, due to the prohibitive computational and memory costs of computing and inverting the Fisher Information Matrix (FIM), efficient approximations are necessary to make NGD scalable to Deep Neural Networks (DNNs). Many such approximations have been attempted. The most sophisticated of these is KFAC, which approximates the FIM as a block-diagonal matrix, where each block corresponds to a layer of the neural network. By doing so, KFAC ignores the interactions between different layers. In this work, we investigate the benefit of restoring some low-frequency interactions between the layers by means of two-level methods. Inspired by domain decomposition, several two-level corrections to KFAC using different coarse spaces are proposed and assessed. The obtained results show that incorporating the layer interactions in this fashion does not really improve the performance of KFAC. This suggests that it is safe to discard the off-diagonal blocks of the FIM, since the block-diagonal approach is sufficiently robust, accurate and economical in computation time.

biometric

steal

extraction

Title: Knowledge Distillation for Feature Extraction in Underwater VSLAM. (arXiv:2303.17981v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17981
Code URL: https://github.com/jinghe-mel/ufen-slam
Copy Paste: [[2303.17981] Knowledge Distillation for Feature Extraction in Underwater VSLAM](http://arxiv.org/abs/2303.17981) #extraction
Summary:
In recent years, learning-based feature detection and matching have outperformed manually-designed methods in in-air cases. However, it is challenging to learn the features in the underwater scenario due to the absence of annotated underwater datasets. This paper proposes a cross-modal knowledge distillation framework for training an underwater feature detection and matching network (UFEN). In particular, we use in-air RGBD data to generate synthetic underwater images based on a physical underwater imaging formation model and employ these as the medium to distil knowledge from a teacher model SuperPoint pretrained on in-air images. We embed UFEN into the ORB-SLAM3 framework to replace the ORB feature by introducing an additional binarization layer. To test the effectiveness of our method, we built a new underwater dataset with groundtruth measurements named EASI (https://github.com/Jinghe-mel/UFEN-SLAM), recorded in an indoor water tank for different turbidity levels. The experimental results on the existing dataset and our new dataset demonstrate the effectiveness of our method.

Title: Task Oriented Conversational Modelling With Subjective Knowledge. (arXiv:2303.17695v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.17695
Code URL: https://github.com/raja-kumar/knowledge-grounded-tods
Copy Paste: [[2303.17695] Task Oriented Conversational Modelling With Subjective Knowledge](http://arxiv.org/abs/2303.17695) #extraction
Summary:
Existing conversational models are handled by database (DB) and API based systems. However, very often users' questions require information that cannot be handled by such systems. Nonetheless, answers to these questions are available in the form of customer reviews and FAQs. DSTC-11 proposes a three-stage pipeline consisting of knowledge-seeking turn detection, knowledge selection and response generation to create a conversational model grounded on this subjective knowledge. In this paper, we focus on improving the knowledge selection module to enhance the overall system performance. In particular, we propose entity retrieval methods which result in an accurate and faster knowledge search. Our proposed Named Entity Recognition (NER) based entity retrieval method results in 7X faster search compared to the baseline model. Additionally, we explore a potential keyword extraction method which can improve the accuracy of knowledge selection. Preliminary results show a 4% improvement in exact match score on the knowledge selection task. The code is available at https://github.com/raja-kumar/knowledge-grounded-TODS

Title: Evaluation of GPT and BERT-based models on identifying protein-protein interactions in biomedical text. (arXiv:2303.17728v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.17728
Code URL: null
Copy Paste: [[2303.17728] Evaluation of GPT and BERT-based models on identifying protein-protein interactions in biomedical text](http://arxiv.org/abs/2303.17728) #extraction
Summary:
Detecting protein-protein interactions (PPIs) is crucial for understanding genetic mechanisms, disease pathogenesis, and drug design. However, with the fast-paced growth of biomedical literature, there is a growing need for automated and accurate extraction of PPIs to facilitate scientific knowledge discovery. Pre-trained language models, such as generative pre-trained transformer (GPT) and bidirectional encoder representations from transformers (BERT), have shown promising results in natural language processing (NLP) tasks. We evaluated the PPI identification performance of various GPT and BERT models using a manually curated benchmark corpus of 164 PPIs in 77 sentences from learning language in logic (LLL). BERT-based models achieved the best overall performance, with PubMedBERT achieving the highest precision (85.17%) and F1-score (86.47%) and BioM-ALBERT achieving the highest recall (93.83%). Despite not being explicitly trained for biomedical texts, GPT-4 achieved comparable performance to the best BERT models with 83.34% precision, 76.57% recall, and 79.18% F1-score. These findings suggest that GPT models can effectively detect PPIs from text data and have the potential for use in biomedical literature mining tasks.

Title: JobHam-place with smart recommend job options and candidate filtering options. (arXiv:2303.17930v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.17930
Code URL: null
Copy Paste: [[2303.17930] JobHam-place with smart recommend job options and candidate filtering options](http://arxiv.org/abs/2303.17930) #extraction
Summary:
Due to the increasing number of graduates, many applicants struggle to find a job, and employers have difficulty filtering job applicants, which might negatively impact their effectiveness. However, most job-hunting websites lack integrated job recommendation and CV filtering or ranking functionality. This project therefore builds a smart job-hunting site that combines these functions: job recommendation, CV ranking, and a job dashboard covering skills and job-applicant functionality. Job recommendation and CV ranking start with automatic keyword extraction and end with the job/CV ranking algorithm. Automatic keyword extraction is implemented by the Job2Skill and CV2Skill models, which are based on BERT. Job2Skill consists of two components, a text encoder and GRU-based layers, while CV2Skill is mainly based on BERT and fine-tunes the pre-trained model on the Resume-Entity dataset. To match skills from CVs and job descriptions and to rank lists of jobs and candidates, the job/CV ranking algorithms compute the occurrence ratio of skill words based on TF-IDF score and the match ratio of the total skill numbers. In addition, some advanced features have been integrated into the website to improve the user experience, such as the calendar and sweetalert2 plugins, along with basic features for the job application process, such as job application tracking and interview arrangement.

Title: Dataset and Baseline System for Multi-lingual Extraction and Normalization of Temporal and Numerical Expressions. (arXiv:2303.18103v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.18103
Code URL: null
Copy Paste: [[2303.18103] Dataset and Baseline System for Multi-lingual Extraction and Normalization of Temporal and Numerical Expressions](http://arxiv.org/abs/2303.18103) #extraction
Summary:
Temporal and numerical expression understanding is of great importance in many downstream Natural Language Processing (NLP) and Information Retrieval (IR) tasks. However, much previous work covers only a few sub-types and focuses only on entity extraction, which severely limits the usability of identified mentions. In order for such entities to be useful in downstream scenarios, coverage and granularity of sub-types are important; and, even more so, providing resolution into concrete values that can be manipulated. Furthermore, most previous work addresses only a handful of languages. Here we describe a multi-lingual evaluation dataset - NTX - covering diverse temporal and numerical expressions across 14 languages and covering extraction, normalization, and resolution. Along with the dataset we provide a robust rule-based system as a strong baseline for comparisons against other models to be evaluated in this dataset. Data and code are available at https://aka.ms/NTX.

membership infer

federate

Title: Federated Learning for Metaverse: A Survey. (arXiv:2303.17987v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2303.17987
Code URL: null
Copy Paste: [[2303.17987] Federated Learning for Metaverse: A Survey](http://arxiv.org/abs/2303.17987) #federate
Summary:
The metaverse, which is at the stage of innovation and exploration, faces the dilemma of data collection and the problem of private data leakage in the process of development. This can seriously hinder the widespread deployment of the metaverse. Fortunately, federated learning (FL) is a solution to the above problems. FL is a distributed machine learning paradigm with privacy-preserving features designed for a large number of edge devices. Federated learning for metaverse (FL4M) will be a powerful tool because FL allows edge devices to participate in training tasks locally using their own data, computational power, and model-building capabilities. Applying FL to the metaverse not only protects the data privacy of participants but also reduces the need for high computing power and high memory on servers. Until now, there have been many studies about FL and the metaverse, respectively. In this paper, we review some of the early advances of FL4M, which will be a research direction with unlimited development potential. We first introduce the concepts of metaverse and FL, respectively. Besides, we discuss the convergence of key metaverse technologies and FL in detail, such as big data, communication technology, the Internet of Things, edge computing, blockchain, and extended reality. Finally, we discuss some key challenges and promising directions of FL4M in detail. In summary, we hope that our up-to-date brief survey can help people better understand FL4M and build a fair, open, and secure metaverse.

Title: Accelerating Wireless Federated Learning via Nesterov's Momentum and Distributed Principle Component Analysis. (arXiv:2303.17885v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.17885
Code URL: null
Copy Paste: [[2303.17885] Accelerating Wireless Federated Learning via Nesterov's Momentum and Distributed Principle Component Analysis](http://arxiv.org/abs/2303.17885) #federate
Summary:
A wireless federated learning system is investigated by allowing a server and workers to exchange uncoded information via orthogonal wireless channels. Since the workers frequently upload local gradients to the server via bandwidth-limited channels, the uplink transmission from the workers to the server becomes a communication bottleneck. Therefore, a one-shot distributed principal component analysis (PCA) is leveraged to reduce the dimension of uploaded gradients such that the communication bottleneck is relieved. A PCA-based wireless federated learning (PCA-WFL) algorithm and its accelerated version (i.e., PCA-AWFL) are proposed based on the low-dimensional gradients and Nesterov's momentum. For non-convex loss functions, a finite-time analysis is performed to quantify the impacts of system hyper-parameters on the convergence of the PCA-WFL and PCA-AWFL algorithms. The PCA-AWFL algorithm is theoretically certified to converge faster than the PCA-WFL algorithm. Besides, the convergence rates of the PCA-WFL and PCA-AWFL algorithms quantitatively reveal the linear speedup with respect to the number of workers over the vanilla gradient descent algorithm. Numerical results are used to demonstrate the improved convergence rates of the proposed PCA-WFL and PCA-AWFL algorithms over the benchmarks.
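
As a rough illustration of the momentum idea mentioned in the summary above, here is a minimal sketch of Nesterov's accelerated gradient on a one-dimensional quadratic. It is not the PCA-AWFL algorithm from the paper: the function name `nesterov_descent`, the step size, the momentum value, and the toy objective are all assumptions chosen for illustration, and no PCA compression or wireless channel is modeled.

```python
def nesterov_descent(grad, x0, lr=0.1, momentum=0.9, steps=200):
    """Minimize a 1-D function with Nesterov's momentum (illustrative only)."""
    x, v = x0, 0.0
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point, not at x itself.
        lookahead = x + momentum * v
        v = momentum * v - lr * grad(lookahead)
        x = x + v
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = nesterov_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

The only difference from classical (heavy-ball) momentum is that the gradient is taken at the look-ahead point `x + momentum * v`.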

fair

Title: FairGen: Towards Fair Graph Generation. (arXiv:2303.17743v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.17743
Code URL: null
Copy Paste: [[2303.17743] FairGen: Towards Fair Graph Generation](http://arxiv.org/abs/2303.17743) #fair
Summary:
There have been tremendous efforts over the past decades dedicated to the generation of realistic graphs in a variety of domains, ranging from social networks to computer networks, from gene regulatory networks to online transaction networks. Despite the remarkable success, the vast majority of these works are unsupervised in nature and are typically trained to minimize the expected graph reconstruction loss, which would result in the representation disparity issue in the generated graphs, i.e., the protected groups (often minorities) contribute less to the objective and thus suffer from systematically higher errors. In this paper, we aim to tailor graph generation to downstream mining tasks by leveraging label information and user-preferred parity constraint. In particular, we start from the investigation of representation disparity in the context of graph generative models. To mitigate the disparity, we propose a fairness-aware graph generative model named FairGen. Our model jointly trains a label-informed graph generation module and a fair representation learning module by progressively learning the behaviors of the protected and unprotected groups, from the 'easy' concepts to the 'hard' ones. In addition, we propose a generic context sampling strategy for graph generative models, which is proven to be capable of fairly capturing the contextual information of each group with a high probability. Experimental results on seven real-world data sets, including web-based graphs, demonstrate that FairGen (1) obtains performance on par with state-of-the-art graph generative models across six network properties, (2) mitigates the representation disparity issues in the generated graphs, and (3) substantially boosts the model performance by up to 17% in downstream tasks via data augmentation.

Title: WebQAmGaze: A Multilingual Webcam Eye-Tracking-While-Reading Dataset. (arXiv:2303.17876v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2303.17876
Code URL: null
Copy Paste: [[2303.17876] WebQAmGaze: A Multilingual Webcam Eye-Tracking-While-Reading Dataset](http://arxiv.org/abs/2303.17876) #fair
Summary:
We create WebQAmGaze, a multilingual low-cost eye-tracking-while-reading dataset, designed to support the development of fair and transparent NLP models. WebQAmGaze includes webcam eye-tracking data from 332 participants naturally reading English, Spanish, and German texts. Each participant performs two reading tasks composed of five texts, a normal reading and an information-seeking task. After preprocessing the data, we find that fixations on relevant spans seem to indicate correctness when answering the comprehension questions. Additionally, we perform a comparative analysis of the data collected to high-quality eye-tracking data. The results show a moderate correlation between the features obtained with the webcam-ET compared to those of a commercial ET device. We believe this data can advance webcam-based reading studies and open a way to cheaper and more accessible data collection. WebQAmGaze is useful to learn about the cognitive processes behind question answering (QA) and to apply these insights to computational models of language understanding.

Title: Mitigating Source Bias for Fairer Weak Supervision. (arXiv:2303.17713v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.17713
Code URL: null
Copy Paste: [[2303.17713] Mitigating Source Bias for Fairer Weak Supervision](http://arxiv.org/abs/2303.17713) #fair
Summary:
Weak supervision overcomes the label bottleneck, enabling efficient development of training sets. Millions of models trained on such datasets have been deployed in the real world and interact with users on a daily basis. However, the techniques that make weak supervision attractive -- such as integrating any source of signal to estimate unknown labels -- also ensure that the pseudolabels it produces are highly biased. Surprisingly, given everyday use and the potential for increased bias, weak supervision has not been studied from the point of view of fairness. This work begins such a study. Our departure point is the observation that even when a fair model can be built from a dataset with access to ground-truth labels, the corresponding dataset labeled via weak supervision can be arbitrarily unfair. Fortunately, not all is lost: we propose and empirically validate a model for source unfairness in weak supervision, then introduce a simple counterfactual fairness-based technique that can mitigate these biases. Theoretically, we show that it is possible for our approach to simultaneously improve both accuracy and fairness metrics -- in contrast to standard fairness approaches that suffer from tradeoffs. Empirically, we show that our technique improves accuracy on weak supervision baselines by as much as 32% while reducing demographic parity gap by 82.5%.
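
For readers unfamiliar with the fairness metric named in the summary above, the following is a minimal sketch of one common definition of the demographic parity gap: the largest difference in positive-prediction rate between groups. The function name and the toy data are assumptions for illustration; the paper may compute the metric differently.

```python
def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    totals, positives = {}, {}
    for y_hat, g in zip(predictions, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + (1 if y_hat == 1 else 0)
    # Positive-prediction rate per group.
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Hypothetical binary predictions for two groups "a" and "b".
preds = [1, 1, 1, 0, 1, 0, 0, 0]
grps = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(preds, grps)  # 0.75 - 0.25 = 0.5
```

A gap of 0 means every group receives positive predictions at the same rate.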

interpretability

Title: Pay Attention: Accuracy Versus Interpretability Trade-off in Fine-tuned Diffusion Models. (arXiv:2303.17908v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17908
Code URL: null
Copy Paste: [[2303.17908] Pay Attention: Accuracy Versus Interpretability Trade-off in Fine-tuned Diffusion Models](http://arxiv.org/abs/2303.17908) #interpretability
Summary:
The recent progress of diffusion models in terms of image quality has led to a major shift in research related to generative models. Current approaches often fine-tune pre-trained foundation models using domain-specific text-to-image pairs. This approach is straightforward for X-ray image generation due to the high availability of radiology reports linked to specific images. However, current approaches hardly ever look at attention layers to verify whether the models understand what they are generating. In this paper, we discover an important trade-off between image fidelity and interpretability in generative diffusion models. In particular, we show that fine-tuning text-to-image models with a learnable text encoder leads to a lack of interpretability of diffusion models. Finally, we demonstrate the interpretability of diffusion models by showing that keeping the language encoder frozen enables diffusion models to achieve state-of-the-art phrase grounding performance on certain diseases for a challenging multi-label segmentation task, without any additional training. Code and models will be available at https://github.com/MischaD/chest-distillation.

Title: Learning with Explicit Shape Priors for Medical Image Segmentation. (arXiv:2303.17967v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17967
Code URL: null
Copy Paste: [[2303.17967] Learning with Explicit Shape Priors for Medical Image Segmentation](http://arxiv.org/abs/2303.17967) #interpretability
Summary:
Medical image segmentation is considered the basic step for medical image analysis and surgical intervention. Many previous works have attempted to incorporate shape priors when designing segmentation models, which helps attain finer masks with anatomical shape information. In this work, we discuss in detail three types of segmentation models with shape priors: atlas-based models, statistical-based models and UNet-based models. Because the former two kinds of methods generalize poorly, UNet-based models have dominated the field of medical image segmentation in recent years. However, existing UNet-based models tend to employ implicit shape priors, which lack good interpretability and generalization ability on different organs with distinctive shapes. Thus, we propose a novel shape prior module (SPM), which explicitly introduces shape priors to improve the segmentation performance of UNet-based models. To evaluate the effectiveness of SPM, we conduct experiments on three challenging public datasets, and our proposed model achieves state-of-the-art performance. Furthermore, SPM shows outstanding generalization ability on different classic convolutional neural networks (CNNs) and recent Transformer-based backbones, and can serve as a plug-and-play structure for the segmentation task of different datasets.

explainability

watermark

diffusion

Title: CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition. (arXiv:2303.17778v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17778
Code URL: null
Copy Paste: [[2303.17778] CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition](http://arxiv.org/abs/2303.17778) #diffusion
Summary:
We present CrossLoc3D, a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting. Cross-source point cloud data corresponds to point sets captured by depth sensors with different accuracies or from different distances and perspectives. We address the challenges in terms of developing 3D place recognition methods that account for the representation gap between points captured by different sources. Our method handles cross-source data by utilizing multi-grained features and selecting convolution kernel sizes that correspond to most prominent features. Inspired by the diffusion models, our method uses a novel iterative refinement process that gradually shifts the embedding spaces from different sources to a single canonical space for better metric learning. In addition, we present CS-Campus3D, the first 3D aerial-ground cross-source dataset consisting of point cloud data from both aerial and ground LiDAR scans. The point clouds in CS-Campus3D have representation gaps and other features like different views, point densities, and noise patterns. We show that our CrossLoc3D algorithm can achieve an improvement of 4.74% - 15.37% in terms of the top 1 average recall on our CS-Campus3D benchmark and achieves performance comparable to state-of-the-art 3D place recognition method on the Oxford RobotCar. We will release the code and CS-Campus3D benchmark.

Title: GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently. (arXiv:2303.17870v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17870
Code URL: null
Copy Paste: [[2303.17870] GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently](http://arxiv.org/abs/2303.17870) #diffusion
Summary:
Recent breakthroughs in the field of language-guided image generation have yielded impressive achievements, enabling the creation of high-quality and diverse images based on user instructions. Although the synthesis performance is fascinating, one significant limitation of current image generation models is their insufficient ability to generate coherent text within images, particularly for complex glyph structures like Chinese characters. To address this problem, we introduce GlyphDraw, a general learning framework aiming at endowing image generation models with the capacity to generate images embedded with coherent text. To the best of our knowledge, this is the first work in the field of image synthesis to address the generation of Chinese characters. We first design a sophisticated construction strategy for the image-text dataset, then build our model specifically on a diffusion-based image generator and carefully modify the network structure to allow the model to learn drawing Chinese characters with the help of glyph and position information. Furthermore, we maintain the model's open-domain image synthesis capability by using a variety of training techniques to prevent catastrophic forgetting. Extensive qualitative and quantitative experiments demonstrate that our method not only produces accurate Chinese characters as in prompts, but also naturally blends the generated text into the background. Please refer to https://1073521013.github.io/glyph-draw.github.io

Title: 3D-aware Image Generation using 2D Diffusion Models. (arXiv:2303.17905v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17905
Code URL: null
Copy Paste: [[2303.17905] 3D-aware Image Generation using 2D Diffusion Models](http://arxiv.org/abs/2303.17905) #diffusion
Summary:
In this paper, we introduce a novel 3D-aware image generation method that leverages 2D diffusion models. We formulate the 3D-aware image generation task as multiview 2D image set generation, and further to a sequential unconditional-conditional multiview image generation process. This allows us to utilize 2D diffusion models to boost the generative modeling power of the method. Additionally, we incorporate depth information from monocular depth estimators to construct the training data for the conditional diffusion model using only still images. We train our method on a large-scale dataset, i.e., ImageNet, which is not addressed by previous methods. It produces high-quality images that significantly outperform prior methods. Furthermore, our approach showcases its capability to generate instances with large view angles, even though the training images are diverse and unaligned, gathered from "in-the-wild" real-world environments.

Title: IC-FPS: Instance-Centroid Faster Point Sampling Module for 3D Point-base Object Detection. (arXiv:2303.17921v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17921
Code URL: null
Copy Paste: [[2303.17921] IC-FPS: Instance-Centroid Faster Point Sampling Module for 3D Point-base Object Detection](http://arxiv.org/abs/2303.17921) #diffusion
Summary:
3D object detection is one of the most important tasks in autonomous driving and robotics. Our research focuses on tackling the low efficiency of point-based methods on large-scale point clouds. Existing point-based methods adopt the farthest point sampling (FPS) strategy for downsampling, which is computationally expensive in terms of inference time and memory consumption as the number of points increases. In order to improve efficiency, we propose a novel Instance-Centroid Faster Point Sampling Module (IC-FPS), which effectively replaces the first Set Abstraction (SA) layer that is extremely tedious. The IC-FPS module comprises two methods, a local feature diffusion based background point filter (LFDBF) and a Centroid-Instance Sampling Strategy (CISS). LFDBF is constructed to exclude most invalid background points, while CISS substitutes the FPS strategy by fast sampling of centroids and instance points. The IC-FPS module can be inserted into almost every point-based model. Extensive experiments on multiple public benchmarks have demonstrated the superiority of IC-FPS. On the Waymo dataset, the proposed module significantly improves the performance of the baseline model and accelerates inference speed by 3.8 times. For the first time, real-time detection of point-based models in large-scale point cloud scenarios is realized.
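
To make the cost of the baseline concrete, here is a minimal sketch of plain farthest point sampling (FPS), the downsampling strategy the summary above says existing point-based methods use. It is the textbook greedy algorithm, not the IC-FPS module from the paper; the function name and the toy points are assumptions for illustration.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points chosen greedily to be far apart."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Pick the point that is farthest from everything selected so far.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # One full pass over all points per sample: O(N * k) distances overall.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Hypothetical toy point cloud.
cloud = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (10, 1, 0), (5, 0, 0)]
sample = farthest_point_sampling(cloud, k=3)  # [0, 3, 4]
```

Each new sample requires a pass over every point, and the samples must be chosen one after another, which is why the cost grows quickly on large point clouds.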

Title: Diffusion Action Segmentation. (arXiv:2303.17959v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.17959
Code URL: null
Copy Paste: [[2303.17959] Diffusion Action Segmentation](http://arxiv.org/abs/2303.17959) #diffusion
Summary:
Temporal action segmentation is crucial for understanding long-form videos. Previous works on this task commonly adopt an iterative refinement paradigm by using multi-stage models. Our paper proposes an essentially different framework via denoising diffusion models, which nonetheless shares the same inherent spirit of such iterative refinement. In this framework, action predictions are progressively generated from random noise with input video features as conditions. To enhance the modeling of three striking characteristics of human actions, including the position prior, the boundary ambiguity, and the relational dependency, we devise a unified masking strategy for the conditioning inputs in our framework. Extensive experiments on three benchmark datasets, i.e., GTEA, 50Salads, and Breakfast, are performed and the proposed method achieves superior or comparable results to state-of-the-art methods, showing the effectiveness of a generative approach for action segmentation. Our codes will be made available.

Title: One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models. (arXiv:2303.18080v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.18080
Code URL: null
Copy Paste: [[2303.18080] One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models](http://arxiv.org/abs/2303.18080) #diffusion
Summary:
Adapting a segmentation model from a labeled source domain to a target domain, where a single unlabeled datum is available, is one of the most challenging problems in domain adaptation and is otherwise known as one-shot unsupervised domain adaptation (OSUDA). Most of the prior works have addressed the problem by relying on style transfer techniques, where the source images are stylized to have the appearance of the target domain. Departing from the common notion of transferring only the target "texture" information, we leverage text-to-image diffusion models (e.g., Stable Diffusion) to generate a synthetic target dataset with photo-realistic images that not only faithfully depict the style of the target domain, but are also characterized by novel scenes in diverse contexts. The text interface in our method Data AugmenTation with diffUsion Models (DATUM) endows us with the possibility of guiding the generation of images towards desired semantic concepts while respecting the original spatial context of a single training image, which is not possible in existing OSUDA methods. Extensive experiments on standard benchmarks show that our DATUM surpasses the state-of-the-art OSUDA methods by up to +7.1%. The implementation is available at https://github.com/yasserben/DATUM

Title: A Closer Look at Parameter-Efficient Tuning in Diffusion Models. (arXiv:2303.18181v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2303.18181
Code URL: null
Copy Paste: [[2303.18181] A Closer Look at Parameter-Efficient Tuning in Diffusion Models](http://arxiv.org/abs/2303.18181) #diffusion
Summary:
Large-scale diffusion models like Stable Diffusion are powerful and find various real-world applications, but customizing such models by fine-tuning is both memory and time inefficient. Motivated by the recent progress in natural language processing, we investigate parameter-efficient tuning in large diffusion models by inserting small learnable modules (termed adapters). In particular, we decompose the design space of adapters into orthogonal factors -- the input position, the output position as well as the function form, and perform Analysis of Variance (ANOVA), a classical statistical approach for analyzing the correlation between discrete (design options) and continuous variables (evaluation metrics). Our analysis suggests that the input position of adapters is the critical factor influencing the performance of downstream tasks. Then, we carefully study the choice of the input position, and we find that putting the input position after the cross-attention block can lead to the best performance, validated by additional visualization analyses. Finally, we provide a recipe for parameter-efficient tuning in diffusion models, which is comparable if not superior to the fully fine-tuned baseline (e.g., DreamBooth) with only 0.75% extra parameters, across various customized tasks.

Title: $\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States. (arXiv:2303.18242v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.18242
Code URL: https://github.com/samb-t/infty-diff
Copy Paste: [[2303.18242] $\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States](http://arxiv.org/abs/2303.18242) #diffusion
Summary:
We introduce $\infty$-Diff, a generative diffusion model which directly operates on infinite resolution data. By randomly sampling subsets of coordinates during training and learning to denoise the content at those coordinates, a continuous function is learned that allows sampling at arbitrary resolutions. In contrast to other recent infinite resolution generative models, our approach operates directly on the raw data, not requiring latent vector compression for context, using hypernetworks, nor relying on discrete components. As such, our approach achieves significantly higher sample quality, as evidenced by lower FID scores, as well as being able to effectively scale to higher resolutions than the training data while retaining detail.

Title: HD-GCN:A Hybrid Diffusion Graph Convolutional Network. (arXiv:2303.17966v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2303.17966
Code URL: null
Copy Paste: [[2303.17966] HD-GCN:A Hybrid Diffusion Graph Convolutional Network](http://arxiv.org/abs/2303.17966) #diffusion
Summary:
The information diffusion performance of GCN and its variant models is limited by the adjacency matrix, which can lower their performance. Therefore, we introduce a new framework for graph convolutional networks called Hybrid Diffusion-based Graph Convolutional Network (HD-GCN) to address the limitations of information diffusion caused by the adjacency matrix. In the HD-GCN framework, we initially utilize diffusion maps to facilitate the diffusion of information among nodes that are adjacent to each other in the feature space. This allows for the diffusion of information between similar points that may not have an adjacent relationship. Next, we utilize graph convolution to further propagate information among adjacent nodes after the diffusion maps, thereby enabling the spread of information among similar nodes that are adjacent in the graph. Finally, we employ the diffusion distances obtained through the use of diffusion maps to regularize and constrain the predicted labels of training nodes. This regularization method is then applied to the HD-GCN training, resulting in a smoother classification surface. The model proposed in this paper effectively overcomes the limitations of information diffusion imposed only by the adjacency matrix. HD-GCN utilizes hybrid diffusion by combining information diffusion between neighborhood nodes in the feature space and adjacent nodes in the adjacency matrix. This method allows for more comprehensive information propagation among nodes, resulting in improved model performance. We evaluated the performance of HD-GCN on three well-known citation network datasets and the results showed that the proposed framework is more effective than several graph-based semi-supervised learning methods.