secure
Title: PASNet: Polynomial Architecture Search Framework for Two-party Computation-based Secure Neural Network Deployment. (arXiv:2306.15513v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15513
- Code URL: null
- Copy Paste:
[[2306.15513] PASNet: Polynomial Architecture Search Framework for Two-party Computation-based Secure Neural Network Deployment](http://arxiv.org/abs/2306.15513) #secure
- Summary:
Two-party computation (2PC) is promising to enable privacy-preserving deep learning (DL). However, the 2PC-based privacy-preserving DL implementation comes with high comparison protocol overhead from the non-linear operators. This work presents PASNet, a novel systematic framework that enables low latency, high energy efficiency & accuracy, and security-guaranteed 2PC-DL by integrating the hardware latency of the cryptographic building block into the neural architecture search loss function. We develop a cryptographic hardware scheduler and the corresponding performance model for Field Programmable Gate Arrays (FPGA) as a case study. The experimental results demonstrate that our light-weighted model PASNet-A and heavily-weighted model PASNet-B achieve 63 ms and 228 ms latency on private inference on ImageNet, which are 147 and 40 times faster than the SOTA CryptGPU system, and achieve 70.54% & 78.79% accuracy and more than 1000 times higher energy efficiency.
security
Title: Irregular Change Detection in Sparse Bi-Temporal Point Clouds using Learned Place Recognition Descriptors and Point-to-Voxel Comparison. (arXiv:2306.15416v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15416
- Code URL: null
- Copy Paste:
[[2306.15416] Irregular Change Detection in Sparse Bi-Temporal Point Clouds using Learned Place Recognition Descriptors and Point-to-Voxel Comparison](http://arxiv.org/abs/2306.15416) #security
- Summary:
Change detection and irregular object extraction in 3D point clouds is a challenging task that is of high importance not only for autonomous navigation but also for updating existing digital twin models of various industrial environments. This article proposes an innovative approach for change detection in 3D point clouds using deep learned place recognition descriptors and irregular object extraction based on voxel-to-point comparison. The proposed method first aligns the bi-temporal point clouds using a map-merging algorithm in order to establish a common coordinate frame. Then, it utilizes deep learning techniques to extract robust and discriminative features from the 3D point cloud scans, which are used to detect changes between consecutive point cloud frames and therefore find the changed areas. Finally, the altered areas are sampled and compared between the two time instances to extract any obstructions that caused the area to change. The proposed method was successfully evaluated in real-world field experiments, where it was able to detect different types of changes in 3D point clouds, such as object or muck-pile addition and displacement, showcasing the effectiveness of the approach. The results of this study demonstrate important implications for various applications, including safety and security monitoring in construction sites, mapping and exploration and suggests potential future research directions in this field.
Title: Improvise, Adapt, Overcome: Dynamic Resiliency Against Unknown Attack Vectors in Microgrid Cybersecurity Games. (arXiv:2306.15106v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15106
- Code URL: null
- Copy Paste:
[[2306.15106] Improvise, Adapt, Overcome: Dynamic Resiliency Against Unknown Attack Vectors in Microgrid Cybersecurity Games](http://arxiv.org/abs/2306.15106) #security
- Summary:
Cyber-physical microgrids are vulnerable to rootkit attacks that manipulate system dynamics to create instabilities in the network. Rootkits tend to hide their access level within microgrid system components to launch sudden attacks that prey on the slow response time of defenders to manipulate system trajectory. This problem can be formulated as a multi-stage, non-cooperative, zero-sum game with the attacker and the defender modeled as opposing players. To solve the game, this paper proposes a deep reinforcement learning-based strategy that dynamically identifies rootkit access levels and isolates incoming manipulations by incorporating changes in the defense plan. A major advantage of the proposed strategy is its ability to establish resiliency without altering the physical transmission/distribution network topology, thereby diminishing potential instability issues. The paper also presents several simulation results and case studies to demonstrate the operating mechanism and robustness of the proposed strategy.
Title: Developing and Deploying Security Applications for In-Vehicle Networks. (arXiv:2306.15588v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15588
- Code URL: null
- Copy Paste:
[[2306.15588] Developing and Deploying Security Applications for In-Vehicle Networks](http://arxiv.org/abs/2306.15588) #security
- Summary:
Radiological material transportation is primarily facilitated by heavy-duty on-road vehicles. Modern vehicles have dozens of electronic control units or ECUs, which are small, embedded computers that communicate with sensors and each other for vehicle functionality. ECUs use a standardized network architecture--Controller Area Network or CAN--which presents grave security concerns that have been exploited by researchers and hackers alike. For instance, ECUs can be impersonated by adversaries who have infiltrated an automotive CAN and disable or invoke unintended vehicle functions such as brakes, acceleration, or safety mechanisms. Further, the quality of security approaches varies wildly between manufacturers. Thus, research and development of after-market security solutions have grown remarkably in recent years. Many researchers are exploring deployable intrusion detection and prevention mechanisms using machine learning and data science techniques. However, there is a gap between developing security system algorithms and deploying prototype security appliances in-vehicle. In this paper, we, a research team at Oak Ridge National Laboratory working in this space, highlight challenges in the development pipeline, and provide techniques to standardize methodology and overcome technological hurdles.
Title: Automated Fuzzing Harness Generation for Library APIs and Binary Protocol Parsers. (arXiv:2306.15596v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15596
- Code URL: null
- Copy Paste:
[[2306.15596] Automated Fuzzing Harness Generation for Library APIs and Binary Protocol Parsers](http://arxiv.org/abs/2306.15596) #security
- Summary:
Fuzzing is a widely used software security testing technique that is designed to identify vulnerabilities in systems by providing invalid or unexpected input. Continuous fuzzing systems like OSS-FUZZ have been successful in finding security bugs in many different software systems. The typical process of finding security bugs using fuzzing involves several steps: first, the "fuzz-worthy" functions that are likely to contain vulnerabilities must be identified; second, the setup requirements for the API must be understood before it can be called; third, a fuzzing harness must be written and bound to a coverage-guided fuzzer like LLVM's LibFuzzer; and finally, the security bugs discovered by the fuzzing harness must be triaged and checked for reproducibility. This project focuses on automating the first two steps in this process. In particular, we present an automated system that can generate fuzzing harnesses for library APIs and binary protocol parsers by analyzing unit tests. This allows for the scaling of the fuzzing infrastructure in proportion to the growth of the codebase, without the need for manual coding of harnesses. Additionally, we develop a metric to assess the "fuzz-worthiness" of an API, enabling us to prioritize the most promising targets for testing.
privacy
Title: ethp2psim: Evaluating and deploying privacy-enhanced peer-to-peer routing protocols for the Ethereum network. (arXiv:2306.15024v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15024
- Code URL: null
- Copy Paste:
[[2306.15024] ethp2psim: Evaluating and deploying privacy-enhanced peer-to-peer routing protocols for the Ethereum network](http://arxiv.org/abs/2306.15024) #privacy
- Summary:
Network-level privacy is the Achilles heel of financial privacy in cryptocurrencies. Financial privacy amounts to achieving and maintaining blockchain- and network-level privacy. Blockchain-level privacy recently received substantial attention. Specifically, several privacy-enhancing technologies were proposed and deployed to enhance blockchain-level privacy. On the other hand, network-level privacy, i.e., privacy on the peer-to-peer layer, has seen far less attention and development. In this work, we aim to provide a peer-to-peer network simulator, ethp2psim, that allows researchers to evaluate the privacy guarantees of privacy-enhanced broadcast and message routing algorithms. Our goal is two-fold. First, we want to enable researchers to implement their proposed protocols in our modular simulator framework. Second, our simulator allows researchers to evaluate the privacy guarantees of privacy-enhanced routing algorithms. Finally, ethp2psim can help choose the right protocol parameters for efficient, robust, and private deployment.
Title: Optimal Differentially Private Learning with Public Data. (arXiv:2306.15056v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15056
- Code URL: https://github.com/optimization-for-data-driven-science/dp-with-public-data
- Copy Paste:
[[2306.15056] Optimal Differentially Private Learning with Public Data](http://arxiv.org/abs/2306.15056) #privacy
- Summary:
Differential Privacy (DP) ensures that training a machine learning model does not leak private data. However, the cost of DP is lower model accuracy or higher sample complexity. In practice, we may have access to auxiliary public data that is free of privacy concerns. This has motivated the recent study of what role public data might play in improving the accuracy of DP models. In this work, we assume access to a given amount of public data and settle the following fundamental open questions: 1. What is the optimal (worst-case) error of a DP model trained over a private data set while having access to side public data? What algorithms are optimal? 2. How can we harness public data to improve DP model training in practice? We consider these questions in both the local and central models of DP. To answer the first question, we prove tight (up to constant factors) lower and upper bounds that characterize the optimal error rates of three fundamental problems: mean estimation, empirical risk minimization, and stochastic convex optimization. We prove that public data reduces the sample complexity of DP model training. Perhaps surprisingly, we show that the optimal error rates can be attained (up to constants) by either discarding private data and training a public model, or treating public data like it's private data and using an optimal DP algorithm. To address the second question, we develop novel algorithms which are "even more optimal" (i.e. better constants) than the asymptotically optimal approaches described above. For local DP mean estimation with public data, our algorithm is optimal including constants. Empirically, our algorithms show benefits over existing approaches for DP model training with side access to public data.
Title: A New Mathematical Optimization-Based Method for the m-invariance Problem. (arXiv:2306.15371v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15371
- Code URL: null
- Copy Paste:
[[2306.15371] A New Mathematical Optimization-Based Method for the m-invariance Problem](http://arxiv.org/abs/2306.15371) #privacy
- Summary:
The issue of ensuring privacy for users who share their personal information has been a growing priority in a business and scientific environment where the use of different types of data and the laws that protect it have increased in tandem. Different technologies have been widely developed for static publications, i.e., where the information is published only once, such as k-anonymity and ε-differential privacy. In the case where microdata information is published dynamically, although established notions such as m-invariance and τ-safety already exist, developments for improving utility remain superficial. We propose a new heuristic approach for the NP-hard combinatorial problem of m-invariance and τ-safety, which is based on a mathematical optimization column generation scheme. The quality of a solution to m-invariance and τ-safety can be measured by the Information Loss (IL), a value in [0,100], the closer to 0 the better. We show that our approach improves by far current heuristics, providing in some instances solutions with ILs of 1.87, 8.5 and 1.93, while the state-of-the-art methods reported ILs of 39.03, 51.84 and 57.97, respectively.
Title: Identifying Practical Challenges in the Implementation of Technical Measures for Data Privacy Compliance. (arXiv:2306.15497v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15497
- Code URL: null
- Copy Paste:
[[2306.15497] Identifying Practical Challenges in the Implementation of Technical Measures for Data Privacy Compliance](http://arxiv.org/abs/2306.15497) #privacy
- Summary:
Modern privacy regulations provide a strict mandate for data processing entities to implement appropriate technical measures to demonstrate compliance. In practice, determining what measures are indeed "appropriate" is not trivial, particularly in light of vague guidelines provided by privacy regulations. To exacerbate the issue, challenges arise not only in the implementation of the technical measures themselves, but also in a variety of factors involving the roles, processes, decisions, and culture surrounding the pursuit of privacy compliance. In this paper, we present 33 challenges faced in the implementation of technical measures for privacy compliance, derived from a qualitative analysis of 16 interviews with privacy professionals. In addition, we evaluate the interview findings in a survey study, which gives way to a discussion of the identified challenges and their implications.
Title: On-device modeling of user's social context and familiar places from smartphone-embedded sensor data. (arXiv:2306.15437v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15437
- Code URL: null
- Copy Paste:
[[2306.15437] On-device modeling of user's social context and familiar places from smartphone-embedded sensor data](http://arxiv.org/abs/2306.15437) #privacy
- Summary:
Context modeling and recognition are crucial for adaptive mobile and ubiquitous computing. Context-awareness in mobile environments relies on prompt reactions to context changes. However, current solutions focus on limited context information processed on centralized architectures, risking privacy leakage and lacking personalization. On-device context modeling and recognition are emerging research trends, addressing these concerns. Social interactions and visited locations play significant roles in characterizing daily life scenarios. This paper proposes an unsupervised and lightweight approach to model the user's social context and locations directly on the mobile device. Leveraging the ego-network model, the system extracts high-level, semantic-rich context features from smartphone-embedded sensor data. For the social context, the approach utilizes data on physical and cyber social interactions among users and their devices. Regarding location, it prioritizes modeling the familiarity degree of specific locations over raw location data, such as GPS coordinates and proximity devices. The effectiveness of the proposed approach is demonstrated through three sets of experiments, employing five real-world datasets. These experiments evaluate the structure of social and location ego networks, provide a semantic evaluation of the proposed models, and assess mobile computing performance. Finally, the relevance of the extracted features is showcased by the improved performance of three machine learning models in recognizing daily-life situations. Compared to using only features related to physical context, the proposed approach achieves a 3% improvement in AUROC, 9% in Precision, and 5% in Recall.
Title: Simple Steps to Success: Axiomatics of Distance-Based Algorithmic Recourse. (arXiv:2306.15557v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15557
- Code URL: null
- Copy Paste:
[[2306.15557] Simple Steps to Success: Axiomatics of Distance-Based Algorithmic Recourse](http://arxiv.org/abs/2306.15557) #privacy
- Summary:
We propose a novel data-driven framework for algorithmic recourse that offers users interventions to change their predicted outcome. Existing approaches to compute recourse find a set of points that satisfy some desiderata -- e.g. an intervention in the underlying causal graph, or minimizing a cost function. Satisfying these criteria, however, requires extensive knowledge of the underlying model structure, often an unrealistic amount of information in several domains. We propose a data-driven, computationally efficient approach to computing algorithmic recourse. We do so by suggesting directions in the data manifold that users can take to change their predicted outcome. We present Stepwise Explainable Paths (StEP), an axiomatically justified framework to compute direction-based algorithmic recourse. We offer a thorough empirical and theoretical investigation of StEP. StEP offers provable privacy and robustness guarantees, and outperforms the state-of-the-art on several established recourse desiderata.
Title: A Three-Way Knot: Privacy, Fairness, and Predictive Performance Dynamics. (arXiv:2306.15567v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15567
- Code URL: null
- Copy Paste:
[[2306.15567] A Three-Way Knot: Privacy, Fairness, and Predictive Performance Dynamics](http://arxiv.org/abs/2306.15567) #privacy
- Summary:
As the frontier of machine learning applications moves further into human interaction, multiple concerns arise regarding automated decision-making. Two of the most critical issues are fairness and data privacy. On the one hand, one must guarantee that automated decisions are not biased against certain groups, especially those unprotected or marginalized. On the other hand, one must ensure that the use of personal information fully abides by privacy regulations and that user identities are kept safe. The balance between privacy, fairness, and predictive performance is complex. However, despite their potential societal impact, we still demonstrate a poor understanding of the dynamics between these optimization vectors. In this paper, we study this three-way tension and how the optimization of each vector impacts others, aiming to inform the future development of safe applications. In light of claims that predictive performance and fairness can be jointly optimized, we find this is only possible at the expense of data privacy. Overall, experimental results show that one of the vectors will be penalized regardless of which of the three we optimize. Nonetheless, we find promising avenues for future work in joint optimization solutions, where smaller trade-offs are observed between the three vectors.
Title: On the Usefulness of Synthetic Tabular Data Generation. (arXiv:2306.15636v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15636
- Code URL: null
- Copy Paste:
[[2306.15636] On the Usefulness of Synthetic Tabular Data Generation](http://arxiv.org/abs/2306.15636) #privacy
- Summary:
Despite recent advances in synthetic data generation, the scientific community still lacks a unified consensus on its usefulness. It is commonly believed that synthetic data can be used for both data exchange and boosting machine learning (ML) training. Privacy-preserving synthetic data generation can accelerate data exchange for downstream tasks, but there is not enough evidence to show how or why synthetic data can boost ML training. In this study, we benchmarked ML performance using synthetic tabular data for four use cases: data sharing, data augmentation, class balancing, and data summarization. We observed marginal improvements for the balancing use case on some datasets. However, we conclude that there is not enough evidence to claim that synthetic tabular data is useful for ML training.
protect
defense
Title: Advancing Adversarial Training by Injecting Booster Signal. (arXiv:2306.15451v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15451
- Code URL: null
- Copy Paste:
[[2306.15451] Advancing Adversarial Training by Injecting Booster Signal](http://arxiv.org/abs/2306.15451) #defense
- Summary:
Recent works have demonstrated that deep neural networks (DNNs) are highly vulnerable to adversarial attacks. To defend against adversarial attacks, many defense strategies have been proposed, among which adversarial training has been demonstrated to be the most effective strategy. However, it has been known that adversarial training sometimes hurts natural accuracy. Then, many works focus on optimizing model parameters to handle the problem. Different from the previous approaches, in this paper, we propose a new approach to improve the adversarial robustness by using an external signal rather than model parameters. In the proposed method, a well-optimized universal external signal called a booster signal is injected into the outside of the image which does not overlap with the original content. Then, it boosts both adversarial robustness and natural accuracy. The booster signal is optimized in parallel to model parameters step by step collaboratively. Experimental results show that the booster signal can improve both the natural and robust accuracies over the recent state-of-the-art adversarial training methods. Also, optimizing the booster signal is general and flexible enough to be adopted on any existing adversarial training methods.
Title: MTFS: a Moving Target Defense-Enabled File System for Malware Mitigation. (arXiv:2306.15566v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15566
- Code URL: null
- Copy Paste:
[[2306.15566] MTFS: a Moving Target Defense-Enabled File System for Malware Mitigation](http://arxiv.org/abs/2306.15566) #defense
- Summary:
Ransomware has remained one of the most notorious threats in the cybersecurity field. Moving Target Defense (MTD) has been proposed as a novel paradigm for proactive defense. Although various approaches leverage MTD, few of them rely on the operating system and, specifically, the file system, thereby making them dependent on other computing devices. Furthermore, existing ransomware defense techniques merely replicate or detect attacks, without preventing them. Thus, this paper introduces the MTFS overlay file system and the design and implementation of three novel MTD techniques implemented on top of it: one delaying attackers, one trapping recursive directory traversal, and another one hiding file types. The effectiveness of the techniques is shown in two experiments. First, it is shown that the techniques can delay and mitigate ransomware on real IoT devices. Secondly, in a broader scope, the solution was confronted with 14 ransomware samples, highlighting that it can save 97% of the files.
Title: Adversarial Training for Graph Neural Networks. (arXiv:2306.15427v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15427
- Code URL: null
- Copy Paste:
[[2306.15427] Adversarial Training for Graph Neural Networks](http://arxiv.org/abs/2306.15427) #defense
- Summary:
Despite its success in the image domain, adversarial training does not (yet) stand out as an effective defense for Graph Neural Networks (GNNs) against graph structure perturbations. In the pursuit of fixing adversarial training (1) we show and overcome fundamental theoretical as well as practical limitations of the adopted graph learning setting in prior work; (2) we reveal that more flexible GNNs based on learnable graph diffusion are able to adjust to adversarial perturbations, while the learned message passing scheme is naturally interpretable; (3) we introduce the first attack for structure perturbations that, while targeting multiple nodes at once, is capable of handling global (graph-level) as well as local (node-level) constraints. Including these contributions, we demonstrate that adversarial training is a state-of-the-art defense against adversarial structure perturbations.
attack
Title: DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization. (arXiv:2306.15164v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15164
- Code URL: null
- Copy Paste:
[[2306.15164] DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization](http://arxiv.org/abs/2306.15164) #attack
- Summary:
Adversarial training is one of the best-performing methods in improving the robustness of deep language models. However, robust models come at the cost of high time consumption, as they require multi-step gradient ascents or word substitutions to obtain adversarial samples. In addition, these generated samples are deficient in grammatical quality and semantic consistency, which impairs the effectiveness of adversarial training. To address these problems, we introduce a novel, effective procedure for instead adversarial training with only clean data. Our procedure, distribution shift risk minimization (DSRM), estimates the adversarial loss by perturbing the input data's probability distribution rather than their embeddings. This formulation results in a robust model that minimizes the expected global loss under adversarial attacks. Our approach requires zero adversarial samples for training and reduces time consumption by up to 70% compared to current best-performing adversarial training methods. Experiments demonstrate that DSRM considerably improves BERT's resistance to textual adversarial attacks and achieves state-of-the-art robust accuracy on various benchmarks.
Title: Are aligned neural networks adversarially aligned?. (arXiv:2306.15447v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15447
- Code URL: null
- Copy Paste:
[[2306.15447] Are aligned neural networks adversarially aligned?](http://arxiv.org/abs/2306.15447) #attack
- Summary:
Large language models are now tuned to align with the goals of their creators, namely to be "helpful and harmless." These models should respond helpfully to user questions, but refuse to answer requests that could cause harm. However, adversarial users can construct inputs which circumvent attempts at alignment. In this work, we study to what extent these models remain aligned, even when interacting with an adversarial user who constructs worst-case inputs (adversarial examples). These inputs are designed to cause the model to emit harmful content that would otherwise be prohibited. We show that existing NLP-based optimization attacks are insufficiently powerful to reliably attack aligned text models: even when current NLP-based attacks fail, we can find adversarial inputs with brute force. As a result, the failure of current attacks should not be seen as proof that aligned text models remain aligned under adversarial inputs.
However the recent trend in large-scale ML models is multimodal models that allow users to provide images that influence the text that is generated. We show these models can be easily attacked, i.e., induced to perform arbitrary un-aligned behavior through adversarial perturbation of the input image. We conjecture that improved NLP attacks may demonstrate this same level of adversarial control over text-only models.
Title: Catch Me If You Can: A New Low-Rate DDoS Attack Strategy Disguised by Feint. (arXiv:2306.15248v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15248
- Code URL: null
- Copy Paste:
[[2306.15248] Catch Me If You Can: A New Low-Rate DDoS Attack Strategy Disguised by Feint](http://arxiv.org/abs/2306.15248) #attack
- Summary:
While collaborative systems provide convenience to our lives, they also face many security threats. One of them is the Low-rate Distributed Denial-of-Service (LDDoS) attack, which is a worthy concern. Unlike volumetric DDoS attacks that continuously send large volumes of traffic, LDDoS attacks are more stealthy and difficult to detect owing to their low-volume feature. Due to its stealthiness and harmfulness, LDDoS has become one of the most destructive attacks in cloud computing. Although a few LDDoS attack detection and defense methods have been proposed, we observe that sophisticated LDDoS attacks (being more stealthy) can bypass some of the existing LDDoS defense methods. To verify our security observation, we proposed a new Feint-based LDDoS (F-LDDoS) attack strategy. In this strategy, we divide a Pulse Interval into a Feinting Interval and an Attack Interval. Unlike the previous LDDoS attacks, the bots also send traffic randomly in the Feinting Interval, thus disguising themselves as benign users during the F-LDDoS attack. In this way, although the victim detects that it is under an LDDoS attack, it is difficult to locate the attack sources and apply mitigation solutions. Experimental results show that the F-LDDoS attack can degrade TCP bandwidth 6.7%-14% more than the baseline LDDoS attack. Besides, F-LDDoS also reduces the similarities between bot traffic and aggregated attack traffic, and increases the uncertainty of packet arrival. These results mean that the proposed F-LDDoS is more effective and more stealthy than normal LDDoS attacks. Finally, we discuss the countermeasures of F-LDDoS to draw the attention of defenders and improve the defense methods.
Title: A Highly Accurate Query-Recovery Attack against Searchable Encryption using Non-Indexed Documents. (arXiv:2306.15302v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15302
- Code URL: null
- Copy Paste:
[[2306.15302] A Highly Accurate Query-Recovery Attack against Searchable Encryption using Non-Indexed Documents](http://arxiv.org/abs/2306.15302) #attack
- Summary:
Cloud data storage solutions offer customers cost-effective and reduced data management. While attractive, data security issues remain to be a core concern. Traditional encryption protects stored documents, but hinders simple functionalities such as keyword search. Therefore, searchable encryption schemes have been proposed to allow for the search on encrypted data. Efficient schemes leak at least the access pattern (the accessed documents per keyword search), which is known to be exploitable in query recovery attacks assuming the attacker has a significant amount of background knowledge on the stored documents. Existing attacks can only achieve decent results with strong adversary models (e.g. at least 20% of previously known documents or require additional knowledge such as on query frequencies) and they give no metric to evaluate the certainty of recovered queries. This hampers their practical utility and questions their relevance in the real-world.
We propose a refined score attack which achieves query recovery rates of around 85% without requiring exact background knowledge on stored documents; a distributionally similar, but otherwise different (i.e., non-indexed), dataset suffices. The attack starts with very few known queries (around 10 known queries in our experiments over different datasets of varying size) and then iteratively recovers further queries with confidence scores by adding previously recovered queries that had high confidence scores to the set of known queries. Additional to high recovery rates, our approach yields interpretable results in terms of confidence scores.
Title: Your Attack Is Too DUMB: Formalizing Attacker Scenarios for Adversarial Transferability. (arXiv:2306.15363v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15363
- Code URL: null
- Copy Paste:
[[2306.15363] Your Attack Is Too DUMB: Formalizing Attacker Scenarios for Adversarial Transferability](http://arxiv.org/abs/2306.15363) #attack
- Summary:
Evasion attacks are a threat to machine learning models, where adversaries attempt to affect classifiers by injecting malicious samples. An alarming side-effect of evasion attacks is their ability to transfer among different models: this property is called transferability. Therefore, an attacker can produce adversarial samples on a custom model (surrogate) to conduct the attack on a victim's organization later. Although literature widely discusses how adversaries can transfer their attacks, their experimental settings are limited and far from reality. For instance, many experiments consider both attacker and defender sharing the same dataset, balance level (i.e., how the ground truth is distributed), and model architecture.
In this work, we propose the DUMB attacker model. This framework allows analyzing if evasion attacks fail to transfer when the training conditions of surrogate and victim models differ. DUMB considers the following conditions: Dataset soUrces, Model architecture, and the Balance of the ground truth. We then propose a novel testbed to evaluate many state-of-the-art evasion attacks with DUMB; the testbed consists of three computer vision tasks with two distinct datasets each, four types of balance levels, and three model architectures. Our analysis, which generated 13K tests over 14 distinct attacks, led to numerous novel findings in the scope of transferable attacks with surrogate models. In particular, mismatches between attackers and victims in terms of dataset source, balance levels, and model architecture lead to non-negligible loss of attack performance.
robust
Title: Efficient High-Resolution Template Matching with Vector Quantized Nearest Neighbour Fields. (arXiv:2306.15010v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15010
- Code URL: null
- Copy Paste:
[[2306.15010] Efficient High-Resolution Template Matching with Vector Quantized Nearest Neighbour Fields](http://arxiv.org/abs/2306.15010) #robust
- Summary:
Template matching is a fundamental problem in computer vision and has applications in various fields, such as object detection, image registration, and object tracking. The current state-of-the-art methods rely on nearest-neighbour (NN) matching in which the query feature space is converted to NN space by representing each query pixel with its NN in the template pixels. The NN-based methods have been shown to perform better in occlusions, changes in appearance, illumination variations, and non-rigid transformations. However, NN matching scales poorly with high-resolution data and high feature dimensions. In this work, we present an NN-based template-matching method which efficiently reduces the NN computations and introduces filtering in the NN fields to consider deformations. A vector quantization step first represents the template with $k$ features, then filtering compares the template and query distributions over the $k$ features. We show that state-of-the-art performance was achieved in low-resolution data, and our method outperforms previous methods at higher resolution showing the robustness and scalability of the approach.
Title: Efficient and Accurate Scene Text Detection with Low-Rank Approximation Network. (arXiv:2306.15142v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15142
- Code URL: null
- Copy Paste:
[[2306.15142] Efficient and Accurate Scene Text Detection with Low-Rank Approximation Network](http://arxiv.org/abs/2306.15142) #robust
- Summary:
Recently, regression-based methods, which predict parameter curves for localizing texts, are popular in scene text detection. However, these methods struggle to balance concise structure and fast post-processing, and the existing parameter curves are still not ideal for modeling arbitrary-shaped texts, leading to a challenge in balancing speed and accuracy. To tackle these challenges, we firstly propose a dual matching scheme for positive samples, which accelerates inference speed through sparse matching scheme and accelerates model convergence through dense matching scheme. Then, we propose a novel text contour representation method based on low-rank approximation by exploiting the shape correlation between different text contours, which is complete, compact, simple and robust. Based on these designs, we implement an efficient and accurate arbitrary-shaped text detector, named LRANet. Extensive experiments are conducted on three challenging datasets, which demonstrate the accuracy and efficiency of our LRANet over state-of-the-art methods. The code will be released soon.
Title: Transferability Metrics for Object Detection. (arXiv:2306.15306v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15306
- Code URL: https://github.com/dataiku-research/transferability_metrics_for_object_detection
- Copy Paste:
[[2306.15306] Transferability Metrics for Object Detection](http://arxiv.org/abs/2306.15306) #robust
- Summary:
Transfer learning aims to make the most of existing pre-trained models to achieve better performance on a new task in limited data scenarios. However, it is unclear which models will perform best on which task, and it is prohibitively expensive to try all possible combinations. If transferability estimation offers a computation-efficient approach to evaluate the generalisation ability of models, prior works focused exclusively on classification settings. To overcome this limitation, we extend transferability metrics to object detection. We design a simple method to extract local features corresponding to each object within an image using ROI-Align. We also introduce TLogME, a transferability metric taking into account the coordinates regression task. In our experiments, we compare TLogME to state-of-the-art metrics in the estimation of transfer performance of the Faster-RCNN object detector. We evaluate all metrics on source and target selection tasks, for real and synthetic datasets, and with different backbone architectures. We show that, over different tasks, TLogME using the local extraction method provides a robust correlation with transfer performance and outperforms other transferability metrics on local and global level features.
Title: Multi-Dimensional Refinement Graph Convolutional Network with Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition. (arXiv:2306.15321v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15321
- Code URL: null
- Copy Paste:
[[2306.15321] Multi-Dimensional Refinement Graph Convolutional Network with Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition](http://arxiv.org/abs/2306.15321) #robust
- Summary:
Graph convolutional networks have been widely used in skeleton-based action recognition. However, existing approaches are limited in fine-grained action recognition due to the similarity of inter-class data. Moreover, the noisy data from pose extraction increases the challenge of fine-grained recognition. In this work, we propose a flexible attention block called Channel-Variable Spatial-Temporal Attention (CVSTA) to enhance the discriminative power of spatial-temporal joints and obtain a more compact intra-class feature distribution. Based on CVSTA, we construct a Multi-Dimensional Refinement Graph Convolutional Network (MDR-GCN), which can improve the discrimination among channel-, joint- and frame-level features for fine-grained actions. Furthermore, we propose a Robust Decouple Loss (RDL), which significantly boosts the effect of the CVSTA and reduces the impact of noise. The proposed method combining MDR-GCN with RDL outperforms the known state-of-the-art skeleton-based approaches on fine-grained datasets, FineGym99 and FSD-10, and also on the coarse dataset NTU-RGB+D X-view version.
Title: Shoggoth: Towards Efficient Edge-Cloud Collaborative Real-Time Video Inference via Adaptive Online Learning. (arXiv:2306.15333v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15333
- Code URL: null
- Copy Paste:
[[2306.15333] Shoggoth: Towards Efficient Edge-Cloud Collaborative Real-Time Video Inference via Adaptive Online Learning](http://arxiv.org/abs/2306.15333) #robust
- Summary:
This paper proposes Shoggoth, an efficient edge-cloud collaborative architecture, for boosting inference performance on real-time video of changing scenes. Shoggoth uses online knowledge distillation to improve the accuracy of models suffering from data drift and offloads the labeling process to the cloud, alleviating constrained resources of edge devices. At the edge, we design adaptive training using small batches to adapt models under limited computing power, and adaptive sampling of training frames for robustness and reducing bandwidth. The evaluations on the realistic dataset show 15%-20% model accuracy improvement compared to the edge-only strategy and fewer network costs than the cloud-only strategy.
Title: Robust Proxy: Improving Adversarial Robustness by Robust Proxy Learning. (arXiv:2306.15457v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15457
- Code URL: null
- Copy Paste:
[[2306.15457] Robust Proxy: Improving Adversarial Robustness by Robust Proxy Learning](http://arxiv.org/abs/2306.15457) #robust
- Summary:
Recently, it has been widely known that deep neural networks are highly vulnerable and easily broken by adversarial attacks. To mitigate the adversarial vulnerability, many defense algorithms have been proposed. Recently, to improve adversarial robustness, many works try to enhance feature representation by imposing more direct supervision on the discriminative feature. However, existing approaches lack an understanding of learning adversarially robust feature representation. In this paper, we propose a novel training framework called Robust Proxy Learning. In the proposed method, the model explicitly learns robust feature representations with robust proxies. To this end, firstly, we demonstrate that we can generate class-representative robust features by adding class-wise robust perturbations. Then, we use the class representative features as robust proxies. With the class-wise robust features, the model explicitly learns adversarially robust features through the proposed robust proxy learning framework. Through extensive experiments, we verify that we can manually generate robust features, and our proposed learning framework could increase the robustness of the DNNs.
Title: See Through the Fog: Curriculum Learning with Progressive Occlusion in Medical Imaging. (arXiv:2306.15574v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15574
- Code URL: null
- Copy Paste:
[[2306.15574] See Through the Fog: Curriculum Learning with Progressive Occlusion in Medical Imaging](http://arxiv.org/abs/2306.15574) #robust
- Summary:
In recent years, deep learning models have revolutionized medical image interpretation, offering substantial improvements in diagnostic accuracy. However, these models often struggle with challenging images where critical features are partially or fully occluded, which is a common scenario in clinical practice. In this paper, we propose a novel curriculum learning-based approach to train deep learning models to handle occluded medical images effectively. Our method progressively introduces occlusion, starting from clear, unobstructed images and gradually moving to images with increasing occlusion levels. This ordered learning process, akin to human learning, allows the model to first grasp simple, discernable patterns and subsequently build upon this knowledge to understand more complicated, occluded scenarios. Furthermore, we present three novel occlusion synthesis methods, namely Wasserstein Curriculum Learning (WCL), Information Adaptive Learning (IAL), and Geodesic Curriculum Learning (GCL). Our extensive experiments on diverse medical image datasets demonstrate substantial improvements in model robustness and diagnostic accuracy over conventional training methodologies.
Title: Structured Dialogue Discourse Parsing. (arXiv:2306.15103v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15103
- Code URL: https://github.com/chijames/structured_dialogue_discourse_parsing
- Copy Paste:
[[2306.15103] Structured Dialogue Discourse Parsing](http://arxiv.org/abs/2306.15103) #robust
- Summary:
Dialogue discourse parsing aims to uncover the internal structure of a multi-participant conversation by finding all the discourse links and corresponding relations. Previous work either treats this task as a series of independent multiple-choice problems, in which the link existence and relations are decoded separately, or the encoding is restricted to only local interaction, ignoring the holistic structural information. In contrast, we propose a principled method that improves upon previous work from two perspectives: encoding and decoding. From the encoding side, we perform structured encoding on the adjacency matrix followed by the matrix-tree learning algorithm, where all discourse links and relations in the dialogue are jointly optimized based on latent tree-level distribution. From the decoding side, we perform structured inference using the modified Chiu-Liu-Edmonds algorithm, which explicitly generates the labeled multi-root non-projective spanning tree that best captures the discourse structure. In addition, unlike in previous work, we do not rely on hand-crafted features; this improves the model's robustness. Experiments show that our method achieves new state-of-the-art, surpassing the previous model by 2.3 on STAC and 1.5 on Molweni (F1 scores). Code released at https://github.com/chijames/structured_dialogue_discourse_parsing.
Title: A Survey on Out-of-Distribution Evaluation of Neural NLP Models. (arXiv:2306.15261v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15261
- Code URL: null
- Copy Paste:
[[2306.15261] A Survey on Out-of-Distribution Evaluation of Neural NLP Models](http://arxiv.org/abs/2306.15261) #robust
- Summary:
Adversarial robustness, domain generalization and dataset biases are three active lines of research contributing to out-of-distribution (OOD) evaluation on neural NLP models. However, a comprehensive, integrated discussion of the three research lines is still lacking in the literature. In this survey, we 1) compare the three lines of research under a unifying definition; 2) summarize the data-generating processes and evaluation protocols for each line of research; and 3) emphasize the challenges and opportunities for future work.
Title: Can Pretrained Language Models Derive Correct Semantics from Corrupt Subwords under Noise?. (arXiv:2306.15268v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15268
- Code URL: https://github.com/xinzhel/word_corruption
- Copy Paste:
[[2306.15268] Can Pretrained Language Models Derive Correct Semantics from Corrupt Subwords under Noise?](http://arxiv.org/abs/2306.15268) #robust
- Summary:
For Pretrained Language Models (PLMs), their susceptibility to noise has recently been linked to subword segmentation. However, it is unclear which aspects of segmentation affect their understanding. This study assesses the robustness of PLMs against various disrupted segmentation caused by noise. An evaluation framework for subword segmentation, named Contrastive Lexical Semantic (CoLeS) probe, is proposed. It provides a systematic categorization of segmentation corruption under noise and evaluation protocols by generating contrastive datasets with canonical-noisy word pairs. Experimental results indicate that PLMs are unable to accurately compute word meanings if the noise introduces completely different subwords, small subword fragments, or a large number of additional subwords, particularly when they are inserted within other subwords.
Title: SparseOptimizer: Sparsify Language Models through Moreau-Yosida Regularization and Accelerate through Compiler Co-design. (arXiv:2306.15656v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15656
- Code URL: null
- Copy Paste:
[[2306.15656] SparseOptimizer: Sparsify Language Models through Moreau-Yosida Regularization and Accelerate through Compiler Co-design](http://arxiv.org/abs/2306.15656) #robust
- Summary:
This paper introduces SparseOptimizer, a novel deep learning optimizer that exploits Moreau-Yosida regularization to naturally induce sparsity in large language models such as BERT, ALBERT and GPT. Key to the design of SparseOptimizer is an embedded shrinkage operator, which imparts sparsity directly within the optimization process. This operator, backed by a sound theoretical framework, includes an analytical solution, thereby reinforcing the optimizer's robustness and efficacy. Crucially, SparseOptimizer's plug-and-play functionality eradicates the need for code modifications, making it a universally adaptable tool for a wide array of large language models. Empirical evaluations on benchmark datasets such as GLUE, RACE, SQuAD1, and SQuAD2 confirm that SparseBERT and SparseALBERT, when sparsified using SparseOptimizer, achieve performance comparable to their dense counterparts, BERT and ALBERT, while significantly reducing their parameter count. Further, this work proposes an innovative optimizer-compiler co-design strategy, demonstrating the potential of inference acceleration (3.37x, 6.30x, and 7.15x in comparison with Pytorch, TensorFlow, and LLVM generic compile, respectively) in SparseBERT when paired with an appropriately designed compiler. This study represents a significant step forward in the evolution of efficient, scalable, and high-performing large language models, setting a precedent for future exploration and optimization in this domain. The SparseOptimizer code and SparseALBERT model will be made available upon paper acceptance.
Title: [Re] Double Sampling Randomized Smoothing. (arXiv:2306.15221v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15221
- Code URL: https://github.com/dsgiitr/re_dsrs
- Copy Paste:
[[2306.15221] [Re] Double Sampling Randomized Smoothing](http://arxiv.org/abs/2306.15221) #robust
- Summary:
This paper is a contribution to the reproducibility challenge in the field of machine learning, specifically addressing the issue of certifying the robustness of neural networks (NNs) against adversarial perturbations. The proposed Double Sampling Randomized Smoothing (DSRS) framework overcomes the limitations of existing methods by using an additional smoothing distribution to improve the robustness certification. The paper provides a clear manifestation of DSRS for a generalized family of Gaussian smoothing and a computationally efficient method for implementation. The experiments on MNIST and CIFAR-10 demonstrate the effectiveness of DSRS, consistently certifying larger robust radii compared to other methods. Also, various ablation studies are conducted to further analyze the hyperparameters and effect of adversarial training methods on the certified radius by the proposed framework.
Title: Errorless Robust JPEG Steganography Using Steganographic Polar Codes. (arXiv:2306.15246v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15246
- Code URL: null
- Copy Paste:
[[2306.15246] Errorless Robust JPEG Steganography Using Steganographic Polar Codes](http://arxiv.org/abs/2306.15246) #robust
- Summary:
Recently, a robust steganographic algorithm that achieves errorless robustness against JPEG recompression was proposed. The method evaluates the behavior of DCT coefficients after recompression using the local JPEG encoder to select robust coefficients and sets the other coefficients as wet cost. Combining the lattice embedding scheme, the method is errorless by construction. However, the authors are only concerned with the success rate under theoretical embedding, while the success rate of the implementation with practical steganographic codes is not verified. In this letter, we implement the method with two steganographic codes, i.e., steganographic polar code and syndrome-trellis code. By analyzing the possibility of successful embedding of two steganographic codes under wet paper embedding, we discover that steganographic polar code achieves successful embedding with a larger number of wet coefficients compared with syndrome-trellis code, which makes steganographic polar code more suitable under the errorless robust embedding paradigm. The experimental results show that the combination of steganographic polar code and errorless robust embedding achieves a higher success rate compared with the implementation with syndrome-trellis code under close security performance.
Title: Energy Modelling and Forecasting for an Underground Agricultural Farm using a Higher Order Dynamic Mode Decomposition Approach. (arXiv:2306.15089v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15089
- Code URL: null
- Copy Paste:
[[2306.15089] Energy Modelling and Forecasting for an Underground Agricultural Farm using a Higher Order Dynamic Mode Decomposition Approach](http://arxiv.org/abs/2306.15089) #robust
- Summary:
This paper presents an approach based on higher order dynamic mode decomposition (HODMD) to model, analyse, and forecast energy behaviour in an urban agriculture farm situated in a retrofitted London underground tunnel, where observed measurements are influenced by noisy and occasionally transient conditions. HODMD is a data-driven reduced order modelling method typically used to analyse and predict highly noisy and complex flows in fluid dynamics or any type of complex data from dynamical systems. HODMD is a recent extension of the classical dynamic mode decomposition method (DMD), customised to handle scenarios where the spectral complexity underlying the measurement data is higher than its spatial complexity, such as is the environmental behaviour of the farm. HODMD decomposes temporal data as a linear expansion of physically-meaningful DMD-modes in a semi-automatic approach, using a time-delay embedded approach. We apply HODMD to three seasonal scenarios using real data measured by sensors located at the cross-sectional centre of the underground farm. Through the study we revealed three physically-interpretable mode pairs that govern the environmental behaviour at the centre of the farm, consistently across environmental scenarios. Subsequently, we demonstrate how we can reconstruct the fundamental structure of the observed time-series using only these modes, and forecast for three days ahead, as one, compact and interpretable reduced-order model. We find HODMD to serve as a robust, semi-automatic modelling alternative for predictive modelling in Digital Twins.
Title: Exploiting Inferential Structure in Neural Processes. (arXiv:2306.15169v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15169
- Code URL: https://github.com/dvtailor/np-structured-inference
- Copy Paste:
[[2306.15169] Exploiting Inferential Structure in Neural Processes](http://arxiv.org/abs/2306.15169) #robust
- Summary:
Neural Processes (NPs) are appealing due to their ability to perform fast adaptation based on a context set. This set is encoded by a latent variable, which is often assumed to follow a simple distribution. However, in real-world settings, the context set may be drawn from richer distributions having multiple modes, heavy tails, etc. In this work, we provide a framework that allows NPs' latent variable to be given a rich prior defined by a graphical model. These distributional assumptions directly translate into an appropriate aggregation strategy for the context set. Moreover, we describe a message-passing procedure that still allows for end-to-end optimization with stochastic gradients. We demonstrate the generality of our framework by using mixture and Student-t assumptions that yield improvements in function modelling and test-time robustness.
Title: Assessing Dataset Quality Through Decision Tree Characteristics in Autoencoder-Processed Spaces. (arXiv:2306.15392v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15392
- Code URL: https://github.com/szmazurek/ds_assessment
- Copy Paste:
[[2306.15392] Assessing Dataset Quality Through Decision Tree Characteristics in Autoencoder-Processed Spaces](http://arxiv.org/abs/2306.15392) #robust
- Summary:
In this paper, we delve into the critical aspect of dataset quality assessment in machine learning classification tasks. Leveraging a variety of nine distinct datasets, each crafted for classification tasks with varying complexity levels, we illustrate the profound impact of dataset quality on model training and performance. We further introduce two additional datasets designed to represent specific data conditions - one maximizing entropy and the other demonstrating high redundancy. Our findings underscore the importance of appropriate feature selection, adequate data volume, and data quality in achieving high-performing machine learning models. To aid researchers and practitioners, we propose a comprehensive framework for dataset quality assessment, which can help evaluate if the dataset at hand is sufficient and of the required quality for specific tasks. This research offers valuable insights into data assessment practices, contributing to the development of more accurate and robust machine learning models.
Title: Enhancing Representation Learning on High-Dimensional, Small-Size Tabular Data: A Divide and Conquer Method with Ensembled VAEs. (arXiv:2306.15661v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15661
- Code URL: null
- Copy Paste:
[[2306.15661] Enhancing Representation Learning on High-Dimensional, Small-Size Tabular Data: A Divide and Conquer Method with Ensembled VAEs](http://arxiv.org/abs/2306.15661) #robust
- Summary:
Variational Autoencoders and their many variants have displayed impressive ability to perform dimensionality reduction, often achieving state-of-the-art performance. Many current methods however, struggle to learn good representations in High Dimensional, Low Sample Size (HDLSS) tasks, which is an inherently challenging setting. We address this challenge by using an ensemble of lightweight VAEs to learn posteriors over subsets of the feature-space, which get aggregated into a joint posterior in a novel divide-and-conquer approach. Specifically, we present an alternative factorisation of the joint posterior that induces a form of implicit data augmentation that yields greater sample efficiency. Through a series of experiments on eight real-world datasets, we show that our method learns better latent representations in HDLSS settings, which leads to higher accuracy in a downstream classification task. Furthermore, we verify that our approach has a positive effect on disentanglement and achieves a lower estimated Total Correlation on learnt representations. Finally, we show that our approach is robust to partial features at inference, exhibiting little performance degradation even with most features missing.
biometric
steal
Title: RansomAI: AI-powered Ransomware for Stealthy Encryption. (arXiv:2306.15559v1 [cs.CR])
- Paper URL: http://arxiv.org/abs/2306.15559
- Code URL: null
- Copy Paste:
[[2306.15559] RansomAI: AI-powered Ransomware for Stealthy Encryption](http://arxiv.org/abs/2306.15559) #steal
- Summary:
Cybersecurity solutions have shown promising performance when detecting ransomware samples that use fixed algorithms and encryption rates. However, due to the current explosion of Artificial Intelligence (AI), sooner than later, ransomware (and malware in general) will incorporate AI techniques to intelligently and dynamically adapt its encryption behavior to be undetected. It might result in ineffective and obsolete cybersecurity solutions, but the literature lacks AI-powered ransomware to verify it. Thus, this work proposes RansomAI, a Reinforcement Learning-based framework that can be integrated into existing ransomware samples to adapt their encryption behavior and stay stealthy while encrypting files. RansomAI presents an agent that learns the best encryption algorithm, rate, and duration that minimizes its detection (using a reward mechanism and a fingerprinting intelligent detection system) while maximizing its damage function. The proposed framework was validated in a ransomware, Ransomware-PoC, that infected a Raspberry Pi 4, acting as a crowdsensor. A pool of experiments with Deep Q-Learning and Isolation Forest (deployed on the agent and detection system, respectively) has demonstrated that RansomAI evades the detection of Ransomware-PoC affecting the Raspberry Pi 4 in a few minutes with >90% accuracy.
extraction
Title: FSUIE: A Novel Fuzzy Span Mechanism for Universal Information Extraction. (arXiv:2306.14913v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.14913
- Code URL: null
- Copy Paste:
[[2306.14913] FSUIE: A Novel Fuzzy Span Mechanism for Universal Information Extraction](http://arxiv.org/abs/2306.14913) #extraction
- Summary:
Universal Information Extraction (UIE) has been introduced as a unified framework for various Information Extraction (IE) tasks and has achieved widespread success. Despite this, UIE models have limitations. For example, they rely heavily on span boundaries in the data during training, which does not reflect the reality of span annotation challenges. Slight adjustments to positions can also meet requirements. Additionally, UIE models lack attention to the limited span length feature in IE. To address these deficiencies, we propose the Fuzzy Span Universal Information Extraction (FSUIE) framework. Specifically, our contribution consists of two concepts: fuzzy span loss and fuzzy span attention. Our experimental results on a series of main IE tasks show significant improvement compared to the baseline, especially in terms of fast convergence and strong performance with small amounts of data and training epochs. These results demonstrate the effectiveness and generalization of FSUIE in different tasks, settings, and scenarios.
Title: Product Information Extraction using ChatGPT. (arXiv:2306.14921v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.14921
- Code URL: https://github.com/wbsg-uni-mannheim/pie_chatgpt
- Copy Paste:
[[2306.14921] Product Information Extraction using ChatGPT](http://arxiv.org/abs/2306.14921) #extraction
- Summary:
Structured product data in the form of attribute/value pairs is the foundation of many e-commerce applications such as faceted product search, product comparison, and product recommendation. Product offers often only contain textual descriptions of the product attributes in the form of titles or free text. Hence, extracting attribute/value pairs from textual product descriptions is an essential enabler for e-commerce applications. In order to excel, state-of-the-art product information extraction methods require large quantities of task-specific training data. The methods also struggle with generalizing to out-of-distribution attributes and attribute values that were not a part of the training data. Due to being pre-trained on huge amounts of text as well as due to emergent effects resulting from the model size, Large Language Models like ChatGPT have the potential to address both of these shortcomings. This paper explores the potential of ChatGPT for extracting attribute/value pairs from product descriptions. We experiment with different zero-shot and few-shot prompt designs. Our results show that ChatGPT achieves a performance similar to a pre-trained language model but requires much smaller amounts of training data and computation for fine-tuning.
Title: Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning. (arXiv:2306.15503v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15503
- Code URL: null
- Copy Paste:
[[2306.15503] Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning](http://arxiv.org/abs/2306.15503) #extraction
- Summary:
In recent years, data-driven reinforcement learning (RL), also known as offline RL, has gained significant attention. However, the role of data sampling techniques in offline RL has been overlooked despite its potential to enhance online RL performance. Recent research suggests applying sampling techniques directly to state-transitions does not consistently improve performance in offline RL. Therefore, in this study, we propose a memory technique, (Prioritized) Trajectory Replay (TR/PTR), which extends the sampling perspective to trajectories for more comprehensive information extraction from limited data. TR enhances learning efficiency by backward sampling of trajectories that optimizes the use of subsequent state information. Building on TR, we build the weighted critic target to avoid sampling unseen actions in offline training, and Prioritized Trajectory Replay (PTR) that enables more efficient trajectory sampling, prioritized by various trajectory priority metrics. We demonstrate the benefits of integrating TR and PTR with existing offline RL algorithms on D4RL. In summary, our research emphasizes the significance of trajectory-based data sampling techniques in enhancing the efficiency and performance of offline RL algorithms.
membership infer
federate
Title: FedET: A Communication-Efficient Federated Class-Incremental Learning Framework Based on Enhanced Transformer. (arXiv:2306.15347v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15347
- Code URL: null
- Copy Paste:
[[2306.15347] FedET: A Communication-Efficient Federated Class-Incremental Learning Framework Based on Enhanced Transformer](http://arxiv.org/abs/2306.15347) #federate
- Summary:
Federated Learning (FL) has been widely concerned for it enables decentralized learning while ensuring data privacy. However, most existing methods unrealistically assume that the classes encountered by local clients are fixed over time. After learning new classes, this assumption will make the model's catastrophic forgetting of old classes significantly severe. Moreover, due to the limitation of communication cost, it is challenging to use large-scale models in FL, which will affect the prediction accuracy. To address these challenges, we propose a novel framework, Federated Enhanced Transformer (FedET), which simultaneously achieves high accuracy and low communication cost. Specifically, FedET uses Enhancer, a tiny module, to absorb and communicate new knowledge, and applies pre-trained Transformers combined with different Enhancers to ensure high precision on various tasks. To address local forgetting caused by new classes of new tasks and global forgetting brought by non-i.i.d (non-independent and identically distributed) class imbalance across different local clients, we proposed an Enhancer distillation method to modify the imbalance between old and new knowledge and repair the non-i.i.d. problem. Experimental results demonstrate that FedET's average accuracy on representative benchmark datasets is 14.1% higher than the state-of-the-art method, while FedET saves 90% of the communication cost compared to the previous method.
Title: When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions. (arXiv:2306.15546v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15546
- Code URL: null
- Copy Paste:
[[2306.15546] When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions](http://arxiv.org/abs/2306.15546) #federate
- Summary:
The intersection of the Foundation Model (FM) and Federated Learning (FL) provides mutual benefits, presents a unique opportunity to unlock new possibilities in AI research, and address critical challenges in AI and real-world applications. FL expands the availability of data for FMs and enables computation sharing, distributing the training process and reducing the burden on FL participants. It promotes collaborative FM development, democratizing the process and fostering inclusivity and innovation. On the other hand, FM, with its enormous size, pre-trained knowledge, and exceptional performance, serves as a robust starting point for FL, facilitating faster convergence and better performance under non-iid data. Additionally, leveraging FM to generate synthetic data enriches data diversity, reduces overfitting, and preserves privacy. By examining the interplay between FL and FM, this paper aims to deepen the understanding of their synergistic relationship, highlighting the motivations, challenges, and future directions. Through an exploration of the challenges faced by FL and FM individually and their interconnections, we aim to inspire future research directions that can further enhance both fields, driving advancements and propelling the development of privacy-preserving and scalable AI systems.
fair
Title: Testing of Detection Tools for AI-Generated Text. (arXiv:2306.15666v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15666
- Code URL: null
- Copy Paste:
[[2306.15666] Testing of Detection Tools for AI-Generated Text](http://arxiv.org/abs/2306.15666) #fair
- Summary:
Recent advances in generative pre-trained transformer large language models have emphasised the potential risks of unfair use of artificial intelligence (AI) generated content in an academic environment and intensified efforts in searching for solutions to detect such content. The paper examines the general functionality of detection tools for artificial intelligence generated text and evaluates them based on accuracy and error type analysis. Specifically, the study seeks to answer research questions about whether existing detection tools can reliably differentiate between human-written text and ChatGPT-generated text, and whether machine translation and content obfuscation techniques affect the detection of AI-generated text. The research covers 12 publicly available tools and two commercial systems (Turnitin and PlagiarismCheck) that are widely used in the academic setting. The researchers conclude that the available detection tools are neither accurate nor reliable and have a main bias towards classifying the output as human-written rather than detecting AI-generated text. Furthermore, content obfuscation techniques significantly worsen the performance of tools. The study makes several significant contributions. First, it summarises up-to-date similar scientific and non-scientific efforts in the field. Second, it presents the result of one of the most comprehensive tests conducted so far, based on a rigorous research methodology, an original document set, and a broad coverage of tools. Third, it discusses the implications and drawbacks of using detection tools for AI-generated text in academic settings.
Title: Fairness Aware Counterfactuals for Subgroups. (arXiv:2306.14978v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.14978
- Code URL: null
- Copy Paste:
[[2306.14978] Fairness Aware Counterfactuals for Subgroups](http://arxiv.org/abs/2306.14978) #fair
- Summary:
In this work, we present Fairness Aware Counterfactuals for Subgroups (FACTS), a framework for auditing subgroup fairness through counterfactual explanations. We start with revisiting (and generalizing) existing notions and introducing new, more refined notions of subgroup fairness. We aim to (a) formulate different aspects of the difficulty of individuals in certain subgroups to achieve recourse, i.e. receive the desired outcome, either at the micro level, considering members of the subgroup individually, or at the macro level, considering the subgroup as a whole, and (b) introduce notions of subgroup fairness that are robust, if not totally oblivious, to the cost of achieving recourse. We accompany these notions with an efficient, model-agnostic, highly parameterizable, and explainable framework for evaluating subgroup fairness. We demonstrate the advantages, the wide applicability, and the efficiency of our approach through a thorough experimental evaluation of different benchmark datasets.
Title: Balanced Filtering via Non-Disclosive Proxies. (arXiv:2306.15083v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15083
- Code URL: null
- Copy Paste:
[[2306.15083] Balanced Filtering via Non-Disclosive Proxies](http://arxiv.org/abs/2306.15083) #fair
- Summary:
We study the problem of non-disclosively collecting a sample of data that is balanced with respect to sensitive groups when group membership is unavailable or prohibited from use at collection time. Specifically, our collection mechanism does not reveal significantly more about group membership of any individual sample than can be ascertained from base rates alone. To do this, we adopt a fairness pipeline perspective, in which a learner can use a small set of labeled data to train a proxy function that can later be used for this filtering task. We then associate the range of the proxy function with sampling probabilities; given a new candidate, we classify it using our proxy function, and then select it for our sample with probability proportional to the sampling probability corresponding to its proxy classification. Importantly, we require that the proxy classification itself not reveal significant information about the sensitive group membership of any individual sample (i.e., it should be sufficiently non-disclosive). We show that under modest algorithmic assumptions, we find such a proxy in a sample- and oracle-efficient manner. Finally, we experimentally evaluate our algorithm and analyze generalization properties.
Title: FAIRER: Fairness as Decision Rationale Alignment. (arXiv:2306.15299v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15299
- Code URL: null
- Copy Paste:
[[2306.15299] FAIRER: Fairness as Decision Rationale Alignment](http://arxiv.org/abs/2306.15299) #fair
- Summary:
Deep neural networks (DNNs) have made significant progress, but often suffer from fairness issues, as deep models typically show distinct accuracy differences among certain subgroups (e.g., males and females). Existing research addresses this critical issue by employing fairness-aware loss functions to constrain the last-layer outputs and directly regularize DNNs. Although the fairness of DNNs is improved, it is unclear how the trained network makes a fair prediction, which limits future fairness improvements. In this paper, we investigate fairness from the perspective of decision rationale and define the parameter parity score to characterize the fair decision process of networks by analyzing neuron influence in various subgroups. Extensive empirical studies show that the unfair issue could arise from the unaligned decision rationales of subgroups. Existing fairness regularization terms fail to achieve decision rationale alignment because they only constrain last-layer outputs while ignoring intermediate neuron alignment. To address the issue, we formulate the fairness as a new task, i.e., decision rationale alignment that requires DNNs' neurons to have consistent responses on subgroups at both intermediate processes and the final prediction. To make this idea practical during optimization, we relax the naive objective function and propose gradient-guided parity alignment, which encourages gradient-weighted consistency of neurons across subgroups. Extensive experiments on a variety of datasets show that our method can significantly enhance fairness while sustaining a high level of accuracy and outperforming other approaches by a wide margin.
interpretability
Title: Homological Neural Networks: A Sparse Architecture for Multivariate Complexity. (arXiv:2306.15337v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15337
- Code URL: null
- Copy Paste:
[[2306.15337] Homological Neural Networks: A Sparse Architecture for Multivariate Complexity](http://arxiv.org/abs/2306.15337) #interpretability
- Summary:
The rapid progress of Artificial Intelligence research came with the development of increasingly complex deep learning models, leading to growing challenges in terms of computational complexity, energy efficiency and interpretability. In this study, we apply advanced network-based information filtering techniques to design a novel deep neural network unit characterized by a sparse higher-order graphical architecture built over the homological structure of underlying data. We demonstrate its effectiveness in two application domains which are traditionally challenging for deep learning: tabular data and time series regression problems. Results demonstrate the advantages of this novel design which can tie or overcome the results of state-of-the-art machine learning and deep learning models using only a fraction of parameters.
explainability
Title: "You might think about slightly revising the title": identifying hedges in peer-tutoring interactions. (arXiv:2306.14911v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.14911
- Code URL: https://github.com/anonymoushedges/hedgedetection
- Copy Paste:
[[2306.14911] "You might think about slightly revising the title": identifying hedges in peer-tutoring interactions](http://arxiv.org/abs/2306.14911) #explainability
- Summary:
Hedges play an important role in the management of conversational interaction. In peer tutoring, they are notably used by tutors in dyads (pairs of interlocutors) experiencing low rapport to tone down the impact of instructions and negative feedback. Pursuing the objective of building a tutoring agent that manages rapport with students in order to improve learning, we used a multimodal peer-tutoring dataset to construct a computational framework for identifying hedges. We compared approaches relying on pre-trained resources with others that integrate insights from the social science literature. Our best performance involved a hybrid approach that outperforms the existing baseline while being easier to interpret. We employ a model explainability tool to explore the features that characterize hedges in peer-tutoring conversations, and we identify some novel features, and the benefits of such a hybrid model approach.
watermark
diffusion
Title: PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment. (arXiv:2306.15667v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15667
- Code URL: null
- Copy Paste:
[[2306.15667] PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment](http://arxiv.org/abs/2306.15667) #diffusion
- Summary:
Camera pose estimation is a long-standing computer vision problem that to date often relies on classical methods, such as handcrafted keypoint matching, RANSAC and bundle adjustment. In this paper, we propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework, modelling the conditional distribution of camera poses given input images. This novel view of an old problem has several advantages. (i) The nature of the diffusion framework mirrors the iterative procedure of bundle adjustment. (ii) The formulation allows a seamless integration of geometric constraints from epipolar geometry. (iii) It excels in typically difficult scenarios such as sparse views with wide baselines. (iv) The method can predict intrinsics and extrinsics for an arbitrary amount of images. We demonstrate that our method PoseDiffusion significantly improves over the classic SfM pipelines and the learned approaches on two real-world datasets. Finally, it is observed that our method can generalize across datasets without further training. Project page: https://posediffusion.github.io/
Title: Unsupervised Episode Generation for Graph Meta-learning. (arXiv:2306.15217v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15217
- Code URL: null
- Copy Paste:
[[2306.15217] Unsupervised Episode Generation for Graph Meta-learning](http://arxiv.org/abs/2306.15217) #diffusion
- Summary:
In this paper, we investigate Unsupervised Episode Generation methods to solve Few-Shot Node-Classification (FSNC) problem via Meta-learning without labels. Dominant meta-learning methodologies for FSNC were developed under the existence of abundant labeled nodes for training, which however may not be possible to obtain in the real-world. Although few studies have been proposed to tackle the label-scarcity problem, they still rely on a limited amount of labeled data, which hinders the full utilization of the information of all nodes in a graph. Despite the effectiveness of Self-Supervised Learning (SSL) approaches on FSNC without labels, they mainly learn generic node embeddings without consideration on the downstream task to be solved, which may limit its performance. In this work, we propose unsupervised episode generation methods to benefit from their generalization ability for FSNC tasks while resolving label-scarcity problem. We first propose a method that utilizes graph augmentation to generate training episodes called g-UMTRA, which however has several drawbacks, i.e., 1) increased training time due to the computation of augmented features and 2) low applicability to existing baselines. Hence, we propose Neighbors as Queries (NaQ), which generates episodes from structural neighbors found by graph diffusion. Our proposed methods are model-agnostic, that is, they can be plugged into any existing graph meta-learning models, while not sacrificing much of their performance or sometimes even improving them. We provide theoretical insights to support why our unsupervised episode generation methodologies work, and extensive experimental results demonstrate the potential of our unsupervised episode generation methods for graph meta-learning towards FSNC problems.
noise learning
data-free
transformer
Title: Cutting-Edge Techniques for Depth Map Super-Resolution. (arXiv:2306.15244v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15244
- Code URL: null
- Copy Paste:
[[2306.15244] Cutting-Edge Techniques for Depth Map Super-Resolution](http://arxiv.org/abs/2306.15244) #transformer
- Summary:
To overcome hardware limitations in commercially available depth sensors which result in low-resolution depth maps, depth map super-resolution (DMSR) is a practical and valuable computer vision task. DMSR requires upscaling a low-resolution (LR) depth map into a high-resolution (HR) space. Joint image filtering for DMSR has been applied using spatially-invariant and spatially-variant convolutional neural network (CNN) approaches. In this project, we propose a novel joint image filtering DMSR algorithm using a Swin transformer architecture. Furthermore, we introduce a Nonlinear Activation Free (NAF) network based on a conventional CNN model used in cutting-edge image restoration applications and compare the performance of the techniques. The proposed algorithms are validated through numerical studies and visual examples demonstrating improvements to state-of-the-art performance while maintaining competitive computation time for noisy depth map super-resolution.
Title: Towards predicting Pedestrian Evacuation Time and Density from Floorplans using a Vision Transformer. (arXiv:2306.15318v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15318
- Code URL: null
- Copy Paste:
[[2306.15318] Towards predicting Pedestrian Evacuation Time and Density from Floorplans using a Vision Transformer](http://arxiv.org/abs/2306.15318) #transformer
- Summary:
Conventional pedestrian simulators are inevitable tools in the design process of a building, as they enable project engineers to prevent overcrowding situations and plan escape routes for evacuation. However, simulation runtime and the multiple cumbersome steps in generating simulation results are potential bottlenecks during the building design process. Data-driven approaches have demonstrated their capability to outperform conventional methods in speed while delivering similar or even better results across many disciplines. In this work, we present a deep learning-based approach based on a Vision Transformer to predict density heatmaps over time and total evacuation time from a given floorplan. Specifically, due to limited availability of public datasets, we implement a parametric data generation pipeline including a conventional simulator. This enables us to build a large synthetic dataset that we use to train our architecture. Furthermore, we seamlessly integrate our model into a BIM-authoring tool to generate simulation results instantly and automatically.
Title: Taming Detection Transformers for Medical Object Detection. (arXiv:2306.15472v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15472
- Code URL: null
- Copy Paste:
[[2306.15472] Taming Detection Transformers for Medical Object Detection](http://arxiv.org/abs/2306.15472) #transformer
- Summary:
The accurate detection of suspicious regions in medical images is an error-prone and time-consuming process required by many routinely performed diagnostic procedures. To support clinicians during this difficult task, several automated solutions were proposed relying on complex methods with many hyperparameters. In this study, we investigate the feasibility of DEtection TRansformer (DETR) models for volumetric medical object detection. In contrast to previous works, these models directly predict a set of objects without relying on the design of anchors or manual heuristics such as non-maximum-suppression to detect objects. We show by conducting extensive experiments with three models, namely DETR, Conditional DETR, and DINO DETR on four data sets (CADA, RibFrac, KiTS19, and LIDC) that these set prediction models can perform on par with or even better than currently existing methods. DINO DETR, the best-performing model in our experiments demonstrates this by outperforming a strong anchor-based one-stage detector, Retina U-Net, on three out of four data sets.
Title: Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression. (arXiv:2306.15063v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15063
- Code URL: https://github.com/mansheej/icl-task-diversity
- Copy Paste:
[[2306.15063] Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression](http://arxiv.org/abs/2306.15063) #transformer
- Summary:
Pretrained transformers exhibit the remarkable ability of in-context learning (ICL): they can learn tasks from just a few examples provided in the prompt without updating any weights. This raises a foundational question: can ICL solve fundamentally $\textit{new}$ tasks that are very different from those seen during pretraining? To probe this question, we examine ICL's performance on linear regression while varying the diversity of tasks in the pretraining dataset. We empirically demonstrate a $\textit{task diversity threshold}$ for the emergence of ICL. Below this threshold, the pretrained transformer cannot solve unseen regression tasks as it behaves like a Bayesian estimator with the $\textit{non-diverse pretraining task distribution}$ as the prior. Beyond this threshold, the transformer significantly outperforms this estimator; its behavior aligns with that of ridge regression, corresponding to a Gaussian prior over $\textit{all tasks}$, including those not seen during pretraining. These results highlight that, when pretrained on data with task diversity greater than the threshold, transformers $\textit{can}$ solve fundamentally new tasks in-context. Importantly, this capability hinges on it deviating from the Bayes optimal estimator with the pretraining distribution as the prior. This study underscores, in a concrete example, the critical role of task diversity, alongside data and model scale, in the emergence of ICL. Code is available at https://github.com/mansheej/icl-task-diversity.
Title: Constructing Multilingual Code Search Dataset Using Neural Machine Translation. (arXiv:2306.15604v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15604
- Code URL: https://github.com/ynklab/xcodesearchnet
- Copy Paste:
[[2306.15604] Constructing Multilingual Code Search Dataset Using Neural Machine Translation](http://arxiv.org/abs/2306.15604) #transformer
- Summary:
Code search is a task to find programming codes that semantically match the given natural language queries. Even though some of the existing datasets for this task are multilingual on the programming language side, their query data are only in English. In this research, we create a multilingual code search dataset in four natural and four programming languages using a neural machine translation model. Using our dataset, we pre-train and fine-tune the Transformer-based models and then evaluate them on multiple code search test sets. Our results show that the model pre-trained with all natural and programming language data has performed best in most cases. By applying back-translation data filtering to our dataset, we demonstrate that the translation quality affects the model's performance to a certain extent, but the data size matters more.
Title: Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos. (arXiv:2306.15644v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15644
- Code URL: null
- Copy Paste:
[[2306.15644] Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos](http://arxiv.org/abs/2306.15644) #transformer
- Summary:
To realize human-robot collaboration, robots need to execute actions for new tasks according to human instructions given finite prior knowledge. Human experts can share their knowledge of how to perform a task with a robot through multi-modal instructions in their demonstrations, showing a sequence of short-horizon steps to achieve a long-horizon goal. This paper introduces a method for robot action sequence generation from instruction videos using (1) an audio-visual Transformer that converts audio-visual features and instruction speech to a sequence of robot actions called dynamic movement primitives (DMPs) and (2) style-transfer-based training that employs multi-task learning with video captioning and weakly-supervised learning with a semantic classifier to exploit unpaired video-action data. We built a system that accomplishes various cooking actions, where an arm robot executes a DMP sequence acquired from a cooking video using the audio-visual Transformer. Experiments with Epic-Kitchen-100, YouCookII, QuerYD, and in-house instruction video datasets show that the proposed method improves the quality of DMP sequences by 2.3 times the METEOR score obtained with a baseline video-to-action Transformer. The model achieved 32% of the task success rate with the task knowledge of the object.
Title: Length Generalization in Arithmetic Transformers. (arXiv:2306.15400v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15400
- Code URL: null
- Copy Paste:
[[2306.15400] Length Generalization in Arithmetic Transformers](http://arxiv.org/abs/2306.15400) #transformer
- Summary:
We examine how transformers cope with two challenges: learning basic integer arithmetic, and generalizing to longer sequences than seen during training. We find that relative position embeddings enable length generalization for simple tasks, such as addition: models trained on $5$-digit numbers can perform $15$-digit sums. However, this method fails for multiplication, and we propose train set priming: adding a few ($10$ to $50$) long sequences to the training set. We show that priming allows models trained on $5$-digit $\times$ $3$-digit multiplications to generalize to $35\times 3$ examples. We also show that models can be primed for different generalization lengths, and that the priming sample size scales as the logarithm of the training set size. Finally, we discuss potential applications of priming beyond arithmetic.
generative
Title: Free-style and Fast 3D Portrait Synthesis. (arXiv:2306.15419v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15419
- Code URL: null
- Copy Paste:
[[2306.15419] Free-style and Fast 3D Portrait Synthesis](http://arxiv.org/abs/2306.15419) #generative
- Summary:
Efficiently generating a free-style 3D portrait with high quality and consistency is a promising yet challenging task. The portrait styles generated by most existing methods are usually restricted by their 3D generators, which are learned in specific facial datasets, such as FFHQ. To get a free-style 3D portrait, one can build a large-scale multi-style database to retrain the 3D generator, or use an off-the-shelf tool to do the style translation. However, the former is time-consuming due to data collection and training process, the latter may destroy the multi-view consistency. To tackle this problem, we propose a fast 3D portrait synthesis framework in this paper, which enables one to use text prompts to specify styles. Specifically, for a given portrait style, we first leverage two generative priors, a 3D-aware GAN generator (EG3D) and a text-guided image editor (Ip2p), to quickly construct a few-shot training set, where the inference process of Ip2p is optimized to make editing more stable. Then we replace original triplane generator of EG3D with an Image-to-Triplane (I2T) module for two purposes: 1) getting rid of the style constraints of pre-trained EG3D by fine-tuning I2T on the few-shot dataset; 2) improving training efficiency by fixing all parts of EG3D except I2T. Furthermore, we construct a multi-style and multi-identity 3D portrait database to demonstrate the scalability and generalization of our method. Experimental results show that our method is capable of synthesizing high-quality 3D portraits with specified styles in a few minutes, outperforming the state-of-the-art.
Title: Clickbait Classification and Spoiling Using Natural Language Processing. (arXiv:2306.14907v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.14907
- Code URL: null
- Copy Paste:
[[2306.14907] Clickbait Classification and Spoiling Using Natural Language Processing](http://arxiv.org/abs/2306.14907) #generative
- Summary:
Clickbait is the practice of engineering titles to incentivize readers to click through to articles. Such titles with sensationalized language reveal as little information as possible. Occasionally, clickbait will be intentionally misleading, so natural language processing (NLP) can scan the article and answer the question posed by the clickbait title, or spoil it. We tackle two tasks: classifying the clickbait into one of 3 types (Task 1), and spoiling the clickbait (Task 2). For Task 1, we propose two binary classifiers to determine the final spoiler type. For Task 2, we experiment with two approaches: using a question-answering model to identify the span of text of the spoiler, and using a large language model (LLM) to generate the spoiler. Because the spoiler is contained in the article, we frame the second task as a question-answering approach for identifying the starting and ending positions of the spoiler. We created models for Task 1 that were better than the baselines proposed by the dataset authors and engineered prompts for Task 2 that did not perform as well as the baselines proposed by the dataset authors due to the evaluation metric performing worse when the output text is from a generative model as opposed to an extractive model.
Title: Learning to Rank in Generative Retrieval. (arXiv:2306.15222v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15222
- Code URL: null
- Copy Paste:
[[2306.15222] Learning to Rank in Generative Retrieval](http://arxiv.org/abs/2306.15222) #generative
- Summary:
Generative retrieval is a promising new paradigm in text retrieval that generates identifier strings of relevant passages as the retrieval target. This paradigm leverages powerful generation models and represents a new paradigm distinct from traditional learning-to-rank methods. However, despite its rapid development, current generative retrieval methods are still limited. They typically rely on a heuristic function to transform predicted identifiers into a passage rank list, which creates a gap between the learning objective of generative retrieval and the desired passage ranking target. Moreover, the inherent exposure bias problem of text generation also persists in generative retrieval. To address these issues, we propose a novel framework, called LTRGR, that combines generative retrieval with the classical learning-to-rank paradigm. Our approach involves training an autoregressive model using a passage rank loss, which directly optimizes the autoregressive model toward the optimal passage ranking. This framework only requires an additional training step to enhance current generative retrieval systems and does not add any burden to the inference stage. We conducted experiments on three public datasets, and our results demonstrate that LTRGR achieves state-of-the-art performance among generative retrieval methods, indicating its effectiveness and robustness.
Title: MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation. (arXiv:2306.15253v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15253
- Code URL: null
- Copy Paste:
[[2306.15253] MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation](http://arxiv.org/abs/2306.15253) #generative
- Summary:
Humans talk in free-form while negotiating the expressed meanings or common ground. Despite the impressive conversational abilities of the large generative language models, they do not consider the individual differences in contextual understanding in a shared situated environment. In this work, we propose MindDial, a novel conversational framework that can generate situated free-form responses to negotiate common ground. We design an explicit mind module that can track three-level beliefs -- the speaker's belief, the speaker's prediction of the listener's belief, and the common belief based on the gap between the first two. Then the speaking act classification head will decide to continue to talk, end this turn, or take task-related action. We augment a common ground alignment dataset MutualFriend with belief dynamics annotation, of which the goal is to find a single mutual friend based on the free chat between two agents. Experiments show that our model with mental state modeling can resemble human responses when aligning common ground meanwhile mimic the natural human conversation flow. The ablation study further validates the third-level common belief can aggregate information of the first and second-order beliefs and align common ground more efficiently.
Title: BatchGFN: Generative Flow Networks for Batch Active Learning. (arXiv:2306.15058v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15058
- Code URL: https://github.com/s-a-malik/batchgfn
- Copy Paste:
[[2306.15058] BatchGFN: Generative Flow Networks for Batch Active Learning](http://arxiv.org/abs/2306.15058) #generative
- Summary:
We introduce BatchGFN -- a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward. With an appropriate reward function to quantify the utility of acquiring a batch, such as the joint mutual information between the batch and the model parameters, BatchGFN is able to construct highly informative batches for active learning in a principled way. We show our approach enables sampling near-optimal utility batches at inference time with a single forward pass per point in the batch in toy regression problems. This alleviates the computational complexity of batch-aware algorithms and removes the need for greedy approximations to find maximizers for the batch reward. We also present early results for amortizing training across acquisition steps, which will enable scaling to real-world tasks.
Title: Learning non-Markovian Decision-Making from State-only Sequences. (arXiv:2306.15156v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15156
- Code URL: null
- Copy Paste:
[[2306.15156] Learning non-Markovian Decision-Making from State-only Sequences](http://arxiv.org/abs/2306.15156) #generative
- Summary:
Conventional imitation learning assumes access to the actions of demonstrators, but these motor signals are often non-observable in naturalistic settings. Additionally, sequential decision-making behaviors in these settings can deviate from the assumptions of a standard Markov Decision Process (MDP). To address these challenges, we explore deep generative modeling of state-only sequences with non-Markov Decision Process (nMDP), where the policy is an energy-based prior in the latent space of the state transition generator. We develop maximum likelihood estimation to achieve model-based imitation, which involves short-run MCMC sampling from the prior and importance sampling for the posterior. The learned model enables *decision-making as inference*: model-free policy execution is equivalent to prior sampling, model-based planning is posterior sampling initialized from the policy. We demonstrate the efficacy of the proposed method in a prototypical path planning task with non-Markovian constraints and show that the learned model exhibits strong performances in challenging domains from the MuJoCo suite.
Title: Learning from Invalid Data: On Constraint Satisfaction in Generative Models. (arXiv:2306.15166v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15166
- Code URL: null
- Copy Paste:
[[2306.15166] Learning from Invalid Data: On Constraint Satisfaction in Generative Models](http://arxiv.org/abs/2306.15166) #generative
- Summary:
Generative models have demonstrated impressive results in vision, language, and speech. However, even with massive datasets, they struggle with precision, generating physically invalid or factually incorrect data. This is particularly problematic when the generated data must satisfy constraints, for example, to meet product specifications in engineering design or to adhere to the laws of physics in a natural scene. To improve precision while preserving diversity and fidelity, we propose a novel training mechanism that leverages datasets of constraint-violating data points, which we consider invalid. Our approach minimizes the divergence between the generative distribution and the valid prior while maximizing the divergence with the invalid distribution. We demonstrate how generative models like GANs and DDPMs that we augment to train with invalid data vastly outperform their standard counterparts which solely train on valid data points. For example, our training procedure generates up to 98 % fewer invalid samples on 2D densities, improves connectivity and stability four-fold on a stacking block problem, and improves constraint satisfaction by 15 % on a structural topology optimization benchmark in engineering design. We also analyze how the quality of the invalid data affects the learning procedure and the generalization properties of models. Finally, we demonstrate significant improvements in sample efficiency, showing that a tenfold increase in valid samples leads to a negligible difference in constraint satisfaction, while less than 10 % invalid samples lead to a tenfold improvement. Our proposed mechanism offers a promising solution for improving precision in generative models while preserving diversity and fidelity, particularly in domains where constraint satisfaction is critical and data is limited, such as engineering design, robotics, and medicine.
Title: Anomaly Detection in Networks via Score-Based Generative Models. (arXiv:2306.15324v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15324
- Code URL: https://github.com/realfolkcode/graphdiffusionanomaly
- Copy Paste:
[[2306.15324] Anomaly Detection in Networks via Score-Based Generative Models](http://arxiv.org/abs/2306.15324) #generative
- Summary:
Node outlier detection in attributed graphs is a challenging problem for which there is no method that would work well across different datasets. Motivated by the state-of-the-art results of score-based models in graph generative modeling, we propose to incorporate them into the aforementioned problem. Our method achieves competitive results on small-scale graphs. We provide an empirical analysis of the Dirichlet energy, and show that generative models might struggle to accurately reconstruct it.
large language model
Title: Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic. (arXiv:2306.15195v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15195
- Code URL: https://github.com/shikras/shikra
- Copy Paste:
[[2306.15195] Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic](http://arxiv.org/abs/2306.15195) #large language model
- Summary:
In human conversations, individuals can indicate relevant regions within a scene while addressing others. In turn, the other person can then respond by referring to specific regions if necessary. This natural referential ability in dialogue remains absent in current Multimodal Large Language Models (MLLMs). To fill this gap, this paper proposes an MLLM called Shikra, which can handle spatial coordinate inputs and outputs in natural language. Its architecture consists of a vision encoder, an alignment layer, and a LLM. It is designed to be straightforward and simple, without the need for extra vocabularies, position encoder, pre-/post-detection modules, or external plug-in models. All inputs and outputs are in natural language form. Referential dialogue is a superset of various vision-language (VL) tasks. Shikra can naturally handle location-related tasks like REC and PointQA, as well as conventional VL tasks such as Image Captioning and VQA. Experimental results showcase Shikra's promising performance. Furthermore, it enables numerous exciting applications, like providing mentioned objects' coordinates in chains of thoughts and comparing user-pointed regions similarities. Our code and model are accessed at https://github.com/shikras/shikra.
Title: PRISMA-DFLLM: An Extension of PRISMA for Systematic Literature Reviews using Domain-specific Finetuned Large Language Models. (arXiv:2306.14905v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.14905
- Code URL: null
- Copy Paste:
[[2306.14905] PRISMA-DFLLM: An Extension of PRISMA for Systematic Literature Reviews using Domain-specific Finetuned Large Language Models](http://arxiv.org/abs/2306.14905) #large language model
- Summary:
With the proliferation of open-sourced Large Language Models (LLMs) and efficient finetuning techniques, we are on the cusp of the emergence of numerous domain-specific LLMs that have been finetuned for expertise across specialized fields and applications for which the current general-purpose LLMs are unsuitable. In academia, this technology has the potential to revolutionize the way we conduct systematic literature reviews (SLRs), access knowledge and generate new insights. This paper proposes an AI-enabled methodological framework that combines the power of LLMs with the rigorous reporting guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). By finetuning LLMs on domain-specific academic papers that have been selected as a result of a rigorous SLR process, the proposed PRISMA-DFLLM (for Domain-specific Finetuned LLMs) reporting guidelines offer the potential to achieve greater efficiency, reusability and scalability, while also opening the potential for conducting incremental living systematic reviews with the aid of LLMs. Additionally, the proposed approach for leveraging LLMs for SLRs enables the dissemination of finetuned models, empowering researchers to accelerate advancements and democratize cutting-edge research. This paper presents the case for the feasibility of finetuned LLMs to support rigorous SLRs and the technical requirements for realizing this. This work then proposes the extended PRISMA-DFLLM checklist of reporting guidelines as well as the advantages, challenges, and potential implications of implementing PRISMA-DFLLM. Finally, a future research roadmap to develop this line of AI-enabled SLRs is presented, paving the way for a new era of evidence synthesis and knowledge discovery.
Title: The Importance of Human-Labeled Data in the Era of LLMs. (arXiv:2306.14910v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.14910
- Code URL: null
- Copy Paste:
[[2306.14910] The Importance of Human-Labeled Data in the Era of LLMs](http://arxiv.org/abs/2306.14910) #large language model
- Summary:
The advent of large language models (LLMs) has brought about a revolution in the development of tailored machine learning models and sparked debates on redefining data requirements. The automation facilitated by the training and implementation of LLMs has led to discussions and aspirations that human-level labeling interventions may no longer hold the same level of importance as in the era of supervised learning. This paper presents compelling arguments supporting the ongoing relevance of human-labeled data in the era of LLMs.
Title: LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding. (arXiv:2306.14924v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.14924
- Code URL: null
- Copy Paste:
[[2306.14924] LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding](http://arxiv.org/abs/2306.14924) #large language model
- Summary:
Deductive coding is a widely used qualitative research method for determining the prevalence of themes across documents. While useful, deductive coding is often burdensome and time consuming since it requires researchers to read, interpret, and reliably categorize a large body of unstructured text documents. Large language models (LLMs), like ChatGPT, are a class of quickly evolving AI tools that can perform a range of natural language processing and reasoning tasks. In this study, we explore the use of LLMs to reduce the time it takes for deductive coding while retaining the flexibility of a traditional content analysis. We outline the proposed approach, called LLM-assisted content analysis (LACA), along with an in-depth case study using GPT-3.5 for LACA on a publicly available deductive coding data set. Additionally, we conduct an empirical benchmark using LACA on 4 publicly available data sets to assess the broader question of how well GPT-3.5 performs across a range of deductive coding tasks. Overall, we find that GPT-3.5 can often perform deductive coding at levels of agreement comparable to human coders. Additionally, we demonstrate that LACA can help refine prompts for deductive coding, identify codes for which an LLM is randomly guessing, and help assess when to use LLMs vs. human coders for deductive coding. We conclude with several implications for future practice of deductive coding and related research methods.
Title: WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models. (arXiv:2306.15087v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15087
- Code URL: https://github.com/katyfelkner/winoqueer
- Copy Paste:
[[2306.15087] WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models](http://arxiv.org/abs/2306.15087) #large language model
- Summary:
We present WinoQueer: a benchmark specifically designed to measure whether large language models (LLMs) encode biases that are harmful to the LGBTQ+ community. The benchmark is community-sourced, via application of a novel method that generates a bias benchmark from a community survey. We apply our benchmark to several popular LLMs and find that off-the-shelf models generally do exhibit considerable anti-queer bias. Finally, we show that LLM bias against a marginalized community can be somewhat mitigated by finetuning on data written about or by members of that community, and that social media text written by community members is more effective than news text written about the community by non-members. Our method for community-in-the-loop benchmark development provides a blueprint for future researchers to develop community-driven, harms-grounded LLM benchmarks for other marginalized communities.
Title: Understanding Social Reasoning in Language Models with Language Models. (arXiv:2306.15448v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15448
- Code URL: null
- Copy Paste:
[[2306.15448] Understanding Social Reasoning in Language Models with Language Models](http://arxiv.org/abs/2306.15448) #large language model
- Summary:
As Large Language Models (LLMs) become increasingly integrated into our everyday lives, understanding their ability to comprehend human mental states becomes critical for ensuring effective interactions. However, despite the recent attempts to assess the Theory-of-Mind (ToM) reasoning capabilities of LLMs, the degree to which these models can align with human ToM remains a nuanced topic of exploration. This is primarily due to two distinct challenges: (1) the presence of inconsistent results from previous evaluations, and (2) concerns surrounding the validity of existing evaluation methodologies. To address these challenges, we present a novel framework for procedurally generating evaluations with LLMs by populating causal templates. Using our framework, we create a new social reasoning benchmark (BigToM) for LLMs which consists of 25 controls and 5,000 model-written evaluations. We find that human participants rate the quality of our benchmark higher than previous crowd-sourced evaluations and comparable to expert-written evaluations. Using BigToM, we evaluate the social reasoning capabilities of a variety of LLMs and compare model performances with human performance. Our results suggest that GPT4 has ToM capabilities that mirror human inference patterns, though less reliable, while other LLMs struggle.
Title: Using Large Language Models to Provide Explanatory Feedback to Human Tutors. (arXiv:2306.15498v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15498
- Code URL: null
- Copy Paste:
[[2306.15498] Using Large Language Models to Provide Explanatory Feedback to Human Tutors](http://arxiv.org/abs/2306.15498) #large language model
- Summary:
Research demonstrates that learners engaging in the process of producing explanations to support their reasoning can have a positive impact on learning. However, providing learners real-time explanatory feedback often presents challenges related to classification accuracy, particularly in domain-specific environments containing situationally complex and nuanced responses. We present two approaches for supplying tutors real-time feedback within an online lesson on how to give students effective praise. This work-in-progress demonstrates considerable accuracy in binary classification for corrective feedback of effective, or effort-based (F1 score = 0.811), and ineffective, or outcome-based (F1 score = 0.350), praise responses. More notably, we introduce progress towards an enhanced approach of providing explanatory feedback using large language model-facilitated named entity recognition, which can provide tutors feedback, not only while engaging in lessons, but can potentially suggest real-time tutor moves. Future work involves leveraging large language models for data augmentation to improve accuracy, while also developing an explanatory feedback interface.
Title: Paradigm Shift in Sustainability Disclosure Analysis: Empowering Stakeholders with CHATREPORT, a Language Model-Based Tool. (arXiv:2306.15518v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15518
- Code URL: null
- Copy Paste:
[[2306.15518] Paradigm Shift in Sustainability Disclosure Analysis: Empowering Stakeholders with CHATREPORT, a Language Model-Based Tool](http://arxiv.org/abs/2306.15518) #large language model
- Summary:
This paper introduces a novel approach to enhance Large Language Models (LLMs) with expert knowledge to automate the analysis of corporate sustainability reports by benchmarking them against the Task Force for Climate-Related Financial Disclosures (TCFD) recommendations. Corporate sustainability reports are crucial in assessing organizations' environmental and social risks and impacts. However, analyzing these reports' vast amounts of information makes human analysis often too costly. As a result, only a few entities worldwide have the resources to analyze these reports, which could lead to a lack of transparency. While AI-powered tools can automatically analyze the data, they are prone to inaccuracies as they lack domain-specific expertise. This paper introduces a novel approach to enhance LLMs with expert knowledge to automate the analysis of corporate sustainability reports. We christen our tool CHATREPORT, and apply it in a first use case to assess corporate climate risk disclosures following the TCFD recommendations. CHATREPORT results from collaborating with experts in climate science, finance, economic policy, and computer science, demonstrating how domain experts can be involved in developing AI tools. We make our prompt templates, generated data, and scores available to the public to encourage transparency.
Title: Extending Context Window of Large Language Models via Positional Interpolation. (arXiv:2306.15595v1 [cs.CL])
- Paper URL: http://arxiv.org/abs/2306.15595
- Code URL: null
- Copy Paste:
[[2306.15595] Extending Context Window of Large Language Models via Positional Interpolation](http://arxiv.org/abs/2306.15595) #large language model
- Summary:
We present Position Interpolation (PI) that extends the context window sizes of RoPE-based pretrained LLMs such as LLaMA models to up to 32768 with minimal fine-tuning (within 1000 steps), while demonstrating strong empirical results on various tasks that require long context, including passkey retrieval, language modeling, and long document summarization from LLaMA 7B to 65B. Meanwhile, the model extended by Position Interpolation preserves quality relatively well on tasks within its original context window. To achieve this goal, Position Interpolation linearly down-scales the input position indices to match the original context window size, rather than extrapolating beyond the trained context length, which may lead to catastrophically high attention scores that completely ruin the self-attention mechanism. Our theoretical study shows that the upper bound of interpolation is at least $\sim 600 \times$ smaller than that of extrapolation, further demonstrating its stability. Models extended via Position Interpolation retain their original architecture and can reuse most pre-existing optimization and infrastructure.
Title: LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. (arXiv:2306.15626v1 [cs.LG])
- Paper URL: http://arxiv.org/abs/2306.15626
- Code URL: https://github.com/lean-dojo/leandojo
- Copy Paste:
[[2306.15626] LeanDojo: Theorem Proving with Retrieval-Augmented Language Models](http://arxiv.org/abs/2306.15626) #large language model
- Summary:
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on, due to private code, data, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. This paper removes these barriers by introducing LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks. LeanDojo extracts data from Lean and enables interaction with the proof environment programmatically. It contains fine-grained annotations of premises in proofs, providing valuable data for premise selection: a key bottleneck in theorem proving. Using this data, we develop ReProver (Retrieval-Augmented Prover): the first LLM-based prover that is augmented with retrieval for selecting premises from a vast math library. It is inexpensive and needs only one GPU week of training. Our retriever leverages LeanDojo's program analysis capability to identify accessible premises and hard negative examples, which makes retrieval much more effective. Furthermore, we construct a new benchmark consisting of 96,962 theorems and proofs extracted from Lean's math library. It features challenging data split requiring the prover to generalize to theorems relying on novel premises that are never used in training. We use this benchmark for training and evaluation, and experimental results demonstrate the effectiveness of ReProver over non-retrieval baselines and GPT-4. We thus provide the first set of open-source LLM-based theorem provers without any proprietary datasets and release it under a permissive MIT license to facilitate further research.
segmentation
Title: MIMIC: Masked Image Modeling with Image Correspondences. (arXiv:2306.15128v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15128
- Code URL: https://github.com/raivnlab/mimic
- Copy Paste:
[[2306.15128] MIMIC: Masked Image Modeling with Image Correspondences](http://arxiv.org/abs/2306.15128) #segmentation
- Summary:
Many pixelwise dense prediction tasks (depth estimation and semantic segmentation) in computer vision today rely on pretrained image representations. Therefore, curating effective pretraining datasets is vital. Unfortunately, the effective pretraining datasets are those with multi-view scenes and have only been curated using annotated 3D meshes, point clouds, and camera parameters from simulated environments. We propose a dataset-curation mechanism that does not require any annotations. We mine two datasets: MIMIC-1M with 1.3M and MIMIC-3M with 3.1M multi-view image pairs from open-sourced video datasets and from synthetic 3D environments. We train multiple self-supervised models with different masked image modeling objectives to showcase the following findings: Representations trained on MIMIC-3M outperform those mined using annotations on multiple downstream tasks, including depth estimation, semantic segmentation, surface normals, and pose estimation. They also outperform representations that are frozen and when downstream training data is limited to few-shot. Larger dataset (MIMIC-3M) significantly improves performance, which is promising since our curation method can arbitrarily scale to produce even larger datasets. MIMIC code, dataset, and pretrained models are open-sourced at https://github.com/RAIVNLab/MIMIC.
Title: Delving into Crispness: Guided Label Refinement for Crisp Edge Detection. (arXiv:2306.15172v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15172
- Code URL: null
- Copy Paste:
[[2306.15172] Delving into Crispness: Guided Label Refinement for Crisp Edge Detection](http://arxiv.org/abs/2306.15172) #segmentation
- Summary:
Learning-based edge detection usually suffers from predicting thick edges. Through extensive quantitative study with a new edge crispness measure, we find that noisy human-labeled edges are the main cause of thick predictions. Based on this observation, we advocate that more attention should be paid on label quality than on model design to achieve crisp edge detection. To this end, we propose an effective Canny-guided refinement of human-labeled edges whose result can be used to train crisp edge detectors. Essentially, it seeks for a subset of over-detected Canny edges that best align human labels. We show that several existing edge detectors can be turned into a crisp edge detector through training on our refined edge maps. Experiments demonstrate that deep models trained with refined edges achieve significant performance boost of crispness from 17.4% to 30.6%. With the PiDiNet backbone, our method improves ODS and OIS by 12.2% and 12.6% on the Multicue dataset, respectively, without relying on non-maximal suppression. We further conduct experiments and show the superiority of our crisp edge detection for optical flow estimation and image segmentation.
Title: FBA-Net: Foreground and Background Aware Contrastive Learning for Semi-Supervised Atrium Segmentation. (arXiv:2306.15189v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15189
- Code URL: null
- Copy Paste:
[[2306.15189] FBA-Net: Foreground and Background Aware Contrastive Learning for Semi-Supervised Atrium Segmentation](http://arxiv.org/abs/2306.15189) #segmentation
- Summary:
Medical image segmentation of gadolinium enhancement magnetic resonance imaging (GE MRI) is an important task in clinical applications. However, manual annotation is time-consuming and requires specialized expertise. Semi-supervised segmentation methods that leverage both labeled and unlabeled data have shown promise, with contrastive learning emerging as a particularly effective approach. In this paper, we propose a contrastive learning strategy of foreground and background representations for semi-supervised 3D medical image segmentation (FBA-Net). Specifically, we leverage the contrastive loss to learn representations of both the foreground and background regions in the images. By training the network to distinguish between foreground-background pairs, we aim to learn a representation that can effectively capture the anatomical structures of interest. Experiments on three medical segmentation datasets demonstrate state-of-the-art performance. Notably, our method achieves a Dice score of 91.31% with only 20% labeled data, which is remarkably close to the 91.62% score of the fully supervised method that uses 100% labeled data on the left atrium dataset. Our framework has the potential to advance the field of semi-supervised 3D medical image segmentation and enable more efficient and accurate analysis of medical images with a limited amount of annotated labels.
Title: Semantic Segmentation Using Super Resolution Technique as Pre-Processing. (arXiv:2306.15218v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15218
- Code URL: null
- Copy Paste:
[[2306.15218] Semantic Segmentation Using Super Resolution Technique as Pre-Processing](http://arxiv.org/abs/2306.15218) #segmentation
- Summary:
Combining high-level and low-level visual tasks is a common technique in the field of computer vision. This work integrates the technique of image super resolution to semantic segmentation for document image binarization. It demonstrates that using image super-resolution as a preprocessing step can effectively enhance the results and performance of semantic segmentation.
Title: Hierarchical Dense Correlation Distillation for Few-Shot Segmentation-Extended Abstract. (arXiv:2306.15278v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15278
- Code URL: null
- Copy Paste:
[[2306.15278] Hierarchical Dense Correlation Distillation for Few-Shot Segmentation-Extended Abstract](http://arxiv.org/abs/2306.15278) #segmentation
- Summary:
Few-shot semantic segmentation (FSS) aims to form class-agnostic models segmenting unseen classes with only a handful of annotations. Previous methods limited to the semantic feature and prototype representation suffer from coarse segmentation granularity and train-set overfitting. In this work, we design Hierarchically Decoupled Matching Network (HDMNet) mining pixel-level support correlation based on the transformer architecture. The self-attention modules are used to assist in establishing hierarchical dense features, as a means to accomplish the cascade matching between query and support features. Moreover, we propose a matching module to reduce train-set overfitting and introduce correlation distillation leveraging semantic correspondence from coarse resolution to boost fine-grained segmentation. Our method performs decently in experiments. We achieve 50.0% mIoU on COCO dataset one-shot setting and 56.0% on five-shot segmentation, respectively. The code will be available on the project website. We hope our work can benefit broader industrial applications where novel classes with limited annotations are required to be decently identified.
Title: PANet: LiDAR Panoptic Segmentation with Sparse Instance Proposal and Aggregation. (arXiv:2306.15348v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15348
- Code URL: https://github.com/jieqianyu/panet
- Copy Paste:
[[2306.15348] PANet: LiDAR Panoptic Segmentation with Sparse Instance Proposal and Aggregation](http://arxiv.org/abs/2306.15348) #segmentation
- Summary:
Reliable LiDAR panoptic segmentation (LPS), including both semantic and instance segmentation, is vital for many robotic applications, such as autonomous driving. This work proposes a new LPS framework named PANet to eliminate the dependency on the offset branch and improve the performance on large objects, which are always over-segmented by clustering algorithms. Firstly, we propose a non-learning Sparse Instance Proposal (SIP) module with the "sampling-shifting-grouping" scheme to directly group thing points into instances from the raw point cloud efficiently. More specifically, balanced point sampling is introduced to generate sparse seed points with more uniform point distribution over the distance range. And a shift module, termed bubble shifting, is proposed to shrink the seed points to the clustered centers. Then we utilize the connected component label algorithm to generate instance proposals. Furthermore, an instance aggregation module is devised to integrate potentially fragmented instances, improving the performance of the SIP module on large objects. Extensive experiments show that PANet achieves state-of-the-art performance among published works on the SemanticKITTI validation and nuScenes validation for the panoptic segmentation task.
Title: SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion. (arXiv:2306.15349v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15349
- Code URL: https://github.com/jieqianyu/ssc-rs
- Copy Paste:
[[2306.15349] SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion](http://arxiv.org/abs/2306.15349) #segmentation
- Summary:
Semantic scene completion (SSC) jointly predicts the semantics and geometry of the entire 3D scene, which plays an essential role in 3D scene understanding for autonomous driving systems. SSC has achieved rapid progress with the help of semantic context in segmentation. However, how to effectively exploit the relationships between the semantic context in semantic segmentation and geometric structure in scene completion remains under exploration. In this paper, we propose to solve outdoor SSC from the perspective of representation separation and BEV fusion. Specifically, we present the network, named SSC-RS, which uses separate branches with deep supervision to explicitly disentangle the learning procedure of the semantic and geometric representations. And a BEV fusion network equipped with the proposed Adaptive Representation Fusion (ARF) module is presented to aggregate the multi-scale features effectively and efficiently. Due to the low computational burden and powerful representation ability, our model has good generality while running in real-time. Extensive experiments on SemanticKITTI demonstrate our SSC-RS achieves state-of-the-art performance.
Title: TrickVOS: A Bag of Tricks for Video Object Segmentation. (arXiv:2306.15377v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15377
- Code URL: null
- Copy Paste:
[[2306.15377] TrickVOS: A Bag of Tricks for Video Object Segmentation](http://arxiv.org/abs/2306.15377) #segmentation
- Summary:
Space-time memory (STM) network methods have been dominant in semi-supervised video object segmentation (SVOS) due to their remarkable performance. In this work, we identify three key aspects where we can improve such methods; i) supervisory signal, ii) pretraining and iii) spatial awareness. We then propose TrickVOS; a generic, method-agnostic bag of tricks addressing each aspect with i) a structure-aware hybrid loss, ii) a simple decoder pretraining regime and iii) a cheap tracker that imposes spatial constraints in model predictions. Finally, we propose a lightweight network and show that when trained with TrickVOS, it achieves competitive results to state-of-the-art methods on DAVIS and YouTube benchmarks, while being one of the first STM-based SVOS methods that can run in real-time on a mobile device.
Title: No-Service Rail Surface Defect Segmentation via Normalized Attention and Dual-scale Interaction. (arXiv:2306.15442v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15442
- Code URL: null
- Copy Paste:
[[2306.15442] No-Service Rail Surface Defect Segmentation via Normalized Attention and Dual-scale Interaction](http://arxiv.org/abs/2306.15442) #segmentation
- Summary:
No-service rail surface defect (NRSD) segmentation is an essential way for perceiving the quality of no-service rails. However, due to the complex and diverse outlines and low-contrast textures of no-service rails, existing natural image segmentation methods cannot achieve promising performance in NRSD images, especially in some unique and challenging NRSD scenes. To this end, in this paper, we propose a novel segmentation network for NRSDs based on Normalized Attention and Dual-scale Interaction, named NaDiNet. Specifically, NaDiNet follows the enhancement-interaction paradigm. The Normalized Channel-wise Self-Attention Module (NAM) and the Dual-scale Interaction Block (DIB) are two key components of NaDiNet. NAM is a specific extension of the channel-wise self-attention mechanism (CAM) to enhance features extracted from low-contrast NRSD images. The softmax layer in CAM will produce very small correlation coefficients which are not conducive to low-contrast feature enhancement. Instead, in NAM, we directly calculate the normalized correlation coefficient between channels to enlarge the feature differentiation. DIB is specifically designed for the feature interaction of the enhanced features. It has two interaction branches with dual scales, one for fine-grained clues and the other for coarse-grained clues. With both branches working together, DIB can perceive defect regions of different granularities. With these modules working together, our NaDiNet can generate an accurate segmentation map. Extensive experiments on the public NRSD-MN dataset with man-made and natural NRSDs demonstrate that our proposed NaDiNet with various backbones (i.e., VGG, ResNet, and DenseNet) consistently outperforms 10 state-of-the-art methods. The code and results of our method are available at https://github.com/monxxcn/NaDiNet.
Title: Meshes Meet Voxels: Abdominal Organ Segmentation via Diffeomorphic Deformations. (arXiv:2306.15515v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15515
- Code URL: null
- Copy Paste:
[[2306.15515] Meshes Meet Voxels: Abdominal Organ Segmentation via Diffeomorphic Deformations](http://arxiv.org/abs/2306.15515) #segmentation
- Summary:
Abdominal multi-organ segmentation from CT and MRI is an essential prerequisite for surgical planning and computer-aided navigation systems. Three-dimensional numeric representations of abdominal shapes are further important for quantitative and statistical analyses thereof. Existing methods in the field, however, are unable to extract highly accurate 3D representations that are smooth, topologically correct, and match points on a template. In this work, we present UNetFlow, a novel diffeomorphic shape deformation approach for abdominal organs. UNetFlow combines the advantages of voxel-based and mesh-based approaches for 3D shape extraction. Our results demonstrate high accuracy with respect to manually annotated CT data and better topological correctness compared to previous methods. In addition, we show the generalization of UNetFlow to MRI.
Title: What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation. (arXiv:2306.15521v1 [cs.CV])
- Paper URL: http://arxiv.org/abs/2306.15521
- Code URL: https://github.com/blumenstiel/mess
- Copy Paste:
[[2306.15521] What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation](http://arxiv.org/abs/2306.15521) #segmentation
- Summary:
While semantic segmentation has seen tremendous improvements in the past, there are still significant labeling efforts necessary and the problem of limited generalization to classes that have not been present during training. To address this problem, zero-shot semantic segmentation makes use of large self-supervised vision-language models, allowing zero-shot transfer to unseen classes. In this work, we build a benchmark for Multi-domain Evaluation of Semantic Segmentation (MESS), which allows a holistic analysis of performance across a wide range of domain-specific datasets such as medicine, engineering, earth monitoring, biology, and agriculture. To do this, we reviewed 120 datasets, developed a taxonomy, and classified the datasets according to the developed taxonomy. We select a representative subset consisting of 22 datasets and propose it as the MESS benchmark. We evaluate eight recently published models on the proposed MESS benchmark and analyze characteristics for the performance of zero-shot transfer models. The toolkit is available at https://github.com/blumenstiel/MESS.