secure

Title: Composing Bridges. (arXiv:2305.16435v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2305.16435
Code URL: null
Copy Paste: [[2305.16435] Composing Bridges](http://arxiv.org/abs/2305.16435) #secure
Summary:
The present work builds on previous investigations of the authors (and their collaborators) regarding bridges, a certain type of morphisms between encryption schemes, making a step forward in developing a (category theory) language for studying relations between encryption schemes. Here we analyse the conditions under which bridges can be performed sequentially, formalizing the notion of composability. One of our results gives a sufficient condition for a pair of bridges to be composable. We illustrate that composing two bridges, each independently satisfying a previously established IND-CPA security definition, can actually lead to an insecure bridge. Our main result gives a sufficient condition that a pair of secure composable bridges should satisfy in order for their composition to be a secure bridge. We also introduce the concept of a complete bridge and show that it is connected to the notion of Fully composable Homomorphic Encryption (FcHE), recently considered by Micciancio. Moreover, we show that a result of Micciancio which gives a construction of FcHE schemes can be phrased in the language of complete bridges, where his insights can be formalised in a greater generality.

Title: vFedSec: Efficient Secure Aggregation for Vertical Federated Learning via Secure Layer. (arXiv:2305.16794v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2305.16794
Code URL: null
Copy Paste: [[2305.16794] vFedSec: Efficient Secure Aggregation for Vertical Federated Learning via Secure Layer](http://arxiv.org/abs/2305.16794) #secure
Summary:
Most work in privacy-preserving federated learning (FL) has been focusing on horizontally partitioned datasets where clients share the same sets of features and can train complete models independently. However, in many interesting problems, individual data points are scattered across different clients/organizations in a vertical setting. Solutions for this type of FL require the exchange of intermediate outputs and gradients between participants, posing a potential risk of privacy leakage when privacy and security concerns are not considered. In this work, we present vFedSec - a novel design with an innovative Secure Layer for training vertical FL securely and efficiently using state-of-the-art security modules in secure aggregation. We theoretically demonstrate that our method does not impact the training performance while protecting private data effectively. Empirically results also show its applicability with extensive experiments that our design can achieve the protection with negligible computation and communication overhead. Also, our method can obtain 9.1e2 ~ 3.8e4 speedup compared to widely-adopted homomorphic encryption (HE) method.

security

Title: Vision-based UAV Detection in Complex Backgrounds and Rainy Conditions. (arXiv:2305.16450v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16450
Code URL: null
Copy Paste: [[2305.16450] Vision-based UAV Detection in Complex Backgrounds and Rainy Conditions](http://arxiv.org/abs/2305.16450) #security
Summary:
To detect UAVs in real-time, computer vision and deep learning approaches are developing areas of research. There have been concerns raised regarding the possible hazards and misuse of employing unmanned aerial vehicles (UAVs) in many applications. These include potential privacy violations, safety-related issues, and security threats. Vision-based detection systems often comprise a combination of hardware components such as cameras and software components. In this work, the performance of recent and popular vision-based object detection techniques is investigated for the task of UAV detection under challenging conditions such as complex backgrounds, varying UAV sizes, complex background scenarios, and low-to-heavy rainy conditions. To study the performance of selected methods under these conditions, two datasets were curated: one with a sky background and one with complex background. In this paper, one-stage detectors and two-stage detectors are studied and evaluated. The findings presented in the paper shall help provide insights concerning the performance of the selected models for the task of UAV detection under challenging conditions and pave the way to develop more robust UAV detection methods

Title: 5G/6G-Enabled Metaverse Technologies: Taxonomy, Applications, and Open Security Challenges with Future Research Directions. (arXiv:2305.16473v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2305.16473
Code URL: null
Copy Paste: [[2305.16473] 5G/6G-Enabled Metaverse Technologies: Taxonomy, Applications, and Open Security Challenges with Future Research Directions](http://arxiv.org/abs/2305.16473) #security
Summary:
Internet technology has proven to be a vital contributor to many cutting-edge innovations that have given humans access to interact virtually with objects. Until now, numerous virtual systems had been developed for digital transformation to enable access to thousands of services and applications that range from virtual gaming to social networks. However, the majority of these systems lack to maintain consistency during interconnectivity and communication. To explore this discussion, in the recent past a new term, Metaverse has been introduced, which is the combination of meta and universe that describes a shared virtual environment, where a number of technologies, such as 4th and 5th generation technologies, VR, ML algorithms etc., work collectively to support each other for the sake of one objective, which is the virtual accessibility of objects via one network platform. With the development, integration, and virtualization of technologies, a lot of improvement in daily life applications is expected, but at the same time, there is a big challenge for the research community to secure this platform from external and external threats, because this technology is exposed to many cybersecurity attacks. Hence, it is imperative to systematically review and understand the taxonomy, applications, open security challenges, and future research directions of the emerging Metaverse technologies. In this paper, we have made useful efforts to present a comprehensive survey regarding Metaverse technology by taking into account the aforesaid parameters. Following this, in the initial phase, we explored the future of Metaverse in the presence of 4th and 5th generation technologies. Thereafter, we discussed the possible attacks to set a preface for the open security challenges. Based on that, we suggested potential research directions that could be beneficial to address these challenges cost-effectively.

Title: Panini -- Anonymous Anycast and an Instantiation. (arXiv:2305.16629v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2305.16629
Code URL: null
Copy Paste: [[2305.16629] Panini -- Anonymous Anycast and an Instantiation](http://arxiv.org/abs/2305.16629) #security
Summary:
Anycast messaging (i.e., sending a message to an unspecified receiver) has long been neglected by the anonymous communication community. An anonymous anycast prevents senders from learning who the receiver of their message is, allowing for greater privacy in areas such as political activism and whistleblowing. While there have been some protocol ideas proposed, formal treatment of the problem is absent. Formal definitions of what constitutes anonymous anycast and privacy in this context are however a requirement for constructing protocols with provable guarantees. In this work, we define the anycast functionality and use a game-based approach to formalize its privacy and security goals. We further propose Panini, the first anonymous anycast protocol that only requires readily available infrastructure. We show that Panini allows the actual receiver of the anycast message to remain anonymous, even in the presence of an honest but curious sender. In an empirical evaluation, we find that Panini adds only minimal overhead over regular unicast: Sending a message anonymously to one of eight possible receivers results in an end-to-end latency of 0.76s.

privacy

Title: S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts. (arXiv:2305.16685v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16685
Code URL: https://github.com/ytongxie/s4m
Copy Paste: [[2305.16685] S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts](http://arxiv.org/abs/2305.16685) #privacy
Summary:
In this paper, we seek to design a report generation model that is able to generate reasonable reports even given different images of various body parts. We start by directly merging multiple datasets and training a single report generation model on this one. We, however, observe that the reports generated in such a simple way only obtain comparable performance compared with that trained separately on each specific dataset. We suspect that this is caused by the dilemma between the diversity of body parts and the limited availability of medical data. To develop robust and generalizable models, it is important to consider a diverse range of body parts and medical conditions. However, collecting a sufficiently large dataset for each specific body part can be difficult due to various factors, such as data availability and privacy concerns. Thus, rather than striving for more data, we propose a single-for-multiple (S4M) framework, which seeks to facilitate the learning of the report generation model with two auxiliary priors: an explicit prior (\ie, feeding radiology-informed knowledge) and an implicit prior (\ie, guided by cross-modal features). Specifically, based on the conventional encoder-decoder report generation framework, we incorporate two extra branches: a Radiology-informed Knowledge Aggregation (RadKA) branch and an Implicit Prior Guidance (IPG) branch. We conduct the experiments on our merged dataset which consists of a public dataset (\ie, IU-Xray) and five private datasets, covering six body parts: chest, abdomen, knee, hip, wrist and shoulder. Our S4M model outperforms all the baselines, regardless of whether they are trained on separate or merged datasets. Code is available at: \url{https://github.com/YtongXie/S4M}.

Title: FairDP: Certified Fairness with Differential Privacy. (arXiv:2305.16474v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16474
Code URL: null
Copy Paste: [[2305.16474] FairDP: Certified Fairness with Differential Privacy](http://arxiv.org/abs/2305.16474) #privacy
Summary:
This paper introduces FairDP, a novel mechanism designed to simultaneously ensure differential privacy (DP) and fairness. FairDP operates by independently training models for distinct individual groups, using group-specific clipping terms to assess and bound the disparate impacts of DP. Throughout the training process, the mechanism progressively integrates knowledge from group models to formulate a comprehensive model that balances privacy, utility, and fairness in downstream tasks. Extensive theoretical and empirical analyses validate the efficacy of FairDP, demonstrating improved trade-offs between model utility, privacy, and fairness compared with existing methods.

Title: Privacy-aware Gaussian Process Regression. (arXiv:2305.16541v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16541
Code URL: null
Copy Paste: [[2305.16541] Privacy-aware Gaussian Process Regression](http://arxiv.org/abs/2305.16541) #privacy
Summary:
We propose the first theoretical and methodological framework for Gaussian process regression subject to privacy constraints. The proposed method can be used when a data owner is unwilling to share a high-fidelity supervised learning model built from their data with the public due to privacy concerns. The key idea of the proposed method is to add synthetic noise to the data until the predictive variance of the Gaussian process model reaches a prespecified privacy level. The optimal covariance matrix of the synthetic noise is formulated in terms of semi-definite programming. We also introduce the formulation of privacy-aware solutions under continuous privacy constraints using kernel-based approaches, and study their theoretical properties. The proposed method is illustrated by considering a model that tracks the trajectories of satellites.

Title: Towards Certification of Machine Learning-Based Distributed Systems. (arXiv:2305.16822v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16822
Code URL: null
Copy Paste: [[2305.16822] Towards Certification of Machine Learning-Based Distributed Systems](http://arxiv.org/abs/2305.16822) #privacy
Summary:
Machine Learning (ML) is increasingly used to drive the operation of complex distributed systems deployed on the cloud-edge continuum enabled by 5G. Correspondingly, distributed systems' behavior is becoming more non-deterministic in nature. This evolution of distributed systems requires the definition of new assurance approaches for the verification of non-functional properties. Certification, the most popular assurance technique for system and software verification, is not immediately applicable to systems whose behavior is determined by Machine Learning-based inference. However, there is an increasing push from policy makers, regulators, and industrial stakeholders towards the definition of techniques for the certification of non-functional properties (e.g., fairness, robustness, privacy) of ML. This article analyzes the challenges and deficiencies of current certification schemes, discusses open research issues and proposes a first certification scheme for ML-based distributed systems.

protect

Title: Don't Retrain, Just Rewrite: Countering Adversarial Perturbations by Rewriting Text. (arXiv:2305.16444v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16444
Code URL: null
Copy Paste: [[2305.16444] Don't Retrain, Just Rewrite: Countering Adversarial Perturbations by Rewriting Text](http://arxiv.org/abs/2305.16444) #protect
Summary:
Can language models transform inputs to protect text classifiers against adversarial attacks? In this work, we present ATINTER, a model that intercepts and learns to rewrite adversarial inputs to make them non-adversarial for a downstream text classifier. Our experiments on four datasets and five attack mechanisms reveal that ATINTER is effective at providing better adversarial robustness than existing defense approaches, without compromising task accuracy. For example, on sentiment classification using the SST-2 dataset, our method improves the adversarial accuracy over the best existing defense approach by more than 4% with a smaller decrease in task accuracy (0.5% vs 2.5%). Moreover, we show that ATINTER generalizes across multiple downstream tasks and classifiers without having to explicitly retrain it for those settings. Specifically, we find that when ATINTER is trained to remove adversarial perturbations for the sentiment classification task on the SST-2 dataset, it even transfers to a semantically different task of news classification (on AGNews) and improves the adversarial robustness by more than 10%.

defense

attack

Title: IMBERT: Making BERT Immune to Insertion-based Backdoor Attacks. (arXiv:2305.16503v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16503
Code URL: https://github.com/xlhex/imbert
Copy Paste: [[2305.16503] IMBERT: Making BERT Immune to Insertion-based Backdoor Attacks](http://arxiv.org/abs/2305.16503) #attack
Summary:
Backdoor attacks are an insidious security threat against machine learning models. Adversaries can manipulate the predictions of compromised models by inserting triggers into the training phase. Various backdoor attacks have been devised which can achieve nearly perfect attack success without affecting model predictions for clean inputs. Means of mitigating such vulnerabilities are underdeveloped, especially in natural language processing. To fill this gap, we introduce IMBERT, which uses either gradients or self-attention scores derived from victim models to self-defend against backdoor attacks at inference time. Our empirical studies demonstrate that IMBERT can effectively identify up to 98.5% of inserted triggers. Thus, it significantly reduces the attack success rate while attaining competitive accuracy on the clean dataset across widespread insertion-based attacks compared to two baselines. Finally, we show that our approach is model-agnostic, and can be easily ported to several pre-trained transformer models.

Title: FIDS: Fuzzy Intrusion Detection System for simultaneous detection of DoS/DDoS attacks in Cloud computing. (arXiv:2305.16389v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2305.16389
Code URL: null
Copy Paste: [[2305.16389] FIDS: Fuzzy Intrusion Detection System for simultaneous detection of DoS/DDoS attacks in Cloud computing](http://arxiv.org/abs/2305.16389) #attack
Summary:
In recent times, I've encountered a principle known as cloud computing, a model that simplifies user access to data and computing power on a demand basis. The main objective of cloud computing is to accommodate users' growing needs by decreasing dependence on human resources, minimizing expenses, and enhancing the speed of data access. Nevertheless, preserving security and privacy in cloud computing systems pose notable challenges. This issue arises because these systems have a distributed structure, which is susceptible to unsanctioned access - a fundamental problem. In the context of cloud computing, the provision of services on demand makes them targets for common assaults like Denial of Service (DoS) attacks, which include Economic Denial of Sustainability (EDoS) and Distributed Denial of Service (DDoS). These onslaughts can be classified into three categories: bandwidth consumption attacks, specific application attacks, and connection layer attacks. Most of the studies conducted in this arena have concentrated on a singular type of attack, with the concurrent detection of multiple DoS attacks often overlooked. This article proposes a suitable method to identify four types of assaults: HTTP, Database, TCP SYN, and DNS Flood. The aim is to present a universal algorithm that performs effectively in detecting all four attacks instead of using separate algorithms for each one. In this technique, seventeen server parameters like memory usage, CPU usage, and input/output counts are extracted and monitored for changes, identifying the failure point using the CUSUM algorithm to calculate the likelihood of each attack. Subsequently, a fuzzy neural network is employed to determine the occurrence of an attack. When compared to the Snort software, the proposed method's results show a significant improvement in the average detection rate, jumping from 57% to 95%.

Title: Automated Verification of Correctness for Masked Arithmetic Programs. (arXiv:2305.16596v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2305.16596
Code URL: null
Copy Paste: [[2305.16596] Automated Verification of Correctness for Masked Arithmetic Programs](http://arxiv.org/abs/2305.16596) #attack
Summary:
Masking is a widely-used effective countermeasure against power side-channel attacks for implementing cryptographic algorithms. Surprisingly, few formal verification techniques have addressed a fundamental question, i.e., whether the masked program and the original (unmasked) cryptographic algorithm are functional equivalent. In this paper, we study this problem for masked arithmetic programs over Galois fields of characteristic 2. We propose an automated approach based on term rewriting, aided by random testing and SMT solving. The overall approach is sound, and complete under certain conditions which do meet in practice. We implement the approach as a new tool FISCHER and carry out extensive experiments on various benchmarks. The results confirm the effectiveness, efficiency and scalability of our approach. Almost all the benchmarks can be proved for the first time by the term rewriting system solely. In particular, FISCHER detects a new flaw in a masked implementation published in EUROCRYPT 2017.

Title: Attacks on Continuous Chaos Communication and Remedies for Resource Limited Devices. (arXiv:2305.16692v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2305.16692
Code URL: null
Copy Paste: [[2305.16692] Attacks on Continuous Chaos Communication and Remedies for Resource Limited Devices](http://arxiv.org/abs/2305.16692) #attack
Summary:
The Global Wearable market is anticipated to rise at a considerable rate in the next coming years and communication is a fundamental block in any wearable device. In communication, encryption methods are being used with the aid of microcontrollers or software implementations, which are power-consuming and incorporate complex hardware implementation. Internet of Things (IoT) devices are considered as resource-constrained devices that are expected to operate with low computational power and resource utilization criteria. At the same time, recent research has shown that IoT devices are highly vulnerable to emerging security threats, which elevates the need for low-power and small-size hardware-based security countermeasures. Chaotic encryption is a method of data encryption that utilizes chaotic systems and non-linear dynamics to generate secure encryption keys. It aims to provide high-level security by creating encryption keys that are sensitive to initial conditions and difficult to predict, making it challenging for unauthorized parties to intercept and decode encrypted data. Since the discovery of chaotic equations, there have been various encryption applications associated with them. In this paper, we comprehensively analyze the physical and encryption attacks on continuous chaotic systems in resource-constrained devices and their potential remedies. To this aim, we introduce different categories of attacks of chaotic encryption. Our experiments focus on chaotic equations implemented using Chua's equation and leverages circuit architectures and provide simulations proof of remedies for different attacks. These remedies are provided to block the attackers from stealing users' information (e.g., a pulse message) with negligible cost to the power and area of the design.

Title: Incentive Attacks on DAG-Based Blockchains with Random Transaction Selection. (arXiv:2305.16757v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2305.16757
Code URL: null
Copy Paste: [[2305.16757] Incentive Attacks on DAG-Based Blockchains with Random Transaction Selection](http://arxiv.org/abs/2305.16757) #attack
Summary:
Several blockchain consensus protocols proposed to use of Directed Acyclic Graphs (DAGs) to solve the limited processing throughput of traditional single-chain Proof-of-Work (PoW) blockchains. Many such protocols utilize a random transaction selection (RTS) strategy (e.g., PHANTOM, GHOSTDAG, SPECTRE, Inclusive, and Prism) to avoid transaction duplicates across parallel blocks in DAG and thus maximize the network throughput. However, previous research has not rigorously examined incentive-oriented greedy behaviors when transaction selection deviates from the protocol. In this work, we first perform a generic game-theoretic analysis abstracting several DAG-based blockchain protocols that use the RTS strategy, and we prove that such a strategy does not constitute a Nash equilibrium, which is contradictory to the proof in the Inclusive paper. Next, we develop a blockchain simulator that extends existing open-source tools to support multiple chains and explore incentive-based deviations from the protocol. We perform simulations with ten miners to confirm our conclusion from the game-theoretic analysis. The simulations confirm that greedy actors who do not follow the RTS strategy can profit more than honest miners and harm the processing throughput of the protocol because duplicate transactions are included in more than one block of different chains. We show that this effect is indirectly proportional to the network propagation delay. Finally, we show that greedy miners are incentivized to form a shared mining pool to increase their profits. This undermines the decentralization and degrades the design of the protocols in question. To further support our claims, we execute more complex experiments on a realistic Bitcoin-like network with more than 7000 nodes.

robust

Title: A Semi-Automated Corner Case Detection and Evaluation Pipeline. (arXiv:2305.16369v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16369
Code URL: null
Copy Paste: [[2305.16369] A Semi-Automated Corner Case Detection and Evaluation Pipeline](http://arxiv.org/abs/2305.16369) #robust
Summary:
In order to deploy automated vehicles to the public, it has to be proven that the vehicle can safely and robustly handle traffic in many different scenarios. One important component of automated vehicles is the perception system that captures and processes the environment around the vehicle. Perception systems require large datasets for training their deep neural network. Knowing which parts of the data in these datasets describe a corner case is an advantage during training or testing of the network. These corner cases describe situations that are rare and potentially challenging for the network. We propose a pipeline that converts collective expert knowledge descriptions into the extended KI Absicherung ontology. The ontology is used to describe scenes and scenarios that can be mapped to perception datasets. The corner cases can then be extracted from the datasets. In addition, the pipeline enables the evaluation of the detection networks against the extracted corner cases to measure their performance.

Title: ZeroAvatar: Zero-shot 3D Avatar Generation from a Single Image. (arXiv:2305.16411v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16411
Code URL: null
Copy Paste: [[2305.16411] ZeroAvatar: Zero-shot 3D Avatar Generation from a Single Image](http://arxiv.org/abs/2305.16411) #robust
Summary:
Recent advancements in text-to-image generation have enabled significant progress in zero-shot 3D shape generation. This is achieved by score distillation, a methodology that uses pre-trained text-to-image diffusion models to optimize the parameters of a 3D neural presentation, e.g. Neural Radiance Field (NeRF). While showing promising results, existing methods are often not able to preserve the geometry of complex shapes, such as human bodies. To address this challenge, we present ZeroAvatar, a method that introduces the explicit 3D human body prior to the optimization process. Specifically, we first estimate and refine the parameters of a parametric human body from a single image. Then during optimization, we use the posed parametric body as additional geometry constraint to regularize the diffusion model as well as the underlying density field. Lastly, we propose a UV-guided texture regularization term to further guide the completion of texture on invisible body parts. We show that ZeroAvatar significantly enhances the robustness and 3D consistency of optimization-based image-to-3D avatar generation, outperforming existing zero-shot image-to-3D methods.

Title: ReConPatch : Contrastive Patch Representation Learning for Industrial Anomaly Detection. (arXiv:2305.16713v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16713
Code URL: null
Copy Paste: [[2305.16713] ReConPatch : Contrastive Patch Representation Learning for Industrial Anomaly Detection](http://arxiv.org/abs/2305.16713) #robust
Summary:
Anomaly detection is crucial to the advanced identification of product defects such as incorrect parts, misaligned components, and damages in industrial manufacturing. Due to the rare observations and unknown types of defects, anomaly detection is considered to be challenging in machine learning. To overcome this difficulty, recent approaches utilize the common visual representations from natural image datasets and distill the relevant features. However, existing approaches still have the discrepancy between the pre-trained feature and the target data, or require the input augmentation which should be carefully designed particularly for the industrial dataset. In this paper, we introduce ReConPatch, which constructs discriminative features for anomaly detection by training a linear modulation attached to a pre-trained model. ReConPatch employs contrastive representation learning to collect and distribute features in a way that produces a target-oriented and easily separable representation. To address the absence of labeled pairs for the contrastive learning, we utilize two similarity measures, pairwise and contextual similarities, between data representations as a pseudo-label. Unlike previous work, ReConPatch achieves robust anomaly detection performance without extensive input augmentation. Our method achieves the state-of-the-art anomaly detection performance (99.72%) for the widely used and challenging MVTec AD dataset.

Title: CNN Feature Map Augmentation for Single-Source Domain Generalization. (arXiv:2305.16746v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16746
Code URL: null
Copy Paste: [[2305.16746] CNN Feature Map Augmentation for Single-Source Domain Generalization](http://arxiv.org/abs/2305.16746) #robust
Summary:
In search of robust and generalizable machine learning models, Domain Generalization (DG) has gained significant traction during the past few years. The goal in DG is to produce models which continue to perform well when presented with data distributions different from the ones seen during training. While deep convolutional neural networks (CNN) have been able to achieve outstanding performance on downstream computer vision tasks, they still often fail to generalize on previously unseen data Domains. Therefore, in this work we focus on producing a model which is able to remain robust under data distribution shift and propose an alternative regularization technique for convolutional neural network architectures in the single-source DG image classification setting. To mitigate the problem caused by domain shift between source and target data, we propose augmenting intermediate feature maps of CNNs. Specifically, we pass them through a novel Augmentation Layer to prevent models from overfitting on the training set and improve their cross-domain generalization. To the best of our knowledge, this is the first paper proposing such a setup for the DG image classification setting. Experiments on the DG benchmark datasets of PACS, VLCS, Office-Home and TerraIncognita validate the effectiveness of our method, in which our model surpasses state-of-the-art algorithms in most cases.

Title: Robust Representation Learning with Reliable Pseudo-labels Generation via Self-Adaptive Optimal Transport for Short Text Clustering. (arXiv:2305.16335v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16335
Code URL: null
Copy Paste: [[2305.16335] Robust Representation Learning with Reliable Pseudo-labels Generation via Self-Adaptive Optimal Transport for Short Text Clustering](http://arxiv.org/abs/2305.16335) #robust
Summary:
Short text clustering is challenging since it takes imbalanced and noisy data as inputs. Existing approaches cannot solve this problem well, since (1) they are prone to obtain degenerate solutions especially on heavy imbalanced datasets, and (2) they are vulnerable to noises. To tackle the above issues, we propose a Robust Short Text Clustering (RSTC) model to improve robustness against imbalanced and noisy data. RSTC includes two modules, i.e., pseudo-label generation module and robust representation learning module. The former generates pseudo-labels to provide supervision for the later, which contributes to more robust representations and correctly separated clusters. To provide robustness against the imbalance in data, we propose self-adaptive optimal transport in the pseudo-label generation module. To improve robustness against the noise in data, we further introduce both class-wise and instance-wise contrastive learning in the robust representation learning module. Our empirical studies on eight short text clustering datasets demonstrate that RSTC significantly outperforms the state-of-the-art models. The code is available at: https://github.com/hmllmh/RSTC.

Title: Handling Realistic Label Noise in BERT Text Classification. (arXiv:2305.16337v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16337
Code URL: null
Copy Paste: [[2305.16337] Handling Realistic Label Noise in BERT Text Classification](http://arxiv.org/abs/2305.16337) #robust
Summary:
Labels noise refers to errors in training labels caused by cheap data annotation methods, such as web scraping or crowd-sourcing, which can be detrimental to the performance of supervised classifiers. Several methods have been proposed to counteract the effect of random label noise in supervised classification, and some studies have shown that BERT is already robust against high rates of randomly injected label noise. However, real label noise is not random; rather, it is often correlated with input features or other annotator-specific factors. In this paper, we evaluate BERT in the presence of two types of realistic label noise: feature-dependent label noise, and synthetic label noise from annotator disagreements. We show that the presence of these types of noise significantly degrades BERT classification performance. To improve robustness, we evaluate different types of ensembles and noise-cleaning methods and compare their effectiveness against label noise across different datasets.

Title: Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios. (arXiv:2305.16572v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16572
Code URL: https://github.com/goldengua/counterfactual_inference_lm
Copy Paste: [[2305.16572] Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios](http://arxiv.org/abs/2305.16572) #robust
Summary:
Current pre-trained language models have enabled remarkable improvements in downstream tasks, but it remains difficult to distinguish effects of statistical correlation from more systematic logical reasoning grounded on the understanding of real world. We tease these factors apart by leveraging counterfactual conditionals, which force language models to predict unusual consequences based on hypothetical propositions. We introduce a set of tests from psycholinguistic experiments, as well as larger-scale controlled datasets, to probe counterfactual predictions from five pre-trained language models. We find that models are consistently able to override real-world knowledge in counterfactual scenarios, and that this effect is more robust in case of stronger baseline world knowledge -- however, we also find that for most models this effect appears largely to be driven by simple lexical cues. When we mitigate effects of both world knowledge and lexical cues to test knowledge of linguistic nuances of counterfactuals, we find that only GPT-3 shows sensitivity to these nuances, though this sensitivity is also non-trivially impacted by lexical associative factors.

Title: An Investigation of Noise in Morphological Inflection. (arXiv:2305.16581v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16581
Code URL: https://github.com/adamits/morphological-inflection-noise
Copy Paste: [[2305.16581] An Investigation of Noise in Morphological Inflection](http://arxiv.org/abs/2305.16581) #robust
Summary:
With a growing focus on morphological inflection systems for languages where high-quality data is scarce, training data noise is a serious but so far largely ignored concern. We aim at closing this gap by investigating the types of noise encountered within a pipeline for truly unsupervised morphological paradigm completion and its impact on morphological inflection systems: First, we propose an error taxonomy and annotation pipeline for inflection training data. Then, we compare the effect of different types of noise on multiple state-of-the-art inflection models. Finally, we propose a novel character-level masked language modeling (CMLM) pretraining objective and explore its impact on the models' resistance to noise. Our experiments show that various architectures are impacted differently by separate types of noise, but encoder-decoders tend to be more robust to noise than models trained with a copy bias. CMLM pretraining helps transformers, but has lower impact on LSTMs.

Title: Evaluation of Question Generation Needs More References. (arXiv:2305.16626v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16626
Code URL: null
Copy Paste: [[2305.16626] Evaluation of Question Generation Needs More References](http://arxiv.org/abs/2305.16626) #robust
Summary:
Question generation (QG) is the task of generating a valid and fluent question based on a given context and the target answer. According to various purposes, even given the same context, instructors can ask questions about different concepts, and even the same concept can be written in different ways. However, the evaluation for QG usually depends on single reference-based similarity metrics, such as n-gram-based metric or learned metric, which is not sufficient to fully evaluate the potential of QG methods. To this end, we propose to paraphrase the reference question for a more robust QG evaluation. Using large language models such as GPT-3, we created semantically and syntactically diverse questions, then adopt the simple aggregation of the popular evaluation metrics as the final scores. Through our experiments, we found that using multiple (pseudo) references is more effective for QG evaluation while showing a higher correlation with human evaluations than evaluation with a single reference.

Title: TADA: Task-Agnostic Dialect Adapters for English. (arXiv:2305.16651v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16651
Code URL: null
Copy Paste: [[2305.16651] TADA: Task-Agnostic Dialect Adapters for English](http://arxiv.org/abs/2305.16651) #robust
Summary:
Large Language Models, the dominant starting point for Natural Language Processing (NLP) applications, fail at a higher rate for speakers of English dialects other than Standard American English (SAE). Prior work addresses this using task-specific data or synthetic data augmentation, both of which require intervention for each dialect and task pair. This poses a scalability issue that prevents the broad adoption of robust dialectal English NLP. We introduce a simple yet effective method for task-agnostic dialect adaptation by aligning non-SAE dialects using adapters and composing them with task-specific adapters from SAE. Task-Agnostic Dialect Adapters (TADA) improve dialectal robustness on 4 dialectal variants of the GLUE benchmark without task-specific supervision.

Title: With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness. (arXiv:2305.16819v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16819
Code URL: null
Copy Paste: [[2305.16819] With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness](http://arxiv.org/abs/2305.16819) #robust
Summary:
Conditional language models still generate unfaithful output that is not supported by their input. These unfaithful generations jeopardize trust in real-world applications such as summarization or human-machine interaction, motivating a need for automatic faithfulness metrics. To implement such metrics, NLI models seem attractive, since they solve a strongly related task that comes with a wealth of prior research and data. But recent research suggests that NLI models require costly additional machinery to perform reliably across datasets, e.g., by running inference on a cartesian product of input and generated sentences, or supporting them with a question-generation/answering step.

In this work we show that pure NLI models _can_ outperform more complex metrics when combining task-adaptive data augmentation with robust inference procedures. We propose: (1) Augmenting NLI training data to adapt NL inferences to the specificities of faithfulness prediction in dialogue; (2) Making use of both entailment and contradiction probabilities in NLI, and (3) Using Monte-Carlo dropout during inference. Applied to the TRUE benchmark, which combines faithfulness datasets across diverse domains and tasks, our approach strongly improves a vanilla NLI model and significantly outperforms previous work, while showing favourable computational cost.

Title: Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring. (arXiv:2305.16826v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16826
Code URL: https://github.com/doheejin/protact
Copy Paste: [[2305.16826] Prompt- and Trait Relation-aware Cross-prompt Essay Trait Scoring](http://arxiv.org/abs/2305.16826) #robust
Summary:
Automated essay scoring (AES) aims to score essays written for a given prompt, which defines the writing topic. Most existing AES systems assume to grade essays of the same prompt as used in training and assign only a holistic score. However, such settings conflict with real-education situations; pre-graded essays for a particular prompt are lacking, and detailed trait scores of sub-rubrics are required. Thus, predicting various trait scores of unseen-prompt essays (called cross-prompt essay trait scoring) is a remaining challenge of AES. In this paper, we propose a robust model: prompt- and trait relation-aware cross-prompt essay trait scorer. We encode prompt-aware essay representation by essay-prompt attention and utilizing the topic-coherence feature extracted by the topic-modeling mechanism without access to labeled data; therefore, our model considers the prompt adherence of an essay, even in a cross-prompt setting. To facilitate multi-trait scoring, we design trait-similarity loss that encapsulates the correlations of traits. Experiments prove the efficacy of our model, showing state-of-the-art results for all prompts and traits. Significant improvements in low-resource-prompt and inferior traits further indicate our model's strength.

Title: Counterfactual Explainer Framework for Deep Reinforcement Learning Models Using Policy Distillation. (arXiv:2305.16532v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16532
Code URL: https://github.com/amir-samadi/counterfactual-explanation
Copy Paste: [[2305.16532] Counterfactual Explainer Framework for Deep Reinforcement Learning Models Using Policy Distillation](http://arxiv.org/abs/2305.16532) #robust
Summary:
Deep Reinforcement Learning (DRL) has demonstrated promising capability in solving complex control problems. However, DRL applications in safety-critical systems are hindered by the inherent lack of robust verification techniques to assure their performance in such applications. One of the key requirements of the verification process is the development of effective techniques to explain the system functionality, i.e., why the system produces specific results in given circumstances. Recently, interpretation methods based on the Counterfactual (CF) explanation approach have been proposed to address the problem of explanation in DRLs. This paper proposes a novel CF explanation framework to explain the decisions made by a black-box DRL. To evaluate the efficacy of the proposed explanation framework, we carried out several experiments in the domains of automated driving systems and Atari Pong game. Our analysis demonstrates that the proposed framework generates plausible and meaningful explanations for various decisions made by deep underlying DRLs. Source codes are available at: \url{https://github.com/Amir-Samadi/Counterfactual-Explanation}

Title: Comparing Long Short-Term Memory (LSTM) and Bidirectional LSTM Deep Neural Networks for power consumption prediction. (arXiv:2305.16546v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16546
Code URL: null
Copy Paste: [[2305.16546] Comparing Long Short-Term Memory (LSTM) and Bidirectional LSTM Deep Neural Networks for power consumption prediction](http://arxiv.org/abs/2305.16546) #robust
Summary:
Electric consumption prediction methods are investigated for many reasons such as decision-making related to energy efficiency as well as for anticipating demand in the energy market dynamics. The objective of the present work is the comparison between two Deep Learning models, namely the Long Short-Term Memory (LSTM) and Bi-directional LSTM (BLSTM) for univariate electric consumption Time Series (TS) short-term forecast. The Data Sets (DSs) were selected for their different contexts and scales, aiming the assessment of the models' robustness. Four DSs were used, related to the power consumption of: (a) a household in France; (b) a university building in Santar\'em, Brazil; (c) the T\'etouan city zones, in Morocco; and (c) the Singapore aggregated electric demand. The metrics RMSE, MAE, MAPE and R2 were calculated in a TS cross-validation scheme. The Friedman's test was applied to normalized RMSE (NRMSE) results, showing that BLSTM outperforms LSTM with statistically significant difference (p = 0.0455), corroborating the fact that bidirectional weight updating improves significantly the LSTM performance concerning different scales of electric power consumption.

Title: Unsupervised Embedding Quality Evaluation. (arXiv:2305.16562v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16562
Code URL: null
Copy Paste: [[2305.16562] Unsupervised Embedding Quality Evaluation](http://arxiv.org/abs/2305.16562) #robust
Summary:
Unsupervised learning has recently significantly gained in popularity, especially with deep learning-based approaches. Despite numerous successes and approaching supervised-level performance on a variety of academic benchmarks, it is still hard to train and evaluate SSL models in practice due to the unsupervised nature of the problem. Even with networks trained in a supervised fashion, it is often unclear whether they will perform well when transferred to another domain.

Past works are generally limited to assessing the amount of information contained in embeddings, which is most relevant for self-supervised learning of deep neural networks. This works chooses to follow a different approach: can we quantify how easy it is to linearly separate the data in a stable way? We survey the literature and uncover three methods that could be potentially used for evaluating quality of representations. We also introduce one novel method based on recent advances in understanding the high-dimensional geometric structure self-supervised learning.

We conduct extensive experiments and study the properties of these metrics and ones introduced in the previous work. Our results suggest that while there is no free lunch, there are metrics that can robustly estimate embedding quality in an unsupervised way.

Title: The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model. (arXiv:2305.16589v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16589
Code URL: null
Copy Paste: [[2305.16589] The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model](http://arxiv.org/abs/2305.16589) #robust
Summary:
This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice. We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP. Despite recent efforts, the sample complexity of RMDPs remained mostly unsettled regardless of the uncertainty set in use. It was unclear if distributional robustness bears any statistical consequences when benchmarked against standard RL.

Assuming access to a generative model that draws samples based on the nominal MDP, we characterize the sample complexity of RMDPs when the uncertainty set is specified via either the total variation (TV) distance or $\chi^2$ divergence. The algorithm studied here is a model-based method called {\em distributionally robust value iteration}, which is shown to be near-optimal for the full range of uncertainty levels. Somewhat surprisingly, our results uncover that RMDPs are not necessarily easier or harder to learn than standard MDPs. The statistical consequence incurred by the robustness requirement depends heavily on the size and shape of the uncertainty set: in the case w.r.t.~the TV distance, the minimax sample complexity of RMDPs is always smaller than that of standard MDPs; in the case w.r.t.~the $\chi^2$ divergence, the sample complexity of RMDPs can often far exceed the standard MDP counterpart.

Title: Unleashing the Potential of Unsupervised Deep Outlier Detection through Automated Training Stopping. (arXiv:2305.16777v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16777
Code URL: https://github.com/goldennormal/automatedtrainingod
Copy Paste: [[2305.16777] Unleashing the Potential of Unsupervised Deep Outlier Detection through Automated Training Stopping](http://arxiv.org/abs/2305.16777) #robust
Summary:
Outlier detection (OD) has received continuous research interests due to its wide applications. With the development of deep learning, increasingly deep OD algorithms are proposed. Despite the availability of numerous deep OD models, existing research has reported that the performance of deep models is extremely sensitive to the configuration of hyperparameters (HPs). However, the selection of HPs for deep OD models remains a notoriously difficult task due to the lack of any labels and long list of HPs. In our study. we shed light on an essential factor, training time, that can introduce significant variation in the performance of deep model. Even the performance is stable across other HPs, training time itself can cause a serious HP sensitivity issue. Motivated by this finding, we are dedicated to formulating a strategy to terminate model training at the optimal iteration. Specifically, we propose a novel metric called loss entropy to internally evaluate the model performance during training while an automated training stopping algorithm is devised. To our knowledge, our approach is the first to enable reliable identification of the optimal training iteration during training without requiring any labels. Our experiments on tabular, image datasets show that our approach can be applied to diverse deep models and datasets. It not only enhances the robustness of deep models to their HPs, but also improves the performance and reduces plenty of training time compared to naive training.

biometric

Title: Fast IDentity Online with Anonymous Credentials (FIDO-AC). (arXiv:2305.16758v1 [cs.CR])

Paper URL: http://arxiv.org/abs/2305.16758
Code URL: null
Copy Paste: [[2305.16758] Fast IDentity Online with Anonymous Credentials (FIDO-AC)](http://arxiv.org/abs/2305.16758) #biometric
Summary:
Web authentication is a critical component of today's Internet and the digital world we interact with. The FIDO2 protocol enables users to leverage common devices to easily authenticate to online services in both mobile and desktop environments following the passwordless authentication approach based on cryptography and biometric verification. However, there is little to no connection between the authentication process and users' attributes. More specifically, the FIDO protocol does not specify methods that could be used to combine trusted attributes with the FIDO authentication process generically and allows users to disclose them to the relying party arbitrarily. In essence, applications requiring attributes verification (e.g. age or expiry date of a driver's license, etc.) still rely on ad-hoc approaches, not satisfying the data minimization principle and not allowing the user to vet the disclosed data. A primary recent example is the data breach on Singtel Optus, one of the major telecommunications providers in Australia, where very personal and sensitive data (e.g. passport numbers) were leaked. This paper introduces FIDO-AC, a novel framework that combines the FIDO2 authentication process with the user's digital and non-shareable identity. We show how to instantiate this framework using off-the-shelf FIDO tokens and any electronic identity document, e.g., the ICAO biometric passport (ePassport). We demonstrate the practicality of our approach by evaluating a prototype implementation of the FIDO-AC system.

steal

Title: Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability. (arXiv:2305.16494v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16494
Code URL: https://github.com/xavihart/diff-pgd
Copy Paste: [[2305.16494] Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability](http://arxiv.org/abs/2305.16494) #steal
Summary:
Neural networks are known to be susceptible to adversarial samples: small variations of natural examples crafted to deliberately mislead the models. While they can be easily generated using gradient-based techniques in digital and physical scenarios, they often differ greatly from the actual data distribution of natural images, resulting in a trade-off between strength and stealthiness. In this paper, we propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples. By exploiting a gradient guided by a diffusion model, Diff-PGD ensures that adversarial samples remain close to the original data distribution while maintaining their effectiveness. Moreover, our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks. Compared with existing methods for generating natural-style adversarial samples, our framework enables the separation of optimizing adversarial loss from other surrogate losses (e.g., content/smoothness/style loss), making it more stable and controllable. Finally, we demonstrate that the samples generated using Diff-PGD have better transferability and anti-purification power than traditional gradient-based methods. Code will be released in https://github.com/xavihart/Diff-PGD

extraction

Title: A Distributed Automatic Domain-Specific Multi-Word Term Recognition Architecture using Spark Ecosystem. (arXiv:2305.16343v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16343
Code URL: null
Copy Paste: [[2305.16343] A Distributed Automatic Domain-Specific Multi-Word Term Recognition Architecture using Spark Ecosystem](http://arxiv.org/abs/2305.16343) #extraction
Summary:
Automatic Term Recognition is used to extract domain-specific terms that belong to a given domain. In order to be accurate, these corpus and language-dependent methods require large volumes of textual data that need to be processed to extract candidate terms that are afterward scored according to a given metric. To improve text preprocessing and candidate terms extraction and scoring, we propose a distributed Spark-based architecture to automatically extract domain-specific terms. The main contributions are as follows: (1) propose a novel distributed automatic domain-specific multi-word term recognition architecture built on top of the Spark ecosystem; (2) perform an in-depth analysis of our architecture in terms of accuracy and scalability; (3) design an easy-to-integrate Python implementation that enables the use of Big Data processing in fields such as Computational Linguistics and Natural Language Processing. We prove empirically the feasibility of our architecture by performing experiments on two real-world datasets.

Title: Leveraging LLMs for KPIs Retrieval from Hybrid Long-Document: A Comprehensive Framework and Dataset. (arXiv:2305.16344v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16344
Code URL: null
Copy Paste: [[2305.16344] Leveraging LLMs for KPIs Retrieval from Hybrid Long-Document: A Comprehensive Framework and Dataset](http://arxiv.org/abs/2305.16344) #extraction
Summary:
Large Language Models (LLMs) demonstrate exceptional performance in textual understanding and tabular reasoning tasks. However, their ability to comprehend and analyze hybrid text, containing textual and tabular data, remains underexplored. In this research, we specialize in harnessing the potential of LLMs to comprehend critical information from financial reports, which are hybrid long-documents. We propose an Automated Financial Information Extraction (AFIE) framework that enhances LLMs' ability to comprehend and extract information from financial reports. To evaluate AFIE, we develop a Financial Reports Numerical Extraction (FINE) dataset and conduct an extensive experimental analysis. Our framework is effectively validated on GPT-3.5 and GPT-4, yielding average accuracy increases of 53.94% and 33.77%, respectively, compared to a naive method. These results suggest that the AFIE framework offers accuracy for automated numerical extraction from complex, hybrid documents.

Title: Teamwork Is Not Always Good: An Empirical Study of Classifier Drift in Class-incremental Information Extraction. (arXiv:2305.16559v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16559
Code URL: https://github.com/vt-nlp/ice
Copy Paste: [[2305.16559] Teamwork Is Not Always Good: An Empirical Study of Classifier Drift in Class-incremental Information Extraction](http://arxiv.org/abs/2305.16559) #extraction
Summary:
Class-incremental learning (CIL) aims to develop a learning system that can continually learn new classes from a data stream without forgetting previously learned classes. When learning classes incrementally, the classifier must be constantly updated to incorporate new classes, and the drift in decision boundary may lead to severe forgetting. This fundamental challenge, however, has not yet been studied extensively, especially in the setting where no samples from old classes are stored for rehearsal. In this paper, we take a closer look at how the drift in the classifier leads to forgetting, and accordingly, design four simple yet (super-) effective solutions to alleviate the classifier drift: an Individual Classifiers with Frozen Feature Extractor (ICE) framework where we individually train a classifier for each learning session, and its three variants ICE-PL, ICE-O, and ICE-PL&O which further take the logits of previously learned classes from old sessions or a constant logit of an Other class as a constraint to the learning of new classifiers. Extensive experiments and analysis on 6 class-incremental information extraction tasks demonstrate that our solutions, especially ICE-O, consistently show significant improvement over the previous state-of-the-art approaches with up to 44.7% absolute F-score gain, providing a strong baseline and insights for future research on class-incremental learning.

Title: GDA: Generative Data Augmentation Techniques for Relation Extraction Tasks. (arXiv:2305.16663v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16663
Code URL: null
Copy Paste: [[2305.16663] GDA: Generative Data Augmentation Techniques for Relation Extraction Tasks](http://arxiv.org/abs/2305.16663) #extraction
Summary:
Relation extraction (RE) tasks show promising performance in extracting relations from two entities mentioned in sentences, given sufficient annotations available during training. Such annotations would be labor-intensive to obtain in practice. Existing work adopts data augmentation techniques to generate pseudo-annotated sentences beyond limited annotations. These techniques neither preserve the semantic consistency of the original sentences when rule-based augmentations are adopted, nor preserve the syntax structure of sentences when expressing relations using seq2seq models, resulting in less diverse augmentations. In this work, we propose a dedicated augmentation technique for relational texts, named GDA, which uses two complementary modules to preserve both semantic consistency and syntax structures. We adopt a generative formulation and design a multi-tasking solution to achieve synergies. Furthermore, GDA adopts entity hints as the prior knowledge of the generative model to augment diverse sentences. Experimental results in three datasets under a low-resource setting showed that GDA could bring {\em 2.0\%} F1 improvements compared with no augmentation technique. Source code and data are available.

Title: AMPERE: AMR-Aware Prefix for Generation-Based Event Argument Extraction Model. (arXiv:2305.16734v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16734
Code URL: https://github.com/pluslabnlp/ampere
Copy Paste: [[2305.16734] AMPERE: AMR-Aware Prefix for Generation-Based Event Argument Extraction Model](http://arxiv.org/abs/2305.16734) #extraction
Summary:
Event argument extraction (EAE) identifies event arguments and their specific roles for a given event. Recent advancement in generation-based EAE models has shown great performance and generalizability over classification-based models. However, existing generation-based EAE models mostly focus on problem re-formulation and prompt design, without incorporating additional information that has been shown to be effective for classification-based models, such as the abstract meaning representation (AMR) of the input passages. Incorporating such information into generation-based models is challenging due to the heterogeneous nature of the natural language form prevalently used in generation-based models and the structured form of AMRs. In this work, we study strategies to incorporate AMR into generation-based EAE models. We propose AMPERE, which generates AMR-aware prefixes for every layer of the generation model. Thus, the prefix introduces AMR information to the generation-based EAE model and then improves the generation. We also introduce an adjusted copy mechanism to AMPERE to help overcome potential noises brought by the AMR graph. Comprehensive experiments and analyses on ACE2005 and ERE datasets show that AMPERE can get 4% - 10% absolute F1 score improvements with reduced training data and it is in general powerful across different training sizes.

Title: Automating the Analysis of Institutional Design in International Agreements. (arXiv:2305.16750v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16750
Code URL: null
Copy Paste: [[2305.16750] Automating the Analysis of Institutional Design in International Agreements](http://arxiv.org/abs/2305.16750) #extraction
Summary:
This paper explores the automatic knowledge extraction of formal institutional design - norms, rules, and actors - from international agreements. The focus was to analyze the relationship between the visibility and centrality of actors in the formal institutional design in regulating critical aspects of cultural heritage relations. The developed tool utilizes techniques such as collecting legal documents, annotating them with Institutional Grammar, and using graph analysis to explore the formal institutional design. The system was tested against the 2003 UNESCO Convention for the Safeguarding of the Intangible Cultural Heritage.

membership infer

federate

Title: Parameter-Efficient Fine-Tuning without Introducing New Latency. (arXiv:2305.16742v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16742
Code URL: null
Copy Paste: [[2305.16742] Parameter-Efficient Fine-Tuning without Introducing New Latency](http://arxiv.org/abs/2305.16742) #federate
Summary:
Parameter-efficient fine-tuning (PEFT) of pre-trained language models has recently demonstrated remarkable achievements, effectively matching the performance of full fine-tuning while utilizing significantly fewer trainable parameters, and consequently addressing the storage and communication constraints. Nonetheless, various PEFT methods are limited by their inherent characteristics. In the case of sparse fine-tuning, which involves modifying only a small subset of the existing parameters, the selection of fine-tuned parameters is task- and domain-specific, making it unsuitable for federated learning. On the other hand, PEFT methods with adding new parameters typically introduce additional inference latency. In this paper, we demonstrate the feasibility of generating a sparse mask in a task-agnostic manner, wherein all downstream tasks share a common mask. Our approach, which relies solely on the magnitude information of pre-trained parameters, surpasses existing methodologies by a significant margin when evaluated on the GLUE benchmark. Additionally, we introduce a novel adapter technique that directly applies the adapter to pre-trained parameters instead of the hidden representation, thereby achieving identical inference speed to that of full fine-tuning. Through extensive experiments, our proposed method attains a new state-of-the-art outcome in terms of both performance and storage efficiency, storing only 0.03% parameters of full fine-tuning.

Title: WeiAvg: Federated Learning Model Aggregation Promoting Data Diversity. (arXiv:2305.16351v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16351
Code URL: null
Copy Paste: [[2305.16351] WeiAvg: Federated Learning Model Aggregation Promoting Data Diversity](http://arxiv.org/abs/2305.16351) #federate
Summary:
Federated learning provides a promising privacy-preserving way for utilizing large-scale private edge data from massive Internet-of-Things (IoT) devices. While existing research extensively studied optimizing the learning process, computing efficiency, and communication overhead, one important and often overlooked aspect is that participants contribute predictive knowledge from their data, impacting the quality of the federated models learned. While FedAvg treats each client equally and assigns weight solely based on the number of samples, the diversity of samples on each client could greatly affect the local update performance and the final aggregated model. In this paper, we propose a novel approach to address this issue by introducing a Weighted Averaging (WeiAvg) framework that emphasizes updates from high-diversity clients and diminishes the influence of those from low-diversity clients. Specifically, we introduced a projection-based approximation method to estimate the diversity of client data, instead of the computation of an entropy. We use the approximation because the locally computed entropy may not be transmitted due to excess privacy risk. Extensive experimental results show that WeiAvg converges faster and achieves higher accuracy than the original FedAvg algorithm and FedProx.

Title: Federated Neural Compression Under Heterogeneous Data. (arXiv:2305.16416v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16416
Code URL: null
Copy Paste: [[2305.16416] Federated Neural Compression Under Heterogeneous Data](http://arxiv.org/abs/2305.16416) #federate
Summary:
We discuss a federated learned compression problem, where the goal is to learn a compressor from real-world data which is scattered across clients and may be statistically heterogeneous, yet share a common underlying representation. We propose a distributed source model that encompasses both characteristics, and naturally suggests a compressor architecture that uses analysis and synthesis transforms shared by clients. Inspired by personalized federated learning methods, we employ an entropy model that is personalized to each client. This allows for a global latent space to be learned across clients, and personalized entropy models that adapt to the clients' latent distributions. We show empirically that this strategy outperforms solely local methods, which indicates that learned compression also benefits from a shared global representation in statistically heterogeneous federated settings.

fair

Title: Are Fairy Tales Fair? Analyzing Gender Bias in Temporal Narrative Event Chains of Children's Fairy Tales. (arXiv:2305.16641v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16641
Code URL: null
Copy Paste: [[2305.16641] Are Fairy Tales Fair? Analyzing Gender Bias in Temporal Narrative Event Chains of Children's Fairy Tales](http://arxiv.org/abs/2305.16641) #fair
Summary:
Social biases and stereotypes are embedded in our culture in part through their presence in our stories, as evidenced by the rich history of humanities and social science literature analyzing such biases in children stories. Because these analyses are often conducted manually and at a small scale, such investigations can benefit from the use of more recent natural language processing methods that examine social bias in models and data corpora. Our work joins this interdisciplinary effort and makes a unique contribution by taking into account the event narrative structures when analyzing the social bias of stories. We propose a computational pipeline that automatically extracts a story's temporal narrative verb-based event chain for each of its characters as well as character attributes such as gender. We also present a verb-based event annotation scheme that can facilitate bias analysis by including categories such as those that align with traditional stereotypes. Through a case study analyzing gender bias in fairy tales, we demonstrate that our framework can reveal bias in not only the unigram verb-based events in which female and male characters participate but also in the temporal narrative order of such event participation.

interpretability

Title: Prototype-Based Interpretability for Legal Citation Prediction. (arXiv:2305.16490v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16490
Code URL: null
Copy Paste: [[2305.16490] Prototype-Based Interpretability for Legal Citation Prediction](http://arxiv.org/abs/2305.16490) #interpretability
Summary:
Deep learning has made significant progress in the past decade, and demonstrates potential to solve problems with extensive social impact. In high-stakes decision making areas such as law, experts often require interpretability for automatic systems to be utilized in practical settings. In this work, we attempt to address these requirements applied to the important problem of legal citation prediction (LCP). We design the task with parallels to the thought-process of lawyers, i.e., with reference to both precedents and legislative provisions. After initial experimental results, we refine the target citation predictions with the feedback of legal experts. Additionally, we introduce a prototype architecture to add interpretability, achieving strong performance while adhering to decision parameters used by lawyers. Our study builds on and leverages the state-of-the-art language processing models for law, while addressing vital considerations for high-stakes tasks with practical societal impact.

Title: Backpack Language Models. (arXiv:2305.16765v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16765
Code URL: null
Copy Paste: [[2305.16765] Backpack Language Models](http://arxiv.org/abs/2305.16765) #interpretability
Summary:
We present Backpacks: a new neural architecture that marries strong modeling performance with an interface for interpretability and control. Backpacks learn multiple non-contextual sense vectors for each word in a vocabulary, and represent a word in a sequence as a context-dependent, non-negative linear combination of sense vectors in this sequence. We find that, after training, sense vectors specialize, each encoding a different aspect of a word. We can interpret a sense vector by inspecting its (non-contextual, linear) projection onto the output space, and intervene on these interpretable hooks to change the model's behavior in predictable ways. We train a 170M-parameter Backpack language model on OpenWebText, matching the loss of a GPT-2 small (124Mparameter) Transformer. On lexical similarity evaluations, we find that Backpack sense vectors outperform even a 6B-parameter Transformer LM's word embeddings. Finally, we present simple algorithms that intervene on sense vectors to perform controllable text generation and debiasing. For example, we can edit the sense vocabulary to tend more towards a topic, or localize a source of gender bias to a sense vector and globally suppress that sense.

Title: Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues. (arXiv:2305.16798v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16798
Code URL: https://github.com/amzn/user-satisfaction-modeling
Copy Paste: [[2305.16798] Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues](http://arxiv.org/abs/2305.16798) #interpretability
Summary:
User Satisfaction Modeling (USM) is one of the popular choices for task-oriented dialogue systems evaluation, where user satisfaction typically depends on whether the user's task goals were fulfilled by the system. Task-oriented dialogue systems use task schema, which is a set of task attributes, to encode the user's task goals. Existing studies on USM neglect explicitly modeling the user's task goals fulfillment using the task schema. In this paper, we propose SG-USM, a novel schema-guided user satisfaction modeling framework. It explicitly models the degree to which the user's preferences regarding the task attributes are fulfilled by the system for predicting the user's satisfaction level. SG-USM employs a pre-trained language model for encoding dialogue context and task attributes. Further, it employs a fulfillment representation layer for learning how many task attributes have been fulfilled in the dialogue, an importance predictor component for calculating the importance of task attributes. Finally, it predicts the user satisfaction based on task attribute fulfillment and task attribute importance. Experimental results on benchmark datasets (i.e. MWOZ, SGD, ReDial, and JDDC) show that SG-USM consistently outperforms competitive existing methods. Our extensive analysis demonstrates that SG-USM can improve the interpretability of user satisfaction modeling, has good scalability as it can effectively deal with unseen tasks and can also effectively work in low-resource settings by leveraging unlabeled data.

Title: Artificial Intelligence-Based Methods for Precision Medicine: Diabetes Risk Prediction. (arXiv:2305.16346v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16346
Code URL: null
Copy Paste: [[2305.16346] Artificial Intelligence-Based Methods for Precision Medicine: Diabetes Risk Prediction](http://arxiv.org/abs/2305.16346) #interpretability
Summary:
The rising prevalence of type 2 diabetes mellitus (T2DM) necessitates the development of predictive models for T2DM risk assessment. Artificial intelligence (AI) models are being extensively used for this purpose, but a comprehensive review of their advancements and challenges is lacking. This scoping review analyzes existing literature on AI-based models for T2DM risk prediction. Forty studies were included, mainly published in the past four years. Traditional machine learning models were more prevalent than deep learning models. Electronic health records were the most commonly used data source. Unimodal AI models relying on EHR data were prominent, while only a few utilized multimodal models. Both unimodal and multimodal models showed promising performance, with the latter outperforming the former. Internal validation was common, while external validation was limited. Interpretability methods were reported in half of the studies. Few studies reported novel biomarkers, and open-source code availability was limited. This review provides insights into the current state and limitations of AI-based T2DM risk prediction models and highlights challenges for their development and clinical implementation.

explainability

Title: An Experimental Investigation into the Evaluation of Explainability Methods. (arXiv:2305.16361v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16361
Code URL: null
Copy Paste: [[2305.16361] An Experimental Investigation into the Evaluation of Explainability Methods](http://arxiv.org/abs/2305.16361) #explainability
Summary:
EXplainable Artificial Intelligence (XAI) aims to help users to grasp the reasoning behind the predictions of an Artificial Intelligence (AI) system. Many XAI approaches have emerged in recent years. Consequently, a subfield related to the evaluation of XAI methods has gained considerable attention, with the aim to determine which methods provide the best explanation using various approaches and criteria. However, the literature lacks a comparison of the evaluation metrics themselves, that one can use to evaluate XAI methods. This work aims to fill this gap by comparing 14 different metrics when applied to nine state-of-the-art XAI methods and three dummy methods (e.g., random saliency maps) used as references. Experimental results show which of these metrics produces highly correlated results, indicating potential redundancy. We also demonstrate the significant impact of varying the baseline hyperparameter on the evaluation metric values. Finally, we use dummy methods to assess the reliability of metrics in terms of ranking, pointing out their limitations.

Title: Extending Explainable Boosting Machines to Scientific Image Data. (arXiv:2305.16526v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16526
Code URL: null
Copy Paste: [[2305.16526] Extending Explainable Boosting Machines to Scientific Image Data](http://arxiv.org/abs/2305.16526) #explainability
Summary:
As the deployment of computer vision technology becomes increasingly common in applications of consequence such as medicine or science, the need for explanations of the system output has become a focus of great concern. Unfortunately, many state-of-the-art computer vision models are opaque, making their use challenging from an explanation standpoint, and current approaches to explaining these opaque models have stark limitations and have been the subject of serious criticism. In contrast, Explainable Boosting Machines (EBMs) are a class of models that are easy to interpret and achieve performance on par with the very best-performing models, however, to date EBMs have been limited solely to tabular data. Driven by the pressing need for interpretable models in science, we propose the use of EBMs for scientific image data. Inspired by an important application underpinning the development of quantum technologies, we apply EBMs to cold-atom soliton image data, and, in doing so, demonstrate EBMs for image data for the first time. To tabularize the image data we employ Gabor Wavelet Transform-based techniques that preserve the spatial structure of the data. We show that our approach provides better explanations than other state-of-the-art explainability methods for images.

watermark

diffusion

Title: DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models. (arXiv:2305.16381v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16381
Code URL: null
Copy Paste: [[2305.16381] DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models](http://arxiv.org/abs/2305.16381) #diffusion
Summary:
Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. Even though relatively simple approaches (e.g., rejection sampling based on reward scores) have been investigated, fine-tuning text-to-image models with the reward function remains challenging. In this work, we propose using online reinforcement learning (RL) to fine-tune text-to-image models. We focus on diffusion models, defining the fine-tuning task as an RL problem, and updating the pre-trained text-to-image diffusion models using policy gradient to maximize the feedback-trained reward. Our approach, coined DPOK, integrates policy optimization with KL regularization. We conduct an analysis of KL regularization for both RL fine-tuning and supervised fine-tuning. In our experiments, we show that DPOK is generally superior to supervised fine-tuning with respect to both image-text alignment and image quality.

Title: Are Diffusion Models Vision-And-Language Reasoners?. (arXiv:2305.16397v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16397
Code URL: https://github.com/mcgill-nlp/diffusion-itm
Copy Paste: [[2305.16397] Are Diffusion Models Vision-And-Language Reasoners?](http://arxiv.org/abs/2305.16397) #diffusion
Summary:
Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these diffusion-based generative models to automatic fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we perform two innovations. First, we transform diffusion-based models (in our case, Stable Diffusion) for any image-text matching (ITM) task using a novel method called DiffusionITM. Second, we introduce the Generative-Discriminative Evaluation Benchmark (GDBench) benchmark with 7 complex vision-and-language tasks, bias evaluation and detailed analysis. We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like like CLEVR and Winoground. We further boost its compositional performance with a transfer setup by fine-tuning on MS-COCO while retaining generative capabilities. We also measure the stereotypical bias in diffusion models, and find that Stable Diffusion 2.1 is, for the most part, less biased than Stable Diffusion 1.5. Overall, our results point in an exciting direction bringing discriminative and generative model evaluation closer. We will release code and benchmark setup soon.

Title: Higher Order Gauge Equivariant CNNs on Riemannian Manifolds and Applications. (arXiv:2305.16657v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16657
Code URL: null
Copy Paste: [[2305.16657] Higher Order Gauge Equivariant CNNs on Riemannian Manifolds and Applications](http://arxiv.org/abs/2305.16657) #diffusion
Summary:
With the advent of group equivariant convolutions in deep networks literature, spherical CNNs with $\mathsf{SO}(3)$-equivariant layers have been developed to cope with data that are samples of signals on the sphere $S^2$. One can implicitly obtain $\mathsf{SO}(3)$-equivariant convolutions on $S^2$ with significant efficiency gains by explicitly requiring gauge equivariance w.r.t. $\mathsf{SO}(2)$. In this paper, we build on this fact by introducing a higher order generalization of the gauge equivariant convolution, whose implementation is dubbed a gauge equivariant Volterra network (GEVNet). This allows us to model spatially extended nonlinear interactions within a given receptive field while still maintaining equivariance to global isometries. We prove theoretical results regarding the equivariance and construction of higher order gauge equivariant convolutions. Then, we empirically demonstrate the parameter efficiency of our model, first on computer vision benchmark data (e.g. spherical MNIST), and then in combination with a convolutional kernel network (CKN) on neuroimaging data. In the neuroimaging data experiments, the resulting two-part architecture (CKN + GEVNet) is used to automatically discriminate between patients with Lewy Body Disease (DLB), Alzheimer's Disease (AD) and Parkinson's Disease (PD) from diffusion magnetic resonance images (dMRI). The GEVNet extracts micro-architectural features within each voxel, while the CKN extracts macro-architectural features across voxels. This compound architecture is uniquely poised to exploit the intra- and inter-voxel information contained in the dMRI data, leading to improved performance over the classification results obtained from either of the individual components.

Title: Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models. (arXiv:2305.16807v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16807
Code URL: null
Copy Paste: [[2305.16807] Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models](http://arxiv.org/abs/2305.16807) #diffusion
Summary:
In image editing employing diffusion models, it is crucial to preserve the reconstruction quality of the original image while changing its style. Although existing methods ensure reconstruction quality through optimization, a drawback of these is the significant amount of time required for optimization. In this paper, we propose negative-prompt inversion, a method capable of achieving equivalent reconstruction solely through forward propagation without optimization, thereby enabling much faster editing processes. We experimentally demonstrate that the reconstruction quality of our method is comparable to that of existing methods, allowing for inversion at a resolution of 512 pixels and with 50 sampling steps within approximately 5 seconds, which is more than 30 times faster than null-text inversion. Reduction of the computation time by the proposed method further allows us to use a larger number of sampling steps in diffusion models to improve the reconstruction quality with a moderate increase in computation time.

Title: Improved Visual Story Generation with Adaptive Context Modeling. (arXiv:2305.16811v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16811
Code URL: null
Copy Paste: [[2305.16811] Improved Visual Story Generation with Adaptive Context Modeling](http://arxiv.org/abs/2305.16811) #diffusion
Summary:
Diffusion models developed on top of powerful text-to-image generation models like Stable Diffusion achieve remarkable success in visual story generation. However, the best-performing approach considers historically generated results as flattened memory cells, ignoring the fact that not all preceding images contribute equally to the generation of the characters and scenes at the current stage. To address this, we present a simple method that improves the leading system with adaptive context modeling, which is not only incorporated in the encoder but also adopted as additional guidance in the sampling stage to boost the global consistency of the generated story. We evaluate our model on PororoSV and FlintstonesSV datasets and show that our approach achieves state-of-the-art FID scores on both story visualization and continuation scenarios. We conduct detailed model analysis and show that our model excels at generating semantically consistent images for stories.

Title: Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving. (arXiv:2305.16366v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16366
Code URL: https://github.com/hkunlp/subgoal-theorem-prover
Copy Paste: [[2305.16366] Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving](http://arxiv.org/abs/2305.16366) #diffusion
Summary:
Large language models~(LLMs) present an intriguing avenue of exploration in the domain of formal theorem proving. Nonetheless, the full utilization of these models, particularly in terms of demonstration formatting and organization, remains an underexplored area. In an endeavor to enhance the efficacy of LLMs, we introduce a subgoal-based demonstration learning framework, consisting of two primary elements: Firstly, drawing upon the insights of subgoal learning from the domains of reinforcement learning and robotics, we propose the construction of distinct subgoals for each demonstration example and refine these subgoals in accordance with the pertinent theories of subgoal learning. Secondly, we build upon recent advances in diffusion models to predict the optimal organization, simultaneously addressing two intricate issues that persist within the domain of demonstration organization: subset selection and order determination. Through the integration of subgoal-based learning methodologies, we have successfully increased the prevailing proof accuracy from 38.9\% to 44.3\% on the miniF2F benchmark. Furthermore, the adoption of diffusion models for demonstration organization can lead to an additional enhancement in accuracy to 45.5\%, or a $5\times$ improvement in sampling efficiency compared with the long-standing state-of-the-art method. Our code is available at \url{https://github.com/HKUNLP/subgoal-theorem-prover}.

Title: Confidence-Based Feature Imputation for Graphs with Partially Known Features. (arXiv:2305.16618v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16618
Code URL: https://github.com/daehoum1/pcfi
Copy Paste: [[2305.16618] Confidence-Based Feature Imputation for Graphs with Partially Known Features](http://arxiv.org/abs/2305.16618) #diffusion
Summary:
This paper investigates a missing feature imputation problem for graph learning tasks. Several methods have previously addressed learning tasks on graphs with missing features. However, in cases of high rates of missing features, they were unable to avoid significant performance degradation. To overcome this limitation, we introduce a novel concept of channel-wise confidence in a node feature, which is assigned to each imputed channel feature of a node for reflecting certainty of the imputation. We then design pseudo-confidence using the channel-wise shortest path distance between a missing-feature node and its nearest known-feature node to replace unavailable true confidence in an actual learning process. Based on the pseudo-confidence, we propose a novel feature imputation scheme that performs channel-wise inter-node diffusion and node-wise inter-channel propagation. The scheme can endure even at an exceedingly high missing rate (e.g., 99.5\%) and it achieves state-of-the-art accuracy for both semi-supervised node classification and link prediction on various datasets containing a high rate of missing features. Codes are available at \url{https://github.com/daehoum1/pcfi}.

Title: Graph Neural Convection-Diffusion with Heterophily. (arXiv:2305.16780v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16780
Code URL: https://github.com/zknus/graph-diffusion-cde
Copy Paste: [[2305.16780] Graph Neural Convection-Diffusion with Heterophily](http://arxiv.org/abs/2305.16780) #diffusion
Summary:
Graph neural networks (GNNs) have shown promising results across various graph learning tasks, but they often assume homophily, which can result in poor performance on heterophilic graphs. The connected nodes are likely to be from different classes or have dissimilar features on heterophilic graphs. In this paper, we propose a novel GNN that incorporates the principle of heterophily by modeling the flow of information on nodes using the convection-diffusion equation (CDE). This allows the CDE to take into account both the diffusion of information due to homophily and the ``convection'' of information due to heterophily. We conduct extensive experiments, which suggest that our framework can achieve competitive performance on node classification tasks for heterophilic graphs, compared to the state-of-the-art methods. The code is available at \url{https://github.com/zknus/Graph-Diffusion-CDE}.

noise learning

data-free

transformer

Title: EgoHumans: An Egocentric 3D Multi-Human Benchmark. (arXiv:2305.16487v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16487
Code URL: null
Copy Paste: [[2305.16487] EgoHumans: An Egocentric 3D Multi-Human Benchmark](http://arxiv.org/abs/2305.16487) #transformer
Summary:
We present EgoHumans, a new multi-view multi-human video benchmark to advance the state-of-the-art of egocentric human 3D pose estimation and tracking. Existing egocentric benchmarks either capture single subject or indoor-only scenarios, which limit the generalization of computer vision algorithms for real-world applications. We propose a novel 3D capture setup to construct a comprehensive egocentric multi-human benchmark in the wild with annotations to support diverse tasks such as human detection, tracking, 2D/3D pose estimation, and mesh recovery. We leverage consumer-grade wearable camera-equipped glasses for the egocentric view, which enables us to capture dynamic activities like playing soccer, fencing, volleyball, etc. Furthermore, our multi-view setup generates accurate 3D ground truth even under severe or complete occlusion. The dataset consists of more than 125k egocentric images, spanning diverse scenes with a particular focus on challenging and unchoreographed multi-human activities and fast-moving egocentric views. We rigorously evaluate existing state-of-the-art methods and highlight their limitations in the egocentric scenario, specifically on multi-human tracking. To address such limitations, we propose EgoFormer, a novel approach with a multi-stream transformer architecture and explicit 3D spatial reasoning to estimate and track the human pose. EgoFormer significantly outperforms prior art by 13.6% IDF1 and 9.3 HOTA on the EgoHumans dataset.

Title: Image Classification of Stroke Blood Clot Origin using Deep Convolutional Neural Networks and Visual Transformers. (arXiv:2305.16492v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16492
Code URL: null
Copy Paste: [[2305.16492] Image Classification of Stroke Blood Clot Origin using Deep Convolutional Neural Networks and Visual Transformers](http://arxiv.org/abs/2305.16492) #transformer
Summary:
Stroke is one of two main causes of death worldwide. Many individuals suffer from ischemic stroke every year. Only in US more over 700,000 individuals meet ischemic stroke due to blood clot blocking an artery to the brain every year. The paper describes particular approach how to apply Artificial Intelligence for purposes of separating two major acute ischemic stroke (AIS) etiology subtypes: cardiac and large artery atherosclerosis. Four deep neural network architectures and simple ensemble method are used in the approach.

Title: Improving Position Encoding of Transformers for Multivariate Time Series Classification. (arXiv:2305.16642v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16642
Code URL: https://github.com/navidfoumani/convtran
Copy Paste: [[2305.16642] Improving Position Encoding of Transformers for Multivariate Time Series Classification](http://arxiv.org/abs/2305.16642) #transformer
Summary:
Transformers have demonstrated outstanding performance in many applications of deep learning. When applied to time series data, transformers require effective position encoding to capture the ordering of the time series data. The efficacy of position encoding in time series analysis is not well-studied and remains controversial, e.g., whether it is better to inject absolute position encoding or relative position encoding, or a combination of them. In order to clarify this, we first review existing absolute and relative position encoding methods when applied in time series classification. We then proposed a new absolute position encoding method dedicated to time series data called time Absolute Position Encoding (tAPE). Our new method incorporates the series length and input embedding dimension in absolute position encoding. Additionally, we propose computationally Efficient implementation of Relative Position Encoding (eRPE) to improve generalisability for time series. We then propose a novel multivariate time series classification (MTSC) model combining tAPE/eRPE and convolution-based input encoding named ConvTran to improve the position and data embedding of time series data. The proposed absolute and relative position encoding methods are simple and efficient. They can be easily integrated into transformer blocks and used for downstream tasks such as forecasting, extrinsic regression, and anomaly detection. Extensive experiments on 32 multivariate time-series datasets show that our model is significantly more accurate than state-of-the-art convolution and transformer-based models. Code and models are open-sourced at \url{https://github.com/Navidfoumani/ConvTran}.

Title: Gender, Smoking History and Age Prediction from Laryngeal Images. (arXiv:2305.16661v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16661
Code URL: null
Copy Paste: [[2305.16661] Gender, Smoking History and Age Prediction from Laryngeal Images](http://arxiv.org/abs/2305.16661) #transformer
Summary:
Flexible laryngoscopy is commonly performed by otolaryngologists to detect laryngeal diseases and to recognize potentially malignant lesions. Recently, researchers have introduced machine learning techniques to facilitate automated diagnosis using laryngeal images and achieved promising results. Diagnostic performance can be improved when patients' demographic information is incorporated into models. However, manual entry of patient data is time consuming for clinicians. In this study, we made the first endeavor to employ deep learning models to predict patient demographic information to improve detector model performance. The overall accuracy for gender, smoking history, and age was 85.5%, 65.2%, and 75.9%, respectively. We also created a new laryngoscopic image set for machine learning study and benchmarked the performance of 8 classical deep learning models based on CNNs and Transformers. The results can be integrated into current learning models to improve their performance by incorporating the patient's demographic information.

Title: Talking with Machines: A Comprehensive Survey of Emergent Dialogue Systems. (arXiv:2305.16324v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16324
Code URL: null
Copy Paste: [[2305.16324] Talking with Machines: A Comprehensive Survey of Emergent Dialogue Systems](http://arxiv.org/abs/2305.16324) #transformer
Summary:
From the earliest experiments in the 20th century to the utilization of large language models and transformers, dialogue systems research has continued to evolve, playing crucial roles in numerous fields. This paper offers a comprehensive review of these systems, tracing their historical development and examining their fundamental operations. We analyze popular and emerging datasets for training and survey key contributions in dialogue systems research, including traditional systems and advanced machine learning methods. Finally, we consider conventional and transformer-based evaluation metrics, followed by a short discussion of prevailing challenges and future prospects in the field.

Title: Think Before You Act: Decision Transformers with Internal Working Memory. (arXiv:2305.16338v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16338
Code URL: null
Copy Paste: [[2305.16338] Think Before You Act: Decision Transformers with Internal Working Memory](http://arxiv.org/abs/2305.16338) #transformer
Summary:
Large language model (LLM)-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and compute. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance on previous tasks. In contrast to LLMs' implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Thus inspired, we propose an internal working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in both Atari games and meta-world object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.

Title: Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model. (arXiv:2305.16340v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16340
Code URL: null
Copy Paste: [[2305.16340] Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model](http://arxiv.org/abs/2305.16340) #transformer
Summary:
Transformers have shown dominant performance across a range of domains including language and vision. However, their computational cost grows quadratically with the sequence length, making their usage prohibitive for resource-constrained applications. To counter this, our approach is to divide the whole sequence into segments. The information across segments can then be aggregated using neurons with recurrence leveraging their inherent memory. Such an approach leads to models with sequential processing capability at a lower computation/memory cost. To investigate this idea, first, we examine the effects of using local attention mechanism on the individual segments. Then we propose a segmented recurrent transformer (SRformer) that combines segmented attention with recurrent attention. It uses recurrent accumulate and fire (RAF) layers to process information between consecutive segments. The loss caused by reducing the attention window length is compensated by updating the product of keys and values with RAF neurons' inherent recurrence. The segmented attention and lightweight RAF gates ensure the efficiency of the proposed transformer. We apply the proposed method to T5 and BART transformers. The modified models are tested on summarization datasets including CNN-dailymail and XSUM. Notably, using segmented inputs of different sizes, the proposed model achieves 4-19% higher ROUGE1 scores than the segmented transformer baseline. Compared to full attention, the proposed model largely reduces the complexity of cross attention and results in around 40% reduction in computation cost.

Title: InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition. (arXiv:2305.16342v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16342
Code URL: null
Copy Paste: [[2305.16342] InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition](http://arxiv.org/abs/2305.16342) #transformer
Summary:
The local and global features are both essential for automatic speech recognition (ASR). Many recent methods have verified that simply combining local and global features can further promote ASR performance. However, these methods pay less attention to the interaction of local and global features, and their series architectures are rigid to reflect local and global relationships. To address these issues, this paper proposes InterFormer for interactive local and global features fusion to learn a better representation for ASR. Specifically, we combine the convolution block with the transformer block in a parallel design. Besides, we propose a bidirectional feature interaction module (BFIM) and a selective fusion module (SFM) to implement the interaction and fusion of local and global features, respectively. Extensive experiments on public ASR datasets demonstrate the effectiveness of our proposed InterFormer and its superior performance over the other Transformer and Conformer models.

Title: Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer. (arXiv:2305.16380v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16380
Code URL: null
Copy Paste: [[2305.16380] Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer](http://arxiv.org/abs/2305.16380) #transformer
Summary:
Transformer architecture has shown impressive performance in multiple research domains and has become the backbone of many neural network models. However, there is limited understanding on how it works. In particular, with a simple predictive loss, how the representation emerges from the gradient \emph{training dynamics} remains a mystery. In this paper, for 1-layer transformer with one self-attention layer plus one decoder layer, we analyze its SGD training dynamics for the task of next token prediction in a mathematically rigorous manner. We open the black box of the dynamic process of how the self-attention layer combines input tokens, and reveal the nature of underlying inductive bias. More specifically, with the assumption (a) no positional encoding, (b) long input sequence, and (c) the decoder layer learns faster than the self-attention layer, we prove that self-attention acts as a \emph{discriminative scanning algorithm}: starting from uniform attention, it gradually attends more to distinct key tokens for a specific next token to be predicted, and pays less attention to common key tokens that occur across different next tokens. Among distinct tokens, it progressively drops attention weights, following the order of low to high co-occurrence between the key and the query token in the training set. Interestingly, this procedure does not lead to winner-takes-all, but decelerates due to a \emph{phase transition} that is controllable by the learning rates of the two layers, leaving (almost) fixed token combination. We verify this \textbf{\emph{scan and snap}} dynamics on synthetic and real-world data (WikiText).

Title: Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities. (arXiv:2305.16407v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16407
Code URL: null
Copy Paste: [[2305.16407] Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities](http://arxiv.org/abs/2305.16407) #transformer
Summary:
The wide accessibility of social media has provided linguistically under-represented communities with an extraordinary opportunity to create content in their native languages. This, however, comes with certain challenges in script normalization, particularly where the speakers of a language in a bilingual community rely on another script or orthography to write their native language. This paper addresses the problem of script normalization for several such languages that are mainly written in a Perso-Arabic script. Using synthetic data with various levels of noise and a transformer-based model, we demonstrate that the problem can be effectively remediated. We conduct a small-scale evaluation of real data as well. Our experiments indicate that script normalization is also beneficial to improve the performance of downstream tasks such as machine translation and language identification.

Title: Neural Machine Translation for Mathematical Formulae. (arXiv:2305.16433v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16433
Code URL: null
Copy Paste: [[2305.16433] Neural Machine Translation for Mathematical Formulae](http://arxiv.org/abs/2305.16433) #transformer
Summary:
We tackle the problem of neural machine translation of mathematical formulae between ambiguous presentation languages and unambiguous content languages. Compared to neural machine translation on natural language, mathematical formulae have a much smaller vocabulary and much longer sequences of symbols, while their translation requires extreme precision to satisfy mathematical information needs. In this work, we perform the tasks of translating from LaTeX to Mathematica as well as from LaTeX to semantic LaTeX. While recurrent, recursive, and transformer networks struggle with preserving all contained information, we find that convolutional sequence-to-sequence networks achieve 95.1% and 90.7% exact matches, respectively.

Title: Incorporating Distributions of Discourse Structure for Long Document Abstractive Summarization. (arXiv:2305.16784v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16784
Code URL: null
Copy Paste: [[2305.16784] Incorporating Distributions of Discourse Structure for Long Document Abstractive Summarization](http://arxiv.org/abs/2305.16784) #transformer
Summary:
For text summarization, the role of discourse structure is pivotal in discerning the core content of a text. Regrettably, prior studies on incorporating Rhetorical Structure Theory (RST) into transformer-based summarization models only consider the nuclearity annotation, thereby overlooking the variety of discourse relation types. This paper introduces the 'RSTformer', a novel summarization model that comprehensively incorporates both the types and uncertainty of rhetorical relations. Our RST-attention mechanism, rooted in document-level rhetorical structure, is an extension of the recently devised Longformer framework. Through rigorous evaluation, the model proposed herein exhibits significant superiority over state-of-the-art models, as evidenced by its notable performance on several automatic metrics and human evaluation.

Title: Calibration of Transformer-based Models for Identifying Stress and Depression in Social Media. (arXiv:2305.16797v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16797
Code URL: null
Copy Paste: [[2305.16797] Calibration of Transformer-based Models for Identifying Stress and Depression in Social Media](http://arxiv.org/abs/2305.16797) #transformer
Summary:
In today's fast-paced world, the rates of stress and depression present a surge. Social media provide assistance for the early detection of mental health conditions. Existing methods mainly introduce feature extraction approaches and train shallow machine learning classifiers. Other researches use deep neural networks or transformers. Despite the fact that transformer-based models achieve noticeable improvements, they cannot often capture rich factual knowledge. Although there have been proposed a number of studies aiming to enhance the pretrained transformer-based models with extra information or additional modalities, no prior work has exploited these modifications for detecting stress and depression through social media. In addition, although the reliability of a machine learning model's confidence in its predictions is critical for high-risk applications, there is no prior work taken into consideration the model calibration. To resolve the above issues, we present the first study in the task of depression and stress detection in social media, which injects extra linguistic information in transformer-based models, namely BERT and MentalBERT. Specifically, the proposed approach employs a Multimodal Adaptation Gate for creating the combined embeddings, which are given as input to a BERT (or MentalBERT) model. For taking into account the model calibration, we apply label smoothing. We test our proposed approaches in three publicly available datasets and demonstrate that the integration of linguistic features into transformer-based models presents a surge in the performance. Also, the usage of label smoothing contributes to both the improvement of the model's performance and the calibration of the model. We finally perform a linguistic analysis of the posts and show differences in language between stressful and non-stressful texts, as well as depressive and non-depressive posts.

Title: Stecformer: Spatio-temporal Encoding Cascaded Transformer for Multivariate Long-term Time Series Forecasting. (arXiv:2305.16370v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16370
Code URL: null
Copy Paste: [[2305.16370] Stecformer: Spatio-temporal Encoding Cascaded Transformer for Multivariate Long-term Time Series Forecasting](http://arxiv.org/abs/2305.16370) #transformer
Summary:
Multivariate long-term time series forecasting is of great application across many domains, such as energy consumption and weather forecasting. With the development of transformer-based methods, the performance of multivariate long-term time series forecasting has been significantly improved, however, the study of spatial features extracting in transformer-based model is rare and the consistency of different prediction periods is unsatisfactory due to the large span. In this work, we propose a complete solution to address these problems in terms of feature extraction and target prediction. For extraction, we design an efficient spatio-temporal encoding extractor including a semi-adaptive graph to acquire sufficient spatio-temporal information. For prediction, we propose a Cascaded Decoding Predictor (CDP) to strengthen the correlation between different intervals, which can also be utilized as a generic component to improve the performance of transformer-based methods. The proposed method, termed as Spatio-temporal Encoding Cascaded Transformer (Stecformer), achieving a notable gap over the baseline model and is comparable with the state-of-the-art performance of transformer-based methods on five benchmark datasets. We hope our attempt will serve as a regular configuration in multivariate long-term time series forecasting in the future.

Title: Emergent Agentic Transformer from Chain of Hindsight Experience. (arXiv:2305.16554v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16554
Code URL: null
Copy Paste: [[2305.16554] Emergent Agentic Transformer from Chain of Hindsight Experience](http://arxiv.org/abs/2305.16554) #transformer
Summary:
Large transformer models powered by diverse data and model scale have dominated natural language modeling and computer vision and pushed the frontier of multiple AI areas. In reinforcement learning (RL), despite many efforts into transformer-based policies, a key limitation, however, is that current transformer-based policies cannot learn by directly combining information from multiple sub-optimal trials. In this work, we address this issue using recently proposed chain of hindsight to relabel experience, where we train a transformer on a sequence of trajectory experience ascending sorted according to their total rewards. Our method consists of relabelling target return of each trajectory to the maximum total reward among in sequence of trajectories and training an autoregressive model to predict actions conditioning on past states, actions, rewards, target returns, and task completion tokens, the resulting model, Agentic Transformer (AT), can learn to improve upon itself both at training and test time. As we show on D4RL and ExoRL benchmarks, to the best our knowledge, this is the first time that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches, even from sub-optimal data. Our Agentic Transformer also shows a promising scaling trend that bigger models consistently improve results.

Title: Future-conditioned Unsupervised Pretraining for Decision Transformer. (arXiv:2305.16683v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16683
Code URL: https://github.com/fffffarmer/pdt
Copy Paste: [[2305.16683] Future-conditioned Unsupervised Pretraining for Decision Transformer](http://arxiv.org/abs/2305.16683) #transformer
Summary:
Recent research in offline reinforcement learning (RL) has demonstrated that return-conditioned supervised learning is a powerful paradigm for decision-making problems. While promising, return conditioning is limited to training data labeled with rewards and therefore faces challenges in learning from unsupervised data. In this work, we aim to utilize generalized future conditioning to enable efficient unsupervised pretraining from reward-free and sub-optimal offline data. We propose Pretrained Decision Transformer (PDT), a conceptually simple approach for unsupervised RL pretraining. PDT leverages future trajectory information as a privileged context to predict actions during training. The ability to make decisions based on both present and future factors enhances PDT's capability for generalization. Besides, this feature can be easily incorporated into a return-conditioned framework for online finetuning, by assigning return values to possible futures and sampling future embeddings based on their respective values. Empirically, PDT outperforms or performs on par with its supervised pretraining counterpart, especially when dealing with sub-optimal data. Further analysis reveals that PDT can extract diverse behaviors from offline data and controllably sample high-return behaviors by online finetuning. Code is available at here.

Title: A Closer Look at In-Context Learning under Distribution Shifts. (arXiv:2305.16704v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16704
Code URL: https://github.com/facebookresearch/iclmlp
Copy Paste: [[2305.16704] A Closer Look at In-Context Learning under Distribution Shifts](http://arxiv.org/abs/2305.16704) #transformer
Summary:
In-context learning, a capability that enables a model to learn from input examples on the fly without necessitating weight updates, is a defining characteristic of large language models. In this work, we follow the setting proposed in (Garg et al., 2022) to better understand the generality and limitations of in-context learning from the lens of the simple yet fundamental task of linear regression. The key question we aim to address is: Are transformers more adept than some natural and simpler architectures at performing in-context learning under varying distribution shifts? To compare transformers, we propose to use a simple architecture based on set-based Multi-Layer Perceptrons (MLPs). We find that both transformers and set-based MLPs exhibit in-context learning under in-distribution evaluations, but transformers more closely emulate the performance of ordinary least squares (OLS). Transformers also display better resilience to mild distribution shifts, where set-based MLPs falter. However, under severe distribution shifts, both models' in-context learning abilities diminish.

generative

Title: Prompt Evolution for Generative AI: A Classifier-Guided Approach. (arXiv:2305.16347v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16347
Code URL: null
Copy Paste: [[2305.16347] Prompt Evolution for Generative AI: A Classifier-Guided Approach](http://arxiv.org/abs/2305.16347) #generative
Summary:
Synthesis of digital artifacts conditioned on user prompts has become an important paradigm facilitating an explosion of use cases with generative AI. However, such models often fail to connect the generated outputs and desired target concepts/preferences implied by the prompts. Current research addressing this limitation has largely focused on enhancing the prompts before output generation or improving the model's performance up front. In contrast, this paper conceptualizes prompt evolution, imparting evolutionary selection pressure and variation during the generative process to produce multiple outputs that satisfy the target concepts/preferences better. We propose a multi-objective instantiation of this broader idea that uses a multi-label image classifier-guided approach. The predicted labels from the classifiers serve as multiple objectives to optimize, with the aim of producing diversified images that meet user preferences. A novelty of our evolutionary algorithm is that the pre-trained generative model gives us implicit mutation operations, leveraging the model's stochastic generative capability to automate the creation of Pareto-optimized images more faithful to user preferences.

Title: EDM3: Event Detection as Multi-task Text Generation. (arXiv:2305.16357v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16357
Code URL: https://github.com/ujjwalaananth/edm3_eventdetection
Copy Paste: [[2305.16357] EDM3: Event Detection as Multi-task Text Generation](http://arxiv.org/abs/2305.16357) #generative
Summary:
Event detection refers to identifying event occurrences in a text and comprises of two subtasks; event identification and classification. We present EDM3, a novel approach for Event Detection that formulates three generative tasks: identification, classification, and combined detection. We show that EDM3 helps to learn transferable knowledge that can be leveraged to perform Event Detection and its subtasks concurrently, mitigating the error propagation inherent in pipelined approaches. Unlike previous dataset- or domain-specific approaches, EDM3 utilizes the existing knowledge of language models, allowing it to be trained over any classification schema. We evaluate EDM3 on multiple event detection datasets: RAMS, WikiEvents, MAVEN, and MLEE, showing that EDM3 outperforms 1) single-task performance by 8.4% on average and 2) multi-task performance without instructional prompts by 2.4% on average. We obtain SOTA results on RAMS (71.3% vs. 65.1% F-1) and competitive performance on other datasets. We analyze our approach to demonstrate its efficacy in low-resource and multi-sentence settings. We also show the effectiveness of this approach on non-standard event configurations such as multi-word and multi-class event triggers. Overall, our results show that EDM3 is a promising approach for Event Detection that has the potential for real-world applications.

Title: NormMark: A Weakly Supervised Markov Model for Socio-cultural Norm Discovery. (arXiv:2305.16598v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16598
Code URL: null
Copy Paste: [[2305.16598] NormMark: A Weakly Supervised Markov Model for Socio-cultural Norm Discovery](http://arxiv.org/abs/2305.16598) #generative
Summary:
Norms, which are culturally accepted guidelines for behaviours, can be integrated into conversational models to generate utterances that are appropriate for the socio-cultural context. Existing methods for norm recognition tend to focus only on surface-level features of dialogues and do not take into account the interactions within a conversation. To address this issue, we propose NormMark, a probabilistic generative Markov model to carry the latent features throughout a dialogue. These features are captured by discrete and continuous latent variables conditioned on the conversation history, and improve the model's ability in norm recognition. The model is trainable on weakly annotated data using the variational technique. On a dataset with limited norm annotations, we show that our approach achieves higher F1 score, outperforming current state-of-the-art methods, including GPT3.

Title: Zero is Not Hero Yet: Benchmarking Zero-Shot Performance of LLMs for Financial Tasks. (arXiv:2305.16633v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16633
Code URL: https://github.com/gtfintechlab/zero-shot-finance
Copy Paste: [[2305.16633] Zero is Not Hero Yet: Benchmarking Zero-Shot Performance of LLMs for Financial Tasks](http://arxiv.org/abs/2305.16633) #generative
Summary:
Recently large language models (LLMs) like ChatGPT have shown impressive performance on many natural language processing tasks with zero-shot. In this paper, we investigate the effectiveness of zero-shot LLMs in the financial domain. We compare the performance of ChatGPT along with some open-source generative LLMs in zero-shot mode with RoBERTa fine-tuned on annotated data. We address three inter-related research questions on data annotation, performance gaps, and the feasibility of employing generative models in the finance domain. Our findings demonstrate that ChatGPT performs well even without labeled data but fine-tuned models generally outperform it. Our research also highlights how annotating with generative models can be time-intensive. Our codebase is publicly available on GitHub under CC BY-NC 4.0 license.

Title: Multiview Identifiers Enhanced Generative Retrieval. (arXiv:2305.16675v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16675
Code URL: https://github.com/liyongqi67/minder
Copy Paste: [[2305.16675] Multiview Identifiers Enhanced Generative Retrieval](http://arxiv.org/abs/2305.16675) #generative
Summary:
Instead of simply matching a query to pre-existing passages, generative retrieval generates identifier strings of passages as the retrieval target. At a cost, the identifier must be distinctive enough to represent a passage. Current approaches use either a numeric ID or a text piece (such as a title or substrings) as the identifier. However, these identifiers cannot cover a passage's content well. As such, we are motivated to propose a new type of identifier, synthetic identifiers, that are generated based on the content of a passage and could integrate contextualized information that text pieces lack. Furthermore, we simultaneously consider multiview identifiers, including synthetic identifiers, titles, and substrings. These views of identifiers complement each other and facilitate the holistic ranking of passages from multiple perspectives. We conduct a series of experiments on three public datasets, and the results indicate that our proposed approach performs the best in generative retrieval, demonstrating its effectiveness and robustness.

Title: Ensemble Synthetic EHR Generation for Increasing Subpopulation Model's Performance. (arXiv:2305.16363v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16363
Code URL: null
Copy Paste: [[2305.16363] Ensemble Synthetic EHR Generation for Increasing Subpopulation Model's Performance](http://arxiv.org/abs/2305.16363) #generative
Summary:
Electronic health records (EHR) often contain different rates of representation of certain subpopulations (SP). Factors like patient demographics, clinical condition prevalence, and medical center type contribute to this underrepresentation. Consequently, when training machine learning models on such datasets, the models struggle to generalize well and perform poorly on underrepresented SPs. To address this issue, we propose a novel ensemble framework that utilizes generative models. Specifically, we train a GAN-based synthetic data generator for each SP and incorporate synthetic samples into each SP training set. Ultimately, we train SP-specific prediction models. To properly evaluate this method, we design an evaluation pipeline with 2 real-world use case datasets, queried from the MIMIC database. Our approach shows increased model performance over underrepresented SPs. Our code and models are given as supplementary and will be made available on a public repository.

Title: The Representation Jensen-Shannon Divergence. (arXiv:2305.16446v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16446
Code URL: https://github.com/uk-cliplab/representationjsd
Copy Paste: [[2305.16446] The Representation Jensen-Shannon Divergence](http://arxiv.org/abs/2305.16446) #generative
Summary:
Statistical divergences quantify the difference between probability distributions finding multiple uses in machine-learning. However, a fundamental challenge is to estimate divergence from empirical samples since the underlying distributions of the data are usually unknown. In this work, we propose the representation Jensen-Shannon Divergence, a novel divergence based on covariance operators in reproducing kernel Hilbert spaces (RKHS). Our approach embeds the data distributions in an RKHS and exploits the spectrum of the covariance operators of the representations. We provide an estimator from empirical covariance matrices by explicitly mapping the data to an RKHS using Fourier features. This estimator is flexible, scalable, differentiable, and suitable for minibatch-based optimization problems. Additionally, we provide an estimator based on kernel matrices without having an explicit mapping to the RKHS. We show that this quantity is a lower bound on the Jensen-Shannon divergence, and we propose a variational approach to estimate it. We applied our divergence to two-sample testing outperforming related state-of-the-art techniques in several datasets. We used the representation Jensen-Shannon divergence as a cost function to train generative adversarial networks which intrinsically avoids mode collapse and encourages diversity.

Title: Evaluating generation of chaotic time series by convolutional generative adversarial networks. (arXiv:2305.16729v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16729
Code URL: https://github.com/yymgch/chaosgan
Copy Paste: [[2305.16729] Evaluating generation of chaotic time series by convolutional generative adversarial networks](http://arxiv.org/abs/2305.16729) #generative
Summary:
To understand the ability and limitations of convolutional neural networks to generate time series that mimic complex temporal signals, we trained a generative adversarial network consisting of deep convolutional networks to generate chaotic time series and used nonlinear time series analysis to evaluate the generated time series. A numerical measure of determinism and the Lyapunov exponent, a measure of trajectory instability, showed that the generated time series well reproduce the chaotic properties of the original time series. However, error distribution analyses showed that large errors appeared at a low but non-negligible rate. Such errors would not be expected if the distribution were assumed to be exponential.

large language model

Title: PandaGPT: One Model To Instruction-Follow Them All. (arXiv:2305.16355v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16355
Code URL: null
Copy Paste: [[2305.16355] PandaGPT: One Model To Instruction-Follow Them All](http://arxiv.org/abs/2305.16355) #large language model
Summary:
We present PandaGPT, an approach to emPower large lANguage moDels with visual and Auditory instruction-following capabilities. Our pilot experiments show that PandaGPT can perform complex tasks such as detailed image description generation, writing stories inspired by videos, and answering questions about audios. More interestingly, PandaGPT can take multimodal inputs simultaneously and compose their semantics naturally. For example, PandaGPT can connect how objects look in an image/video and how they sound in an audio. To do so, PandaGPT combines the multimodal encoders from ImageBind and the large language models from Vicuna. Notably, only aligned image-text pairs are required for the training of PandaGPT. Thanks to the strong capability of ImageBind in embedding data from different modalities into the same space, PandaGPT displays emergent, i.e. zero-shot, cross-modal behaviors for data other than image and text (e.g., video, audio, depth, thermal, and IMU). We hope that PandaGPT serves as an initial step toward building AGI that can perceive and understand inputs in different modalities holistically, as we humans do. Our project page is at https://panda-gpt.github.io/.

Title: Large language models in biomedical natural language processing: benchmarks, baselines, and recommendations. (arXiv:2305.16326v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16326
Code URL: null
Copy Paste: [[2305.16326] Large language models in biomedical natural language processing: benchmarks, baselines, and recommendations](http://arxiv.org/abs/2305.16326) #large language model
Summary:
Biomedical literature is growing rapidly, making it challenging to curate and extract knowledge manually. Biomedical natural language processing (BioNLP) techniques that can automatically extract information from biomedical literature help alleviate this burden. Recently, large Language Models (LLMs), such as GPT-3 and GPT-4, have gained significant attention for their impressive performance. However, their effectiveness in BioNLP tasks and impact on method development and downstream users remain understudied. This pilot study (1) establishes the baseline performance of GPT-3 and GPT-4 at both zero-shot and one-shot settings in eight BioNLP datasets across four applications: named entity recognition, relation extraction, multi-label document classification, and semantic similarity and reasoning, (2) examines the errors produced by the LLMs and categorized the errors into three types: missingness, inconsistencies, and unwanted artificial content, and (3) provides suggestions for using LLMs in BioNLP applications. We make the datasets, baselines, and results publicly available to the community via https://github.com/qingyu-qc/gpt_bionlp_benchmark.

Title: Semantic Composition in Visually Grounded Language Models. (arXiv:2305.16328v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16328
Code URL: null
Copy Paste: [[2305.16328] Semantic Composition in Visually Grounded Language Models](http://arxiv.org/abs/2305.16328) #large language model
Summary:
What is sentence meaning and its ideal representation? Much of the expressive power of human language derives from semantic composition, the mind's ability to represent meaning hierarchically & relationally over constituents. At the same time, much sentential meaning is outside the text and requires grounding in sensory, motor, and experiential modalities to be adequately learned. Although large language models display considerable compositional ability, recent work shows that visually-grounded language models drastically fail to represent compositional structure. In this thesis, we explore whether & how models compose visually grounded semantics, and how we might improve their ability to do so.

Specifically, we introduce 1) WinogroundVQA, a new compositional visual question answering benchmark, 2) Syntactic Neural Module Distillation, a measure of compositional ability in sentence embedding models, 3) Causal Tracing for Image Captioning Models to locate neural representations vital for vision-language composition, 4) Syntactic MeanPool to inject a compositional inductive bias into sentence embeddings, and 5) Cross-modal Attention Congruence Regularization, a self-supervised objective function for vision-language relation alignment. We close by discussing connections of our work to neuroscience, psycholinguistics, formal semantics, and philosophy.

Title: OlaGPT: Empowering LLMs With Human-like Problem-Solving Abilities. (arXiv:2305.16334v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16334
Code URL: null
Copy Paste: [[2305.16334] OlaGPT: Empowering LLMs With Human-like Problem-Solving Abilities](http://arxiv.org/abs/2305.16334) #large language model
Summary:
In most current research, large language models (LLMs) are able to perform reasoning tasks by generating chains of thought through the guidance of specific prompts. However, there still exists a significant discrepancy between their capability in solving complex reasoning problems and that of humans. At present, most approaches focus on chains of thought (COT) and tool use, without considering the adoption and application of human cognitive frameworks. It is well-known that when confronting complex reasoning challenges, humans typically employ various cognitive abilities, and necessitate interaction with all aspects of tools, knowledge, and the external environment information to accomplish intricate tasks. This paper introduces a novel intelligent framework, referred to as OlaGPT. OlaGPT carefully studied a cognitive architecture framework, and propose to simulate certain aspects of human cognition. The framework involves approximating different cognitive modules, including attention, memory, reasoning, learning, and corresponding scheduling and decision-making mechanisms. Inspired by the active learning mechanism of human beings, it proposes a learning unit to record previous mistakes and expert opinions, and dynamically refer to them to strengthen their ability to solve similar problems. The paper also outlines common effective reasoning frameworks for human problem-solving and designs Chain-of-Thought (COT) templates accordingly. A comprehensive decision-making mechanism is also proposed to maximize model accuracy. The efficacy of OlaGPT has been stringently evaluated on multiple reasoning datasets, and the experimental outcomes reveal that OlaGPT surpasses state-of-the-art benchmarks, demonstrating its superior performance. Our implementation of OlaGPT is available on GitHub: \url{https://github.com/oladata-team/OlaGPT}.

Title: Don't Trust GPT When Your Question Is Not In English. (arXiv:2305.16339v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16339
Code URL: null
Copy Paste: [[2305.16339] Don't Trust GPT When Your Question Is Not In English](http://arxiv.org/abs/2305.16339) #large language model
Summary:
Large Language Models (LLMs) have demonstrated exceptional natural language understanding abilities and have excelled in a variety of natural language processing (NLP)tasks in recent years. Despite the fact that most LLMs are trained predominantly in English, multiple studies have demonstrated their comparative performance in many other languages. However, fundamental questions persist regarding how LLMs acquire their multi-lingual abilities and how performance varies across different languages. These inquiries are crucial for the study of LLMs since users and researchers often come from diverse language backgrounds, potentially influencing their utilization and interpretation of LLMs' results. In this work, we propose a systematic way of qualifying the performance disparities of LLMs under multilingual settings. We investigate the phenomenon of across-language generalizations in LLMs, wherein insufficient multi-lingual training data leads to advanced multi-lingual capabilities. To accomplish this, we employ a novel back-translation-based prompting method. The results show that GPT exhibits highly translating-like behaviour in multilingual settings.

Title: Role-Play with Large Language Models. (arXiv:2305.16367v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16367
Code URL: null
Copy Paste: [[2305.16367] Role-Play with Large Language Models](http://arxiv.org/abs/2305.16367) #large language model
Summary:
As dialogue agents become increasingly human-like in their performance, it is imperative that we develop effective ways to describe their behaviour in high-level terms without falling into the trap of anthropomorphism. In this paper, we foreground the concept of role-play. Casting dialogue agent behaviour in terms of role-play allows us to draw on familiar folk psychological terms, without ascribing human characteristics to language models they in fact lack. Two important cases of dialogue agent behaviour are addressed this way, namely (apparent) deception and (apparent) self-awareness.

Title: On the Tool Manipulation Capability of Open-source Large Language Models. (arXiv:2305.16504v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16504
Code URL: null
Copy Paste: [[2305.16504] On the Tool Manipulation Capability of Open-source Large Language Models](http://arxiv.org/abs/2305.16504) #large language model
Summary:
Recent studies on software tool manipulation with large language models (LLMs) mostly rely on closed model APIs. The industrial adoption of these models is substantially constrained due to the security and robustness risks in exposing information to closed LLM API services. In this paper, we ask can we enhance open-source LLMs to be competitive to leading closed LLM APIs in tool manipulation, with practical amount of human supervision. By analyzing common tool manipulation failures, we first demonstrate that open-source LLMs may require training with usage examples, in-context demonstration and generation style regulation to resolve failures. These insights motivate us to revisit classical methods in LLM literature, and demonstrate that we can adapt them as model alignment with programmatic data generation, system prompts and in-context demonstration retrievers to enhance open-source LLMs for tool manipulation. To evaluate these techniques, we create the ToolBench, a tool manipulation benchmark consisting of diverse software tools for real-world tasks. We demonstrate that our techniques can boost leading open-source LLMs by up to 90% success rate, showing capabilities competitive to OpenAI GPT-4 in 4 out of 8 ToolBench tasks. We show that such enhancement typically requires about one developer day to curate data for each tool, rendering a recipe with practical amount of human supervision.

Title: The Dangers of trusting Stochastic Parrots: Faithfulness and Trust in Open-domain Conversational Question Answering. (arXiv:2305.16519v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16519
Code URL: null
Copy Paste: [[2305.16519] The Dangers of trusting Stochastic Parrots: Faithfulness and Trust in Open-domain Conversational Question Answering](http://arxiv.org/abs/2305.16519) #large language model
Summary:
Large language models are known to produce output which sounds fluent and convincing, but is also often wrong, e.g. "unfaithful" with respect to a rationale as retrieved from a knowledge base. In this paper, we show that task-based systems which exhibit certain advanced linguistic dialog behaviors, such as lexical alignment (repeating what the user said), are in fact preferred and trusted more, whereas other phenomena, such as pronouns and ellipsis are dis-preferred. We use open-domain question answering systems as our test-bed for task based dialog generation and compare several open- and closed-book models. Our results highlight the danger of systems that appear to be trustworthy by parroting user input while providing an unfaithful response.

Title: Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models. (arXiv:2305.16582v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16582
Code URL: null
Copy Paste: [[2305.16582] Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models](http://arxiv.org/abs/2305.16582) #large language model
Summary:
With the widespread use of large language models (LLMs) in NLP tasks, researchers have discovered the potential of Chain-of-thought (CoT) to assist LLMs in accomplishing complex reasoning tasks by generating intermediate steps. However, human thought processes are often non-linear, rather than simply sequential chains of thoughts. Therefore, we propose Graph-of-Thought (GoT) reasoning, which models human thought processes not only as a chain but also as a graph. By representing thought units as nodes and connections between them as edges, our approach captures the non-sequential nature of human thinking and allows for a more realistic modeling of thought processes. Similar to Multimodal-CoT, we modeled GoT reasoning as a two-stage framework, generating rationales first and then producing the final answer. Specifically, we employ an additional graph-of-thoughts encoder for GoT representation learning and fuse the GoT representation with the original input representation through a gated fusion mechanism. We implement a GoT reasoning model on the T5 pre-trained model and evaluate its performance on a text-only reasoning task (GSM8K) and a multimodal reasoning task (ScienceQA). Our model achieves significant improvement over the strong CoT baseline with 3.41% and 5.08% on the GSM8K test set with T5-base and T5-large architectures, respectively. Additionally, our model boosts accuracy from 84.91% to 91.54% using the T5-base model and from 91.68% to 92.77% using the T5-large model over the state-of-the-art Multimodal-CoT on the ScienceQA test set. Experiments have shown that GoT achieves comparable results to Multimodal-CoT(large) with over 700M parameters, despite having fewer than 250M backbone model parameters, demonstrating the effectiveness of GoT.

Title: Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning. (arXiv:2305.16646v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16646
Code URL: https://github.com/ilampard/ep_llm
Copy Paste: [[2305.16646] Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning](http://arxiv.org/abs/2305.16646) #large language model
Summary:
Large language models have shown astonishing performance on a wide range of reasoning tasks. In this paper, we investigate whether they could reason about real-world events and help improve the prediction accuracy of event sequence models. We design a modeling and prediction framework where a large language model performs abductive reasoning to assist an event sequence model: the event model proposes predictions on future events given the past; instructed by a few expert-annotated demonstrations, the language model learns to suggest possible causes for each proposal; a search module finds out the previous events that match the causes; a scoring function learns to examine whether the retrieved events could actually cause the proposal. Through extensive experiments on two challenging real-world datasets (Amazon Review and GDELT), we demonstrate that our framework -- thanks to the reasoning ability of language models -- could significantly outperform the state-of-the-art event sequence models.

Title: AdaPlanner: Adaptive Planning from Feedback with Language Models. (arXiv:2305.16653v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16653
Code URL: https://github.com/haotiansun14/adaplanner
Copy Paste: [[2305.16653] AdaPlanner: Adaptive Planning from Feedback with Language Models](http://arxiv.org/abs/2305.16653) #large language model
Summary:
Large language models (LLMs) have recently demonstrated the potential in acting as autonomous agents for sequential decision-making tasks. However, most existing methods either take actions greedily without planning or rely on static plans that are not adaptable to environmental feedback. Consequently, the sequential decision-making performance of LLM agents degenerates with problem complexity and plan horizons increase. We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback. In AdaPlanner, the LLM agent adaptively refines its plan from feedback with both in-plan and out-of-plan refinement strategies. To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities. Furthermore, we propose a skill discovery mechanism that leverages successful plans as few-shot exemplars, enabling the agent to plan and refine with fewer task demonstrations. Our experiments in the ALFWorld and MiniWoB++ environments demonstrate that AdaPlanner outperforms state-of-the-art baselines by 3.73% and 4.11% while utilizing 2x and 600x fewer samples, respectively.

Title: Can large language models generate salient negative statements?. (arXiv:2305.16755v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16755
Code URL: null
Copy Paste: [[2305.16755] Can large language models generate salient negative statements?](http://arxiv.org/abs/2305.16755) #large language model
Summary:
We examine the ability of large language models (LLMs) to generate salient (interesting) negative statements about real-world entities; an emerging research topic of the last few years. We probe the LLMs using zero- and k-shot unconstrained probes, and compare with traditional methods for negation generation, i.e., pattern-based textual extractions and knowledge-graph-based inferences, as well as crowdsourced gold statements. We measure the correctness and salience of the generated lists about subjects from different domains. Our evaluation shows that guided probes do in fact improve the quality of generated negatives, compared to the zero-shot variant. Nevertheless, using both prompts, LLMs still struggle with the notion of factuality of negatives, frequently generating many ambiguous statements, or statements with negative keywords but a positive meaning.

Title: Leveraging Domain Knowledge for Inclusive and Bias-aware Humanitarian Response Entry Classification. (arXiv:2305.16756v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16756
Code URL: null
Copy Paste: [[2305.16756] Leveraging Domain Knowledge for Inclusive and Bias-aware Humanitarian Response Entry Classification](http://arxiv.org/abs/2305.16756) #large language model
Summary:
Accurate and rapid situation analysis during humanitarian crises is critical to delivering humanitarian aid efficiently and is fundamental to humanitarian imperatives and the Leave No One Behind (LNOB) principle. This data analysis can highly benefit from language processing systems, e.g., by classifying the text data according to a humanitarian ontology. However, approaching this by simply fine-tuning a generic large language model (LLM) involves considerable practical and ethical issues, particularly the lack of effectiveness on data-sparse and complex subdomains, and the encoding of societal biases and unwanted associations. In this work, we aim to provide an effective and ethically-aware system for humanitarian data analysis. We approach this by (1) introducing a novel architecture adjusted to the humanitarian analysis framework, (2) creating and releasing a novel humanitarian-specific LLM called HumBert, and (3) proposing a systematic way to measure and mitigate biases. Our experiments' results show the better performance of our approach on zero-shot and full-training settings in comparison with strong baseline models, while also revealing the existence of biases in the resulting LLMs. Utilizing a targeted counterfactual data augmentation approach, we significantly reduce these biases without compromising performance.

Title: Do GPTs Produce Less Literal Translations?. (arXiv:2305.16806v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2305.16806
Code URL: null
Copy Paste: [[2305.16806] Do GPTs Produce Less Literal Translations?](http://arxiv.org/abs/2305.16806) #large language model
Summary:
Large Language Models (LLMs) such as GPT-3 have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks. On the task of Machine Translation (MT), multiple works have investigated few-shot prompting mechanisms to elicit better translations from LLMs. However, there has been relatively little investigation on how such translations differ qualitatively from the translations generated by standard Neural Machine Translation (NMT) models. In this work, we investigate these differences in terms of the literalness of translations produced by the two systems. Using literalness measures involving word alignment and monotonicity, we find that translations out of English (E-X) from GPTs tend to be less literal, while exhibiting similar or better scores on MT quality metrics. We demonstrate that this finding is borne out in human evaluations as well. We then show that these differences are especially pronounced when translating sentences that contain idiomatic expressions.

segmentation

Title: GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds. (arXiv:2305.16404v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16404
Code URL: https://github.com/vlar-group/growsp
Copy Paste: [[2305.16404] GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds](http://arxiv.org/abs/2305.16404) #segmentation
Summary:
We study the problem of 3D semantic segmentation from raw point clouds. Unlike existing methods which primarily rely on a large amount of human annotations for training neural networks, we propose the first purely unsupervised method, called GrowSP, to successfully identify complex semantic classes for every point in 3D scenes, without needing any type of human labels or pretrained models. The key to our approach is to discover 3D semantic elements via progressive growing of superpoints. Our method consists of three major components, 1) the feature extractor to learn per-point features from input point clouds, 2) the superpoint constructor to progressively grow the sizes of superpoints, and 3) the semantic primitive clustering module to group superpoints into semantic elements for the final semantic segmentation. We extensively evaluate our method on multiple datasets, demonstrating superior performance over all unsupervised baselines and approaching the classic fully-supervised PointNet. We hope our work could inspire more advanced methods for unsupervised 3D semantic learning.

Title: Detect Any Shadow: Segment Anything for Video Shadow Detection. (arXiv:2305.16698v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16698
Code URL: null
Copy Paste: [[2305.16698] Detect Any Shadow: Segment Anything for Video Shadow Detection](http://arxiv.org/abs/2305.16698) #segmentation
Summary:
Segment anything model (SAM) has achieved great success in the field of natural image segmentation. Nevertheless, SAM tends to classify shadows as background, resulting in poor segmentation performance for shadow detection task. In this paper, we propose an simple but effective approach for fine tuning SAM to detect shadows. Additionally, we also combine it with long short-term attention mechanism to extend its capabilities to video shadow detection. Specifically, we first fine tune SAM by utilizing shadow data combined with sparse prompts and apply the fine-tuned model to detect a specific frame (e.g., first frame) in the video with a little user assistance. Subsequently, using the detected frame as a reference, we employ a long short-term network to learn spatial correlations between distant frames and temporal consistency between contiguous frames, thereby achieving shadow information propagation across frames. Extensive experimental results demonstrate that our method outperforms the state-of-the-art techniques, with improvements of 17.2% and 3.3% in terms of MAE and IoU, respectively, validating the effectiveness of our method.

Title: Towards Open-World Segmentation of Parts. (arXiv:2305.16804v1 [cs.CV])

Paper URL: http://arxiv.org/abs/2305.16804
Code URL: https://github.com/tydpan/openpartseg
Copy Paste: [[2305.16804] Towards Open-World Segmentation of Parts](http://arxiv.org/abs/2305.16804) #segmentation
Summary:
Segmenting object parts such as cup handles and animal bodies is important in many real-world applications but requires more annotation effort. The largest dataset nowadays contains merely two hundred object categories, implying the difficulty to scale up part segmentation to an unconstrained setting. To address this, we propose to explore a seemingly simplified but empirically useful and scalable task, class-agnostic part segmentation. In this problem, we disregard the part class labels in training and instead treat all of them as a single part class. We argue and demonstrate that models trained without part classes can better localize parts and segment them on objects unseen in training. We then present two further improvements. First, we propose to make the model object-aware, leveraging the fact that parts are "compositions", whose extents are bounded by the corresponding objects and whose appearances are by nature not independent but bundled. Second, we introduce a novel approach to improve part segmentation on unseen objects, inspired by an interesting finding -- for unseen objects, the pixel-wise features extracted by the model often reveal high-quality part segments. To this end, we propose a novel self-supervised procedure that iterates between pixel clustering and supervised contrastive learning that pulls pixels closer or pushes them away. Via extensive experiments on PartImageNet and Pascal-Part, we show notable and consistent gains by our approach, essentially a critical step towards open-world part segmentation.

Title: Support Vector Machine Guided Reproducing Kernel Particle Method for Image-Based Modeling of Microstructures. (arXiv:2305.16402v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2305.16402
Code URL: null
Copy Paste: [[2305.16402] Support Vector Machine Guided Reproducing Kernel Particle Method for Image-Based Modeling of Microstructures](http://arxiv.org/abs/2305.16402) #segmentation
Summary:
This work presents an approach for automating the discretization and approximation procedures in constructing digital representations of composites from Micro-CT images featuring intricate microstructures. The proposed method is guided by the Support Vector Machine (SVM) classification, offering an effective approach for discretizing microstructural images. An SVM soft margin training process is introduced as a classification of heterogeneous material points, and image segmentation is accomplished by identifying support vectors through a local regularized optimization problem. In addition, an Interface-Modified Reproducing Kernel Particle Method (IM-RKPM) is proposed for appropriate approximations of weak discontinuities across material interfaces. The proposed method modifies the smooth kernel functions with a regularized heavy-side function concerning the material interfaces to alleviate Gibb's oscillations. This IM-RKPM is formulated without introducing duplicated degrees of freedom associated with the interface nodes commonly needed in the conventional treatments of weak discontinuities in the meshfree methods. Moreover, IM-RKPM can be implemented with various domain integration techniques, such as Stabilized Conforming Nodal Integration (SCNI). The extension of the proposed method to 3-dimension is straightforward, and the effectiveness of the proposed method is validated through the image-based modeling of polymer-ceramic composite microstructures.