2023-12-22

language model

Title: DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines. (arXiv:2312.13382v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13382
Code URL: https://github.com/stanfordnlp/dspy
Copy Paste: [[2312.13382]] DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines(http://arxiv.org/abs/2312.13382)
Summary:
Chaining language model (LM) calls as composable modules is fueling a new powerful way of programming. However, ensuring that LMs adhere to important constraints remains a key challenge, one often addressed with heuristic "prompt engineering". We introduce LM Assertions, a new programming construct for expressing computational constraints that LMs should satisfy. We integrate our constructs into the recent DSPy programming model for LMs, and present new strategies that allow DSPy to compile programs with arbitrary LM Assertions into systems that are more reliable and more accurate. In DSPy, LM Assertions can be integrated at compile time, via automatic prompt optimization, and/or at inference time, via automatic self-refinement and backtracking. We report on two early case studies for complex question answering (QA), in which the LM program must iteratively retrieve information in multiple hops and synthesize a long-form answer with citations. We find that LM Assertions improve not only compliance with imposed rules and guidelines but also enhance downstream task performance, delivering intrinsic and extrinsic gains up to 35.7% and 13.3%, respectively. Our reference implementation of LM Assertions is integrated into DSPy at https://github.com/stanfordnlp/dspy

Title: The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction. (arXiv:2312.13558v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13558
Code URL: https://github.com/pratyushasharma/laser
Copy Paste: [[2312.13558]] The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction(http://arxiv.org/abs/2312.13558)
Summary:
Transformer-based Large Language Models (LLMs) have become a fixture in modern machine learning. Correspondingly, significant resources are allocated towards research that aims to further advance this technology, typically resulting in models of increasing size that are trained on increasing amounts of data. This work, however, demonstrates the surprising result that it is often possible to significantly improve the performance of LLMs by selectively removing higher-order components of their weight matrices. This simple intervention, which we call LAyer-SElective Rank reduction (LASER), can be done on a model after training has completed, and requires no additional parameters or data. We show extensive experiments demonstrating the generality of this finding across language models and datasets, and provide in-depth analyses offering insights into both when LASER is effective and the mechanism by which it operates.

Title: On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning. (arXiv:2312.13772v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13772
Code URL: https://github.com/cambridgeltl/ensembled-sicl
Copy Paste: [[2312.13772]] On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning(http://arxiv.org/abs/2312.13772)
Summary:
Following the standard supervised fine-tuning (SFT) paradigm, in-context learning (ICL) has become an efficient approach propelled by the recent advancements in large language models (LLMs), yielding promising performance across various tasks in few-shot data setups. However, both paradigms are prone to suffer from the critical problem of overconfidence (i.e., miscalibration), especially in such limited data setups. In this work, we deliver an in-depth analysis of the behavior across different choices of learning methods from the perspective of both performance and calibration, as well as their interplay. Through extensive controlled experiments, we find that simultaneous gains for both task performance and calibration are difficult to achieve, and the problem of miscalibration exists across all learning methods in low-resource scenarios. To address this challenging trade-off between performance and calibration, we then investigate the potential of self-ensembling techniques applied at different modeling stages (e.g., variations of in-context examples or variations in prompts or different ensembling strategies). We justify the feasibility of self-ensembling on SFT in addition to ICL, to make the predictions more calibrated and have comparable or even better performance. Our work sheds light on which learning paradigm to choose and how to enhance both task performance and calibration of LLMs.
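The self-ensembling idea in this summary can be illustrated with a minimal sketch. It assumes each prompt or in-context-example variant yields a class-probability vector, and it shows two generic ensembling strategies (mean probability and majority vote). The function names and toy numbers are hypothetical and are not taken from the paper's code.

```python
from collections import Counter


def mean_ensemble(prob_lists):
    """Average class-probability vectors produced by several prompt variants."""
    n = len(prob_lists)
    num_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n for c in range(num_classes)]


def majority_vote(prob_lists):
    """Return the class index predicted by most variants (argmax per variant)."""
    votes = Counter(max(range(len(p)), key=p.__getitem__) for p in prob_lists)
    return votes.most_common(1)[0][0]


# Toy predictions from three hypothetical prompt variants on a 2-class task.
variant_probs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
averaged = mean_ensemble(variant_probs)
voted = majority_vote(variant_probs)
```

Averaging tends to soften an overconfident single prediction (0.9 for class 0 in the first variant becomes roughly 0.6 here), which is the intuition behind using ensembling to improve calibration.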

Title: Typhoon: Thai Large Language Models. (arXiv:2312.13951v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13951
Code URL: null
Copy Paste: [[2312.13951]] Typhoon: Thai Large Language Models(http://arxiv.org/abs/2312.13951)
Summary:
Typhoon is a series of Thai large language models (LLMs) developed specifically for the Thai language. This technical report presents challenges and insights in developing Thai LLMs, including data preparation, pretraining, instruction-tuning, and evaluation. As one of the challenges of low-resource languages is the amount of pretraining data, we apply continual training to transfer existing world knowledge from a strong LLM. To evaluate the Thai knowledge encapsulated in each model from the pretraining stage, we develop ThaiExam, a benchmark based on examinations for high-school students and investment professionals in Thailand. In addition, we fine-tune Typhoon to follow Thai instructions, and we evaluate instruction-tuned models on Thai instruction datasets as well as translation, summarization, and question-answering tasks. Experimental results on a suite of Thai benchmarks show that Typhoon outperforms all open-source Thai language models, and its performance is on par with GPT-3.5 in Thai while having only 7 billion parameters and being 2.62 times more efficient in tokenizing Thai text.

Title: Time is Encoded in the Weights of Finetuned Language Models. (arXiv:2312.13401v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13401
Code URL: null
Copy Paste: [[2312.13401]] Time is Encoded in the Weights of Finetuned Language Models(http://arxiv.org/abs/2312.13401)
Summary:
We present time vectors, a simple tool to customize language models to new time periods. Time vectors are created by finetuning a language model on data from a single time (e.g., a year or month), and then subtracting the weights of the original pretrained model. This vector specifies a direction in weight space that, as our experiments show, improves performance on text from that time period. Time vectors specialized to adjacent time periods appear to be positioned closer together in a manifold. Using this structure, we interpolate between time vectors to induce new models that perform better on intervening and future time periods, without any additional training. We demonstrate the consistency of our findings across different tasks, domains, model sizes, and time scales. Our results suggest that time is encoded in the weight space of finetuned models.
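The weight arithmetic described in this summary can be sketched as follows. This is a toy illustration that treats a model as a dictionary of scalar parameters; the parameter names, years, and values are hypothetical, and real models would apply the same operations tensor by tensor.

```python
def time_vector(finetuned, pretrained):
    """Time vector = finetuned weights minus the original pretrained weights."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}


def interpolate(tv_a, tv_b, alpha):
    """Linear interpolation between two time vectors (alpha weights tv_a)."""
    return {name: alpha * tv_a[name] + (1.0 - alpha) * tv_b[name] for name in tv_a}


def apply_vector(pretrained, tv):
    """Add a time vector back onto the pretrained weights to induce a new model."""
    return {name: pretrained[name] + tv[name] for name in pretrained}


# Hypothetical toy weights: one pretrained model and two single-year finetunes.
pretrained = {"w1": 1.0, "w2": 2.0}
finetuned_2012 = {"w1": 1.5, "w2": 2.0}
finetuned_2016 = {"w1": 2.5, "w2": 3.0}

tv_2012 = time_vector(finetuned_2012, pretrained)
tv_2016 = time_vector(finetuned_2016, pretrained)

# Induce a model for an intervening year without any additional training.
model_2014 = apply_vector(pretrained, interpolate(tv_2012, tv_2016, alpha=0.5))
```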

Title: Developing Interactive Tourism Planning: A Dialogue Robot System Powered by a Large Language Model. (arXiv:2312.13545v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13545
Code URL: null
Copy Paste: [[2312.13545]] Developing Interactive Tourism Planning: A Dialogue Robot System Powered by a Large Language Model(http://arxiv.org/abs/2312.13545)
Summary:
In recent years, large language models (LLMs) have rapidly proliferated and have been utilized in various tasks, including research in dialogue systems. We aimed to construct a system that not only leverages the flexible conversational abilities of LLMs but also their advanced planning capabilities to reduce the speaking load on human interlocutors and efficiently plan trips. Furthermore, we propose a method that divides the complex task of a travel agency into multiple subtasks, managing each as a separate phase to effectively accomplish the task. Our proposed system confirmed a certain level of success by achieving fourth place in the Dialogue Robot Competition 2023 preliminary rounds. We report on the challenges identified through the competition.

Title: How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark. (arXiv:2312.13547v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13547
Code URL: null
Copy Paste: [[2312.13547]] How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark(http://arxiv.org/abs/2312.13547)
Summary:
Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task. The recent "Sparsity May Cry" (SMC) benchmark put into question the validity of all existing methods, exhibiting a more complex setup where many known pruning methods appear to fail. We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets, and propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark. First, we perform a cost-vs-benefits analysis of pruning model components, such as the embeddings and the classification head; second, we provide a simple-yet-general way of scaling training, sparsification and learning rate schedules relative to the desired target sparsity; finally, we investigate the importance of proper parametrization for Knowledge Distillation in the context of LLMs. Our simple insights lead to state-of-the-art results, both on classic BERT-pruning benchmarks, as well as on the SMC benchmark, showing that even classic gradual magnitude pruning (GMP) can yield competitive results, with the right approach.
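For readers unfamiliar with gradual magnitude pruning (GMP), here is a minimal sketch of the general technique. The cubic sparsity schedule is a commonly used choice and is an assumption here; the summary does not state which schedule or scaling rules the paper uses. The function names and toy weights are hypothetical.

```python
def sparsity_at(step, total_steps, target, initial=0.0):
    """Cubic sparsity schedule (assumed): ramps from `initial` to `target`."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return target + (initial - target) * (1.0 - frac) ** 3


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Toy weight vector; a real run would fine-tune between pruning steps.
weights = [0.5, -0.1, 0.3, -0.8]
total_steps = 10
for step in range(total_steps + 1):
    weights = magnitude_prune(weights, sparsity_at(step, total_steps, target=0.5))
```

The schedule prunes aggressively early, when many weights are redundant, and slows down as it approaches the target sparsity.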

Title: Speech Translation with Large Language Models: An Industrial Practice. (arXiv:2312.13585v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13585
Code URL: null
Copy Paste: [[2312.13585]] Speech Translation with Large Language Models: An Industrial Practice(http://arxiv.org/abs/2312.13585)
Summary:
Given the great success of large language models (LLMs) across various tasks, in this paper, we introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained LLM. By integrating the large language model (LLM) with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long audio inputs. Furthermore, our findings indicate that the implementation of Chain-of-Thought (CoT) prompting can yield advantages in the context of LLM-ST. Through rigorous experimentation on English and Chinese datasets, we showcase the exceptional performance of LLM-ST, establishing a new benchmark in the field of speech translation. Demo: https://speechtranslation.github.io/llm-st/.

Title: Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries. (arXiv:2312.13671v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13671
Code URL: null
Copy Paste: [[2312.13671]] Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries(http://arxiv.org/abs/2312.13671)
Summary:
Tabular data analysis is crucial in various fields, and large language models show promise in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like forecasting and chart generation. To address this gap, we developed the Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond the SQL-compatible operations and require more in-depth analysis. We also develop five innovative and effective annotation methods, harnessing the capabilities of large language models to enhance data quality and quantity. Additionally, we include unclear queries that resemble real-world user questions to test how well models can understand and tackle such challenges. Finally, we collect 2249 query-result pairs with 347 tables. We evaluate five state-of-the-art models using three different metrics and the results show that our benchmark presents a considerable challenge in the field of tabular data analysis, paving the way for more advanced research opportunities.

Title: Exploiting Contextual Target Attributes for Target Sentiment Classification. (arXiv:2312.13766v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13766
Code URL: null
Copy Paste: [[2312.13766]] Exploiting Contextual Target Attributes for Target Sentiment Classification(http://arxiv.org/abs/2312.13766)
Summary:
Existing PTLM-based models for TSC can be categorized into two groups: 1) fine-tuning-based models that adopt PTLM as the context encoder; 2) prompting-based models that transfer the classification task to the text/word generation task. In this paper, we present a new perspective of leveraging PTLM for TSC: simultaneously leveraging the merits of both language modeling and explicit target-context interactions via contextual target attributes. Specifically, we design the domain- and target-constrained cloze test, which can leverage the PTLMs' strong language modeling ability to generate the given target's attributes pertaining to the review context. The attributes contain the background and property information of the target, which can help to enrich the semantics of the review context and the target. To exploit the attributes for tackling TSC, we first construct a heterogeneous information graph by treating the attributes as nodes and combining them with (1) the syntax graph automatically produced by the off-the-shelf dependency parser and (2) the semantics graph of the review context, which is derived from the self-attention mechanism. Then we propose a heterogeneous information gated graph convolutional network to model the interactions among the attribute information, the syntactic information, and the contextual information. The experimental results on three benchmark datasets demonstrate the superiority of our model, which achieves new state-of-the-art performance.

Title: Capture the Flag: Uncovering Data Insights with Large Language Models. (arXiv:2312.13876v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13876
Code URL: null
Copy Paste: [[2312.13876]] Capture the Flag: Uncovering Data Insights with Large Language Models(http://arxiv.org/abs/2312.13876)
Summary:
The extraction of a small number of relevant insights from vast amounts of data is a crucial component of data-driven decision-making. However, accomplishing this task requires considerable technical skills, domain expertise, and human labor. This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data, leveraging recent advances in reasoning and code generation techniques. We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset. We further propose two proof-of-concept agents, with different inner workings, and compare their ability to capture such flags in a real-world sales dataset. While the work reported here is preliminary, our results are sufficiently interesting to mandate future exploration by the community.

Title: Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs. (arXiv:2312.13881v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13881
Code URL: null
Copy Paste: [[2312.13881]] Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs(http://arxiv.org/abs/2312.13881)
Summary:
Recent advances in natural language processing (NLP) owe their success to pre-training language models on large amounts of unstructured data. Still, there is an increasing effort to combine the unstructured nature of LMs with structured knowledge and reasoning. Particularly in the rapidly evolving field of biomedical NLP, knowledge-enhanced language models (KELMs) have emerged as promising tools to bridge the gap between large language models and domain-specific knowledge, considering the available biomedical knowledge graphs (KGs) curated by experts over the decades. In this paper, we develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models (PLMs). We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical ontology OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT. The approach includes partitioning knowledge graphs into smaller subgraphs, fine-tuning adapter modules for each subgraph, and combining the knowledge in a fusion layer. We test the performance on three downstream tasks: document classification, question answering, and natural language inference. We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low. Finally, we provide a detailed interpretation of the results and report valuable insights for future work.

Title: Structured Probabilistic Coding. (arXiv:2312.13933v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13933
Code URL: https://github.com/zerohd4869/SPC
Copy Paste: [[2312.13933]] Structured Probabilistic Coding(http://arxiv.org/abs/2312.13933)
Summary:
This paper presents a new supervised representation learning framework, namely Structured Probabilistic Coding (SPC), to learn compact and informative representations from input related to the target task. SPC is an encoder-only probabilistic coding technology with a structured regularization from the target label space. By extracting compact and informative representations from input related to the target task, SPC can enhance the generalization ability of pre-trained language models for better language understanding. Specifically, the hidden representation is encoded into a Gaussian distribution space, while maximizing the prior entropy of latent representations concerning label space. This technique can simultaneously perform information encoding and task prediction in one module to more fully utilize the effective information from input data, and use variational inference in the output space to reduce randomness and uncertainty. To better control the probability distribution in the latent space, a structured regularization is proposed to promote class-level uniformity in the latent space. With the regularization term, SPC can preserve the Gaussian distribution structure of the latent code while covering the hidden space more uniformly across classes. We conduct evaluations on 12 natural language understanding tasks. The results show that our SPC can effectively improve the performance of pre-trained language models for various classification and regression tasks. Experiments demonstrate that SPC can enhance the generalization capability, robustness to label noise, and clustering quality of output representations.

Title: T-Eval: Evaluating the Tool Utilization Capability Step by Step. (arXiv:2312.14033v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14033
Code URL: https://github.com/open-compass/t-eval
Copy Paste: [[2312.14033]] T-Eval: Evaluating the Tool Utilization Capability Step by Step(http://arxiv.org/abs/2312.14033)
Summary:
Large language models (LLMs) have achieved remarkable performance on various NLP tasks and are augmented by tools for broader applications. Yet, how to evaluate and analyze the tool-utilization capability of LLMs is still under-explored. In contrast to previous works that evaluate models holistically, we comprehensively decompose the tool utilization into multiple sub-processes, including instruction following, planning, reasoning, retrieval, understanding, and review. Based on that, we further introduce T-Eval to evaluate the tool utilization capability step by step. T-Eval disentangles the tool utilization evaluation into several sub-domains along model capabilities, facilitating the inner understanding of both holistic and isolated competency of LLMs. We conduct extensive experiments on T-Eval and in-depth analysis of various LLMs. T-Eval not only exhibits consistency with the outcome-oriented evaluation but also provides a more fine-grained analysis of the capabilities of LLMs, providing a new perspective in LLM evaluation on tool-utilization ability. The benchmark will be available at https://github.com/open-compass/T-Eval.

gpt

Title: Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation. (arXiv:2312.13608v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13608
Code URL: https://github.com/amazingljy1206/argtersely
Copy Paste: [[2312.13608]] Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation(http://arxiv.org/abs/2312.13608)
Summary:
Counter-argument generation -- a captivating area in computational linguistics -- seeks to craft statements that offer opposing views. While most research has ventured into paragraph-level generation, sentence-level counter-argument generation beckons with its unique constraints and brevity-focused challenges. Furthermore, the diverse nature of counter-arguments poses challenges for evaluating model performance solely based on n-gram-based metrics. In this paper, we present the ArgTersely benchmark for sentence-level counter-argument generation, drawing from a manually annotated dataset from the ChangeMyView debate forum. We also propose Arg-LlaMA for generating high-quality counter-arguments. For better evaluation, we trained a BERT-based evaluator Arg-Judge with human preference data. We conducted comparative experiments involving various baselines such as LlaMA, Alpaca, GPT-3, and others. The results show the competitiveness of our proposed framework and evaluator in counter-argument generation tasks. Code and data are available at https://github.com/amazingljy1206/ArgTersely.

Title: ChatGPT as a commenter to the news: can LLMs generate human-like opinions? (arXiv:2312.13961v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13961
Code URL: https://github.com/raydentseng/generated_opinions
Copy Paste: [[2312.13961]] ChatGPT as a commenter to the news: can LLMs generate human-like opinions?(http://arxiv.org/abs/2312.13961)
Summary:
ChatGPT, GPT-3.5, and other large language models (LLMs) have drawn significant attention since their release, and the abilities of these models have been investigated for a wide variety of tasks. In this research we investigate to what extent GPT-3.5 can generate human-like comments on Dutch news articles. We define human likeness as 'not distinguishable from human comments', approximated by the difficulty of automatic classification between human and GPT comments. We analyze human likeness across multiple prompting techniques. In particular, we utilize zero-shot, few-shot and context prompts, for two generated personas. We found that our fine-tuned BERT models can easily distinguish human-written comments from GPT-3.5 generated comments, with none of the used prompting methods performing noticeably better. We further found that human comments consistently showed higher lexical diversity than GPT-generated comments. This indicates that although generative LLMs can generate fluent text, their capability to create human-like opinionated comments is still limited.
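Lexical diversity can be measured in several ways, and the summary does not say which measure the authors used. As one simple possibility, the sketch below computes a type-token ratio (distinct words divided by total words) with naive whitespace tokenization. The function name and example sentences are hypothetical.

```python
def type_token_ratio(text):
    """Distinct lowercase tokens divided by total tokens (0.0 for empty text)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Two made-up comments: the second repeats words and scores lower.
varied = type_token_ratio("the cat saw the dog")
repetitive = type_token_ratio("good good good news news")
```

Type-token ratio is sensitive to text length, so comparisons are only meaningful between comments of similar length.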

llm

Title: In-Context Reinforcement Learning for Variable Action Spaces. (arXiv:2312.13327v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13327
Code URL: null
Copy Paste: [[2312.13327]] In-Context Reinforcement Learning for Variable Action Spaces(http://arxiv.org/abs/2312.13327)
Summary:
Recent work has shown that supervised pre-training on learning histories of RL algorithms results in a model that captures the learning process and is able to improve in-context on novel tasks through interactions with an environment. Despite the progress in this area, there is still a gap in the existing literature, particularly in the in-context generalization to new action spaces. While existing methods show high performance on new tasks created by different reward distributions, their architectural design and training process are not suited for the introduction of new actions during evaluation. We aim to bridge this gap by developing an architecture and training methodology specifically for the task of generalizing to new action spaces. Inspired by Headless LLM, we remove the dependence on the number of actions by directly predicting the action embeddings. Furthermore, we use random embeddings to force the semantic inference of actions from context and to prepare for the new unseen embeddings during test time. Using multi-armed bandit environments with a variable number of arms, we show that our model achieves the performance of the data generation algorithm without requiring retraining for each new environment.
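The idea of predicting an action embedding instead of a fixed-size distribution over actions can be sketched as below. This assumes the predicted vector is decoded by picking the nearest action embedding, which the summary does not specify. The model output is simulated rather than learned, and all names are hypothetical.

```python
import random


def random_action_embeddings(num_actions, dim, rng):
    """Draw a fresh random embedding for every available action (arm)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_actions)]


def select_action(predicted, action_embeddings):
    """Decode a predicted embedding to the index of the closest action embedding."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(
        range(len(action_embeddings)),
        key=lambda i: squared_distance(predicted, action_embeddings[i]),
    )


rng = random.Random(0)
# The number of arms can change between environments without changing the decoder.
embeddings = random_action_embeddings(num_actions=5, dim=8, rng=rng)
# Stand-in for a model output: a slightly perturbed copy of arm 3's embedding.
predicted = [x + 0.01 for x in embeddings[3]]
chosen = select_action(predicted, embeddings)
```

Because the output dimension depends on the embedding size rather than the number of actions, the same decoder works for any number of arms.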

long context

lora

Title: Domain Adaptive Graph Classification. (arXiv:2312.13536v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13536
Code URL: null
Copy Paste: [[2312.13536]] Domain Adaptive Graph Classification(http://arxiv.org/abs/2312.13536)
Summary:
Despite the remarkable accomplishments of graph neural networks (GNNs), they typically rely on task-specific labels, posing potential challenges in terms of their acquisition. Existing work has sought to address this issue through the lens of unsupervised domain adaptation, wherein labeled source graphs are utilized to enhance the learning process for target data. However, the simultaneous exploration of graph topology and reduction of domain disparities remains a substantial hurdle. In this paper, we introduce the Dual Adversarial Graph Representation Learning (DAGRL), which explores the graph topology from dual branches and mitigates domain discrepancies via dual adversarial learning. Our method encompasses a dual-pronged structure, consisting of a graph convolutional network branch and a graph kernel branch, which enables us to capture graph semantics from both implicit and explicit perspectives. Moreover, our approach incorporates adaptive perturbations into the dual branches, which align the source and target distribution to address domain discrepancies. Extensive experiments on a wide range of graph classification datasets demonstrate the effectiveness of our proposed method.

Title: Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian Score Climbing. (arXiv:2312.14000v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14000
Code URL: https://github.com/hanyas/psoc
Copy Paste: [[2312.14000]] Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian Score Climbing(http://arxiv.org/abs/2312.14000)
Summary:
Stochastic optimal control of dynamical systems is a crucial challenge in sequential decision-making. Recently, control-as-inference approaches have had considerable success, providing a viable risk-sensitive framework to address the exploration-exploitation dilemma. Nonetheless, a majority of these techniques only invoke the inference-control duality to derive a modified risk objective that is then addressed within a reinforcement learning framework. This paper introduces a novel perspective by framing risk-sensitive stochastic control as Markovian score climbing under samples drawn from a conditional particle filter. Our approach, while purely inference-centric, provides asymptotically unbiased estimates for gradient-based policy optimization with optimal importance weighting and no explicit value function learning. To validate our methodology, we apply it to the task of learning neural non-Gaussian feedback policies, showcasing its efficacy on numerical benchmarks of stochastic dynamical systems.

Title: Diffusion Reward: Learning Rewards via Conditional Video Diffusion. (arXiv:2312.14134v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14134
Code URL: null
Copy Paste: [[2312.14134]] Diffusion Reward: Learning Rewards via Conditional Video Diffusion(http://arxiv.org/abs/2312.14134)
Summary:
Learning rewards from expert videos offers an affordable and effective solution to specify the intended behaviors for reinforcement learning tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from expert videos via conditional video diffusion models for solving complex visual RL problems. Our key insight is that lower generative diversity is observed when conditioned on expert trajectories. Diffusion Reward is accordingly formalized by the negative of conditional entropy that encourages productive exploration of expert-like behaviors. We show the efficacy of our method over 10 robotic manipulation tasks from MetaWorld and Adroit with visual input and sparse reward. Moreover, Diffusion Reward could even solve unseen tasks successfully and effectively, largely surpassing baseline methods. Project page and code: https://diffusion-reward.github.io/.

hallucination

prompt

code

Title: Multimodal Federated Learning with Missing Modality via Prototype Mask and Contrast. (arXiv:2312.13508v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13508
Code URL: null
Copy Paste: [[2312.13508]] Multimodal Federated Learning with Missing Modality via Prototype Mask and Contrast(http://arxiv.org/abs/2312.13508)
Summary:
In real-world scenarios, multimodal federated learning often faces the practical challenge of intricate modality missing, which poses constraints on building federated frameworks and significantly degrades model inference accuracy. Existing solutions for addressing missing modalities generally involve developing modality-specific encoders on clients and training modality fusion modules on servers. However, these methods are primarily constrained to specific scenarios with either unimodal clients or complete multimodal clients, struggling to generalize effectively in the intricate modality missing scenarios. In this paper, we introduce a prototype library into the FedAvg-based Federated Learning framework, thereby empowering the framework with the capability to alleviate the global model performance degradation resulting from modality missing during both training and testing. The proposed method utilizes prototypes as masks representing missing modalities to formulate a task-calibrated training loss and a model-agnostic uni-modality inference strategy. In addition, a proximal term based on prototypes is constructed to enhance local training. Experimental results demonstrate the state-of-the-art performance of our approach. Compared to the baselines, our method improved inference accuracy by 3.7% with 50% modality missing during training and by 23.8% during uni-modality inference. Code is available at https://github.com/BaoGuangYin/PmcmFL.

Title: Automated Clinical Coding for Outpatient Departments. (arXiv:2312.13533v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13533
Code URL: null
Copy Paste: [[2312.13533]] Automated Clinical Coding for Outpatient Departments(http://arxiv.org/abs/2312.13533)
Summary:
Computerised clinical coding approaches aim to automate the process of assigning a set of codes to medical records. While there is active research pushing the state of the art on clinical coding for hospitalized patients, the outpatient setting -- where doctors tend to non-hospitalised patients -- is overlooked. Although both settings can be formalised as a multi-label classification task, they present unique and distinct challenges, which raises the question of whether the success of inpatient clinical coding approaches translates to the outpatient setting. This paper is the first to investigate how well state-of-the-art deep learning-based clinical coding approaches work in the outpatient setting at hospital scale. To this end, we collect a large outpatient dataset comprising over 7 million notes documenting over half a million patients. We adapt four state-of-the-art clinical coding approaches to this setting and evaluate their potential to assist coders. We find evidence that clinical coding in outpatient settings can benefit from more innovations in popular inpatient coding benchmarks. A deeper analysis of the factors contributing to the success -- amount and form of data and choice of document representation -- reveals the presence of easy-to-solve examples, the coding of which can be completely automated with a low error rate.

Title: EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models. (arXiv:2312.14069v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14069
Code URL: null
Copy Paste: [[2312.14069]] EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models(http://arxiv.org/abs/2312.14069)
Summary:
We introduce EmphAssess, a prosodic benchmark designed to evaluate the capability of speech-to-speech models to encode and reproduce prosodic emphasis. We apply this to two tasks: speech resynthesis and speech-to-speech translation. In both cases, the benchmark evaluates the ability of the model to encode emphasis in the speech input and accurately reproduce it in the output, potentially across a change of speaker and language. As part of the evaluation pipeline, we introduce EmphaClass, a new model that classifies emphasis at the frame or word level.

Title: Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks. (arXiv:2312.13311v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13311
Code URL: https://github.com/belis0811/bwbpf
Copy Paste: [[2312.13311]] Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks(http://arxiv.org/abs/2312.13311)
Summary:
Backpropagation (BP) has been a successful optimization technique for deep learning models. However, its limitations, such as backward- and update-locking, and its biological implausibility, hinder the concurrent updating of layers and do not mimic the local learning processes observed in the human brain. To address these issues, recent research has suggested using local error signals to asynchronously train network blocks. However, this approach often involves extensive trial-and-error iterations to determine the best configuration for local training. This includes decisions on how to decouple network blocks and which auxiliary networks to use for each block. In our work, we introduce a novel BP-free approach: a block-wise BP-free (BWBPF) neural network that leverages local error signals to optimize distinct sub-neural networks separately, where the global loss is only responsible for updating the output layer. The local error signals used in the BP-free model can be computed in parallel, enabling a potential speed-up in the weight update process through parallel implementation. Our experimental results consistently show that this approach can identify transferable decoupled architectures for VGG and ResNet variations, outperforming models trained with end-to-end backpropagation and other state-of-the-art block-wise learning techniques on datasets such as CIFAR-10 and Tiny-ImageNet. The code is released at https://github.com/Belis0811/BWBPF.

Title: MixEHR-SurG: a joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records. (arXiv:2312.13454v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13454
Code URL: null
Copy Paste: [[2312.13454]] MixEHR-SurG: a joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records(http://arxiv.org/abs/2312.13454)
Summary:
Objective: To improve survival analysis using EHR data, we aim to develop a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard.

Materials and Methods: Our technical contributions are threefold: (1) integrating EHR topic inference with Cox proportional hazards likelihood; (2) inferring patient-specific topic hyperparameters using the PheCode concepts such that each topic can be identified with exactly one PheCode-associated phenotype; (3) multi-modal survival topic inference. This leads to a highly interpretable survival and guided topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-G using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data consisting of 8,211 subjects with 75,187 outpatient claim data of 1,767 unique ICD codes; the MIMIC-III consisting of 1,458 subjects with multi-modal EHR records.

Results: Compared to the baselines, MixEHR-G achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC score of 0.89 in the simulation dataset and a mean AUROC of 0.645 on the CHD dataset. Qualitatively, MixEHR-G associates severe cardiac conditions with high mortality risk among the CHD patients after the first heart failure hospitalization and critical brain injuries with increased mortality among the MIMIC-III patients after their ICU discharge.

Conclusion: The integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG led to not only competitive mortality prediction but also meaningful phenotype topics for systematic survival analysis. The software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-SurG.

Title: CR-SAM: Curvature Regularized Sharpness-Aware Minimization. (arXiv:2312.13555v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13555
Code URL: null
Copy Paste: [[2312.13555]] CR-SAM: Curvature Regularized Sharpness-Aware Minimization(http://arxiv.org/abs/2312.13555)
Summary:
The capacity to generalize to future unseen data stands as one of the most crucial attributes of deep neural networks. Sharpness-Aware Minimization (SAM) aims to enhance the generalizability by minimizing worst-case loss using one-step gradient ascent as an approximation. However, as training progresses, the non-linearity of the loss landscape increases, rendering one-step gradient ascent less effective. On the other hand, multi-step gradient ascent will incur higher training cost. In this paper, we introduce a normalized Hessian trace to accurately measure the curvature of the loss landscape on both training and test sets. In particular, to counter excessive non-linearity of the loss landscape, we propose Curvature Regularized SAM (CR-SAM), integrating the normalized Hessian trace as a SAM regularizer. Additionally, we present an efficient way to compute the trace via finite differences with parallelism. Our theoretical analysis based on PAC-Bayes bounds establishes the regularizer's efficacy in reducing generalization error. Empirical evaluation on CIFAR and ImageNet datasets shows that CR-SAM consistently enhances classification performance for ResNet and Vision Transformer (ViT) models across various datasets. Our code is available at https://github.com/TrustAIoT/CR-SAM.
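
As a rough illustration of estimating a Hessian trace from finite differences, the sketch below combines Hutchinson-style random probes with central differences of the gradient. It is not the paper's normalized estimator; the names `hessian_trace_fd`, `grad_fn`, and `toy_grad`, the probe count, and the step size `eps` are all assumptions made for this example.

```python
import random

def hessian_trace_fd(grad_fn, w, num_probes=8, eps=1e-3, seed=0):
    """Estimate tr(H) at w using E[v^T H v] with Rademacher probes v.

    H v is approximated by a central difference of gradients:
        H v ~ (grad(w + eps*v) - grad(w - eps*v)) / (2*eps)
    Each probe is independent, so probes could be evaluated in parallel.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        # Random +/-1 direction (Rademacher probe).
        v = [rng.choice((-1.0, 1.0)) for _ in w]
        g_plus = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
        g_minus = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
        hv = [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]
        total += sum(vi * hvi for vi, hvi in zip(v, hv))
    return total / num_probes

# Toy loss f(w) = 0.5 * sum(a_i * w_i^2), whose Hessian is diag(a).
DIAG = [1.0, 2.0, 3.0]

def toy_grad(w):
    return [a * wi for a, wi in zip(DIAG, w)]

trace_estimate = hessian_trace_fd(toy_grad, [0.1, -0.2, 0.3])
```

For this diagonal toy Hessian every probe returns the exact trace (1 + 2 + 3), so the estimate has no sampling noise; for a general Hessian the off-diagonal terms only cancel on average over probes.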

Title: Adapt & Align: Continual Learning with Generative Models Latent Space Alignment. (arXiv:2312.13699v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13699
Code URL: https://github.com/jrx-napoli/cl_classifier
Copy Paste: [[2312.13699]] Adapt & Align: Continual Learning with Generative Models Latent Space Alignment(http://arxiv.org/abs/2312.13699)
Summary:
In this work, we introduce Adapt & Align, a method for continual learning of neural networks by aligning latent representations in generative models. Neural Networks suffer from abrupt loss in performance when retrained with additional training data from different distributions. At the same time, training with additional data without access to the previous examples rarely improves the model's performance. In this work, we propose a new method that mitigates those problems by employing generative models and splitting the process of their update into two parts. In the first one, we train a local generative model using only data from a new task. In the second phase, we consolidate latent representations from the local model with a global one that encodes knowledge of all past experiences. We introduce our approach with Variational Autoencoders and Generative Adversarial Networks. Moreover, we show how we can use those generative models as a general method for continual knowledge consolidation that can be used in downstream tasks such as classification.

chat

retrieval augmented generation

Title: RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios. (arXiv:2312.13303v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13303
Code URL: null
Copy Paste: [[2312.13303]] RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios(http://arxiv.org/abs/2312.13303)
Summary:
Simulation plays a crucial role in the development of autonomous vehicles (AVs) due to the potential risks associated with real-world testing. Although significant progress has been made in the visual aspects of simulators, generating complex behavior among agents remains a formidable challenge. It is not only imperative to ensure realism in the scenarios generated but also essential to incorporate preferences and conditions to facilitate controllable generation for AV training and evaluation. Traditional methods, mainly relying on memorizing the distribution of training datasets, often fall short in generating unseen scenarios. Inspired by the success of retrieval augmented generation in large language models, we present RealGen, a novel retrieval-based in-context learning framework for traffic scenario generation. RealGen synthesizes new scenarios by combining behaviors from multiple retrieved examples in a gradient-free way, which may originate from templates or tagged scenarios. This in-context learning framework endows versatile generative capabilities, including the ability to edit scenarios, compose various behaviors, and produce critical scenarios. Evaluations show that RealGen offers considerable flexibility and controllability, marking a new direction in the field of controllable traffic scenario generation. Check our project website for more information: https://realgen.github.io.

rag

Title: Fine-tuning Graph Neural Networks by Preserving Graph Generative Patterns. (arXiv:2312.13583v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13583
Code URL: https://github.com/zjunet/G-Tuning
Copy Paste: [[2312.13583]] Fine-tuning Graph Neural Networks by Preserving Graph Generative Patterns(http://arxiv.org/abs/2312.13583)
Summary:
Recently, the paradigm of pre-training and fine-tuning graph neural networks has been intensively studied and applied in a wide range of graph mining tasks. Its success is generally attributed to the structural consistency between pre-training and downstream datasets, which, however, does not hold in many real-world scenarios. Existing works have shown that the structural divergence between pre-training and downstream graphs significantly limits the transferability when using the vanilla fine-tuning strategy. This divergence leads to model overfitting on pre-training graphs and causes difficulties in capturing the structural properties of the downstream graphs. In this paper, we identify the fundamental cause of structural divergence as the discrepancy of generative patterns between the pre-training and downstream graphs. Furthermore, we propose G-Tuning to preserve the generative patterns of downstream graphs. Given a downstream graph G, the core idea is to tune the pre-trained GNN so that it can reconstruct the generative patterns of G, the graphon W. However, the exact reconstruction of a graphon is known to be computationally expensive. To overcome this challenge, we provide a theoretical analysis that establishes the existence of a set of alternative graphons called graphon bases for any given graphon. By utilizing a linear combination of these graphon bases, we can efficiently approximate W. This theoretical finding forms the basis of our proposed model, as it enables effective learning of the graphon bases and their associated coefficients. Compared with existing algorithms, G-Tuning demonstrates an average improvement of 0.5% and 2.6% on in-domain and out-of-domain transfer learning experiments, respectively.

Title: Navigating the Structured What-If Spaces: Counterfactual Generation via Structured Diffusion. (arXiv:2312.13616v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13616
Code URL: null
Copy Paste: [[2312.13616]] Navigating the Structured What-If Spaces: Counterfactual Generation via Structured Diffusion(http://arxiv.org/abs/2312.13616)
Summary:
Generating counterfactual explanations is one of the most effective approaches for uncovering the inner workings of black-box neural network models and building user trust. While remarkable strides have been made in generative modeling using diffusion models in domains like vision, their utility in generating counterfactual explanations in structured modalities remains unexplored. In this paper, we introduce Structured Counterfactual Diffuser or SCD, the first plug-and-play framework leveraging diffusion for generating counterfactual explanations in structured data. SCD learns the underlying data distribution via a diffusion model which is then guided at test time to generate counterfactuals for any arbitrary black-box model, input, and desired prediction. Our experiments show that our counterfactuals not only exhibit high plausibility compared to the existing state-of-the-art but also show significantly better proximity and diversity.

Title: ProvFL: Client-Driven Interpretability of Global Model Predictions in Federated Learning. (arXiv:2312.13632v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13632
Code URL: null
Copy Paste: [[2312.13632]] ProvFL: Client-Driven Interpretability of Global Model Predictions in Federated Learning(http://arxiv.org/abs/2312.13632)
Summary:
Federated Learning (FL) trains a collaborative machine learning model by aggregating multiple privately trained clients' models over several training rounds. Such a long, continuous action of model aggregations poses significant challenges in reasoning about the origin and composition of such a global model. Regardless of the quality of the global model or if it has a fault, understanding the model's origin is equally important for debugging, interpretability, and explainability in federated learning. FL application developers often question: (1) what clients contributed towards a global model and (2) if a global model predicts a label, which clients are responsible for it?

We introduce neuron provenance, a fine-grained lineage capturing mechanism that tracks the flow of information between the individual participating clients in FL and the final global model. We operationalize this concept in ProvFL that functions on two key principles. First, recognizing that monitoring every neuron of every client's model statically is ineffective and noisy due to the uninterpretable nature of individual neurons, ProvFL dynamically isolates influential and sensitive neurons in the global model, significantly reducing the search space. Second, as multiple clients' models are fused in each round to form a global model, tracking each client's contribution becomes challenging. ProvFL leverages the invertible nature of fusion algorithms to precisely isolate each client's contribution derived from selected neurons. When asked to localize the clients responsible for the given behavior (i.e., prediction) of the global model, ProvFL successfully localizes them with an average provenance accuracy of 97%. Additionally, ProvFL outperforms the state-of-the-art FL fault localization approach by an average margin of 50%.

Title: Critic-Guided Decision Transformer for Offline Reinforcement Learning. (arXiv:2312.13716v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13716
Code URL: null
Copy Paste: [[2312.13716]] Critic-Guided Decision Transformer for Offline Reinforcement Learning(http://arxiv.org/abs/2312.13716)
Summary:
Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Return-Conditioned Supervised Learning (RCSL), a paradigm that learns the action distribution based on target returns for each state in a supervised manner. However, prevailing RCSL methods largely focus on deterministic trajectory modeling, disregarding stochastic state transitions and the diversity of future trajectory distributions. A fundamental challenge arises from the inconsistency between the sampled returns within individual trajectories and the expected returns across multiple trajectories. Fortunately, value-based methods offer a solution by leveraging a value function to approximate the expected returns, thereby addressing the inconsistency effectively. Building upon these insights, we propose a novel approach, termed the Critic-Guided Decision Transformer (CGDT), which combines the predictability of long-term returns from value-based methods with the trajectory modeling capability of the Decision Transformer. By incorporating a learned value function, known as the critic, CGDT ensures a direct alignment between the specified target returns and the expected returns of actions. This integration bridges the gap between the deterministic nature of RCSL and the probabilistic characteristics of value-based methods. Empirical evaluations on stochastic environments and D4RL benchmark datasets demonstrate the superiority of CGDT over traditional RCSL methods. These results highlight the potential of CGDT to advance the state of the art in offline RL and extend the applicability of RCSL to a wide range of RL tasks.

Title: Solving Long-run Average Reward Robust MDPs via Stochastic Games. (arXiv:2312.13912v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.13912
Code URL: null
Copy Paste: [[2312.13912]] Solving Long-run Average Reward Robust MDPs via Stochastic Games(http://arxiv.org/abs/2312.13912)
Summary:
Markov decision processes (MDPs) provide a standard framework for sequential decision making under uncertainty. However, transition probabilities in MDPs are often estimated from data and MDPs do not take data uncertainty into account. Robust Markov decision processes (RMDPs) address this shortcoming of MDPs by assigning to each transition an uncertainty set rather than a single probability value. The goal of solving RMDPs is then to find a policy which maximizes the worst-case performance over the uncertainty sets. In this work, we consider polytopic RMDPs in which all uncertainty sets are polytopes and study the problem of solving long-run average reward polytopic RMDPs. Our focus is on computational complexity aspects and efficient algorithms. We present a novel perspective on this problem and show that it can be reduced to solving long-run average reward turn-based stochastic games with finite state and action spaces. This reduction allows us to derive several important consequences that were hitherto not known to hold for polytopic RMDPs. First, we derive new computational complexity bounds for solving long-run average reward polytopic RMDPs, showing for the first time that the threshold decision problem for them is in NP ∩ coNP and that they admit a randomized algorithm with sub-exponential expected runtime. Second, we present Robust Polytopic Policy Iteration (RPPI), a novel policy iteration algorithm for solving long-run average reward polytopic RMDPs. Our experimental evaluation shows that RPPI is much more efficient in solving long-run average reward polytopic RMDPs compared to state-of-the-art methods based on value iteration.
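
To make the max-min structure behind polytopic uncertainty sets concrete, here is a minimal one-step sketch: the agent picks an action, then an adversary picks a transition distribution from that action's polytope. It is not RPPI and does not treat the long-run average objective; the function names, the `"safe"`/`"risky"` actions, and all numbers are hypothetical.

```python
def worst_case_expectation(vertices, values):
    """Minimum of sum_s p[s] * values[s] over the polytope's vertices.

    A linear function over a polytope attains its minimum at a vertex,
    so checking the vertices is enough.
    """
    return min(
        sum(p_s * v_s for p_s, v_s in zip(p, values))
        for p in vertices
    )

def robust_greedy_action(uncertainty, values):
    """Pick the action whose worst-case expected value is largest.

    `uncertainty` maps each action to the vertex list of its uncertainty set.
    """
    return max(
        uncertainty,
        key=lambda a: worst_case_expectation(uncertainty[a], values),
    )

values = [1.0, 0.0]  # hypothetical successor-state values
uncertainty = {
    "safe": [[0.5, 0.5], [0.6, 0.4]],
    "risky": [[0.9, 0.1], [0.2, 0.8]],
}
best = robust_greedy_action(uncertainty, values)
```

Because the adversary's choice can be restricted to finitely many vertices, each state of the robust problem behaves like an agent move followed by an adversary move over a finite set, which is the shape of a turn-based game.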

Title: Structure-Aware Path Inference for Neural Finite State Transducers. (arXiv:2312.13614v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13614
Code URL: null
Copy Paste: [[2312.13614]] Structure-Aware Path Inference for Neural Finite State Transducers(http://arxiv.org/abs/2312.13614)
Summary:
Neural finite-state transducers (NFSTs) form an expressive family of neurosymbolic sequence transduction models. An NFST models each string pair as having been generated by a latent path in a finite-state transducer. As they are deep generative models, both training and inference of NFSTs require inference networks that approximate posterior distributions over such latent variables. In this paper, we focus on the resulting challenge of imputing the latent alignment path that explains a given pair of input and output strings (e.g., during training). We train three autoregressive approximate models for amortized inference of the path, which can then be used as proposal distributions for importance sampling. All three models perform lookahead. Our most sophisticated (and novel) model leverages the FST structure to consider the graph of future paths; unfortunately, we find that it loses out to the simpler approaches -- except on an artificial task that we concocted to confuse the simpler approaches.
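
The role of a proposal distribution in importance sampling can be sketched on a toy discrete latent space. This is a generic illustration, not the paper's autoregressive inference networks; the path names, scores, and the function `importance_estimate` are hypothetical, and the proposal is assumed to give nonzero probability to every path with a nonzero joint score.

```python
import random

def importance_estimate(joint, proposal, num_samples=1000, seed=0):
    """Estimate the marginal sum_z joint[z] by sampling z ~ proposal.

    Each sample contributes the weight joint[z] / proposal[z]; the average
    of the weights is an unbiased estimate of the marginal.
    """
    rng = random.Random(seed)
    paths = list(proposal)
    probs = [proposal[z] for z in paths]
    samples = rng.choices(paths, weights=probs, k=num_samples)
    return sum(joint[z] / proposal[z] for z in samples) / num_samples

# Hypothetical unnormalized scores for three alignment paths.
joint = {"path_a": 0.02, "path_b": 0.06, "path_c": 0.12}
marginal = sum(joint.values())

# A proposal equal to the exact posterior gives zero-variance weights.
posterior = {z: p / marginal for z, p in joint.items()}
exact_like = importance_estimate(joint, posterior)

# A uniform proposal is still unbiased but noisier.
uniform = {z: 1.0 / len(joint) for z in joint}
noisy = importance_estimate(joint, uniform)
```

The closer the proposal is to the true posterior over paths, the lower the variance of the weights, which is why a better approximate inference model makes a better proposal.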

Title: Fed-QSSL: A Framework for Personalized Federated Learning under Bitwidth and Data Heterogeneity. (arXiv:2312.13380v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13380
Code URL: https://github.com/yiyuec/fed-qssl
Copy Paste: [[2312.13380]] Fed-QSSL: A Framework for Personalized Federated Learning under Bitwidth and Data Heterogeneity(http://arxiv.org/abs/2312.13380)
Summary:
Motivated by high resource costs of centralized machine learning schemes as well as data privacy concerns, federated learning (FL) emerged as an efficient alternative that relies on aggregating locally trained models rather than collecting clients' potentially private data. In practice, available resources and data distributions vary from one client to another, creating an inherent system heterogeneity that leads to deterioration of the performance of conventional FL algorithms. In this work, we present a federated quantization-based self-supervised learning scheme (Fed-QSSL) designed to address heterogeneity in FL systems. On the client side, to tackle data heterogeneity we leverage distributed self-supervised learning while utilizing low-bit quantization to satisfy constraints imposed by local infrastructure and limited communication resources. On the server side, Fed-QSSL deploys de-quantization, weighted aggregation and re-quantization, ultimately creating models personalized to both the data distribution and the specific infrastructure of each client's device. We validated the proposed algorithm on real-world datasets, demonstrating its efficacy, and theoretically analyzed the impact of low-bit training on the convergence and robustness of the learned models.
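
The server-side sequence of de-quantization, weighted aggregation, and re-quantization can be sketched as follows. This is a simplified illustration under stated assumptions, not the Fed-QSSL implementation: it assumes symmetric uniform quantization, weighting by sample count, and flat weight lists, and every function name here is hypothetical.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization to signed integers of `bits` bits."""
    q_max = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    codes = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to floating-point weights."""
    return [c * scale for c in codes]

def server_round(client_updates, client_bits):
    """De-quantize, average by sample count, then re-quantize per client.

    `client_updates` is a list of (codes, scale, num_samples) tuples and
    `client_bits` lists the bitwidth each client supports.
    """
    total = sum(n for _, _, n in client_updates)
    dim = len(client_updates[0][0])
    average = [0.0] * dim
    for codes, scale, n in client_updates:
        for i, w in enumerate(dequantize(codes, scale)):
            average[i] += w * n / total
    return average, [quantize(average, b) for b in client_bits]

# Two hypothetical clients: one trains at 8 bits, the other at 4 bits.
updates = [
    quantize([1.0, 0.0], 8) + (100,),
    quantize([0.0, 1.0], 4) + (100,),
]
average, personalized = server_round(updates, client_bits=[8, 4])
```

Each client receives the same aggregate, re-quantized to the bitwidth its device supports, so the low-bit client gets a coarser copy of the shared model.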

Title: InvertibleNetworks.jl: A Julia package for scalable normalizing flows. (arXiv:2312.13480v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13480
Code URL: null
Copy Paste: [[2312.13480]] InvertibleNetworks.jl: A Julia package for scalable normalizing flows(http://arxiv.org/abs/2312.13480)
Summary:
InvertibleNetworks.jl is a Julia package designed for the scalable implementation of normalizing flows, a method for density estimation and sampling in high-dimensional distributions. This package excels in memory efficiency by leveraging the inherent invertibility of normalizing flows, which significantly reduces memory requirements during backpropagation compared to existing normalizing flow packages that rely on automatic differentiation frameworks. InvertibleNetworks.jl has been adapted for diverse applications, including seismic imaging, medical imaging, and CO2 monitoring, demonstrating its effectiveness in learning high-dimensional distributions.
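
The invertibility that such packages exploit can be illustrated with an additive coupling layer, a standard normalizing-flow building block. The sketch below is in Python rather than Julia and does not use the InvertibleNetworks.jl API; the `shift` function is a hypothetical stand-in for a learned network, and inputs are assumed to have even length.

```python
import math

def shift(x1):
    """Hypothetical conditioner network; any function of x1 works here."""
    return [math.tanh(2.0 * v) + 0.5 for v in x1]

def coupling_forward(x):
    """y1 = x1, y2 = x2 + shift(x1); the Jacobian log-determinant is 0."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    y2 = [b + t for b, t in zip(x2, shift(x1))]
    return x1 + y2

def coupling_inverse(y):
    """Recover x from y (up to rounding): x1 = y1, x2 = y2 - shift(y1)."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    x2 = [b - t for b, t in zip(y2, shift(y1))]
    return y1 + x2

x = [0.3, -1.2, 0.7, 2.0]
y = coupling_forward(x)
x_recovered = coupling_inverse(y)
```

Because the input of each layer can be recomputed from its output, a backward pass can reconstruct intermediate activations on the fly instead of storing them, which is the source of the memory savings described above.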

multi-run

chain-of-thought

tree-of-thought

agent

Title: Towards Fair Graph Federated Learning via Incentive Mechanisms. (arXiv:2312.13306v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13306
Code URL: https://github.com/zjunet/fairgraphfl
Copy Paste: [[2312.13306]] Towards Fair Graph Federated Learning via Incentive Mechanisms(http://arxiv.org/abs/2312.13306)
Summary:
Graph federated learning (FL) has emerged as a pivotal paradigm enabling multiple agents to collaboratively train a graph model while preserving local data privacy. Yet, current efforts overlook a key issue: agents are self-interested and would hesitate to share data without fair and satisfactory incentives. This paper is the first endeavor to address this issue by studying the incentive mechanism for graph federated learning. We identify a unique phenomenon in graph federated learning: the presence of agents posing potential harm to the federation and agents contributing with delays. This stands in contrast to previous FL incentive mechanisms that assume all agents contribute positively and in a timely manner. In view of this, this paper presents a novel incentive mechanism tailored for fair graph federated learning, integrating incentives derived from both model gradient and payoff. To achieve this, we first introduce an agent valuation function aimed at quantifying agent contributions through the introduction of two criteria: gradient alignment and graph diversity. Moreover, due to the high heterogeneity in graph federated learning, striking a balance between accuracy and fairness becomes particularly crucial. We introduce motif prototypes to enhance accuracy, communicated between the server and agents, enhancing global model aggregation and aiding agents in local model optimization. Extensive experiments show that our model achieves the best trade-off between accuracy and the fairness of model gradient, as well as superior payoff fairness.

Title: Adversarial Markov Games: On Adaptive Decision-Based Attacks and Defenses. (arXiv:2312.13435v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.13435
Code URL: null
Copy Paste: [[2312.13435]] Adversarial Markov Games: On Adaptive Decision-Based Attacks and Defenses(http://arxiv.org/abs/2312.13435)
Summary:
Despite considerable efforts on making them robust, real-world ML-based systems remain vulnerable to decision-based attacks, as definitive proofs of their operational robustness have so far proven intractable. The canonical approach in robustness evaluation calls for adaptive attacks, that is, attacks with complete knowledge of the defense that are tailored to bypass it. In this study, we introduce a more expansive notion of being adaptive and show how attacks but also defenses can benefit from it and by learning from each other through interaction. We propose and evaluate a framework for adaptively optimizing black-box attacks and defenses against each other through the competitive game they form. To reliably measure robustness, it is important to evaluate against realistic and worst-case attacks. We thus augment both attacks and the evasive arsenal at their disposal through adaptive control, and observe that the same can be done for defenses, before we evaluate them first apart and then jointly under a multi-agent perspective. We demonstrate that active defenses, which control how the system responds, are a necessary complement to model hardening when facing decision-based attacks; then how these defenses can be circumvented by adaptive attacks, only to finally elicit active and adaptive defenses. We validate our observations through a wide theoretical and empirical investigation to confirm that AI-enabled adversaries pose a considerable threat to black-box ML-based systems, rekindling the proverbial arms race where defenses have to be AI-enabled too. Succinctly, we address the challenges posed by adaptive adversaries and develop adaptive defenses, thereby laying out effective strategies in ensuring the robustness of ML-based systems deployed in the real world.

Title: Understanding and Estimating Domain Complexity Across Domains. (arXiv:2312.13487v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.13487
Code URL: null
Copy Paste: [[2312.13487]] Understanding and Estimating Domain Complexity Across Domains(http://arxiv.org/abs/2312.13487)
Summary:
Artificial Intelligence (AI) systems, trained in controlled environments, often struggle in real-world complexities. We propose a general framework for estimating domain complexity across diverse environments, like open-world learning and real-world applications. This framework distinguishes between intrinsic complexity (inherent to the domain) and extrinsic complexity (dependent on the AI agent). By analyzing dimensionality, sparsity, and diversity within these categories, we offer a comprehensive view of domain challenges. This approach enables quantitative predictions of AI difficulty during environment transitions, avoids bias in novel situations, and helps navigate the vast search spaces of open-world domains.

Title: Team Flow at DRC2023: Building Common Ground and Text-based Turn-taking in a Travel Agent Spoken Dialogue System. (arXiv:2312.13816v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13816
Code URL: null
Copy Paste: [[2312.13816]] Team Flow at DRC2023: Building Common Ground and Text-based Turn-taking in a Travel Agent Spoken Dialogue System(http://arxiv.org/abs/2312.13816)
Summary:
At the Dialogue Robot Competition 2023 (DRC2023), which was held to improve the capability of dialogue robots, our team developed a system that could build common ground and take more natural turns based on user utterance texts. Our system generated queries for sightseeing spot searches using the common ground and engaged in dialogue while waiting for user comprehension.

Title: Learning Human-like Representations to Enable Learning Human Values. (arXiv:2312.14106v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14106
Code URL: null
Copy Paste: [[2312.14106]] Learning Human-like Representations to Enable Learning Human Values(http://arxiv.org/abs/2312.14106)
Summary:
How can we build AI systems that are aligned with human values and objectives in order to avoid causing harm or violating societal standards for acceptable behavior? Making AI systems learn human-like representations of the world has many known benefits, including improving generalization, robustness to domain shifts, and few-shot learning performance, among others. We propose that this kind of representational alignment between machine learning (ML) models and humans is also a necessary condition for value alignment, where ML systems conform to human values and societal norms. We focus on ethics as one aspect of value alignment and train multiple ML agents (support vector regression and kernel regression) in a multi-armed bandit setting, where rewards are sampled from a distribution that reflects the morality of the chosen action. We then study the relationship between each agent's degree of representational alignment with humans and their performance when learning to take the most ethical actions.

Title: Manipulating Trajectory Prediction with Backdoors. (arXiv:2312.13863v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13863
Code URL: null
Copy Paste: [[2312.13863]] Manipulating Trajectory Prediction with Backdoors(http://arxiv.org/abs/2312.13863)
Summary:
Autonomous vehicles ought to predict the surrounding agents' trajectories to allow safe maneuvers in uncertain and complex traffic situations. As companies increasingly apply trajectory prediction in the real world, security becomes a relevant concern. In this paper, we focus on backdoors - a security threat acknowledged in other fields but so far overlooked for trajectory prediction. To this end, we describe and investigate four triggers that could affect trajectory prediction. We then show that these triggers (for example, a braking vehicle), when correlated with a desired output (for example, a curve) during training, cause the desired output of a state-of-the-art trajectory prediction model. In other words, the model has good benign performance but is vulnerable to backdoors. This is the case even if the trigger maneuver is performed by a non-causal agent behind the target vehicle. As a side-effect, our analysis reveals interesting limitations within trajectory prediction models. Finally, we evaluate a range of defenses against backdoors. While some, like simple offroad checks, do not enable detection for all triggers, clustering is a promising candidate to support manual inspection to find backdoors.