2023-12-25

language model

Title: Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning. (arXiv:2312.14184v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14184
Code URL: null
Copy Paste: [[2312.14184]] Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning(http://arxiv.org/abs/2312.14184)
Summary:
This study assesses the ability of state-of-the-art large language models (LLMs) including GPT-3.5, GPT-4, Falcon, and LLaMA 2 to identify patients with mild cognitive impairment (MCI) from discharge summaries and examines instances where the models' responses were misaligned with their reasoning. Utilizing the MIMIC-IV v2.2 database, we focused on a cohort aged 65 and older, verifying MCI diagnoses against ICD codes and expert evaluations. The data was partitioned into training, validation, and testing sets in a 7:2:1 ratio for model fine-tuning and evaluation, with an additional metastatic cancer dataset from MIMIC III used to further assess reasoning consistency. GPT-4 demonstrated superior interpretative capabilities, particularly in response to complex prompts, yet displayed notable response-reasoning inconsistencies. In contrast, open-source models like Falcon and LLaMA 2 achieved high accuracy but lacked explanatory reasoning, underscoring the necessity for further research to optimize both performance and interpretability. The study emphasizes the significance of prompt engineering and the need for further exploration into the unexpected reasoning-response misalignment observed in GPT-4. The results underscore the promise of incorporating LLMs into healthcare diagnostics, contingent upon methodological advancements to ensure accuracy and clinical coherence of AI-generated outputs, thereby improving the trustworthiness of LLMs for medical decision-making.

Title: Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method. (arXiv:2312.14188v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14188
Code URL: null
Copy Paste: [[2312.14188]] Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method(http://arxiv.org/abs/2312.14188)
Summary:
Theorem proving is a fundamental task in mathematics. With the advent of large language models (LLMs) and interactive theorem provers (ITPs) like Lean, there has been growing interest in integrating LLMs and ITPs to automate theorem proving. In this approach, the LLM generates proof steps (tactics), and the ITP checks the applicability of the tactics at the current goal. The two systems work together to complete the proof. In this paper, we introduce DS-Prover, a novel dynamic sampling method for theorem proving. This method dynamically determines the number of tactics to apply to expand the current goal, taking into account the remaining time compared to the total allocated time for proving a theorem. This makes the proof search process more efficient by adjusting the balance between exploration and exploitation as time passes. We also augment the training dataset by decomposing simplification and rewrite tactics with multiple premises into tactics with single premises. This gives the model more examples to learn from and helps it to predict the tactics with premises more accurately. We perform our experiments using the Mathlib dataset of the Lean theorem prover and report the performance on two standard datasets, MiniF2F and ProofNet. Our methods achieve significant performance gains on both datasets. We achieved a state-of-the-art performance (Pass@1) of 14.2% on the ProofNet dataset and a performance of 29.8% on MiniF2F, slightly surpassing the best-reported Pass@1 of 29.6% using Lean.

Title: Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models. (arXiv:2312.14197v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14197
Code URL: null
Copy Paste: [[2312.14197]] Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models(http://arxiv.org/abs/2312.14197)
Summary:
Recent remarkable advancements in large language models (LLMs) have led to their widespread adoption in various applications. A key feature of these applications is the combination of LLMs with external content, where user instructions and third-party content are combined to create prompts for LLM processing. These applications, however, are vulnerable to indirect prompt injection attacks, where malicious instructions embedded within external content compromise LLM's output, causing their responses to deviate from user expectations. Despite the discovery of this security issue, no comprehensive analysis of indirect prompt injection attacks on different LLMs is available due to the lack of a benchmark. Furthermore, no effective defense has been proposed.

In this work, we introduce the first benchmark, BIPIA, to measure the robustness of various LLMs and defenses against indirect prompt injection attacks. Our experiments reveal that LLMs with greater capabilities exhibit more vulnerable to indirect prompt injection attacks for text tasks, resulting in a higher ASR. We hypothesize that indirect prompt injection attacks are mainly due to the LLMs' inability to distinguish between instructions and external content. Based on this conjecture, we propose four black-box methods based on prompt learning and a white-box defense methods based on fine-tuning with adversarial training to enable LLMs to distinguish between instructions and external content and ignore instructions in the external content. Our experimental results show that our black-box defense methods can effectively reduce ASR but cannot completely thwart indirect prompt injection attacks, while our white-box defense method can reduce ASR to nearly zero with little adverse impact on the LLM's performance on general tasks. We hope that our benchmark and defenses can inspire future work in this important area.

Title: Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models. (arXiv:2312.14202v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14202
Code URL: null
Copy Paste: [[2312.14202]] Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models(http://arxiv.org/abs/2312.14202)
Summary:
This study explores the idea of AI Personality or AInality suggesting that Large Language Models (LLMs) exhibit patterns similar to human personalities. Assuming that LLMs share these patterns with humans, we investigate using human-centered psychometric tests such as the Myers-Briggs Type Indicator (MBTI), Big Five Inventory (BFI), and Short Dark Triad (SD3) to identify and confirm LLM personality types. By introducing role-play prompts, we demonstrate the adaptability of LLMs, showing their ability to switch dynamically between different personality types. Using projective tests, such as the Washington University Sentence Completion Test (WUSCT), we uncover hidden aspects of LLM personalities that are not easily accessible through direct questioning. Projective tests allowed for a deep exploration of LLMs cognitive processes and thought patterns and gave us a multidimensional view of AInality. Our machine learning analysis revealed that LLMs exhibit distinct AInality traits and manifest diverse personality types, demonstrating dynamic shifts in response to external instructions. This study pioneers the application of projective tests on LLMs, shedding light on their diverse and adaptable AInality traits.

Title: Experimenting with Large Language Models and vector embeddings in NASA SciX. (arXiv:2312.14211v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14211
Code URL: null
Copy Paste: [[2312.14211]] Experimenting with Large Language Models and vector embeddings in NASA SciX(http://arxiv.org/abs/2312.14211)
Summary:
Open-source Large Language Models enable projects such as NASA SciX (i.e., NASA ADS) to think out of the box and try alternative approaches for information retrieval and data augmentation, while respecting data copyright and users' privacy. However, when large language models are directly prompted with questions without any context, they are prone to hallucination. At NASA SciX we have developed an experiment where we created semantic vectors for our large collection of abstracts and full-text content, and we designed a prompt system to ask questions using contextual chunks from our system. Based on a non-systematic human evaluation, the experiment shows a lower degree of hallucination and better responses when using Retrieval Augmented Generation. Further exploration is required to design new features and data augmentation processes at NASA SciX that leverages this technology while respecting the high level of trust and quality that the project holds.

Title: SimLM: Can Language Models Infer Parameters of Physical Systems?. (arXiv:2312.14215v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14215
Code URL: null
Copy Paste: [[2312.14215]] SimLM: Can Language Models Infer Parameters of Physical Systems?(http://arxiv.org/abs/2312.14215)
Summary:
Recent developments in large-scale machine learning models for general-purpose understanding, translation and generation of language are driving impact across a variety of sectors including medicine, robotics, and scientific discovery. The strength of such Large Language Models (LLMs) stems from the large corpora that they are trained with. While this imbues them with a breadth of capabilities, they have been found unsuitable for some specific types of problems such as advanced mathematics. In this paper, we highlight the inability of LLMs to reason about physics tasks. We demonstrate that their ability to infer parameters of physical systems can be improved, without retraining, by augmenting their context with feedback from physical simulation.

Title: Deep de Finetti: Recovering Topic Distributions from Large Language Models. (arXiv:2312.14226v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14226
Code URL: null
Copy Paste: [[2312.14226]] Deep de Finetti: Recovering Topic Distributions from Large Language Models(http://arxiv.org/abs/2312.14226)
Summary:
Large language models (LLMs) can produce long, coherent passages of text, suggesting that LLMs, although trained on next-word prediction, must represent the latent structure that characterizes a document. Prior work has found that internal representations of LLMs encode one aspect of latent structure, namely syntax; here we investigate a complementary aspect, namely the document's topic structure. We motivate the hypothesis that LLMs capture topic structure by connecting LLM optimization to implicit Bayesian inference. De Finetti's theorem shows that exchangeable probability distributions can be represented as a mixture with respect to a latent generating distribution. Although text is not exchangeable at the level of syntax, exchangeability is a reasonable starting assumption for topic structure. We thus hypothesize that predicting the next token in text will lead LLMs to recover latent topic distributions. We examine this hypothesis using Latent Dirichlet Allocation (LDA), an exchangeable probabilistic topic model, as a target, and we show that the representations formed by LLMs encode both the topics used to generate synthetic data and those used to explain natural corpus data.

Title: Don't Believe Everything You Read: Enhancing Summarization Interpretability through Automatic Identification of Hallucinations in Large Language Models. (arXiv:2312.14346v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14346
Code URL: null
Copy Paste: [[2312.14346]] Don't Believe Everything You Read: Enhancing Summarization Interpretability through Automatic Identification of Hallucinations in Large Language Models(http://arxiv.org/abs/2312.14346)
Summary:
Large Language Models (LLMs) are adept at text manipulation -- tasks such as machine translation and text summarization. However, these models can also be prone to hallucination, which can be detrimental to the faithfulness of any answers that the model provides. Recent works in combating hallucinations in LLMs deal with identifying hallucinated sentences and categorizing the different ways in which models hallucinate. This paper takes a deep dive into LLM behavior with respect to hallucinations, defines a token-level approach to identifying different kinds of hallucinations, and further utilizes this token-level tagging to improve the interpretability and faithfulness of LLMs in dialogue summarization tasks. Through this, the paper presents a new, enhanced dataset and a new training paradigm.

Title: A Unified Industrial Large Knowledge Model Framework in Smart Manufacturing. (arXiv:2312.14428v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14428
Code URL: null
Copy Paste: [[2312.14428]] A Unified Industrial Large Knowledge Model Framework in Smart Manufacturing(http://arxiv.org/abs/2312.14428)
Summary:
The recent emergence of large language models (LLMs) shows the potential for artificial general intelligence, revealing new opportunities in industry 4.0 and smart manufacturing. However, a notable gap exists in applying these LLMs in industry, primarily due to their training on general knowledge rather than domain-specific knowledge. Such specialized domain knowledge is vital for effectively addressing the complex needs of industrial applications. To bridge this gap, this paper proposes an Industrial Large Knowledge Model (ILKM) framework emphasizing their potential to revolutionize the industry in smart manufacturing. In addition, ILKMs and LLMs are compared from eight perspectives. Finally, "6S Principle" is proposed as the guideline for the development of ILKMs in smart manufacturing.

Title: Language Model is a Branch Predictor for Simultaneous Machine Translation. (arXiv:2312.14488v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14488
Code URL: null
Copy Paste: [[2312.14488]] Language Model is a Branch Predictor for Simultaneous Machine Translation(http://arxiv.org/abs/2312.14488)
Summary:
The primary objective of simultaneous machine translation (SiMT) is to minimize latency while preserving the quality of the final translation. Drawing inspiration from CPU branch prediction techniques, we propose incorporating branch prediction techniques in SiMT tasks to reduce translation latency. Specifically, we utilize a language model as a branch predictor to predict potential branch directions, namely, future source words. Subsequently, we utilize the predicted source words to decode the output in advance. When the actual source word deviates from the predicted source word, we use the real source word to decode the output again, replacing the predicted output. To further reduce computational costs, we share the parameters of the encoder and the branch predictor, and utilize a pre-trained language model for initialization. Our proposed method can be seamlessly integrated with any SiMT model. Extensive experimental results demonstrate that our approach can improve translation quality and latency at the same time. Our code is available at https://github.com/YinAoXiong/simt_branch_predictor .

Title: Large Language Model (LLM) Bias Index -- LLMBI. (arXiv:2312.14769v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14769
Code URL: null
Copy Paste: [[2312.14769]] Large Language Model (LLM) Bias Index -- LLMBI(http://arxiv.org/abs/2312.14769)
Summary:
The Large Language Model Bias Index (LLMBI) is a pioneering approach designed to quantify and address biases inherent in large language models (LLMs), such as GPT-4. We recognise the increasing prevalence and impact of LLMs across diverse sectors. This research introduces a novel metric, LLMBI, to systematically measure and mitigate biases potentially skewing model responses. We formulated LLMBI using a composite scoring system incorporating multiple dimensions of bias, including but not limited to age, gender, and racial biases.

To operationalise this metric, we engaged in a multi-step process involving collecting and annotating LLM responses, applying sophisticated Natural Language Processing (NLP) techniques for bias detection, and computing the LLMBI score through a specially crafted mathematical formula. The formula integrates weighted averages of various bias dimensions, a penalty for dataset diversity deficiencies, and a correction for sentiment biases. Our empirical analysis, conducted using responses from OpenAI's API, employs advanced sentiment analysis as a representative method for bias detection.

The research reveals LLMs, whilst demonstrating impressive capabilities in text generation, exhibit varying degrees of bias across different dimensions. LLMBI provides a quantifiable measure to compare biases across models and over time, offering a vital tool for systems engineers, researchers and regulators in enhancing the fairness and reliability of LLMs. It highlights the potential of LLMs in mimicking unbiased human-like responses. Additionally, it underscores the necessity of continuously monitoring and recalibrating such models to align with evolving societal norms and ethical standards.

Title: YAYI 2: Multilingual Open-Source Large Language Models. (arXiv:2312.14862v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14862
Code URL: null
Copy Paste: [[2312.14862]] YAYI 2: Multilingual Open-Source Large Language Models(http://arxiv.org/abs/2312.14862)
Summary:
As the latest advancements in natural language processing, large language models (LLMs) have achieved human-level language understanding and generation abilities in many real-world tasks, and even have been regarded as a potential path to the artificial general intelligence. To better facilitate research on LLMs, many open-source LLMs, such as Llama 2 and Falcon, have recently been proposed and gained comparable performances to proprietary models. However, these models are primarily designed for English scenarios and exhibit poor performances in Chinese contexts. In this technical report, we propose YAYI 2, including both base and chat models, with 30 billion parameters. YAYI 2 is pre-trained from scratch on a multilingual corpus which contains 2.65 trillion tokens filtered by our pre-training data processing pipeline. The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback. Extensive experiments on multiple benchmarks, such as MMLU and CMMLU, consistently demonstrate that the proposed YAYI 2 outperforms other similar sized open-source models.

Title: Robust Knowledge Extraction from Large Language Models using Social Choice Theory. (arXiv:2312.14877v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14877
Code URL: null
Copy Paste: [[2312.14877]] Robust Knowledge Extraction from Large Language Models using Social Choice Theory(http://arxiv.org/abs/2312.14877)
Summary:
Large-language models (LLMs) have the potential to support a wide range of applications like conversational agents, creative writing, text improvement, and general query answering. However, they are ill-suited for query answering in high-stake domains like medicine because they generate answers at random and their answers are typically not robust - even the same query can result in different answers when prompted multiple times. In order to improve the robustness of LLM queries, we propose using ranking queries repeatedly and to aggregate the queries using methods from social choice theory. We study ranking queries in diagnostic settings like medical and fault diagnosis and discuss how the Partial Borda Choice function from the literature can be applied to merge multiple query results. We discuss some additional interesting properties in our setting and evaluate the robustness of our approach empirically.

Title: NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes. (arXiv:2312.14890v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14890
Code URL: null
Copy Paste: [[2312.14890]] NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes(http://arxiv.org/abs/2312.14890)
Summary:
Complex reasoning ability is one of the most important features of current LLMs, which has also been leveraged to play an integral role in complex decision-making tasks. Therefore, the investigation into the reasoning capabilities of Large Language Models (LLMs) is critical: numerous benchmarks have been established to assess the reasoning abilities of LLMs. However, current benchmarks are inadequate in offering a rigorous evaluation of the full extent of reasoning abilities that LLMs are capable of achieving. They are also prone to the risk of overfitting, as these benchmarks, being publicly accessible and static, allow models to potentially tailor their responses to specific benchmark metrics, thereby inflating their performance. Addressing these limitations, our research introduces a new benchmark, named NPHardEval. This benchmark is designed to evaluate the reasoning abilities of LLMs across a broad spectrum of 900 algorithmic questions, extending up to the NP-Hard complexity class. These questions are meticulously chosen to represent a wide range of complexity class below the NP-hard complexity class, offering a rigorous measure of the reasoning ability of LLMs. Through this study, we shed light on the current state of reasoning in LLMs, providing an objective and rigorous perspective through the comparison of LLMs' performance across complex classes. Moreover, this benchmark is designed with a dynamic update mechanism, where the datapoints are refreshed on a monthly basis. Such regular updates play a crucial role in mitigating the risk of LLMs overfitting to the benchmark, promoting a more accurate and reliable assessment of their reasoning capabilities. The benchmark dataset and code of NPHardEval are available at https://github.com/casmlab/NPHardEval.

Title: Dynamic Topic Language Model on Heterogeneous Children's Mental Health Clinical Notes. (arXiv:2312.14180v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14180
Code URL: null
Copy Paste: [[2312.14180]] Dynamic Topic Language Model on Heterogeneous Children's Mental Health Clinical Notes(http://arxiv.org/abs/2312.14180)
Summary:
Mental health diseases affect children's lives and well-beings which have received increased attention since the COVID-19 pandemic. Analyzing psychiatric clinical notes with topic models is critical to evaluate children's mental status over time. However, few topic models are built for longitudinal settings, and they fail to keep consistent topics and capture temporal trajectories for each document. To address these challenges, we develop a longitudinal topic model with time-invariant topics and individualized temporal dependencies on the evolving document metadata. Our model preserves the semantic meaning of discovered topics over time and incorporates heterogeneity among documents. In particular, when documents can be categorized, we propose an unsupervised topics learning approach to maximize topic heterogeneity across different document groups. We also present an efficient variational optimization procedure adapted for the multistage longitudinal setting. In this case study, we apply our method to the psychiatric clinical notes from a large tertiary pediatric hospital in Southern California and achieve a 38% increase in the overall coherence of extracted topics. Our real data analysis reveals that children tend to express more negative emotions during state shutdowns and more positive when schools reopen. Furthermore, it suggests that sexual and gender minority (SGM) children display more pronounced reactions to major COVID-19 events and a greater sensitivity to vaccine-related news than non-SGM children. This study examines the progression of children's mental health during the pandemic and offers clinicians valuable insights to recognize the disparities in children's mental health related to their sexual and gender identities.

Title: Efficacy of Machine-Generated Instructions. (arXiv:2312.14423v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14423
Code URL: null
Copy Paste: [[2312.14423]] Efficacy of Machine-Generated Instructions(http://arxiv.org/abs/2312.14423)
Summary:
Large "instruction-tuned" language models (i.e., finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is often limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We conducted a quantitative study to figure out the efficacy of machine-generated annotations, where we compare the results of a fine-tuned BERT model with human v/s machine-generated annotations. Applying our methods to the vanilla GPT-3 model, we saw that machine generated annotations were 78.54% correct and the fine-tuned model achieved a 96.01% model performance compared to the performance with human-labelled annotations. This result shows that machine-generated annotations are a resource and cost effective way to fine-tune down-stream models.

Title: Reasons to Reject? Aligning Language Models with Judgments. (arXiv:2312.14591v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14591
Code URL: https://github.com/wwxu21/cut
Copy Paste: [[2312.14591]] Reasons to Reject? Aligning Language Models with Judgments(http://arxiv.org/abs/2312.14591)
Summary:
As humans, we consistently engage in interactions with our peers and receive feedback in the form of natural language. This language feedback allows us to reflect on our actions, maintain appropriate behavior, and rectify our errors. The question arises naturally: can we use language feedback to align large language models (LLMs)? In contrast to previous research that aligns LLMs with reward or preference data, we present the first systematic exploration of alignment through the lens of language feedback (i.e., judgment). We commence with an in-depth investigation of potential methods that can be adapted for aligning LLMs with judgments, revealing that these methods are unable to fully capitalize on the judgments. To facilitate more effective utilization of judgments, we propose a novel framework, Contrastive Unlikelihood Training (CUT), that allows for fine-grained inappropriate content detection and correction based on judgments. Our offline alignment results show that, with merely 1317 off-the-shelf judgment data, CUT (LLaMA2-13b) can beat the 175B DaVinci003 and surpass the best baseline by 52.34 points on AlpacaEval. The online alignment results demonstrate that CUT can align LLMs (LLaMA2-chat-13b) in an iterative fashion using model-specific judgment data, with a steady performance improvement from 81.09 to 91.36 points on AlpacaEval. Our analysis further suggests that judgments exhibit greater potential than rewards for LLM alignment and warrant future research.

Title: Numerical Reasoning for Financial Reports. (arXiv:2312.14870v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14870
Code URL: null
Copy Paste: [[2312.14870]] Numerical Reasoning for Financial Reports(http://arxiv.org/abs/2312.14870)
Summary:
Financial reports offer critical insights into a company's operations, yet their extensive length typically spanning 30 40 pages poses challenges for swift decision making in dynamic markets. To address this, we leveraged finetuned Large Language Models (LLMs) to distill key indicators and operational metrics from these reports basis questions from the user. We devised a method to locate critical data, and leverage the FinQA dataset to fine-tune both Llama-2 7B and T5 models for customized question answering. We achieved results comparable to baseline on the final numerical answer, a competitive accuracy in numerical reasoning and calculation.

Title: A Survey of Reinforcement Learning from Human Feedback. (arXiv:2312.14925v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14925
Code URL: null
Copy Paste: [[2312.14925]] A Survey of Reinforcement Learning from Human Feedback(http://arxiv.org/abs/2312.14925)
Summary:
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of Large Language Models (LLMs) has impressively demonstrated this potential in recent years, where RLHF played a decisive role in targeting the model's capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between machine agents and human input. While recent focus has been on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shedding light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to provide researchers as well as practitioners with a comprehensive understanding of this rapidly growing field of research.

gpt

Title: Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection. (arXiv:2312.14406v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14406
Code URL: null
Copy Paste: [[2312.14406]] Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection(http://arxiv.org/abs/2312.14406)
Summary:
In this work, we introduce an innovative autoregressive model leveraging Generative Pretrained Transformer (GPT) architectures, tailored for fraud detection in payment systems. Our approach innovatively confronts token explosion and reconstructs behavioral sequences, providing a nuanced understanding of transactional behavior through temporal and contextual analysis. Utilizing unsupervised pretraining, our model excels in feature representation without the need for labeled data. Additionally, we integrate a differential convolutional approach to enhance anomaly detection, bolstering the security and efficacy of one of the largest online payment merchants in China. The scalability and adaptability of our model promise broad applicability in various transactional contexts.

llm

Title: Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs. (arXiv:2312.14345v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14345
Code URL: null
Copy Paste: [[2312.14345]] Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs(http://arxiv.org/abs/2312.14345)
Summary:
The unique capabilities of Large Language Models (LLMs), such as the natural language text generation ability, position them as strong candidates for providing explanation for recommendations. However, despite the size of the LLM, most existing models struggle to produce zero-shot explanations reliably. To address this issue, we propose a framework called Logic-Scaffolding, that combines the ideas of aspect-based explanation and chain-of-thought prompting to generate explanations through intermediate reasoning steps. In this paper, we share our experience in building the framework and present an interactive demonstration for exploring our results.

Title: Zero-shot Causal Graph Extrapolation from Text via LLMs. (arXiv:2312.14670v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14670
Code URL: null
Copy Paste: [[2312.14670]] Zero-shot Causal Graph Extrapolation from Text via LLMs(http://arxiv.org/abs/2312.14670)
Summary:
We evaluate the ability of large language models (LLMs) to infer causal relations from natural language. Compared to traditional natural language processing and deep learning techniques, LLMs show competitive performance in a benchmark of pairwise relations without needing (explicit) training samples. This motivates us to extend our approach to extrapolating causal graphs through iterated pairwise queries. We perform a preliminary analysis on a benchmark of biomedical abstracts with ground-truth causal graphs validated by experts. The results are promising and support the adoption of LLMs for such a crucial step in causal inference, especially in medical domains, where the amount of scientific text to analyse might be huge, and the causal statements are often implicit.

Title: Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion. (arXiv:2312.14327v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14327
Code URL: null
Copy Paste: [[2312.14327]] Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion(http://arxiv.org/abs/2312.14327)
Summary:
Abbreviation expansion is a strategy used to speed up communication by limiting the amount of typing and using a language model to suggest expansions. Here we look at personalizing a Large Language Model's (LLM) suggestions based on prior conversations to enhance the relevance of predictions, particularly when the user data is small (~1000 samples). Specifically, we compare fine-tuning, prompt-tuning, and retrieval augmented generation of expanded text suggestions for abbreviated inputs. Our case study with a deployed 8B parameter LLM on a real user living with ALS, and experiments on movie character personalization indicates that (1) customization may be necessary in some scenarios and prompt-tuning generalizes well to those, (2) fine-tuning on in-domain data (with as few as 600 samples) still shows some gains, however (3) retrieval augmented few-shot selection also outperforms fine-tuning. (4) Parameter efficient tuning allows for efficient and scalable personalization. For prompt-tuning, we also find that initializing the learned "soft-prompts" to user relevant concept tokens leads to higher accuracy than random initialization.

long context

lora

Title: SEOpinion: Summarization and Exploration Opinion of E-Commerce Websites. (arXiv:2312.14171v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14171
Code URL: null
Copy Paste: [[2312.14171]] SEOpinion: Summarization and Exploration Opinion of E-Commerce Websites(http://arxiv.org/abs/2312.14171)
Summary:
E-Commerce (EC) websites provide a large amount of useful information that exceed human cognitive processing ability. In order to help customers in comparing alternatives when buying a product, previous studies designed opinion summarization systems based on customer reviews. They ignored templates' information provided by manufacturers, although these descriptive information have much product aspects or characteristics. Therefore, this paper proposes a methodology coined as SEOpinion (Summa-rization and Exploration of Opinions) which provides a summary for the product aspects and spots opinion(s) regarding them, using a combination of templates' information with the customer reviews in two main phases. First, the Hierarchical Aspect Extraction (HAE) phase creates a hierarchy of product aspects from the template. Subsequently, the Hierarchical Aspect-based Opinion Summarization (HAOS) phase enriches this hierarchy with customers' opinions; to be shown to other potential buyers. To test the feasibility of using Deep Learning-based BERT techniques with our approach, we have created a corpus by gathering information from the top five EC websites for laptops. The experimental results show that Recurrent Neural Network (RNN) achieves better results (77.4% and 82.6% in terms of F1-measure for the first and second phase) than the Convolutional Neural Network (CNN) and the Support Vector Machine (SVM) technique.

Title: Not All Tasks Are Equally Difficult: Multi-Task Reinforcement Learning with Dynamic Depth Routing. (arXiv:2312.14472v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14472
Code URL: null
Copy Paste: [[2312.14472]] Not All Tasks Are Equally Difficult: Multi-Task Reinforcement Learning with Dynamic Depth Routing(http://arxiv.org/abs/2312.14472)
Summary:
Multi-task reinforcement learning endeavors to accomplish a set of different tasks with a single policy. To enhance data efficiency by sharing parameters across multiple tasks, a common practice segments the network into distinct modules and trains a routing network to recombine these modules into task-specific policies. However, existing routing approaches employ a fixed number of modules for all tasks, neglecting that tasks with varying difficulties commonly require varying amounts of knowledge. This work presents a Dynamic Depth Routing (D2R) framework, which learns strategic skipping of certain intermediate modules, thereby flexibly choosing different numbers of modules for each task. Under this framework, we further introduce a ResRouting method to address the issue of disparate routing paths between behavior and target policies during off-policy training. In addition, we design an automatic route-balancing mechanism to encourage continued routing exploration for unmastered tasks without disturbing the routing of mastered ones. We conduct extensive experiments on various robotics manipulation tasks in the Meta-World benchmark, where D2R achieves state-of-the-art performance with significantly improved learning efficiency.

Title: Safe Reinforcement Learning with Instantaneous Constraints: The Role of Aggressive Exploration. (arXiv:2312.14470v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14470
Code URL: null
Copy Paste: [[2312.14470]] Safe Reinforcement Learning with Instantaneous Constraints: The Role of Aggressive Exploration(http://arxiv.org/abs/2312.14470)
Summary:
This paper studies safe Reinforcement Learning (safe RL) with linear function approximation and under hard instantaneous constraints where unsafe actions must be avoided at each step. Existing studies have considered safe RL with hard instantaneous constraints, but their approaches rely on several key assumptions: $(i)$ the RL agent knows a safe action set for {\it every} state or knows a {\it safe graph} in which all the state-action-state triples are safe, and $(ii)$ the constraint/cost functions are {\it linear}. In this paper, we consider safe RL with instantaneous hard constraints without assumption $(i)$ and generalize $(ii)$ to Reproducing Kernel Hilbert Space (RKHS). Our proposed algorithm, LSVI-AE, achieves $\tilde{\cO}(\sqrt{d^3H^4K})$ regret and $\tilde{\cO}(H \sqrt{dK})$ hard constraint violation when the cost function is linear and $\cO(H\gamma_K \sqrt{K})$ hard constraint violation when the cost function belongs to RKHS. Here $K$ is the learning horizon, $H$ is the length of each episode, and $\gamma_K$ is the information gain w.r.t the kernel used to approximate cost functions. Our results achieve the optimal dependency on the learning horizon $K$, matching the lower bound we provide in this paper and demonstrating the efficiency of LSVI-AE. Notably, the design of our approach encourages aggressive policy exploration, providing a unique perspective on safe RL with general cost functions and no prior knowledge of safe actions, which may be of independent interest.

Title: Progressing from Anomaly Detection to Automated Log Labeling and Pioneering Root Cause Analysis. (arXiv:2312.14748v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14748
Code URL: null
Copy Paste: [[2312.14748]] Progressing from Anomaly Detection to Automated Log Labeling and Pioneering Root Cause Analysis(http://arxiv.org/abs/2312.14748)
Summary:
The realm of AIOps is transforming IT landscapes with the power of AI and ML. Despite the challenge of limited labeled data, supervised models show promise, emphasizing the importance of leveraging labels for training, especially in deep learning contexts. This study enhances the field by introducing a taxonomy for log anomalies and exploring automated data labeling to mitigate labeling challenges. It goes further by investigating the potential of diverse anomaly detection techniques and their alignment with specific anomaly types. However, the exploration doesn't stop at anomaly detection. The study envisions a future where root cause analysis follows anomaly detection, unraveling the underlying triggers of anomalies. This uncharted territory holds immense potential for revolutionizing IT systems management. In essence, this paper enriches our understanding of anomaly detection, and automated labeling, and sets the stage for transformative root cause analysis. Together, these advances promise more resilient IT systems, elevating operational efficiency and user satisfaction in an ever-evolving technological landscape.

hallucination

Title: On Early Detection of Hallucinations in Factual Question Answering. (arXiv:2312.14183v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14183
Code URL: null
Copy Paste: [[2312.14183]] On Early Detection of Hallucinations in Factual Question Answering(http://arxiv.org/abs/2312.14183)
Summary:
While large language models (LLMs) have taken great strides towards helping humans with a plethora of tasks like search and summarization, hallucinations remain a major impediment towards gaining user trust. The fluency and coherence of model generations even when hallucinating makes it difficult to detect whether or not a model is hallucinating. In this work, we explore if the artifacts associated with the model generations can provide hints that the generation will contain hallucinations. Specifically, we probe LLMs at 1) the inputs via Integrated Gradients based token attribution, 2) the outputs via the Softmax probabilities, and 3) the internal state via self-attention and fully-connected layer activations for signs of hallucinations on open-ended question answering tasks. Our results show that the distributions of these artifacts differ between hallucinated and non-hallucinated generations. Building on this insight, we train binary classifiers that use these artifacts as input features to classify model generations into hallucinations and non-hallucinations. These hallucination classifiers achieve up to 0.80 AUROC. We further show that tokens preceding a hallucination can predict the subsequent hallucination before it occurs.

Title: Context-aware Decoding Reduces Hallucination in Query-focused Summarization. (arXiv:2312.14335v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14335
Code URL: null
Copy Paste: [[2312.14335]] Context-aware Decoding Reduces Hallucination in Query-focused Summarization(http://arxiv.org/abs/2312.14335)
Summary:
Query-focused summarization (QFS) aims to provide a summary of a single document/multi documents that can satisfy the information needs of a given query. It is useful for various real-world applications, such as abstractive snippet generation or more recent retrieval augmented generation (RAG). A prototypical QFS pipeline consists of a retriever (sparse or dense retrieval) and a generator (usually a large language model). However, applying large language models (LLM) potentially leads to hallucinations, especially when the evidence contradicts the prior belief of LLMs. There has been growing interest in developing new decoding methods to improve generation quality and reduce hallucination. In this work, we conduct a large-scale reproducibility on one recently proposed decoding method -- Context-aware Decoding (CAD). In addition to replicating CAD's experiments on news summarization datasets, we include experiments on QFS datasets, and conduct more rigorous analysis on computational complexity and hyperparameter sensitivity. Experiments with eight different language models show that performance-wise, CAD improves QFS quality by (1) reducing factuality errors/hallucinations while (2) mostly retaining the match of lexical patterns, measured by ROUGE scores, while also at a cost of increased inference-time FLOPs and reduced decoding speed. The code implementation based on Huggingface Library is made available https://github.com/zhichaoxu-shufe/context-aware-decoding-qfs

Title: Theory of Hallucinations based on Equivariance. (arXiv:2312.14504v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14504
Code URL: null
Copy Paste: [[2312.14504]] Theory of Hallucinations based on Equivariance(http://arxiv.org/abs/2312.14504)
Summary:
Equivariance is an important feature in machine learning, including language models. It ensures that any sequences of phrases with the same meanings are interpreted consistently. For example, the sentence 'There is a cat on the table' should be interpreted by language models as it is, regardless of variations in its token-level expression. Building on this insight, I propose a new theory suggesting that insufficient equivariance in language models can lead to hallucinations. According to this theory, which is both intuitive and novel, language models trained on relatively small datasets tend to misinterpret input texts and/or generate incorrect texts (i.e., hallucinations). To test this theory, I developed a toy model known as 'dancing men', which is a character-level substitution cipher. Additionally, I propose a novel technique based on the T5 (Text To Text Transfer Transformer) model to efficiently decipher these codes without relying on frequency analysis. I have found that this T5 model can almost completely solve the cipher, demonstrating its ability to acquire equivariance in this frame. This method could be scaled up to word-level and sentence-level substitution ciphers, analogous to large language models without tokenizers or dictionaries. This scalability makes it suitable for investigating the proposed link between inadequate equivariance acquisition and the emergence of hallucinations.

prompt

Title: SIG: Speaker Identification in Literature via Prompt-Based Generation. (arXiv:2312.14590v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14590
Code URL: null
Copy Paste: [[2312.14590]] SIG: Speaker Identification in Literature via Prompt-Based Generation(http://arxiv.org/abs/2312.14590)
Summary:
Identifying speakers of quotations in narratives is an important task in literary analysis, with challenging scenarios including the out-of-domain inference for unseen speakers, and non-explicit cases where there are no speaker mentions in surrounding context. In this work, we propose a simple and effective approach SIG, a generation-based method that verbalizes the task and quotation input based on designed prompt templates, which also enables easy integration of other auxiliary tasks that further bolster the speaker identification performance. The prediction can either come from direct generation by the model, or be determined by the highest generation probability of each speaker candidate. Based on our approach design, SIG supports out-of-domain evaluation, and achieves open-world classification paradigm that is able to accept any forms of candidate input. We perform both cross-domain evaluation and in-domain evaluation on PDNC, the largest dataset of this task, where empirical results suggest that SIG outperforms previous baselines of complicated designs, as well as the zero-shot ChatGPT, especially excelling at those hard non-explicit scenarios by up to 17% improvement. Additional experiments on another dataset WP further corroborate the efficacy of SIG.

Title: Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks. (arXiv:2312.14440v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14440
Code URL: null
Copy Paste: [[2312.14440]] Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks(http://arxiv.org/abs/2312.14440)
Summary:
The widespread use of Text-to-Image (T2I) models in content generation requires careful examination of their safety, including their robustness to adversarial attacks. Despite extensive research into this, the reasons for their effectiveness are underexplored. This paper presents an empirical study on adversarial attacks against T2I models, focusing on analyzing factors associated with attack success rates (ASRs). We introduce a new attack objective - entity swapping using adversarial suffixes and two gradient-based attack algorithms. Human and automatic evaluations reveal the asymmetric nature of ASRs on entity swap: for example, it is easier to replace "human" with "robot" in the prompt "a human dancing in the rain." with an adversarial suffix but is significantly harder in reverse. We further propose probing metrics to establish indicative signals from the model's beliefs to the adversarial ASR. We identify conditions resulting in a 60% success probability for adversarial attacks and others where this likelihood drops below 5%.

Title: Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models. (arXiv:2312.14923v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14923
Code URL: null
Copy Paste: [[2312.14923]] Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models(http://arxiv.org/abs/2312.14923)
Summary:
The rapid growth of machine learning has spurred legislative initiatives such as ``the Right to be Forgotten,'' allowing users to request data removal. In response, ``machine unlearning'' proposes the selective removal of unwanted data without the need for retraining from scratch. While the Neural-Tangent-Kernel-based (NTK-based) unlearning method excels in performance, it suffers from significant computational complexity, especially for large-scale models and datasets. Our work introduces ``Fast-NTK,'' a novel NTK-based unlearning algorithm that significantly reduces the computational complexity by incorporating parameter-efficient fine-tuning methods, such as fine-tuning batch normalization layers in a CNN or visual prompts in a vision transformer. Our experimental results demonstrate scalability to much larger neural networks and datasets (e.g., 88M parameters; 5k images), surpassing the limitations of previous full-model NTK-based approaches designed for smaller cases (e.g., 8M parameters; 500 images). Notably, our approach maintains a performance comparable to the traditional method of retraining on the retain set alone. Fast-NTK can thus enable for practical and scalable NTK-based unlearning in deep neural networks.

code

Title: WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation. (arXiv:2312.14187v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14187
Code URL: null
Copy Paste: [[2312.14187]] WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation(http://arxiv.org/abs/2312.14187)
Summary:
Recent work demonstrates that, after being fine-tuned on a high-quality instruction dataset, the resulting model can obtain impressive capabilities to address a wide range of tasks. However, existing methods for instruction data generation often produce duplicate data and are not controllable enough on data quality. In this paper, we extend the generalization of instruction tuning by classifying the instruction data to 4 code-related tasks and propose a LLM-based Generator-Discriminator data process framework to generate diverse, high-quality instruction data from open source code. Hence, we introduce CodeOcean, a dataset comprising 20,000 instruction instances across 4 universal code-related tasks,which is aimed at augmenting the effectiveness of instruction tuning and improving the generalization ability of fine-tuned model. Subsequently, we present WaveCoder, a fine-tuned Code LLM with Widespread And Versatile Enhanced instruction tuning. This model is specifically designed for enhancing instruction tuning of Code Language Models (LLMs). Our experiments demonstrate that Wavecoder models outperform other open-source models in terms of generalization ability across different code-related tasks at the same level of fine-tuning scale. Moreover, Wavecoder exhibits high efficiency in previous code generation tasks. This paper thus offers a significant contribution to the field of instruction data generation and fine-tuning models, providing new insights and tools for enhancing performance in code-related tasks.

Title: Hierarchical Topology Isomorphism Expertise Embedded Graph Contrastive Learning. (arXiv:2312.14222v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14222
Code URL: null
Copy Paste: [[2312.14222]] Hierarchical Topology Isomorphism Expertise Embedded Graph Contrastive Learning(http://arxiv.org/abs/2312.14222)
Summary:
Graph contrastive learning (GCL) aims to align the positive features while differentiating the negative features in the latent space by minimizing a pair-wise contrastive loss. As the embodiment of an outstanding discriminative unsupervised graph representation learning approach, GCL achieves impressive successes in various graph benchmarks. However, such an approach falls short of recognizing the topology isomorphism of graphs, resulting in that graphs with relatively homogeneous node features cannot be sufficiently discriminated. By revisiting classic graph topology recognition works, we disclose that the corresponding expertise intuitively complements GCL methods. To this end, we propose a novel hierarchical topology isomorphism expertise embedded graph contrastive learning, which introduces knowledge distillations to empower GCL models to learn the hierarchical topology isomorphism expertise, including the graph-tier and subgraph-tier. On top of this, the proposed method holds the feature of plug-and-play, and we empirically demonstrate that the proposed method is universal to multiple state-of-the-art GCL models. The solid theoretical analyses are further provided to prove that compared with conventional GCL methods, our method acquires the tighter upper bound of Bayes classification error. We conduct extensive experiments on real-world benchmarks to exhibit the performance superiority of our method over candidate GCL methods, e.g., for the real-world graph representation learning experiments, the proposed method beats the state-of-the-art method by 0.23\% on unsupervised representation learning setting, 0.43\% on transfer learning setting. Our code is available at https://github.com/jyf123/HTML.

Title: ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection. (arXiv:2312.14535v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14535
Code URL: null
Copy Paste: [[2312.14535]] ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection(http://arxiv.org/abs/2312.14535)
Summary:
Graph anomaly detection is crucial for identifying nodes that deviate from regular behavior within graphs, benefiting various domains such as fraud detection and social network. Although existing reconstruction-based methods have achieved considerable success, they may face the \textit{Anomaly Overfitting} and \textit{Homophily Trap} problems caused by the abnormal patterns in the graph, breaking the assumption that normal nodes are often better reconstructed than abnormal ones. Our observations indicate that models trained on graphs with fewer anomalies exhibit higher detection performance. Based on this insight, we introduce a novel two-stage framework called Anomaly-Denoised Autoencoders for Graph Anomaly Detection (ADA-GAD). In the first stage, we design a learning-free anomaly-denoised augmentation method to generate graphs with reduced anomaly levels. We pretrain graph autoencoders on these augmented graphs at multiple levels, which enables the graph autoencoders to capture normal patterns. In the next stage, the decoders are retrained for detection on the original graph, benefiting from the multi-level representations learned in the previous stage. Meanwhile, we propose the node anomaly distribution regularization to further alleviate \textit{Anomaly Overfitting}. We validate the effectiveness of our approach through extensive experiments on both synthetic and real-world datasets.

Title: The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs. (arXiv:2312.14792v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14792
Code URL: null
Copy Paste: [[2312.14792]] The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs(http://arxiv.org/abs/2312.14792)
Summary:
The joint source coding and modulation (JSCM) framework was enabled by recent developments in deep learning, which allows to automatically learn from data, and in an end-to-end fashion, the best compression codes and modulation schemes. In this paper, we show the existence of a strict tradeoff between channel rate, distortion, perception, and classification accuracy in a JSCM scenario. We then propose two image compression methods to navigate that tradeoff: an inverse-domain generative adversarial network (ID-GAN), which achieves extreme compression, and a simpler, heuristic method that reveals insights about the performance of ID-GAN. Experiment results not only corroborate the theoretical findings, but also demonstrate that the proposed ID-GAN algorithm significantly improves system performance compared to traditional separation-based methods and recent deep JSCM architectures.

Title: TACO: Topics in Algorithmic COde generation dataset. (arXiv:2312.14852v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14852
Code URL: https://github.com/flagopen/taco
Copy Paste: [[2312.14852]] TACO: Topics in Algorithmic COde generation dataset(http://arxiv.org/abs/2312.14852)
Summary:
We introduce TACO, an open-source, large-scale code generation dataset, with a focus on the optics of algorithms, designed to provide a more challenging training dataset and evaluation benchmark in the field of code generation models. TACO includes competition-level programming questions that are more challenging, to enhance or evaluate problem understanding and reasoning abilities in real-world programming scenarios. There are 25433 and 1000 coding problems in training and test set, as well as up to 1.55 million diverse solution answers. Moreover, each TACO problem includes several fine-grained labels such as task topics, algorithms, programming skills, and difficulty levels, providing a more precise reference for the training and evaluation of code generation models. The dataset and evaluation scripts are available on Hugging Face Hub (https://huggingface.co/datasets/BAAI/TACO) and Github (https://github.com/FlagOpen/TACO).

Title: Balancing the Style-Content Trade-Off in Sentiment Transfer Using Polarity-Aware Denoising. (arXiv:2312.14708v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14708
Code URL: https://github.com/souro/polarity-denoising-sentiment-transfer
Copy Paste: [[2312.14708]] Balancing the Style-Content Trade-Off in Sentiment Transfer Using Polarity-Aware Denoising(http://arxiv.org/abs/2312.14708)
Summary:
Text sentiment transfer aims to flip the sentiment polarity of a sentence (positive to negative or vice versa) while preserving its sentiment-independent content. Although current models show good results at changing the sentiment, content preservation in transferred sentences is insufficient. In this paper, we present a sentiment transfer model based on polarity-aware denoising, which accurately controls the sentiment attributes in generated text, preserving the content to a great extent and helping to balance the style-content trade-off. Our proposed model is structured around two key stages in the sentiment transfer process: better representation learning using a shared encoder and sentiment-controlled generation using separate sentiment-specific decoders. Empirical results show that our methods outperforms state-of-the-art baselines in terms of content preservation while staying competitive in terms of style transfer accuracy and fluency.

Title: Semantic Parsing for Complex Data Retrieval: Targeting Query Plans vs. SQL for No-Code Access to Relational Databases. (arXiv:2312.14798v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14798
Code URL: null
Copy Paste: [[2312.14798]] Semantic Parsing for Complex Data Retrieval: Targeting Query Plans vs(http://arxiv.org/abs/2312.14798)
Summary:
Large Language Models (LLMs) have spurred progress in text-to-SQL, the task of generating SQL queries from natural language questions based on a given database schema. Despite the declarative nature of SQL, it continues to be a complex programming language. In this paper, we investigate the potential of an alternative query language with simpler syntax and modular specification of complex queries. The purpose is to create a query language that can be learned more easily by modern neural semantic parsing architectures while also enabling non-programmers to better assess the validity of the query plans produced by an interactive query plan assistant.

The proposed alternative query language is called Query Plan Language (QPL). It is designed to be modular and can be translated into a restricted form of SQL Common Table Expressions (CTEs). The aim of QPL is to make complex data retrieval accessible to non-programmers by allowing users to express their questions in natural language while also providing an easier-to-verify target language. The paper demonstrates how neural LLMs can benefit from QPL's modularity to generate complex query plans in a compositional manner. This involves a question decomposition strategy and a planning stage.

We conduct experiments on a version of the Spider text-to-SQL dataset that has been converted to QPL. The hierarchical structure of QPL programs enables us to measure query complexity naturally. Based on this assessment, we identify the low accuracy of existing text-to-SQL systems on complex compositional queries. We present ways to address the challenge of complex queries in an iterative, user-controlled manner, using fine-tuned LLMs and a variety of prompting strategies in a compositional manner.

Title: Invariant Anomaly Detection under Distribution Shifts: A Causal Perspective. (arXiv:2312.14329v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14329
Code URL: https://github.com/joaocarv/invariant-anomaly-detection
Copy Paste: [[2312.14329]] Invariant Anomaly Detection under Distribution Shifts: A Causal Perspective(http://arxiv.org/abs/2312.14329)
Summary:
Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples by solely relying on the consistency of the normal training samples. Under the constraints of a distribution shift, the assumption that training samples and test samples are drawn from the same distribution breaks down. In this work, by leveraging tools from causal inference we attempt to increase the resilience of anomaly detection models to different kinds of distribution shifts. We begin by elucidating a simple yet necessary statistical property that ensures invariant representations, which is critical for robust AD under both domain and covariate shifts. From this property, we derive a regularization term which, when minimized, leads to partial distribution invariance across environments. Through extensive experimental evaluation on both synthetic and real-world tasks, covering a range of six different AD methods, we demonstrated significant improvements in out-of-distribution performance. Under both covariate and domain shift, models regularized with our proposed term showed marked increased robustness. Code is available at: https://github.com/JoaoCarv/invariant-anomaly-detection.

Title: PUMA: Efficient Continual Graph Learning with Graph Condensation. (arXiv:2312.14439v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14439
Code URL: https://github.com/superallen13/puma
Copy Paste: [[2312.14439]] PUMA: Efficient Continual Graph Learning with Graph Condensation(http://arxiv.org/abs/2312.14439)
Summary:
When handling streaming graphs, existing graph representation learning models encounter a catastrophic forgetting problem, where previously learned knowledge of these models is easily overwritten when learning with newly incoming graphs. In response, Continual Graph Learning emerges as a novel paradigm enabling graph representation learning from static to streaming graphs. Our prior work, CaT is a replay-based framework with a balanced continual learning procedure, which designs a small yet effective memory bank for replaying data by condensing incoming graphs. Although the CaT alleviates the catastrophic forgetting problem, there exist three issues: (1) The graph condensation algorithm derived in CaT only focuses on labelled nodes while neglecting abundant information carried by unlabelled nodes; (2) The continual training scheme of the CaT overemphasises on the previously learned knowledge, limiting the model capacity to learn from newly added memories; (3) Both the condensation process and replaying process of the CaT are time-consuming. In this paper, we propose a psudo-label guided memory bank (PUMA) CGL framework, extending from the CaT to enhance its efficiency and effectiveness by overcoming the above-mentioned weaknesses and limits. To fully exploit the information in a graph, PUMA expands the coverage of nodes during graph condensation with both labelled and unlabelled nodes. Furthermore, a training-from-scratch strategy is proposed to upgrade the previous continual learning scheme for a balanced training between the historical and the new graphs. Besides, PUMA uses a one-time prorogation and wide graph encoders to accelerate the graph condensation and the graph encoding process in the training stage to improve the efficiency of the whole framework. Extensive experiments on four datasets demonstrate the state-of-the-art performance and efficiency over existing methods.

Title: SAVAE: Leveraging the variational Bayes autoencoder for survival analysis. (arXiv:2312.14651v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14651
Code URL: https://github.com/patricia-a-apellaniz/savae
Copy Paste: [[2312.14651]] SAVAE: Leveraging the variational Bayes autoencoder for survival analysis(http://arxiv.org/abs/2312.14651)
Summary:
As in many fields of medical research, survival analysis has witnessed a growing interest in the application of deep learning techniques to model complex, high-dimensional, heterogeneous, incomplete, and censored medical data. Current methods often make assumptions about the relations between data that may not be valid in practice. In response, we introduce SAVAE (Survival Analysis Variational Autoencoder), a novel approach based on Variational Autoencoders. SAVAE contributes significantly to the field by introducing a tailored ELBO formulation for survival analysis, supporting various parametric distributions for covariates and survival time (as long as the log-likelihood is differentiable). It offers a general method that consistently performs well on various metrics, demonstrating robustness and stability through different experiments. Our proposal effectively estimates time-to-event, accounting for censoring, covariate interactions, and time-varying risk associations. We validate our model in diverse datasets, including genomic, clinical, and demographic data, with varying levels of censoring. This approach demonstrates competitive performance compared to state-of-the-art techniques, as assessed by the Concordance Index and the Integrated Brier Score. SAVAE also offers an interpretable model that parametrically models covariates and time. Moreover, its generative architecture facilitates further applications such as clustering, data imputation, and the generation of synthetic patient data through latent space inference from survival data.

Title: Spatiotemporal-Linear: Towards Universal Multivariate Time Series Forecasting. (arXiv:2312.14869v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14869
Code URL: null
Copy Paste: [[2312.14869]] Spatiotemporal-Linear: Towards Universal Multivariate Time Series Forecasting(http://arxiv.org/abs/2312.14869)
Summary:
Within the field of complicated multivariate time series forecasting (TSF), popular techniques frequently rely on intricate deep learning architectures, ranging from transformer-based designs to recurrent neural networks. However, recent findings suggest that simple Linear models can surpass sophisticated constructs on diverse datasets. These models directly map observation to multiple future time steps, thereby minimizing error accumulation in iterative multi-step prediction. Yet, these models fail to incorporate spatial and temporal information within the data, which is critical for capturing patterns and dependencies that drive insightful predictions. This oversight often leads to performance bottlenecks, especially under specific sequence lengths and dataset conditions, preventing their universal application. In response, we introduce the SpatioTemporal-Linear (STL) framework. STL seamlessly integrates time-embedded and spatially-informed bypasses to augment the Linear-based architecture. These extra routes offer a more robust and refined regression to the data, particularly when the amount of observation is limited and the capacity of simple linear layers to capture dependencies declines. Empirical evidence highlights STL's prowess, outpacing both Linear and Transformer benchmarks across varied observation and prediction durations and datasets. Such robustness accentuates its suitability across a spectrum of applications, including but not limited to, traffic trajectory and rare disease progression forecasting. Through this discourse, we not only validate the STL's distinctive capacities to become a more general paradigm in multivariate time-series prediction using deep-learning techniques but also stress the need to tackle data-scarce prediction scenarios for universal application. Code will be made available.

chat

Title: Aurora:Activating Chinese chat capability for Mistral-8x7B sparse Mixture-of-Experts through Instruction-Tuning. (arXiv:2312.14557v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14557
Code URL: https://github.com/WangRongsheng/Aurora
Copy Paste: [[2312.14557]] Aurora:Activating Chinese chat capability for Mistral-8x7B sparse Mixture-of-Experts through Instruction-Tuning(http://arxiv.org/abs/2312.14557)
Summary:
Existing research has demonstrated that refining large language models (LLMs) through the utilization of machine-generated instruction-following data empowers these models to exhibit impressive zero-shot capabilities for novel tasks, without requiring human-authored instructions. In this paper, we systematically investigate, preprocess, and integrate three Chinese instruction-following datasets with the aim of enhancing the Chinese conversational capabilities of Mixtral-8x7B sparse Mixture-of-Experts model. Through instruction fine-tuning on this carefully processed dataset, we successfully construct the Mixtral-8x7B sparse Mixture-of-Experts model named "Aurora." To assess the performance of Aurora, we utilize three widely recognized benchmark tests: C-Eval, MMLU, and CMMLU. Empirical studies validate the effectiveness of instruction fine-tuning applied to Mixtral-8x7B sparse Mixture-of-Experts model. This work is pioneering in the execution of instruction fine-tuning on a sparse expert-mixed model, marking a significant breakthrough in enhancing the capabilities of this model architecture. Our code, data and model are publicly available at: https://github.com/WangRongsheng/Aurora

retrieval augmented generation

rag

Title: Auto311: A Confidence-guided Automated System for Non-emergency Call. (arXiv:2312.14185v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14185
Code URL: null
Copy Paste: [[2312.14185]] Auto311: A Confidence-guided Automated System for Non-emergency Call(http://arxiv.org/abs/2312.14185)
Summary:
Emergency and non-emergency response systems are essential services provided by local governments and critical to protecting lives, the environment, and property. The effective handling of (non-)emergency calls is critical for public safety and well-being. By reducing the burden through non-emergency callers, residents in critical need of assistance through 911 will receive a fast and effective response. Collaborating with the Department of Emergency Communications (DEC) in Nashville, we analyzed 11,796 non-emergency call recordings and developed Auto311, the first automated system to handle 311 non-emergency calls, which (1) effectively and dynamically predicts ongoing non-emergency incident types to generate tailored case reports during the call; (2) itemizes essential information from dialogue contexts to complete the generated reports; and (3) strategically structures system-caller dialogues with optimized confidence. We used real-world data to evaluate the system's effectiveness and deployability. The experimental results indicate that the system effectively predicts incident type with an average F-1 score of 92.54%. Moreover, the system successfully itemizes critical information from relevant contexts to complete reports, evincing a 0.93 average consistency score compared to the ground truth. Additionally, emulations demonstrate that the system effectively decreases conversation turns as the utterance size gets more extensive and categorizes the ongoing call with 94.49% mean accuracy.

Title: Real-time Neural Network Inference on Extremely Weak Devices: Agile Offloading with Explainable AI. (arXiv:2312.14229v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14229
Code URL: null
Copy Paste: [[2312.14229]] Real-time Neural Network Inference on Extremely Weak Devices: Agile Offloading with Explainable AI(http://arxiv.org/abs/2312.14229)
Summary:
With the wide adoption of AI applications, there is a pressing need of enabling real-time neural network (NN) inference on small embedded devices, but deploying NNs and achieving high performance of NN inference on these small devices is challenging due to their extremely weak capabilities. Although NN partitioning and offloading can contribute to such deployment, they are incapable of minimizing the local costs at embedded devices. Instead, we suggest to address this challenge via agile NN offloading, which migrates the required computations in NN offloading from online inference to offline learning. In this paper, we present AgileNN, a new NN offloading technique that achieves real-time NN inference on weak embedded devices by leveraging eXplainable AI techniques, so as to explicitly enforce feature sparsity during the training phase and minimize the online computation and communication costs. Experiment results show that AgileNN's inference latency is >6x lower than the existing schemes, ensuring that sensory data on embedded devices can be timely consumed. It also reduces the local device's resource consumption by >8x, without impairing the inference accuracy.

Title: Adaptive Reconvergence-driven AIG Rewriting via Strategy Learning. (arXiv:2312.14536v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14536
Code URL: null
Copy Paste: [[2312.14536]] Adaptive Reconvergence-driven AIG Rewriting via Strategy Learning(http://arxiv.org/abs/2312.14536)
Summary:
Rewriting is a common procedure in logic synthesis aimed at improving the performance, power, and area (PPA) of circuits. The traditional reconvergence-driven And-Inverter Graph (AIG) rewriting method focuses solely on optimizing the reconvergence cone through Boolean algebra minimization. However, there exist opportunities to incorporate other node-rewriting algorithms that are better suited for specific cones. In this paper, we propose an adaptive reconvergence-driven AIG rewriting algorithm that combines two key techniques: multi-strategy-based AIG rewriting and strategy learning-based algorithm selection. The multi-strategy-based rewriting method expands upon the traditional approach by incorporating support for multi-node-rewriting algorithms, thus expanding the optimization space. Additionally, the strategy learning-based algorithm selection method determines the most suitable node-rewriting algorithm for a given cone. Experimental results demonstrate that our proposed method yields a significant average improvement of 5.567\% in size and 5.327\% in depth.

Title: Collaborative Synthesis of Patient Records through Multi-Visit Health State Inference. (arXiv:2312.14646v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14646
Code URL: https://github.com/p1nksnow/MSIC
Copy Paste: [[2312.14646]] Collaborative Synthesis of Patient Records through Multi-Visit Health State Inference(http://arxiv.org/abs/2312.14646)
Summary:
Electronic health records (EHRs) have become the foundation of machine learning applications in healthcare, while the utility of real patient records is often limited by privacy and security concerns. Synthetic EHR generation provides an additional perspective to compensate for this limitation. Most existing methods synthesize new records based on real EHR data, without consideration of different types of events in EHR data, which cannot control the event combinations in line with medical common sense. In this paper, we propose MSIC, a Multi-visit health Status Inference model for Collaborative EHR synthesis to address these limitations. First, we formulate the synthetic EHR generation process as a probabilistic graphical model and tightly connect different types of events by modeling the latent health states. Then, we derive a health state inference method tailored for the multi-visit scenario to effectively utilize previous records to synthesize current and future records. Furthermore, we propose to generate medical reports to add textual descriptions for each medical event, providing broader applications for synthesized EHR data. For generating different paragraphs in each visit, we incorporate a multi-generator deliberation framework to collaborate the message passing of multiple generators and employ a two-phase decoding strategy to generate high-quality reports. Our extensive experiments on the widely used benchmarks, MIMIC-III and MIMIC-IV, demonstrate that MSIC advances state-of-the-art results on the quality of synthetic data while maintaining low privacy risks.

Title: Automatic Data Retrieval for Cross Lingual Summarization. (arXiv:2312.14542v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.14542
Code URL: null
Copy Paste: [[2312.14542]] Automatic Data Retrieval for Cross Lingual Summarization(http://arxiv.org/abs/2312.14542)
Summary:
Cross-lingual summarization involves the summarization of text written in one language to a different one. There is a body of research addressing cross-lingual summarization from English to other European languages. In this work, we aim to perform cross-lingual summarization from English to Hindi. We propose pairing up the coverage of newsworthy events in textual and video format can prove to be helpful for data acquisition for cross lingual summarization. We analyze the data and propose methods to match articles to video descriptions that serve as document and summary pairs. We also outline filtering methods over reasonable thresholds to ensure the correctness of the summaries. Further, we make available 28,583 mono and cross-lingual article-summary pairs https://github.com/tingc9/Cross-Sum-News-Aligned. We also build and analyze multiple baselines on the collected data and report error analysis.

Title: Clustering and Uncertainty Analysis to Improve the Machine Learning-based Predictions of SAFARI-1 Control Follower Assembly Axial Neutron Flux Profiles. (arXiv:2312.14193v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14193
Code URL: null
Copy Paste: [[2312.14193]] Clustering and Uncertainty Analysis to Improve the Machine Learning-based Predictions of SAFARI-1 Control Follower Assembly Axial Neutron Flux Profiles(http://arxiv.org/abs/2312.14193)
Summary:
The goal of this work is to develop accurate Machine Learning (ML) models for predicting the assembly axial neutron flux profiles in the SAFARI-1 research reactor, trained by measurement data from historical cycles. The data-driven nature of ML models makes them susceptible to uncertainties which are introduced by sources such as noise in training data, incomplete coverage of the domain, extrapolation and imperfect model architectures. To this end, we also aim at quantifying the approximation uncertainties of the ML model predictions. Previous work using Deep Neural Networks (DNNs) has been successful for fuel assemblies in SAFARI-1, however, not as accurate for control follower assemblies. The aim of this work is to improve the ML models for the control assemblies by a combination of supervised and unsupervised ML algorithms. The $k$-means and Affinity Propagation unsupervised ML algorithms are employed to identify clusters in the set of the measured axial neutron flux profiles. Then, regression-based supervised ML models using DNN (with prediction uncertainties quantified with Monte Carlo dropout) and Gaussian Process (GP) are trained for different clusters and the prediction uncertainty is estimated. It was found that applying the proposed procedure improves the prediction accuracy for the control assemblies and reduces the prediction uncertainty. Flux shapes predicted by DNN and GP are very close, and the overall accuracy became comparable to the fuel assemblies. The prediction uncertainty is however smaller for GP models.

Title: Fine-grained Forecasting Models Via Gaussian Process Blurring Effect. (arXiv:2312.14280v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14280
Code URL: https://github.com/sepkfr/fine-grained-gaussian-process-forcating
Copy Paste: [[2312.14280]] Fine-grained Forecasting Models Via Gaussian Process Blurring Effect(http://arxiv.org/abs/2312.14280)
Summary:
Time series forecasting is a challenging task due to the existence of complex and dynamic temporal dependencies. This can lead to incorrect predictions by even the best forecasting models. Using more training data is one way to improve the accuracy, but this source is often limited. In contrast, we are building on successful denoising approaches for image generation by advocating for an end-to-end forecasting and denoising paradigm.

We propose an end-to-end forecast-blur-denoise forecasting framework by encouraging a division of labors between the forecasting and the denoising models. The initial forecasting model is directed to focus on accurately predicting the coarse-grained behavior, while the denoiser model focuses on capturing the fine-grained behavior that is locally blurred by integrating a Gaussian Process model. All three parts are interacting for the best end-to-end performance. Our extensive experiments demonstrate that our proposed approach is able to improve the forecasting accuracy of several state-of-the-art forecasting models as well as several other denoising approaches.

Title: Federated Quantum Long Short-term Memory (FedQLSTM). (arXiv:2312.14309v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14309
Code URL: null
Copy Paste: [[2312.14309]] Federated Quantum Long Short-term Memory (FedQLSTM)(http://arxiv.org/abs/2312.14309)
Summary:
Quantum federated learning (QFL) can facilitate collaborative learning across multiple clients using quantum machine learning (QML) models, while preserving data privacy. Although recent advances in QFL span different tasks like classification while leveraging several data types, no prior work has focused on developing a QFL framework that utilizes temporal data to approximate functions useful to analyze the performance of distributed quantum sensing networks. In this paper, a novel QFL framework that is the first to integrate quantum long short-term memory (QLSTM) models with temporal data is proposed. The proposed federated QLSTM (FedQLSTM) framework is exploited for performing the task of function approximation. In this regard, three key use cases are presented: Bessel function approximation, sinusoidal delayed quantum feedback control function approximation, and Struve function approximation. Simulation results confirm that, for all considered use cases, the proposed FedQLSTM framework achieves a faster convergence rate under one local training epoch, minimizing the overall computations, and saving 25-33% of the number of communication rounds needed until convergence compared to an FL framework with classical LSTM models.

Title: Training Neural Networks with Internal State, Unconstrained Connectivity, and Discrete Activations. (arXiv:2312.14359v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14359
Code URL: null
Copy Paste: [[2312.14359]] Training Neural Networks with Internal State, Unconstrained Connectivity, and Discrete Activations(http://arxiv.org/abs/2312.14359)
Summary:
Today's most powerful machine learning approaches are typically designed to train stateless architectures with predefined layers and differentiable activation functions. While these approaches have led to unprecedented successes in areas such as natural language processing and image recognition, the trained models are also susceptible to making mistakes that a human would not. In this paper, we take the view that true intelligence may require the ability of a machine learning model to manage internal state, but that we have not yet discovered the most effective algorithms for training such models. We further postulate that such algorithms might not necessarily be based on gradient descent over a deep architecture, but rather, might work best with an architecture that has discrete activations and few initial topological constraints (such as multiple predefined layers). We present one attempt in our ongoing efforts to design such a training algorithm, applied to an architecture with binary activations and only a single matrix of weights, and show that it is able to form useful representations of natural language text, but is also limited in its ability to leverage large quantities of training data. We then provide ideas for improving the algorithm and for designing other training algorithms for similar architectures. Finally, we discuss potential benefits that could be gained if an effective training algorithm is found, and suggest experiments for evaluating whether these benefits exist in practice.

Title: Graph Attention-Based Symmetry Constraint Extraction for Analog Circuits. (arXiv:2312.14405v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14405
Code URL: null
Copy Paste: [[2312.14405]] Graph Attention-Based Symmetry Constraint Extraction for Analog Circuits(http://arxiv.org/abs/2312.14405)
Summary:
In recent years, analog circuits have received extensive attention and are widely used in many emerging applications. The high demand for analog circuits necessitates shorter circuit design cycles. To achieve the desired performance and specifications, various geometrical symmetry constraints must be carefully considered during the analog layout process. However, the manual labeling of these constraints by experienced analog engineers is a laborious and time-consuming process. To handle the costly runtime issue, we propose a graph-based learning framework to automatically extract symmetric constraints in analog circuit layout. The proposed framework leverages the connection characteristics of circuits and the devices'information to learn the general rules of symmetric constraints, which effectively facilitates the extraction of device-level constraints on circuit netlists. The experimental results demonstrate that compared to state-of-the-art symmetric constraint detection approaches, our framework achieves higher accuracy and lower false positive rate.

Title: Room Occupancy Prediction: Exploring the Power of Machine Learning and Temporal Insights. (arXiv:2312.14426v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14426
Code URL: null
Copy Paste: [[2312.14426]] Room Occupancy Prediction: Exploring the Power of Machine Learning and Temporal Insights(http://arxiv.org/abs/2312.14426)
Summary:
Energy conservation in buildings is a paramount concern to combat greenhouse gas emissions and combat climate change. The efficient management of room occupancy, involving actions like lighting control and climate adjustment, is a pivotal strategy to curtail energy consumption. In contexts where surveillance technology isn't viable, non-intrusive sensors are employed to estimate room occupancy. In this study, we present a predictive framework for room occupancy that leverages a diverse set of machine learning models, with Random Forest consistently achieving the highest predictive accuracy. Notably, this dataset encompasses both temporal and spatial dimensions, revealing a wealth of information. Intriguingly, our framework demonstrates robust performance even in the absence of explicit temporal modeling. These findings underscore the remarkable predictive power of traditional machine learning models. The success can be attributed to the presence of feature redundancy, the simplicity of linear spatial and temporal patterns, and the advantages of high-frequency data sampling. While these results are compelling, it's essential to remain open to the possibility that explicitly modeling the temporal dimension could unlock deeper insights or further enhance predictive capabilities in specific scenarios. In summary, our research not only validates the effectiveness of our prediction framework for continuous and classification tasks but also underscores the potential for improvements through the inclusion of temporal aspects. The study highlights the promise of machine learning in shaping energy-efficient practices and room occupancy management.

Title: How to Overcome Curse-of-Dimensionality for Out-of-Distribution Detection?. (arXiv:2312.14452v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14452
Code URL: null
Copy Paste: [[2312.14452]] How to Overcome Curse-of-Dimensionality for Out-of-Distribution Detection?(http://arxiv.org/abs/2312.14452)
Summary:
Machine learning models deployed in the wild can be challenged by out-of-distribution (OOD) data from unknown classes. Recent advances in OOD detection rely on distance measures to distinguish samples that are relatively far away from the in-distribution (ID) data. Despite the promise, distance-based methods can suffer from the curse-of-dimensionality problem, which limits the efficacy in high-dimensional feature space. To combat this problem, we propose a novel framework, Subspace Nearest Neighbor (SNN), for OOD detection. In training, our method regularizes the model and its feature representation by leveraging the most relevant subset of dimensions (i.e. subspace). Subspace learning yields highly distinguishable distance measures between ID and OOD data. We provide comprehensive experiments and ablations to validate the efficacy of SNN. Compared to the current best distance-based method, SNN reduces the average FPR95 by 15.96% on the CIFAR-100 benchmark.

Title: Non-Denoising Forward-Time Diffusions. (arXiv:2312.14589v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14589
Code URL: null
Copy Paste: [[2312.14589]] Non-Denoising Forward-Time Diffusions(http://arxiv.org/abs/2312.14589)
Summary:
The scope of this paper is generative modeling through diffusion processes. An approach falling within this paradigm is the work of Song et al. (2021), which relies on a time-reversal argument to construct a diffusion process targeting the desired data distribution. We show that the time-reversal argument, common to all denoising diffusion probabilistic modeling proposals, is not necessary. We obtain diffusion processes targeting the desired data distribution by taking appropriate mixtures of diffusion bridges. The resulting transport is exact by construction, allows for greater flexibility in choosing the dynamics of the underlying diffusion, and can be approximated by means of a neural network via novel training objectives. We develop a unifying view of the drift adjustments corresponding to our and to time-reversal approaches and make use of this representation to inspect the inner workings of diffusion-based generative models. Finally, we leverage on scalable simulation and inference techniques common in spatial statistics to move beyond fully factorial distributions in the underlying diffusion dynamics. The methodological advances contained in this work contribute toward establishing a general framework for generative modeling based on diffusion processes.

multi-run

chain-of-thought

tree-of-thought

agent

Title: Benchmarking Multi-Agent Preference-based Reinforcement Learning for Human-AI Teaming. (arXiv:2312.14292v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14292
Code URL: null
Copy Paste: [[2312.14292]] Benchmarking Multi-Agent Preference-based Reinforcement Learning for Human-AI Teaming(http://arxiv.org/abs/2312.14292)
Summary:
Preference-based Reinforcement Learning (PbRL) is an active area of research, and has made significant strides in single-agent actor and in observer human-in-the-loop scenarios. However, its application within the co-operative multi-agent RL frameworks, where humans actively participate and express preferences for agent behavior, remains largely uncharted. We consider a two-agent (Human-AI) cooperative setup where both the agents are rewarded according to human's reward function for the team. However, the agent does not have access to it, and instead, utilizes preference-based queries to elicit its objectives and human's preferences for the robot in the human-robot team. We introduce the notion of Human-Flexibility, i.e. whether the human partner is amenable to multiple team strategies, with a special case being Specified Orchestration where the human has a single team policy in mind (most constrained case). We propose a suite of domains to study PbRL for Human-AI cooperative setup which explicitly require forced cooperation. Adapting state-of-the-art single-agent PbRL algorithms to our two-agent setting, we conduct a comprehensive benchmarking study across our domain suite. Our findings highlight the challenges associated with high degree of Human-Flexibility and the limited access to the human's envisioned policy in PbRL for Human-AI cooperation. Notably, we observe that PbRL algorithms exhibit effective performance exclusively in the case of Specified Orchestration which can be seen as an upper bound PbRL performance for future research.

Title: AdapTraj: A Multi-Source Domain Generalization Framework for Multi-Agent Trajectory Prediction. (arXiv:2312.14394v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14394
Code URL: null
Copy Paste: [[2312.14394]] AdapTraj: A Multi-Source Domain Generalization Framework for Multi-Agent Trajectory Prediction(http://arxiv.org/abs/2312.14394)
Summary:
Multi-agent trajectory prediction, as a critical task in modeling complex interactions of objects in dynamic systems, has attracted significant research attention in recent years. Despite the promising advances, existing studies all follow the assumption that data distribution observed during model learning matches that encountered in real-world deployments. However, this assumption often does not hold in practice, as inherent distribution shifts might exist in the mobility patterns for deployment environments, thus leading to poor domain generalization and performance degradation. Consequently, it is appealing to leverage trajectories from multiple source domains to mitigate such discrepancies for multi-agent trajectory prediction task. However, the development of multi-source domain generalization in this task presents two notable issues: (1) negative transfer; (2) inadequate modeling for external factors. To address these issues, we propose a new causal formulation to explicitly model four types of features: domain-invariant and domain-specific features for both the focal agent and neighboring agents. Building upon the new formulation, we propose AdapTraj, a multi-source domain generalization framework specifically tailored for multi-agent trajectory prediction. AdapTraj serves as a plug-and-play module that is adaptable to a variety of models. Extensive experiments on four datasets with different domains demonstrate that AdapTraj consistently outperforms other baselines by a substantial margin.

Title: The Fairness Fair: Bringing Human Perception into Collective Decision-Making. (arXiv:2312.14402v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14402
Code URL: null
Copy Paste: [[2312.14402]] The Fairness Fair: Bringing Human Perception into Collective Decision-Making(http://arxiv.org/abs/2312.14402)
Summary:
Fairness is one of the most desirable societal principles in collective decision-making. It has been extensively studied in the past decades for its axiomatic properties and has received substantial attention from the multiagent systems community in recent years for its theoretical and computational aspects in algorithmic decision-making. However, these studies are often not sufficiently rich to capture the intricacies of human perception of fairness in the ambivalent nature of the real-world problems. We argue that not only fair solutions should be deemed desirable by social planners (designers), but they should be governed by human and societal cognition, consider perceived outcomes based on human judgement, and be verifiable. We discuss how achieving this goal requires a broad transdisciplinary approach ranging from computing and AI to behavioral economics and human-AI interaction. In doing so, we identify shortcomings and long-term challenges of the current literature of fair division, describe recent efforts in addressing them, and more importantly, highlight a series of open research directions.

Title: Hierarchical Multi-Agent Reinforcement Learning for Assessing False-Data Injection Attacks on Transportation Networks. (arXiv:2312.14625v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14625
Code URL: null
Copy Paste: [[2312.14625]] Hierarchical Multi-Agent Reinforcement Learning for Assessing False-Data Injection Attacks on Transportation Networks(http://arxiv.org/abs/2312.14625)
Summary:
The increasing reliance of drivers on navigation applications has made transportation networks more susceptible to data-manipulation attacks by malicious actors. Adversaries may exploit vulnerabilities in the data collection or processing of navigation services to inject false information, and to thus interfere with the drivers' route selection. Such attacks can significantly increase traffic congestions, resulting in substantial waste of time and resources, and may even disrupt essential services that rely on road networks. To assess the threat posed by such attacks, we introduce a computational framework to find worst-case data-injection attacks against transportation networks. First, we devise an adversarial model with a threat actor who can manipulate drivers by increasing the travel times that they perceive on certain roads. Then, we employ hierarchical multi-agent reinforcement learning to find an approximate optimal adversarial strategy for data manipulation. We demonstrate the applicability of our approach through simulating attacks on the Sioux Falls, ND network topology.

Title: Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning. (arXiv:2312.14878v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.14878
Code URL: null
Copy Paste: [[2312.14878]] Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning(http://arxiv.org/abs/2312.14878)
Summary:
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). However, constructing a standalone RL policy that maps perception to action directly encounters severe problems, chief among them being its lack of generality across multiple tasks and the need for a large amount of training data. The leading cause is that it cannot effectively integrate prior information into the perception-action cycle when devising the policy. Large language models (LLMs) emerged as a fundamental way to incorporate cross-domain knowledge into AI agents but lack crucial learning and adaptation toward specific decision problems. This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies. Our methodology is motivated by the modularity found in the human brain. The framework utilises the construction of intrinsic and extrinsic functions to add previous understandings of reasoning structures. It also provides the adaptive ability to learn models inside every module or function, consistent with the modular structure of cognitive processes. We describe the framework in-depth and compare it with other AI pipelines and existing frameworks. The paper explores practical applications, covering experiments that show the effectiveness of our method. Our results indicate that AI agents perform and adapt far better when organised reasoning and prior knowledge are embedded. This opens the door to more resilient and general AI agent systems.

Title: Multi-Agent Bandit Learning through Heterogeneous Action Erasure Channels. (arXiv:2312.14259v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.14259
Code URL: null
Copy Paste: [[2312.14259]] Multi-Agent Bandit Learning through Heterogeneous Action Erasure Channels(http://arxiv.org/abs/2312.14259)
Summary:
Multi-Armed Bandit (MAB) systems are witnessing an upswing in applications within multi-agent distributed environments, leading to the advancement of collaborative MAB algorithms. In such settings, communication between agents executing actions and the primary learner making decisions can hinder the learning process. A prevalent challenge in distributed learning is action erasure, often induced by communication delays and/or channel noise. This results in agents possibly not receiving the intended action from the learner, subsequently leading to misguided feedback. In this paper, we introduce novel algorithms that enable learners to interact concurrently with distributed agents across heterogeneous action erasure channels with different action erasure probabilities. We illustrate that, in contrast to existing bandit algorithms, which experience linear regret, our algorithms assure sub-linear regret guarantees. Our proposed solutions are founded on a meticulously crafted repetition protocol and scheduling of learning across heterogeneous channels. To our knowledge, these are the first algorithms capable of effectively learning through heterogeneous action erasure channels. We substantiate the superior performance of our algorithm through numerical experiments, emphasizing their practical significance in addressing issues related to communication constraints and delays in multi-agent environments.