language model

Title: Agent-OM: Leveraging Large Language Models for Ontology Matching. (arXiv:2312.00326v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.00326
Code URL: null
Copy Paste: [[2312.00326]] Agent-OM: Leveraging Large Language Models for Ontology Matching(http://arxiv.org/abs/2312.00326)
Summary:
Ontology matching (OM) enables semantic interoperability between different ontologies and resolves their conceptual heterogeneity by aligning related entities. OM systems currently have two prevailing design paradigms: conventional knowledge-based expert systems and newer machine learning-based predictive systems. While large language models (LLMs) and LLM-based agents have become revolutionary in data engineering and have been applied creatively in various domains, their potential for OM remains underexplored. This study introduces a novel agent-powered LLM-based design paradigm for OM systems. With thoughtful consideration of several specific challenges to leverage LLMs for OM, we propose a generic framework, namely Agent-OM, consisting of two Siamese agents for retrieval and matching, with a set of simple prompt-based OM tools. Our framework is implemented in a proof-of-concept system. Evaluations of three Ontology Alignment Evaluation Initiative (OAEI) tracks over state-of-the-art OM systems show that our system can achieve very close results to the best long-standing performance on simple OM tasks and significantly improve the performance on complex and few-shot OM tasks.

Title: On Exploring the Reasoning Capability of Large Language Models with Knowledge Graphs. (arXiv:2312.00353v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00353
Code URL: null
Copy Paste: [[2312.00353]] On Exploring the Reasoning Capability of Large Language Models with Knowledge Graphs(http://arxiv.org/abs/2312.00353)
Summary:
This paper examines the capacity of LLMs to reason with knowledge graphs using their internal knowledge graph, i.e., the knowledge graph they learned during pre-training. Two research questions are formulated to investigate the accuracy of LLMs in recalling information from pre-training knowledge graphs and their ability to infer knowledge graph relations from context. To address these questions, we employ LLMs to perform four distinct knowledge graph reasoning tasks. Furthermore, we identify two types of hallucinations that may occur during knowledge reasoning with LLMs: content and ontology hallucination. Our experimental results demonstrate that LLMs can successfully tackle both simple and complex knowledge graph reasoning tasks from their own memory, as well as infer from input context.

Title: A Bayesian approach for prompt optimization in pre-trained language models. (arXiv:2312.00471v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00471
Code URL: null
Copy Paste: [[2312.00471]] A Bayesian approach for prompt optimization in pre-trained language models(http://arxiv.org/abs/2312.00471)
Summary:
A prompt is a sequence of symbol or tokens, selected from a vocabulary according to some rule, which is prepended/concatenated to a textual query. A key problem is how to select the sequence of tokens: in this paper we formulate it as a combinatorial optimization problem. The high dimensionality of the token space com-pounded by the length of the prompt sequence requires a very efficient solution. In this paper we propose a Bayesian optimization method, executed in a continuous em-bedding of the combinatorial space. In this paper we focus on hard prompt tuning (HPT) which directly searches for discrete tokens to be added to the text input with-out requiring access to the large language model (LLM) and can be used also when LLM is available only as a black-box. This is critically important if LLMs are made available in the Model as a Service (MaaS) manner as in GPT-4. The current manu-script is focused on the optimization of discrete prompts for classification tasks. The discrete prompts give rise to difficult combinatorial optimization problem which easily become intractable given the dimension of the token space in realistic applications. The optimization method considered in this paper is Bayesian optimization (BO) which has become the dominant approach in black-box optimization for its sample efficiency along with its modular structure and versatility. In this paper we use BoTorch, a library for Bayesian optimization research built on top of pyTorch. Albeit preliminary and obtained using a 'vanilla' version of BO, the experiments on RoB-ERTa on six benchmarks, show a good performance across a variety of tasks and enable an analysis of the tradeoff between size of the search space, accuracy and wall clock time.

Title: SurreyAI 2023 Submission for the Quality Estimation Shared Task. (arXiv:2312.00525v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00525
Code URL: null
Copy Paste: [[2312.00525]] SurreyAI 2023 Submission for the Quality Estimation Shared Task(http://arxiv.org/abs/2312.00525)
Summary:
Quality Estimation (QE) systems are important in situations where it is necessary to assess the quality of translations, but there is no reference available. This paper describes the approach adopted by the SurreyAI team for addressing the Sentence-Level Direct Assessment shared task in WMT23. The proposed approach builds upon the TransQuest framework, exploring various autoencoder pre-trained language models within the MonoTransQuest architecture using single and ensemble settings. The autoencoder pre-trained language models employed in the proposed systems are XLMV, InfoXLM-large, and XLMR-large. The evaluation utilizes Spearman and Pearson correlation coefficients, assessing the relationship between machine-predicted quality scores and human judgments for 5 language pairs (English-Gujarati, English-Hindi, English-Marathi, English-Tamil and English-Telugu). The MonoTQ-InfoXLM-large approach emerges as a robust strategy, surpassing all other individual models proposed in this study by significantly improving over the baseline for the majority of the language pairs.

Title: Questioning Biases in Case Judgment Summaries: Legal Datasets or Large Language Models?. (arXiv:2312.00554v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00554
Code URL: null
Copy Paste: [[2312.00554]] Questioning Biases in Case Judgment Summaries: Legal Datasets or Large Language Models?(http://arxiv.org/abs/2312.00554)
Summary:
The evolution of legal datasets and the advent of large language models (LLMs) have significantly transformed the legal field, particularly in the generation of case judgment summaries. However, a critical concern arises regarding the potential biases embedded within these summaries. This study scrutinizes the biases present in case judgment summaries produced by legal datasets and large language models. The research aims to analyze the impact of biases on legal decision making. By interrogating the accuracy, fairness, and implications of biases in these summaries, this study contributes to a better understanding of the role of technology in legal contexts and the implications for justice systems worldwide. In this study, we investigate biases wrt Gender-related keywords, Race-related keywords, Keywords related to crime against women, Country names and religious keywords. The study shows interesting evidences of biases in the outputs generated by the large language models and pre-trained abstractive summarization models. The reasoning behind these biases needs further studies.

Title: Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals. (arXiv:2312.00751v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00751
Code URL: null
Copy Paste: [[2312.00751]] Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals(http://arxiv.org/abs/2312.00751)
Summary:
Transformers have achieved remarkable success in a wide range of natural language processing and computer vision applications. However, the representation capacity of a deep transformer model is degraded due to the over-smoothing issue in which the token representations become identical when the model's depth grows. In this work, we show that self-attention layers in transformers minimize a functional which promotes smoothness, thereby causing token uniformity. We then propose a novel regularizer that penalizes the norm of the difference between the smooth output tokens from self-attention and the input tokens to preserve the fidelity of the tokens. Minimizing the resulting regularized energy functional, we derive the Neural Transformer with a Regularized Nonlocal Functional (NeuTRENO), a novel class of transformer models that can mitigate the over-smoothing issue. We empirically demonstrate the advantages of NeuTRENO over the baseline transformers and state-of-the-art methods in reducing the over-smoothing of token representations on various practical tasks, including object classification, image segmentation, and language modeling.

Title: SEPSIS: I Can Catch Your Lies -- A New Paradigm for Deception Detection. (arXiv:2312.00292v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00292
Code URL: null
Copy Paste: [[2312.00292]] SEPSIS: I Can Catch Your Lies -- A New Paradigm for Deception Detection(http://arxiv.org/abs/2312.00292)
Summary:
Deception is the intentional practice of twisting information. It is a nuanced societal practice deeply intertwined with human societal evolution, characterized by a multitude of facets. This research explores the problem of deception through the lens of psychology, employing a framework that categorizes deception into three forms: lies of omission, lies of commission, and lies of influence. The primary focus of this study is specifically on investigating only lies of omission. We propose a novel framework for deception detection leveraging NLP techniques. We curated an annotated dataset of 876,784 samples by amalgamating a popular large-scale fake news dataset and scraped news headlines from the Twitter handle of Times of India, a well-known Indian news media house. Each sample has been labeled with four layers, namely: (i) the type of omission (speculation, bias, distortion, sounds factual, and opinion), (ii) colors of lies(black, white, etc), and (iii) the intention of such lies (to influence, etc) (iv) topic of lies (political, educational, religious, etc). We present a novel multi-task learning pipeline that leverages the dataless merging of fine-tuned language models to address the deception detection task mentioned earlier. Our proposed model achieved an F1 score of 0.87, demonstrating strong performance across all layers including the type, color, intent, and topic aspects of deceptive content. Finally, our research explores the relationship between lies of omission and propaganda techniques. To accomplish this, we conducted an in-depth analysis, uncovering compelling findings. For instance, our analysis revealed a significant correlation between loaded language and opinion, shedding light on their interconnectedness. To encourage further research in this field, we will be making the models and dataset available with the MIT License, making it favorable for open-source research.

Title: CoLLiE: Collaborative Training of Large Language Models in an Efficient Way. (arXiv:2312.00407v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00407
Code URL: https://github.com/openlmlab/collie
Copy Paste: [[2312.00407]] CoLLiE: Collaborative Training of Large Language Models in an Efficient Way(http://arxiv.org/abs/2312.00407)
Summary:
Large language models (LLMs) are increasingly pivotal in a wide range of natural language processing tasks. Access to pre-trained models, courtesy of the open-source community, has made it possible to adapt these models to specific applications for enhanced performance. However, the substantial resources required for training these models necessitate efficient solutions. This paper introduces CoLLiE, an efficient library that facilitates collaborative training of large language models using 3D parallelism, parameter-efficient fine-tuning (PEFT) methods, and optimizers such as Lion, Adan, Sophia, LOMO and AdaLomo. With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization. CoLLiE has proven superior training efficiency in comparison with prevalent solutions in pre-training and fine-tuning scenarios. Furthermore, we provide an empirical evaluation of the correlation between model size and GPU memory consumption under different optimization methods, as well as an analysis of the throughput. Lastly, we carry out a comprehensive comparison of various optimizers and PEFT methods within the instruction-tuning context. CoLLiE is available at https://github.com/OpenLMLab/collie.

Title: Summarization-based Data Augmentation for Document Classification. (arXiv:2312.00513v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00513
Code URL: https://github.com/etsurin/summaug
Copy Paste: [[2312.00513]] Summarization-based Data Augmentation for Document Classification(http://arxiv.org/abs/2312.00513)
Summary:
Despite the prevalence of pretrained language models in natural language understanding tasks, understanding lengthy text such as document is still challenging due to the data sparseness problem. Inspired by that humans develop their ability of understanding lengthy text from reading shorter text, we propose a simple yet effective summarization-based data augmentation, SUMMaug, for document classification. We first obtain easy-to-learn examples for the target document classification task by summarizing the input of the original training examples, while optionally merging the original labels to conform to the summarized input. We then use the generated pseudo examples to perform curriculum learning. Experimental results on two datasets confirmed the advantage of our method compared to existing baseline methods in terms of robustness and accuracy. We release our code and data at https://github.com/etsurin/summaug.

Title: Explanatory Argument Extraction of Correct Answers in Resident Medical Exams. (arXiv:2312.00567v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00567
Code URL: null
Copy Paste: [[2312.00567]] Explanatory Argument Extraction of Correct Answers in Resident Medical Exams(http://arxiv.org/abs/2312.00567)
Summary:
Developing the required technology to assist medical experts in their everyday activities is currently a hot topic in the Artificial Intelligence research field. Thus, a number of large language models (LLMs) and automated benchmarks have recently been proposed with the aim of facilitating information extraction in Evidence-Based Medicine (EBM) using natural language as a tool for mediating in human-AI interaction. The most representative benchmarks are limited to either multiple-choice or long-form answers and are available only in English. In order to address these shortcomings, in this paper we present a new dataset which, unlike previous work: (i) includes not only explanatory arguments for the correct answer, but also arguments to reason why the incorrect answers are not correct; (ii) the explanations are written originally by medical doctors to answer questions from the Spanish Residency Medical Exams. Furthermore, this new benchmark allows us to setup a novel extractive task which consists of identifying the explanation of the correct answer written by medical doctors. An additional benefit of our setting is that we can leverage the extractive QA paradigm to automatically evaluate performance of LLMs without resorting to costly manual evaluation by medical experts. Comprehensive experimentation with language models for Spanish shows that sometimes multilingual models fare better than monolingual ones, even outperforming models which have been adapted to the medical domain. Furthermore, results across the monolingual models are mixed, with supposedly smaller and inferior models performing competitively. In any case, the obtained results show that our novel dataset and approach can be an effective technique to help medical practitioners in identifying relevant evidence-based explanations for medical questions.

Title: Nonparametric Variational Regularisation of Pretrained Transformers. (arXiv:2312.00662v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00662
Code URL: null
Copy Paste: [[2312.00662]] Nonparametric Variational Regularisation of Pretrained Transformers(http://arxiv.org/abs/2312.00662)
Summary:
The current paradigm of large-scale pre-training and fine-tuning Transformer large language models has lead to significant improvements across the board in natural language processing. However, such large models are susceptible to overfitting to their training data, and as a result the models perform poorly when the domain changes. Also, due to the model's scale, the cost of fine-tuning the model to the new domain is large. Nonparametric Variational Information Bottleneck (NVIB) has been proposed as a regulariser for training cross-attention in Transformers, potentially addressing the overfitting problem. We extend the NVIB framework to replace all types of attention functions in Transformers, and show that existing pretrained Transformers can be reinterpreted as Nonparametric Variational (NV) models using a proposed identity initialisation. We then show that changing the initialisation introduces a novel, information-theoretic post-training regularisation in the attention mechanism, which improves out-of-domain generalisation without any training. This success supports the hypothesis that pretrained Transformers are implicitly NV Bayesian models.

Title: The Efficiency Spectrum of Large Language Models: An Algorithmic Survey. (arXiv:2312.00678v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00678
Code URL: https://github.com/tding1/efficient-llm-survey
Copy Paste: [[2312.00678]] The Efficiency Spectrum of Large Language Models: An Algorithmic Survey(http://arxiv.org/abs/2312.00678)
Summary:
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains, reshaping the artificial general intelligence landscape. However, the increasing computational and memory demands of these models present substantial challenges, hindering both academic research and practical applications. To address these issues, a wide array of methods, including both algorithmic and hardware solutions, have been developed to enhance the efficiency of LLMs. This survey delivers a comprehensive review of algorithmic advancements aimed at improving LLM efficiency. Unlike other surveys that typically focus on specific areas such as training or model compression, this paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs. Specifically, it covers various topics related to efficiency, including scaling laws, data utilization, architectural innovations, training and tuning strategies, and inference techniques. This paper aims to serve as a valuable resource for researchers and practitioners, laying the groundwork for future innovations in this critical research area. Our repository of relevant references is maintained at url{https://github.com/tding1/Efficient-LLM-Survey}.

Title: Contextualized word senses: from attention to compositionality. (arXiv:2312.00680v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00680
Code URL: null
Copy Paste: [[2312.00680]] Contextualized word senses: from attention to compositionality(http://arxiv.org/abs/2312.00680)
Summary:
The neural architectures of language models are becoming increasingly complex, especially that of Transformers, based on the attention mechanism. Although their application to numerous natural language processing tasks has proven to be very fruitful, they continue to be models with little or no interpretability and explainability. One of the tasks for which they are best suited is the encoding of the contextual sense of words using contextualized embeddings. In this paper we propose a transparent, interpretable, and linguistically motivated strategy for encoding the contextual sense of words by modeling semantic compositionality. Particular attention is given to dependency relations and semantic notions such as selection preferences and paradigmatic classes. A partial implementation of the proposed model is carried out and compared with Transformer-based architectures for a given semantic task, namely the similarity calculation of word senses in context. The results obtained show that it is possible to be competitive with linguistically motivated models instead of using the black boxes underlying complex neural architectures.

Title: SeaLLMs -- Large Language Models for Southeast Asia. (arXiv:2312.00738v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00738
Code URL: https://github.com/damo-nlp-sg/seallms
Copy Paste: [[2312.00738]] SeaLLMs -- Large Language Models for Southeast Asia(http://arxiv.org/abs/2312.00738)
Summary:
Despite the remarkable achievements of large language models (LLMs) in various tasks, there remains a linguistic bias that favors high-resource languages, such as English, often at the expense of low-resource and regional languages. To address this imbalance, we introduce SeaLLMs, an innovative series of language models that specifically focuses on Southeast Asian (SEA) languages. SeaLLMs are built upon the Llama-2 model and further advanced through continued pre-training with an extended vocabulary, specialized instruction and alignment tuning to better capture the intricacies of regional languages. This allows them to respect and reflect local cultural norms, customs, stylistic preferences, and legal considerations. Our comprehensive evaluation demonstrates that SeaLLM-13b models exhibit superior performance across a wide spectrum of linguistic tasks and assistant-style instruction-following capabilities relative to comparable open-source models. Moreover, they outperform ChatGPT-3.5 in non-Latin languages, such as Thai, Khmer, Lao, and Burmese, by large margins while remaining lightweight and cost-effective to operate.

Title: LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices. (arXiv:2312.00388v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00388
Code URL: null
Copy Paste: [[2312.00388]] LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices(http://arxiv.org/abs/2312.00388)
Summary:
Deploying Large Language Models (LLMs) locally on mobile devices presents a significant challenge due to their extensive memory requirements. In this paper, we introduce LinguaLinked, a system for decentralized, distributed LLM inference on mobile devices. LinguaLinked enables collaborative execution of the inference task across multiple trusted devices. LinguaLinked ensures data privacy by processing information locally. LinguaLinked uses three key strategies. First, an optimized model assignment technique segments LLMs and uses linear optimization to align segments with each device's capabilities. Second, an optimized data transmission mechanism ensures efficient and structured data flow between model segments while also maintaining the integrity of the original model structure. Finally, LinguaLinked incorporates a runtime load balancer that actively monitors and redistributes tasks among mobile devices to prevent bottlenecks, enhancing the system's overall efficiency and responsiveness. We demonstrate that LinguaLinked facilitates efficient LLM inference while maintaining consistent throughput and minimal latency through extensive testing across various mobile devices, from high-end to low-end Android devices. In our evaluations, compared to the baseline, LinguaLinked achieves an inference performance acceleration of $1.11\times$ to $1.61\times$ in single-threaded settings, $1.73\times$ to $2.65\times$ with multi-threading. Additionally, runtime load balancing yields an overall inference acceleration of $1.29\times$ to $1.32\times$.

Title: Pathway to a fully data-driven geotechnics: lessons from materials informatics. (arXiv:2312.00581v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00581
Code URL: null
Copy Paste: [[2312.00581]] Pathway to a fully data-driven geotechnics: lessons from materials informatics(http://arxiv.org/abs/2312.00581)
Summary:
This paper elucidates the challenges and opportunities inherent in integrating data-driven methodologies into geotechnics, drawing inspiration from the success of materials informatics. Highlighting the intricacies of soil complexity, heterogeneity, and the lack of comprehensive data, the discussion underscores the pressing need for community-driven database initiatives and open science movements. By leveraging the transformative power of deep learning, particularly in feature extraction from high-dimensional data and the potential of transfer learning, we envision a paradigm shift towards a more collaborative and innovative geotechnics field. The paper concludes with a forward-looking stance, emphasizing the revolutionary potential brought about by advanced computational tools like large language models in reshaping geotechnics informatics.

Title: Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation. (arXiv:2312.00645v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00645
Code URL: null
Copy Paste: [[2312.00645]] Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation(http://arxiv.org/abs/2312.00645)
Summary:
There is a growing need to gain insight into language model capabilities that relate to sensitive topics, such as bioterrorism or cyberwarfare. However, traditional open source benchmarks are not fit for the task, due to the associated practice of publishing the correct answers in human-readable form. At the same time, enforcing mandatory closed-quarters evaluations might stifle development and erode trust. In this context, we propose hashmarking, a protocol for evaluating language models in the open without having to disclose the correct answers. In its simplest form, a hashmark is a benchmark whose reference solutions have been cryptographically hashed prior to publication. Following an overview of the proposed evaluation protocol, we go on to assess its resilience against traditional attack vectors (e.g. rainbow table attacks), as well as against failure modes unique to increasingly capable generative models.

gpt

Title: Robust Concept Erasure via Kernelized Rate-Distortion Maximization. (arXiv:2312.00194v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00194
Code URL: https://github.com/brcsomnath/kram
Copy Paste: [[2312.00194]] Robust Concept Erasure via Kernelized Rate-Distortion Maximization(http://arxiv.org/abs/2312.00194)
Summary:
Distributed representations provide a vector space that captures meaningful relationships between data instances. The distributed nature of these representations, however, entangles together multiple attributes or concepts of data instances (e.g., the topic or sentiment of a text, characteristics of the author (age, gender, etc), etc). Recent work has proposed the task of concept erasure, in which rather than making a concept predictable, the goal is to remove an attribute from distributed representations while retaining other information from the original representation space as much as possible. In this paper, we propose a new distance metric learning-based objective, the Kernelized Rate-Distortion Maximizer (KRaM), for performing concept erasure. KRaM fits a transformation of representations to match a specified distance measure (defined by a labeled concept to erase) using a modified rate-distortion function. Specifically, KRaM's objective function aims to make instances with similar concept labels dissimilar in the learned representation space while retaining other information. We find that optimizing KRaM effectively erases various types of concepts: categorical, continuous, and vector-valued variables from data representations across diverse domains. We also provide a theoretical analysis of several properties of KRaM's objective. To assess the quality of the learned representations, we propose an alignment score to evaluate their similarity with the original representation space. Additionally, we conduct experiments to showcase KRaM's efficacy in various settings, from erasing binary gender variables in word embeddings to vector-valued variables in GPT-3 representations.

llm

Title: Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games. (arXiv:2312.00746v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.00746
Code URL: null
Copy Paste: [[2312.00746]] Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games(http://arxiv.org/abs/2312.00746)
Summary:
In this study, we explore the application of Large Language Models (LLMs) in "Jubensha" (Chinese murder mystery role-playing games), a novel area in AI-driven gaming. We introduce the first Chinese dataset specifically for Jubensha, including character scripts and game rules, to foster AI agent development in this complex narrative environment. Our work also presents a unique multi-agent interaction framework using LLMs, allowing AI agents to autonomously engage in the game, enhancing the dynamics of Jubensha gameplay. To evaluate these AI agents, we developed specialized methods targeting their mastery of case information and reasoning skills. Furthermore, we incorporated the latest advancements in in-context learning to improve the agents' performance in critical aspects like information gathering, murderer detection, and logical reasoning. The experimental results validate the effectiveness of our proposed methods. This work aims to offer a fresh perspective on understanding LLM capabilities and establish a new benchmark for evaluating large language model-based agents to researchers in the field.

Title: Instruction-tuning Aligns LLMs to the Human Brain. (arXiv:2312.00575v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00575
Code URL: null
Copy Paste: [[2312.00575]] Instruction-tuning Aligns LLMs to the Human Brain(http://arxiv.org/abs/2312.00575)
Summary:
Instruction-tuning is a widely adopted method of finetuning that enables large language models (LLMs) to generate output that more closely resembles human responses to natural language queries, in many cases leading to human-level performance on diverse testbeds. However, it remains unclear whether instruction-tuning truly makes LLMs more similar to how humans process language. We investigate the effect of instruction-tuning on LLM-human similarity in two ways: (1) brain alignment, the similarity of LLM internal representations to neural activity in the human language system, and (2) behavioral alignment, the similarity of LLM and human behavior on a reading task. We assess 25 vanilla and instruction-tuned LLMs across three datasets involving humans reading naturalistic stories and sentences. We discover that instruction-tuning generally enhances brain alignment by an average of 6%, but does not have a similar effect on behavioral alignment. To identify the factors underlying LLM-brain alignment, we compute correlations between the brain alignment of LLMs and various model properties, such as model size, various problem-solving abilities, and performance on tasks requiring world knowledge spanning various domains. Notably, we find a strong positive correlation between brain alignment and model size (r = 0.95), as well as performance on tasks requiring world knowledge (r = 0.81). Our results demonstrate that instruction-tuning LLMs improves both world knowledge representations and brain alignment, suggesting that mechanisms that encode world knowledge in LLMs also improve representational alignment to the human brain.

long context

lora

Title: Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration. (arXiv:2312.00267v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00267
Code URL: null
Copy Paste: [[2312.00267]] Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration(http://arxiv.org/abs/2312.00267)
Summary:
Preference-based feedback is important for many applications in reinforcement learning where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback (RLHF) on large language models. For many applications of RLHF, the cost of acquiring the human feedback can be substantial. In this work, we take advantage of the fact that one can often choose contexts at which to obtain human feedback in order to most efficiently identify a good policy, and formalize this as an offline contextual dueling bandit problem. We give an upper-confidence-bound style algorithm for this problem and prove a polynomial worst-case regret bound. We then provide empirical confirmation in a synthetic setting that our approach outperforms existing methods. After, we extend the setting and methodology for practical use in RLHF training of large language models. Here, our method is able to reach better performance with fewer samples of human preferences than multiple baselines on three real-world datasets.

Title: Meta-Diversity Search in Complex Systems, A Recipe for Artificial Open-Endedness ?. (arXiv:2312.00455v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.00455
Code URL: null
Copy Paste: [[2312.00455]] Meta-Diversity Search in Complex Systems, A Recipe for Artificial Open-Endedness ?(http://arxiv.org/abs/2312.00455)
Summary:
Can we build an artificial system that would be able to generate endless surprises if ran "forever" in Minecraft? While there is not a single path toward solving that grand challenge, this article presents what we believe to be some working ingredients for the endless generation of novel increasingly complex artifacts in Minecraft. Our framework for an open-ended system includes two components: a complex system used to recursively grow and complexify artifacts over time, and a discovery algorithm that leverages the concept of meta-diversity search. Since complex systems have shown to enable the emergence of considerable complexity from set of simple rules, we believe them to be great candidates to generate all sort of artifacts in Minecraft. Yet, the space of possible artifacts that can be generated by these systems is often unknown, challenging to characterize and explore. Therefore automating the long-term discovery of novel and increasingly complex artifacts in these systems is an exciting research field. To approach these challenges, we formulate the problem of meta-diversity search where an artificial "discovery assistant" incrementally learns a diverse set of representations to characterize behaviors and searches to discover diverse patterns within each of them. A successful discovery assistant should continuously seek for novel sources of diversities while being able to quickly specialize the search toward a new unknown type of diversity. To implement those ideas in the Minecraft environment, we simulate an artificial "chemistry" system based on Lenia continuous cellular automaton for generating artifacts, as well as an artificial "discovery assistant" (called Holmes) for the artifact-discovery process. Holmes incrementally learns a hierarchy of modular representations to characterize divergent sources of diversity and uses a goal-based intrinsically-motivated exploration as the diversity search strategy.

Title: Relevance-guided Neural Machine Translation. (arXiv:2312.00214v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00214
Code URL: null
Copy Paste: [[2312.00214]] Relevance-guided Neural Machine Translation(http://arxiv.org/abs/2312.00214)
Summary:
With the advent of the Transformer architecture, Neural Machine Translation (NMT) results have shown great improvement lately. However, results in low-resource conditions still lag behind in both bilingual and multilingual setups, due to the limited amount of available monolingual and/or parallel data; hence, the need for methods addressing data scarcity in an efficient, and explainable way, is eminent. We propose an explainability-based training approach for NMT, applied in Unsupervised and Supervised model training, for translation of three languages of varying resources, French, Gujarati, Kazakh, to and from English. Our results show our method can be promising, particularly when training in low-resource conditions, outperforming simple training baselines; though the improvement is marginal, it sets the ground for further exploration of the approach and the parameters, and its extension to other languages.

Title: Improving Unsupervised Relation Extraction by Augmenting Diverse Sentence Pairs. (arXiv:2312.00552v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00552
Code URL: null
Copy Paste: [[2312.00552]] Improving Unsupervised Relation Extraction by Augmenting Diverse Sentence Pairs(http://arxiv.org/abs/2312.00552)
Summary:
Unsupervised relation extraction (URE) aims to extract relations between named entities from raw text without requiring manual annotations or pre-existing knowledge bases. In recent studies of URE, researchers put a notable emphasis on contrastive learning strategies for acquiring relation representations. However, these studies often overlook two important aspects: the inclusion of diverse positive pairs for contrastive learning and the exploration of appropriate loss functions. In this paper, we propose AugURE with both within-sentence pairs augmentation and augmentation through cross-sentence pairs extraction to increase the diversity of positive pairs and strengthen the discriminative power of contrastive learning. We also identify the limitation of noise-contrastive estimation (NCE) loss for relation representation learning and propose to apply margin loss for sentence pairs. Experiments on NYT-FB and TACRED datasets demonstrate that the proposed relation representation learning and a simple K-Means clustering achieves state-of-the-art performance.

hallucination

prompt

code

Title: PEFTDebias : Capturing debiasing information using PEFTs. (arXiv:2312.00434v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00434
Code URL: null
Copy Paste: [[2312.00434]] PEFTDebias : Capturing debiasing information using PEFTs(http://arxiv.org/abs/2312.00434)
Summary:
The increasing use of foundation models highlights the urgent need to address and eliminate implicit biases present in them that arise during pretraining. In this paper, we introduce PEFTDebias, a novel approach that employs parameter-efficient fine-tuning (PEFT) to mitigate the biases within foundation models. PEFTDebias consists of two main phases: an upstream phase for acquiring debiasing parameters along a specific bias axis, and a downstream phase where these parameters are incorporated into the model and frozen during the fine-tuning process. By evaluating on four datasets across two bias axes namely gender and race, we find that downstream biases can be effectively reduced with PEFTs. In addition, we show that these parameters possess axis-specific debiasing characteristics, enabling their effective transferability in mitigating biases in various downstream tasks. To ensure reproducibility, we release the code to do our experiments.

Title: Japanese Tort-case Dataset for Rationale-supported Legal Judgment Prediction. (arXiv:2312.00480v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00480
Code URL: null
Copy Paste: [[2312.00480]] Japanese Tort-case Dataset for Rationale-supported Legal Judgment Prediction(http://arxiv.org/abs/2312.00480)
Summary:
This paper presents the first dataset for Japanese Legal Judgment Prediction (LJP), the Japanese Tort-case Dataset (JTD), which features two tasks: tort prediction and its rationale extraction. The rationale extraction task identifies the court's accepting arguments from alleged arguments by plaintiffs and defendants, which is a novel task in the field. JTD is constructed based on annotated 3,477 Japanese Civil Code judgments by 41 legal experts, resulting in 7,978 instances with 59,697 of their alleged arguments from the involved parties. Our baseline experiments show the feasibility of the proposed two tasks, and our error analysis by legal experts identifies sources of errors and suggests future directions of the LJP research.

Title: Removing Biases from Molecular Representations via Information Maximization. (arXiv:2312.00718v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00718
Code URL: https://github.com/uhlerlab/infocore
Copy Paste: [[2312.00718]] Removing Biases from Molecular Representations via Information Maximization(http://arxiv.org/abs/2312.00718)
Summary:
High-throughput drug screening -- using cell imaging or gene expression measurements as readouts of drug effect -- is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweighs samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks including molecular property prediction and molecule-phenotype retrieval. Additionally, we show results for how InfoCORE offers a versatile framework and resolves general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.

Title: Text Attribute Control via Closed-Loop Disentanglement. (arXiv:2312.00277v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00277
Code URL: null
Copy Paste: [[2312.00277]] Text Attribute Control via Closed-Loop Disentanglement(http://arxiv.org/abs/2312.00277)
Summary:
Changing an attribute of a text without changing the content usually requires to first disentangle the text into irrelevant attributes and content representations. After that, in the inference phase, the representation of one attribute is tuned to a different value, expecting that the corresponding attribute of the text can also be changed accordingly. The usual way of disentanglement is to add some constraints on the latent space of an encoder-decoder architecture, including adversarial-based constraints and mutual-information-based constraints. However, the previous semi-supervised processes of attribute change are usually not enough to guarantee the success of attribute change and content preservation. In this paper, we propose a novel approach to achieve a robust control of attributes while enhancing content preservation. In this approach, we use a semi-supervised contrastive learning method to encourage the disentanglement of attributes in latent spaces. Differently from previous works, we re-disentangle the reconstructed sentence and compare the re-disentangled latent space with the original latent space, which makes a closed-loop disentanglement process. This also helps content preservation. In addition, the contrastive learning method is also able to replace the role of minimizing mutual information and adversarial training in the disentanglement process, which alleviates the computation cost. We conducted experiments on three text datasets, including the Yelp Service review dataset, the Amazon Product review dataset, and the GoEmotions dataset. The experimental results show the effectiveness of our model.

Title: PsyAttention: Psychological Attention Model for Personality Detection. (arXiv:2312.00293v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00293
Code URL: null
Copy Paste: [[2312.00293]] PsyAttention: Psychological Attention Model for Personality Detection(http://arxiv.org/abs/2312.00293)
Summary:
Work on personality detection has tended to incorporate psychological features from different personality models, such as BigFive and MBTI. There are more than 900 psychological features, each of which is helpful for personality detection. However, when used in combination, the application of different calculation standards among these features may result in interference between features calculated using distinct systems, thereby introducing noise and reducing performance. This paper adapts different psychological models in the proposed PsyAttention for personality detection, which can effectively encode psychological features, reducing their number by 85%. In experiments on the BigFive and MBTI models, PysAttention achieved average accuracy of 65.66% and 86.30%, respectively, outperforming state-of-the-art methods, indicating that it is effective at encoding psychological features.

Title: DeepEn2023: Energy Datasets for Edge Artificial Intelligence. (arXiv:2312.00103v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00103
Code URL: null
Copy Paste: [[2312.00103]] DeepEn2023: Energy Datasets for Edge Artificial Intelligence(http://arxiv.org/abs/2312.00103)
Summary:
Climate change poses one of the most significant challenges to humanity. As a result of these climatic changes, the frequency of weather, climate, and water-related disasters has multiplied fivefold over the past 50 years, resulting in over 2 million deaths and losses exceeding $3.64 trillion USD. Leveraging AI-powered technologies for sustainable development and combating climate change is a promising avenue. Numerous significant publications are dedicated to using AI to improve renewable energy forecasting, enhance waste management, and monitor environmental changes in real time. However, very few research studies focus on making AI itself environmentally sustainable. This oversight regarding the sustainability of AI within the field might be attributed to a mindset gap and the absence of comprehensive energy datasets. In addition, with the ubiquity of edge AI systems and applications, especially on-device learning, there is a pressing need to measure, analyze, and optimize their environmental sustainability, such as energy efficiency. To this end, in this paper, we propose large-scale energy datasets for edge AI, named DeepEn2023, covering a wide range of kernels, state-of-the-art deep neural network models, and popular edge AI applications. We anticipate that DeepEn2023 will improve transparency in sustainability in on-device deep learning across a range of edge AI systems and applications. For more information, including access to the dataset and code, please visit https://amai-gsu.github.io/DeepEn2023.

Title: Automating Continual Learning. (arXiv:2312.00276v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00276
Code URL: https://github.com/idsia/automated-cl
Copy Paste: [[2312.00276]] Automating Continual Learning(http://arxiv.org/abs/2312.00276)
Summary:
General-purpose learning systems should improve themselves in open-ended fashion in ever-changing environments. Conventional learning algorithms for neural networks, however, suffer from catastrophic forgetting (CF) -- previously acquired skills are forgotten when a new task is learned. Instead of hand-crafting new algorithms for avoiding CF, we propose Automated Continual Learning (ACL) to train self-referential neural networks to meta-learn their own in-context continual (meta-)learning algorithms. ACL encodes all desiderata -- good performance on both old and new tasks -- into its meta-learning objectives. Our experiments demonstrate that ACL effectively solves "in-context catastrophic forgetting"; our ACL-learned algorithms outperform hand-crafted ones, e.g., on the Split-MNIST benchmark in the replay-free setting, and enables continual learning of diverse tasks consisting of multiple few-shot and standard image classification datasets.

Title: Learning to forecast diagnostic parameters using pre-trained weather embedding. (arXiv:2312.00290v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00290
Code URL: null
Copy Paste: [[2312.00290]] Learning to forecast diagnostic parameters using pre-trained weather embedding(http://arxiv.org/abs/2312.00290)
Summary:
Data-driven weather prediction (DDWP) models are increasingly becoming popular for weather forecasting. However, while operational weather forecasts predict a wide variety of weather variables, DDWPs currently forecast a specific set of key prognostic variables. Non-prognostic ("diagnostic") variables are sometimes modeled separately as dependent variables of the prognostic variables (c.f. FourCastNet), or by including the diagnostic variable as a target in the DDWP. However, the cost of training and deploying bespoke models for each diagnostic variable can increase dramatically with more diagnostic variables, and limit the operational use of such models. Likewise, retraining an entire DDWP each time a new diagnostic variable is added is also cost-prohibitive. We present an two-stage approach that allows new diagnostic variables to be added to an end-to-end DDWP model without the expensive retraining. In the first stage, we train an autoencoder that learns to embed prognostic variables into a latent space. In the second stage, the autoencoder is frozen and "downstream" models are trained to predict diagnostic variables using only the latent representations of prognostic variables as input. Our experiments indicate that models trained using the two-stage approach offer accuracy comparable to training bespoke models, while leading to significant reduction in resource utilization during training and inference. This approach allows for new "downstream" models to be developed as needed, without affecting existing models and thus reducing the friction in operationalizing new models.

Title: Hypergraph Node Representation Learning with One-Stage Message Passing. (arXiv:2312.00336v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00336
Code URL: null
Copy Paste: [[2312.00336]] Hypergraph Node Representation Learning with One-Stage Message Passing(http://arxiv.org/abs/2312.00336)
Summary:
Hypergraphs as an expressive and general structure have attracted considerable attention from various research domains. Most existing hypergraph node representation learning techniques are based on graph neural networks, and thus adopt the two-stage message passing paradigm (i.e. node -> hyperedge -> node). This paradigm only focuses on local information propagation and does not effectively take into account global information, resulting in less optimal representations. Our theoretical analysis of representative two-stage message passing methods shows that, mathematically, they model different ways of local message passing through hyperedges, and can be unified into one-stage message passing (i.e. node -> node). However, they still only model local information. Motivated by this theoretical analysis, we propose a novel one-stage message passing paradigm to model both global and local information propagation for hypergraphs. We integrate this paradigm into HGraphormer, a Transformer-based framework for hypergraph node representation learning. HGraphormer injects the hypergraph structure information (local information) into Transformers (global information) by combining the attention matrix and hypergraph Laplacian. Extensive experiments demonstrate that HGraphormer outperforms recent hypergraph learning methods on five representative benchmark datasets on the semi-supervised hypernode classification task, setting new state-of-the-art performance, with accuracy improvements between 2.52% and 6.70%. Our code and datasets are available.

Title: On the Out-Of-Distribution Robustness of Self-Supervised Representation Learning for Phonocardiogram Signals. (arXiv:2312.00502v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00502
Code URL: https://github.com/aristotelisballas/listen2yourheart
Copy Paste: [[2312.00502]] On the Out-Of-Distribution Robustness of Self-Supervised Representation Learning for Phonocardiogram Signals(http://arxiv.org/abs/2312.00502)
Summary:
Objective: Despite the recent increase in research activity, deep-learning models have not yet been widely accepted in medicine. The shortage of high-quality annotated data often hinders the development of robust and generalizable models, which do not suffer from degraded effectiveness when presented with newly-collected, out-of-distribution (OOD) datasets. Methods: Contrastive Self-Supervised Learning (SSL) offers a potential solution to the scarcity of labeled data as it takes advantage of unlabeled data to increase model effectiveness and robustness. In this research, we propose applying contrastive SSL for detecting abnormalities in phonocardiogram (PCG) samples by learning a generalized representation of the signal. Specifically, we perform an extensive comparative evaluation of a wide range of audio-based augmentations and evaluate trained classifiers on multiple datasets across different downstream tasks. Results: We experimentally demonstrate that, depending on its training distribution, the effectiveness of a fully-supervised model can degrade up to 32% when evaluated on unseen data, while SSL models only lose up to 10% or even improve in some cases. Conclusions: Contrastive SSL pretraining can assist in providing robust classifiers which can generalize to unseen, OOD data, without relying on time- and labor-intensive annotation processes by medical experts. Furthermore, the proposed extensive evaluation protocol sheds light on the most promising and appropriate augmentations for robust PCG signal processing. Significance: We provide researchers and practitioners with a roadmap towards producing robust models for PCG classification, in addition to an open-source codebase for developing novel approaches.

Title: Spatio-Temporal-Decoupled Masked Pre-training for Traffic Forecasting. (arXiv:2312.00516v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00516
Code URL: https://github.com/jimmy-7664/std_mae
Copy Paste: [[2312.00516]] Spatio-Temporal-Decoupled Masked Pre-training for Traffic Forecasting(http://arxiv.org/abs/2312.00516)
Summary:
Accurate forecasting of multivariate traffic flow time series remains challenging due to substantial spatio-temporal heterogeneity and complex long-range correlative patterns. To address this, we propose Spatio-Temporal-Decoupled Masked Pre-training (STD-MAE), a novel framework that employs masked autoencoders to learn and encode complex spatio-temporal dependencies via pre-training. Specifically, we use two decoupled masked autoencoders to reconstruct the traffic data along spatial and temporal axes using a self-supervised pre-training approach. These mask reconstruction mechanisms capture the long-range correlations in space and time separately. The learned hidden representations are then used to augment the downstream spatio-temporal traffic predictor. A series of quantitative and qualitative evaluations on four widely-used traffic benchmarks (PEMS03, PEMS04, PEMS07, and PEMS08) are conducted to verify the state-of-the-art performance, with STD-MAE explicitly enhancing the downstream spatio-temporal models' ability to capture long-range intricate spatial and temporal patterns. Codes are available at https://github.com/Jimmy-7664/STD_MAE.

Title: Tracking Object Positions in Reinforcement Learning: A Metric for Keypoint Detection (extended version). (arXiv:2312.00592v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00592
Code URL: null
Copy Paste: [[2312.00592]] Tracking Object Positions in Reinforcement Learning: A Metric for Keypoint Detection (extended version)(http://arxiv.org/abs/2312.00592)
Summary:
Reinforcement learning (RL) for robot control typically requires a detailed representation of the environment state, including information about task-relevant objects not directly measurable. Keypoint detectors, such as spatial autoencoders (SAEs), are a common approach to extracting a low-dimensional representation from high-dimensional image data. SAEs aim at spatial features such as object positions, which are often useful representations in robotic RL. However, whether an SAE is actually able to track objects in the scene and thus yields a spatial state representation well suited for RL tasks has rarely been examined due to a lack of established metrics. In this paper, we propose to assess the performance of an SAE instance by measuring how well keypoints track ground truth objects in images. We present a computationally lightweight metric and use it to evaluate common baseline SAE architectures on image data from a simulated robot task. We find that common SAEs differ substantially in their spatial extraction capability. Furthermore, we validate that SAEs that perform well in our metric achieve superior performance when used in downstream RL. Thus, our metric is an effective and lightweight indicator of RL performance before executing expensive RL training. Building on these insights, we identify three key modifications of SAE architectures to improve tracking performance. We make our code available at anonymous.4open.science/r/sae-rl.

chat

retrieval augmented generation

rag

Title: Target-agnostic Source-free Domain Adaptation for Regression Tasks. (arXiv:2312.00540v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00540
Code URL: null
Copy Paste: [[2312.00540]] Target-agnostic Source-free Domain Adaptation for Regression Tasks(http://arxiv.org/abs/2312.00540)
Summary:
Unsupervised domain adaptation (UDA) seeks to bridge the domain gap between the target and source using unlabeled target data. Source-free UDA removes the requirement for labeled source data at the target to preserve data privacy and storage. However, work on source-free UDA assumes knowledge of domain gap distribution, and hence is limited to either target-aware or classification task. To overcome it, we propose TASFAR, a novel target-agnostic source-free domain adaptation approach for regression tasks. Using prediction confidence, TASFAR estimates a label density map as the target label distribution, which is then used to calibrate the source model on the target domain. We have conducted extensive experiments on four regression tasks with various domain gaps, namely, pedestrian dead reckoning for different users, image-based people counting in different scenes, housing-price prediction at different districts, and taxi-trip duration prediction from different departure points. TASFAR is shown to substantially outperform the state-of-the-art source-free UDA approaches by averagely reducing 22% errors for the four tasks and achieve notably comparable accuracy as source-based UDA without using source data.

Title: Simple Transferability Estimation for Regression Tasks. (arXiv:2312.00656v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00656
Code URL: https://github.com/cuongnn218/regression_transferability
Copy Paste: [[2312.00656]] Simple Transferability Estimation for Regression Tasks(http://arxiv.org/abs/2312.00656)
Summary:
We consider transferability estimation, the problem of estimating how well deep learning models transfer from a source to a target task. We focus on regression tasks, which received little previous attention, and propose two simple and computationally efficient approaches that estimate transferability based on the negative regularized mean squared error of a linear regression model. We prove novel theoretical results connecting our approaches to the actual transferability of the optimal target models obtained from the transfer learning process. Despite their simplicity, our approaches significantly outperform existing state-of-the-art regression transferability estimators in both accuracy and efficiency. On two large-scale keypoint regression benchmarks, our approaches yield 12% to 36% better results on average while being at least 27% faster than previous state-of-the-art methods.

Title: Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space. (arXiv:2312.00727v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00727
Code URL: null
Copy Paste: [[2312.00727]] Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space(http://arxiv.org/abs/2312.00727)
Summary:
This paper delves into the problem of safe reinforcement learning (RL) in a partially observable environment with the aim of achieving safe-reachability objectives. In traditional partially observable Markov decision processes (POMDP), ensuring safety typically involves estimating the belief in latent states. However, accurately estimating an optimal Bayesian filter in POMDP to infer latent states from observations in a continuous state space poses a significant challenge, largely due to the intractable likelihood. To tackle this issue, we propose a stochastic model-based approach that guarantees RL safety almost surely in the face of unknown system dynamics and partial observation environments. We leveraged the Predictive State Representation (PSR) and Reproducing Kernel Hilbert Space (RKHS) to represent future multi-step observations analytically, and the results in this context are provable. Furthermore, we derived essential operators from the kernel Bayes' rule, enabling the recursive estimation of future observations using various operators. Under the assumption of \textit{undercompleness}, a polynomial sample complexity is established for the RL algorithm for the infinite size of observation and action spaces, ensuring an $\epsilon-$suboptimal safe policy guarantee.

Title: GFN-SR: Symbolic Regression with Generative Flow Networks. (arXiv:2312.00396v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00396
Code URL: null
Copy Paste: [[2312.00396]] GFN-SR: Symbolic Regression with Generative Flow Networks(http://arxiv.org/abs/2312.00396)
Summary:
Symbolic regression (SR) is an area of interpretable machine learning that aims to identify mathematical expressions, often composed of simple functions, that best fit in a given set of covariates $X$ and response $y$. In recent years, deep symbolic regression (DSR) has emerged as a popular method in the field by leveraging deep reinforcement learning to solve the complicated combinatorial search problem. In this work, we propose an alternative framework (GFN-SR) to approach SR with deep learning. We model the construction of an expression tree as traversing through a directed acyclic graph (DAG) so that GFlowNet can learn a stochastic policy to generate such trees sequentially. Enhanced with an adaptive reward baseline, our method is capable of generating a diverse set of best-fitting expressions. Notably, we observe that GFN-SR outperforms other SR algorithms in noisy data regimes, owing to its ability to learn a distribution of rewards over a space of candidate solutions.

Title: A Causality-Aware Pattern Mining Scheme for Group Activity Recognition in a Pervasive Sensor Space. (arXiv:2312.00404v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00404
Code URL: null
Copy Paste: [[2312.00404]] A Causality-Aware Pattern Mining Scheme for Group Activity Recognition in a Pervasive Sensor Space(http://arxiv.org/abs/2312.00404)
Summary:
Human activity recognition (HAR) is a key challenge in pervasive computing and its solutions have been presented based on various disciplines. Specifically, for HAR in a smart space without privacy and accessibility issues, data streams generated by deployed pervasive sensors are leveraged. In this paper, we focus on a group activity by which a group of users perform a collaborative task without user identification and propose an efficient group activity recognition scheme which extracts causality patterns from pervasive sensor event sequences generated by a group of users to support as good recognition accuracy as the state-of-the-art graphical model. To filter out irrelevant noise events from a given data stream, a set of rules is leveraged to highlight causally related events. Then, a pattern-tree algorithm extracts frequent causal patterns by means of a growing tree structure. Based on the extracted patterns, a weighted sum-based pattern matching algorithm computes the likelihoods of stored group activities to the given test event sequence by means of matched event pattern counts for group activity recognition. We evaluate the proposed scheme using the data collected from our testbed and CASAS datasets where users perform their tasks on a daily basis and validate its effectiveness in a real environment. Experiment results show that the proposed scheme performs higher recognition accuracy and with a small amount of runtime overhead than the existing schemes.

Title: Interpretable Meta-Learning of Physical Systems. (arXiv:2312.00477v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00477
Code URL: null
Copy Paste: [[2312.00477]] Interpretable Meta-Learning of Physical Systems(http://arxiv.org/abs/2312.00477)
Summary:
Machine learning methods can be a valuable aid in the scientific process, but they need to face challenging settings where data come from inhomogeneous experimental conditions. Recent meta-learning methods have made significant progress in multi-task learning, but they rely on black-box neural networks, resulting in high computational costs and limited interpretability. Leveraging the structure of the learning problem, we argue that multi-environment generalization can be achieved using a simpler learning model, with an affine structure with respect to the learning task. Crucially, we prove that this architecture can identify the physical parameters of the system, enabling interpreable learning. We demonstrate the competitive generalization performance and the low computational cost of our method by comparing it to state-of-the-art algorithms on physical systems, ranging from toy models to complex, non-analytical systems. The interpretability of our method is illustrated with original applications to physical-parameter-induced adaptation and to adaptive control.

Title: REDUCR: Robust Data Downsampling Using Class Priority Reweighting. (arXiv:2312.00486v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00486
Code URL: null
Copy Paste: [[2312.00486]] REDUCR: Robust Data Downsampling Using Class Priority Reweighting(http://arxiv.org/abs/2312.00486)
Summary:
Modern machine learning models are becoming increasingly expensive to train for real-world image and text classification tasks, where massive web-scale data is collected in a streaming fashion. To reduce the training cost, online batch selection techniques have been developed to choose the most informative datapoints. However, these techniques can suffer from poor worst-class generalization performance due to class imbalance and distributional shifts. This work introduces REDUCR, a robust and efficient data downsampling method that uses class priority reweighting. REDUCR reduces the training data while preserving worst-class generalization performance. REDUCR assigns priority weights to datapoints in a class-aware manner using an online learning algorithm. We demonstrate the data efficiency and robust performance of REDUCR on vision and text classification tasks. On web-scraped datasets with imbalanced class distributions, REDUCR significantly improves worst-class test accuracy (and average accuracy), surpassing state-of-the-art methods by around 15%.

Title: SpaCE: The Spatial Confounding Environment. (arXiv:2312.00710v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00710
Code URL: null
Copy Paste: [[2312.00710]] SpaCE: The Spatial Confounding Environment(http://arxiv.org/abs/2312.00710)
Summary:
Spatial confounding poses a significant challenge in scientific studies involving spatial data, where unobserved spatial variables can influence both treatment and outcome, possibly leading to spurious associations. To address this problem, we introduce SpaCE: The Spatial Confounding Environment, the first toolkit to provide realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. It also includes realistic semi-synthetic outcomes and counterfactuals, generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. The datasets cover real treatment and covariates from diverse domains, including climate, health and social sciences. SpaCE facilitates an automated end-to-end pipeline, simplifying data loading, experimental setup, and evaluating machine learning and causal inference models. The SpaCE project provides several dozens of datasets of diverse sizes and spatial complexity. It is publicly available as a Python package, encouraging community feedback and contributions.

language model

Title: Agent-OM: Leveraging Large Language Models for Ontology Matching. (arXiv:2312.00326v1 [cs.AI])

Title: On Exploring the Reasoning Capability of Large Language Models with Knowledge Graphs. (arXiv:2312.00353v1 [cs.CL])

Title: A Bayesian approach for prompt optimization in pre-trained language models. (arXiv:2312.00471v1 [cs.LG])

Title: SurreyAI 2023 Submission for the Quality Estimation Shared Task. (arXiv:2312.00525v1 [cs.CL])

Title: Questioning Biases in Case Judgment Summaries: Legal Datasets or Large Language Models?. (arXiv:2312.00554v1 [cs.CL])

Title: Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals. (arXiv:2312.00751v1 [cs.CL])

Title: SEPSIS: I Can Catch Your Lies -- A New Paradigm for Deception Detection. (arXiv:2312.00292v1 [cs.CL])

Title: CoLLiE: Collaborative Training of Large Language Models in an Efficient Way. (arXiv:2312.00407v1 [cs.CL])

Title: Summarization-based Data Augmentation for Document Classification. (arXiv:2312.00513v1 [cs.CL])

Title: Explanatory Argument Extraction of Correct Answers in Resident Medical Exams. (arXiv:2312.00567v1 [cs.CL])

Title: Nonparametric Variational Regularisation of Pretrained Transformers. (arXiv:2312.00662v1 [cs.LG])

Title: The Efficiency Spectrum of Large Language Models: An Algorithmic Survey. (arXiv:2312.00678v1 [cs.CL])

Title: Contextualized word senses: from attention to compositionality. (arXiv:2312.00680v1 [cs.CL])

Title: SeaLLMs -- Large Language Models for Southeast Asia. (arXiv:2312.00738v1 [cs.CL])

Title: LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices. (arXiv:2312.00388v1 [cs.LG])

Title: Pathway to a fully data-driven geotechnics: lessons from materials informatics. (arXiv:2312.00581v1 [cs.LG])

Title: Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation. (arXiv:2312.00645v1 [cs.LG])

gpt

Title: Robust Concept Erasure via Kernelized Rate-Distortion Maximization. (arXiv:2312.00194v1 [cs.LG])

llm

Title: Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games. (arXiv:2312.00746v1 [cs.AI])

Title: Instruction-tuning Aligns LLMs to the Human Brain. (arXiv:2312.00575v1 [cs.CL])

long context

lora

Title: Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration. (arXiv:2312.00267v1 [cs.LG])

Title: Meta-Diversity Search in Complex Systems, A Recipe for Artificial Open-Endedness ?. (arXiv:2312.00455v1 [cs.AI])

Title: Relevance-guided Neural Machine Translation. (arXiv:2312.00214v1 [cs.CL])

Title: Improving Unsupervised Relation Extraction by Augmenting Diverse Sentence Pairs. (arXiv:2312.00552v1 [cs.CL])

hallucination

prompt

code

Title: PEFTDebias : Capturing debiasing information using PEFTs. (arXiv:2312.00434v1 [cs.LG])

Title: Japanese Tort-case Dataset for Rationale-supported Legal Judgment Prediction. (arXiv:2312.00480v1 [cs.CL])

Title: Removing Biases from Molecular Representations via Information Maximization. (arXiv:2312.00718v1 [cs.LG])

Title: Text Attribute Control via Closed-Loop Disentanglement. (arXiv:2312.00277v1 [cs.LG])

Title: PsyAttention: Psychological Attention Model for Personality Detection. (arXiv:2312.00293v1 [cs.CL])

Title: DeepEn2023: Energy Datasets for Edge Artificial Intelligence. (arXiv:2312.00103v1 [cs.LG])

Title: Automating Continual Learning. (arXiv:2312.00276v1 [cs.LG])

Title: Learning to forecast diagnostic parameters using pre-trained weather embedding. (arXiv:2312.00290v1 [cs.LG])

Title: Hypergraph Node Representation Learning with One-Stage Message Passing. (arXiv:2312.00336v1 [cs.LG])

Title: On the Out-Of-Distribution Robustness of Self-Supervised Representation Learning for Phonocardiogram Signals. (arXiv:2312.00502v1 [cs.LG])

Title: Spatio-Temporal-Decoupled Masked Pre-training for Traffic Forecasting. (arXiv:2312.00516v1 [cs.LG])

Title: Tracking Object Positions in Reinforcement Learning: A Metric for Keypoint Detection (extended version). (arXiv:2312.00592v1 [cs.LG])

chat

retrieval augmented generation

rag

Title: Target-agnostic Source-free Domain Adaptation for Regression Tasks. (arXiv:2312.00540v1 [cs.LG])

Title: Simple Transferability Estimation for Regression Tasks. (arXiv:2312.00656v1 [cs.LG])

Title: Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space. (arXiv:2312.00727v1 [cs.LG])

Title: GFN-SR: Symbolic Regression with Generative Flow Networks. (arXiv:2312.00396v1 [cs.LG])

Title: A Causality-Aware Pattern Mining Scheme for Group Activity Recognition in a Pervasive Sensor Space. (arXiv:2312.00404v1 [cs.LG])

Title: Interpretable Meta-Learning of Physical Systems. (arXiv:2312.00477v1 [cs.LG])

Title: REDUCR: Robust Data Downsampling Using Class Priority Reweighting. (arXiv:2312.00486v1 [cs.LG])

Title: SpaCE: The Spatial Confounding Environment. (arXiv:2312.00710v1 [cs.LG])

multi-run

chain-of-thought

tree-of-thought