2024-01-02

language model

Title: Turing's Test, a Beautiful Thought Experiment. (arXiv:2401.00009v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00009
Code URL: null
Copy Paste: [[2401.00009]] Turing's Test, a Beautiful Thought Experiment(http://arxiv.org/abs/2401.00009)
Summary:
In the wake of large language models, there has been a resurgence of claims and questions about the Turing test and its value for AI, which are reminiscent of decades of practical "Turing" tests. If AI were quantum physics, by now several "Schr\"odinger's" cats could have been killed. Better late than never, it is time for a historical reconstruction of Turing's beautiful thought experiment. In this paper I present a wealth of evidence, including new archival sources, give original answers to several open questions about Turing's 1950 paper, and address the core question of the value of Turing's test.

Title: Is Knowledge All Large Language Models Needed for Causal Reasoning?. (arXiv:2401.00139v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00139
Code URL: null
Copy Paste: [[2401.00139]] Is Knowledge All Large Language Models Needed for Causal Reasoning?(http://arxiv.org/abs/2401.00139)
Summary:
This paper explores the causal reasoning of large language models (LLMs) to enhance their interpretability and reliability in advancing artificial intelligence. Despite the proficiency of LLMs in a range of tasks, their potential for understanding causality requires further exploration. We propose a novel causal attribution model that utilizes "do-operators" for constructing counterfactual scenarios, allowing us to systematically quantify the influence of input numerical data and LLMs' pre-existing knowledge on their causal reasoning processes. Our newly developed experimental setup assesses LLMs' reliance on contextual information and inherent knowledge across various domains. Our evaluation reveals that LLMs' causal reasoning ability depends on the context and domain-specific knowledge provided, and supports the argument that "knowledge is, indeed, what LLMs principally require for sound causal reasoning". On the contrary, in the absence of knowledge, LLMs still maintain a degree of causal reasoning using the available numerical data, albeit with limitations in the calculations.

Title: ReasoningLM: Enabling Structural Subgraph Reasoning in Pre-trained Language Models for Question Answering over Knowledge Graph. (arXiv:2401.00158v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00158
Code URL: null
Copy Paste: [[2401.00158]] ReasoningLM: Enabling Structural Subgraph Reasoning in Pre-trained Language Models for Question Answering over Knowledge Graph(http://arxiv.org/abs/2401.00158)
Summary:
Question Answering over Knowledge Graph (KGQA) aims to seek answer entities for the natural language question from a large-scale Knowledge Graph~(KG). To better perform reasoning on KG, recent work typically adopts a pre-trained language model~(PLM) to model the question, and a graph neural network~(GNN) based module to perform multi-hop reasoning on the KG. Despite the effectiveness, due to the divergence in model architecture, the PLM and GNN are not closely integrated, limiting the knowledge sharing and fine-grained feature interactions. To solve it, we aim to simplify the above two-module approach, and develop a more capable PLM that can directly support subgraph reasoning for KGQA, namely ReasoningLM. In our approach, we propose a subgraph-aware self-attention mechanism to imitate the GNN for performing structured reasoning, and also adopt an adaptation tuning strategy to adapt the model parameters with 20,000 subgraphs with synthesized questions. After adaptation, the PLM can be parameter-efficient fine-tuned on downstream tasks. Experiments show that ReasoningLM surpasses state-of-the-art models by a large margin, even with fewer updated parameters and less training data. Our codes and data are publicly available at~\url{https://github.com/RUCAIBox/ReasoningLM}.

Title: Open-TI: Open Traffic Intelligence with Augmented Language Model. (arXiv:2401.00211v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00211
Code URL: null
Copy Paste: [[2401.00211]] Open-TI: Open Traffic Intelligence with Augmented Language Model(http://arxiv.org/abs/2401.00211)
Summary:
Transportation has greatly benefited the cities' development in the modern civilization process. Intelligent transportation, leveraging advanced computer algorithms, could further increase people's daily commuting efficiency. However, intelligent transportation, as a cross-discipline, often requires practitioners to comprehend complicated algorithms and obscure neural networks, bringing a challenge for the advanced techniques to be trusted and deployed in practical industries. Recognizing the expressiveness of the pre-trained large language models, especially the potential of being augmented with abilities to understand and execute intricate commands, we introduce Open-TI. Serving as a bridge to mitigate the industry-academic gap, Open-TI is an innovative model targeting the goal of Turing Indistinguishable Traffic Intelligence, it is augmented with the capability to harness external traffic analysis packages based on existing conversations. Marking its distinction, Open-TI is the first method capable of conducting exhaustive traffic analysis from scratch - spanning from map data acquisition to the eventual execution in complex simulations. Besides, Open-TI is able to conduct task-specific embodiment like training and adapting the traffic signal control policies (TSC), explore demand optimizations, etc. Furthermore, we explored the viability of LLMs directly serving as control agents, by understanding the expected intentions from Open-TI, we designed an agent-to-agent communication mode to support Open-TI conveying messages to ChatZero (control agent), and then the control agent would choose from the action space to proceed the execution. We eventually provide the formal implementation structure, and the open-ended design invites further community-driven enhancements.

Title: Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks. (arXiv:2401.00290v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00290
Code URL: null
Copy Paste: [[2401.00290]] Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks(http://arxiv.org/abs/2401.00290)
Summary:
We consider the problem of red teaming LLMs on elementary calculations and algebraic tasks to evaluate how various prompting techniques affect the quality of outputs. We present a framework to procedurally generate numerical questions and puzzles, and compare the results with and without the application of several red teaming techniques. Our findings suggest that even though structured reasoning and providing worked-out examples slow down the deterioration of the quality of answers, the gpt-3.5-turbo and gpt-4 models are not well suited for elementary calculations and reasoning tasks, also when being red teamed.

Title: A Reliable Knowledge Processing Framework for Combustion Science using Foundation Models. (arXiv:2401.00544v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00544
Code URL: null
Copy Paste: [[2401.00544]] A Reliable Knowledge Processing Framework for Combustion Science using Foundation Models(http://arxiv.org/abs/2401.00544)
Summary:
This research explores the integration of large language models (LLMs) into scientific data assimilation, focusing on combustion science as a case study. Leveraging foundational models integrated with Retrieval-Augmented Generation (RAG) framework, the study introduces an approach to process diverse combustion research data, spanning experimental studies, simulations, and literature. The multifaceted nature of combustion research emphasizes the critical role of knowledge processing in navigating and extracting valuable information from a vast and diverse pool of sources. The developed approach minimizes computational and economic expenses while optimizing data privacy and accuracy. It incorporates prompt engineering and offline open-source LLMs, offering user autonomy in selecting base models. The study provides a thorough examination of text segmentation strategies, conducts comparative studies between LLMs, and explores various optimized prompts to demonstrate the effectiveness of the framework. By incorporating an external database, the framework outperforms a conventional LLM in generating accurate responses and constructing robust arguments. Additionally, the study delves into the investigation of optimized prompt templates for the purpose of efficient extraction of scientific literature. The research addresses concerns related to hallucinations and false research articles by introducing a custom workflow developed with a detection algorithm to filter out inaccuracies. Despite identified areas for improvement, the framework consistently delivers accurate domain-specific responses with minimal human oversight. The prompt-agnostic approach introduced holds promise for future deliberations. The study underscores the significance of integrating LLMs and knowledge processing techniques in scientific research, providing a foundation for advancements in data assimilation and utilization.

Title: AllSpark: a multimodal spatiotemporal general model. (arXiv:2401.00546v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00546
Code URL: null
Copy Paste: [[2401.00546]] AllSpark: a multimodal spatiotemporal general model(http://arxiv.org/abs/2401.00546)
Summary:
For a long time, due to the high heterogeneity in structure and semantics among various spatiotemporal modal data, the joint interpretation of multimodal spatiotemporal data has been an extremely challenging problem. The primary challenge resides in striking a trade-off between the cohesion and autonomy of diverse modalities, and this trade-off exhibits a progressively nonlinear nature as the number of modalities expands. We introduce the Language as Reference Framework (LaRF), a fundamental principle for constructing a multimodal unified model, aiming to strike a trade-off between the cohesion and autonomy among different modalities. We propose a multimodal spatiotemporal general artificial intelligence model, called AllSpark. Our model integrates thirteen different modalities into a unified framework, including 1D (text, code), 2D (RGB, infrared, SAR, multispectral, hyperspectral, tables, graphs, trajectory, oblique photography), and 3D (point clouds, videos) modalities. To achieve modal cohesion, AllSpark uniformly maps diverse modal features to the language modality. In addition, we design modality-specific prompts to guide multi-modal large language models in accurately perceiving multimodal data. To maintain modality autonomy, AllSpark introduces modality-specific encoders to extract the tokens of various spatiotemporal modalities. And modal bridge is employed to achieve dimensional projection from each modality to the language modality. Finally, observing a gap between the model's interpretation and downstream tasks, we designed task heads to enhance the model's generalization capability on specific downstream tasks. Experiments indicate that AllSpark achieves competitive accuracy in modalities such as RGB and trajectory compared to state-of-the-art models.

Title: Exploring the Effectiveness of Instruction Tuning in Biomedical Language Processing. (arXiv:2401.00579v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00579
Code URL: null
Copy Paste: [[2401.00579]] Exploring the Effectiveness of Instruction Tuning in Biomedical Language Processing(http://arxiv.org/abs/2401.00579)
Summary:
Large Language Models (LLMs), particularly those similar to ChatGPT, have significantly influenced the field of Natural Language Processing (NLP). While these models excel in general language tasks, their performance in domain-specific downstream tasks such as biomedical and clinical Named Entity Recognition (NER), Relation Extraction (RE), and Medical Natural Language Inference (NLI) is still evolving. In this context, our study investigates the potential of instruction tuning for biomedical language processing, applying this technique to two general LLMs of substantial scale. We present a comprehensive, instruction-based model trained on a dataset that consists of approximately $200,000$ instruction-focused samples. This dataset represents a carefully curated compilation of existing data, meticulously adapted and reformatted to align with the specific requirements of our instruction-based tasks. This initiative represents an important step in utilising such models to achieve results on par with specialised encoder-only models like BioBERT and BioClinicalBERT for various classical biomedical NLP tasks. Our work includes an analysis of the dataset's composition and its impact on model performance, providing insights into the intricacies of instruction tuning. By sharing our codes, models, and the distinctively assembled instruction-based dataset, we seek to encourage ongoing research and development in this area.

Title: Fairness in Serving Large Language Models. (arXiv:2401.00588v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00588
Code URL: null
Copy Paste: [[2401.00588]] Fairness in Serving Large Language Models(http://arxiv.org/abs/2401.00588)
Summary:
High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading. To ensure that all client requests are processed fairly, most major LLM inference services have request rate limits, to ensure that no client can dominate the request queue. However, this rudimentary notion of fairness also results in under-utilization of the resources and poor client experience when there is spare capacity. While there is a rich literature on fair scheduling, serving LLMs presents new challenges due to their unpredictable request lengths and their unique batching characteristics on parallel accelerators. This paper introduces the definition of LLM serving fairness based on a cost function that accounts for the number of input and output tokens processed. To achieve fairness in serving, we propose a novel scheduling algorithm, the Virtual Token Counter (VTC), a fair scheduler based on the continuous batching mechanism. We prove a 2x tight upper bound on the service difference between two backlogged clients, adhering to the requirement of work-conserving. Through extensive experiments, we demonstrate the superior performance of VTC in ensuring fairness, especially in contrast to other baseline methods, which exhibit shortcomings under various conditions.

Title: Large language model for Bible sentiment analysis: Sermon on the Mount. (arXiv:2401.00689v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00689
Code URL: null
Copy Paste: [[2401.00689]] Large language model for Bible sentiment analysis: Sermon on the Mount(http://arxiv.org/abs/2401.00689)
Summary:
The revolution of natural language processing via large language models has motivated its use in multidisciplinary areas that include social sciences and humanities and more specifically, comparative religion. Sentiment analysis provides a mechanism to study the emotions expressed in text. Recently, sentiment analysis has been used to study and compare translations of the Bhagavad Gita, which is a fundamental and sacred Hindu text. In this study, we use sentiment analysis for studying selected chapters of the Bible. These chapters are known as the Sermon on the Mount. We utilize a pre-trained language model for sentiment analysis by reviewing five translations of the Sermon on the Mount, which include the King James version, the New International Version, the New Revised Standard Version, the Lamsa Version, and the Basic English Version. We provide a chapter-by-chapter and verse-by-verse comparison using sentiment and semantic analysis and review the major sentiments expressed. Our results highlight the varying sentiments across the chapters and verses. We found that the vocabulary of the respective translations is significantly different. We detected different levels of humour, optimism, and empathy in the respective chapters that were used by Jesus to deliver his message.

Title: Large Language Models aren't all that you need. (arXiv:2401.00698v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00698
Code URL: null
Copy Paste: [[2401.00698]] Large Language Models aren't all that you need(http://arxiv.org/abs/2401.00698)
Summary:
This paper describes the architecture and systems built towards solving the SemEval 2023 Task 2: MultiCoNER II (Multilingual Complex Named Entity Recognition) [1]. We evaluate two approaches (a) a traditional Conditional Random Fields model and (b) a Large Language Model (LLM) fine-tuned with a customized head and compare the two approaches. The novel ideas explored are: 1) Decaying auxiliary loss (with residual) - where we train the model on an auxiliary task of Coarse-Grained NER and include this task as a part of the loss function 2) Triplet token blending - where we explore ways of blending the embeddings of neighboring tokens in the final NER layer prior to prediction 3) Task-optimal heads - where we explore a variety of custom heads and learning rates for the final layer of the LLM. We also explore multiple LLMs including GPT-3 and experiment with a variety of dropout and other hyperparameter settings before arriving at our final model which achieves micro & macro f1 of 0.85/0.84 (on dev) and 0.67/0.61 on the test data . We show that while pre-trained LLMs, by themselves, bring about a large improvement in scores as compared to traditional models, we also demonstrate that tangible improvements to the Macro-F1 score can be made by augmenting the LLM with additional feature/loss/model engineering techniques described above.

Title: ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios. (arXiv:2401.00741v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00741
Code URL: null
Copy Paste: [[2401.00741]] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios(http://arxiv.org/abs/2401.00741)
Summary:
Existing evaluations of tool learning primarily focus on validating the alignment of selected tools for large language models (LLMs) with expected outcomes. However, these approaches rely on a limited set of scenarios where answers can be pre-determined, diverging from genuine needs. Furthermore, a sole emphasis on outcomes disregards the intricate capabilities essential for LLMs to effectively utilize tools. To tackle this issue, we propose ToolEyes, a fine-grained system tailored for the evaluation of the LLMs' tool learning capabilities in authentic scenarios. The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning: format alignment, intent comprehension, behavior planning, tool selection, and answer organization. Additionally, ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary between LLMs and the physical world. Evaluations involving ten LLMs across three categories reveal a preference for specific scenarios and limited cognitive abilities in tool learning. Intriguingly, expanding the model size even exacerbates the hindrance to tool learning. These findings offer instructive insights aimed at advancing the field of tool learning. The data is available att https://github.com/Junjie-Ye/ToolEyes.git.

Title: Temporal Validity Change Prediction. (arXiv:2401.00779v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00779
Code URL: null
Copy Paste: [[2401.00779]] Temporal Validity Change Prediction(http://arxiv.org/abs/2401.00779)
Summary:
Temporal validity is an important property of text that is useful for many downstream applications, such as recommender systems, conversational AI, or story understanding. Existing benchmarking tasks often require models to identify the temporal validity duration of a single statement. However, in many cases, additional contextual information, such as sentences in a story or posts on a social media profile, can be collected from the available text stream. This contextual information may greatly alter the duration for which a statement is expected to be valid. We propose Temporal Validity Change Prediction, a natural language processing task benchmarking the capability of machine learning models to detect contextual statements that induce such change. We create a dataset consisting of temporal target statements sourced from Twitter and crowdsource sample context statements. We then benchmark a set of transformer-based language models on our dataset. Finally, we experiment with temporal validity duration prediction as an auxiliary task to improve the performance of the state-of-the-art model.

Title: Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models. (arXiv:2401.00788v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00788
Code URL: null
Copy Paste: [[2401.00788]] Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models(http://arxiv.org/abs/2401.00788)
Summary:
The high cost of full-parameter fine-tuning (FFT) of Large Language Models (LLMs) has led to a series of parameter-efficient fine-tuning (PEFT) methods. However, it remains unclear which methods provide the best cost-performance trade-off at different model scales. We introduce Astraios, a suite of 28 instruction-tuned OctoCoder models using 7 tuning methods and 4 model sizes up to 16 billion parameters. Through investigations across 5 tasks and 8 different datasets encompassing both code comprehension and code generation tasks, we find that FFT generally leads to the best downstream performance across all scales, and PEFT methods differ significantly in their efficacy based on the model scale. LoRA usually offers the most favorable trade-off between cost and performance. Further investigation into the effects of these methods on both model robustness and code security reveals that larger models tend to demonstrate reduced robustness and less security. At last, we explore the relationships among updated parameters, cross-entropy loss, and task performance. We find that the tuning effectiveness observed in small models generalizes well to larger models, and the validation loss in instruction tuning can be a reliable indicator of overall downstream performance.

Title: Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education. (arXiv:2401.00832v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00832
Code URL: null
Copy Paste: [[2401.00832]] Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education(http://arxiv.org/abs/2401.00832)
Summary:
The integration of Artificial Intelligence (AI), particularly Large Language Model (LLM)-based systems, in education has shown promise in enhancing teaching and learning experiences. However, the advent of Multimodal Large Language Models (MLLMs) like GPT-4 with vision (GPT-4V), capable of processing multimodal data including text, sound, and visual inputs, opens a new era of enriched, personalized, and interactive learning landscapes in education. Grounded in theory of multimedia learning, this paper explores the transformative role of MLLMs in central aspects of science education by presenting exemplary innovative learning scenarios. Possible applications for MLLMs could range from content creation to tailored support for learning, fostering competencies in scientific practices, and providing assessment and feedback. These scenarios are not limited to text-based and uni-modal formats but can be multimodal, increasing thus personalization, accessibility, and potential learning effectiveness. Besides many opportunities, challenges such as data protection and ethical considerations become more salient, calling for robust frameworks to ensure responsible integration. This paper underscores the necessity for a balanced approach in implementing MLLMs, where the technology complements rather than supplants the educator's role, ensuring thus an effective and ethical use of AI in science education. It calls for further research to explore the nuanced implications of MLLMs on the evolving role of educators and to extend the discourse beyond science education to other disciplines. Through the exploration of potentials, challenges, and future implications, we aim to contribute to a preliminary understanding of the transformative trajectory of MLLMs in science education and beyond.

Title: The Problem of Alignment. (arXiv:2401.00210v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00210
Code URL: null
Copy Paste: [[2401.00210]] The Problem of Alignment(http://arxiv.org/abs/2401.00210)
Summary:
Large Language Models produce sequences learned as statistical patterns from large corpora. In order not to reproduce corpus biases, after initial training models must be aligned with human values, preferencing certain continuations over others. Alignment, which can be viewed as the superimposition of normative structure onto a statistical model, reveals a conflicted and complex interrelationship between language and technology. This relationship shapes theories of language, linguistic practice and subjectivity, which are especially relevant to the current sophistication in artificially produced text. We examine this practice of structuration as a two-way interaction between users and models by analysing how ChatGPT4 redacts perceived `anomalous' language in fragments of Joyce's Ulysses and the new linguistic practice of prompt engineering. We then situate this alignment problem historically, revisiting earlier postwar linguistic debates which counterposed two views of meaning: as discrete structures, and as continuous probability distributions. We discuss the largely occluded work of the Moscow Linguistic School, which sought to reconcile this opposition. Our attention to the Moscow School and later related arguments by Searle and Kristeva casts the problem of alignment in a new light: as one involving attention to the social structuration of linguistic practice, including structuration of anomalies that, like the Joycean text, exist in defiance of expressive conventions. These debates around the communicative orientation toward language can help explain some of the contemporary behaviours and interdependencies that take place between users and LLMs.

Title: Boosting Large Language Model for Speech Synthesis: An Empirical Study. (arXiv:2401.00246v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00246
Code URL: null
Copy Paste: [[2401.00246]] Boosting Large Language Model for Speech Synthesis: An Empirical Study(http://arxiv.org/abs/2401.00246)
Summary:
Large language models (LLMs) have made significant advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision. Nevertheless, most of the previous work focuses on prompting LLMs with perception abilities like auditory comprehension, and the effective approach for augmenting LLMs with speech synthesis capabilities remains ambiguous. In this paper, we conduct a comprehensive empirical exploration of boosting LLMs with the ability to generate speech, by combining pre-trained LLM LLaMA/OPT and text-to-speech synthesis model VALL-E. We compare three integration methods between LLMs and speech synthesis models, including directly fine-tuned LLMs, superposed layers of LLMs and VALL-E, and coupled LLMs and VALL-E using LLMs as a powerful text encoder. Experimental results show that, using LoRA method to fine-tune LLMs directly to boost the speech synthesis capability does not work well, and superposed LLMs and VALL-E can improve the quality of generated speech both in speaker similarity and word error rate (WER). Among these three methods, coupled methods leveraging LLMs as the text encoder can achieve the best performance, making it outperform original speech synthesis models with a consistently better speaker similarity and a significant (10.9%) WER reduction.

Title: Evaluation is all you need. Prompting Generative Large Language Models for Annotation Tasks in the Social Sciences. A Primer using Open Models. (arXiv:2401.00284v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00284
Code URL: null
Copy Paste: [[2401.00284]] Evaluation is all you need(http://arxiv.org/abs/2401.00284)
Summary:
This paper explores the use of open generative Large Language Models (LLMs) for annotation tasks in the social sciences. The study highlights the challenges associated with proprietary models, such as limited reproducibility and privacy concerns, and advocates for the adoption of open (source) models that can be operated on independent devices. Two examples of annotation tasks, sentiment analysis in tweets and identification of leisure activities in childhood aspirational essays are provided. The study evaluates the performance of different prompting strategies and models (neural-chat-7b-v3-2, Starling-LM-7B-alpha, openchat_3.5, zephyr-7b-alpha and zephyr-7b-beta). The results indicate the need for careful validation and tailored prompt engineering. The study highlights the advantages of open models for data privacy and reproducibility.

Title: Improving Text Embeddings with Large Language Models. (arXiv:2401.00368v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00368
Code URL: null
Copy Paste: [[2401.00368]] Improving Text Embeddings with Large Language Models(http://arxiv.org/abs/2401.00368)
Summary:
In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps. Unlike existing methods that often depend on multi-stage intermediate pre-training with billions of weakly-supervised text pairs, followed by fine-tuning with a few labeled datasets, our method does not require building complex training pipelines or relying on manually collected datasets that are often constrained by task diversity and language coverage. We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across nearly 100 languages. We then fine-tune open-source decoder-only LLMs on the synthetic data using standard contrastive loss. Experiments demonstrate that our method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data. Furthermore, when fine-tuned with a mixture of synthetic and labeled data, our model sets new state-of-the-art results on the BEIR and MTEB benchmarks.

Title: FusionMind -- Improving question and answering with external context fusion. (arXiv:2401.00388v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00388
Code URL: null
Copy Paste: [[2401.00388]] FusionMind -- Improving question and answering with external context fusion(http://arxiv.org/abs/2401.00388)
Summary:
Answering questions using pre-trained language models (LMs) and knowledge graphs (KGs) presents challenges in identifying relevant knowledge and performing joint reasoning.We compared LMs (fine-tuned for the task) with the previously published QAGNN method for the Question-answering (QA) objective and further measured the impact of additional factual context on the QAGNN performance. The QAGNN method employs LMs to encode QA context and estimate KG node importance, and effectively update the question choice entity representations using Graph Neural Networks (GNNs). We further experimented with enhancing the QA context encoding by incorporating relevant knowledge facts for the question stem. The models are trained on the OpenbookQA dataset, which contains ~6000 4-way multiple choice questions and is widely used as a benchmark for QA tasks. Through our experimentation, we found that incorporating knowledge facts context led to a significant improvement in performance. In contrast, the addition of knowledge graphs to language models resulted in only a modest increase. This suggests that the integration of contextual knowledge facts may be more impactful for enhancing question answering performance compared to solely adding knowledge graphs.

Title: RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models. (arXiv:2401.00396v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00396
Code URL: null
Copy Paste: [[2401.00396]] RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models(http://arxiv.org/abs/2401.00396)
Summary:
Retrieval-augmented generation (RAG) has become a main technique for alleviating hallucinations in large language models (LLMs). Despite the integration of RAG, LLMs may still present unsupported or contradictory claims to the retrieved contents. In order to develop effective hallucination prevention strategies under RAG, it is important to create benchmark datasets that can measure the extent of hallucination. This paper presents RAGTruth, a corpus tailored for analyzing word-level hallucinations in various domains and tasks within the standard RAG frameworks for LLM applications. RAGTruth comprises nearly 18,000 naturally generated responses from diverse LLMs using RAG. These responses have undergone meticulous manual annotations at both the individual cases and word levels, incorporating evaluations of hallucination intensity. We not only benchmark hallucination frequencies across different LLMs, but also critically assess the effectiveness of several existing hallucination detection methodologies. Furthermore, we show that using a high-quality dataset such as RAGTruth, it is possible to finetune a relatively small LLM and achieve a competitive level of performance in hallucination detection when compared to the existing prompt-based approaches using state-of-the-art large language models such as GPT-4.

Title: SDIF-DA: A Shallow-to-Deep Interaction Framework with Data Augmentation for Multi-modal Intent Detection. (arXiv:2401.00424v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00424
Code URL: null
Copy Paste: [[2401.00424]] SDIF-DA: A Shallow-to-Deep Interaction Framework with Data Augmentation for Multi-modal Intent Detection(http://arxiv.org/abs/2401.00424)
Summary:
Multi-modal intent detection aims to utilize various modalities to understand the user's intentions, which is essential for the deployment of dialogue systems in real-world scenarios. The two core challenges for multi-modal intent detection are (1) how to effectively align and fuse different features of modalities and (2) the limited labeled multi-modal intent training data. In this work, we introduce a shallow-to-deep interaction framework with data augmentation (SDIF-DA) to address the above challenges. Firstly, SDIF-DA leverages a shallow-to-deep interaction module to progressively and effectively align and fuse features across text, video, and audio modalities. Secondly, we propose a ChatGPT-based data augmentation approach to automatically augment sufficient training data. Experimental results demonstrate that SDIF-DA can effectively align and fuse multi-modal features by achieving state-of-the-art performance. In addition, extensive analyses show that the introduced data augmentation approach can successfully distill knowledge from the large language model.

Title: GeoGalactica: A Scientific Large Language Model in Geoscience. (arXiv:2401.00434v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00434
Code URL: null
Copy Paste: [[2401.00434]] GeoGalactica: A Scientific Large Language Model in Geoscience(http://arxiv.org/abs/2401.00434)
Summary:
Large language models (LLMs) have achieved huge success for their general knowledge and ability to solve a wide spectrum of tasks in natural language processing (NLP). Due to their impressive abilities, LLMs have shed light on potential inter-discipline applications to foster scientific discoveries of a specific domain by using artificial intelligence (AI for science, AI4S). In the meantime, utilizing NLP techniques in geoscience research and practice is wide and convoluted, contributing from knowledge extraction and document classification to question answering and knowledge discovery. In this work, we take the initial step to leverage LLM for science, through a rather straightforward approach. We try to specialize an LLM into geoscience, by further pre-training the model with a vast amount of texts in geoscience, as well as supervised fine-tuning (SFT) the resulting model with our custom collected instruction tuning dataset. These efforts result in a model GeoGalactica consisting of 30 billion parameters. To our best knowledge, it is the largest language model for the geoscience domain. More specifically, GeoGalactica is from further pre-training of Galactica. We train GeoGalactica over a geoscience-related text corpus containing 65 billion tokens curated from extensive data sources in the big science project Deep-time Digital Earth (DDE), preserving as the largest geoscience-specific text corpus. Then we fine-tune the model with 1 million pairs of instruction-tuning data consisting of questions that demand professional geoscience knowledge to answer. In this technical report, we will illustrate in detail all aspects of GeoGalactica, including data collection, data cleaning, base model selection, pre-training, SFT, and evaluation. We open-source our data curation tools and the checkpoints of GeoGalactica during the first 3/4 of pre-training.

Title: BatchEval: Towards Human-like Text Evaluation. (arXiv:2401.00437v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00437
Code URL: null
Copy Paste: [[2401.00437]] BatchEval: Towards Human-like Text Evaluation(http://arxiv.org/abs/2401.00437)
Summary:
Significant progress has been made in automatic text evaluation with the introduction of large language models (LLMs) as evaluators. However, current sample-wise evaluation paradigm suffers from the following issues: (1) Sensitive to prompt design; (2) Poor resistance to noise; (3) Inferior ensemble performance with static reference. Inspired by the fact that humans treat both criterion definition and inter sample comparison as references for evaluation, we propose BatchEval, a paradigm that conducts batch-wise evaluation iteratively to alleviate the above problems. We explore variants under this paradigm and confirm the optimal settings are two stage procedure with heterogeneous batch composition strategy and decimal scoring format. Comprehensive experiments across 3 LLMs on 4 text evaluation tasks demonstrate that BatchEval outperforms state-of-the-art methods by 10.5% on Pearson correlations with only 64% API cost on average. Further analyses have been conducted to verify the robustness, generalization, and working mechanism of BatchEval.

Title: Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws. (arXiv:2401.00448v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00448
Code URL: null
Copy Paste: [[2401.00448]] Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws(http://arxiv.org/abs/2401.00448)
Summary:
Large language model (LLM) scaling laws are empirical formulas that estimate changes in model quality as a result of increasing parameter count and training data. However, these formulas, including the popular DeepMind Chinchilla scaling laws, neglect to include the cost of inference. We modify the Chinchilla scaling laws to calculate the optimal LLM parameter count and pre-training data size to train and deploy a model of a given quality and inference demand. We conduct our analysis both in terms of a compute budget and real-world costs and find that LLM researchers expecting reasonably large inference demand (~1B requests) should train models smaller and longer than Chinchilla-optimal.

Title: HSC-GPT: A Large Language Model for Human Settlements Construction. (arXiv:2401.00504v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00504
Code URL: null
Copy Paste: [[2401.00504]] HSC-GPT: A Large Language Model for Human Settlements Construction(http://arxiv.org/abs/2401.00504)
Summary:
The field of human settlement construction encompasses a range of spatial designs and management tasks, including urban planning and landscape architecture design. These tasks involve a plethora of instructions and descriptions presented in natural language, which are essential for understanding design requirements and producing effective design solutions. Recent research has sought to integrate natural language processing (NLP) and generative artificial intelligence (AI) into human settlement construction tasks. Due to the efficient processing and analysis capabilities of AI with data, significant successes have been achieved in design within this domain. However, this task still faces several fundamental challenges. The semantic information involved includes complex spatial details, diverse data source formats, high sensitivity to regional culture, and demanding requirements for innovation and rigor in work scenarios. These factors lead to limitations when applying general generative AI in this field, further exacerbated by a lack of high-quality data for model training. To address these challenges, this paper first proposes HSC-GPT, a large-scale language model framework specifically designed for tasks in human settlement construction, considering the unique characteristics of this domain.

Title: Neural Networks Against (and For) Self-Training: Classification with Small Labeled and Large Unlabeled Sets. (arXiv:2401.00575v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00575
Code URL: null
Copy Paste: [[2401.00575]] Neural Networks Against (and For) Self-Training: Classification with Small Labeled and Large Unlabeled Sets(http://arxiv.org/abs/2401.00575)
Summary:
We propose a semi-supervised text classifier based on self-training using one positive and one negative property of neural networks. One of the weaknesses of self-training is the semantic drift problem, where noisy pseudo-labels accumulate over iterations and consequently the error rate soars. In order to tackle this challenge, we reshape the role of pseudo-labels and create a hierarchical order of information. In addition, a crucial step in self-training is to use the classifier confidence prediction to select the best candidate pseudo-labels. This step cannot be efficiently done by neural networks, because it is known that their output is poorly calibrated. To overcome this challenge, we propose a hybrid metric to replace the plain confidence measurement. Our metric takes into account the prediction uncertainty via a subsampling technique. We evaluate our model in a set of five standard benchmarks, and show that it significantly outperforms a set of ten diverse baseline models. Furthermore, we show that the improvement achieved by our model is additive to language model pretraining, which is a widely used technique for using unlabeled documents. Our code is available at https://github.com/p-karisani/RST.

Title: An Analysis of Embedding Layers and Similarity Scores using Siamese Neural Networks. (arXiv:2401.00582v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00582
Code URL: null
Copy Paste: [[2401.00582]] An Analysis of Embedding Layers and Similarity Scores using Siamese Neural Networks(http://arxiv.org/abs/2401.00582)
Summary:
Large Lanugage Models (LLMs) are gaining increasing popularity in a variety of use cases, from language understanding and writing to assistance in application development. One of the most important aspects for optimal funcionality of LLMs is embedding layers. Word embeddings are distributed representations of words in a continuous vector space. In the context of LLMs, words or tokens from the input text are transformed into high-dimensional vectors using unique algorithms specific to the model. Our research examines the embedding algorithms from leading companies in the industry, such as OpenAI, Google's PaLM, and BERT. Using medical data, we have analyzed similarity scores of each embedding layer, observing differences in performance among each algorithm. To enhance each model and provide an additional encoding layer, we also implemented Siamese Neural Networks. After observing changes in performance with the addition of the model, we measured the carbon footage per epoch of training. The carbon footprint associated with large language models (LLMs) is a significant concern, and should be taken into consideration when selecting algorithms for a variety of use cases. Overall, our research compared the accuracy different, leading embedding algorithms and their carbon footage, allowing for a holistic review of each embedding algorithm.

Title: Predicting Anti-microbial Resistance using Large Language Models. (arXiv:2401.00642v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00642
Code URL: null
Copy Paste: [[2401.00642]] Predicting Anti-microbial Resistance using Large Language Models(http://arxiv.org/abs/2401.00642)
Summary:
During times of increasing antibiotic resistance and the spread of infectious diseases like COVID-19, it is important to classify genes related to antibiotic resistance. As natural language processing has advanced with transformer-based language models, many language models that learn characteristics of nucleotide sequences have also emerged. These models show good performance in classifying various features of nucleotide sequences. When classifying nucleotide sequences, not only the sequence itself, but also various background knowledge is utilized. In this study, we use not only a nucleotide sequence-based language model but also a text language model based on PubMed articles to reflect more biological background knowledge in the model. We propose a method to fine-tune the nucleotide sequence language model and the text language model based on various databases of antibiotic resistance genes. We also propose an LLM-based augmentation technique to supplement the data and an ensemble method to effectively combine the two models. We also propose a benchmark for evaluating the model. Our method achieved better performance than the nucleotide sequence language model in the drug resistance class prediction.

Title: Benchmarking Large Language Models on Controllable Generation under Diversified Instructions. (arXiv:2401.00690v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00690
Code URL: null
Copy Paste: [[2401.00690]] Benchmarking Large Language Models on Controllable Generation under Diversified Instructions(http://arxiv.org/abs/2401.00690)
Summary:
While large language models (LLMs) have exhibited impressive instruction-following capabilities, it is still unclear whether and to what extent they can respond to explicit constraints that might be entailed in various instructions. As a significant aspect of LLM alignment, it is thus important to formulate such a specialized set of instructions as well as investigate the resulting behavior of LLMs. To address this vacancy, we propose a new benchmark CoDI-Eval to systematically and comprehensively evaluate LLMs' responses to instructions with various constraints. We construct a large collection of constraints-attributed instructions as a test suite focused on both generalization and coverage. Specifically, we advocate an instruction diversification process to synthesize diverse forms of constraint expression and also deliberate the candidate task taxonomy with even finer-grained sub-categories. Finally, we automate the entire evaluation process to facilitate further developments. Different from existing studies on controllable text generation, CoDI-Eval extends the scope to the prevalent instruction-following paradigm for the first time. We provide extensive evaluations of representative LLMs (e.g., ChatGPT, Vicuna) on CoDI-Eval, revealing their limitations in following instructions with specific constraints and there is still a significant gap between open-source and commercial closed-source LLMs. We believe this benchmark will facilitate research into improving the controllability of LLMs' responses to instructions. Our data and code are available at https://github.com/Xt-cyh/CoDI-Eval.

Title: SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models. (arXiv:2401.00793v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00793
Code URL: null
Copy Paste: [[2401.00793]] SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models(http://arxiv.org/abs/2401.00793)
Summary:
With the growing use of large language models hosted on cloud platforms to offer inference services, privacy concerns are escalating, especially concerning sensitive data like investment plans and bank account details. Secure Multi-Party Computing (SMPC) emerges as a promising solution to protect the privacy of inference data and model parameters. However, the application of SMPC in Privacy-Preserving Inference (PPI) for large language models, particularly those based on the Transformer architecture, often leads to considerable slowdowns or declines in performance. This is largely due to the multitude of nonlinear operations in the Transformer architecture, which are not well-suited to SMPC and are difficult to circumvent or optimize effectively. To address this concern, we introduce an advanced optimization framework called SecFormer, designed to strike an optimal balance between performance and efficiency in PPI for Transformer models. By implementing knowledge distillation techniques, we successfully eliminate the high-cost exponential and maximum operations in PPI without sacrificing model performance. Additionally, we have developed a suite of efficient SMPC protocols that utilize segmented polynomials and Goldschmidt's method to handle other complex nonlinear functions within PPI, such as GeLU, LayerNorm, and Softmax. Our extensive experiments reveal that SecFormer outperforms MPCFormer in performance, showing improvements of $5.6\%$ and $24.2\%$ for BERT$_{\text{BASE}}$ and BERT$_{\text{LARGE}}$, respectively. In terms of efficiency, SecFormer is 3.4 and 3.2 times faster than Puma, demonstrating its effectiveness and speed.

Title: If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents. (arXiv:2401.00812v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00812
Code URL: null
Copy Paste: [[2401.00812]] If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents(http://arxiv.org/abs/2401.00812)
Summary:
The prominent large language models (LLMs) of today differ from past language models not only in size, but also in the fact that they are trained on a combination of natural language and formal language (code). As a medium between humans and computers, code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity. In this survey, we present an overview of the various benefits of integrating code into LLMs' training data. Specifically, beyond enhancing LLMs in code generation, we observe that these unique properties of code help (i) unlock the reasoning ability of LLMs, enabling their applications to a range of more complex natural language tasks; (ii) steer LLMs to produce structured and precise intermediate steps, which can then be connected to external execution ends through function calls; and (iii) take advantage of code compilation and execution environment, which also provides diverse feedback for model improvement. In addition, we trace how these profound capabilities of LLMs, brought by code, have led to their emergence as intelligent agents (IAs) in situations where the ability to understand instructions, decompose goals, plan and execute actions, and refine from feedback are crucial to their success on downstream tasks. Finally, we present several key challenges and future directions of empowering LLMs with code.

Title: Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models. (arXiv:2401.00625v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00625
Code URL: null
Copy Paste: [[2401.00625]] Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models(http://arxiv.org/abs/2401.00625)
Summary:
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated models like OpenAI's ChatGPT, represents a significant advancement in artificial intelligence. These models, however, bring forth substantial challenges in the high consumption of computational, memory, energy, and financial resources, especially in environments with limited resource capabilities. This survey aims to systematically address these challenges by reviewing a broad spectrum of techniques designed to enhance the resource efficiency of LLMs. We categorize methods based on their optimization focus: computational, memory, energy, financial, and network resources and their applicability across various stages of an LLM's lifecycle, including architecture design, pretraining, finetuning, and system design. Additionally, the survey introduces a nuanced categorization of resource efficiency techniques by their specific resource types, which uncovers the intricate relationships and mappings between various resources and corresponding optimization techniques. A standardized set of evaluation metrics and datasets is also presented to facilitate consistent and fair comparisons across different models and techniques. By offering a comprehensive overview of the current sota and identifying open research avenues, this survey serves as a foundational reference for researchers and practitioners, aiding them in developing more sustainable and efficient LLMs in a rapidly evolving landscape.

gpt

Title: GraphGPT: Graph Learning with Generative Pre-trained Transformers. (arXiv:2401.00529v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00529
Code URL: null
Copy Paste: [[2401.00529]] GraphGPT: Graph Learning with Generative Pre-trained Transformers(http://arxiv.org/abs/2401.00529)
Summary:
We introduce \textit{GraphGPT}, a novel model for Graph learning by self-supervised Generative Pre-training Transformers. Our model transforms each graph or sampled subgraph into a sequence of tokens representing the node, edge and attributes reversibly using the Eulerian path first. Then we feed the tokens into a standard transformer decoder and pre-train it with the next-token-prediction (NTP) task. Lastly, we fine-tune the GraphGPT model with the supervised tasks. This intuitive, yet effective model achieves superior or close results to the state-of-the-art methods for the graph-, edge- and node-level tasks on the large scale molecular dataset PCQM4Mv2, the protein-protein association dataset ogbl-ppa and the ogbn-proteins dataset from the Open Graph Benchmark (OGB). Furthermore, the generative pre-training enables us to train GraphGPT up to 400M+ parameters with consistently increasing performance, which is beyond the capability of GNNs and previous graph transformers. The source code and pre-trained checkpoints will be released soon\footnote{\url{https://github.com/alibaba/graph-gpt}} to pave the way for the graph foundation model research, and also to assist the scientific discovery in pharmaceutical, chemistry, material and bio-informatics domains, etc.

llm

Title: LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning. (arXiv:2401.00125v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00125
Code URL: null
Copy Paste: [[2401.00125]] LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning(http://arxiv.org/abs/2401.00125)
Summary:
Although planning is a crucial component of the autonomous driving stack, researchers have yet to develop robust planning algorithms that are capable of safely handling the diverse range of possible driving scenarios. Learning-based planners suffer from overfitting and poor long-tail performance. On the other hand, rule-based planners generalize well, but might fail to handle scenarios that require complex driving maneuvers. To address these limitations, we investigate the possibility of leveraging the common-sense reasoning capabilities of Large Language Models (LLMs) such as GPT4 and Llama2 to generate plans for self-driving vehicles. In particular, we develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner. Guided by commonsense reasoning abilities of LLMs, our approach navigates complex scenarios which existing planners struggle with, produces well-reasoned outputs while also remaining grounded through working alongside the rule-based approach. Through extensive evaluation on the nuPlan benchmark, we achieve state-of-the-art performance, outperforming all existing pure learning- and rule-based methods across most metrics. Our code will be available at https://llmassist.github.io.

Title: keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM. (arXiv:2401.00426v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00426
Code URL: null
Copy Paste: [[2401.00426]] keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM(http://arxiv.org/abs/2401.00426)
Summary:
Large language models (LLMs) have exhibited remarkable performance on various natural language processing (NLP) tasks, especially for question answering. However, in the face of problems beyond the scope of knowledge, these LLMs tend to talk nonsense with a straight face, where the potential solution could be incorporating an Information Retrieval (IR) module and generating response based on these retrieved knowledge. In this paper, we present a novel framework to assist LLMs, such as ChatGPT, to retrieve question-related structured information on the knowledge graph, and demonstrate that Knowledge-based question answering (Keqing) could be a nature Chain-of-Thought (CoT) mentor to guide the LLM to sequentially find the answer entities of a complex question through interpretable logical chains. Specifically, the workflow of Keqing will execute decomposing a complex question according to predefined templates, retrieving candidate entities on knowledge graph, reasoning answers of sub-questions, and finally generating response with reasoning paths, which greatly improves the reliability of LLM's response. The experimental results on KBQA datasets show that Keqing can achieve competitive performance and illustrate the logic of answering each question.

Title: The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness. (arXiv:2401.00287v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00287
Code URL: null
Copy Paste: [[2401.00287]] The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness(http://arxiv.org/abs/2401.00287)
Summary:
As Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications, their safety concerns become critical areas of NLP research. This paper presents Safety and Over-Defensiveness Evaluation (SODE) benchmark: a collection of diverse safe and unsafe prompts with carefully designed evaluation methods that facilitate systematic evaluation, comparison, and analysis over 'safety' and 'over-defensiveness.' With SODE, we study a variety of LLM defense strategies over multiple state-of-the-art LLMs, which reveals several interesting and important findings, such as (a) the widely popular 'self-checking' techniques indeed improve the safety against unsafe inputs, but this comes at the cost of extreme over-defensiveness on the safe inputs, (b) providing a safety instruction along with in-context exemplars (of both safe and unsafe inputs) consistently improves safety and also mitigates undue over-defensiveness of the models, (c) providing contextual knowledge easily breaks the safety guardrails and makes the models more vulnerable to generating unsafe responses. Overall, our work reveals numerous such critical findings that we believe will pave the way and facilitate further research in improving the safety of LLMs.

Title: State of What Art? A Call for Multi-Prompt LLM Evaluation. (arXiv:2401.00595v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00595
Code URL: null
Copy Paste: [[2401.00595]] State of What Art? A Call for Multi-Prompt LLM Evaluation(http://arxiv.org/abs/2401.00595)
Summary:
Recent advances in large language models (LLMs) have led to the development of various evaluation benchmarks. These benchmarks typically rely on a single instruction template for evaluating all LLMs on a specific task. In this paper, we comprehensively analyze the brittleness of results obtained via single-prompt evaluations across 6.5M instances, involving 20 different LLMs and 39 tasks from 3 benchmarks. To improve robustness of the analysis, we propose to evaluate LLMs with a set of diverse prompts instead. We discuss tailored evaluation metrics for specific use cases (e.g., LLM developers vs. developers interested in a specific downstream task), ensuring a more reliable and meaningful assessment of LLM capabilities. We then implement these criteria and conduct evaluations of multiple models, providing insights into the true strengths and limitations of current LLMs.

Title: A Computational Framework for Behavioral Assessment of LLM Therapists. (arXiv:2401.00820v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00820
Code URL: null
Copy Paste: [[2401.00820]] A Computational Framework for Behavioral Assessment of LLM Therapists(http://arxiv.org/abs/2401.00820)
Summary:
The emergence of ChatGPT and other large language models (LLMs) has greatly increased interest in utilizing LLMs as therapists to support individuals struggling with mental health challenges. However, due to the lack of systematic studies, our understanding of how LLM therapists behave, i.e., ways in which they respond to clients, is significantly limited. Understanding their behavior across a wide range of clients and situations is crucial to accurately assess their capabilities and limitations in the high-risk setting of mental health, where undesirable behaviors can lead to severe consequences. In this paper, we propose BOLT, a novel computational framework to study the conversational behavior of LLMs when employed as therapists. We develop an in-context learning method to quantitatively measure the behavior of LLMs based on 13 different psychotherapy techniques including reflections, questions, solutions, normalizing, and psychoeducation. Subsequently, we compare the behavior of LLM therapists against that of high- and low-quality human therapy, and study how their behavior can be modulated to better reflect behaviors observed in high-quality therapy. Our analysis of GPT and Llama-variants reveals that these LLMs often resemble behaviors more commonly exhibited in low-quality therapy rather than high-quality therapy, such as offering a higher degree of problem-solving advice when clients share emotions, which is against typical recommendations. At the same time, unlike low-quality therapy, LLMs reflect significantly more upon clients' needs and strengths. Our analysis framework suggests that despite the ability of LLMs to generate anecdotal examples that appear similar to human therapists, LLM therapists are currently not fully consistent with high-quality care, and thus require additional research to ensure quality care.

Title: KAXAI: An Integrated Environment for Knowledge Analysis and Explainable AI. (arXiv:2401.00193v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00193
Code URL: null
Copy Paste: [[2401.00193]] KAXAI: An Integrated Environment for Knowledge Analysis and Explainable AI(http://arxiv.org/abs/2401.00193)
Summary:
In order to fully harness the potential of machine learning, it is crucial to establish a system that renders the field more accessible and less daunting for individuals who may not possess a comprehensive understanding of its intricacies. The paper describes the design of a system that integrates AutoML, XAI, and synthetic data generation to provide a great UX design for users. The system allows users to navigate and harness the power of machine learning while abstracting its complexities and providing high usability. The paper proposes two novel classifiers, Logistic Regression Forest and Support Vector Tree, for enhanced model performance, achieving 96\% accuracy on a diabetes dataset and 93\% on a survey dataset. The paper also introduces a model-dependent local interpreter called MEDLEY and evaluates its interpretation against LIME, Greedy, and Parzen. Additionally, the paper introduces LLM-based synthetic data generation, library-based data generation, and enhancing the original dataset with GAN. The findings on synthetic data suggest that enhancing the original dataset with GAN is the most reliable way to generate synthetic data, as evidenced by KS tests, standard deviation, and feature importance. The authors also found that GAN works best for quantitative datasets.

long context

lora

Title: Modeling arousal potential of epistemic emotions using Bayesian information gain: Inquiry cycle driven by free energy fluctuations. (arXiv:2401.00007v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00007
Code URL: null
Copy Paste: [[2401.00007]] Modeling arousal potential of epistemic emotions using Bayesian information gain: Inquiry cycle driven by free energy fluctuations(http://arxiv.org/abs/2401.00007)
Summary:
Epistemic emotions, such as curiosity and interest, drive the inquiry process. This study proposes a novel formulation of epistemic emotions such as curiosity and interest using two types of information gain generated by the principle of free energy minimization: Kullback-Leibler divergence(KLD) from Bayesian posterior to prior, which represents free energy reduction in recognition, and Bayesian surprise (BS), which represents the expected information gain by Bayesian prior update. By applying a Gaussian generative model with an additional uniform likelihood, we found that KLD and BS form an upward-convex function of surprise (minimized free energy and prediction error), similar to Berlyne's arousal potential functions, or the Wundt curve. We consider that the alternate maximization of BS and KLD generates an ideal inquiry cycle to approach the optimal arousal level with fluctuations in surprise, and that curiosity and interest drive to facilitate the cyclic process. We exhaustively analyzed the effects of prediction uncertainty (prior variance) and observation uncertainty (likelihood variance) on the peaks of the information gain function as optimal surprises. The results show that greater prediction uncertainty, meaning an open-minded attitude, and less observational uncertainty, meaning precise observation with attention, are expected to provide greater information gains through a greater range of exploration. The proposed mathematical framework unifies the free energy principle of the brain and the arousal potential theory to explain the Wundt curve as an information gain function and suggests an ideal inquiry process driven by epistemic emotions.

Title: Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations. (arXiv:2401.00162v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00162
Code URL: null
Copy Paste: [[2401.00162]] Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations(http://arxiv.org/abs/2401.00162)
Summary:
The sparsity of reward feedback remains a challenging problem in online deep reinforcement learning (DRL). Previous approaches have utilized temporal credit assignment (CA) to achieve impressive results in multiple hard tasks. However, many CA methods relied on complex architectures or introduced sensitive hyperparameters to estimate the impact of state-action pairs. Meanwhile, the premise of the feasibility of CA methods is to obtain trajectories with sparse rewards, which can be troublesome in sparse-reward environments with large state spaces. To tackle these problems, we propose a simple and efficient algorithm called Policy Optimization with Smooth Guidance (POSG) that leverages a small set of sparse-reward demonstrations to make reliable and effective long-term credit assignments while efficiently facilitating exploration. The key idea is that the relative impact of state-action pairs can be indirectly estimated using offline demonstrations rather than directly leveraging the sparse reward trajectories generated by the agent. Specifically, we first obtain the trajectory importance by considering both the trajectory-level distance to demonstrations and the returns of the relevant trajectories. Then, the guidance reward is calculated for each state-action pair by smoothly averaging the importance of the trajectories through it, merging the demonstration's distribution and reward information. We theoretically analyze the performance improvement bound caused by smooth guidance rewards and derive a new worst-case lower bound on the performance improvement. Extensive results demonstrate POSG's significant advantages in control performance and convergence speed compared to benchmark DRL algorithms. Notably, the specific metrics and quantifiable results are investigated to demonstrate the superiority of POSG.

Title: Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles. (arXiv:2401.00243v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00243
Code URL: null
Copy Paste: [[2401.00243]] Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles(http://arxiv.org/abs/2401.00243)
Summary:
Reinforcement learning from human feedback (RLHF) emerges as a promising paradigm for aligning large language models (LLMs). However, a notable challenge in RLHF is overoptimization, where beyond a certain threshold, the pursuit of higher rewards leads to a decline in human preferences. In this paper, we observe the weakness of KL regularization which is commonly employed in existing RLHF methods to address overoptimization. To mitigate this limitation, we scrutinize the RLHF objective in the offline dataset and propose uncertainty-penalized RLHF (UP-RLHF), which incorporates uncertainty regularization during RL-finetuning. To enhance the uncertainty quantification abilities for reward models, we first propose a diverse low-rank adaptation (LoRA) ensemble by maximizing the nuclear norm of LoRA matrix concatenations. Then we optimize policy models utilizing penalized rewards, determined by both rewards and uncertainties provided by the diverse reward LoRA ensembles. Our experimental results, based on two real human preference datasets, showcase the effectiveness of diverse reward LoRA ensembles in quantifying reward uncertainty. Additionally, uncertainty regularization in UP-RLHF proves to be pivotal in mitigating overoptimization, thereby contributing to the overall performance.

Title: Client-wise Modality Selection for Balanced Multi-modal Federated Learning. (arXiv:2401.00403v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00403
Code URL: null
Copy Paste: [[2401.00403]] Client-wise Modality Selection for Balanced Multi-modal Federated Learning(http://arxiv.org/abs/2401.00403)
Summary:
Selecting proper clients to participate in the iterative federated learning (FL) rounds is critical to effectively harness a broad range of distributed datasets. Existing client selection methods simply consider the variability among FL clients with uni-modal data, however, have yet to consider clients with multi-modalities. We reveal that traditional client selection scheme in MFL may suffer from a severe modality-level bias, which impedes the collaborative exploitation of multi-modal data, leading to insufficient local data exploration and global aggregation. To tackle this challenge, we propose a Client-wise Modality Selection scheme for MFL (CMSFed) that can comprehensively utilize information from each modality via avoiding such client selection bias caused by modality imbalance. Specifically, in each MFL round, the local data from different modalities are selectively employed to participate in local training and aggregation to mitigate potential modality imbalance of the global model. To approximate the fully aggregated model update in a balanced way, we introduce a novel local training loss function to enhance the weak modality and align the divergent feature spaces caused by inconsistent modality adoption strategies for different clients simultaneously. Then, a modality-level gradient decoupling method is designed to derive respective submodular functions to maintain the gradient diversity during the selection progress and balance MFL according to local modality imbalance in each iteration. Our extensive experiments showcase the superiority of CMSFed over baselines and its effectiveness in multi-modal data exploitation.

Title: Viz: A QLoRA-based Copyright Marketplace for Legally Compliant Generative AI. (arXiv:2401.00503v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00503
Code URL: null
Copy Paste: [[2401.00503]] Viz: A QLoRA-based Copyright Marketplace for Legally Compliant Generative AI(http://arxiv.org/abs/2401.00503)
Summary:
This paper aims to introduce and analyze the Viz system in a comprehensive way, a novel system architecture that integrates Quantized Low-Rank Adapters (QLoRA) to fine-tune large language models (LLM) within a legally compliant and resource efficient marketplace. Viz represents a significant contribution to the field of artificial intelligence, particularly in addressing the challenges of computational efficiency, legal compliance, and economic sustainability in the utilization and monetization of LLMs. The paper delineates the scholarly discourse and developments that have informed the creation of Viz, focusing primarily on the advancements in LLM models, copyright issues in AI training (NYT case, 2023), and the evolution of model fine-tuning techniques, particularly low-rank adapters and quantized low-rank adapters, to create a sustainable and economically compliant framework for LLM utilization. The economic model it proposes benefits content creators, AI developers, and end-users, delineating a harmonious integration of technology, economy, and law, offering a comprehensive solution to the complex challenges of today's AI landscape.

hallucination

prompt

code

Title: Consciousness as a logically consistent and prognostic model of reality. (arXiv:2401.00005v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00005
Code URL: null
Copy Paste: [[2401.00005]] Consciousness as a logically consistent and prognostic model of reality(http://arxiv.org/abs/2401.00005)
Summary:
The work demonstrates that brain might reflect the external world causal relationships in the form of a logically consistent and prognostic model of reality, which shows up as consciousness. The paper analyses and solves the problem of statistical ambiguity and provides a formal model of causal relationships as probabilistic maximally specific rules. We suppose that brain makes all possible inferences from causal relationships. We prove that the suggested formal model has a property of an unambiguous inference: from consistent premises we infer a consistent conclusion. It enables a set of all inferences to form a consistent model of the perceived world. Causal relationships may create fixed points of cyclic inter-predictable properties. We consider the "natural" classification introduced by John St. Mill and demonstrate that a variety of fixed points of the objects' attributes forms a "natural" classification of the external world. Then we consider notions of "natural" categories and causal models of categories, introduced by Eleanor Rosch and Bob Rehder and demonstrate that fixed points of causal relationships between objects attributes, which we perceive, formalize these notions. If the "natural" classification describes the objects of the external world, and "natural" concepts the perception of these objects, then the theory of integrated information, introduced by G. Tononi, describes the information processes of the brain for "natural" concepts formation that reflects the "natural" classification. We argue that integrated information provides high accuracy of the objects identification. A computer-based experiment is provided that illustrates fixed points formation for coded digits.

Title: HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes. (arXiv:2401.00365v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00365
Code URL: null
Copy Paste: [[2401.00365]] HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes(http://arxiv.org/abs/2401.00365)
Summary:
Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity reconstructions. However, such hierarchical extensions of VQ-VAE often suffer from the codebook/layer collapse issue, where the codebook is not efficiently used to express the data, and hence degrades reconstruction accuracy. To mitigate this problem, we propose a novel unified framework to stochastically learn hierarchical discrete representation on the basis of the variational Bayes framework, called hierarchically quantized variational autoencoder (HQ-VAE). HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE), and provides them with a Bayesian training scheme. Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance. We also validated HQ-VAE in terms of its applicability to a different modality with an audio dataset.

Title: Graph-Convolutional Autoencoder Ensembles for the Humanities, Illustrated with a Study of the American Slave Trade. (arXiv:2401.00824v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00824
Code URL: null
Copy Paste: [[2401.00824]] Graph-Convolutional Autoencoder Ensembles for the Humanities, Illustrated with a Study of the American Slave Trade(http://arxiv.org/abs/2401.00824)
Summary:
We introduce a graph-aware autoencoder ensemble framework, with associated formalisms and tooling, designed to facilitate deep learning for scholarship in the humanities. By composing sub-architectures to produce a model isomorphic to a humanistic domain we maintain interpretability while providing function signatures for each sub-architectural choice, allowing both traditional and computational researchers to collaborate without disrupting established practices. We illustrate a practical application of our approach to a historical study of the American post-Atlantic slave trade, and make several specific technical contributions: a novel hybrid graph-convolutional autoencoder mechanism, batching policies for common graph topologies, and masking techniques for particular use-cases. The effectiveness of the framework for broadening participation of diverse domains is demonstrated by a growing suite of two dozen studies, both collaborations with humanists and established tasks from machine learning literature, spanning a variety of fields and data modalities. We make performance comparisons of several different architectural choices and conclude with an ambitious list of imminent next steps for this research.

Title: Enabling Smart Retrofitting and Performance Anomaly Detection for a Sensorized Vessel: A Maritime Industry Experience. (arXiv:2401.00112v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00112
Code URL: null
Copy Paste: [[2401.00112]] Enabling Smart Retrofitting and Performance Anomaly Detection for a Sensorized Vessel: A Maritime Industry Experience(http://arxiv.org/abs/2401.00112)
Summary:
The integration of sensorized vessels, enabling real-time data collection and machine learning-driven data analysis marks a pivotal advancement in the maritime industry. This transformative technology not only can enhance safety, efficiency, and sustainability but also usher in a new era of cost-effective and smart maritime transportation in our increasingly interconnected world. This study presents a deep learning-driven anomaly detection system augmented with interpretable machine learning models for identifying performance anomalies in an industrial sensorized vessel, called TUCANA. We Leverage a human-in-the-loop unsupervised process that involves utilizing standard and Long Short-Term Memory (LSTM) autoencoders augmented with interpretable surrogate models, i.e., random forest and decision tree, to add transparency and interpretability to the results provided by the deep learning models. The interpretable models also enable automated rule generation for translating the inference into human-readable rules. Additionally, the process also includes providing a projection of the results using t-distributed stochastic neighbor embedding (t-SNE), which helps with a better understanding of the structure and relationships within the data and assessment of the identified anomalies. We empirically evaluate the system using real data acquired from the vessel TUCANA and the results involve achieving over 80% precision and 90% recall with the LSTM model used in the process. The interpretable models also provide logical rules aligned with expert thinking, and the t-SNE-based projection enhances interpretability. Our system demonstrates that the proposed approach can be used effectively in real-world scenarios, offering transparency and precision in performance anomaly detection.

Title: Saliency-Aware Regularized Graph Neural Network. (arXiv:2401.00755v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00755
Code URL: null
Copy Paste: [[2401.00755]] Saliency-Aware Regularized Graph Neural Network(http://arxiv.org/abs/2401.00755)
Summary:
The crux of graph classification lies in the effective representation learning for the entire graph. Typical graph neural networks focus on modeling the local dependencies when aggregating features of neighboring nodes, and obtain the representation for the entire graph by aggregating node features. Such methods have two potential limitations: 1) the global node saliency w.r.t. graph classification is not explicitly modeled, which is crucial since different nodes may have different semantic relevance to graph classification; 2) the graph representation directly aggregated from node features may have limited effectiveness to reflect graph-level information. In this work, we propose the Saliency-Aware Regularized Graph Neural Network (SAR-GNN) for graph classification, which consists of two core modules: 1) a traditional graph neural network serving as the backbone for learning node features and 2) the Graph Neural Memory designed to distill a compact graph representation from node features of the backbone. We first estimate the global node saliency by measuring the semantic similarity between the compact graph representation and node features. Then the learned saliency distribution is leveraged to regularize the neighborhood aggregation of the backbone, which facilitates the message passing of features for salient nodes and suppresses the less relevant nodes. Thus, our model can learn more effective graph representation. We demonstrate the merits of SAR-GNN by extensive experiments on seven datasets across various types of graph data. Code will be released.

chat

Title: A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots. (arXiv:2401.00609v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00609
Code URL: null
Copy Paste: [[2401.00609]] A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots(http://arxiv.org/abs/2401.00609)
Summary:
We present a review of personality in neural conversational agents (CAs), also called chatbots. First, we define Personality, Persona, and Profile. We explain all personality schemes which have been used in CAs, and list models under the scheme(s) which they use. Second we describe 21 datasets which have been developed in recent CA personality research. Third, we define the methods used to embody personality in a CA, and review recent models using them. Fourth, we survey some relevant reviews on CAs, personality, and related topics. Finally, we draw conclusions and identify some research challenges for this important emerging field.

retrieval augmented generation

retrieval-augmented generation

rag

Title: Distributional Reinforcement Learning-based Energy Arbitrage Strategies in Imbalance Settlement Mechanism. (arXiv:2401.00015v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00015
Code URL: null
Copy Paste: [[2401.00015]] Distributional Reinforcement Learning-based Energy Arbitrage Strategies in Imbalance Settlement Mechanism(http://arxiv.org/abs/2401.00015)
Summary:
Growth in the penetration of renewable energy sources makes supply more uncertain and leads to an increase in the system imbalance. This trend, together with the single imbalance pricing, opens an opportunity for balance responsible parties (BRPs) to perform energy arbitrage in the imbalance settlement mechanism. To this end, we propose a battery control framework based on distributional reinforcement learning (DRL). Our proposed control framework takes a risk-sensitive perspective, allowing BRPs to adjust their risk preferences: we aim to optimize a weighted sum of the arbitrage profit and a risk measure while constraining the daily number of cycles for the battery. We assess the performance of our proposed control framework using the Belgian imbalance prices of 2022 and compare two state-of-the-art RL methods, deep Q learning and soft actor-critic. Results reveal that the distributional soft actor-critic method can outperform other methods. Moreover, we note that our fully risk-averse agent appropriately learns to hedge against the risk related to the unknown imbalance price by (dis)charging the battery only when the agent is more certain about the price.

Title: Semantic Computing for Organizational Effectiveness: From Organization Theory to Practice through Semantics-Based Modelling. (arXiv:2401.00062v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00062
Code URL: null
Copy Paste: [[2401.00062]] Semantic Computing for Organizational Effectiveness: From Organization Theory to Practice through Semantics-Based Modelling(http://arxiv.org/abs/2401.00062)
Summary:
A critical function of an organization is to foster the level of integration (coordination and cooperation) necessary to achieve its objectives. The need to coordinate and motivation to cooperate emerges from the myriad dependencies between an organization's members and their work. Therefore, to reason about solutions to coordination and cooperation problems requires a robust representation that includes the underlying dependencies. We find that such a representation remains missing from formal organizational models, and we leverage semantics to bridge this gap. Drawing on well-established organizational research and our extensive fieldwork with one of North America's largest municipalities, (1) we introduce an ontology, formalized in first-order logic, that operationalizes concepts like outcome, reward, and epistemic dependence, and their links to potential integration risks; and (2) present real-world applications of this ontology to analyze and support integration in complex government infrastructure projects. Our ontology is implemented and validated in both Z3 and OWL. Key features of our model include inferable dependencies, explainable coordination and cooperation risks, and actionable insights on how dependency structures within an organization can be altered to mitigate the risks. Conceptualizing real-world challenges like incentive misalignment, free-riding, and subgoal optimization in terms of dependency structures, our semantics-based approach represents a novel method for modelling and enhancing coordination and cooperation. Integrated within a decision-support system, our model may serve as an impactful aid for organizational design and effectiveness. More broadly, our approach underscores the transformative potential of semantics in deriving tangible, real-world value from existing organization theory.

Title: Causal State Distillation for Explainable Reinforcement Learning. (arXiv:2401.00104v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00104
Code URL: null
Copy Paste: [[2401.00104]] Causal State Distillation for Explainable Reinforcement Learning(http://arxiv.org/abs/2401.00104)
Summary:
Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be quite challenging. This lack of transparency in RL models has been a long-standing problem, making it difficult for users to grasp the reasons behind an agent's behaviour. Various approaches have been explored to address this problem, with one promising avenue being reward decomposition (RD). RD is appealing as it sidesteps some of the concerns associated with other methods that attempt to rationalize an agent's behaviour in a post-hoc manner. RD works by exposing various facets of the rewards that contribute to the agent's objectives during training. However, RD alone has limitations as it primarily offers insights based on sub-rewards and does not delve into the intricate cause-and-effect relationships that occur within an RL agent's neural model. In this paper, we present an extension of RD that goes beyond sub-rewards to provide more informative explanations. Our approach is centred on a causal learning framework that leverages information-theoretic measures for explanation objectives that encourage three crucial properties of causal factors: \emph{causal sufficiency}, \emph{sparseness}, and \emph{orthogonality}. These properties help us distill the cause-and-effect relationships between the agent's states and actions or rewards, allowing for a deeper understanding of its decision-making processes. Our framework is designed to generate local explanations and can be applied to a wide range of RL tasks with multiple reward channels. Through a series of experiments, we demonstrate that our approach offers more meaningful and insightful explanations for the agent's action selections.

Title: DiffHybrid-UQ: Uncertainty Quantification for Differentiable Hybrid Neural Modeling. (arXiv:2401.00161v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00161
Code URL: null
Copy Paste: [[2401.00161]] DiffHybrid-UQ: Uncertainty Quantification for Differentiable Hybrid Neural Modeling(http://arxiv.org/abs/2401.00161)
Summary:
The hybrid neural differentiable models mark a significant advancement in the field of scientific machine learning. These models, integrating numerical representations of known physics into deep neural networks, offer enhanced predictive capabilities and show great potential for data-driven modeling of complex physical systems. However, a critical and yet unaddressed challenge lies in the quantification of inherent uncertainties stemming from multiple sources. Addressing this gap, we introduce a novel method, DiffHybrid-UQ, for effective and efficient uncertainty propagation and estimation in hybrid neural differentiable models, leveraging the strengths of deep ensemble Bayesian learning and nonlinear transformations. Specifically, our approach effectively discerns and quantifies both aleatoric uncertainties, arising from data noise, and epistemic uncertainties, resulting from model-form discrepancies and data sparsity. This is achieved within a Bayesian model averaging framework, where aleatoric uncertainties are modeled through hybrid neural models. The unscented transformation plays a pivotal role in enabling the flow of these uncertainties through the nonlinear functions within the hybrid model. In contrast, epistemic uncertainties are estimated using an ensemble of stochastic gradient descent (SGD) trajectories. This approach offers a practical approximation to the posterior distribution of both the network parameters and the physical parameters. Notably, the DiffHybrid-UQ framework is designed for simplicity in implementation and high scalability, making it suitable for parallel computing environments. The merits of the proposed method have been demonstrated through problems governed by both ordinary and partial differentiable equations.

Title: Transformer Multivariate Forecasting: Less is More?. (arXiv:2401.00230v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00230
Code URL: null
Copy Paste: [[2401.00230]] Transformer Multivariate Forecasting: Less is More?(http://arxiv.org/abs/2401.00230)
Summary:
In the domain of multivariate forecasting, transformer models stand out as powerful apparatus, displaying exceptional capabilities in handling messy datasets from real-world contexts. However, the inherent complexity of these datasets, characterized by numerous variables and lengthy temporal sequences, poses challenges, including increased noise and extended model runtime. This paper focuses on reducing redundant information to elevate forecasting accuracy while optimizing runtime efficiency. We propose a novel transformer forecasting framework enhanced by Principal Component Analysis (PCA) to tackle this challenge. The framework is evaluated by five state-of-the-art (SOTA) models and four diverse real-world datasets. Our experimental results demonstrate the framework's ability to minimize prediction errors across all models and datasets while significantly reducing runtime. From the model perspective, one of the PCA-enhanced models: PCA+Crossformer, reduces mean square errors (MSE) by 33.3% and decreases runtime by 49.2% on average. From the dataset perspective, the framework delivers 14.3% MSE and 76.6% runtime reduction on Electricity datasets, as well as 4.8% MSE and 86.9% runtime reduction on Traffic datasets. This study aims to advance various SOTA models and enhance transformer-based time series forecasting for intricate data.

Title: Multi-spatial Multi-temporal Air Quality Forecasting with Integrated Monitoring and Reanalysis Data. (arXiv:2401.00521v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00521
Code URL: null
Copy Paste: [[2401.00521]] Multi-spatial Multi-temporal Air Quality Forecasting with Integrated Monitoring and Reanalysis Data(http://arxiv.org/abs/2401.00521)
Summary:
Accurate air quality forecasting is crucial for public health, environmental monitoring and protection, and urban planning. However, existing methods fail to effectively utilize multi-scale information, both spatially and temporally. Spatially, there is a lack of integration between individual monitoring stations and city-wide scales. Temporally, the periodic nature of air quality variations is often overlooked or inadequately considered. To address these limitations, we present a novel Multi-spatial Multi-temporal air quality forecasting method based on Graph Convolutional Networks and Gated Recurrent Units (M2G2), bridging the gap in air quality forecasting across spatial and temporal scales. The proposed framework consists of two modules: Multi-scale Spatial GCN (MS-GCN) for spatial information fusion and Multi-scale Temporal GRU(MT-GRU) for temporal information integration. In the spatial dimension, the MS-GCN module employs a bidirectional learnable structure and a residual structure, enabling comprehensive information exchange between individual monitoring stations and the city-scale graph. Regarding the temporal dimension, the MT-GRU module adaptively combines information from different temporal scales through parallel hidden states. Leveraging meteorological indicators and four air quality indicators, we present comprehensive comparative analyses and ablation experiments, showcasing the higher accuracy of M2G2 in comparison to nine currently available advanced approaches across all aspects. The improvements of M2G2 over the second-best method on RMSE of the 24h/48h/72h are as follows: PM2.5: (7.72%, 6.67%, 10.45%); PM10: (6.43%, 5.68%, 7.73%); NO2: (5.07%, 7.76%, 16.60%); O3: (6.46%, 6.86%, 9.79%). Furthermore, we demonstrate the effectiveness of each module of M2G2 by ablation study.

Title: Unsupervised Outlier Detection using Random Subspace and Subsampling Ensembles of Dirichlet Process Mixtures. (arXiv:2401.00773v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00773
Code URL: null
Copy Paste: [[2401.00773]] Unsupervised Outlier Detection using Random Subspace and Subsampling Ensembles of Dirichlet Process Mixtures(http://arxiv.org/abs/2401.00773)
Summary:
Probabilistic mixture models are acknowledged as a valuable tool for unsupervised outlier detection owing to their interpretability and intuitive grounding in statistical principles. Within this framework, Dirichlet process mixture models emerge as a compelling alternative to conventional finite mixture models for both clustering and outlier detection tasks. However, despite their evident advantages, the widespread adoption of Dirichlet process mixture models in unsupervised outlier detection has been hampered by challenges related to computational inefficiency and sensitivity to outliers during the construction of detectors. To tackle these challenges, we propose a novel outlier detection method based on ensembles of Dirichlet process Gaussian mixtures. The proposed method is a fully unsupervised algorithm that capitalizes on random subspace and subsampling ensembles, not only ensuring efficient computation but also enhancing the robustness of the resulting outlier detector. Moreover, the proposed method leverages variational inference for Dirichlet process mixtures to ensure efficient and fast computation. Empirical studies with benchmark datasets demonstrate that our method outperforms existing approaches for unsupervised outlier detection.

Title: Automatic Essay Scoring in a Brazilian Scenario. (arXiv:2401.00095v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00095
Code URL: null
Copy Paste: [[2401.00095]] Automatic Essay Scoring in a Brazilian Scenario(http://arxiv.org/abs/2401.00095)
Summary:
This paper presents a novel Automatic Essay Scoring (AES) algorithm tailored for the Portuguese-language essays of Brazil's Exame Nacional do Ensino M\'edio (ENEM), addressing the challenges in traditional human grading systems. Our approach leverages advanced deep learning techniques to align closely with human grading criteria, targeting efficiency and scalability in evaluating large volumes of student essays. This research not only responds to the logistical and financial constraints of manual grading in Brazilian educational assessments but also promises to enhance fairness and consistency in scoring, marking a significant step forward in the application of AES in large-scale academic settings.

Title: Machine Translation Testing via Syntactic Tree Pruning. (arXiv:2401.00751v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.00751
Code URL: null
Copy Paste: [[2401.00751]] Machine Translation Testing via Syntactic Tree Pruning(http://arxiv.org/abs/2401.00751)
Summary:
Machine translation systems have been widely adopted in our daily life, making life easier and more convenient. Unfortunately, erroneous translations may result in severe consequences, such as financial losses. This requires to improve the accuracy and the reliability of machine translation systems. However, it is challenging to test machine translation systems because of the complexity and intractability of the underlying neural models. To tackle these challenges, we propose a novel metamorphic testing approach by syntactic tree pruning (STP) to validate machine translation systems. Our key insight is that a pruned sentence should have similar crucial semantics compared with the original sentence. Specifically, STP (1) proposes a core semantics-preserving pruning strategy by basic sentence structure and dependency relations on the level of syntactic tree representation; (2) generates source sentence pairs based on the metamorphic relation; (3) reports suspicious issues whose translations break the consistency property by a bag-of-words model. We further evaluate STP on two state-of-the-art machine translation systems (i.e., Google Translate and Bing Microsoft Translator) with 1,200 source sentences as inputs. The results show that STP can accurately find 5,073 unique erroneous translations in Google Translate and 5,100 unique erroneous translations in Bing Microsoft Translator (400% more than state-of-the-art techniques), with 64.5% and 65.4% precision, respectively. The reported erroneous translations vary in types and more than 90% of them cannot be found by state-of-the-art techniques. There are 9,393 erroneous translations unique to STP, which is 711.9% more than state-of-the-art techniques. Moreover, STP is quite effective to detect translation errors for the original sentences with a recall reaching 74.0%, improving state-of-the-art techniques by 55.1% on average.

Title: Online Algorithmic Recourse by Collective Action. (arXiv:2401.00055v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00055
Code URL: null
Copy Paste: [[2401.00055]] Online Algorithmic Recourse by Collective Action(http://arxiv.org/abs/2401.00055)
Summary:
Research on algorithmic recourse typically considers how an individual can reasonably change an unfavorable automated decision when interacting with a fixed decision-making system. This paper focuses instead on the online setting, where system parameters are updated dynamically according to interactions with data subjects. Beyond the typical individual-level recourse, the online setting opens up new ways for groups to shape system decisions by leveraging the parameter update rule. We show empirically that recourse can be improved when users coordinate by jointly computing their feature perturbations, underscoring the importance of collective action in mitigating adverse automated decisions.

Title: Fairness-Enhancing Vehicle Rebalancing in the Ride-hailing System. (arXiv:2401.00093v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00093
Code URL: null
Copy Paste: [[2401.00093]] Fairness-Enhancing Vehicle Rebalancing in the Ride-hailing System(http://arxiv.org/abs/2401.00093)
Summary:
The rapid growth of the ride-hailing industry has revolutionized urban transportation worldwide. Despite its benefits, equity concerns arise as underserved communities face limited accessibility to affordable ride-hailing services. A key issue in this context is the vehicle rebalancing problem, where idle vehicles are moved to areas with anticipated demand. Without equitable approaches in demand forecasting and rebalancing strategies, these practices can further deepen existing inequities. In the realm of ride-hailing, three main facets of fairness are recognized: algorithmic fairness, fairness to drivers, and fairness to riders. This paper focuses on enhancing both algorithmic and rider fairness through a novel vehicle rebalancing method. We introduce an approach that combines a Socio-Aware Spatial-Temporal Graph Convolutional Network (SA-STGCN) for refined demand prediction and a fairness-integrated Matching-Integrated Vehicle Rebalancing (MIVR) model for subsequent vehicle rebalancing. Our methodology is designed to reduce prediction discrepancies and ensure equitable service provision across diverse regions. The effectiveness of our system is evaluated using simulations based on real-world ride-hailing data. The results suggest that our proposed method enhances both accuracy and fairness in forecasting ride-hailing demand, ultimately resulting in more equitable vehicle rebalancing in subsequent operations. Specifically, the algorithm developed in this study effectively reduces the standard deviation and average customer wait times by 6.48% and 0.49%, respectively. This achievement signifies a beneficial outcome for ride-hailing platforms, striking a balance between operational efficiency and fairness.

Title: Deep Generative Symbolic Regression. (arXiv:2401.00282v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00282
Code URL: null
Copy Paste: [[2401.00282]] Deep Generative Symbolic Regression(http://arxiv.org/abs/2401.00282)
Summary:
Symbolic regression (SR) aims to discover concise closed-form mathematical equations from data, a task fundamental to scientific discovery. However, the problem is highly challenging because closed-form equations lie in a complex combinatorial search space. Existing methods, ranging from heuristic search to reinforcement learning, fail to scale with the number of input variables. We make the observation that closed-form equations often have structural characteristics and invariances (e.g., the commutative law) that could be further exploited to build more effective symbolic regression solutions. Motivated by this observation, our key contribution is to leverage pre-trained deep generative models to capture the intrinsic regularities of equations, thereby providing a solid foundation for subsequent optimization steps. We show that our novel formalism unifies several prominent approaches of symbolic regression and offers a new perspective to justify and improve on the previous ad hoc designs, such as the usage of cross-entropy loss during pre-training. Specifically, we propose an instantiation of our framework, Deep Generative Symbolic Regression (DGSR). In our experiments, we show that DGSR achieves a higher recovery rate of true equations in the setting of a larger number of input variables, and it is more computationally efficient at inference time than state-of-the-art RL symbolic regression solutions.

Title: Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise. (arXiv:2401.00364v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00364
Code URL: null
Copy Paste: [[2401.00364]] Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise(http://arxiv.org/abs/2401.00364)
Summary:
Stochastic approximation (SA) is an iterative algorithm to find the fixed point of an operator given noisy samples of this operator. SA appears in many areas such as optimization and Reinforcement Learning (RL). When implemented in practice, the noise that appears in the update of RL algorithms is naturally Markovian. Furthermore, in some settings, such as gradient TD, SA is employed in a two-time-scale manner. The mix of Markovian noise along with the two-time-scale structure results in an algorithm which is complex to analyze theoretically. In this paper, we characterize a tight convergence bound for the iterations of linear two-time-scale SA with Markovian noise. Our results show the convergence behavior of this algorithm given various choices of step sizes. Applying our result to the well-known TDC algorithm, we show the first $O(1/\epsilon)$ sample complexity for the convergence of this algorithm, outperforming all the previous work. Similarly, our results can be applied to establish the convergence behavior of a variety of RL algorithms, such as TD-learning with Polyak averaging, GTD, and GTD2.

Title: Real-Time FJ/MAC PDE Solvers via Tensorized, Back-Propagation-Free Optical PINN Training. (arXiv:2401.00413v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00413
Code URL: null
Copy Paste: [[2401.00413]] Real-Time FJ/MAC PDE Solvers via Tensorized, Back-Propagation-Free Optical PINN Training(http://arxiv.org/abs/2401.00413)
Summary:
Solving partial differential equations (PDEs) numerically often requires huge computing time, energy cost, and hardware resources in practical applications. This has limited their applications in many scenarios (e.g., autonomous systems, supersonic flows) that have a limited energy budget and require near real-time response. Leveraging optical computing, this paper develops an on-chip training framework for physics-informed neural networks (PINNs), aiming to solve high-dimensional PDEs with fJ/MAC photonic power consumption and ultra-low latency. Despite the ultra-high speed of optical neural networks, training a PINN on an optical chip is hard due to (1) the large size of photonic devices, and (2) the lack of scalable optical memory devices to store the intermediate results of back-propagation (BP). To enable realistic optical PINN training, this paper presents a scalable method to avoid the BP process. We also employ a tensor-compressed approach to improve the convergence and scalability of our optical PINN training. This training framework is designed with tensorized optical neural networks (TONN) for scalable inference acceleration and MZI phase-domain tuning for \textit{in-situ} optimization. Our simulation results of a 20-dim HJB PDE show that our photonic accelerator can reduce the number of MZIs by a factor of $1.17\times 10^3$, with only $1.36$ J and $1.15$ s to solve this equation. This is the first real-size optical PINN training framework that can be applied to solve high-dimensional PDEs.

Title: MSGNet: Learning Multi-Scale Inter-Series Correlations for Multivariate Time Series Forecasting. (arXiv:2401.00423v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00423
Code URL: null
Copy Paste: [[2401.00423]] MSGNet: Learning Multi-Scale Inter-Series Correlations for Multivariate Time Series Forecasting(http://arxiv.org/abs/2401.00423)
Summary:
Multivariate time series forecasting poses an ongoing challenge across various disciplines. Time series data often exhibit diverse intra-series and inter-series correlations, contributing to intricate and interwoven dependencies that have been the focus of numerous studies. Nevertheless, a significant research gap remains in comprehending the varying inter-series correlations across different time scales among multiple time series, an area that has received limited attention in the literature. To bridge this gap, this paper introduces MSGNet, an advanced deep learning model designed to capture the varying inter-series correlations across multiple time scales using frequency domain analysis and adaptive graph convolution. By leveraging frequency domain analysis, MSGNet effectively extracts salient periodic patterns and decomposes the time series into distinct time scales. The model incorporates a self-attention mechanism to capture intra-series dependencies, while introducing an adaptive mixhop graph convolution layer to autonomously learn diverse inter-series correlations within each time scale. Extensive experiments are conducted on several real-world datasets to showcase the effectiveness of MSGNet. Furthermore, MSGNet possesses the ability to automatically learn explainable multi-scale inter-series correlations, exhibiting strong generalization capabilities even when applied to out-of-distribution samples.

Title: Financial Time-Series Forecasting: Towards Synergizing Performance And Interpretability Within a Hybrid Machine Learning Approach. (arXiv:2401.00534v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00534
Code URL: null
Copy Paste: [[2401.00534]] Financial Time-Series Forecasting: Towards Synergizing Performance And Interpretability Within a Hybrid Machine Learning Approach(http://arxiv.org/abs/2401.00534)
Summary:
In the realm of cryptocurrency, the prediction of Bitcoin prices has garnered substantial attention due to its potential impact on financial markets and investment strategies. This paper propose a comparative study on hybrid machine learning algorithms and leverage on enhancing model interpretability. Specifically, linear regression(OLS, LASSO), long-short term memory(LSTM), decision tree regressors are introduced. Through the grounded experiments, we observe linear regressor achieves the best performance among candidate models. For the interpretability, we carry out a systematic overview on the preprocessing techniques of time-series statistics, including decomposition, auto-correlational function, exponential triple forecasting, which aim to excavate latent relations and complex patterns appeared in the financial time-series forecasting. We believe this work may derive more attention and inspire more researches in the realm of time-series analysis and its realistic applications.

Title: Federated Class-Incremental Learning with New-Class Augmented Self-Distillation. (arXiv:2401.00622v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00622
Code URL: null
Copy Paste: [[2401.00622]] Federated Class-Incremental Learning with New-Class Augmented Self-Distillation(http://arxiv.org/abs/2401.00622)
Summary:
Federated Learning (FL) enables collaborative model training among participants while guaranteeing the privacy of raw data. Mainstream FL methodologies overlook the dynamic nature of real-world data, particularly its tendency to grow in volume and diversify in classes over time. This oversight results in FL methods suffering from catastrophic forgetting, where models inadvertently discard previously learned information upon assimilating new data. In response to this challenge, we propose a novel Federated Class-Incremental Learning (FCIL) method, named FCIL with New-Class Augmented Self-Distillation (FedNASD). FedNASD combines new class scores, which are inferred from current models, with historical models' predictions. Based on the combined past and present knowledge, it incorporates self-distillation over models on clients, aiming to achieve effective knowledge transfer from historical models to current models. Theoretical analysis demonstrates that FedNASD is equivalent to modeling old class scores as conditional probabilities in the absence of new classes. Additionally, it reconciles the predictions of new classes with current models to refine the conditional probabilities of historical scores where new classes do not exist. Empirical experiments demonstrate the superiority of FedNASD over four baseline algorithms in reducing the average forgetting rate and boosting global accuracy.

Title: Adversarially Trained Actor Critic for offline CMDPs. (arXiv:2401.00629v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00629
Code URL: null
Copy Paste: [[2401.00629]] Adversarially Trained Actor Critic for offline CMDPs(http://arxiv.org/abs/2401.00629)
Summary:
We propose a Safe Adversarial Trained Actor Critic (SATAC) algorithm for offline reinforcement learning (RL) with general function approximation in the presence of limited data coverage. SATAC operates as a two-player Stackelberg game featuring a refined objective function. The actor (leader player) optimizes the policy against two adversarially trained value critics (follower players), who focus on scenarios where the actor's performance is inferior to the behavior policy. Our framework provides both theoretical guarantees and a robust deep-RL implementation. Theoretically, we demonstrate that when the actor employs a no-regret optimization oracle, SATAC achieves two guarantees: (i) For the first time in the offline RL setting, we establish that SATAC can produce a policy that outperforms the behavior policy while maintaining the same level of safety, which is critical to designing an algorithm for offline RL. (ii) We demonstrate that the algorithm guarantees policy improvement across a broad range of hyperparameters, indicating its practical robustness. Additionally, we offer a practical version of SATAC and compare it with existing state-of-the-art offline safe-RL algorithms in continuous control environments. SATAC outperforms all baselines across a range of tasks, thus validating the theoretical performance.

multi-run

chain-of-thought

tree-of-thought

agent

Title: Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation. (arXiv:2401.00006v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00006
Code URL: null
Copy Paste: [[2401.00006]] Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation(http://arxiv.org/abs/2401.00006)
Summary:
Building open-ended learning agents involves challenges in pre-trained language model (LLM) and reinforcement learning (RL) approaches. LLMs struggle with context-specific real-time interactions, while RL methods face efficiency issues for exploration. To this end, we propose OpenContra, a co-training framework that cooperates LLMs and GRL to construct an open-ended agent capable of comprehending arbitrary human instructions. The implementation comprises two stages: (1) fine-tuning an LLM to translate human instructions into structured goals, and curriculum training a goal-conditioned RL policy to execute arbitrary goals; (2) collaborative training to make the LLM and RL policy learn to adapt each, achieving open-endedness on instruction space. We conduct experiments on Contra, a battle royale FPS game with a complex and vast goal space. The results show that an agent trained with OpenContra comprehends arbitrary human instructions and completes goals with a high completion ratio, which proves that OpenContra may be the first practical solution for constructing open-ended embodied agents.

Title: Principal-Agent Reward Shaping in MDPs. (arXiv:2401.00298v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00298
Code URL: null
Copy Paste: [[2401.00298]] Principal-Agent Reward Shaping in MDPs(http://arxiv.org/abs/2401.00298)
Summary:
Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest. The economic literature has extensively studied principal-agent problems, and recent work has extended this to more complex scenarios such as Markov Decision Processes (MDPs). In this paper, we further explore this line of research by investigating how reward shaping under budget constraints can improve the principal's utility. We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players. The principal offers an additional reward to the agent, and the agent picks their policy selfishly to maximize their reward, which is the sum of the original and the offered reward. Our results establish the NP-hardness of the problem and offer polynomial approximation algorithms for two classes of instances: Stochastic trees and deterministic decision processes with a finite horizon.

Title: Bidirectional Temporal Plan Graph: Enabling Switchable Passing Orders for More Efficient Multi-Agent Path Finding Plan Execution. (arXiv:2401.00315v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.00315
Code URL: null
Copy Paste: [[2401.00315]] Bidirectional Temporal Plan Graph: Enabling Switchable Passing Orders for More Efficient Multi-Agent Path Finding Plan Execution(http://arxiv.org/abs/2401.00315)
Summary:
The Multi-Agent Path Finding (MAPF) problem involves planning collision-free paths for multiple agents in a shared environment. The majority of MAPF solvers rely on the assumption that an agent can arrive at a specific location at a specific timestep. However, real-world execution uncertainties can cause agents to deviate from this assumption, leading to collisions and deadlocks. Prior research solves this problem by having agents follow a Temporal Plan Graph (TPG), enforcing a consistent passing order at every location as defined in the MAPF plan. However, we show that TPGs are overly strict because, in some circumstances, satisfying the passing order requires agents to wait unnecessarily, leading to longer execution time. To overcome this issue, we introduce a new graphical representation called a Bidirectional Temporal Plan Graph (BTPG), which allows switching passing orders during execution to avoid unnecessary waiting time. We design two anytime algorithms for constructing a BTPG: BTPG-na\"ive and BTPG-optimized. Experimental results show that following BTPGs consistently outperforms following TPGs, reducing unnecessary waits by 8-20%.

Title: Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback. (arXiv:2401.00330v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.00330
Code URL: null
Copy Paste: [[2401.00330]] Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback(http://arxiv.org/abs/2401.00330)
Summary:
In this work, we consider the offline preference-based reinforcement learning problem. We focus on the two-phase learning approach that is prevalent in previous reinforcement learning from human preference works. We find a challenge in applying two-phase learning in the offline PBRL setting that the learned utility model can be too hard for the learning agent to optimize during the second learning phase. To overcome the challenge, we propose a two-phasing learning approach under behavior regularization through action clipping. The insight is that the state-actions which are poorly covered by the dataset can only provide limited information and increase the complexity of the problem in the second learning phase. Our method ignores such state-actions during the second learning phase to achieve higher learning efficiency. We empirically verify that our method has high learning efficiency on a variety of datasets in robotic control environments.