language model

Title: Foundation Models for Weather and Climate Data Understanding: A Comprehensive Survey. (arXiv:2312.03014v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03014
Code URL: null
Copy Paste: [[2312.03014]] Foundation Models for Weather and Climate Data Understanding: A Comprehensive Survey(http://arxiv.org/abs/2312.03014)
Summary:
As artificial intelligence (AI) continues to rapidly evolve, the realm of Earth and atmospheric sciences is increasingly adopting data-driven models, powered by progressive developments in deep learning (DL). Specifically, DL techniques are extensively utilized to decode the chaotic and nonlinear aspects of Earth systems, and to address climate challenges via understanding weather and climate data. Cutting-edge performance on specific tasks within narrower spatio-temporal scales has been achieved recently through DL. The rise of large models, specifically large language models (LLMs), has enabled fine-tuning processes that yield remarkable outcomes across various downstream tasks, thereby propelling the advancement of general AI. However, we are still navigating the initial stages of crafting general AI for weather and climate. In this survey, we offer an exhaustive, timely overview of state-of-the-art AI methodologies specifically engineered for weather and climate data, with a special focus on time series and text data. Our primary coverage encompasses four critical aspects: types of weather and climate data, principal model architectures, model scopes and applications, and datasets for weather and climate. Furthermore, in relation to the creation and application of foundation models for weather and climate data understanding, we delve into the field's prevailing challenges, offer crucial insights, and propose detailed avenues for future research. This comprehensive approach equips practitioners with the requisite knowledge to make substantial progress in this domain. Our survey encapsulates the most recent breakthroughs in research on large, data-driven models for weather and climate data understanding, emphasizing robust foundations, current advancements, practical applications, crucial resources, and prospective research opportunities.

Title: Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction. (arXiv:2312.03022v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.03022
Code URL: null
Copy Paste: [[2312.03022]] Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction(http://arxiv.org/abs/2312.03022)
Summary:
Knowledge graph construction (KGC) is a multifaceted undertaking involving the extraction of entities, relations, and events. Traditionally, large language models (LLMs) have been viewed as solitary task-solving agents in this complex landscape. However, this paper challenges this paradigm by introducing a novel framework, CooperKGC. Departing from the conventional approach, CooperKGC establishes a collaborative processing network, assembling a KGC collaboration team capable of concurrently addressing entity, relation, and event extraction tasks. Our experiments unequivocally demonstrate that fostering collaboration and information interaction among diverse agents within CooperKGC yields superior results compared to individual cognitive processes operating in isolation. Importantly, our findings reveal that the collaboration facilitated by CooperKGC enhances knowledge selection, correction, and aggregation capabilities across multiple rounds of interactions.

Title: Evaluating Agents using Social Choice Theory. (arXiv:2312.03121v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.03121
Code URL: null
Copy Paste: [[2312.03121]] Evaluating Agents using Social Choice Theory(http://arxiv.org/abs/2312.03121)
Summary:
We argue that many general evaluation problems can be viewed through the lens of voting theory. Each task is interpreted as a separate voter, which requires only ordinal rankings or pairwise comparisons of agents to produce an overall evaluation. By viewing the aggregator as a social welfare function, we are able to leverage centuries of research in social choice theory to derive principled evaluation frameworks with axiomatic foundations. These evaluations are interpretable and flexible, while avoiding many of the problems currently facing cross-task evaluation. We apply this Voting-as-Evaluation (VasE) framework across multiple settings, including reinforcement learning, large language models, and humans. In practice, we observe that VasE can be more robust than popular evaluation frameworks (Elo and Nash averaging), discovers properties in the evaluation data not evident from scores alone, and can predict outcomes better than Elo in a complex seven-player game. We identify one particular approach, maximal lotteries, that satisfies important consistency properties relevant to evaluation, is computationally efficient (polynomial in the size of the evaluation data), and identifies game-theoretic cycles

Title: FlexModel: A Framework for Interpretability of Distributed Large Language Models. (arXiv:2312.03140v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03140
Code URL: https://github.com/vectorinstitute/flex_model
Copy Paste: [[2312.03140]] FlexModel: A Framework for Interpretability of Distributed Large Language Models(http://arxiv.org/abs/2312.03140)
Summary:
With the growth of large language models, now incorporating billions of parameters, the hardware prerequisites for their training and deployment have seen a corresponding increase. Although existing tools facilitate model parallelization and distributed training, deeper model interactions, crucial for interpretability and responsible AI techniques, still demand thorough knowledge of distributed computing. This often hinders contributions from researchers with machine learning expertise but limited distributed computing background. Addressing this challenge, we present FlexModel, a software package providing a streamlined interface for engaging with models distributed across multi-GPU and multi-node configurations. The library is compatible with existing model distribution libraries and encapsulates PyTorch models. It exposes user-registerable HookFunctions to facilitate straightforward interaction with distributed model internals, bridging the gap between distributed and single-device model paradigms. Primarily, FlexModel enhances accessibility by democratizing model interactions and promotes more inclusive research in the domain of large-scale neural networks. The package is found at https://github.com/VectorInstitute/flex_model.

Title: Teaching Specific Scientific Knowledge into Large Language Models through Additional Training. (arXiv:2312.03360v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03360
Code URL: null
Copy Paste: [[2312.03360]] Teaching Specific Scientific Knowledge into Large Language Models through Additional Training(http://arxiv.org/abs/2312.03360)
Summary:
Through additional training, we explore embedding specialized scientific knowledge into the Llama 2 Large Language Model (LLM). Key findings reveal that effective knowledge integration requires reading texts from multiple perspectives, especially in instructional formats. We utilize text augmentation to tackle the scarcity of specialized texts, including style conversions and translations. Hyperparameter optimization proves crucial, with different size models (7b, 13b, and 70b) reasonably undergoing additional training. Validating our methods, we construct a dataset of 65,000 scientific papers. Although we have succeeded in partially embedding knowledge, the study highlights the complexities and limitations of incorporating specialized information into LLMs, suggesting areas for further improvement.

Title: Not All Large Language Models (LLMs) Succumb to the "Reversal Curse": A Comparative Study of Deductive Logical Reasoning in BERT and GPT Models. (arXiv:2312.03633v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03633
Code URL: null
Copy Paste: [[2312.03633]] Not All Large Language Models (LLMs) Succumb to the "Reversal Curse": A Comparative Study of Deductive Logical Reasoning in BERT and GPT Models(http://arxiv.org/abs/2312.03633)
Summary:
The "Reversal Curse" refers to the scenario where auto-regressive decoder large language models (LLMs), such as ChatGPT, trained on "A is B" fail to learn "B is A", demonstrating a basic failure of logical deduction. This raises a red flag in the use of GPT models for certain general tasks such as constructing knowledge graphs, considering their adherence to this symmetric principle. In our study, we examined a bidirectional LLM, BERT, and found that it is immune to the reversal curse. Driven by ongoing efforts to construct biomedical knowledge graphs with LLMs, we also embarked on evaluating more complex but essential deductive reasoning capabilities. This process included first training encoder and decoder language models to master the intersection ($\cap$) and union ($\cup$) operations on two sets and then moving on to assess their capability to infer different combinations of union ($\cup$) and intersection ($\cap$) operations on three newly created sets. The findings showed that while both encoder and decoder language models, trained for tasks involving two sets (union/intersection), were proficient in such scenarios, they encountered difficulties when dealing with operations that included three sets (various combinations of union and intersection). Our research highlights the distinct characteristics of encoder and decoder models in simple and complex logical reasoning. In practice, the choice between BERT and GPT should be guided by the specific requirements and nature of the task at hand, leveraging their respective strengths in bidirectional context comprehension and sequence prediction.

Title: Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia. (arXiv:2312.03664v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.03664
Code URL: null
Copy Paste: [[2312.03664]] Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia(http://arxiv.org/abs/2312.03664)
Summary:
Agent-based modeling has been around for decades, and applied widely across the social and natural sciences. The scope of this research method is now poised to grow dramatically as it absorbs the new affordances provided by Large Language Models (LLM)s. Generative Agent-Based Models (GABM) are not just classic Agent-Based Models (ABM)s where the agents talk to one another. Rather, GABMs are constructed using an LLM to apply common sense to situations, act "reasonably", recall common semantic knowledge, produce API calls to control digital technologies like apps, and communicate both within the simulation and to researchers viewing it from the outside. Here we present Concordia, a library to facilitate constructing and working with GABMs. Concordia makes it easy to construct language-mediated simulations of physically- or digitally-grounded environments. Concordia agents produce their behavior using a flexible component system which mediates between two fundamental operations: LLM calls and associative memory retrieval. A special agent called the Game Master (GM), which was inspired by tabletop role-playing games, is responsible for simulating the environment where the agents interact. Agents take actions by describing what they want to do in natural language. The GM then translates their actions into appropriate implementations. In a simulated physical world, the GM checks the physical plausibility of agent actions and describes their effects. In digital environments simulating technologies such as apps and services, the GM may handle API calls to integrate with external tools such as general AI assistants (e.g., Bard, ChatGPT), and digital apps (e.g., Calendar, Email, Search, etc.). Concordia was designed to support a wide array of applications both in scientific research and for evaluating performance of real digital services by simulating users and/or generating synthetic data.

Title: Assertion Enhanced Few-Shot Learning: Instructive Technique for Large Language Models to Generate Educational Explanations. (arXiv:2312.03122v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03122
Code URL: null
Copy Paste: [[2312.03122]] Assertion Enhanced Few-Shot Learning: Instructive Technique for Large Language Models to Generate Educational Explanations(http://arxiv.org/abs/2312.03122)
Summary:
Human educators possess an intrinsic ability to anticipate and seek educational explanations from students, which drives them to pose thought-provoking questions when students cannot articulate these explanations independently. We aim to imbue Intelligent Tutoring Systems with this ability using few-shot learning capability of Large Language Models. Our work proposes a novel prompting technique, Assertion Enhanced Few-Shot Learning, to facilitate the generation of accurate, detailed oriented educational explanations. Our central hypothesis is that, in educational domain, few-shot demonstrations are necessary but not a sufficient condition for quality explanation generation. We conducted a study involving 12 in-service teachers, comparing our approach to Traditional Few-Shot Learning. The results show that Assertion Enhanced Few-Shot Learning improves explanation accuracy by 15% and yields higher-quality explanations, as evaluated by teachers. We also conduct a qualitative ablation study to factor the impact of assertions to provide educator-friendly prompting guidelines for generating explanations in their domain of interest.

Title: Corporate Bankruptcy Prediction with Domain-Adapted BERT. (arXiv:2312.03194v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03194
Code URL: null
Copy Paste: [[2312.03194]] Corporate Bankruptcy Prediction with Domain-Adapted BERT(http://arxiv.org/abs/2312.03194)
Summary:
This study performs BERT-based analysis, which is a representative contextualized language model, on corporate disclosure data to predict impending bankruptcies. Prior literature on bankruptcy prediction mainly focuses on developing more sophisticated prediction methodologies with financial variables. However, in our study, we focus on improving the quality of input dataset. Specifically, we employ BERT model to perform sentiment analysis on MD&A disclosures. We show that BERT outperforms dictionary-based predictions and Word2Vec-based predictions in terms of adjusted R-square in logistic regression, k-nearest neighbor (kNN-5), and linear kernel support vector machine (SVM). Further, instead of pre-training the BERT model from scratch, we apply self-learning with confidence-based filtering to corporate disclosure data (10-K). We achieve the accuracy rate of 91.56% and demonstrate that the domain adaptation procedure brings a significant improvement in prediction accuracy.

Title: Measuring Misogyny in Natural Language Generation: Preliminary Results from a Case Study on two Reddit Communities. (arXiv:2312.03330v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03330
Code URL: null
Copy Paste: [[2312.03330]] Measuring Misogyny in Natural Language Generation: Preliminary Results from a Case Study on two Reddit Communities(http://arxiv.org/abs/2312.03330)
Summary:
Generic `toxicity' classifiers continue to be used for evaluating the potential for harm in natural language generation, despite mounting evidence of their shortcomings. We consider the challenge of measuring misogyny in natural language generation, and argue that generic `toxicity' classifiers are inadequate for this task. We use data from two well-characterised `Incel' communities on Reddit that differ primarily in their degrees of misogyny to construct a pair of training corpora which we use to fine-tune two language models. We show that an open source `toxicity' classifier is unable to distinguish meaningfully between generations from these models. We contrast this with a misogyny-specific lexicon recently proposed by feminist subject-matter experts, demonstrating that, despite the limitations of simple lexicon-based approaches, this shows promise as a benchmark to evaluate language models for misogyny, and that it is sensitive enough to reveal the known differences in these Reddit communities. Our preliminary findings highlight the limitations of a generic approach to evaluating harms, and further emphasise the need for careful benchmark design and selection in natural language evaluation.

Title: Compressed Context Memory For Online Language Model Interaction. (arXiv:2312.03414v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03414
Code URL: https://github.com/snu-mllab/context-memory
Copy Paste: [[2312.03414]] Compressed Context Memory For Online Language Model Interaction(http://arxiv.org/abs/2312.03414)
Summary:
This paper presents a novel context compression method for Transformer language models in online scenarios such as ChatGPT, where the context continually expands. As the context lengthens, the attention process requires more memory and computational resources, which in turn reduces the throughput of the language model. To this end, we propose a compressed context memory system that continually compresses the growing context into a compact memory space. The compression process simply involves integrating a lightweight conditional LoRA into the language model's forward pass during inference. Based on the compressed context memory, the language model can perform inference with reduced memory and attention operations. Through evaluations on conversation, personalization, and multi-task learning, we demonstrate that our approach achieves the performance level of a full context model with $5\times$ smaller context memory space. Codes are available at https://github.com/snu-mllab/context-memory.

Title: Think from Words(TFW): Initiating Human-Like Cognition in Large Language Models Through Think from Words for Japanese Text-level Classification. (arXiv:2312.03458v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03458
Code URL: null
Copy Paste: [[2312.03458]] Think from Words(TFW): Initiating Human-Like Cognition in Large Language Models Through Think from Words for Japanese Text-level Classification(http://arxiv.org/abs/2312.03458)
Summary:
The proliferation of Large Language Models (LLMs) has spurred extensive research into LLM-related Prompt investigations, such as Instruction Learning (IL), In-context Learning (ICL), and Chain-of-Thought (CoT). These approaches aim to improve LLMs' responses by enabling them to provide concise statements or examples for deeper contemplation when addressing questions. However, independent thinking by LLMs can introduce variability in their thought processes, leading to potential inaccuracies. In response, our study seeks to bridge the gap between LLM and human-like thinking processes, recognizing that text comprehension begins with understanding individual words. To tackle this challenge, we have expanded the CoT method to cater to a specific domain. Our approach, known as "Think from Words" (TFW), initiates the comprehension process at the word level and then extends it to encompass the entire text. We also propose "TFW with Extra word-level information" (TFW Extra), augmenting comprehension with additional word-level data. To assess our methods, we employ text classification on six Japanese datasets comprising text-level and word-level elements. Our findings not only validate the effectiveness of TFW but also shed light on the impact of various word-level information types on LLMs' text comprehension, offering insights into their potential to cause misinterpretations and errors in the overall comprehension of the final text.

Title: DBCopilot: Scaling Natural Language Querying to Massive Databases. (arXiv:2312.03463v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03463
Code URL: https://github.com/tshu-w/dbcopilot
Copy Paste: [[2312.03463]] DBCopilot: Scaling Natural Language Querying to Massive Databases(http://arxiv.org/abs/2312.03463)
Summary:
Text-to-SQL simplifies database interactions by enabling non-experts to convert their natural language (NL) questions into Structured Query Language (SQL) queries. While recent advances in large language models (LLMs) have improved the zero-shot text-to-SQL paradigm, existing methods face scalability challenges when dealing with massive, dynamically changing databases. This paper introduces DBCopilot, a framework that addresses these challenges by employing a compact and flexible copilot model for routing across massive databases. Specifically, DBCopilot decouples the text-to-SQL process into schema routing and SQL generation, leveraging a lightweight sequence-to-sequence neural network-based router to formulate database connections and navigate natural language questions through databases and tables. The routed schemas and questions are then fed into LLMs for efficient SQL generation. Furthermore, DBCopilot also introduced a reverse schema-to-question generation paradigm, which can learn and adapt the router over massive databases automatically without requiring manual intervention. Experimental results demonstrate that DBCopilot is a scalable and effective solution for real-world text-to-SQL tasks, providing a significant advancement in handling large-scale schemas.

Title: Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling. (arXiv:2312.03523v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03523
Code URL: null
Copy Paste: [[2312.03523]] Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling(http://arxiv.org/abs/2312.03523)
Summary:
We present an open-source, pip installable toolkit, Sig-Networks, the first of its kind for longitudinal language modelling. A central focus is the incorporation of Signature-based Neural Network models, which have recently shown success in temporal tasks. We apply and extend published research providing a full suite of signature-based models. Their components can be used as PyTorch building blocks in future architectures. Sig-Networks enables task-agnostic dataset plug-in, seamless pre-processing for sequential data, parameter flexibility, automated tuning across a range of models. We examine signature networks under three different NLP tasks of varying temporal granularity: counselling conversations, rumour stance switch and mood changes in social media threads, showing SOTA performance in all three, and provide guidance for future tasks. We release the Toolkit as a PyTorch package with an introductory video, Git repositories for preprocessing and modelling including sample notebooks on the modeled NLP tasks.

Title: Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment. (arXiv:2312.03549v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03549
Code URL: null
Copy Paste: [[2312.03549]] Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment(http://arxiv.org/abs/2312.03549)
Summary:
Large language models (LLMs) such as GPT-3, OPT, and LLaMA have demonstrated remarkable accuracy in a wide range of tasks. However, training these models can incur significant expenses, often requiring tens of thousands of GPUs for months of continuous operation. Typically, this training is carried out in specialized GPU clusters equipped with homogeneous high-speed Remote Direct Memory Access (RDMA) network interface cards (NICs). The acquisition and maintenance of such dedicated clusters is challenging. Current LLM training frameworks, like Megatron-LM and Megatron-DeepSpeed, focus primarily on optimizing training within homogeneous cluster settings. In this paper, we introduce Holmes, a training framework for LLMs that employs thoughtfully crafted data and model parallelism strategies over the heterogeneous NIC environment. Our primary technical contribution lies in a novel scheduling method that intelligently allocates distinct computational tasklets in LLM training to specific groups of GPU devices based on the characteristics of their connected NICs. Furthermore, our proposed framework, utilizing pipeline parallel techniques, demonstrates scalability to multiple GPU clusters, even in scenarios without high-speed interconnects between nodes in distinct clusters. We conducted comprehensive experiments that involved various scenarios in the heterogeneous NIC environment. In most cases, our framework achieves performance levels close to those achievable with homogeneous RDMA-capable networks (InfiniBand or RoCE), significantly exceeding training efficiency within the pure Ethernet environment. Additionally, we verified that our framework outperforms other mainstream LLM frameworks under heterogeneous NIC environment in terms of training efficiency and can be seamlessly integrated with them.

Title: Evaluating and Mitigating Discrimination in Language Model Decisions. (arXiv:2312.03689v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03689
Code URL: null
Copy Paste: [[2312.03689]] Evaluating and Mitigating Discrimination in Language Model Decisions(http://arxiv.org/abs/2312.03689)
Summary:
As language models (LMs) advance, interest is growing in applying them to high-stakes societal decisions, such as determining financing or housing eligibility. However, their potential for discrimination in such contexts raises ethical concerns, motivating the need for better methods to evaluate these risks. We present a method for proactively evaluating the potential discriminatory impact of LMs in a wide range of use cases, including hypothetical use cases where they have not yet been deployed. Specifically, we use an LM to generate a wide array of potential prompts that decision-makers may input into an LM, spanning 70 diverse decision scenarios across society, and systematically vary the demographic information in each prompt. Applying this methodology reveals patterns of both positive and negative discrimination in the Claude 2.0 model in select settings when no interventions are applied. While we do not endorse or permit the use of language models to make automated decisions for the high-risk use cases we study, we demonstrate techniques to significantly decrease both positive and negative discrimination through careful prompt engineering, providing pathways toward safer deployment in use cases where they may be appropriate. Our work enables developers and policymakers to anticipate, measure, and address discrimination as language model capabilities and applications continue to expand. We release our dataset and prompts at https://huggingface.co/datasets/Anthropic/discrim-eval

gpt

Title: XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering. (arXiv:2312.03567v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03567
Code URL: null
Copy Paste: [[2312.03567]] XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering(http://arxiv.org/abs/2312.03567)
Summary:
Extractive question answering (QA) systems can enable physicians and researchers to query medical records, a foundational capability for designing clinical studies and understanding patient medical history. However, building these systems typically requires expert-annotated QA pairs. Large language models (LLMs), which can perform extractive QA, depend on high quality data in their prompts, specialized for the application domain. We introduce a novel approach, XAIQA, for generating synthetic QA pairs at scale from data naturally available in electronic health records. Our method uses the idea of a classification model explainer to generate questions and answers about medical concepts corresponding to medical codes. In an expert evaluation with two physicians, our method identifies $2.2\times$ more semantic matches and $3.8\times$ more clinical abbreviations than two popular approaches that use sentence transformers to create QA pairs. In an ML evaluation, adding our QA pairs improves performance of GPT-4 as an extractive QA model, including on difficult questions. In both the expert and ML evaluations, we examine trade-offs between our method and sentence transformers for QA pair generation depending on question difficulty.

llm

Title: Inherent limitations of LLMs regarding spatial information. (arXiv:2312.03042v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03042
Code URL: null
Copy Paste: [[2312.03042]] Inherent limitations of LLMs regarding spatial information(http://arxiv.org/abs/2312.03042)
Summary:
Despite the significant advancements in natural language processing capabilities demonstrated by large language models such as ChatGPT, their proficiency in comprehending and processing spatial information, especially within the domains of 2D and 3D route planning, remains notably underdeveloped. This paper investigates the inherent limitations of ChatGPT and similar models in spatial reasoning and navigation-related tasks, an area critical for applications ranging from autonomous vehicle guidance to assistive technologies for the visually impaired. In this paper, we introduce a novel evaluation framework complemented by a baseline dataset, meticulously crafted for this study. This dataset is structured around three key tasks: plotting spatial points, planning routes in two-dimensional (2D) spaces, and devising pathways in three-dimensional (3D) environments. We specifically developed this dataset to assess the spatial reasoning abilities of ChatGPT. Our evaluation reveals key insights into the model's capabilities and limitations in spatial understanding.

Title: Clinical Notes Reveal Physician Fatigue. (arXiv:2312.03077v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03077
Code URL: null
Copy Paste: [[2312.03077]] Clinical Notes Reveal Physician Fatigue(http://arxiv.org/abs/2312.03077)
Summary:
Physicians write notes about patients. In doing so, they reveal much about themselves. Using data from 129,228 emergency room visits, we train a model to identify notes written by fatigued physicians -- those who worked 5 or more of the prior 7 days. In a hold-out set, the model accurately identifies notes written by these high-workload physicians, and also flags notes written in other high-fatigue settings: on overnight shifts, and after high patient volumes. Model predictions also correlate with worse decision-making on at least one important metric: yield of testing for heart attack is 18% lower with each standard deviation increase in model-predicted fatigue. Finally, the model indicates that notes written about Black and Hispanic patients have 12% and 21% higher predicted fatigue than Whites -- larger than overnight vs. daytime differences. These results have an important implication for large language models (LLMs). Our model indicates that fatigued doctors write more predictable notes. Perhaps unsurprisingly, because word prediction is the core of how LLMs work, we find that LLM-written notes have 17% higher predicted fatigue than real physicians' notes. This indicates that LLMs may introduce distortions in generated text that are not yet fully understood.

Title: LLMs for Multi-Modal Knowledge Extraction and Analysis in Intelligence/Safety-Critical Applications. (arXiv:2312.03088v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03088
Code URL: null
Copy Paste: [[2312.03088]] LLMs for Multi-Modal Knowledge Extraction and Analysis in Intelligence/Safety-Critical Applications(http://arxiv.org/abs/2312.03088)
Summary:
Large Language Models have seen rapid progress in capability in recent years; this progress has been accelerating and their capabilities, measured by various benchmarks, are beginning to approach those of humans. There is a strong demand to use such models in a wide variety of applications but, due to unresolved vulnerabilities and limitations, great care needs to be used before applying them to intelligence and safety-critical applications. This paper reviews recent literature related to LLM assessment and vulnerabilities to synthesize the current research landscape and to help understand what advances are most critical to enable use of of these technologies in intelligence and safety-critical applications. The vulnerabilities are broken down into ten high-level categories and overlaid onto a high-level life cycle of an LLM. Some general categories of mitigations are reviewed.

long context

lora

Title: Deep Reinforcement Learning for Community Battery Scheduling under Uncertainties of Load, PV Generation, and Energy Prices. (arXiv:2312.03008v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03008
Code URL: null
Copy Paste: [[2312.03008]] Deep Reinforcement Learning for Community Battery Scheduling under Uncertainties of Load, PV Generation, and Energy Prices(http://arxiv.org/abs/2312.03008)
Summary:
In response to the growing uptake of distributed energy resources (DERs), community batteries have emerged as a promising solution to support renewable energy integration, reduce peak load, and enhance grid reliability. This paper presents a deep reinforcement learning (RL) strategy, centered around the soft actor-critic (SAC) algorithm, to schedule a community battery system in the presence of uncertainties, such as solar photovoltaic (PV) generation, local demand, and real-time energy prices. We position the community battery to play a versatile role, in integrating local PV energy, reducing peak load, and exploiting energy price fluctuations for arbitrage, thereby minimizing the system cost. To improve exploration and convergence during RL training, we utilize the noisy network technique. This paper conducts a comparative study of different RL algorithms, including proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG) algorithms, to evaluate their effectiveness in the community battery scheduling problem. The results demonstrate the potential of RL in addressing community battery scheduling challenges and show that the SAC algorithm achieves the best performance compared to RL and optimization benchmarks.

Title: I-PHYRE: Interactive Physical Reasoning. (arXiv:2312.03009v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.03009
Code URL: null
Copy Paste: [[2312.03009]] I-PHYRE: Interactive Physical Reasoning(http://arxiv.org/abs/2312.03009)
Summary:
Current evaluation protocols predominantly assess physical reasoning in stationary scenes, creating a gap in evaluating agents' abilities to interact with dynamic events. While contemporary methods allow agents to modify initial scene configurations and observe consequences, they lack the capability to interact with events in real time. To address this, we introduce I-PHYRE, a framework that challenges agents to simultaneously exhibit intuitive physical reasoning, multi-step planning, and in-situ intervention. Here, intuitive physical reasoning refers to a quick, approximate understanding of physics to address complex problems; multi-step denotes the need for extensive sequence planning in I-PHYRE, considering each intervention can significantly alter subsequent choices; and in-situ implies the necessity for timely object manipulation within a scene, where minor timing deviations can result in task failure. We formulate four game splits to scrutinize agents' learning and generalization of essential principles of interactive physical reasoning, fostering learning through interaction with representative scenarios. Our exploration involves three planning strategies and examines several supervised and reinforcement agents' zero-shot generalization proficiency on I-PHYRE. The outcomes highlight a notable gap between existing learning algorithms and human performance, emphasizing the imperative for more research in enhancing agents with interactive physical reasoning capabilities. The environment and baselines will be made publicly available.

Title: Speculative Exploration on the Concept of Artificial Agents Conducting Autonomous Research. (arXiv:2312.03497v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.03497
Code URL: https://github.com/t46/research-automation-perspective-paper
Copy Paste: [[2312.03497]] Speculative Exploration on the Concept of Artificial Agents Conducting Autonomous Research(http://arxiv.org/abs/2312.03497)
Summary:
This paper engages in a speculative exploration of the concept of an artificial agent capable of conducting research. Initially, it examines how the act of research can be conceptually characterized, aiming to provide a starting point for discussions about what it means to create such agents. The focus then shifts to the core components of research: question formulation, hypothesis generation, and hypothesis verification. This discussion includes a consideration of the potential and challenges associated with enabling machines to autonomously perform these tasks. Subsequently, this paper briefly considers the overlapping themes and interconnections that underlie them. Finally, the paper presents preliminary thoughts on prototyping as an initial step towards uncovering the challenges involved in developing these research-capable agents.

Title: Active Learning for Abrupt Shifts Change-point Detection via Derivative-Aware Gaussian Processes. (arXiv:2312.03176v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03176
Code URL: null
Copy Paste: [[2312.03176]] Active Learning for Abrupt Shifts Change-point Detection via Derivative-Aware Gaussian Processes(http://arxiv.org/abs/2312.03176)
Summary:
Change-point detection (CPD) is crucial for identifying abrupt shifts in data, which influence decision-making and efficient resource allocation across various domains. To address the challenges posed by the costly and time-intensive data acquisition in CPD, we introduce the Derivative-Aware Change Detection (DACD) method. It leverages the derivative process of a Gaussian process (GP) for Active Learning (AL), aiming to pinpoint change-point locations effectively. DACD balances the exploitation and exploration of derivative processes through multiple data acquisition functions (AFs). By utilizing GP derivative mean and variance as criteria, DACD sequentially selects the next sampling data point, thus enhancing algorithmic efficiency and ensuring reliable and accurate results. We investigate the effectiveness of DACD method in diverse scenarios and show it outperforms other active learning change-point detection approaches.

Title: Constrained Bayesian Optimization Under Partial Observations: Balanced Improvements and Provable Convergence. (arXiv:2312.03212v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03212
Code URL: null
Copy Paste: [[2312.03212]] Constrained Bayesian Optimization Under Partial Observations: Balanced Improvements and Provable Convergence(http://arxiv.org/abs/2312.03212)
Summary:
The partially observable constrained optimization problems (POCOPs) impede data-driven optimization techniques since an infeasible solution of POCOPs can provide little information about the objective as well as the constraints. We endeavor to design an efficient and provable method for expensive POCOPs under the framework of constrained Bayesian optimization. Our method consists of two key components. Firstly, we present an improved design of the acquisition functions that introduces balanced exploration during optimization. We rigorously study the convergence properties of this design to demonstrate its effectiveness. Secondly, we propose a Gaussian process embedding different likelihoods as the surrogate model for a partially observable constraint. This model leads to a more accurate representation of the feasible regions compared to traditional classification-based models. Our proposed method is empirically studied on both synthetic and real-world problems. The results demonstrate the competitiveness of our method for solving POCOPs.

Title: Run LoRA Run: Faster and Lighter LoRA Implementations. (arXiv:2312.03415v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03415
Code URL: null
Copy Paste: [[2312.03415]] Run LoRA Run: Faster and Lighter LoRA Implementations(http://arxiv.org/abs/2312.03415)
Summary:
LoRA is a technique that reduces the number of trainable parameters in a neural network by introducing low-rank adapters to linear layers. This technique is used both for fine-tuning (LoRA, QLoRA) and full train (ReLoRA). This paper presents the RunLoRA framework for efficient implementations of LoRA that significantly improves the speed of neural network training and fine-tuning using low-rank adapters. The proposed implementation optimizes the computation of LoRA operations based on dimensions of corresponding linear layer, layer input dimensions and lora rank by choosing best forward and backward computation graph based on FLOPs and time estimations, resulting in faster training without sacrificing accuracy. The experimental results show up to 17% speedup on Llama family of models.

Title: Search Strategies for Self-driving Laboratories with Pending Experiments. (arXiv:2312.03466v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03466
Code URL: null
Copy Paste: [[2312.03466]] Search Strategies for Self-driving Laboratories with Pending Experiments(http://arxiv.org/abs/2312.03466)
Summary:
Self-driving laboratories (SDLs) consist of multiple stations that perform material synthesis and characterisation tasks. To minimize station downtime and maximize experimental throughput, it is practical to run experiments in asynchronous parallel, in which multiple experiments are being performed at once in different stages. Asynchronous parallelization of experiments, however, introduces delayed feedback (i.e. "pending experiments"), which is known to reduce Bayesian optimiser performance. Here, we build a simulator for a multi-stage SDL and compare optimisation strategies for dealing with delayed feedback and asynchronous parallelized operation. Using data from a real SDL, we build a ground truth Bayesian optimisation simulator from 177 previously run experiments for maximizing the conductivity of functional coatings. We then compare search strategies such as expected improvement, noisy expected improvement, 4-mode exploration and random sampling. We evaluate their performance in terms of amount of delay and problem dimensionality. Our simulation results showcase the trade-off between the asynchronous parallel operation and delayed feedback.

hallucination

prompt

Title: Exploring Answer Information Methods for Question Generation with Transformers. (arXiv:2312.03483v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03483
Code URL: null
Copy Paste: [[2312.03483]] Exploring Answer Information Methods for Question Generation with Transformers(http://arxiv.org/abs/2312.03483)
Summary:
There has been a lot of work in question generation where different methods to provide target answers as input, have been employed. This experimentation has been mostly carried out for RNN based models. We use three different methods and their combinations for incorporating answer information and explore their effect on several automatic evaluation metrics. The methods that are used are answer prompting, using a custom product method using answer embeddings and encoder outputs, choosing sentences from the input paragraph that have answer related information, and using a separate cross-attention attention block in the decoder which attends to the answer. We observe that answer prompting without any additional modes obtains the best scores across rouge, meteor scores. Additionally, we use a custom metric to calculate how many of the generated questions have the same answer, as the answer which is used to generate them.

Title: PROMISE: A Framework for Model-Driven Stateful Prompt Orchestration. (arXiv:2312.03699v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03699
Code URL: null
Copy Paste: [[2312.03699]] PROMISE: A Framework for Model-Driven Stateful Prompt Orchestration(http://arxiv.org/abs/2312.03699)
Summary:
The advent of increasingly powerful language models has raised expectations for language-based interactions. However, controlling these models is a challenge, emphasizing the need to be able to investigate the feasibility and value of their application. We present PROMISE, a framework that facilitates the development of complex language-based interactions with information systems. Its use of state machine modeling concepts enables model-driven, dynamic prompt orchestration across hierarchically nested states and transitions. This improves the control of the behavior of language models and thus enables their effective and efficient use. We show the benefits of PROMISE in the context of application scenarios within health information systems and demonstrate its ability to handle complex interactions.

code

Title: Learning Multi-graph Structure for Temporal Knowledge Graph Reasoning. (arXiv:2312.03004v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.03004
Code URL: null
Copy Paste: [[2312.03004]] Learning Multi-graph Structure for Temporal Knowledge Graph Reasoning(http://arxiv.org/abs/2312.03004)
Summary:
Temporal Knowledge Graph (TKG) reasoning that forecasts future events based on historical snapshots distributed over timestamps is denoted as extrapolation and has gained significant attention. Owing to its extreme versatility and variation in spatial and temporal correlations, TKG reasoning presents a challenging task, demanding efficient capture of concurrent structures and evolutional interactions among facts. While existing methods have made strides in this direction, they still fall short of harnessing the diverse forms of intrinsic expressive semantics of TKGs, which encompass entity correlations across multiple timestamps and periodicity of temporal information. This limitation constrains their ability to thoroughly reflect historical dependencies and future trends. In response to these drawbacks, this paper proposes an innovative reasoning approach that focuses on Learning Multi-graph Structure (LMS). Concretely, it comprises three distinct modules concentrating on multiple aspects of graph structure knowledge within TKGs, including concurrent and evolutional patterns along timestamps, query-specific correlations across timestamps, and semantic dependencies of timestamps, which capture TKG features from various perspectives. Besides, LMS incorporates an adaptive gate for merging entity representations both along and across timestamps effectively. Moreover, it integrates timestamp semantics into graph attention calculations and time-aware decoders, in order to impose temporal constraints on events and narrow down prediction scopes with historical statistics. Extensive experimental results on five event-based benchmark datasets demonstrate that LMS outperforms state-of-the-art extrapolation models, indicating the superiority of modeling a multi-graph perspective for TKG reasoning.

Title: Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction. (arXiv:2312.03025v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.03025
Code URL: null
Copy Paste: [[2312.03025]] Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction(http://arxiv.org/abs/2312.03025)
Summary:
The task of multimodal relation extraction has attracted significant research attention, but progress is constrained by the scarcity of available training data. One natural thought is to extend existing datasets with cross-modal generative models. In this paper, we consider a novel problem setting, where only unimodal data, either text or image, are available during training. We aim to train a multimodal classifier from synthetic data that perform well on real multimodal test data. However, training with synthetic data suffers from two obstacles: lack of data diversity and label information loss. To alleviate the issues, we propose Mutual Information-aware Multimodal Iterated Relational dAta GEneration (MI2RAGE), which applies Chained Cross-modal Generation (CCG) to promote diversity in the generated data and exploits a teacher network to select valuable training samples with high mutual information with the ground-truth labels. Comparing our method to direct training on synthetic data, we observed a significant improvement of 24.06% F1 with synthetic text and 26.42% F1 with synthetic images. Notably, our best model trained on completely synthetic images outperforms prior state-of-the-art models trained on real multimodal data by a margin of 3.76% in F1. Our codebase will be made available upon acceptance.

Title: Generating Interpretable Networks using Hypernetworks. (arXiv:2312.03051v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03051
Code URL: null
Copy Paste: [[2312.03051]] Generating Interpretable Networks using Hypernetworks(http://arxiv.org/abs/2312.03051)
Summary:
An essential goal in mechanistic interpretability to decode a network, i.e., to convert a neural network's raw weights to an interpretable algorithm. Given the difficulty of the decoding problem, progress has been made to understand the easier encoding problem, i.e., to convert an interpretable algorithm into network weights. Previous works focus on encoding existing algorithms into networks, which are interpretable by definition. However, focusing on encoding limits the possibility of discovering new algorithms that humans have never stumbled upon, but that are nevertheless interpretable. In this work, we explore the possibility of using hypernetworks to generate interpretable networks whose underlying algorithms are not yet known. The hypernetwork is carefully designed such that it can control network complexity, leading to a diverse family of interpretable algorithms ranked by their complexity. All of them are interpretable in hindsight, although some of them are less intuitive to humans, hence providing new insights regarding how to "think" like a neural network. For the task of computing L1 norms, hypernetworks find three algorithms: (a) the double-sided algorithm, (b) the convexity algorithm, (c) the pudding algorithm, although only the first algorithm was expected by the authors before experiments. We automatically classify these algorithms and analyze how these algorithmic phases develop during training, as well as how they are affected by complexity control. Furthermore, we show that a trained hypernetwork can correctly construct models for input dimensions not seen in training, demonstrating systematic generalization.

Title: Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym. (arXiv:2312.03290v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.03290
Code URL: https://github.com/mail-ecnu/text-gym-agents
Copy Paste: [[2312.03290]] Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym(http://arxiv.org/abs/2312.03290)
Summary:
The formidable capacity for zero- or few-shot decision-making in language agents encourages us to pose a compelling question: Can language agents be alternatives to PPO agents in traditional sequential decision-making tasks? To investigate this, we first take environments collected in OpenAI Gym as our testbeds and ground them to textual environments that construct the TextGym simulator. This allows for straightforward and efficient comparisons between PPO agents and language agents, given the widespread adoption of OpenAI Gym. To ensure a fair and effective benchmarking, we introduce $5$ levels of scenario for accurate domain-knowledge controlling and a unified RL-inspired framework for language agents. Additionally, we propose an innovative explore-exploit-guided language (EXE) agent to solve tasks within TextGym. Through numerical experiments and ablation studies, we extract valuable insights into the decision-making capabilities of language agents and make a preliminary evaluation of their potential to be alternatives to PPO in classical sequential decision-making problems. This paper sheds light on the performance of language agents and paves the way for future research in this exciting domain. Our code is publicly available at~\url{https://github.com/mail-ecnu/Text-Gym-Agents}.

Title: Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique. (arXiv:2312.03303v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.03303
Code URL: null
Copy Paste: [[2312.03303]] Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique(http://arxiv.org/abs/2312.03303)
Summary:
This paper presents a novel benchmarking framework Dyport for evaluating biomedical hypothesis generation systems. Utilizing curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This not only assesses hypothesis accuracy but also their potential impact in biomedical research which significantly extends traditional link prediction benchmarks. Applicability of our benchmarking process is demonstrated on several link prediction systems applied on biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification, aiming to expand the scope of scientific discovery within the biomedical research community. Availability and implementation: Dyport framework is fully open-source. All code and datasets are available at: https://github.com/IlyaTyagin/Dyport

Title: Lazy-k: Decoding for Constrained Token Classification. (arXiv:2312.03367v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03367
Code URL: https://github.com/arthurdevnl/lazyk
Copy Paste: [[2312.03367]] Lazy-k: Decoding for Constrained Token Classification(http://arxiv.org/abs/2312.03367)
Summary:
We explore the possibility of improving probabilistic models in structured prediction. Specifically, we combine the models with constrained decoding approaches in the context of token classification for information extraction. The decoding methods search for constraint-satisfying label-assignments while maximizing the total probability. To do this, we evaluate several existing approaches, as well as propose a novel decoding method called Lazy-$k$. Our findings demonstrate that constrained decoding approaches can significantly improve the models' performances, especially when using smaller models. The Lazy-$k$ approach allows for more flexibility between decoding time and accuracy. The code for using Lazy-$k$ decoding can be found here: https://github.com/ArthurDevNL/lazyk.

Title: A Text-to-Text Model for Multilingual Offensive Language Identification. (arXiv:2312.03379v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03379
Code URL: null
Copy Paste: [[2312.03379]] A Text-to-Text Model for Multilingual Offensive Language Identification(http://arxiv.org/abs/2312.03379)
Summary:
The ubiquity of offensive content on social media is a growing cause for concern among companies and government organizations. Recently, transformer-based models such as BERT, XLNET, and XLM-R have achieved state-of-the-art performance in detecting various forms of offensive content (e.g. hate speech, cyberbullying, and cyberaggression). However, the majority of these models are limited in their capabilities due to their encoder-only architecture, which restricts the number and types of labels in downstream tasks. Addressing these limitations, this study presents the first pre-trained model with encoder-decoder architecture for offensive language identification with text-to-text transformers (T5) trained on two large offensive language identification datasets; SOLID and CCTK. We investigate the effectiveness of combining two datasets and selecting an optimal threshold in semi-supervised instances in SOLID in the T5 retraining step. Our pre-trained T5 model outperforms other transformer-based models fine-tuned for offensive language detection, such as fBERT and HateBERT, in multiple English benchmarks. Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5 and evaluate its performance on a set of six different languages (German, Hindi, Korean, Marathi, Sinhala, and Spanish). The results demonstrate that this multilingual model achieves a new state-of-the-art on all the above datasets, showing its usefulness in multilingual scenarios. Our proposed T5-based models will be made freely available to the community.

Title: Interpretability Illusions in the Generalization of Simplified Models. (arXiv:2312.03656v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03656
Code URL: null
Copy Paste: [[2312.03656]] Interpretability Illusions in the Generalization of Simplified Models(http://arxiv.org/abs/2312.03656)
Summary:
A common method to study deep learning systems is to use simplified model representations -- for example, using singular value decomposition to visualize the model's hidden states in a lower dimensional space. This approach assumes that the results of these simplified are faithful to the original model. Here, we illustrate an important caveat to this assumption: even if the simplified representations can accurately approximate the full model on the training set, they may fail to accurately capture the model's behavior out of distribution -- the understanding developed from simplified representations may be an illusion. We illustrate this by training Transformer models on controlled datasets with systematic generalization splits. First, we train models on the Dyck balanced-parenthesis languages. We simplify these models using tools like dimensionality reduction and clustering, and then explicitly test how these simplified proxies match the behavior of the original model on various out-of-distribution test sets. We find that the simplified proxies are generally less faithful out of distribution. In cases where the original model generalizes to novel structures or deeper depths, the simplified versions may fail, or generalize better. This finding holds even if the simplified representations do not directly depend on the training distribution. Next, we study a more naturalistic task: predicting the next character in a dataset of computer code. We find similar generalization gaps between the original model and simplified proxies, and conduct further analysis to investigate which aspects of the code completion task are associated with the largest gaps. Together, our results raise questions about the extent to which mechanistic interpretations derived using tools like SVD can reliably predict what a model will do in novel situations.

Title: Transformer-Based Deep Learning Model for Bored Pile Load-Deformation Prediction in Bangkok Subsoil. (arXiv:2312.03041v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03041
Code URL: null
Copy Paste: [[2312.03041]] Transformer-Based Deep Learning Model for Bored Pile Load-Deformation Prediction in Bangkok Subsoil(http://arxiv.org/abs/2312.03041)
Summary:
This paper presents a novel deep learning model based on the transformer architecture to predict the load-deformation behavior of large bored piles in Bangkok subsoil. The model encodes the soil profile and pile features as tokenization input, and generates the load-deformation curve as output. The model also incorporates the previous sequential data of load-deformation curve into the decoder to improve the prediction accuracy. The model also incorporates the previous sequential data of load-deformation curve into the decoder. The model shows a satisfactory accuracy and generalization ability for the load-deformation curve prediction, with a mean absolute error of 5.72% for the test data. The model could also be used for parametric analysis and design optimization of piles under different soil and pile conditions, pile cross section, pile length and type of pile.

Title: REST: Enhancing Group Robustness in DNNs through Reweighted Sparse Training. (arXiv:2312.03044v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03044
Code URL: https://github.com/zhao1402072392/rest
Copy Paste: [[2312.03044]] REST: Enhancing Group Robustness in DNNs through Reweighted Sparse Training(http://arxiv.org/abs/2312.03044)
Summary:
The deep neural network (DNN) has been proven effective in various domains. However, they often struggle to perform well on certain minority groups during inference, despite showing strong performance on the majority of data groups. This is because over-parameterized models learned \textit{bias attributes} from a large number of \textit{bias-aligned} training samples. These bias attributes are strongly spuriously correlated with the target variable, causing the models to be biased towards spurious correlations (i.e., \textit{bias-conflicting}). To tackle this issue, we propose a novel \textbf{re}weighted \textbf{s}parse \textbf{t}raining framework, dubbed as \textit{\textbf{REST}}, which aims to enhance the performance of biased data while improving computation and memory efficiency. Our proposed REST framework has been experimentally validated on three datasets, demonstrating its effectiveness in exploring unbiased subnetworks. We found that REST reduces the reliance on spuriously correlated features, leading to better performance across a wider range of data groups with fewer training and inference resources. We highlight that the \textit{REST} framework represents a promising approach for improving the performance of DNNs on biased data, while simultaneously improving computation and memory efficiency. By reducing the reliance on spurious correlations, REST has the potential to enhance the robustness of DNNs and improve their generalization capabilities. Code is released at \url{https://github.com/zhao1402072392/REST}

Title: Multitask Learning Can Improve Worst-Group Outcomes. (arXiv:2312.03151v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03151
Code URL: https://github.com/atharvajk98/mtl-group-robustness
Copy Paste: [[2312.03151]] Multitask Learning Can Improve Worst-Group Outcomes(http://arxiv.org/abs/2312.03151)
Summary:
In order to create machine learning systems that serve a variety of users well, it is vital to not only achieve high average performance but also ensure equitable outcomes across diverse groups. However, most machine learning methods are designed to improve a model's average performance on a chosen end task without consideration for their impact on worst group error. Multitask learning (MTL) is one such widely used technique. In this paper, we seek not only to understand the impact of MTL on worst-group accuracy but also to explore its potential as a tool to address the challenge of group-wise fairness. We primarily consider the common setting of fine-tuning a pre-trained model, where, following recent work (Gururangan et al., 2020; Dery et al., 2023), we multitask the end task with the pre-training objective constructed from the end task data itself. In settings with few or no group annotations, we find that multitasking often, but not always, achieves better worst-group accuracy than Just-Train-Twice (JTT; Liu et al. (2021)) -- a representative distributionally robust optimization (DRO) method. Leveraging insights from synthetic data experiments, we propose to modify standard MTL by regularizing the joint multitask representation space. We run a large number of fine-tuning experiments across computer vision and natural language and find that our regularized MTL approach consistently outperforms JTT on both worst and average group outcomes. Our official code can be found here: https://github.com/atharvajk98/MTL-group-robustness.

Title: CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models. (arXiv:2312.03256v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03256
Code URL: https://github.com/hugozhl/cafe
Copy Paste: [[2312.03256]] CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models(http://arxiv.org/abs/2312.03256)
Summary:
Recently, the growing memory demands of embedding tables in Deep Learning Recommendation Models (DLRMs) pose great challenges for model training and deployment. Existing embedding compression solutions cannot simultaneously meet three key design requirements: memory efficiency, low latency, and adaptability to dynamic data distribution. This paper presents CAFE, a Compact, Adaptive, and Fast Embedding compression framework that addresses the above requirements. The design philosophy of CAFE is to dynamically allocate more memory resources to important features (called hot features), and allocate less memory to unimportant ones. In CAFE, we propose a fast and lightweight sketch data structure, named HotSketch, to capture feature importance and report hot features in real time. For each reported hot feature, we assign it a unique embedding. For the non-hot features, we allow multiple features to share one embedding by using hash embedding technique. Guided by our design philosophy, we further propose a multi-level hash embedding framework to optimize the embedding tables of non-hot features. We theoretically analyze the accuracy of HotSketch, and analyze the model convergence against deviation. Extensive experiments show that CAFE significantly outperforms existing embedding compression methods, yielding 3.92% and 3.68% superior testing AUC on Criteo Kaggle dataset and CriteoTB dataset at a compression ratio of 10000x. The source codes of CAFE are available at GitHub.

Title: Enhancing Molecular Property Prediction via Mixture of Collaborative Experts. (arXiv:2312.03292v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03292
Code URL: https://github.com/Hyacinth-YX/mixture-of-collaborative-experts
Copy Paste: [[2312.03292]] Enhancing Molecular Property Prediction via Mixture of Collaborative Experts(http://arxiv.org/abs/2312.03292)
Summary:
Molecular Property Prediction (MPP) task involves predicting biochemical properties based on molecular features, such as molecular graph structures, contributing to the discovery of lead compounds in drug development. To address data scarcity and imbalance in MPP, some studies have adopted Graph Neural Networks (GNN) as an encoder to extract commonalities from molecular graphs. However, these approaches often use a separate predictor for each task, neglecting the shared characteristics among predictors corresponding to different tasks. In response to this limitation, we introduce the GNN-MoCE architecture. It employs the Mixture of Collaborative Experts (MoCE) as predictors, exploiting task commonalities while confronting the homogeneity issue in the expert pool and the decision dominance dilemma within the expert group. To enhance expert diversity for collaboration among all experts, the Expert-Specific Projection method is proposed to assign a unique projection perspective to each expert. To balance decision-making influence for collaboration within the expert group, the Expert-Specific Loss is presented to integrate individual expert loss into the weighted decision loss of the group for more equitable training. Benefiting from the enhancements of MoCE in expert creation, dynamic expert group formation, and experts' collaboration, our model demonstrates superior performance over traditional methods on 24 MPP datasets, especially in tasks with limited data or high imbalance.

Title: Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild. (arXiv:2312.03344v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03344
Code URL: null
Copy Paste: [[2312.03344]] Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild(http://arxiv.org/abs/2312.03344)
Summary:
Diabetes encompasses a complex landscape of glycemic control that varies widely among individuals. However, current methods do not faithfully capture this variability at the meal level. On the one hand, expert-crafted features lack the flexibility of data-driven methods; on the other hand, learned representations tend to be uninterpretable which hampers clinical adoption. In this paper, we propose a hybrid variational autoencoder to learn interpretable representations of CGM and meal data. Our method grounds the latent space to the inputs of a mechanistic differential equation, producing embeddings that reflect physiological quantities, such as insulin sensitivity, glucose effectiveness, and basal glucose levels. Moreover, we introduce a novel method to infer the glucose appearance rate, making the mechanistic model robust to unreliable meal logs. On a dataset of CGM and self-reported meals from individuals with type-2 diabetes and pre-diabetes, our unsupervised representation discovers a separation between individuals proportional to their disease severity. Our embeddings produce clusters that are up to 4x better than naive, expert, black-box, and pure mechanistic features. Our method provides a nuanced, yet interpretable, embedding space to compare glycemic control within and across individuals, directly learnable from in-the-wild data.

chat

retrieval augmented generation

rag

Title: Data-Driven Traffic Reconstruction and Kernel Methods for Identifying Stop-and-Go Congestion. (arXiv:2312.03186v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03186
Code URL: null
Copy Paste: [[2312.03186]] Data-Driven Traffic Reconstruction and Kernel Methods for Identifying Stop-and-Go Congestion(http://arxiv.org/abs/2312.03186)
Summary:
Identifying stop-and-go events (SAGs) in traffic flow presents an important avenue for advancing data-driven research for climate change mitigation and sustainability, owing to their substantial impact on carbon emissions, travel time, fuel consumption, and roadway safety. In fact, SAGs are estimated to account for 33-50% of highway driving externalities. However, insufficient attention has been paid to precisely quantifying where, when, and how much these SAGs take place -necessary for downstream decision making, such as intervention design and policy analysis. A key challenge is that the data available to researchers and governments are typically sparse and aggregated to a granularity that obscures SAGs. To overcome such data limitations, this study thus explores the use of traffic reconstruction techniques for SAG identification. In particular, we introduce a kernel-based method for identifying spatio-temporal features in traffic and leverage bootstrapping to quantify the uncertainty of the reconstruction process. Experimental results on California highway data demonstrate the promise of the method for capturing SAGs. This work contributes to a foundation for data-driven decision making to advance sustainability of traffic systems.

Title: Deep Multimodal Fusion for Surgical Feedback Classification. (arXiv:2312.03231v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03231
Code URL: null
Copy Paste: [[2312.03231]] Deep Multimodal Fusion for Surgical Feedback Classification(http://arxiv.org/abs/2312.03231)
Summary:
Quantification of real-time informal feedback delivered by an experienced surgeon to a trainee during surgery is important for skill improvements in surgical training. Such feedback in the live operating room is inherently multimodal, consisting of verbal conversations (e.g., questions and answers) as well as non-verbal elements (e.g., through visual cues like pointing to anatomic elements). In this work, we leverage a clinically-validated five-category classification of surgical feedback: "Anatomic", "Technical", "Procedural", "Praise" and "Visual Aid". We then develop a multi-label machine learning model to classify these five categories of surgical feedback from inputs of text, audio, and video modalities. The ultimate goal of our work is to help automate the annotation of real-time contextual surgical feedback at scale. Our automated classification of surgical feedback achieves AUCs ranging from 71.5 to 77.6 with the fusion improving performance by 3.1%. We also show that high-quality manual transcriptions of feedback audio from experts improve AUCs to between 76.5 and 96.2, which demonstrates a clear path toward future improvements. Empirically, we find that the Staged training strategy, with first pre-training each modality separately and then training them jointly, is more effective than training different modalities altogether. We also present intuitive findings on the importance of modalities for different feedback categories. This work offers an important first look at the feasibility of automated classification of real-world live surgical feedback based on text, audio, and video modalities.

Title: Approximating Solutions to the Knapsack Problem using the Lagrangian Dual Framework. (arXiv:2312.03413v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03413
Code URL: null
Copy Paste: [[2312.03413]] Approximating Solutions to the Knapsack Problem using the Lagrangian Dual Framework(http://arxiv.org/abs/2312.03413)
Summary:
The Knapsack Problem is a classic problem in combinatorial optimisation. Solving these problems may be computationally expensive. Recent years have seen a growing interest in the use of deep learning methods to approximate the solutions to such problems. A core problem is how to enforce or encourage constraint satisfaction in predicted solutions. A promising approach for predicting solutions to constrained optimisation problems is the Lagrangian Dual Framework which builds on the method of Lagrangian Relaxation. In this paper we develop neural network models to approximate Knapsack Problem solutions using the Lagrangian Dual Framework while improving constraint satisfaction. We explore the problems of output interpretation and model selection within this context. Experimental results show strong constraint satisfaction with a minor reduction of optimality as compared to a baseline neural network which does not explicitly model the constraints.

Title: Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation. (arXiv:2312.03312v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03312
Code URL: null
Copy Paste: [[2312.03312]] Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation(http://arxiv.org/abs/2312.03312)
Summary:
This research optimizes two-pass cross-lingual transfer learning in low-resource languages by enhancing phoneme recognition and phoneme-to-grapheme translation models. Our approach optimizes these two stages to improve speech recognition across languages. We optimize phoneme vocabulary coverage by merging phonemes based on shared articulatory characteristics, thus improving recognition accuracy. Additionally, we introduce a global phoneme noise generator for realistic ASR noise during phoneme-to-grapheme training to reduce error propagation. Experiments on the CommonVoice 12.0 dataset show significant reductions in Word Error Rate (WER) for low-resource languages, highlighting the effectiveness of our approach. This research contributes to the advancements of two-pass ASR systems in low-resource languages, offering the potential for improved cross-lingual transfer learning.

Title: Domain Invariant Representation Learning and Sleep Dynamics Modeling for Automatic Sleep Staging. (arXiv:2312.03196v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03196
Code URL: https://github.com/yeon-lab/dream
Copy Paste: [[2312.03196]] Domain Invariant Representation Learning and Sleep Dynamics Modeling for Automatic Sleep Staging(http://arxiv.org/abs/2312.03196)
Summary:
Sleep staging has become a critical task in diagnosing and treating sleep disorders to prevent sleep related diseases. With rapidly growing large scale public sleep databases and advances in machine learning, significant progress has been made toward automatic sleep staging. However, previous studies face some critical problems in sleep studies; the heterogeneity of subjects' physiological signals, the inability to extract meaningful information from unlabeled sleep signal data to improve predictive performances, the difficulty in modeling correlations between sleep stages, and the lack of an effective mechanism to quantify predictive uncertainty. In this study, we propose a neural network based automatic sleep staging model, named DREAM, to learn domain generalized representations from physiological signals and models sleep dynamics. DREAM learns sleep related and subject invariant representations from diverse subjects' sleep signal segments and models sleep dynamics by capturing interactions between sequential signal segments and between sleep stages. In the experiments, we demonstrate that DREAM outperforms the existing sleep staging methods on three datasets. The case study demonstrates that our model can learn the generalized decision function resulting in good prediction performances for the new subjects, especially in case there are differences between testing and training subjects. The usage of unlabeled data shows the benefit of leveraging unlabeled EEG data. Further, uncertainty quantification demonstrates that DREAM provides prediction uncertainty, making the model reliable and helping sleep experts in real world applications.

Title: Physical Symbolic Optimization. (arXiv:2312.03612v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03612
Code URL: https://github.com/wassimtenachi/physo
Copy Paste: [[2312.03612]] Physical Symbolic Optimization(http://arxiv.org/abs/2312.03612)
Summary:
We present a framework for constraining the automatic sequential generation of equations to obey the rules of dimensional analysis by construction. Combining this approach with reinforcement learning, we built $\Phi$-SO, a Physical Symbolic Optimization method for recovering analytical functions from physical data leveraging units constraints. Our symbolic regression algorithm achieves state-of-the-art results in contexts in which variables and constants have known physical units, outperforming all other methods on SRBench's Feynman benchmark in the presence of noise (exceeding 0.1%) and showing resilience even in the presence of significant (10%) levels of noise.

Title: On the Role of Edge Dependency in Graph Generative Models. (arXiv:2312.03691v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03691
Code URL: null
Copy Paste: [[2312.03691]] On the Role of Edge Dependency in Graph Generative Models(http://arxiv.org/abs/2312.03691)
Summary:
In this work, we introduce a novel evaluation framework for generative models of graphs, emphasizing the importance of model-generated graph overlap (Chanpuriya et al., 2021) to ensure both accuracy and edge-diversity. We delineate a hierarchy of graph generative models categorized into three levels of complexity: edge independent, node independent, and fully dependent models. This hierarchy encapsulates a wide range of prevalent methods. We derive theoretical bounds on the number of triangles and other short-length cycles producible by each level of the hierarchy, contingent on the model overlap. We provide instances demonstrating the asymptotic optimality of our bounds. Furthermore, we introduce new generative models for each of the three hierarchical levels, leveraging dense subgraph discovery (Gionis & Tsourakakis, 2015). Our evaluation, conducted on real-world datasets, focuses on assessing the output quality and overlap of our proposed models in comparison to other popular models. Our results indicate that our simple, interpretable models provide competitive baselines to popular generative models. Through this investigation, we aim to propel the advancement of graph generative models by offering a structured framework and robust evaluation metrics, thereby facilitating the development of models capable of generating accurate and edge-diverse graphs.

language model

Title: Foundation Models for Weather and Climate Data Understanding: A Comprehensive Survey. (arXiv:2312.03014v1 [cs.LG])

Title: Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction. (arXiv:2312.03022v1 [cs.AI])

Title: Evaluating Agents using Social Choice Theory. (arXiv:2312.03121v1 [cs.AI])

Title: FlexModel: A Framework for Interpretability of Distributed Large Language Models. (arXiv:2312.03140v1 [cs.LG])

Title: Teaching Specific Scientific Knowledge into Large Language Models through Additional Training. (arXiv:2312.03360v1 [cs.CL])

Title: Not All Large Language Models (LLMs) Succumb to the "Reversal Curse": A Comparative Study of Deductive Logical Reasoning in BERT and GPT Models. (arXiv:2312.03633v1 [cs.CL])

Title: Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia. (arXiv:2312.03664v1 [cs.AI])

Title: Assertion Enhanced Few-Shot Learning: Instructive Technique for Large Language Models to Generate Educational Explanations. (arXiv:2312.03122v1 [cs.CL])

Title: Corporate Bankruptcy Prediction with Domain-Adapted BERT. (arXiv:2312.03194v1 [cs.CL])

Title: Measuring Misogyny in Natural Language Generation: Preliminary Results from a Case Study on two Reddit Communities. (arXiv:2312.03330v1 [cs.CL])

Title: Compressed Context Memory For Online Language Model Interaction. (arXiv:2312.03414v1 [cs.LG])

Title: Think from Words(TFW): Initiating Human-Like Cognition in Large Language Models Through Think from Words for Japanese Text-level Classification. (arXiv:2312.03458v1 [cs.CL])

Title: DBCopilot: Scaling Natural Language Querying to Massive Databases. (arXiv:2312.03463v1 [cs.CL])

Title: Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling. (arXiv:2312.03523v1 [cs.CL])

Title: Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment. (arXiv:2312.03549v1 [cs.CL])

Title: Evaluating and Mitigating Discrimination in Language Model Decisions. (arXiv:2312.03689v1 [cs.CL])

gpt

Title: XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering. (arXiv:2312.03567v1 [cs.CL])

llm

Title: Inherent limitations of LLMs regarding spatial information. (arXiv:2312.03042v1 [cs.CL])

Title: Clinical Notes Reveal Physician Fatigue. (arXiv:2312.03077v1 [cs.CL])

Title: LLMs for Multi-Modal Knowledge Extraction and Analysis in Intelligence/Safety-Critical Applications. (arXiv:2312.03088v1 [cs.CL])

long context

lora

Title: Deep Reinforcement Learning for Community Battery Scheduling under Uncertainties of Load, PV Generation, and Energy Prices. (arXiv:2312.03008v1 [cs.LG])

Title: I-PHYRE: Interactive Physical Reasoning. (arXiv:2312.03009v1 [cs.AI])

Title: Speculative Exploration on the Concept of Artificial Agents Conducting Autonomous Research. (arXiv:2312.03497v1 [cs.AI])

Title: Active Learning for Abrupt Shifts Change-point Detection via Derivative-Aware Gaussian Processes. (arXiv:2312.03176v1 [cs.LG])

Title: Constrained Bayesian Optimization Under Partial Observations: Balanced Improvements and Provable Convergence. (arXiv:2312.03212v1 [cs.LG])

Title: Run LoRA Run: Faster and Lighter LoRA Implementations. (arXiv:2312.03415v1 [cs.LG])

Title: Search Strategies for Self-driving Laboratories with Pending Experiments. (arXiv:2312.03466v1 [cs.LG])

hallucination

prompt

Title: Exploring Answer Information Methods for Question Generation with Transformers. (arXiv:2312.03483v1 [cs.CL])

Title: PROMISE: A Framework for Model-Driven Stateful Prompt Orchestration. (arXiv:2312.03699v1 [cs.CL])

code

Title: Learning Multi-graph Structure for Temporal Knowledge Graph Reasoning. (arXiv:2312.03004v1 [cs.AI])

Title: Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction. (arXiv:2312.03025v1 [cs.AI])

Title: Generating Interpretable Networks using Hypernetworks. (arXiv:2312.03051v1 [cs.LG])

Title: Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym. (arXiv:2312.03290v1 [cs.AI])

Title: Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique. (arXiv:2312.03303v1 [cs.AI])

Title: Lazy-k: Decoding for Constrained Token Classification. (arXiv:2312.03367v1 [cs.CL])

Title: A Text-to-Text Model for Multilingual Offensive Language Identification. (arXiv:2312.03379v1 [cs.CL])

Title: Interpretability Illusions in the Generalization of Simplified Models. (arXiv:2312.03656v1 [cs.LG])

Title: Transformer-Based Deep Learning Model for Bored Pile Load-Deformation Prediction in Bangkok Subsoil. (arXiv:2312.03041v1 [cs.LG])

Title: REST: Enhancing Group Robustness in DNNs through Reweighted Sparse Training. (arXiv:2312.03044v1 [cs.LG])

Title: Multitask Learning Can Improve Worst-Group Outcomes. (arXiv:2312.03151v1 [cs.LG])

Title: CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models. (arXiv:2312.03256v1 [cs.LG])

Title: Enhancing Molecular Property Prediction via Mixture of Collaborative Experts. (arXiv:2312.03292v1 [cs.LG])

Title: Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild. (arXiv:2312.03344v1 [cs.LG])

chat

retrieval augmented generation

rag

Title: Data-Driven Traffic Reconstruction and Kernel Methods for Identifying Stop-and-Go Congestion. (arXiv:2312.03186v1 [cs.LG])

Title: Deep Multimodal Fusion for Surgical Feedback Classification. (arXiv:2312.03231v1 [cs.LG])

Title: Approximating Solutions to the Knapsack Problem using the Lagrangian Dual Framework. (arXiv:2312.03413v1 [cs.LG])

Title: Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation. (arXiv:2312.03312v1 [cs.CL])

Title: Domain Invariant Representation Learning and Sleep Dynamics Modeling for Automatic Sleep Staging. (arXiv:2312.03196v1 [cs.LG])

Title: Physical Symbolic Optimization. (arXiv:2312.03612v1 [cs.LG])

Title: On the Role of Edge Dependency in Graph Generative Models. (arXiv:2312.03691v1 [cs.LG])

multi-run

chain-of-thought

tree-of-thought