2023-12-20

language model

Title: Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data. (arXiv:2312.11502v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11502
Code URL: null
Copy Paste: [[2312.11502]] Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data(http://arxiv.org/abs/2312.11502)
Summary:
In this work we introduce Labrador, a pre-trained Transformer model for laboratory data. Labrador and BERT were pre-trained on a corpus of 100 million lab test results from electronic health records (EHRs) and evaluated on various downstream outcome prediction tasks. Both models demonstrate mastery of the pre-training task but neither consistently outperforms XGBoost on downstream supervised tasks. Our ablation studies reveal that transfer learning shows limited effectiveness for BERT and achieves marginal success with Labrador. We explore the reasons for the failure of transfer learning and suggest that the data generating process underlying each patient cannot be characterized sufficiently using labs alone, among other factors. We encourage future work to focus on joint modeling of multiple EHR data categories and to include tree-based baselines in their evaluations.

Title: The performance of multiple language models in identifying offensive language on social media. (arXiv:2312.11504v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11504
Code URL: null
Copy Paste: [[2312.11504]] The performance of multiple language models in identifying offensive language on social media(http://arxiv.org/abs/2312.11504)
Summary:
Text classification is an important topic in natural language processing. It has seen early application in information retrieval, digital libraries, automatic abstracting, text filtering, word semantic discrimination, and many other fields. The aim of this research is to test the ability of a variety of algorithms to identify offensive posts and to evaluate their performance using a variety of assessment methods. The motivation for this project is to reduce the harm that such language causes to human censors by automating the screening of offending posts. The field is a new one, and despite much interest in the past two years, there has been no focus on the object of the offence. The experiments in this project should inspire future research on identification methods as well as identification content.

Title: LLM in a flash: Efficient Large Language Model Inference with Limited Memory. (arXiv:2312.11514v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11514
Code URL: null
Copy Paste: [[2312.11514]] LLM in a flash: Efficient Large Language Model Inference with Limited Memory(http://arxiv.org/abs/2312.11514)
Summary:
Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their intensive computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM. Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Within this flash memory-informed framework, we introduce two principal techniques. First, "windowing" strategically reduces data transfer by reusing previously activated neurons, and second, "row-column bundling", tailored to the sequential data access strengths of flash memory, increases the size of data chunks read from flash memory. These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively. Our integration of sparsity awareness, context-adaptive loading, and a hardware-oriented design paves the way for effective inference of LLMs on devices with limited memory.
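
Illustrative sketch (not from the paper): the "windowing" idea can be pictured as keeping the neurons activated by the last few tokens in DRAM and reading only the missing ones from flash. The function name, the set-based representation, and the toy trace are assumptions made for this example, not the authors' implementation.

```python
from collections import deque


def neurons_to_load(active_per_token, window):
    """Return, for each token, the neuron ids that are not already cached.

    active_per_token: list of sets of neuron ids predicted active per token
                      (hypothetical input format).
    window: number of most recent tokens whose neurons stay cached in DRAM.
    """
    recent = deque(maxlen=window)  # active sets of the last `window` tokens
    loads = []
    for active in active_per_token:
        cached = set().union(*recent)  # everything still held in DRAM
        loads.append(active - cached)  # only the missing neurons are read
        recent.append(active)          # oldest set drops out automatically
    return loads


# Toy activation trace for four consecutive tokens.
trace = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {1, 5}]
loads = neurons_to_load(trace, window=2)
naive_reads = sum(len(active) for active in trace)
windowed_reads = sum(len(missing) for missing in loads)
```

In this toy trace the windowed scheme reads fewer neurons than reloading every active neuron per token, which is the data-transfer reduction the abstract describes.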

Title: User Modeling in the Era of Large Language Models: Current Research and Future Directions. (arXiv:2312.11518v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11518
Code URL: null
Copy Paste: [[2312.11518]] User Modeling in the Era of Large Language Models: Current Research and Future Directions(http://arxiv.org/abs/2312.11518)
Summary:
User modeling (UM) aims to discover patterns or learn representations from user data about the characteristics of a specific user, such as profile, preference, and personality. The user models enable personalization and suspiciousness detection in many online applications such as recommendation, education, and healthcare. Two common types of user data are text and graph, as the data usually contain a large amount of user-generated content (UGC) and online interactions. The research of text and graph mining is developing rapidly, contributing many notable solutions in the past two decades. Recently, large language models (LLMs) have shown superior performance on generating, understanding, and even reasoning over text data. User modeling approaches have since been equipped with LLMs and have quickly become outstanding. This article summarizes existing research about how and why LLMs are great tools for modeling and understanding UGC. Then it reviews a few categories of large language models for user modeling (LLM-UM) approaches that integrate the LLMs with text and graph-based methods in different ways. Then it introduces specific LLM-UM techniques for a variety of UM applications. Finally, it presents remaining challenges and future directions in the LLM-UM research. We maintain the reading list at: https://github.com/TamSiuhin/LLM-UM-Reading

Title: Large Language Models are Complex Table Parsers. (arXiv:2312.11521v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11521
Code URL: null
Copy Paste: [[2312.11521]] Large Language Models are Complex Table Parsers(http://arxiv.org/abs/2312.11521)
Summary:
With the Generative Pre-trained Transformer 3.5 (GPT-3.5) exhibiting remarkable reasoning and comprehension abilities in Natural Language Processing (NLP), most Question Answering (QA) research has primarily centered around general QA tasks based on GPT, neglecting the specific challenges posed by Complex Table QA. In this paper, we propose to incorporate GPT-3.5 to address such challenges, in which complex tables are reconstructed into tuples and specific prompt designs are employed for dialogues. Specifically, we encode each cell's hierarchical structure, position information, and content as a tuple. By enhancing the prompt template with an explanatory description of the meaning of each tuple and the logical reasoning process of the task, we effectively improve the hierarchical structure awareness capability of GPT-3.5 to better parse the complex tables. Extensive experiments and results on Complex Table QA datasets, i.e., the open-domain dataset HiTAB and the aviation domain dataset AIT-QA, show that our approach significantly outperforms previous work on both datasets, leading to state-of-the-art (SOTA) performance.
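
Illustrative sketch (not from the paper): one way to picture "each cell as a tuple" is to pair every cell with its row-header path, column-header path, and grid position, then prepend a sentence explaining the tuple layout to the prompt. The tuple layout, function names, and sample table below are assumptions for this example; the paper's actual encoding and prompt template may differ.

```python
def table_to_tuples(row_paths, col_paths, values):
    """Flatten a hierarchical table into one tuple per cell.

    row_paths / col_paths: header paths from the outermost to the innermost
    level, one per row / column (hypothetical input format).
    """
    cells = []
    for r, row_path in enumerate(row_paths):
        for c, col_path in enumerate(col_paths):
            cells.append((tuple(row_path), tuple(col_path), (r, c), values[r][c]))
    return cells


def build_prompt(cells, question):
    """Explain the tuple layout, list the cells, then ask the question."""
    legend = ("Each tuple is (row header path, column header path, "
              "(row index, column index), cell value).")
    body = "\n".join(repr(cell) for cell in cells)
    return f"{legend}\n{body}\nQuestion: {question}\nAnswer:"


row_paths = [["Europe", "France"], ["Europe", "Spain"]]
col_paths = [["2022", "Revenue"], ["2022", "Profit"]]
values = [[10, 2], [8, 1]]

cells = table_to_tuples(row_paths, col_paths, values)
prompt = build_prompt(cells, "What was the 2022 profit for France?")
```

The resulting prompt string would then be sent to the language model; that call is omitted here because it depends on the API in use.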

Title: ToViLaG: Your Visual-Language Generative Model is Also An Evildoer. (arXiv:2312.11523v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11523
Code URL: https://github.com/victorup/ToViLaG
Copy Paste: [[2312.11523]] ToViLaG: Your Visual-Language Generative Model is Also An Evildoer(http://arxiv.org/abs/2312.11523)
Summary:
Warning: this paper includes model outputs showing offensive content. Recent large-scale Visual-Language Generative Models (VLGMs) have achieved unprecedented improvement in multimodal image/text generation. However, these models might also generate toxic content, e.g., offensive text and pornographic images, raising significant ethical risks. Despite exhaustive studies on toxic degeneration of language models, this problem remains largely unexplored within the context of visual-language generation. This work delves into the propensity for toxicity generation and susceptibility to toxic data across various VLGMs. For this purpose, we built ToViLaG, a dataset comprising 32K co-toxic/mono-toxic text-image pairs and 1K innocuous but evocative text that tends to stimulate toxicity. Furthermore, we propose WInToRe, a novel toxicity metric tailored to visual-language generation, which theoretically reflects different aspects of toxicity considering both input and output. On such a basis, we benchmarked the toxicity of a diverse spectrum of VLGMs and discovered that some models do more evil than expected while some are more vulnerable to infection, underscoring the necessity of VLGM detoxification. Therefore, we develop an innovative bottleneck-based detoxification method. Our method could reduce toxicity while maintaining comparable generation quality, providing a promising initial solution to this line of research.

Title: Evaluating Language-Model Agents on Realistic Autonomous Tasks. (arXiv:2312.11671v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11671
Code URL: null
Copy Paste: [[2312.11671]] Evaluating Language-Model Agents on Realistic Autonomous Tasks(http://arxiv.org/abs/2312.11671)
Summary:
In this report, we explore the ability of language model agents to acquire resources, create copies of themselves, and adapt to novel challenges they encounter in the wild. We refer to this cluster of capabilities as "autonomous replication and adaptation" or ARA. We believe that systems capable of ARA could have wide-reaching and hard-to-anticipate consequences, and that measuring and forecasting ARA may be useful for informing measures around security, monitoring, and alignment. Additionally, once a system is capable of ARA, placing bounds on a system's capabilities may become significantly more difficult.

We construct four simple example agents that combine language models with tools that allow them to take actions in the world. We then evaluate these agents on 12 tasks relevant to ARA. We find that these language model agents can only complete the easiest tasks from this list, although they make some progress on the more challenging tasks. Unfortunately, these evaluations are not adequate to rule out the possibility that near-future agents will be capable of ARA. In particular, we do not think that these evaluations provide good assurance that the "next generation" of language models (e.g. 100x effective compute scaleup on existing models) will not yield agents capable of ARA, unless intermediate evaluations are performed during pretraining. Relatedly, we expect that fine-tuning of the existing models could produce substantially more competent agents, even if the fine-tuning is not directly targeted at ARA.

Title: Agent-based Learning of Materials Datasets from Scientific Literature. (arXiv:2312.11690v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.11690
Code URL: https://github.com/ai4chems/eunomia
Copy Paste: [[2312.11690]] Agent-based Learning of Materials Datasets from Scientific Literature(http://arxiv.org/abs/2312.11690)
Summary:
Advancements in machine learning and artificial intelligence are transforming materials discovery. Yet, the availability of structured experimental data remains a bottleneck. The vast corpus of scientific literature presents a valuable and rich resource of such data. However, manual dataset creation from these resources is challenging due to issues in maintaining quality and consistency, scalability limitations, and the risk of human error and bias. Therefore, in this work, we develop a chemist AI agent, powered by large language models (LLMs), to overcome these challenges by autonomously creating structured datasets from natural language text, ranging from sentences and paragraphs to extensive scientific research articles. Our chemist AI agent, Eunomia, can plan and execute actions by leveraging the existing knowledge from decades of scientific research articles, scientists, the Internet and other tools altogether. We benchmark the performance of our approach in three different information extraction tasks with various levels of complexity, including solid-state impurity doping, metal-organic framework (MOF) chemical formula, and property relations. Our results demonstrate that our zero-shot agent, with the appropriate tools, is capable of attaining performance that is either superior or comparable to the state-of-the-art fine-tuned materials information extraction methods. This approach simplifies compilation of machine learning-ready datasets for various materials discovery applications, and significantly eases the accessibility of advanced natural language processing tools for novice users in natural language. The methodology in this work is developed as an open-source software on https://github.com/AI4ChemS/Eunomia.

Title: Robust Stochastic Graph Generator for Counterfactual Explanations. (arXiv:2312.11747v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11747
Code URL: null
Copy Paste: [[2312.11747]] Robust Stochastic Graph Generator for Counterfactual Explanations(http://arxiv.org/abs/2312.11747)
Summary:
Counterfactual Explanation (CE) techniques have garnered attention as a means to provide insights to the users engaging with AI systems. While extensively researched in domains such as medical imaging and autonomous vehicles, Graph Counterfactual Explanation (GCE) methods have been comparatively under-explored. GCEs generate a new graph similar to the original one, with a different outcome grounded on the underlying predictive model. Among these GCE techniques, those rooted in generative mechanisms have received relatively limited investigation despite demonstrating impressive accomplishments in other domains, such as artistic styles and natural language modelling. The preference for generative explainers stems from their capacity to generate counterfactual instances during inference, leveraging autonomously acquired perturbations of the input graph. Motivated by the rationales above, our study introduces RSGG-CE, a novel Robust Stochastic Graph Generator for Counterfactual Explanations able to produce counterfactual examples from the learned latent space considering a partially ordered generation sequence. Furthermore, we undertake quantitative and qualitative analyses to compare RSGG-CE's performance against SoA generative explainers, highlighting its increased ability to engender plausible counterfactual candidates.

Title: Urban Generative Intelligence (UGI): A Foundational Platform for Agents in Embodied City Environment. (arXiv:2312.11813v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.11813
Code URL: https://github.com/tsinghua-fib-lab/ugi
Copy Paste: [[2312.11813]] Urban Generative Intelligence (UGI): A Foundational Platform for Agents in Embodied City Environment(http://arxiv.org/abs/2312.11813)
Summary:
Urban environments, characterized by their complex, multi-layered networks encompassing physical, social, economic, and environmental dimensions, face significant challenges in the face of rapid urbanization. These challenges, ranging from traffic congestion and pollution to social inequality, call for advanced technological interventions. Recent developments in big data, artificial intelligence, urban computing, and digital twins have laid the groundwork for sophisticated city modeling and simulation. However, a gap persists between these technological capabilities and their practical implementation in addressing urban challenges in a systemic-intelligent way. This paper proposes Urban Generative Intelligence (UGI), a novel foundational platform integrating Large Language Models (LLMs) into urban systems to foster a new paradigm of urban intelligence. UGI leverages CityGPT, a foundation model trained on city-specific multi-source data, to create embodied agents for various urban tasks. These agents, operating within a textual urban environment emulated by a city simulator and an urban knowledge graph, interact through a natural language interface, offering an open platform for diverse intelligent and embodied agent development. This platform not only addresses specific urban issues but also simulates complex urban systems, providing a multidisciplinary approach to understand and manage urban complexity. This work signifies a transformative step in city science and urban intelligence, harnessing the power of LLMs to unravel and address the intricate dynamics of urban systems. The code repository with demonstrations will soon be released here https://github.com/tsinghua-fib-lab/UGI.

Title: An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training. (arXiv:2312.11819v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11819
Code URL: null
Copy Paste: [[2312.11819]] An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training(http://arxiv.org/abs/2312.11819)
Summary:
Recently, large language models (LLMs) like ChatGPT or InstructGPT have made a significant impact in the AI world. These models are incredibly versatile, capable of performing language tasks on par with or even exceeding the capabilities of human experts. Many works have attempted to reproduce InstructGPT's complex RLHF (Reinforcement Learning with Human Feedback) training pipeline. However, the mainstream distributed RLHF training methods typically adopt a fixed model placement strategy, referred to as the Flattening strategy. This strategy treats all four models involved in RLHF as a single entity and places them on all devices, regardless of their differences. Unfortunately, this strategy exacerbates the generation bottlenecks in the RLHF training and degrades the overall training efficiency. To address these issues, we propose an adaptive model placement framework that offers two flexible model placement strategies. These strategies allow for the agile allocation of models across devices in a fine-grained manner. The Interleaving strategy helps reduce memory redundancy and communication costs during RLHF training. On the other hand, the Separation strategy improves the throughput of model training by separating the training and generation stages of the RLHF pipeline. Notably, this framework seamlessly integrates with other mainstream techniques for acceleration and enables automatic hyperparameter search. Extensive experiments have demonstrated that our Interleaving and Separation strategies can achieve notable improvements of up to 11x compared to the current state-of-the-art (SOTA) approaches. These experiments encompassed a wide range of training scenarios, involving models of varying sizes and devices of different scales. The results highlight the effectiveness and superiority of our approaches in accelerating the training of distributed RLHF.

Title: Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach. (arXiv:2312.11865v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.11865
Code URL: https://github.com/histmeisah/large-language-models-play-starcraftii
Copy Paste: [[2312.11865]] Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach(http://arxiv.org/abs/2312.11865)
Summary:
StarCraft II is a challenging benchmark for AI agents due to the necessity of both precise micro-level operations and strategic macro awareness. Previous works, such as AlphaStar and SCC, achieve impressive performance on StarCraft II but still exhibit deficiencies in long-term strategic planning and strategy interpretability. Emerging large language model (LLM) agents, such as Voyage and MetaGPT, present immense potential in solving intricate tasks. Motivated by this, we aim to validate the capabilities of LLMs on StarCraft II, a highly complex RTS game. To conveniently take full advantage of LLMs' reasoning abilities, we first develop a textual StarCraft II environment, called TextStarCraft II, with which LLM agents can interact. Secondly, we propose a Chain of Summarization method, including single-frame summarization for processing raw observations and multi-frame summarization for analyzing game information, providing command recommendations, and generating strategic decisions. Our experiment consists of two parts: first, an evaluation by human experts, which includes assessing the LLMs' mastery of StarCraft II knowledge and the performance of LLM agents in the game; second, the in-game performance of LLM agents, encompassing aspects like win rate and the impact of Chain of Summarization. Experiment results demonstrate that: 1. LLMs possess the relevant knowledge and complex planning abilities needed to address StarCraft II scenarios; 2. Human experts consider the performance of LLM agents to be close to that of an average player who has played StarCraft II for eight years; 3. LLM agents are capable of defeating the built-in AI at the Harder (Lv5) difficulty level. We have open sourced the code and released demo videos of LLM agent playing StarCraft II.

Title: Sparse is Enough in Fine-tuning Pre-trained Large Language Model. (arXiv:2312.11875v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11875
Code URL: https://github.com/song-wx/sift
Copy Paste: [[2312.11875]] Sparse is Enough in Fine-tuning Pre-trained Large Language Model(http://arxiv.org/abs/2312.11875)
Summary:
With the prevalence of the pre-training-fine-tuning paradigm, how to efficiently adapt the pre-trained model to the downstream tasks has been an intriguing issue. Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed for low-cost adaptation, including Adapters, Bias-only, and the recently widely used Low-Rank Adaptation. Although these methods have demonstrated their effectiveness to some extent and have been widely applied, the underlying principles are still unclear. In this paper, we reveal the transition of loss landscape in the downstream domain from random initialization to pre-trained initialization, that is, from low-amplitude oscillation to high-amplitude oscillation. The parameter gradients exhibit a property akin to sparsity, where a small fraction of components dominate the total gradient norm, for instance, 1% of the components account for 99% of the gradient. This property ensures that the pre-trained model can easily find a flat minimizer which guarantees the model's ability to generalize even with a low number of trainable parameters. Based on this, we propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT), and validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning. The code is accessible at https://github.com/song-wx/SIFT/.
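
Illustrative sketch (not from the paper): the sparsity property can be checked by measuring how much of the squared gradient norm the k largest-magnitude components hold, and a gradient-based sparse update can then touch only those components. The function names, the toy gradient, and the plain gradient-descent step are assumptions for this example; they are not the SIFT code from the linked repository.

```python
def top_k_mass(grad, k):
    """Share of the squared gradient norm held by the k largest components."""
    squared = sorted((g * g for g in grad), reverse=True)
    return sum(squared[:k]) / sum(squared)


def sparse_step(params, grad, k, lr):
    """Update only the k parameters with the largest gradient magnitude."""
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    updated = list(params)
    for i in top:
        updated[i] -= lr * grad[i]  # all other parameters stay frozen
    return updated, sorted(top)


# Toy gradient: 98 tiny components and 2 dominant ones (made-up numbers).
grad = [0.01] * 98 + [3.0, -4.0]
params = [0.0] * 100

mass = top_k_mass(grad, k=2)
new_params, touched = sparse_step(params, grad, k=2, lr=0.1)
```

Here 2% of the components carry more than 99% of the squared gradient norm, so the sparse step changes two parameters and leaves the other 98 untouched.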

Title: ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference. (arXiv:2312.11882v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11882
Code URL: null
Copy Paste: [[2312.11882]] ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference(http://arxiv.org/abs/2312.11882)
Summary:
Early Exiting is one of the most popular methods to achieve efficient inference. Current early exiting methods adopt the (weighted) sum of the cross entropy loss of all internal classifiers during training, requiring all these classifiers to predict all instances correctly. However, during inference, as long as one internal classifier predicts an instance correctly, it can accelerate without losing accuracy. Thus, there is a notable gap between training and inference. We propose ConsistentEE, an early exiting method that is consistent in training and inference. ConsistentEE formulates the early exiting process as a reinforcement learning problem. A policy network is added to decide whether an instance should exit or continue. The training objective of ConsistentEE only requires each instance to be predicted correctly by one internal classifier. Additionally, we introduce the concept of a Memorized Layer to measure the hardness of an instance. We incorporate the memorized layer into reward function design, which allows "easy" instances to focus more on acceleration while "hard" instances focus more on accuracy. Experimental results show that our method outperforms other baselines on various natural language understanding and generation tasks.
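
Illustrative sketch (not from the paper): at inference time, an early exiting model with a per-layer exit policy can be pictured as a loop that stops at the first layer where the policy votes to exit. The callables, the threshold, and the toy layers below are assumptions for this example; the reinforcement learning training and the reward design described in the abstract are not shown.

```python
def early_exit_predict(layers, classifiers, exit_policies, x, threshold=0.5):
    """Run layers in order and return (prediction, depth used).

    layers, classifiers, exit_policies: equal-length lists of callables
    (hypothetical stand-ins for transformer layers, internal classifiers,
    and a policy network that outputs an exit probability).
    """
    hidden = x
    last = len(layers)
    for depth, (layer, classify, exit_prob) in enumerate(
        zip(layers, classifiers, exit_policies), start=1
    ):
        hidden = layer(hidden)
        # Exit when the policy says so, or unconditionally at the last layer.
        if exit_prob(hidden) >= threshold or depth == last:
            return classify(hidden), depth


# Toy three-layer model on scalar "hidden states".
layers = [lambda h: h + 1] * 3
classifiers = [lambda h: "positive" if h >= 2 else "negative"] * 3
exit_policies = [lambda h: 1.0 if h >= 2 else 0.0] * 3

easy = early_exit_predict(layers, classifiers, exit_policies, 0)
hard = early_exit_predict(layers, classifiers, exit_policies, -5)
```

The first input exits after two layers, while the second never triggers the policy and falls through to the final classifier.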

Title: Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives. (arXiv:2312.11970v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.11970
Code URL: null
Copy Paste: [[2312.11970]] Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives(http://arxiv.org/abs/2312.11970)
Summary:
Agent-based modeling and simulation has evolved as a powerful tool for modeling complex systems, offering insights into emergent behaviors and interactions among diverse agents. Integrating large language models into agent-based modeling and simulation presents a promising avenue for enhancing simulation capabilities. This paper surveys the landscape of utilizing large language models in agent-based modeling and simulation, examining their challenges and promising future directions. In this survey, since this is an interdisciplinary field, we first introduce the background of agent-based modeling and simulation and large language model-empowered agents. We then discuss the motivation for applying large language models to agent-based simulation and systematically analyze the challenges in environment perception, human alignment, action generation, and evaluation. Most importantly, we provide a comprehensive overview of the recent works of large language model-empowered agent-based modeling and simulation in multiple scenarios, which can be divided into four domains: cyber, physical, social, and hybrid, covering simulation of both real-world and virtual environments. Finally, since this area is new and quickly evolving, we discuss the open problems and promising future directions.

Title: Fluctuation-based Adaptive Structured Pruning for Large Language Models. (arXiv:2312.11983v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11983
Code URL: https://github.com/casia-iva-lab/flap
Copy Paste: [[2312.11983]] Fluctuation-based Adaptive Structured Pruning for Large Language Models(http://arxiv.org/abs/2312.11983)
Summary:
Network Pruning is a promising way to address the huge computing resource demands of the deployment and inference of Large Language Models (LLMs). Being retraining-free is important for LLM pruning methods. However, almost all of the existing retraining-free pruning approaches for LLMs focus on unstructured pruning, which requires specific hardware support for acceleration. In this paper, we propose a novel retraining-free structured pruning framework for LLMs, named FLAP (FLuctuation-based Adaptive Structured Pruning). It is hardware-friendly by effectively reducing storage and enhancing inference speed. For effective structured pruning of LLMs, we highlight three critical elements that demand the utmost attention: formulating structured importance metrics, adaptively searching the global compressed model, and implementing compensation mechanisms to mitigate performance loss. First, FLAP determines whether the output feature map is easily recoverable when a column of weights is removed, based on the fluctuation pruning metric. Then it standardizes the importance scores to adaptively determine the global compressed model structure. Finally, FLAP adds bias terms to recover the output feature maps using the baseline values. We thoroughly evaluate our approach on a variety of language benchmarks. Without any retraining, our method significantly outperforms the state-of-the-art methods, including LLM-Pruner and the extension of Wanda in structured pruning. The code is released at https://github.com/CASIA-IVA-Lab/FLAP.
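
Illustrative sketch (not from the paper): a fluctuation-style score for an input channel can be approximated as the sample variance of that channel over calibration data times the squared norm of the matching weight column, and a pruned channel can be compensated by folding its mean contribution into the bias. The exact metric, the function names, and the toy numbers are assumptions for this example; the score standardization across layers mentioned in the abstract is not shown, and the pruned column is zeroed rather than physically removed to keep shapes simple.

```python
from statistics import mean, variance


def fluctuation_scores(X, W):
    """Score each input channel j (assumed form of the metric).

    X: calibration samples, shape (samples, in_features).
    W: weight matrix, shape (out_features, in_features).
    """
    scores = []
    for j in range(len(W[0])):
        channel = [row[j] for row in X]
        col_norm_sq = sum(w_row[j] ** 2 for w_row in W)
        scores.append(variance(channel) * col_norm_sq)
    return scores


def prune_with_bias(X, W, bias, n_prune):
    """Zero the lowest-scoring channels and fold their mean into the bias."""
    scores = fluctuation_scores(X, W)
    pruned = sorted(range(len(scores)), key=scores.__getitem__)[:n_prune]
    baseline = {j: mean(row[j] for row in X) for j in pruned}
    new_bias = [
        b + sum(w_row[j] * baseline[j] for j in pruned)
        for b, w_row in zip(bias, W)
    ]
    new_W = [
        [0.0 if j in baseline else w for j, w in enumerate(w_row)]
        for w_row in W
    ]
    return new_W, new_bias, pruned


def linear(x, W, bias):
    """Plain dense layer: W @ x + bias."""
    return [sum(w * v for w, v in zip(w_row, x)) + b for w_row, b in zip(W, bias)]


# Channel 1 is constant across samples, so it has zero fluctuation.
X = [[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 4.0]]
W = [[1.0, 2.0, 0.5], [0.0, -1.0, 1.0]]
bias = [0.0, 0.0]

new_W, new_bias, pruned = prune_with_bias(X, W, bias, n_prune=1)
before = linear(X[0], W, bias)
after = linear(X[0], new_W, new_bias)
```

Because the pruned channel never deviates from its mean in this toy data, the bias term recovers the original output exactly; with real activations the recovery would only be approximate.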

Title: Active Preference Inference using Language Models and Probabilistic Reasoning. (arXiv:2312.12009v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12009
Code URL: null
Copy Paste: [[2312.12009]] Active Preference Inference using Language Models and Probabilistic Reasoning(http://arxiv.org/abs/2312.12009)
Summary:
Actively inferring user preferences, for example by asking good questions, is important for any human-facing decision-making system. Active inference allows such systems to adapt and personalize themselves to nuanced individual preferences. To enable this ability for instruction-tuned large language models (LLMs), one may prompt them to ask users questions to infer their preferences, transforming the language models into more robust, interactive systems. However, out of the box, these models are not efficient at extracting preferences: the questions they generate are not informative, requiring a high number of user interactions and impeding the usability of the downstream system. In this work, we introduce an inference-time algorithm that helps LLMs quickly infer preferences by using more informative questions. Our algorithm uses a probabilistic model whose conditional distributions are defined by prompting an LLM, and returns questions that optimize expected entropy and expected model change. Results in a simplified interactive web shopping setting with real product items show that an LLM equipped with our entropy reduction algorithm outperforms baselines with the same underlying LLM on task performance while using fewer user interactions.
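
Illustrative sketch (not from the paper): choosing the question that minimizes expected posterior entropy can be written with a small Bayesian update over candidate preference hypotheses. In the paper the conditional distributions are defined by prompting an LLM; here they are hard-coded, made-up numbers, and the hypothesis names, questions, and function names are assumptions for this example.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a {hypothesis: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def posterior(prior, likelihood, answer):
    """Bayes update, where likelihood[h][answer] = P(answer | hypothesis h)."""
    unnormalized = {h: prior[h] * likelihood[h][answer] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


def expected_entropy(prior, likelihood, answers=("yes", "no")):
    """Posterior entropy averaged over the predicted answer distribution."""
    result = 0.0
    for answer in answers:
        p_answer = sum(prior[h] * likelihood[h][answer] for h in prior)
        if p_answer > 0:
            result += p_answer * entropy(posterior(prior, likelihood, answer))
    return result


def best_question(prior, questions):
    """Pick the question whose answer is expected to reduce entropy most."""
    return min(questions, key=lambda q: expected_entropy(prior, questions[q]))


prior = {"budget": 0.5, "premium": 0.5}
questions = {
    "Is price your main concern?": {
        "budget": {"yes": 0.9, "no": 0.1},
        "premium": {"yes": 0.1, "no": 0.9},
    },
    "Do you shop online?": {
        "budget": {"yes": 0.5, "no": 0.5},
        "premium": {"yes": 0.5, "no": 0.5},
    },
}

chosen = best_question(prior, questions)
uninformative = expected_entropy(prior, questions["Do you shop online?"])
informative = expected_entropy(prior, questions["Is price your main concern?"])
```

The second question leaves the belief unchanged whatever the answer, so its expected entropy stays at one bit and the first question is selected.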

Title: Synergistic Anchored Contrastive Pre-training for Few-Shot Relation Extraction. (arXiv:2312.12021v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12021
Code URL: https://github.com/aone-nlp/fsre-sacon
Copy Paste: [[2312.12021]] Synergistic Anchored Contrastive Pre-training for Few-Shot Relation Extraction(http://arxiv.org/abs/2312.12021)
Summary:
Few-shot Relation Extraction (FSRE) aims to extract relational facts from a sparse set of labeled corpora. Recent studies have shown promising results in FSRE by employing Pre-trained Language Models (PLMs) within the framework of supervised contrastive learning, which considers both instances and label facts. However, how to effectively harness massive instance-label pairs to encompass the learned representation with semantic richness in this learning paradigm is not fully explored. To address this gap, we introduce a novel synergistic anchored contrastive pre-training framework. This framework is motivated by the insight that the diverse viewpoints conveyed through instance-label pairs capture incomplete yet complementary intrinsic textual semantics. Specifically, our framework involves a symmetrical contrastive objective that encompasses both sentence-anchored and label-anchored contrastive losses. By combining these two losses, the model establishes a robust and uniform representation space. This space effectively captures the reciprocal alignment of feature distributions among instances and relational facts, simultaneously enhancing the maximization of mutual information across diverse perspectives within the same relation. Experimental results demonstrate that our framework achieves significant performance enhancements compared to baseline models in downstream FSRE tasks. Furthermore, our approach exhibits superior adaptability to handle the challenges of domain shift and zero-shot relation extraction. Our code is available online at https://github.com/AONE-NLP/FSRE-SaCon.
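
Illustrative sketch (not from the paper): a symmetrical contrastive objective can be pictured as the average of two InfoNCE-style losses, one anchored on sentence embeddings and one anchored on label embeddings. This simplified version assumes exactly one matching label per sentence and uses toy two-dimensional vectors; the function names and temperature are assumptions, and the paper's supervised formulation with multiple positives per relation is not reproduced.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchors, candidates, temperature=0.1):
    """Mean cross-entropy where anchor i should match candidate i."""
    loss = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, c) / temperature for c in candidates]
        peak = max(logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        loss += log_norm - logits[i]
    return loss / len(anchors)


def symmetric_contrastive_loss(sentences, labels, temperature=0.1):
    """Average of the sentence-anchored and label-anchored losses."""
    sentence_anchored = info_nce(sentences, labels, temperature)
    label_anchored = info_nce(labels, sentences, temperature)
    return 0.5 * (sentence_anchored + label_anchored)


sentences = [[1.0, 0.0], [0.0, 1.0]]
matching_labels = [[1.0, 0.0], [0.0, 1.0]]
swapped_labels = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = symmetric_contrastive_loss(sentences, matching_labels)
misaligned_loss = symmetric_contrastive_loss(sentences, swapped_labels)
```

The loss is close to zero when each sentence embedding lines up with its own label embedding and large when the pairs are swapped, which is the alignment pressure the two anchored terms are meant to provide.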

Title: Zero-Shot Fact-Checking with Semantic Triples and Knowledge Graphs. (arXiv:2312.11785v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11785
Code URL: null
Copy Paste: [[2312.11785]] Zero-Shot Fact-Checking with Semantic Triples and Knowledge Graphs(http://arxiv.org/abs/2312.11785)
Summary:
Despite progress in automated fact-checking, most systems require a significant amount of labeled training data, which is expensive. In this paper, we propose a novel zero-shot method, which instead of operating directly on the claim and evidence sentences, decomposes them into semantic triples augmented using external knowledge graphs, and uses large language models trained for natural language inference. This allows it to generalize to adversarial datasets and domains that supervised models require specific training data for. Our empirical results show that our approach outperforms previous zero-shot approaches on FEVER, FEVER-Symmetric, FEVER 2.0, and Climate-FEVER, while being comparable or better than supervised models on the adversarial and the out-of-domain datasets.

Title: Designing Guiding Principles for NLP for Healthcare: A Case Study of Maternal Health. (arXiv:2312.11803v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11803
Code URL: https://github.com/maria-antoniak/maternal-health-principles
Copy Paste: [[2312.11803]] Designing Guiding Principles for NLP for Healthcare: A Case Study of Maternal Health(http://arxiv.org/abs/2312.11803)
Summary:
Objective: An ethical framework for the use of large language models (LLMs) is urgently needed to shape how natural language processing (NLP) tools are used for healthcare applications. Drawing directly from the voices of those most affected, we propose a set of guiding principles for the use of NLP in healthcare, with examples based on applications in maternal health.

Materials and Methods: We led an interactive session centered on an LLM-based chatbot demonstration during a full-day workshop with 39 participants, and additionally surveyed 30 healthcare workers and 30 birthing people about their values, needs, and perceptions of AI and LLMs. We conducted quantitative and qualitative analyses of the interactive discussions to consolidate our findings into a set of guiding principles.

Results: Using the case study of maternal health, we propose nine principles for ethical use of LLMs, grouped into three categories: (i) contextual significance, (ii) measurements, and (iii) who/what is valued. We describe rationales underlying these principles and provide practical advice.

Discussion: Healthcare faces existing challenges including the balance of power in clinician-patient relationships, systemic health disparities, historical injustices, and economic constraints. Our principles serve as a framework for surfacing key considerations when deploying LLMs in medicine, as well as providing a methodological pattern for other researchers to follow.

Conclusion: This set of principles can serve as a resource to practitioners working on maternal health and other healthcare fields to emphasize the importance of technical nuance, historical context, and inclusive design when developing LLMs for use in clinical settings.

Title: Difficulty-Focused Contrastive Learning for Knowledge Tracing with a Large Language Model-Based Difficulty Prediction. (arXiv:2312.11890v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11890
Code URL: null
Copy Paste: [[2312.11890]] Difficulty-Focused Contrastive Learning for Knowledge Tracing with a Large Language Model-Based Difficulty Prediction(http://arxiv.org/abs/2312.11890)
Summary:
This paper presents novel techniques for enhancing the performance of knowledge tracing (KT) models by focusing on the crucial factor of question and concept difficulty level. Despite the acknowledged significance of difficulty, previous KT research has yet to exploit its potential for model optimization and has struggled to predict difficulty from unseen data. To address these problems, we propose a difficulty-centered contrastive learning method for KT models and a Large Language Model (LLM)-based framework for difficulty prediction. These innovative methods seek to improve the performance of KT models and provide accurate difficulty estimates for unseen data. Our ablation study demonstrates the efficacy of these techniques by showing enhanced KT model performance. Nonetheless, the complex relationship between language and difficulty merits further investigation.

Title: External Knowledge Augmented Polyphone Disambiguation Using Large Language Model. (arXiv:2312.11920v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11920
Code URL: null
Copy Paste: [[2312.11920]] External Knowledge Augmented Polyphone Disambiguation Using Large Language Model(http://arxiv.org/abs/2312.11920)
Summary:
One of the key issues in Mandarin Chinese text-to-speech (TTS) systems is polyphone disambiguation during grapheme-to-phoneme (G2P) conversion. In this paper, we introduce a novel method that treats the problem as a generation task. Following recent research trends in large language models (LLMs) and prompt learning, the proposed method consists of three modules. The retrieval module incorporates external knowledge, a multi-level semantic dictionary of Chinese polyphonic characters, to format the sentence into a prompt. The generation module adopts a decoder-only Transformer architecture to induce the target text. The postprocess module corrects the generated text into a valid result if needed. Experimental results show that our method outperforms the existing methods on a public dataset called CPP. We also empirically study the impacts of different templates of the prompt, different sizes of training data, and whether to incorporate external knowledge.

Title: Climate Change from Large Language Models. (arXiv:2312.11985v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11985
Code URL: null
Copy Paste: [[2312.11985]] Climate Change from Large Language Models(http://arxiv.org/abs/2312.11985)
Summary:
Climate change presents significant challenges to the global community, and it is imperative to raise widespread awareness of the climate crisis and educate users about low-carbon living. Artificial intelligence, particularly large language models (LLMs), has emerged as a powerful tool for mitigating the climate crisis, leveraging its extensive knowledge, broad user base, and natural language interaction capabilities. However, despite the growing body of research on climate change, there is a lack of comprehensive assessments of climate crisis knowledge within LLMs. This paper aims to address this gap by proposing an automatic evaluation framework. We employ a hybrid approach to data acquisition that combines data synthesis and manual collection to compile a diverse set of questions related to the climate crisis. These questions cover various aspects of climate change, including its causes, impacts, mitigation strategies, and adaptation measures. We then evaluate the model knowledge through prompt engineering based on the collected questions and generated answers. We propose a set of comprehensive metrics to evaluate the climate crisis knowledge, incorporating indicators from 10 different perspectives. Experimental results show that our method is effective in evaluating the knowledge of LLMs regarding the climate crisis. We evaluate several state-of-the-art LLMs and find that their knowledge falls short in terms of timeliness.

gpt

Title: Assessing GPT4-V on Structured Reasoning Tasks. (arXiv:2312.11524v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11524
Code URL: null
Copy Paste: [[2312.11524]] Assessing GPT4-V on Structured Reasoning Tasks(http://arxiv.org/abs/2312.11524)
Summary:
Multi-modality promises to unlock further uses for large language models. Recently, the state-of-the-art language model GPT-4 was enhanced with vision capabilities. We carry out a prompting evaluation of GPT-4V and five other baselines on structured reasoning tasks, such as mathematical reasoning, visual data analysis, and code generation. We show that visual Chain-of-Thought, an extension of Chain-of-Thought to multi-modal LLMs, yields significant improvements over the vanilla model. We also present a categorized analysis of scenarios where these models perform well and where they struggle, highlighting challenges associated with coherent multimodal reasoning.

Title: Founder-GPT: Self-play to evaluate the Founder-Idea fit. (arXiv:2312.12037v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12037
Code URL: null
Copy Paste: [[2312.12037]] Founder-GPT: Self-play to evaluate the Founder-Idea fit(http://arxiv.org/abs/2312.12037)
Summary:
This research introduces an innovative evaluation method for the "founder-idea" fit in early-stage startups, utilizing advanced large language model techniques to assess founders' profiles against their startup ideas to enhance decision-making. Embeddings, self-play, tree-of-thought, and critique-based refinement techniques show promising early results, suggesting that each idea's success patterns are unique and should be evaluated in the context of the founder's background.

Title: A Revisit of Fake News Dataset with Augmented Fact-checking by ChatGPT. (arXiv:2312.11870v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11870
Code URL: null
Copy Paste: [[2312.11870]] A Revisit of Fake News Dataset with Augmented Fact-checking by ChatGPT(http://arxiv.org/abs/2312.11870)
Summary:
The proliferation of fake news has emerged as a critical issue in recent years, requiring significant efforts to detect it. However, the existing fake news detection datasets are sourced from human journalists, which are likely to carry inherent biases due to the highly subjective nature of this task. In this paper, we revisit the existing fake news dataset verified by human journalists with augmented fact-checking by large language models (ChatGPT), and we name the augmented fake news dataset ChatGPT-FC. We quantitatively analyze the distinctions and resemblances between human journalists and the LLM in assessing news subject credibility, news creator credibility, time sensitivity, and political framing. Our findings highlight the LLM's potential to serve as a preliminary screening method, offering a promising avenue to mitigate the inherent biases of human journalists and enhance fake news detection.

Title: Can ChatGPT be Your Personal Medical Assistant?. (arXiv:2312.12006v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12006
Code URL: null
Copy Paste: [[2312.12006]] Can ChatGPT be Your Personal Medical Assistant?(http://arxiv.org/abs/2312.12006)
Summary:
The advanced large language model (LLM) ChatGPT has shown its potential in different domains and remains unbeaten due to its characteristics compared to other LLMs. This study aims to evaluate the potential of using a fine-tuned ChatGPT model as a personal medical assistant in the Arabic language. To do so, this study uses publicly available online question-answering datasets in the Arabic language. There are almost 430K questions and answers for 20 disease-specific categories. The GPT-3.5-turbo model was fine-tuned with a portion of this dataset. The performance of this fine-tuned model was evaluated through automated and human evaluation. The automated evaluations include perplexity, coherence, similarity, and token count. Native Arabic speakers with medical knowledge evaluated the generated text by calculating relevance, accuracy, precision, logic, and originality. The overall result shows that ChatGPT has a bright future in medical assistance.

llm

Title: Variety and Quality over Quantity: Towards Versatile Instruction Curation. (arXiv:2312.11508v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11508
Code URL: null
Copy Paste: [[2312.11508]] Variety and Quality over Quantity: Towards Versatile Instruction Curation(http://arxiv.org/abs/2312.11508)
Summary:
Instruction fine-tuning, involving the refinement of pre-trained LLMs using datasets accompanied by natural instructions, is a powerful approach. However, its effectiveness is hindered by the redundancy and deficiencies in LLM-generated instruction datasets. In this paper, we introduce a highly effective and versatile paradigm for selecting diverse and high-quality instruction-following data from fine-tuning datasets. We first employ dataset enhancement and expansion to augment the dataset with more diverse and high-quality data, then we apply variety compression and quality compression sequentially to curate the desired dataset. Our experimental results showcase that, even with a limited quantity of high-quality instruction data, LLMs consistently maintain robust performance across both natural language understanding tasks and code generation tasks. Notably, they outperform models trained on significantly larger instruction datasets in certain instances.

Title: ComplexityNet: Increasing LLM Inference Efficiency by Learning Task Complexity. (arXiv:2312.11511v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11511
Code URL: null
Copy Paste: [[2312.11511]] ComplexityNet: Increasing LLM Inference Efficiency by Learning Task Complexity(http://arxiv.org/abs/2312.11511)
Summary:
We present ComplexityNet, a streamlined language model designed for assessing task complexity. This model predicts the likelihood of accurate output by various language models, each with different capabilities. Our initial application of ComplexityNet involves the Mostly Basic Python Problems (MBPP) dataset. We pioneered the creation of the first set of labels to define task complexity. ComplexityNet achieved a notable 79% accuracy in determining task complexity, a significant improvement over the 34% accuracy of the original, non-fine-tuned model. Furthermore, ComplexityNet effectively reduces computational resource usage by 90% compared to using the highest complexity model, while maintaining a high code generation accuracy of 86.7%. This study demonstrates that fine-tuning smaller models to categorize tasks based on their complexity can lead to a more balanced trade-off between accuracy and efficiency in the use of Large Language Models. Our findings suggest a promising direction for optimizing LLM applications, especially in resource-constrained environments.
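
The routing idea behind this summary can be sketched as follows. This is only an illustration under stated assumptions: the complexity predictor is replaced by a hand-written dictionary of success probabilities, and the model names, relative costs, and the 0.8 threshold are all hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost: float  # relative cost per call, in hypothetical units


def route(success_probs, models, threshold=0.8):
    """Pick the cheapest model whose predicted success probability meets the threshold.

    Falls back to the most expensive model when none qualifies.
    """
    by_cost = sorted(models, key=lambda m: m.cost)
    for model in by_cost:
        if success_probs[model.name] >= threshold:
            return model
    return by_cost[-1]


# Hypothetical model pool.
models = [Model("small", 1.0), Model("medium", 10.0), Model("large", 100.0)]

# Stand-ins for the predictor's output on an easy, a moderate, and a hard task.
easy = route({"small": 0.90, "medium": 0.95, "large": 0.99}, models)
moderate = route({"small": 0.30, "medium": 0.85, "large": 0.95}, models)
hard = route({"small": 0.05, "medium": 0.20, "large": 0.60}, models)
```

Sending easy tasks to the cheapest adequate model is where the compute savings in such a scheme would come from. The fallback keeps hard tasks on the most capable model.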

Title: KGLens: A Parameterized Knowledge Graph Solution to Assess What an LLM Does and Doesn't Know. (arXiv:2312.11539v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.11539
Code URL: null
Copy Paste: [[2312.11539]] KGLens: A Parameterized Knowledge Graph Solution to Assess What an LLM Does and Doesn't Know(http://arxiv.org/abs/2312.11539)
Summary:
Current approaches to evaluating large language models (LLMs) with pre-existing Knowledge Graphs (KG) mostly ignore the structure of the KG and make arbitrary choices of which part of the graph to evaluate. In this paper, we introduce KGLens, a method to evaluate LLMs by generating natural language questions from a KG in a structure-aware manner so that we can characterize its performance on a more aggregated level. KGLens uses a parameterized KG, where each edge is augmented with a beta distribution that guides how to sample edges from the KG for QA testing. As the evaluation proceeds, different edges of the parameterized KG are sampled and assessed appropriately, converging to a more global picture of the performance of the LLMs on the KG as a whole. In our experiments, we construct three domain-specific KGs for knowledge assessment, comprising over 19,000 edges, 700 relations, and 21,000 entities. The results demonstrate that KGLens can not only assess overall performance but also provide topic, temporal, and relation analyses of LLMs. This showcases the adaptability and customizability of KGLens, emphasizing its ability to focus the evaluation based on specific criteria.
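
A parameterized KG of the kind described here can be sketched with one beta distribution per edge. The selection rule below (draw from every edge's distribution and test the edge with the largest draw, in the style of Thompson sampling) and the update rule are assumptions made for illustration, not necessarily what KGLens does. The class, the toy edges, and the stubbed "LLM" are hypothetical.

```python
import random


class ParameterizedKG:
    """Toy KG whose edges each carry Beta(alpha, beta) parameters."""

    def __init__(self, edges, seed=0):
        self.rng = random.Random(seed)
        # alpha counts wrong answers, beta counts correct ones; both start at a uniform prior.
        self.params = {edge: [1.0, 1.0] for edge in edges}

    def sample_edge(self):
        """Draw an error-rate guess for every edge and return the edge with the largest draw."""
        draws = {e: self.rng.betavariate(a, b) for e, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, edge, answered_correctly):
        """Record one QA outcome for the edge."""
        if answered_correctly:
            self.params[edge][1] += 1.0
        else:
            self.params[edge][0] += 1.0

    def estimated_error_rate(self, edge):
        """Posterior mean of the edge's error rate."""
        alpha, beta = self.params[edge]
        return alpha / (alpha + beta)


edges = [
    ("Paris", "capital_of", "France"),
    ("Canberra", "capital_of", "Australia"),
    ("Ottawa", "capital_of", "Canada"),
]
# Stub for the model under test: it "knows" only these facts.
known_facts = {edges[0], edges[2]}

kg = ParameterizedKG(edges, seed=0)
rounds = 50
for _ in range(rounds):
    edge = kg.sample_edge()
    kg.update(edge, answered_correctly=edge in known_facts)

total_tests = sum(a + b - 2.0 for a, b in kg.params.values())
```

Under this rule, edges that the model keeps getting wrong tend to be drawn more often, so the evaluation budget concentrates where the model's knowledge looks weakest.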

Title: CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare. (arXiv:2312.11541v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.11541
Code URL: null
Copy Paste: [[2312.11541]] CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare(http://arxiv.org/abs/2312.11541)
Summary:
In the era of modern healthcare, swiftly generating medical question summaries is crucial for informed and timely patient care. Despite the increasing complexity and volume of medical data, existing studies have focused solely on text-based summarization, neglecting the integration of visual information. Recognizing the untapped potential of combining textual queries with visual representations of medical conditions, we introduce the Multimodal Medical Question Summarization (MMQS) Dataset. This dataset, a major contribution of our work, pairs medical queries with visual aids, facilitating a richer and more nuanced understanding of patient needs. We also propose a framework that harnesses the power of Contrastive Language Image Pretraining (CLIP), a multimodal foundation model, and various general-purpose Large Language Models (LLMs). It comprises four main modules: the medical disorder identification module, the relevant context generation module, the context filtration module for distilling relevant medical concepts and knowledge, and finally, a general-purpose LLM to generate visually aware medical question summaries. Leveraging our MMQS dataset, we showcase how visual cues from images enhance the generation of medically nuanced summaries. This multimodal approach not only enhances the decision-making process in healthcare but also fosters a more nuanced understanding of patient queries, laying the groundwork for future research in personalized and responsive medical care.

Title: Are you talking to ['xem'] or ['x', 'em']? On Tokenization and Addressing Misgendering in LLMs with Pronoun Tokenization Parity. (arXiv:2312.11779v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11779
Code URL: null
Copy Paste: [[2312.11779]] Are you talking to ['xem'] or ['x', 'em']? On Tokenization and Addressing Misgendering in LLMs with Pronoun Tokenization Parity(http://arxiv.org/abs/2312.11779)
Summary:
A large body of NLP research has documented the ways gender biases manifest and amplify within large language models (LLMs), though this research has predominantly operated within a gender binary-centric context. A growing body of work has identified the harmful limitations of this gender-exclusive framing; many LLMs cannot correctly and consistently refer to persons outside the gender binary, especially if they use neopronouns. While data scarcity has been identified as a possible culprit, the precise mechanisms through which it influences LLM misgendering remain underexplored. Our work addresses this gap by studying data scarcity's role in subword tokenization and, consequently, the formation of LLM word representations. We uncover how the Byte-Pair Encoding (BPE) tokenizer, a backbone for many popular LLMs, contributes to neopronoun misgendering through out-of-vocabulary behavior. We introduce pronoun tokenization parity (PTP), a novel approach to reduce LLM neopronoun misgendering by preserving a token's functional structure. We evaluate PTP's efficacy using pronoun consistency-based metrics and a novel syntax-based metric. Through several controlled experiments, finetuning LLMs with PTP improves neopronoun consistency from 14.5% to 58.4%, highlighting the significant role tokenization plays in LLM pronoun consistency.
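
The fragmentation in the paper's title can be reproduced with a toy tokenizer. The sketch below uses greedy longest-match segmentation over a tiny hand-made vocabulary as a simplified stand-in for BPE, so it illustrates out-of-vocabulary splitting only. Adding a dedicated token at the end is likewise an illustrative assumption and not the authors' PTP procedure.

```python
def tokenize(word, vocab):
    """Greedy longest-match segmentation of a single word (a simplified stand-in for BPE)."""
    tokens, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # No vocabulary entry matches: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens


# Hypothetical vocabulary: frequent pronouns are whole tokens, the neopronoun is not.
vocab = {"her", "him", "x", "em", "e", "m"}

binary_pronoun = tokenize("her", vocab)
neopronoun_before = tokenize("xem", vocab)

# Giving the neopronoun its own entry restores a single-token representation.
neopronoun_after = tokenize("xem", vocab | {"xem"})
```

Here "her" stays whole while "xem" splits into two pieces until it gets its own vocabulary entry, mirroring the ['xem'] versus ['x', 'em'] contrast in the title.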

Title: Neural Network Approximation for Pessimistic Offline Reinforcement Learning. (arXiv:2312.11863v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11863
Code URL: null
Copy Paste: [[2312.11863]] Neural Network Approximation for Pessimistic Offline Reinforcement Learning(http://arxiv.org/abs/2312.11863)
Summary:
Deep reinforcement learning (RL) has shown remarkable success in specific offline decision-making scenarios, yet its theoretical guarantees are still under development. Existing works on offline RL theory primarily emphasize a few trivial settings, such as linear MDP or general function approximation with strong assumptions and independent data, which lack guidance for practical use. The coupling of deep learning and Bellman residuals makes this problem challenging, in addition to the difficulty of data dependence. In this paper, we establish a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation with $\mathcal{C}$-mixing data regarding the structure of networks, the dimension of datasets, and the concentrability of data coverage, under mild assumptions. Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight. This result demonstrates the explicit efficiency of deep adversarial offline RL frameworks. We utilize the empirical process tool for $\mathcal{C}$-mixing sequences and the neural network approximation theory for the Hölder class to achieve this. We also develop methods to bound the Bellman estimation error caused by function approximation with empirical Bellman constraint perturbations. Additionally, we present a result that lessens the curse of dimensionality using data with low intrinsic dimensionality and function classes with low complexity. Our estimation provides valuable insights into the development of deep offline RL and guidance for algorithm model design.

long context

lora

Title: Exploration-Exploitation Model of Moth-Inspired Olfactory Navigation. (arXiv:2312.11492v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.11492
Code URL: null
Copy Paste: [[2312.11492]] Exploration-Exploitation Model of Moth-Inspired Olfactory Navigation(http://arxiv.org/abs/2312.11492)
Summary:
Navigation of male moths toward females during the mating search offers a unique perspective on the exploration-exploitation (EE) model in decision-making. This study uses the EE model to explain male moth pheromone-driven flight paths. We leverage wind tunnel measurements and 3D tracking using infrared cameras to gain insights into male moth behavior. During the experiments in the wind tunnel, we add disturbance to the airflow and analyze the effect of increased fluctuations on moth flights in the context of the proposed EE model. We separate the exploration and exploitation phases by applying a genetic algorithm to the dataset of moth 3D trajectories. First, we demonstrate that the exploration-to-exploitation rate (EER) increases with distance from the source of the female pheromone, which can be explained in the context of the EE model. Furthermore, our findings reveal a compelling relationship between EER and increased flow fluctuations near the pheromone source. Using the open-source pheromone plume simulation and our moth-inspired navigation model, we explain why male moths exhibit an enhanced EER as turbulence levels increase, emphasizing the agent's adaptation to dynamically changing environments. This research extends our understanding of optimal navigation strategies based on general biological EE models and supports the development of advanced, theoretically supported bio-inspired navigation algorithms. We provide important insights into the potential of bio-inspired navigation models for addressing complex decision-making challenges.

Title: A Survey of Reasoning with Foundation Models. (arXiv:2312.11562v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.11562
Code URL: https://github.com/reasoning-survey/awesome-reasoning-foundation-models
Copy Paste: [[2312.11562]] A Survey of Reasoning with Foundation Models(http://arxiv.org/abs/2312.11562)
Summary:
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.

Title: MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA. (arXiv:2312.11795v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11795
Code URL: https://github.com/bruthyu/melo
Copy Paste: [[2312.11795]] MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA(http://arxiv.org/abs/2312.11795)
Summary:
Large language models (LLMs) have shown great success in various Natural Language Processing (NLP) tasks, whilst they still need updates after deployment to fix errors or keep pace with the changing knowledge in the world. Researchers formulate this problem as Model Editing and have developed various editors focusing on different axes of editing properties. However, current editors can hardly support all properties and rely on heavy computational resources. In this paper, we propose a plug-in Model Editing method based on neuron-indexed dynamic LoRA (MELO), which alters the behavior of language models by dynamically activating certain LoRA blocks according to the index built in an inner vector database. Our method satisfies various editing properties with high efficiency and can be easily integrated into multiple LLM backbones. Experimental results show that our proposed MELO achieves state-of-the-art editing performance on three sequential editing tasks (document classification, question answering and hallucination correction), while requiring the fewest trainable parameters and the lowest computational cost.
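
The mechanism in this summary, activating a LoRA block only when a lookup in a vector index matches, can be sketched in a few lines. Everything below is a hypothetical illustration rather than the MELO code: the "vector database" is a plain list searched linearly, a block fires when the nearest stored key lies within a fixed Euclidean radius, and the class and method names are invented.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class DynamicLoRALayer:
    """Frozen linear layer plus LoRA blocks selected per input through a key index."""

    def __init__(self, base_weight, radius):
        self.base_weight = base_weight
        self.radius = radius
        self.keys = []    # (key_vector, block_id) pairs: a toy vector index
        self.blocks = {}  # block_id -> (A, B) low-rank factors

    def add_edit(self, key, block_id, lora_a, lora_b):
        self.keys.append((key, block_id))
        self.blocks[block_id] = (lora_a, lora_b)

    def lookup(self, query):
        """Return the block id of the nearest key within the radius, else None."""
        best = min(self.keys, key=lambda kb: distance(kb[0], query), default=None)
        if best is not None and distance(best[0], query) <= self.radius:
            return best[1]
        return None

    def forward(self, x, query):
        out = matvec(self.base_weight, x)
        block_id = self.lookup(query)
        if block_id is not None:
            lora_a, lora_b = self.blocks[block_id]
            delta = matvec(lora_b, matvec(lora_a, x))  # B (A x), rank-limited update
            out = [o + d for o, d in zip(out, delta)]
        return out


layer = DynamicLoRALayer(base_weight=[[1.0, 0.0], [0.0, 1.0]], radius=0.5)
layer.add_edit(key=[1.0, 0.0], block_id="edit-0",
               lora_a=[[1.0, 0.0]],      # A: 1 x 2 (rank 1)
               lora_b=[[0.0], [2.0]])    # B: 2 x 1

x = [3.0, 4.0]
edited = layer.forward(x, query=[0.9, 0.1])     # close to the stored key
untouched = layer.forward(x, query=[0.0, 1.0])  # outside the radius
```

Inputs whose query falls outside every stored neighborhood pass through the frozen base weights unchanged, which is how an index-gated scheme like this can leave unrelated inputs unaffected by an edit.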

hallucination

prompt

code

Title: Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation. (arXiv:2312.11532v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11532
Code URL: null
Copy Paste: [[2312.11532]] Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation(http://arxiv.org/abs/2312.11532)
Summary:
This paper introduces a novel approach for topic modeling utilizing latent codebooks from Vector-Quantized Variational Auto-Encoder (VQ-VAE), discretely encapsulating the rich information of the pre-trained embeddings such as the pre-trained language model. From the novel interpretation of the latent codebooks and embeddings as conceptual bag-of-words, we propose a new generative topic model called Topic-VQ-VAE (TVQ-VAE) which inversely generates the original documents related to the respective latent codebook. The TVQ-VAE can visualize the topics with various generative distributions including the traditional BoW distribution and the autoregressive image generation. Our experimental results on document analysis and image generation demonstrate that TVQ-VAE effectively captures the topic context which reveals the underlying structures of the dataset and supports flexible forms of document generation. Official implementation of the proposed TVQ-VAE is available at https://github.com/clovaai/TVQ-VAE.

Title: Deciphering Compatibility Relationships with Textual Descriptions via Extraction and Explanation. (arXiv:2312.11554v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11554
Code URL: null
Copy Paste: [[2312.11554]] Deciphering Compatibility Relationships with Textual Descriptions via Extraction and Explanation(http://arxiv.org/abs/2312.11554)
Summary:
Understanding and accurately explaining compatibility relationships between fashion items is a challenging problem in the burgeoning domain of AI-driven outfit recommendations. Present models, while making strides in this area, still occasionally fall short, offering explanations that can be elementary and repetitive. This work aims to address these shortcomings by introducing the Pair Fashion Explanation (PFE) dataset, a unique resource that has been curated to illuminate these compatibility relationships. Furthermore, we propose an innovative two-stage pipeline model that leverages this dataset. This fine-tuning allows the model to generate explanations that convey the compatibility relationships between items. Our experiments showcase the model's potential to craft descriptions that are knowledgeable, aligned with ground-truth matching correlations, understandable, and informative, as assessed by both automatic metrics and human evaluation. Our code and data are released at https://github.com/wangyu-ustc/PairFashionExplanation

Title: Bridging Logic and Learning: A Neural-Symbolic Approach for Enhanced Reasoning in Neural Models (ASPER). (arXiv:2312.11651v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.11651
Code URL: https://github.com/fadi2200/aspen
Copy Paste: [[2312.11651]] Bridging Logic and Learning: A Neural-Symbolic Approach for Enhanced Reasoning in Neural Models (ASPER)(http://arxiv.org/abs/2312.11651)
Summary:
Neural-symbolic learning, an intersection of neural networks and symbolic reasoning, aims to blend neural networks' learning capabilities with symbolic AI's interpretability and reasoning. This paper introduces an approach designed to improve the performance of neural models in learning reasoning tasks. It achieves this by integrating Answer Set Programming (ASP) solvers and domain-specific expertise, which is an approach that diverges from traditional complex neural-symbolic models. In this paper, a shallow artificial neural network (ANN) is specifically trained to solve Sudoku puzzles with minimal training data. The model has a unique loss function that integrates losses calculated using the ASP solver outputs, effectively enhancing its training efficiency. Most notably, the model shows a significant improvement in solving Sudoku puzzles using only 12 puzzles for training and testing without hyperparameter tuning. This advancement indicates that the model's enhanced reasoning capabilities have practical applications, extending well beyond Sudoku puzzles to potentially include a variety of other domains. The code can be found on GitHub: https://github.com/Fadi2200/ASPEN.

Title: Time-Transformer: Integrating Local and Global Features for Better Time Series Generation. (arXiv:2312.11714v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11714
Code URL: null
Copy Paste: [[2312.11714]] Time-Transformer: Integrating Local and Global Features for Better Time Series Generation(http://arxiv.org/abs/2312.11714)
Summary:
Generating time series data is a promising approach to address data deficiency problems. However, it is also challenging due to the complex temporal properties of time series data, including local correlations as well as global dependencies. Most existing generative models have failed to effectively learn both the local and global properties of time series data. To address this open problem, we propose a novel time series generative model named 'Time-Transformer AAE', which consists of an adversarial autoencoder (AAE) and a newly designed architecture named 'Time-Transformer' within the decoder. The Time-Transformer first simultaneously learns local and global features in a layer-wise parallel design, combining the abilities of Temporal Convolutional Networks and Transformer in extracting local features and global dependencies respectively. Second, a bidirectional cross attention is proposed to provide complementary guidance across the two branches and achieve proper fusion between local and global features. Experimental results demonstrate that our model can outperform existing state-of-the-art models in 5 out of 6 datasets, specifically on those with data containing both global and local properties. Furthermore, we highlight our model's advantage on handling this kind of data via an artificial dataset. Finally, we show our model's ability to address a real-world problem: data augmentation to support learning with small datasets and imbalanced datasets.

Title: Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models. (arXiv:2312.11720v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11720
Code URL: https://github.com/paulopirozelli/logicalreasoning
Copy Paste: [[2312.11720]] Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models(http://arxiv.org/abs/2312.11720)
Summary:
Logical reasoning is central to complex human activities, such as thinking, debating, and planning; it is also a central component of many AI systems. In this paper, we investigate the extent to which encoder-only transformer language models (LMs) can reason according to logical rules. We ask whether those LMs can deduce theorems in propositional calculus and first-order logic; whether their relative success in these problems reflects general logical capabilities; and which layers contribute the most to the task. First, we show for several encoder-only LMs that they can be trained, to a reasonable degree, to determine logical validity on various datasets. Next, by cross-probing fine-tuned models on these datasets, we show that LMs have difficulty in transferring their putative logical reasoning ability, which suggests that they may have learned dataset-specific features, instead of a general capability. Finally, we conduct a layerwise probing experiment, which shows that the hypothesis classification task is mostly solved through higher layers.

Title: Poker Hand History File Format Specification. (arXiv:2312.11753v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.11753
Code URL: https://github.com/uoftcprg/pokerkit
Copy Paste: [[2312.11753]] Poker Hand History File Format Specification(http://arxiv.org/abs/2312.11753)
Summary:
This paper introduces the Poker Hand History (PHH) file format, designed to standardize the recording of poker hands across different game variants. Despite poker's widespread popularity in mainstream culture as a mind sport and its prominence in the field of artificial intelligence (AI) research as a benchmark for imperfect information AI agents, it lacks a consistent format that humans can use to document poker hands across different variants and that can also easily be parsed by machines. To address this gap in the literature, we propose the PHH format, which provides a concise, human-readable, machine-friendly representation of hand history that comprehensively captures various details of the hand, ranging from initial game parameters and actions to contextual parameters including but not limited to the venue, players, and time control information. In the supplementary material, we provide over 10,000 hands covering 11 different variants in the PHH format. Building on our previous work on PokerKit, a premier poker hand simulation tool, we demonstrate the usage of our open-source Python implementation of the PHH parser. The source code of the parser is available on GitHub: https://github.com/uoftcprg/pokerkit

Title: A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking. (arXiv:2312.11816v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.11816
Code URL: https://github.com/season1blue/dwe
Copy Paste: [[2312.11816]] A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking(http://arxiv.org/abs/2312.11816)
Summary:
Multimodal Entity Linking (MEL) aims at linking ambiguous mentions with multimodal information to entities in a Knowledge Graph (KG) such as Wikipedia, which plays a key role in many applications. However, existing methods suffer from shortcomings, including modality impurity such as noise in raw images and ambiguous textual entity representation, which hinder MEL. We formulate multimodal entity linking as a neural text matching problem where each piece of multimodal information (text and image) is treated as a query, and the model learns the mapping from each query to the relevant entity from candidate entities. This paper introduces a dual-way enhanced (DWE) framework for MEL: (1) our model refines queries with multimodal data and addresses semantic gaps using cross-modal enhancers between text and image information. Besides, DWE innovatively leverages fine-grained image attributes, including facial characteristics and scene features, to enhance and refine visual features. (2) By using Wikipedia descriptions, DWE enriches entity semantics and obtains a more comprehensive textual representation, which reduces the gap between the textual representation and the entities in the KG. Extensive experiments on three public benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance, indicating the superiority of our model. The code is released on https://github.com/season1blue/DWE

Title: Relation-Aware Question Answering for Heterogeneous Knowledge Graphs. (arXiv:2312.11922v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11922
Code URL: https://github.com/yanmenxue/rah-kbqa
Copy Paste: [[2312.11922]] Relation-Aware Question Answering for Heterogeneous Knowledge Graphs(http://arxiv.org/abs/2312.11922)
Summary:
Multi-hop Knowledge Base Question Answering (KBQA) aims to find the answer entity in a knowledge graph (KG), which requires multiple steps of reasoning. Existing retrieval-based approaches solve this task by concentrating on the specific relation at different hops and predicting the intermediate entity within the reasoning path. During the reasoning process of these methods, the representations of relations are fixed, but the initial relation representation may not be optimal. We claim they fail to utilize information from head-tail entities and the semantic connection between relations to enhance the current relation representation, which undermines the ability to capture information of relations in KGs. To address this issue, we construct a dual relation graph where each node denotes a relation in the original KG (primal entity graph) and edges are constructed between relations sharing the same head or tail entities. Then we iteratively do primal entity graph reasoning, dual relation graph information propagation, and interaction between these two graphs. In this way, the interaction between entity and relation is enhanced, and we derive better entity and relation representations. Experiments on two public datasets, WebQSP and CWQ, show that our approach achieves a significant performance gain over the prior state-of-the-art. Our code is available on https://github.com/yanmenxue/RAH-KBQA.

Title: Multi-Granularity Information Interaction Framework for Incomplete Utterance Rewriting. (arXiv:2312.11945v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11945
Code URL: null
Copy Paste: [[2312.11945]] Multi-Granularity Information Interaction Framework for Incomplete Utterance Rewriting(http://arxiv.org/abs/2312.11945)
Summary:
Recent approaches in Incomplete Utterance Rewriting (IUR) fail to capture the source of important words, which is crucial to edit the incomplete utterance, and introduce words from irrelevant utterances. We propose a novel and effective multi-task information interaction framework including context selection, edit matrix construction, and relevance merging to capture the multi-granularity of semantic information. Benefiting from fetching the relevant utterance and figuring out the important words, our approach outperforms existing state-of-the-art models on two benchmark datasets, Restoration-200K and CANAND, in this field. Code will be provided on https://github.com/yanmenxue/QR.

Title: Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling. (arXiv:2312.11947v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11947
Code URL: null
Copy Paste: [[2312.11947]] Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling(http://arxiv.org/abs/2312.11947)
Summary:
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting. While recognising the significance of the CSS task, prior studies have not thoroughly investigated the emotional expressiveness problems due to the scarcity of emotional conversational datasets and the difficulty of stateful emotion modeling. In this paper, we propose a novel emotional CSS model, termed ECSS, that includes two main components: 1) to enhance emotion understanding, we introduce a heterogeneous graph-based emotional context modeling mechanism, which takes the multi-source dialogue history as input to model the dialogue context and learn the emotion cues from the context; 2) to achieve emotion rendering, we employ a contrastive learning-based emotion renderer module to infer the accurate emotion style for the target utterance. To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity, and annotate additional emotional information on the existing conversational dataset (DailyTalk). Both objective and subjective evaluations suggest that our model outperforms the baseline models in understanding and rendering emotions. These evaluations also underscore the importance of comprehensive emotional annotations. Code and audio samples can be found at: https://github.com/walker-hyf/ECSS.

Title: Coreference Graph Guidance for Mind-Map Generation. (arXiv:2312.11997v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11997
Code URL: null
Copy Paste: [[2312.11997]] Coreference Graph Guidance for Mind-Map Generation(http://arxiv.org/abs/2312.11997)
Summary:
Mind-map generation aims to process a document into a hierarchical structure to show its central idea and branches. Such a structure is more conducive to understanding the logic and semantics of the document than plain text. Recently, a state-of-the-art method encodes the sentences of a document sequentially and converts them to a relation graph via sequence-to-graph. Though this method generates mind-maps efficiently in parallel, its mechanism focuses more on sequential features while hardly capturing structural information. Moreover, it has difficulty modeling long-range semantic relations. In this work, we propose a coreference-guided mind-map generation network (CMGN) to incorporate external structure knowledge. Specifically, we construct a coreference graph based on the coreference semantic relationship to introduce the graph structure information. Then we employ a coreference graph encoder to mine the potential governing relations between sentences. In order to exclude noise and better utilize the information of the coreference graph, we adopt a graph enhancement module in a contrastive learning manner. Experimental results demonstrate that our model outperforms all the existing methods. The case study further proves that our model can more accurately and concisely reveal the structure and semantics of a document. Code and data are available at https://github.com/Cyno2232/CMGN.

Title: Multiple Hypothesis Dropout: Estimating the Parameters of Multi-Modal Output Distributions. (arXiv:2312.11735v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11735
Code URL: null
Copy Paste: [[2312.11735]] Multiple Hypothesis Dropout: Estimating the Parameters of Multi-Modal Output Distributions(http://arxiv.org/abs/2312.11735)
Summary:
In many real-world applications, from robotics to pedestrian trajectory prediction, there is a need to predict multiple real-valued outputs to represent several potential scenarios. Current deep learning techniques to address multiple-output problems are based on two main methodologies: (1) mixture density networks, which suffer from poor stability at high dimensions, or (2) multiple choice learning (MCL), an approach that uses $M$ single-output functions, each only producing a point estimate hypothesis. This paper presents a Mixture of Multiple-Output functions (MoM) approach using a novel variant of dropout, Multiple Hypothesis Dropout. Unlike traditional MCL-based approaches, each multiple-output function not only estimates the mean but also the variance for its hypothesis. This is achieved through a novel stochastic winner-take-all loss which allows each multiple-output function to estimate variance through the spread of its subnetwork predictions. Experiments on supervised learning problems illustrate that our approach outperforms existing solutions for reconstructing multimodal output distributions. Additional studies on unsupervised learning problems show that estimating the parameters of latent posterior distributions within a discrete autoencoder significantly improves codebook efficiency, sample quality, precision and recall.

Title: Big Learning Expectation Maximization. (arXiv:2312.11926v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11926
Code URL: https://github.com/yulaicong/big-learning-expectation-maximization
Copy Paste: [[2312.11926]] Big Learning Expectation Maximization(http://arxiv.org/abs/2312.11926)
Summary:
Mixture models serve as one fundamental tool with versatile applications. However, their training techniques, like the popular Expectation Maximization (EM) algorithm, are notoriously sensitive to parameter initialization and often suffer from bad local optima that could be arbitrarily worse than the optimum. To address the long-lasting bad-local-optima challenge, we draw inspiration from the recent ground-breaking foundation models and propose to leverage their underlying big learning principle to upgrade EM. Specifically, we present the Big Learning EM (BigLearn-EM), an EM upgrade that simultaneously performs joint, marginal, and orthogonally transformed marginal matchings between data and model distributions. Through simulated experiments, we empirically show that the BigLearn-EM is capable of delivering the optimum with high probability; comparisons on benchmark clustering datasets further demonstrate its effectiveness and advantages over existing techniques. The code is available at https://github.com/YulaiCong/Big-Learning-Expectation-Maximization.
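
For orientation, the standard EM baseline mentioned in this summary can be sketched for a 1-D Gaussian mixture. This is plain textbook EM, not BigLearn-EM; the function names, the variance floor, and the toy data generator are illustrative assumptions and do not come from the paper or its repository.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Textbook EM for a 1-D Gaussian mixture (not BigLearn-EM).

    The caller supplies the initial parameters, which is where the
    sensitivity to initialization enters.
    """
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            probs = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(probs) or 1e-300  # guard against numerical underflow
            resp.append([p / total for p in probs])
        # M-step: re-estimate each component from the responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # a collapsed component keeps its old parameters
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # hypothetical floor against degenerate spikes
    return means, variances, weights


def make_toy_data(n_per_component=200, seed=0):
    """Hypothetical toy data: two well-separated unit-variance clusters."""
    rng = random.Random(seed)
    left = [rng.gauss(-5.0, 1.0) for _ in range(n_per_component)]
    right = [rng.gauss(5.0, 1.0) for _ in range(n_per_component)]
    return left + right
```

Calling `em_gmm_1d(make_toy_data(), [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])` starts one component on each side of the origin; other starting points may converge to different solutions, which is the initialization sensitivity the summary refers to.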

chat

Title: TESS: A Multi-intent Parser for Conversational Multi-Agent Systems with Decentralized Natural Language Understanding Models. (arXiv:2312.11828v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.11828
Code URL: null
Copy Paste: [[2312.11828]] TESS: A Multi-intent Parser for Conversational Multi-Agent Systems with Decentralized Natural Language Understanding Models(http://arxiv.org/abs/2312.11828)
Summary:
Chatbots have become one of the main pathways for the delivery of business automation tools. Multi-agent systems offer a framework for designing chatbots at scale, making it easier to support complex conversations that span across multiple domains as well as enabling developers to maintain and expand their capabilities incrementally over time. However, multi-agent systems complicate the natural language understanding (NLU) of user intents, especially when they rely on decentralized NLU models: some utterances (termed single intent) may invoke a single agent while others (termed multi-intent) may explicitly invoke multiple agents. Without correctly parsing multi-intent inputs, decentralized NLU approaches will not achieve high prediction accuracy. In this paper, we propose an efficient parsing and orchestration pipeline algorithm to service multi-intent utterances from the user in the context of a multi-agent system. Our proposed approach achieved comparable performance to competitive deep learning models on three different datasets while being up to 48 times faster.

retrieval augmented generation

rag

Title: Probabilistic Offline Policy Ranking with Approximate Bayesian Computation. (arXiv:2312.11551v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11551
Code URL: null
Copy Paste: [[2312.11551]] Probabilistic Offline Policy Ranking with Approximate Bayesian Computation(http://arxiv.org/abs/2312.11551)
Summary:
In practice, it is essential to compare and rank candidate policies offline before real-world deployment for safety and reliability. Prior work seeks to solve this offline policy ranking (OPR) problem through value-based methods, such as off-policy evaluation (OPE). However, they fail to analyze performance in special cases (e.g., worst or best cases), due to the lack of a holistic characterization of policies' performance. It is even more difficult to estimate precise policy values when the reward is not fully accessible under sparse settings. In this paper, we present Probabilistic Offline Policy Ranking (POPR), a framework to address OPR problems by leveraging expert data to characterize the probability of a candidate policy behaving like experts, and approximating its entire performance posterior distribution to help with ranking. POPR does not rely on value estimation, and the derived performance posterior can be used to distinguish candidates in worst, best, and average cases. To estimate the posterior, we propose POPR-EABC, an Energy-based Approximate Bayesian Computation (ABC) method conducting likelihood-free inference. POPR-EABC reduces the heuristic nature of ABC by a smooth energy function, and improves the sampling efficiency by a pseudo-likelihood. We empirically demonstrate that POPR-EABC is adequate for evaluating policies in both discrete and continuous action spaces across various experiment environments, and facilitates probabilistic comparisons of candidate policies before deployment.

Title: COPD-FlowNet: Elevating Non-invasive COPD Diagnosis with CFD Simulations. (arXiv:2312.11561v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11561
Code URL: null
Copy Paste: [[2312.11561]] COPD-FlowNet: Elevating Non-invasive COPD Diagnosis with CFD Simulations(http://arxiv.org/abs/2312.11561)
Summary:
Chronic Obstructive Pulmonary Disorder (COPD) is a prevalent respiratory disease that significantly impacts the quality of life of affected individuals. This paper presents COPDFlowNet, a novel deep-learning framework that leverages a custom Generative Adversarial Network (GAN) to generate synthetic Computational Fluid Dynamics (CFD) velocity flow field images specific to the trachea of COPD patients. These synthetic images serve as a valuable resource for data augmentation and model training. Additionally, COPDFlowNet incorporates a custom Convolutional Neural Network (CNN) architecture to predict the location of the obstruction site.

Title: Estimation of individual causal effects in network setup for multiple treatments. (arXiv:2312.11573v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11573
Code URL: null
Copy Paste: [[2312.11573]] Estimation of individual causal effects in network setup for multiple treatments(http://arxiv.org/abs/2312.11573)
Summary:
We study the problem of estimating Individual Treatment Effects (ITE) in the context of multiple treatments and networked observational data. Leveraging the network information, we aim to utilize hidden confounders that may not be directly accessible in the observed data, thereby enhancing the practical applicability of the strong ignorability assumption. To achieve this, we first employ Graph Convolutional Networks (GCN) to learn a shared representation of the confounders. Then, our approach utilizes separate neural networks to infer potential outcomes for each treatment. We design a loss function as a weighted combination of two components: representation loss and Mean Squared Error (MSE) loss on the factual outcomes. To measure the representation loss, we extend existing metrics such as Wasserstein and Maximum Mean Discrepancy (MMD) from the binary treatment setting to the multiple treatments scenario. To validate the effectiveness of our proposed methodology, we conduct a series of experiments on benchmark datasets such as BlogCatalog and Flickr. The experimental results consistently demonstrate the superior performance of our models when compared to baseline methods.
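
As background for the representation loss mentioned above, the binary-treatment case of Maximum Mean Discrepancy can be sketched as a two-sample statistic between the representations of two treatment groups. This is a minimal sketch under stated assumptions: the RBF kernel, the bandwidth, the biased estimator, and all function names are illustrative choices, and the paper's extension to multiple treatments is not reproduced here.

```python
import math


def rbf_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples of vectors.

    xs and ys stand for the learned representations of two treatment
    groups (an assumption made for illustration).
    """
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))
```

A value near zero indicates that the two groups have similar representation distributions, while a larger value indicates imbalance between the groups that a representation loss would penalize.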

Title: Shapley-PC: Constraint-based Causal Structure Learning with Shapley Values. (arXiv:2312.11582v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11582
Code URL: https://github.com/briziorusso/shapleypc
Copy Paste: [[2312.11582]] Shapley-PC: Constraint-based Causal Structure Learning with Shapley Values(http://arxiv.org/abs/2312.11582)
Summary:
Causal Structure Learning (CSL), amounting to extracting causal relations among the variables in a dataset, is widely perceived as an important step towards robust and transparent models. Constraint-based CSL leverages conditional independence tests to perform causal discovery. We propose Shapley-PC, a novel method to improve constraint-based CSL algorithms by using Shapley values over the possible conditioning sets to decide which variables are responsible for the observed conditional (in)dependences. We prove soundness and asymptotic consistency and demonstrate that it can outperform state-of-the-art constraint-based, search-based and functional causal model-based methods, according to standard metrics in CSL.

Title: Identification of Causal Structure with Latent Variables Based on Higher Order Cumulants. (arXiv:2312.11934v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11934
Code URL: null
Copy Paste: [[2312.11934]] Identification of Causal Structure with Latent Variables Based on Higher Order Cumulants(http://arxiv.org/abs/2312.11934)
Summary:
Causal discovery with latent variables is a crucial but challenging task. Despite the emergence of numerous methods aimed at addressing this challenge, they cannot fully identify the structure in which two observed variables are influenced by one latent variable and there might be a directed edge between them. Interestingly, we notice that this structure can be identified through the utilization of higher-order cumulants. By leveraging the higher-order cumulants of non-Gaussian data, we provide an analytical solution for estimating the causal coefficients or their ratios. With the estimated (ratios of) causal coefficients, we propose a novel approach to identify the existence of a causal edge between two observed variables subject to latent variable influence. When such a causal edge exists, we introduce an asymmetry criterion to determine the causal direction. The experimental results demonstrate the effectiveness of our proposed method.

Title: Time-Series Contrastive Learning against False Negatives and Class Imbalance. (arXiv:2312.11939v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11939
Code URL: null
Copy Paste: [[2312.11939]] Time-Series Contrastive Learning against False Negatives and Class Imbalance(http://arxiv.org/abs/2312.11939)
Summary:
As an exemplary self-supervised approach for representation learning, time-series contrastive learning has exhibited remarkable advancements in contemporary research. While recent contrastive learning strategies have focused on how to construct appropriate positives and negatives, in this study, we conduct theoretical analysis and find that they have overlooked two fundamental issues: false negatives and class imbalance inherent in the InfoNCE loss-based framework. Therefore, we introduce a straightforward modification grounded in the SimCLR framework, universally adaptable to models engaged in the instance discrimination task. By constructing instance graphs to facilitate interactive learning among instances, we emulate supervised contrastive learning via the multiple-instances discrimination task, mitigating the harmful impact of false negatives. Moreover, leveraging the graph structure and few-labeled data, we perform semi-supervised consistency classification and enhance the representative ability of minority classes. We compared our method with the most popular time-series contrastive learning methods on four real-world time-series datasets and demonstrated our significant advantages in overall performance.
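
To make the false-negative issue concrete, the standard InfoNCE loss for a single anchor can be sketched as follows. This is the generic loss only, not the paper's instance-graph modification; the cosine similarity, the temperature value, and the function names are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor, one positive, and a list of negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# A "false negative" is a sample from the same class as the anchor that the
# instance discrimination task nevertheless treats as a negative.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
clean_negatives = [[-1.0, 0.0], [0.0, -1.0]]
with_false_negative = [[1.0, 0.05], [0.0, -1.0]]

loss_clean = info_nce(anchor, positive, clean_negatives)
loss_false_negative = info_nce(anchor, positive, with_false_negative)
```

Here `loss_false_negative` is larger than `loss_clean` because the near-duplicate of the anchor sits in the denominator, so the loss pushes apart two samples that arguably belong together.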

Title: Fast Decision Boundary based Out-of-Distribution Detector. (arXiv:2312.11536v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.11536
Code URL: null
Copy Paste: [[2312.11536]] Fast Decision Boundary based Out-of-Distribution Detector(http://arxiv.org/abs/2312.11536)
Summary:
Efficient and effective Out-of-Distribution (OOD) detection is essential for the safe deployment of AI in latency-critical applications. Recently, studies have revealed that detecting OOD based on feature space information can be highly effective. Despite their effectiveness, however, existing feature space OOD methods may incur non-negligible computational overhead, given their reliance on auxiliary models built from training features. In this paper, we aim to obviate auxiliary models to optimize computational efficiency while leveraging the rich information embedded in the feature space. We investigate from the novel perspective of decision boundaries and propose to detect OOD using the feature distance to decision boundaries. To minimize the cost of measuring the distance, we introduce an efficient closed-form estimation, analytically proven to tightly lower bound the distance. We observe that ID features tend to reside farther from the decision boundaries than OOD features. Our observation aligns with the intuition that models tend to be more decisive on ID samples, considering that distance to decision boundaries quantifies model uncertainty. Based on this understanding, we propose a hyperparameter-free, auxiliary model-free OOD detector. Our OOD detector matches or surpasses the effectiveness of state-of-the-art methods across extensive experiments. Meanwhile, our OOD detector incurs practically negligible overhead in inference latency. Overall, we significantly enhance the efficiency-effectiveness trade-off in OOD detection.

2023-12-20

language model

Title: Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data. (arXiv:2312.11502v1 [cs.CL])

Title: The performance of multiple language models in identifying offensive language on social media. (arXiv:2312.11504v1 [cs.CL])

Title: LLM in a flash: Efficient Large Language Model Inference with Limited Memory. (arXiv:2312.11514v1 [cs.CL])

Title: User Modeling in the Era of Large Language Models: Current Research and Future Directions. (arXiv:2312.11518v1 [cs.CL])

Title: Large Language Models are Complex Table Parsers. (arXiv:2312.11521v1 [cs.CL])

Title: ToViLaG: Your Visual-Language Generative Model is Also An Evildoer. (arXiv:2312.11523v1 [cs.CL])

Title: Evaluating Language-Model Agents on Realistic Autonomous Tasks. (arXiv:2312.11671v1 [cs.CL])

Title: Agent-based Learning of Materials Datasets from Scientific Literature. (arXiv:2312.11690v1 [cs.AI])

Title: Robust Stochastic Graph Generator for Counterfactual Explanations. (arXiv:2312.11747v1 [cs.LG])

Title: Urban Generative Intelligence (UGI): A Foundational Platform for Agents in Embodied City Environment. (arXiv:2312.11813v1 [cs.AI])

Title: An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training. (arXiv:2312.11819v1 [cs.LG])

Title: Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach. (arXiv:2312.11865v1 [cs.AI])

Title: Sparse is Enough in Fine-tuning Pre-trained Large Language Model. (arXiv:2312.11875v1 [cs.LG])

Title: ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference. (arXiv:2312.11882v1 [cs.CL])

Title: Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives. (arXiv:2312.11970v1 [cs.AI])

Title: Fluctuation-based Adaptive Structured Pruning for Large Language Models. (arXiv:2312.11983v1 [cs.CL])

Title: Active Preference Inference using Language Models and Probabilistic Reasoning. (arXiv:2312.12009v1 [cs.CL])

Title: Synergistic Anchored Contrastive Pre-training for Few-Shot Relation Extraction. (arXiv:2312.12021v1 [cs.CL])

Title: Zero-Shot Fact-Checking with Semantic Triples and Knowledge Graphs. (arXiv:2312.11785v1 [cs.CL])

Title: Designing Guiding Principles for NLP for Healthcare: A Case Study of Maternal Health. (arXiv:2312.11803v1 [cs.CL])

Title: Difficulty-Focused Contrastive Learning for Knowledge Tracing with a Large Language Model-Based Difficulty Prediction. (arXiv:2312.11890v1 [cs.CL])

Title: External Knowledge Augmented Polyphone Disambiguation Using Large Language Model. (arXiv:2312.11920v1 [cs.CL])

Title: Climate Change from Large Language Models. (arXiv:2312.11985v1 [cs.CL])

gpt

Title: Assessing GPT4-V on Structured Reasoning Tasks. (arXiv:2312.11524v1 [cs.CL])

Title: Founder-GPT: Self-play to evaluate the Founder-Idea fit. (arXiv:2312.12037v1 [cs.CL])

Title: A Revisit of Fake News Dataset with Augmented Fact-checking by ChatGPT. (arXiv:2312.11870v1 [cs.CL])

Title: Can ChatGPT be Your Personal Medical Assistant?. (arXiv:2312.12006v1 [cs.CL])

llm

Title: Variety and Quality over Quantity: Towards Versatile Instruction Curation. (arXiv:2312.11508v1 [cs.CL])

Title: ComplexityNet: Increasing LLM Inference Efficiency by Learning Task Complexity. (arXiv:2312.11511v1 [cs.CL])

Title: KGLens: A Parameterized Knowledge Graph Solution to Assess What an LLM Does and Doesn't Know. (arXiv:2312.11539v1 [cs.AI])

Title: CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare. (arXiv:2312.11541v1 [cs.AI])

Title: Are you talking to ['xem'] or ['x', 'em']? On Tokenization and Addressing Misgendering in LLMs with Pronoun Tokenization Parity. (arXiv:2312.11779v1 [cs.CL])

Title: Neural Network Approximation for Pessimistic Offline Reinforcement Learning. (arXiv:2312.11863v1 [cs.LG])

long context

lora

Title: Exploration-Exploitation Model of Moth-Inspired Olfactory Navigation. (arXiv:2312.11492v1 [cs.AI])

Title: A Survey of Reasoning with Foundation Models. (arXiv:2312.11562v1 [cs.AI])

Title: MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA. (arXiv:2312.11795v1 [cs.CL])

hallucination

prompt

code

Title: Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation. (arXiv:2312.11532v1 [cs.CL])

Title: Deciphering Compatibility Relationships with Textual Descriptions via Extraction and Explanation. (arXiv:2312.11554v1 [cs.CL])

Title: Bridging Logic and Learning: A Neural-Symbolic Approach for Enhanced Reasoning in Neural Models (ASPER). (arXiv:2312.11651v1 [cs.AI])

Title: Time-Transformer: Integrating Local and Global Features for Better Time Series Generation. (arXiv:2312.11714v1 [cs.LG])

Title: Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models. (arXiv:2312.11720v1 [cs.CL])

Title: Poker Hand History File Format Specification. (arXiv:2312.11753v1 [cs.AI])

Title: A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking. (arXiv:2312.11816v1 [cs.AI])

Title: Relation-Aware Question Answering for Heterogeneous Knowledge Graphs. (arXiv:2312.11922v1 [cs.CL])

Title: Multi-Granularity Information Interaction Framework for Incomplete Utterance Rewriting. (arXiv:2312.11945v1 [cs.CL])

Title: Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling. (arXiv:2312.11947v1 [cs.CL])

Title: Coreference Graph Guidance for Mind-Map Generation. (arXiv:2312.11997v1 [cs.CL])

Title: Multiple Hypothesis Dropout: Estimating the Parameters of Multi-Modal Output Distributions. (arXiv:2312.11735v1 [cs.LG])

Title: Big Learning Expectation Maximization. (arXiv:2312.11926v1 [cs.LG])

chat

Title: TESS: A Multi-intent Parser for Conversational Multi-Agent Systems with Decentralized Natural Language Understanding Models. (arXiv:2312.11828v1 [cs.CL])

retrieval augmented generation

rag

Title: Probabilistic Offline Policy Ranking with Approximate Bayesian Computation. (arXiv:2312.11551v1 [cs.LG])

Title: COPD-FlowNet: Elevating Non-invasive COPD Diagnosis with CFD Simulations. (arXiv:2312.11561v1 [cs.LG])

Title: Estimation of individual causal effects in network setup for multiple treatments. (arXiv:2312.11573v1 [cs.LG])

Title: Shapley-PC: Constraint-based Causal Structure Learning with Shapley Values. (arXiv:2312.11582v1 [cs.LG])

Title: Identification of Causal Structure with Latent Variables Based on Higher Order Cumulants. (arXiv:2312.11934v1 [cs.LG])

Title: Time-Series Contrastive Learning against False Negatives and Class Imbalance. (arXiv:2312.11939v1 [cs.LG])

Title: Fast Decision Boundary based Out-of-Distribution Detector. (arXiv:2312.11536v1 [cs.LG])

multi-run

chain-of-thought

tree-of-thought