language model

Title: Hijacking Context in Large Multi-modal Models. (arXiv:2312.07553v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.07553
Code URL: null
Copy Paste: [[2312.07553]] Hijacking Context in Large Multi-modal Models(http://arxiv.org/abs/2312.07553)
Summary:
Recently, Large Multi-modal Models (LMMs) have demonstrated their ability to understand the visual contents of images given the instructions regarding the images. Built upon the Large Language Models (LLMs), LMMs also inherit their abilities and characteristics such as in-context learning where a coherent sequence of images and texts is given as the input prompt. However, we identify a new limitation of off-the-shelf LMMs where a small fraction of incoherent images or text descriptions misleads LMMs to only generate biased output about the hijacked context, not the originally intended context. To address this, we propose a pre-filtering method that removes irrelevant contexts via GPT-4V, based on its robustness towards distribution shift within the contexts. We further investigate whether replacing the hijacked visual and textual contexts with the correlated ones via GPT-4V and text-to-image models can help yield coherent responses.

Title: PaperQA: Retrieval-Augmented Generative Agent for Scientific Research. (arXiv:2312.07559v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07559
Code URL: null
Copy Paste: [[2312.07559]] PaperQA: Retrieval-Augmented Generative Agent for Scientific Research(http://arxiv.org/abs/2312.07559)
Summary:
Large Language Models (LLMs) generalize well across language tasks, but suffer from hallucinations and uninterpretability, making it difficult to assess their accuracy without ground-truth. Retrieval-Augmented Generation (RAG) models have been proposed to reduce hallucinations and provide provenance for how an answer was generated. Applying such models to the scientific literature may enable large-scale, systematic processing of scientific knowledge. We present PaperQA, a RAG agent for answering questions over the scientific literature. PaperQA is an agent that performs information retrieval across full-text scientific articles, assesses the relevance of sources and passages, and uses RAG to provide answers. Viewing this agent as a question answering model, we find it exceeds the performance of existing LLMs and LLM agents on current science QA benchmarks. To push the field closer to how humans perform research on scientific literature, we also introduce LitQA, a more complex benchmark that requires retrieval and synthesis of information from full-text scientific papers across the literature. Finally, we demonstrate that PaperQA matches expert human researchers on LitQA.

Title: Leveraging Large Language Models to Build and Execute Computational Workflows. (arXiv:2312.07711v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.07711
Code URL: null
Copy Paste: [[2312.07711]] Leveraging Large Language Models to Build and Execute Computational Workflows(http://arxiv.org/abs/2312.07711)
Summary:
The recent development of large language models (LLMs) with multi-billion parameters, coupled with the creation of user-friendly application programming interfaces (APIs), has paved the way for automatically generating and executing code in response to straightforward human queries. This paper explores how these emerging capabilities can be harnessed to facilitate complex scientific workflows, eliminating the need for traditional coding methods. We present initial findings from our attempt to integrate Phyloflow with OpenAI's function-calling API, and outline a strategy for developing a comprehensive workflow management system based on these concepts.

Title: Large Human Language Models: A Need and the Challenges. (arXiv:2312.07751v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07751
Code URL: null
Copy Paste: [[2312.07751]] Large Human Language Models: A Need and the Challenges(http://arxiv.org/abs/2312.07751)
Summary:
As research in human-centered NLP advances, there is a growing recognition of the importance of incorporating human and social factors into NLP models. At the same time, our NLP systems have become heavily reliant on LLMs, most of which do not model authors. To build NLP systems that can truly understand human language, we must better integrate human contexts into LLMs. This brings to the fore a range of design considerations and challenges in terms of what human aspects to capture, how to represent them, and what modeling strategies to pursue. To address these, we advocate for three positions toward creating large human language models (LHLMs) using concepts from psychological and behavioral sciences: First, LM training should include the human context. Second, LHLMs should recognize that people are more than their group(s). Third, LHLMs should be able to account for the dynamic and temporally-dependent nature of the human context. We refer to relevant advances and present open challenges that need to be addressed and their possible solutions in realizing these goals.

Title: Large Language Model Enhanced Multi-Agent Systems for 6G Communications. (arXiv:2312.07850v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.07850
Code URL: null
Copy Paste: [[2312.07850]] Large Language Model Enhanced Multi-Agent Systems for 6G Communications(http://arxiv.org/abs/2312.07850)
Summary:
The rapid development of the Large Language Model (LLM) presents huge opportunities for 6G communications, e.g., network optimization and management by allowing users to input task requirements to LLMs in natural language. However, directly applying native LLMs in 6G encounters various challenges, such as a lack of private communication data and knowledge, and limited logical reasoning, evaluation, and refinement abilities. Integrating LLMs with the capabilities of retrieval, planning, memory, evaluation and reflection in agents can greatly enhance the potential of LLMs for 6G communications. To this end, we propose a multi-agent system with customized communication knowledge and tools for solving communication related tasks using natural language, comprising three components: (1) Multi-agent Data Retrieval (MDR), which employs the condensate and inference agents to refine and summarize communication knowledge from the knowledge base, expanding the knowledge boundaries of LLMs in 6G communications; (2) Multi-agent Collaborative Planning (MCP), which utilizes multiple planning agents to generate feasible solutions for the communication related task from different perspectives based on the retrieved knowledge; (3) Multi-agent Evaluation and Reflexion (MER), which utilizes the evaluation agent to assess the solutions, and applies the reflexion agent and refinement agent to provide improvement suggestions for current solutions. Finally, we validate the effectiveness of the proposed multi-agent system by designing a semantic communication system, as a case study of 6G communications.

Title: Causality Analysis for Evaluating the Security of Large Language Models. (arXiv:2312.07876v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.07876
Code URL: https://github.com/casperllm/casper
Copy Paste: [[2312.07876]] Causality Analysis for Evaluating the Security of Large Language Models(http://arxiv.org/abs/2312.07876)
Summary:
Large Language Models (LLMs) such as GPT and Llama2 are increasingly adopted in many safety-critical applications. Their security is thus essential. Even with considerable efforts spent on reinforcement learning from human feedback (RLHF), recent studies have shown that LLMs are still subject to attacks such as adversarial perturbation and Trojan attacks. Further research is thus needed to evaluate their security and/or understand the lack of it. In this work, we propose a framework for conducting light-weight causality-analysis of LLMs at the token, layer, and neuron level. We applied our framework to open-source LLMs such as Llama2 and Vicuna and had multiple interesting discoveries. Based on a layer-level causality analysis, we show that RLHF has the effect of overfitting a model to harmful prompts. It implies that such security can be easily overcome by 'unusual' harmful prompts. As evidence, we propose an adversarial perturbation method that achieves 100% attack success rate on the red-teaming tasks of the Trojan Detection Competition 2023. Furthermore, we show the existence of one mysterious neuron in both Llama2 and Vicuna that has an unreasonably high causal effect on the output. While we are uncertain on why such a neuron exists, we show that it is possible to conduct a "Trojan" attack targeting that particular neuron to completely cripple the LLM, i.e., we can generate transferable suffixes to prompts that frequently make the LLM produce meaningless responses.

Title: PromptBench: A Unified Library for Evaluation of Large Language Models. (arXiv:2312.07910v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.07910
Code URL: https://github.com/microsoft/promptbench
Copy Paste: [[2312.07910]] PromptBench: A Unified Library for Evaluation of Large Language Models(http://arxiv.org/abs/2312.07910)
Summary:
The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to evaluate LLMs. It consists of several key components that are easily used and extended by researchers: prompt construction, prompt engineering, dataset and model loading, adversarial prompt attack, dynamic evaluation protocols, and analysis tools. PromptBench is designed to be an open, general, and flexible codebase for research purposes that can facilitate original study in creating new benchmarks, deploying downstream applications, and designing new evaluation protocols. The code is available at: https://github.com/microsoft/promptbench and will be continuously supported.

Title: Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning. (arXiv:2312.08027v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08027
Code URL: null
Copy Paste: [[2312.08027]] Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning(http://arxiv.org/abs/2312.08027)
Summary:
Large language models (LLMs) can be used as accessible and intelligent chatbots by constructing natural language queries and directly inputting the prompt into the large language model. However, different prompt constructions often lead to uncertainty in the answers and thus make it hard to utilize the specific knowledge of LLMs (like ChatGPT). To alleviate this, we use an interpretable structure to explain the prompt learning principle in LLMs, which certifies that the effectiveness of language models is determined by position changes of the task's related tokens. Therefore, we propose MTPrompt, a multi-dimensional task prompt learning method based on task-related object, summary, and task description information. By automatically building and searching for appropriate prompts, our proposed MTPrompt achieves the best results in the few-shot setting on five different datasets. In addition, we demonstrate the effectiveness and stability of our method in different experimental settings and ablation experiments. In interaction with large language models, embedding more task-related information into prompts will make it easier to stimulate knowledge embedded in large language models.

Title: High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models. (arXiv:2312.08274v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08274
Code URL: null
Copy Paste: [[2312.08274]] High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models(http://arxiv.org/abs/2312.08274)
Summary:
Objective: To develop a high-throughput biomedical relation extraction system that takes advantage of the large language models' (LLMs) reading comprehension ability and biomedical world knowledge in a scalable and evidential manner. Methods: We formulate the relation extraction task as a simple binary classification problem for large language models such as ChatGPT. Specifically, LLMs make the decision based on the external corpus and its world knowledge, giving the reason for the judgment to factual verification. This method is tailored for semi-structured web articles, wherein we designate the main title as the tail entity and explicitly incorporate it into the context, and the potential head entities are matched based on a biomedical thesaurus. Moreover, lengthy contents are sliced into text chunks, embedded, and retrieved with additional embedding models, ensuring compatibility with the context window size constraints of available open-source LLMs. Results: Using an open-source LLM, we extracted 304315 relation triplets of three distinct relation types from four reputable biomedical websites. To assess the efficacy of the basic pipeline employed for biomedical relation extraction, we curated a benchmark dataset annotated by a medical expert. Evaluation results indicate that the pipeline exhibits performance comparable to that of GPT-4. Case studies further illuminate challenges faced by contemporary LLMs in the context of biomedical relation extraction for semi-structured web articles. Conclusion: The proposed method has demonstrated its effectiveness in leveraging the strengths of LLMs for high-throughput biomedical relation extraction. Its adaptability is evident, as it can be seamlessly extended to diverse semi-structured biomedical websites, facilitating the extraction of various types of biomedical relations with ease.
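The slice, embed, and retrieve step mentioned in the summary above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`slice_into_chunks`, `embed`, `retrieve`) are hypothetical, and the bag-of-words `embed` is a toy stand-in for the embedding models the paper refers to.

```python
import math
from collections import Counter


def slice_into_chunks(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks,
                    key=lambda chunk: cosine(query_vec, embed(chunk)),
                    reverse=True)
    return ranked[:top_k]
```

Only the retrieved chunks would then be placed in the LLM prompt, which is how a long article can fit within a limited context window.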

Title: Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models. (arXiv:2312.08303v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08303
Code URL: null
Copy Paste: [[2312.08303]] Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models(http://arxiv.org/abs/2312.08303)
Summary:
Toxic content detection is crucial for online services to remove inappropriate content that violates community standards. To automate the detection process, prior works have proposed varieties of machine learning (ML) approaches to train Language Models (LMs) for toxic content detection. However, both their accuracy and transferability across datasets are limited. Recently, Large Language Models (LLMs) have shown promise in toxic content detection due to their superior zero-shot and few-shot in-context learning ability as well as broad transferability on ML tasks. However, efficiently designing prompts for LLMs remains challenging. Moreover, the high run-time cost of LLMs may hinder their deployments in production. To address these challenges, in this work, we propose BD-LLM, a novel and efficient approach to Bootstrapping and Distilling LLMs for toxic content detection. Specifically, we design a novel prompting method named Decision-Tree-of-Thought (DToT) to bootstrap LLMs' detection performance and extract high-quality rationales. DToT can automatically select more fine-grained context to re-prompt LLMs when their responses lack confidence. Additionally, we use the rationales extracted via DToT to fine-tune student LMs. Our experimental results on various datasets demonstrate that DToT can improve the accuracy of LLMs by up to 4.6%. Furthermore, student LMs fine-tuned with rationales extracted via DToT outperform baselines on all datasets with up to 16.9% accuracy improvement, while being more than 60x smaller than conventional LLMs. Finally, we observe that student LMs fine-tuned with rationales exhibit better cross-dataset transferability.

Title: An Invitation to Deep Reinforcement Learning. (arXiv:2312.08365v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08365
Code URL: null
Copy Paste: [[2312.08365]] An Invitation to Deep Reinforcement Learning(http://arxiv.org/abs/2312.08365)
Summary:
Training a deep neural network to maximize a target objective has become the standard recipe for successful machine learning over the last decade. These networks can be optimized with supervised learning, if the target objective is differentiable. For many interesting problems, this is however not the case. Common objectives like intersection over union (IoU), bilingual evaluation understudy (BLEU) score or rewards cannot be optimized with supervised learning. A common workaround is to define differentiable surrogate losses, leading to suboptimal solutions with respect to the actual objective. Reinforcement learning (RL) has emerged as a promising alternative for optimizing deep neural networks to maximize non-differentiable objectives in recent years. Examples include aligning large language models via human feedback, code generation, object detection or control problems. This makes RL techniques relevant to the larger machine learning audience. The subject is, however, time intensive to approach due to the large range of methods, as well as the often very theoretical presentation. In this introduction, we take an alternative approach, different from classic reinforcement learning textbooks. Rather than focusing on tabular problems, we introduce reinforcement learning as a generalization of supervised learning, which we first apply to non-differentiable objectives and later to temporal problems. Assuming only basic knowledge of supervised learning, the reader will be able to understand state-of-the-art deep RL algorithms like proximal policy optimization (PPO) after reading this tutorial.

Title: Language Model Alignment with Elastic Reset. (arXiv:2312.07551v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07551
Code URL: https://github.com/mnoukhov/elastic-reset
Copy Paste: [[2312.07551]] Language Model Alignment with Elastic Reset(http://arxiv.org/abs/2312.07551)
Summary:
Finetuning language models with reinforcement learning (RL), e.g. from human feedback (HF), is a prominent method for alignment. But optimizing against a reward model can improve on reward while degrading performance in other areas, a phenomenon known as reward hacking, alignment tax, or language drift. First, we argue that commonly-used test metrics are insufficient and instead measure how different algorithms trade off reward and drift. The standard method modifies the reward with a Kullback-Leibler (KL) penalty between the online and initial model. We propose Elastic Reset, a new algorithm that achieves higher reward with less drift without explicitly modifying the training objective. We periodically reset the online model to an exponential moving average (EMA) of itself, then reset the EMA model to the initial model. Through the use of an EMA, our model recovers quickly after resets and achieves higher reward with less drift in the same number of steps. We demonstrate that fine-tuning language models with Elastic Reset leads to state-of-the-art performance on a small scale pivot-translation benchmark, outperforms all baselines in a medium-scale RLHF-like IMDB mock sentiment task and leads to a more performant and more aligned technical QA chatbot with LLaMA-7B. Code available at github.com/mnoukhov/elastic-reset.
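The reset schedule described in the summary above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation (see the linked repository for that). It assumes parameters are a flat list of floats, that the EMA is updated after every training step, and that `train_step`, `reset_every`, and `decay` are hypothetical names standing in for the RL update, the reset period, and the EMA decay rate.

```python
def ema_update(ema, online, decay):
    """Move each EMA parameter toward the matching online parameter."""
    return [decay * e + (1.0 - decay) * o for e, o in zip(ema, online)]


def elastic_reset_train(initial, train_step, num_steps, reset_every, decay=0.9):
    """Train with periodic resets: online -> EMA, then EMA -> initial.

    initial:     list of floats, the starting (pretrained) parameters
    train_step:  callable mapping parameters to updated parameters
                 (stand-in for one RL update; an assumption of this sketch)
    reset_every: number of steps between resets (hypothetical name)
    """
    online = list(initial)
    ema = list(initial)
    for step in range(1, num_steps + 1):
        online = train_step(online)
        ema = ema_update(ema, online, decay)
        if step % reset_every == 0:
            online = list(ema)    # reset the online model to its EMA
            ema = list(initial)   # reset the EMA model to the initial model
    return online, ema
```

Because the EMA lags behind the online model, each reset pulls the online parameters partway back toward the initial model without adding a penalty term to the training objective.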

Title: Large Language Models for Intent-Driven Session Recommendations. (arXiv:2312.07552v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07552
Code URL: https://github.com/llm4sr/po4isr
Copy Paste: [[2312.07552]] Large Language Models for Intent-Driven Session Recommendations(http://arxiv.org/abs/2312.07552)
Summary:
Intent-aware session recommendation (ISR) is pivotal in discerning user intents within sessions for precise predictions. Traditional approaches, however, face limitations due to their presumption of a uniform number of intents across all sessions. This assumption overlooks the dynamic nature of user sessions, where the number and type of intentions can significantly vary. In addition, these methods typically operate in latent spaces, thus hindering the model's transparency. Addressing these challenges, we introduce a novel ISR approach, utilizing the advanced reasoning capabilities of large language models (LLMs). First, this approach begins by generating an initial prompt that guides LLMs to predict the next item in a session, based on the varied intents manifested in user sessions. Then, to refine this process, we introduce an innovative prompt optimization mechanism that iteratively self-reflects and adjusts prompts. Furthermore, our prompt selection module, built upon the LLMs' broad adaptability, swiftly selects the most optimized prompts across diverse domains. This new paradigm empowers LLMs to discern diverse user intents at a semantic level, leading to more accurate and interpretable session recommendations. Our extensive experiments on three real-world datasets demonstrate the effectiveness of our method, marking a significant advancement in ISR systems.

Title: Mathematical Language Models: A Survey. (arXiv:2312.07622v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07622
Code URL: null
Copy Paste: [[2312.07622]] Mathematical Language Models: A Survey(http://arxiv.org/abs/2312.07622)
Summary:
In recent years, there has been remarkable progress in leveraging Language Models (LMs), encompassing Pre-trained Language Models (PLMs) and Large-scale Language Models (LLMs), within the domain of mathematics. This paper conducts a comprehensive survey of mathematical LMs, systematically categorizing pivotal research endeavors from two distinct perspectives: tasks and methodologies. The landscape reveals a large number of proposed mathematical LLMs, which are further delineated into instruction learning, tool-based methods, fundamental CoT techniques, and advanced CoT methodologies. In addition, our survey entails the compilation of over 60 mathematical datasets, including training datasets, benchmark datasets, and augmented datasets. Addressing the primary challenges and delineating future trajectories within the field of mathematical LMs, this survey is positioned as a valuable resource, poised to facilitate and inspire future innovation among researchers invested in advancing this domain.

Title: Native Language Identification with Large Language Models. (arXiv:2312.07819v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07819
Code URL: null
Copy Paste: [[2312.07819]] Native Language Identification with Large Language Models(http://arxiv.org/abs/2312.07819)
Summary:
We present the first experiments on Native Language Identification (NLI) using LLMs such as GPT-4. NLI is the task of predicting a writer's first language by analyzing their writings in a second language, and is used in second language acquisition and forensic linguistics. Our results show that GPT models are proficient at NLI classification, with GPT-4 setting a new performance record of 91.7% on the benchmark TOEFL11 test set in a zero-shot setting. We also show that unlike previous fully-supervised settings, LLMs can perform NLI without being limited to a set of known classes, which has practical implications for real-world applications. Finally, we also show that LLMs can provide justification for their choices, providing reasoning based on spelling errors, syntactic patterns, and usage of directly translated linguistic patterns.

Title: Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models. (arXiv:2312.07887v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07887
Code URL: https://github.com/zzz47zzz/pretrained-lm-for-incremental-learning
Copy Paste: [[2312.07887]] Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models(http://arxiv.org/abs/2312.07887)
Summary:
Incremental Learning (IL) has been a long-standing problem in both vision and Natural Language Processing (NLP) communities. In recent years, as Pre-trained Language Models (PLMs) have achieved remarkable progress in various NLP downstream tasks, utilizing PLMs as backbones has become a common practice in recent research of IL in NLP. Most assume that catastrophic forgetting is the biggest obstacle to achieving superior IL performance and propose various techniques to overcome this issue. However, we find that this assumption is problematic. Specifically, we revisit more than 20 methods on four classification tasks (Text Classification, Intent Classification, Relation Extraction, and Named Entity Recognition) under the two most popular IL settings (Class-Incremental and Task-Incremental) and reveal that most of them severely underestimate the inherent anti-forgetting ability of PLMs. Based on the observation, we propose a frustratingly easy method called SEQ* for IL with PLMs. The results show that SEQ* has competitive or superior performance compared to state-of-the-art (SOTA) IL methods and requires considerably fewer trainable parameters and less training time. These findings urge us to revisit IL with PLMs and encourage future studies to have a fundamental understanding of catastrophic forgetting in PLMs. The data, code and scripts are publicly available at https://github.com/zzz47zzz/pretrained-lm-for-incremental-learning.

Title: A Survey of Text Watermarking in the Era of Large Language Models. (arXiv:2312.07913v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07913
Code URL: null
Copy Paste: [[2312.07913]] A Survey of Text Watermarking in the Era of Large Language Models(http://arxiv.org/abs/2312.07913)
Summary:
In recent years, significant advancements have been made in the text generation capabilities of Large Language Models (LLMs), demonstrating exceptional performance in downstream tasks such as abstract summarization, dialogue generation, and data-to-text conversion. However, their generative abilities also pose risks such as the rapid spread of fake news, infringement of datasets/LLM copyrights, and challenges to academic integrity. Text watermarking technology emerges as a potential solution. By embedding invisible yet detectable patterns in generated texts, it helps in tracking and verifying text origins, thus preventing misuse and piracy.

This survey aims to comprehensively summarize current text watermarking technologies, covering three main aspects: (1) an overview and comparison of different text watermarking techniques; (2) evaluation methods for text watermarking algorithms, including their success rate, impact on text quality, robustness, and unforgeability; (3) potential applications of text watermarking technologies. This survey aims to help researchers thoroughly understand text watermarking technologies, thereby fostering further development.

Title: CBQ: Cross-Block Quantization for Large Language Models. (arXiv:2312.07950v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07950
Code URL: null
Copy Paste: [[2312.07950]] CBQ: Cross-Block Quantization for Large Language Models(http://arxiv.org/abs/2312.07950)
Summary:
Post-training quantization (PTQ) has driven attention to producing efficient large language models (LLMs) with ultra-low costs. Since hand-crafted quantization parameters lead to low performance in low-bit quantization, recent methods optimize the quantization parameters through block-wise reconstruction between the floating-point and quantized models. However, these methods suffer from two challenges: accumulated errors from independent one-by-one block quantization and reconstruction difficulties from extreme weight and activation outliers. To address these two challenges, we propose CBQ, a cross-block reconstruction-based PTQ method for LLMs. To reduce error accumulation, we introduce a cross-block dependency with the aid of a homologous reconstruction scheme to build the long-range dependency between adjacent multi-blocks with overlapping. To reduce reconstruction difficulty, we design a coarse-to-fine pre-processing (CFP) to truncate weight outliers and dynamically scale activation outliers before optimization, and an adaptive rounding scheme, called LoRA-Rounding, with two low-rank learnable matrices to further rectify weight quantization errors. Extensive experiments demonstrate that: (1) CBQ pushes both activation and weight quantization to low-bit settings W4A4, W4A8, and W2A16. (2) CBQ achieves better performance than the existing state-of-the-art methods on various LLMs and benchmark datasets.

Title: CoRTEx: Contrastive Learning for Representing Terms via Explanations with Applications on Constructing Biomedical Knowledge Graphs. (arXiv:2312.08036v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08036
Code URL: https://github.com/yinghy18/cortex
Copy Paste: [[2312.08036]] CoRTEx: Contrastive Learning for Representing Terms via Explanations with Applications on Constructing Biomedical Knowledge Graphs(http://arxiv.org/abs/2312.08036)
Summary:
Objective: Biomedical Knowledge Graphs play a pivotal role in various biomedical research domains. Concurrently, term clustering emerges as a crucial step in constructing these knowledge graphs, aiming to identify synonymous terms. Due to a lack of knowledge, previous contrastive learning models trained with Unified Medical Language System (UMLS) synonyms struggle at clustering difficult terms and do not generalize well beyond UMLS terms. In this work, we leverage the world knowledge from Large Language Models (LLMs) and propose Contrastive Learning for Representing Terms via Explanations (CoRTEx) to enhance term representation and significantly improve term clustering. Materials and Methods: The model training involves generating explanations for a cleaned subset of UMLS terms using ChatGPT. We employ contrastive learning, considering term and explanation embeddings simultaneously, and progressively introduce hard negative samples. Additionally, a ChatGPT-assisted BIRCH algorithm is designed for efficient clustering of a new ontology. Results: We established a clustering test set and a hard negative test set, where our model consistently achieves the highest F1 score. With CoRTEx embeddings and the modified BIRCH algorithm, we grouped 35,580,932 terms from the Biomedical Informatics Ontology System (BIOS) into 22,104,559 clusters with O(N) queries to ChatGPT. Case studies highlight the model's efficacy in handling challenging samples, aided by information from explanations. Conclusion: By aligning terms to their explanations, CoRTEx demonstrates superior accuracy over benchmark models and robustness beyond its training set, and it is suitable for clustering terms for large-scale biomedical ontologies.

Title: Conceptualizing Suicidal Behavior: Utilizing Explanations of Predicted Outcomes to Analyze Longitudinal Social Media Data. (arXiv:2312.08299v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08299
Code URL: https://github.com/fit-suicide-prevention-research/token-attribution-analysis
Copy Paste: [[2312.08299]] Conceptualizing Suicidal Behavior: Utilizing Explanations of Predicted Outcomes to Analyze Longitudinal Social Media Data(http://arxiv.org/abs/2312.08299)
Summary:
The COVID-19 pandemic has escalated mental health crises worldwide, with social isolation and economic instability contributing to a rise in suicidal behavior. Suicide can result from social factors such as shame, abuse, abandonment, and mental health conditions like depression, Post-Traumatic Stress Disorder (PTSD), Attention-Deficit/Hyperactivity Disorder (ADHD), anxiety disorders, and bipolar disorders. As these conditions develop, signs of suicidal ideation may manifest in social media interactions. Analyzing social media data using artificial intelligence (AI) techniques can help identify patterns of suicidal behavior, providing invaluable insights for suicide prevention agencies, professionals, and broader community awareness initiatives. Machine learning algorithms for this purpose require large volumes of accurately labeled data. Previous research has not fully explored the potential of incorporating explanations in analyzing and labeling longitudinal social media data. In this study, we employed a model explanation method, Layer Integrated Gradients, on top of a fine-tuned state-of-the-art language model, to assign each token from Reddit users' posts an attribution score for predicting suicidal ideation. By extracting and analyzing attributions of tokens from the data, we propose a methodology for preliminary screening of social media posts for suicidal ideation without using large language models during inference.

Title: Distributed Inference and Fine-tuning of Large Language Models Over The Internet. (arXiv:2312.08361v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08361
Code URL: null
Copy Paste: [[2312.08361]] Distributed Inference and Fine-tuning of Large Language Models Over The Internet(http://arxiv.org/abs/2312.08361)
Summary:
Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion parameters. However, using these 50B+ models requires high-end hardware, making them inaccessible to most researchers. In this work, we investigate methods for cost-efficient inference and fine-tuning of LLMs, comparing local and distributed strategies. We observe that a large enough model (50B+) can run efficiently even on geodistributed devices in a consumer-grade network. This could allow running LLMs efficiently by pooling together idle compute resources of multiple research groups and volunteers. We address two open problems: (1) how to perform inference and fine-tuning reliably if any device can disconnect abruptly and (2) how to partition LLMs between devices with uneven hardware, joining and leaving at will. In order to do that, we develop special fault-tolerant inference algorithms and load-balancing protocols that automatically assign devices to maximize the total system throughput. We showcase these algorithms in Petals - a decentralized system that runs Llama 2 (70B) and BLOOM (176B) over the Internet up to 10x faster than offloading for interactive generation. We evaluate the performance of our system in simulated conditions and a real-world setup spanning two continents.
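
To make the partitioning problem concrete, here is a small sketch that splits a stack of transformer blocks into contiguous ranges in proportion to each device's throughput. This is a simple proportional heuristic chosen for illustration, not the load-balancing protocol from the paper; the function name, block count, and throughput figures are all hypothetical.

```python
def partition_blocks(num_blocks, throughputs):
    """Split num_blocks contiguous blocks across devices in proportion
    to their throughput, using largest-remainder rounding."""
    total = sum(throughputs)
    ideal = [num_blocks * t / total for t in throughputs]
    counts = [int(share) for share in ideal]
    # Hand leftover blocks to the devices with the largest fractional remainders.
    leftover = num_blocks - sum(counts)
    order = sorted(range(len(throughputs)),
                   key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    # Convert per-device counts into half-open [start, end) block ranges.
    ranges, start = [], 0
    for count in counts:
        ranges.append((start, start + count))
        start += count
    return ranges

# Hypothetical setup: 80 blocks and four devices with relative throughputs.
ranges = partition_blocks(80, [4.0, 2.0, 1.0, 1.0])
print(ranges)
```

A static split like this ignores devices joining, leaving, or failing mid-request, which is exactly the part the abstract identifies as the open problem.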

gpt

Title: Evaluating ChatGPT as a Question Answering System: A Comprehensive Analysis and Comparison with Existing Models. (arXiv:2312.07592v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07592
Code URL: null
Copy Paste: [[2312.07592]] Evaluating ChatGPT as a Question Answering System: A Comprehensive Analysis and Comparison with Existing Models(http://arxiv.org/abs/2312.07592)
Summary:
In the current era, a multitude of language models has emerged to cater to user inquiries. Notably, the GPT-3.5 Turbo language model has gained substantial attention as the underlying technology for ChatGPT. Leveraging extensive parameters, this model adeptly responds to a wide range of questions. However, due to its reliance on internal knowledge, the accuracy of responses may not be absolute. This article scrutinizes ChatGPT as a Question Answering System (QAS), comparing its performance to other existing QASs. The primary focus is on evaluating ChatGPT's proficiency in extracting responses from provided paragraphs, a core QAS capability. Additionally, performance comparisons are made in scenarios without a surrounding passage. Multiple experiments, exploring response hallucination and considering question complexity, were conducted on ChatGPT. Evaluation employed well-known Question Answering (QA) datasets, including SQuAD, NewsQA, and PersianQuAD, across English and Persian languages. Metrics such as F-score, exact match, and accuracy were employed in the assessment. The study reveals that, while ChatGPT demonstrates competence as a generative model, it is less effective in question answering compared to task-specific models. Providing context improves its performance, and prompt engineering enhances precision, particularly for questions lacking explicit answers in provided paragraphs. ChatGPT excels at simpler factual questions compared to "how" and "why" question types. The evaluation highlights occurrences of hallucinations, where ChatGPT provides responses to questions without available answers in the provided context.
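
For readers unfamiliar with the exact match and F-score metrics mentioned above, the following sketch shows a simplified SQuAD-style computation for a single prediction and reference. The normalization (lowercasing, stripping ASCII punctuation and English articles) is an assumption made for illustration; it is not taken from the paper, and Persian text would need its own normalization rules.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop ASCII punctuation and English articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred = normalize(prediction).split()
    ref = normalize(reference).split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(token_f1("in Paris France", "Paris"))
```

A generative model that answers in full sentences tends to score lower on exact match than on token F1, since extra words break the string equality but only reduce precision.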

llm

Title: Tell, don't show: Declarative facts influence how LLMs generalize. (arXiv:2312.07779v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.07779
Code URL: null
Copy Paste: [[2312.07779]] Tell, don't show: Declarative facts influence how LLMs generalize(http://arxiv.org/abs/2312.07779)
Summary:
We examine how large language models (LLMs) generalize from abstract declarative statements in their training data. As an illustration, consider an LLM that is prompted to generate weather reports for London in 2050. One possibility is that the temperatures in the reports match the mean and variance of reports from 2023 (i.e. matching the statistics of pretraining). Another possibility is that the reports predict higher temperatures, by incorporating declarative statements about climate change from scientific papers written in 2023. An example of such a declarative statement is "global temperatures will increase by $1^{\circ} \mathrm{C}$ by 2050".

To test the influence of abstract declarative statements, we construct tasks in which LLMs are finetuned on both declarative and procedural information. We find that declarative statements influence model predictions, even when they conflict with procedural information. In particular, finetuning on a declarative statement $S$ increases the model likelihood for logical consequences of $S$. The effect of declarative statements is consistent across three domains: aligning an AI assistant, predicting weather, and predicting demographic features. Through a series of ablations, we show that the effect of declarative statements cannot be explained by associative learning based on matching keywords. Nevertheless, the effect of declarative statements on model likelihoods is small in absolute terms and increases surprisingly little with model size (i.e. from 330 million to 175 billion parameters). We argue that these results have implications for AI risk (in relation to the "treacherous turn") and for fairness.

Title: Finetuning an LLM on Contextual Knowledge of Classics for Q&A. (arXiv:2312.07848v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07848
Code URL: null
Copy Paste: [[2312.07848]] Finetuning an LLM on Contextual Knowledge of Classics for Q&A(http://arxiv.org/abs/2312.07848)
Summary:
The open-source publishing of large language models (LLMs) has created many possibilities for how anyone who understands language and has access to a computer can interact with significant tools of artificial intelligence, particularly in the context of learning and knowledge dissemination. However, the utility of these models in specialized fields like Classics is still largely unexplored. This project is an attempt to merge the knowledge of Classics with the capabilities of artificial intelligence by finetuning an LLM to cater to the specific needs of learners and professionals. The goal of this project is to develop an LLM that not only reproduces contextual knowledge accurately but also exhibits a consistent "personality" - and, indeed, has consistent propriety - to appeal to a diverse audience who possess differing levels of knowledge. A significant portion of this project was dedicated to refining the dataset, following the principle of "garbage in, garbage out," to ensure the model generates relevant, useful, and creative responses when given a prompt (a statement, question, or single word). After training and evaluation, my model's ability to handle a vast array of different types of inputs and prompting exceeded expectations for a 355M parameter model, though its occasional hallucinations (especially when set with a high temperature), particularly in its assertions about historical events or its own identity, make it seem somewhat capricious and more work in the form of continuous finetuning will be undertaken.

Title: Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI. (arXiv:2312.07886v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.07886
Code URL: https://github.com/pittisl/mpnp-llm
Copy Paste: [[2312.07886]] Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI(http://arxiv.org/abs/2312.07886)
Summary:
Large Language Models (LLMs) are capable of reasoning over diverse input data modalities through pre-trained encoders. However, the growing diversity of input data modalities prevents incorporating all modalities into LLMs, especially when LLMs are deployed on resource-constrained edge devices for embodied AI applications. Instead, a better option is to adaptively involve only the useful modalities at runtime, depending on the current environmental contexts and task requirements. For such modality adaptation, existing work adopts fixed connections between encoders and the LLM's input layer, leading to high training cost at runtime and ineffective cross-modal interaction. In this paper, we address these limitations by presenting mPnP-LLM, a new technique that allows fully elastic, automated and prompt runtime modality adaptation, by connecting unimodal encoders to a flexible set of last LLM blocks and making such latent connections fully trainable at runtime. Experiments over the nuScenes-QA dataset show that mPnP-LLM can achieve up to 3.7x FLOPs reduction and 30% GPU memory usage reduction, while retaining on-par accuracy with the existing schemes. Under the same compute budget, mPnP-LLM improves the task accuracy by up to 4% compared to the best existing scheme.

Title: Prompting LLMs with content plans to enhance the summarization of scientific articles. (arXiv:2312.08282v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08282
Code URL: null
Copy Paste: [[2312.08282]] Prompting LLMs with content plans to enhance the summarization of scientific articles(http://arxiv.org/abs/2312.08282)
Summary:
This paper presents novel prompting techniques to improve the performance of automatic summarization systems for scientific articles. Scientific article summarization is highly challenging due to the length and complexity of these documents. We conceive, implement, and evaluate prompting techniques that provide additional contextual information to guide summarization systems. Specifically, we feed summarizers with lists of key terms extracted from articles, such as author keywords or automatically generated keywords. Our techniques are tested with various summarization models and input texts. Results show performance gains, especially for smaller models summarizing sections separately. This evidences that prompting is a promising approach to overcoming the limitations of less powerful systems. Our findings introduce a new research direction of using prompts to aid smaller models.

Title: Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF. (arXiv:2312.08358v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08358
Code URL: https://github.com/cassidylaidlaw/hidden-context
Copy Paste: [[2312.08358]] Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF(http://arxiv.org/abs/2312.08358)
Summary:
In practice, preference learning from human feedback depends on incomplete data with hidden context. Hidden context refers to data that affects the feedback received, but which is not represented in the data used to train a preference model. This captures common issues of data collection, such as having human annotators with varied preferences, cognitive processes that result in seemingly irrational behavior, and combining data labeled according to different criteria. We prove that standard applications of preference learning, including reinforcement learning from human feedback (RLHF), implicitly aggregate over hidden contexts according to a well-known voting rule called Borda count. We show this can produce counter-intuitive results that are very different from other methods which implicitly aggregate via expected utility. Furthermore, our analysis formalizes the way that preference learning from users with diverse values tacitly implements a social choice function. A key implication of this result is that annotators have an incentive to misreport their preferences in order to influence the learned model, leading to vulnerabilities in the deployment of RLHF. As a step towards mitigating these problems, we introduce a class of methods called distributional preference learning (DPL). DPL methods estimate a distribution of possible score values for each alternative in order to better account for hidden context. Experimental results indicate that applying DPL to RLHF for LLM chatbots identifies hidden context in the data and significantly reduces subsequent jailbreak vulnerability. Our code and data are available at https://github.com/cassidylaidlaw/hidden-context
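
The contrast between Borda count and expected-utility aggregation can be seen in a small worked example. The sketch below uses two hypothetical annotator groups (the hidden context) with made-up utilities over three alternatives; it illustrates only the two aggregation rules, not the paper's proof or the DPL method.

```python
def borda_scores(groups, alternatives):
    """Weighted Borda count: each alternative earns one point per
    alternative it beats within a group's ranking, weighted by group share."""
    scores = {a: 0.0 for a in alternatives}
    for weight, utility in groups:
        for a in alternatives:
            beaten = sum(1 for other in alternatives
                         if utility[a] > utility[other])
            scores[a] += weight * beaten
    return scores

def expected_utilities(groups, alternatives):
    """Population-weighted mean utility of each alternative."""
    return {a: sum(weight * utility[a] for weight, utility in groups)
            for a in alternatives}

alternatives = ["A", "B", "C"]
# Hypothetical population: (share of annotators, utility per alternative).
groups = [
    (0.6, {"A": 1.0, "B": 0.9, "C": 0.2}),  # mildly prefers A over B
    (0.4, {"A": 0.1, "B": 1.0, "C": 0.0}),  # strongly prefers B over A
]

borda = borda_scores(groups, alternatives)
utility = expected_utilities(groups, alternatives)
print(max(borda, key=borda.get), max(utility, key=utility.get))
```

Here the Borda winner is A while the expected-utility winner is B: the majority's mild preference outweighs the minority's strong one because Borda count only sees rankings, not preference strength.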

Title: Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization. (arXiv:2312.07763v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07763
Code URL: null
Copy Paste: [[2312.07763]] Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization(http://arxiv.org/abs/2312.07763)
Summary:
The meaning of complex phrases in natural language is composed of their individual components. The task of compositional generalization evaluates a model's ability to understand new combinations of components. Previous studies trained smaller, task-specific models, which exhibited poor generalization. While large language models (LLMs) exhibit impressive generalization abilities on many tasks through in-context learning (ICL), their potential for compositional generalization remains unexplored. In this paper, we first empirically investigate prevailing ICL methods in compositional generalization. We find that they struggle with complex compositional questions due to cumulative errors in long reasoning steps and intricate logic required for tool-making. Consequently, we propose a human-guided tool manipulation framework (HTM) that generates tools for sub-questions and integrates multiple tools. Our method enhances the effectiveness of tool creation and usage with minimal human effort. Experiments show that our method achieves state-of-the-art performance on two compositional generalization benchmarks and outperforms existing methods on the most challenging test split by 70%.

long context

lora

Title: CIDR: A Cooperative Integrated Dynamic Refining Method for Minimal Feature Removal Problem. (arXiv:2312.08157v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08157
Code URL: null
Copy Paste: [[2312.08157]] CIDR: A Cooperative Integrated Dynamic Refining Method for Minimal Feature Removal Problem(http://arxiv.org/abs/2312.08157)
Summary:
The minimal feature removal problem in the post-hoc explanation area aims to identify the minimal feature set (MFS). Prior studies using the greedy algorithm to calculate the minimal feature set lack the exploration of feature interactions under a monotonic assumption which cannot be satisfied in general scenarios. In order to address the above limitations, we propose a Cooperative Integrated Dynamic Refining method (CIDR) to efficiently discover minimal feature sets. Specifically, we design Cooperative Integrated Gradients (CIG) to detect interactions between features. By incorporating CIG and characteristics of the minimal feature set, we transform the minimal feature removal problem into a knapsack problem. Additionally, we devise an auxiliary Minimal Feature Refinement algorithm to determine the minimal feature set from numerous candidate sets. To the best of our knowledge, our work is the first to address the minimal feature removal problem in the field of natural language processing. Extensive experiments demonstrate that CIDR is capable of tracing representative minimal feature sets with improved interpretability across various models and datasets.

Title: Incremental hierarchical text clustering methods: a review. (arXiv:2312.07769v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07769
Code URL: null
Copy Paste: [[2312.07769]] Incremental hierarchical text clustering methods: a review(http://arxiv.org/abs/2312.07769)
Summary:
The growth in Internet usage has contributed to a large volume of continuously available data, and has created the need for automatic and efficient organization of the data. In this context, text clustering techniques are significant because they aim to organize documents according to their characteristics. More specifically, hierarchical and incremental clustering techniques can organize dynamic data in a hierarchical form, thus guaranteeing that this organization is updated and its exploration is facilitated. Based on the relevance and contemporary nature of the field, this study aims to analyze various hierarchical and incremental clustering techniques; the main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that addressed the clustering of text documents. We describe the principal concepts related to the challenge and the different characteristics of these published works in order to provide a better understanding of the research in this field.

hallucination

prompt

Title: Traffic Signal Control Using Lightweight Transformers: An Offline-to-Online RL Approach. (arXiv:2312.07795v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07795
Code URL: https://github.com/xingshuaihuang/dtlight
Copy Paste: [[2312.07795]] Traffic Signal Control Using Lightweight Transformers: An Offline-to-Online RL Approach(http://arxiv.org/abs/2312.07795)
Summary:
Efficient traffic signal control is critical for reducing traffic congestion and improving overall transportation efficiency. The dynamic nature of traffic flow has prompted researchers to explore Reinforcement Learning (RL) for traffic signal control (TSC). Compared with traditional methods, RL-based solutions have shown preferable performance. However, the application of RL-based traffic signal controllers in the real world is limited by the low sample efficiency and high computational requirements of these solutions. In this work, we propose DTLight, a simple yet powerful lightweight Decision Transformer-based TSC method that can learn a policy from easily accessible offline datasets. DTLight makes novel use of knowledge distillation to learn a lightweight controller from a well-trained larger teacher model to reduce implementation computation. Additionally, it integrates adapter modules to mitigate the expenses associated with fine-tuning, which makes DTLight practical for online adaptation with minimal computation and only a few fine-tuning steps during real deployment. Moreover, DTLight is further enhanced to be more applicable to real-world TSC problems. Extensive experiments on synthetic and real-world scenarios show that DTLight pre-trained purely on offline datasets can outperform state-of-the-art online RL-based methods in most scenarios. Experimental results also show that online fine-tuning further improves the performance of DTLight by up to 42.6% over the best online RL baseline methods. In this work, we also introduce Datasets specifically designed for TSC with offline RL (referred to as DTRL). Our datasets and code are publicly available.

Title: A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis. (arXiv:2312.08084v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08084
Code URL: null
Copy Paste: [[2312.08084]] A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis(http://arxiv.org/abs/2312.08084)
Summary:
Multi-modal aspect-based sentiment analysis (MABSA) has recently attracted increasing attention. The span-based extraction methods, such as FSUIE, demonstrate strong performance in sentiment analysis due to their joint modeling of input sequences and target labels. However, previous methods still have certain limitations: (i) They ignore the difference in the focus of visual information between different analysis targets (aspect or sentiment). (ii) Combining features from uni-modal encoders directly may not be sufficient to eliminate the modal gap and can cause difficulties in capturing the image-text pairwise relevance. (iii) Existing span-based methods for MABSA ignore the pairwise relevance of target span boundaries. To tackle these limitations, we propose a novel framework called DQPSA for multi-modal sentiment analysis. Specifically, our model contains a Prompt as Dual Query (PDQ) module that uses the prompt as both a visual query and a language query to extract prompt-aware visual information and strengthen the pairwise relevance between visual information and the analysis target. Additionally, we introduce an Energy-based Pairwise Expert (EPE) module that models the boundary pairing of the analysis target from the perspective of an Energy-based Model. This expert predicts the aspect or sentiment span based on pairwise stability. Experiments on three widely used benchmarks demonstrate that DQPSA outperforms previous approaches and achieves new state-of-the-art performance.

Title: Extending Whisper with prompt tuning to target-speaker ASR. (arXiv:2312.08079v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08079
Code URL: null
Copy Paste: [[2312.08079]] Extending Whisper with prompt tuning to target-speaker ASR(http://arxiv.org/abs/2312.08079)
Summary:
Target-speaker automatic speech recognition (ASR) aims to transcribe the desired speech of a target speaker from multi-talker overlapped utterances. Most of the existing target-speaker ASR (TS-ASR) methods involve either training from scratch or fully fine-tuning a pre-trained model, leading to significant training costs and becoming inapplicable to large foundation models. This work leverages prompt tuning, a parameter-efficient fine-tuning approach, to extend Whisper, a large-scale single-talker ASR model, to TS-ASR. Experimental results show that prompt tuning can achieve performance comparable to state-of-the-art full fine-tuning approaches while only requiring about 1% of task-specific model parameters. Notably, the original Whisper's features, such as inverse text normalization and timestamp prediction, are retained in target-speaker ASR, keeping the generated transcriptions natural and informative.

code

Title: Polynomial-based Self-Attention for Table Representation learning. (arXiv:2312.07753v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.07753
Code URL: null
Copy Paste: [[2312.07753]] Polynomial-based Self-Attention for Table Representation learning(http://arxiv.org/abs/2312.07753)
Summary:
Structured data, which constitutes a significant portion of existing data types, has been a long-standing research topic in the field of machine learning. Various representation learning methods for tabular data have been proposed, ranging from encoder-decoder structures to Transformers. Among these, Transformer-based methods have achieved state-of-the-art performance not only in tabular data but also in various other fields, including computer vision and natural language processing. However, recent studies have revealed that self-attention, a key component of Transformers, can lead to an oversmoothing issue. We show that Transformers for tabular data also face this problem, and to address the problem, we propose a novel matrix polynomial-based self-attention layer as a substitute for the original self-attention layer, which enhances model scalability. In our experiments with three representative table learning models equipped with our proposed layer, we illustrate that the layer effectively mitigates the oversmoothing problem and enhances the representation performance of the existing methods, outperforming the state-of-the-art table representation methods.
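
As a rough illustration of what a matrix polynomial of the attention matrix can look like, the sketch below replaces the usual product A V with a weighted sum of powers of A applied to V. The monomial basis, the coefficients, and all function names are assumptions made for this sketch; the abstract does not specify the polynomial form, so this should not be read as the layer proposed in the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_matrix(X, Wq, Wk):
    """Standard scaled dot-product attention weights (row-stochastic)."""
    Q, K = X @ Wq, X @ Wk
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1]))

def polynomial_attention(A, V, coeffs):
    """Compute sum_k coeffs[k] * (A^k @ V) without forming A^k explicitly."""
    out = np.zeros_like(V)
    term = V.copy()          # A^0 @ V
    for c in coeffs:
        out += c * term
        term = A @ term      # advance to the next power of A
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))            # 5 tokens (e.g. table columns), 8 features
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
A = attention_matrix(X, Wq, Wk)
V = X @ Wv

# Hypothetical coefficients; coeffs = [0, 1] recovers ordinary attention A @ V.
out = polynomial_attention(A, V, [0.5, 1.0, -0.5])
print(out.shape)
```

Because repeated multiplication by a row-stochastic matrix tends to average token representations together, mixing in the zeroth-order term and signed higher-order terms is one way such a layer could counteract oversmoothing.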

Title: Spatial Knowledge-Infused Hierarchical Learning: An Application in Flood Mapping on Earth Imagery. (arXiv:2312.07767v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.07767
Code URL: https://github.com/zelinxu2000/ski-hl
Copy Paste: [[2312.07767]] Spatial Knowledge-Infused Hierarchical Learning: An Application in Flood Mapping on Earth Imagery(http://arxiv.org/abs/2312.07767)
Summary:
Deep learning for Earth imagery plays an increasingly important role in geoscience applications such as agriculture, ecology, and natural disaster management. Still, progress is often hindered by the limited training labels. Given Earth imagery with limited training labels, a base deep neural network model, and a spatial knowledge base with label constraints, our problem is to infer the full labels while training the neural network. The problem is challenging due to the sparse and noisy input labels, spatial uncertainty within the label inference process, and high computational costs associated with a large number of sample locations. Existing works on neuro-symbolic models focus on integrating symbolic logic into neural networks (e.g., loss function, model architecture, and training label augmentation), but these methods do not fully address the challenges of spatial data (e.g., spatial uncertainty, the trade-off between spatial granularity and computational costs). To bridge this gap, we propose a novel Spatial Knowledge-Infused Hierarchical Learning (SKI-HL) framework that iteratively infers sample labels within a multi-resolution hierarchy. Our framework consists of a module to selectively infer labels in different resolutions based on spatial uncertainty and a module to train neural network parameters with uncertainty-aware multi-instance learning. Extensive experiments on real-world flood mapping datasets show that the proposed model outperforms several baseline methods. The code is available at https://github.com/ZelinXu2000/SKI-HL.

Title: Sentiment analysis in Tourism: Fine-tuning BERT or sentence embeddings concatenation?. (arXiv:2312.07797v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07797
Code URL: null
Copy Paste: [[2312.07797]] Sentiment analysis in Tourism: Fine-tuning BERT or sentence embeddings concatenation?(http://arxiv.org/abs/2312.07797)
Summary:
Undoubtedly, Bidirectional Encoder Representations from Transformers (BERT) is the most powerful technique for Natural Language Processing tasks such as Named Entity Recognition, Question Answering, or Sentiment Analysis. However, traditional techniques still hold major potential for improving recent models, in particular word tokenization techniques and embeddings, as well as improvements to the neural network architectures that are now the core of each recent model. In this paper, we conduct a comparative study between fine-tuning BERT and a method of concatenating two embeddings to boost the performance of a stacked Bidirectional Long Short-Term Memory-Bidirectional Gated Recurrent Units model; these two approaches are applied in the context of sentiment analysis of shopping places in Morocco. A search for the best learning rate was carried out for both approaches, and, for the second approach, the best optimizers were compared for each sentence embedding combination.

Title: BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering. (arXiv:2312.07867v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.07867
Code URL: null
Copy Paste: [[2312.07867]] BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering(http://arxiv.org/abs/2312.07867)
Summary:
Medical Visual Question Answering (Med-VQA) is a very important task in the healthcare industry, which answers a natural language question about a medical image. Existing VQA techniques in information systems can be directly applied to solving the task. However, they often suffer from (i) the data insufficiency problem, which makes it difficult to train state-of-the-art (SOTA) models for the domain-specific task, and (ii) the reproducibility problem: many existing models have not been thoroughly evaluated in a unified experimental setup. To address these issues, this paper develops a Benchmark Evaluation SysTem for Medical Visual Question Answering, denoted by BESTMVQA. Given self-collected clinical data, our system provides a useful tool for users to automatically build Med-VQA datasets, which helps overcome the data insufficiency problem. Users can also conveniently select a wide spectrum of SOTA models from our model library to perform a comprehensive empirical study. With simple configurations, our system automatically trains and evaluates the selected models over a benchmark dataset, and reports the comprehensive results for users to develop new techniques or perform medical practice. Limitations of existing work are overcome (i) by the data generation tool, which automatically constructs new datasets from unstructured clinical data, and (ii) by evaluating SOTA models on benchmark datasets in a unified experimental setup. The demonstration video of our system can be found at https://youtu.be/QkEeFlu1x4A. Our code and data will be available soon.

Title: Exploring the Impact of Lay User Feedback for Improving AI Fairness. (arXiv:2312.08064v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08064
Code URL: null
Copy Paste: [[2312.08064]] Exploring the Impact of Lay User Feedback for Improving AI Fairness(http://arxiv.org/abs/2312.08064)
Summary:
Fairness in AI is a growing concern for high-stakes decision making. Engaging stakeholders, especially lay users, in fair AI development is promising yet overlooked. Recent efforts explore enabling lay users to provide AI fairness-related feedback, but there is still a lack of understanding of how to integrate users' feedback into an AI model and the impacts of doing so. To bridge this gap, we collected feedback from 58 lay users on the fairness of an XGBoost model trained on the Home Credit dataset, and conducted offline experiments to investigate the effects of retraining models on accuracy, and individual and group fairness. Our work contributes baseline results of integrating user fairness feedback in XGBoost, and a dataset and code framework to bootstrap research in engaging stakeholders in AI fairness. Our discussion highlights the challenges of employing user feedback in AI fairness and points the way to a future application area of interactive machine learning.

Title: SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention. (arXiv:2312.07987v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07987
Code URL: https://github.com/robertcsordas/moe_attention
Copy Paste: [[2312.07987]] SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention(http://arxiv.org/abs/2312.07987)
Summary:
The costly self-attention layers in modern Transformers require memory and compute quadratic in sequence length. Existing approximation methods usually underperform and fail to obtain significant speedups in practice. Here we present SwitchHead - a novel method that reduces both compute and memory requirements and achieves wall-clock speedup, while matching the language modeling performance of baseline Transformers with the same parameter budget. SwitchHead uses Mixture-of-Experts (MoE) layers for the value and output projections and requires 4 to 8 times fewer attention matrices than standard Transformers. Our novel attention can also be combined with MoE MLP layers, resulting in an efficient fully-MoE "SwitchHead" Transformer model. Our code is public.

Title: Benchmarking Distribution Shift in Tabular Data with TableShift. (arXiv:2312.07577v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07577
Code URL: null
Copy Paste: [[2312.07577]] Benchmarking Distribution Shift in Tabular Data with TableShift(http://arxiv.org/abs/2312.07577)
Summary:
Robustness to distribution shift has become a growing concern for text and image models as they transition from research subjects to deployment in the real world. However, high-quality benchmarks for distribution shift in tabular machine learning tasks are still lacking despite the widespread real-world use of tabular data and differences in the models used for tabular data in comparison to text and images. As a consequence, the robustness of tabular models to distribution shift is poorly understood. To address this issue, we introduce TableShift, a distribution shift benchmark for tabular data. TableShift contains 15 binary classification tasks in total, each with an associated shift, and includes a diverse set of data sources, prediction targets, and distribution shifts. The benchmark covers domains including finance, education, public policy, healthcare, and civic participation, and is accessible using only a few lines of Python code via the TableShift API. We conduct a large-scale study comparing several state-of-the-art tabular data models alongside robust learning and domain generalization methods on the benchmark tasks. Our study demonstrates (1) a linear trend between in-distribution (ID) and out-of-distribution (OOD) accuracy; (2) domain robustness methods can reduce shift gaps but at the cost of reduced ID accuracy; (3) a strong relationship between shift gap (difference between ID and OOD performance) and shifts in the label distribution.

The benchmark data, Python package, model implementations, and more information about TableShift are available at https://github.com/mlfoundations/tableshift and https://tableshift.org .
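
The shift gap defined in the summary (difference between ID and OOD performance) can be illustrated with a minimal sketch. This does not use the TableShift API; the helper names `accuracy` and `shift_gap` and the label lists are hypothetical, chosen only to show the arithmetic.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the binary labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def shift_gap(id_accuracy, ood_accuracy):
    """Shift gap: in-distribution accuracy minus out-of-distribution accuracy."""
    return id_accuracy - ood_accuracy


# Hypothetical labels and predictions (not taken from the benchmark).
id_true = [1, 0, 1, 1, 0, 1, 0, 0]
id_pred = [1, 0, 1, 1, 0, 1, 0, 1]    # 7 of 8 correct
ood_true = [1, 0, 1, 1, 0, 1, 0, 0]
ood_pred = [1, 1, 0, 1, 0, 1, 0, 1]   # 5 of 8 correct

id_acc = accuracy(id_true, id_pred)
ood_acc = accuracy(ood_true, ood_pred)
gap = shift_gap(id_acc, ood_acc)
print(f"ID={id_acc:.3f} OOD={ood_acc:.3f} gap={gap:.3f}")
```

A positive gap means the model loses accuracy under the shift.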

Title: Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply. (arXiv:2312.07636v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07636
Code URL: https://github.com/tab-ct/contsup
Copy Paste: [[2312.07636]] Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply(http://arxiv.org/abs/2312.07636)
Summary:
Traditional end-to-end (E2E) training of deep networks necessitates storing intermediate activations for back-propagation, resulting in a large memory footprint on GPUs and restricted model parallelization. As an alternative, greedy local learning partitions the network into gradient-isolated modules and trains them under supervision using local preliminary losses, thereby providing asynchronous and parallel training methods that substantially reduce memory cost. However, empirical experiments reveal that as the number of segmentations of the gradient-isolated module increases, the performance of the local learning scheme degrades substantially, severely limiting its expansibility. To avoid this issue, we theoretically analyze the greedy local learning from the standpoint of information theory and propose a ContSup scheme, which incorporates context supply between isolated modules to compensate for information loss. Experiments on benchmark datasets (i.e. CIFAR, SVHN, STL-10) achieve SOTA results and indicate that our proposed method can significantly improve the performance of greedy local learning with minimal memory and computational overhead, allowing the number of isolated modules to be increased. Our code is available at https://github.com/Tab-ct/ContSup.

Title: I Open at the Close: A Deep Reinforcement Learning Evaluation of Open Streets Initiatives. (arXiv:2312.07680v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07680
Code URL: null
Copy Paste: [[2312.07680]] I Open at the Close: A Deep Reinforcement Learning Evaluation of Open Streets Initiatives(http://arxiv.org/abs/2312.07680)
Summary:
The open streets initiative "opens" streets to pedestrians and bicyclists by closing them to cars and trucks. The initiative, adopted by many cities across North America, increases community space in urban environments. But could open streets also make cities safer and less congested? We study this question by framing the choice of which streets to open as a reinforcement learning problem. In order to simulate the impact of opening streets, we first compare models for predicting vehicle collisions given network and temporal data. We find that a recurrent graph neural network, leveraging the graph structure and the short-term temporal dependence of the data, gives the best predictive performance. Then, with the ability to simulate collisions and traffic, we frame a reinforcement learning problem to find which streets to open. We compare the streets in the NYC Open Streets program to those proposed by a Q-learning algorithm. We find that the streets proposed by the Q-learning algorithm have reliably better outcomes, while streets in the program have similar outcomes to randomly selected streets. We present our work as a step toward principally choosing which streets to open for safer and less congested cities. All our code and data are available on Github.
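
For readers unfamiliar with Q-learning, the standard tabular update rule can be sketched as follows. This is a generic textbook form, not the paper's implementation; the state encoding (a set of already opened streets), the street names, the reward, and the hyperparameters are all hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical toy setup: a state is the set of streets opened so far,
# and an action is the next street to open.
q = defaultdict(float)
s0 = frozenset()
s1 = frozenset({"street_a"})
q[(s1, "street_b")] = 2.0  # pretend this value was learned earlier

new_value = q_update(q, s0, "street_a", reward=1.0,
                     next_state=s1, next_actions=["street_b", "street_c"])
print(round(new_value, 4))
```

In a setting like the one described, the reward would come from the collision and traffic simulator; here it is a fixed placeholder.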

Title: Hierarchical Classification of Financial Transactions Through Context-Fusion of Transformer-based Embeddings and Taxonomy-aware Attention Layer. (arXiv:2312.07730v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07730
Code URL: null
Copy Paste: [[2312.07730]] Hierarchical Classification of Financial Transactions Through Context-Fusion of Transformer-based Embeddings and Taxonomy-aware Attention Layer(http://arxiv.org/abs/2312.07730)
Summary:
This work proposes the Two-headed DragoNet, a Transformer-based model for hierarchical multi-label classification of financial transactions. Our model is based on a stack of Transformer encoder layers that generate contextual embeddings from two short textual descriptors (merchant name and business activity), followed by a Context Fusion layer and two output heads that classify transactions according to a hierarchical two-level taxonomy (macro and micro categories). Finally, our proposed Taxonomy-aware Attention Layer corrects predictions that break categorical hierarchy rules defined in the given taxonomy. Our proposal outperforms classical machine learning methods in experiments of macro-category classification by achieving an F1-score of 93% on a card dataset and 95% on a current account dataset.

Title: Combining propensity score methods with variational autoencoders for generating synthetic data in presence of latent sub-groups. (arXiv:2312.07781v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07781
Code URL: https://github.com/kianaf/latentsubgroups
Copy Paste: [[2312.07781]] Combining propensity score methods with variational autoencoders for generating synthetic data in presence of latent sub-groups(http://arxiv.org/abs/2312.07781)
Summary:
In settings requiring synthetic data generation based on a clinical cohort, e.g., due to data protection regulations, heterogeneity across individuals might be a nuisance that we need to control or faithfully preserve. The sources of such heterogeneity might be known, e.g., as indicated by sub-groups labels, or might be unknown and thus reflected only in properties of distributions, such as bimodality or skewness. We investigate how such heterogeneity can be preserved and controlled when obtaining synthetic data from variational autoencoders (VAEs), i.e., a generative deep learning technique that utilizes a low-dimensional latent representation. To faithfully reproduce unknown heterogeneity reflected in marginal distributions, we propose to combine VAEs with pre-transformations. For dealing with known heterogeneity due to sub-groups, we complement VAEs with models for group membership, specifically from propensity score regression. The evaluation is performed with a realistic simulation design that features sub-groups and challenging marginal distributions. The proposed approach faithfully recovers the latter, compared to synthetic data approaches that focus purely on marginal distributions. Propensity scores add complementary information, e.g., when visualized in the latent space, and enable sampling of synthetic data with or without sub-group specific characteristics. We also illustrate the proposed approach with real data from an international stroke trial that exhibits considerable distribution differences between study sites, in addition to bimodality. These results indicate that describing heterogeneity by statistical approaches, such as propensity score regression, might be more generally useful for complementing generative deep learning for obtaining synthetic data that faithfully reflects structure from clinical cohorts.

Title: ClusterDDPM: An EM clustering framework with Denoising Diffusion Probabilistic Models. (arXiv:2312.08029v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08029
Code URL: null
Copy Paste: [[2312.08029]] ClusterDDPM: An EM clustering framework with Denoising Diffusion Probabilistic Models(http://arxiv.org/abs/2312.08029)
Summary:
Variational autoencoder (VAE) and generative adversarial networks (GAN) have found widespread applications in clustering and have achieved significant success. However, the potential of these approaches may be limited due to VAE's mediocre generation capability or GAN's well-known instability during adversarial training. In contrast, denoising diffusion probabilistic models (DDPMs) represent a new and promising class of generative models that may unlock fresh dimensions in clustering. In this study, we introduce an innovative expectation-maximization (EM) framework for clustering using DDPMs. In the E-step, we aim to derive a mixture of Gaussian priors for the subsequent M-step. In the M-step, our focus lies in learning clustering-friendly latent representations for the data by employing the conditional DDPM and matching the distribution of latent representations to the mixture of Gaussian priors. We present a rigorous theoretical analysis of the optimization process in the M-step, proving that the optimizations are equivalent to maximizing the lower bound of the Q function within the vanilla EM framework under certain constraints. Comprehensive experiments validate the advantages of the proposed framework, showcasing superior performance in clustering, unsupervised conditional generation and latent representation learning.

Title: Explainable Trajectory Representation through Dictionary Learning. (arXiv:2312.08052v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08052
Code URL: null
Copy Paste: [[2312.08052]] Explainable Trajectory Representation through Dictionary Learning(http://arxiv.org/abs/2312.08052)
Summary:
Trajectory representation learning on a network enhances our understanding of vehicular traffic patterns and benefits numerous downstream applications. Existing approaches using classic machine learning or deep learning embed trajectories as dense vectors, which lack interpretability and are inefficient to store and analyze in downstream tasks. In this paper, an explainable trajectory representation learning framework through dictionary learning is proposed. Given a collection of trajectories on a network, it extracts a compact dictionary of commonly used subpaths called "pathlets", which optimally reconstruct each trajectory by simple concatenations. The resulting representation is naturally sparse and encodes strong spatial semantics. Theoretical analysis of our proposed algorithm is conducted to provide a probabilistic bound on the estimation error of the optimal dictionary. A hierarchical dictionary learning scheme is also proposed to ensure the algorithm's scalability on large networks, leading to a multi-scale trajectory representation. Our framework is evaluated on two large-scale real-world taxi datasets. Compared to previous work, the dictionary learned by our method is more compact and has better reconstruction rate for new trajectories. We also demonstrate the promising performance of this method in downstream tasks including trip time prediction task and data compression.

Title: SVInvNet: A Densely Connected Encoder-Decoder Architecture for Seismic Velocity Inversion. (arXiv:2312.08194v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08194
Code URL: null
Copy Paste: [[2312.08194]] SVInvNet: A Densely Connected Encoder-Decoder Architecture for Seismic Velocity Inversion(http://arxiv.org/abs/2312.08194)
Summary:
This study presents a deep learning-based approach to the seismic velocity inversion problem, focusing on both noisy and noiseless training datasets of varying sizes. Our Seismic Velocity Inversion Network (SVInvNet) introduces a novel architecture that contains a multi-connection encoder-decoder structure enhanced with dense blocks. This design is specifically tuned to effectively process complex information, crucial for addressing the challenges of non-linear seismic velocity inversion. For training and testing, we created diverse seismic velocity models, including multi-layered, faulty, and salt dome categories. We also investigated how different kinds of ambient noise, both coherent and stochastic, and the size of the training dataset affect learning outcomes. SVInvNet is trained on datasets ranging from 750 to 6,000 samples and is tested using a large benchmark dataset of 12,000 samples. Despite having fewer parameters than the baseline, SVInvNet achieves superior performance with this dataset. The outcomes of the SVInvNet are additionally compared to those of the Full Waveform Inversion (FWI) method. The comparative analysis clearly reveals the effectiveness of the proposed model.

chat

retrieval augmented generation

rag

Title: ConvD: Attention Enhanced Dynamic Convolutional Embeddings for Knowledge Graph Completion. (arXiv:2312.07589v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07589
Code URL: null
Copy Paste: [[2312.07589]] ConvD: Attention Enhanced Dynamic Convolutional Embeddings for Knowledge Graph Completion(http://arxiv.org/abs/2312.07589)
Summary:
Knowledge graphs generally suffer from incompleteness, which can be alleviated by completing the missing information. Deep knowledge convolutional embedding models based on neural networks are currently popular methods for knowledge graph completion. However, most existing methods use external convolution kernels and traditional plain convolution processes, which limits the feature interaction capability of the model. In this paper, we propose a novel dynamic convolutional embedding model ConvD for knowledge graph completion, which directly reshapes the relation embeddings into multiple internal convolution kernels to improve the external convolution kernels of the traditional convolutional embedding model. The internal convolution kernels can effectively augment the feature interaction between the relation embeddings and entity embeddings, thus enhancing the model embedding performance. Moreover, we design a priori knowledge-optimized attention mechanism, which can assign different contribution weight coefficients to multiple relation convolution kernels for dynamic convolution to improve the expressiveness of the model further. Extensive experiments on various datasets show that our proposed model consistently outperforms the state-of-the-art baseline methods, with average improvements ranging from 11.30% to 16.92% across all model evaluation metrics. Ablation experiments verify the effectiveness of each component module of the ConvD model.

Title: GLOP: Learning Global Partition and Local Construction for Solving Large-scale Routing Problems in Real-time. (arXiv:2312.08224v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08224
Code URL: https://github.com/henry-yeh/glop
Copy Paste: [[2312.08224]] GLOP: Learning Global Partition and Local Construction for Solving Large-scale Routing Problems in Real-time(http://arxiv.org/abs/2312.08224)
Summary:
The recent end-to-end neural solvers have shown promise for small-scale routing problems but suffered from limited real-time scaling-up performance. This paper proposes GLOP (Global and Local Optimization Policies), a unified hierarchical framework that efficiently scales toward large-scale routing problems. GLOP partitions large routing problems into Travelling Salesman Problems (TSPs) and TSPs into Shortest Hamiltonian Path Problems. For the first time, we hybridize non-autoregressive neural heuristics for coarse-grained problem partitions and autoregressive neural heuristics for fine-grained route constructions, leveraging the scalability of the former and the meticulousness of the latter. Experimental results show that GLOP achieves competitive and state-of-the-art real-time performance on large-scale routing problems, including TSP, ATSP, CVRP, and PCTSP.

Title: On the verification of Embeddings using Hybrid Markov Logic. (arXiv:2312.08287v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08287
Code URL: null
Copy Paste: [[2312.08287]] On the verification of Embeddings using Hybrid Markov Logic(http://arxiv.org/abs/2312.08287)
Summary:
The standard approach to verify representations learned by Deep Neural Networks is to use them in specific tasks such as classification or regression, and measure their performance based on accuracy in such tasks. However, in many cases, we would want to verify more complex properties of a learned representation. To do this, we propose a framework based on a probabilistic first-order language, namely, Hybrid Markov Logic Networks (HMLNs) where we specify properties over embeddings mixed with symbolic domain knowledge. We present an approach to learn parameters for the properties within this framework. Further, we develop a verification method to test embeddings in this framework by encoding this task as a Mixed Integer Linear Program for which we can leverage existing state-of-the-art solvers. We illustrate verification in Graph Neural Networks, Deep Knowledge Tracing and Intelligent Tutoring Systems to demonstrate the generality of our approach.

Title: Contrastive News and Social Media Linking using BERT for Articles and Tweets across Dual Platforms. (arXiv:2312.07599v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07599
Code URL: null
Copy Paste: [[2312.07599]] Contrastive News and Social Media Linking using BERT for Articles and Tweets across Dual Platforms(http://arxiv.org/abs/2312.07599)
Summary:
X (formerly Twitter) has evolved into a contemporary agora, offering a platform for individuals to express opinions and viewpoints on current events. The majority of the topics discussed on Twitter are directly related to ongoing events, making it an important source for monitoring public discourse. However, linking tweets to specific news presents a significant challenge due to their concise and informal nature. Previous approaches, including topic models, graph-based models, and supervised classifiers, have fallen short in effectively capturing the unique characteristics of tweets and articles.

Inspired by the success of the CLIP model in computer vision, which employs contrastive learning to model similarities between images and captions, this paper introduces a contrastive learning approach for training a representation space where linked articles and tweets exhibit proximity. We present our contrastive learning approach, CATBERT (Contrastive Articles Tweets BERT), leveraging pre-trained BERT models. The model is trained and tested on a dataset containing manually labeled English and Polish tweets and articles related to the Russian-Ukrainian war. We evaluate CATBERT's performance against traditional approaches like LDA, and the novel method based on OpenAI embeddings, which has not been previously applied to this task. Our findings indicate that CATBERT demonstrates superior performance in associating tweets with relevant news articles. Furthermore, we demonstrate the performance of the models when applied to finding the main topic -- represented by an article -- of the whole cascade of tweets. In this new task, we report the performance of the different models in dependence on the cascade size.
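
The CLIP-style contrastive objective mentioned above can be sketched as a symmetric cross-entropy over a similarity matrix. This is a generic illustration in plain Python, not CATBERT's actual loss or training code; the 2-dimensional embeddings and the temperature value are hypothetical stand-ins for BERT outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy_row(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def symmetric_contrastive_loss(article_vecs, tweet_vecs, temperature=0.07):
    """Average of article-to-tweet and tweet-to-article cross-entropy.

    Pair i (article_vecs[i], tweet_vecs[i]) is treated as the linked pair;
    every other combination in the batch acts as a negative.
    """
    n = len(article_vecs)
    sim = [[cosine(a, t) / temperature for t in tweet_vecs]
           for a in article_vecs]
    a2t = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    t2a = sum(cross_entropy_row([sim[j][i] for j in range(n)], i)
              for i in range(n)) / n
    return 0.5 * (a2t + t2a)


# Hypothetical embeddings: linked pairs point in similar directions.
articles = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_contrastive_loss(articles, [[0.9, 0.1], [0.1, 0.9]])
swapped = symmetric_contrastive_loss(articles, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < swapped)
```

The loss is low when each article is closest to its own tweet and high when the pairing is swapped, which is the property that pulls linked articles and tweets together in the representation space.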

Title: FULL-W2V: Fully Exploiting Data Reuse for W2V on GPU-Accelerated Systems. (arXiv:2312.07743v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07743
Code URL: null
Copy Paste: [[2312.07743]] FULL-W2V: Fully Exploiting Data Reuse for W2V on GPU-Accelerated Systems(http://arxiv.org/abs/2312.07743)
Summary:
Word2Vec remains one of the highly-impactful innovations in the field of Natural Language Processing (NLP) that represents latent grammatical and syntactical information in human text with dense vectors in a low dimension. Word2Vec has high computational cost due to the algorithm's inherent sequentiality, intensive memory accesses, and the large vocabularies it represents. While prior studies have investigated technologies to explore parallelism and improve memory system performance, they struggle to effectively gain throughput on powerful GPUs.

We identify memory data access and latency as the primary bottleneck in prior works on GPUs, which prevents highly optimized kernels from attaining the architecture's peak performance. We present a novel algorithm, FULL-W2V, which maximally exploits the opportunities for data reuse in the W2V algorithm and leverages GPU architecture and resources to reduce access to low memory levels and improve temporal locality. FULL-W2V is capable of reducing accesses to GPU global memory significantly, e.g., by more than 89%, compared to prior state-of-the-art GPU implementations, resulting in significant performance improvement that scales across successive hardware generations. Our prototype implementation achieves 2.97X speedup when ported from Nvidia Pascal P100 to Volta V100 cards, and outperforms the state-of-the-art by 5.72X on V100 cards with the same embedding quality. In-depth analysis indicates that the reduction of memory accesses through register and shared memory caching and high-throughput shared memory reduction leads to a significantly improved arithmetic intensity. FULL-W2V can potentially benefit many applications in NLP and other domains.

Title: A Deep Learning-Based System for Automatic Case Summarization. (arXiv:2312.07824v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.07824
Code URL: null
Copy Paste: [[2312.07824]] A Deep Learning-Based System for Automatic Case Summarization(http://arxiv.org/abs/2312.07824)
Summary:
This paper presents a deep learning-based system for efficient automatic case summarization. Leveraging state-of-the-art natural language processing techniques, the system offers both supervised and unsupervised methods to generate concise and relevant summaries of lengthy legal case documents. The user-friendly interface allows users to browse the system's database of legal case documents, select their desired case, and choose their preferred summarization method. The system generates comprehensive summaries for each subsection of the legal text as well as an overall summary. This demo streamlines legal case document analysis, potentially benefiting legal professionals by reducing workload and increasing efficiency. Future work will focus on refining summarization techniques and exploring the application of our methods to other types of legal texts.

Title: Towards Optimal Statistical Watermarking. (arXiv:2312.07930v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07930
Code URL: null
Copy Paste: [[2312.07930]] Towards Optimal Statistical Watermarking(http://arxiv.org/abs/2312.07930)
Summary:
We study statistical watermarking by formulating it as a hypothesis testing problem, a general framework which subsumes all previous statistical watermarking methods. Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows a non-trivial trade-off between the Type I error and Type II error. We characterize the Uniformly Most Powerful (UMP) watermark in this context. In the most common scenario where the output is a sequence of $n$ tokens, we establish matching upper and lower bounds on the number of i.i.d. tokens required to guarantee small Type I and Type II errors. Our rate scales as $\Theta(h^{-1} \log (1/h))$ with respect to the average entropy per token $h$ and thus greatly improves the $O(h^{-2})$ rate in the previous works. For scenarios where the detector lacks knowledge of the model's distribution, we introduce the concept of model-agnostic watermarking and establish the minimax bounds for the resultant increase in Type II error. Moreover, we formulate the robust watermarking problem where the user is allowed to perform a class of perturbations on the generated texts, and characterize the optimal Type II error of robust UMP tests via a linear programming problem. To the best of our knowledge, this is the first systematic statistical treatment of the watermarking problem with near-optimal rates in the i.i.d. setting, and might be of interest for future works.
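
To get a feel for the two rates quoted above, the following sketch evaluates $h^{-1} \log(1/h)$ and $h^{-2}$ for a few entropy values. The constants hidden by the $\Theta$ and $O$ notation are unknown and set to 1 here, so only the trend is meaningful; the function names are hypothetical.

```python
import math


def new_rate(h):
    """h^{-1} * log(1/h), with the hidden constant assumed to be 1."""
    return (1.0 / h) * math.log(1.0 / h)


def old_rate(h):
    """h^{-2}, with the hidden constant assumed to be 1."""
    return 1.0 / (h * h)


# Hypothetical average entropies per token, all in (0, 1).
for h in (0.5, 0.1, 0.01):
    ratio = new_rate(h) / old_rate(h)  # algebraically h * log(1/h)
    print(f"h={h}: new={new_rate(h):.1f} old={old_rate(h):.1f} ratio={ratio:.3f}")
```

The ratio equals $h \log(1/h)$, which tends to 0 as $h \to 0$, so the gap between the two rates widens in the low-entropy regime.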

Title: SE(3)-Invariant Multiparameter Persistent Homology for Chiral-Sensitive Molecular Property Prediction. (arXiv:2312.07633v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07633
Code URL: null
Copy Paste: [[2312.07633]] SE(3)-Invariant Multiparameter Persistent Homology for Chiral-Sensitive Molecular Property Prediction(http://arxiv.org/abs/2312.07633)
Summary:
In this study, we present a novel computational method for generating molecular fingerprints using multiparameter persistent homology (MPPH). This technique holds considerable significance for drug discovery and materials science, where precise molecular property prediction is vital. By integrating SE(3)-invariance with Vietoris-Rips persistent homology, we effectively capture the three-dimensional representations of molecular chirality. This non-superimposable mirror image property directly influences the molecular interactions, serving as an essential factor in molecular property prediction. We explore the underlying topologies and patterns in molecular structures by applying Vietoris-Rips persistent homology across varying scales and parameters such as atomic weight, partial charge, bond type, and chirality. Our method's efficacy can be improved by incorporating additional parameters such as aromaticity, orbital hybridization, bond polarity, conjugated systems, as well as bond and torsion angles. Additionally, we leverage Stochastic Gradient Langevin Boosting in a Bayesian ensemble of GBDTs to obtain aleatoric and epistemic uncertainty estimates for gradient boosting models. With these uncertainty estimates, we prioritize high-uncertainty samples for active learning and model fine-tuning, benefiting scenarios where data labeling is costly or time consuming. Compared to conventional GNNs which usually suffer from oversmoothing and oversquashing, MPPH provides a more comprehensive and interpretable characterization of molecular data topology. We substantiate our approach with theoretical stability guarantees and demonstrate its superior performance over existing state-of-the-art methods in predicting molecular properties through extensive evaluations on the MoleculeNet benchmark datasets.

Title: Bayesian Online Learning for Consensus Prediction. (arXiv:2312.07679v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07679
Code URL: null
Copy Paste: [[2312.07679]] Bayesian Online Learning for Consensus Prediction(http://arxiv.org/abs/2312.07679)
Summary:
Given a pre-trained classifier and multiple human experts, we investigate the task of online classification where model predictions are provided for free but querying humans incurs a cost. In this practical but under-explored setting, oracle ground truth is not available. Instead, the prediction target is defined as the consensus vote of all experts. Given that querying full consensus can be costly, we propose a general framework for online Bayesian consensus estimation, leveraging properties of the multivariate hypergeometric distribution. Based on this framework, we propose a family of methods that dynamically estimate expert consensus from partial feedback by producing a posterior over expert and model beliefs. Analyzing this posterior induces an interpretable trade-off between querying cost and classification performance. We demonstrate the efficacy of our framework against a variety of baselines on CIFAR-10H and ImageNet-16H, two large-scale crowdsourced datasets.

Title: An Online, Adaptive and Unsupervised Regression Framework with Drift Detection for Label Scarcity Contexts. (arXiv:2312.07682v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07682
Code URL: https://github.com/redsofa/unsupervised-online-regression
Copy Paste: [[2312.07682]] An Online, Adaptive and Unsupervised Regression Framework with Drift Detection for Label Scarcity Contexts(http://arxiv.org/abs/2312.07682)
Summary:
In scenarios where obtaining real-time labels proves challenging, conventional approaches may result in sub-optimal performance. This paper presents an optimal strategy for streaming contexts with limited labeled data, introducing an adaptive technique for unsupervised regression. The proposed method leverages a sparse set of initial labels and introduces an innovative drift detection mechanism to enable dynamic model adaptations in response to evolving patterns in the data. To enhance adaptability, we integrate the ADWIN (ADaptive WINdowing) algorithm with error generalization based on Root Mean Square Error (RMSE). ADWIN facilitates real-time drift detection, while RMSE provides a robust measure of model prediction accuracy. This combination enables our multivariate method to effectively navigate the challenges of streaming data, continuously adapting to changing patterns while maintaining a high level of predictive precision. Finally, we evaluate the performance of our multivariate method across various public datasets, comparing it to non-adapting baselines. Through comprehensive assessments, we demonstrate the superior efficacy of our adaptive regression technique for tasks where obtaining labels in real-time is a significant challenge. The results underscore the method's capacity to outperform traditional approaches and highlight its potential in scenarios characterized by label scarcity and evolving data patterns.

Title: Levenshtein Distance Embedding with Poisson Regression for DNA Storage. (arXiv:2312.07931v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07931
Code URL: null
Copy Paste: [[2312.07931]] Levenshtein Distance Embedding with Poisson Regression for DNA Storage(http://arxiv.org/abs/2312.07931)
Summary:
Efficient computation or approximation of Levenshtein distance, a widely-used metric for evaluating sequence similarity, has attracted significant attention with the emergence of DNA storage and other biological applications. Sequence embedding, which maps Levenshtein distance to a conventional distance between embedding vectors, has emerged as a promising solution. In this paper, a novel neural network-based sequence embedding technique using Poisson regression is proposed. We first provide a theoretical analysis of the impact of embedding dimension on model performance and present a criterion for selecting an appropriate embedding dimension. Under this embedding dimension, the Poisson regression is introduced by assuming the Levenshtein distance between sequences of fixed length following a Poisson distribution, which naturally aligns with the definition of Levenshtein distance. Moreover, from the perspective of the distribution of embedding distances, Poisson regression approximates the negative log likelihood of the chi-squared distribution and offers advancements in removing the skewness. Through comprehensive experiments on real DNA storage data, we demonstrate the superior performance of the proposed method compared to state-of-the-art approaches.
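
For reference, the exact Levenshtein distance that such embeddings approximate can be computed with the classic dynamic program below. This is the standard textbook algorithm, not the paper's embedding method; the example sequences are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("ACGT", "AGT"))        # one deletion
print(levenshtein("kitten", "sitting"))  # classic example
```

This runs in time proportional to the product of the two sequence lengths, which is the cost that motivates approximating the distance with embedding vectors for large sequence collections.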

Title: Time Series Diffusion Method: A Denoising Diffusion Probabilistic Model for Vibration Signal Generation. (arXiv:2312.07981v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.07981
Code URL: null
Copy Paste: [[2312.07981]] Time Series Diffusion Method: A Denoising Diffusion Probabilistic Model for Vibration Signal Generation(http://arxiv.org/abs/2312.07981)
Summary:
Diffusion models have demonstrated robust data generation capabilities in various research fields. In this paper, a Time Series Diffusion Method (TSDM) is proposed for vibration signal generation, leveraging the foundational principles of diffusion models. The TSDM uses an improved U-net architecture with an attention block to effectively segment and extract features from one-dimensional time series data. It operates based on forward diffusion and reverse denoising processes for time-series generation. Experimental validation is conducted using single-frequency datasets, multi-frequency datasets, and bearing fault datasets. The results show that TSDM can accurately generate the single-frequency and multi-frequency features in the time series and retain the basic frequency features for the diffusion generation results of the bearing fault series. Finally, TSDM is applied to the small sample fault diagnosis of three public bearing fault datasets, and the results show that the accuracy of small sample fault diagnosis of the three datasets is improved by 32.380%, 18.355% and 9.298% at most, respectively.

Title: An Incentive Mechanism for Federated Learning Based on Multiple Resource Exchange. (arXiv:2312.08096v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08096
Code URL: null
Copy Paste: [[2312.08096]] An Incentive Mechanism for Federated Learning Based on Multiple Resource Exchange(http://arxiv.org/abs/2312.08096)
Summary:
Federated Learning (FL) is a distributed machine learning paradigm that addresses privacy concerns in machine learning and still guarantees high test accuracy. However, achieving the necessary accuracy by having all clients participate in FL is impractical, given the constraints of clients' local computing resources. In this paper, we introduce a multi-user collaborative computing framework, categorizing users into two roles: model owners (MOs) and data owners (DOs). Without resorting to monetary incentives, an MO can encourage more DOs to join in FL by allowing the DOs to offload extra local computing tasks to the MO for execution. This exchange of "data" for "computing resources" streamlines the incentives for clients to engage more effectively in FL. We formulate the interaction between the MO and DOs as an optimization problem whose objective is to effectively utilize the communication and computing resources of the MO and DOs to minimize the time to complete an FL task. The proposed problem is a mixed integer nonlinear programming (MINLP) problem with high computational complexity. We first decompose it into two distinct subproblems, namely the client selection problem and the resource allocation problem, to segregate the integer variables from the continuous variables. Then, an effective iterative algorithm is proposed to solve the problem. Simulation results demonstrate that the proposed collaborative computing framework can achieve an accuracy of more than 95% while minimizing the overall time to complete an FL task.

language model

Title: Hijacking Context in Large Multi-modal Models. (arXiv:2312.07553v1 [cs.AI])

Title: PaperQA: Retrieval-Augmented Generative Agent for Scientific Research. (arXiv:2312.07559v1 [cs.CL])

Title: Leveraging Large Language Models to Build and Execute Computational Workflows. (arXiv:2312.07711v1 [cs.AI])

Title: Large Human Language Models: A Need and the Challenges. (arXiv:2312.07751v1 [cs.CL])

Title: Large Language Model Enhanced Multi-Agent Systems for 6G Communications. (arXiv:2312.07850v1 [cs.AI])

Title: Causality Analysis for Evaluating the Security of Large Language Models. (arXiv:2312.07876v1 [cs.AI])

Title: PromptBench: A Unified Library for Evaluation of Large Language Models. (arXiv:2312.07910v1 [cs.AI])

Title: Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning. (arXiv:2312.08027v1 [cs.CL])

Title: High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models. (arXiv:2312.08274v1 [cs.CL])

Title: Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models. (arXiv:2312.08303v1 [cs.CL])

Title: An Invitation to Deep Reinforcement Learning. (arXiv:2312.08365v1 [cs.LG])

Title: Language Model Alignment with Elastic Reset. (arXiv:2312.07551v1 [cs.CL])

Title: Large Language Models for Intent-Driven Session Recommendations. (arXiv:2312.07552v1 [cs.CL])

Title: Mathematical Language Models: A Survey. (arXiv:2312.07622v1 [cs.CL])

Title: Native Language Identification with Large Language Models. (arXiv:2312.07819v1 [cs.CL])

Title: Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models. (arXiv:2312.07887v1 [cs.CL])

Title: A Survey of Text Watermarking in the Era of Large Language Models. (arXiv:2312.07913v1 [cs.CL])

Title: CBQ: Cross-Block Quantization for Large Language Models. (arXiv:2312.07950v1 [cs.LG])

Title: CoRTEx: Contrastive Learning for Representing Terms via Explanations with Applications on Constructing Biomedical Knowledge Graphs. (arXiv:2312.08036v1 [cs.CL])

Title: Conceptualizing Suicidal Behavior: Utilizing Explanations of Predicted Outcomes to Analyze Longitudinal Social Media Data. (arXiv:2312.08299v1 [cs.CL])

Title: Distributed Inference and Fine-tuning of Large Language Models Over The Internet. (arXiv:2312.08361v1 [cs.LG])

gpt

Title: Evaluating ChatGPT as a Question Answering System: A Comprehensive Analysis and Comparison with Existing Models. (arXiv:2312.07592v1 [cs.CL])

llm

Title: Tell, don't show: Declarative facts influence how LLMs generalize. (arXiv:2312.07779v1 [cs.AI])

Title: Finetuning an LLM on Contextual Knowledge of Classics for Q&A. (arXiv:2312.07848v1 [cs.CL])

Title: Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI. (arXiv:2312.07886v1 [cs.AI])

Title: Prompting LLMs with content plans to enhance the summarization of scientific articles. (arXiv:2312.08282v1 [cs.CL])

Title: Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF. (arXiv:2312.08358v1 [cs.LG])

Title: Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization. (arXiv:2312.07763v1 [cs.CL])

long context

lora

Title: CIDR: A Cooperative Integrated Dynamic Refining Method for Minimal Feature Removal Problem. (arXiv:2312.08157v1 [cs.AI])

Title: Incremental hierarchical text clustering methods: a review. (arXiv:2312.07769v1 [cs.LG])

hallucination

prompt

Title: Traffic Signal Control Using Lightweight Transformers: An Offline-to-Online RL Approach. (arXiv:2312.07795v1 [cs.LG])

Title: A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis. (arXiv:2312.08084v1 [cs.AI])

Title: Extending Whisper with prompt tuning to target-speaker ASR. (arXiv:2312.08079v1 [cs.CL])

code

Title: Polynomial-based Self-Attention for Table Representation learning. (arXiv:2312.07753v1 [cs.AI])

Title: Spatial Knowledge-Infused Hierarchical Learning: An Application in Flood Mapping on Earth Imagery. (arXiv:2312.07767v1 [cs.AI])

Title: Sentiment analysis in Tourism: Fine-tuning BERT or sentence embeddings concatenation?. (arXiv:2312.07797v1 [cs.CL])

Title: BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering. (arXiv:2312.07867v1 [cs.AI])

Title: Exploring the Impact of Lay User Feedback for Improving AI Fairness. (arXiv:2312.08064v1 [cs.AI])

Title: SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention. (arXiv:2312.07987v1 [cs.LG])

Title: Benchmarking Distribution Shift in Tabular Data with TableShift. (arXiv:2312.07577v1 [cs.LG])

Title: Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply. (arXiv:2312.07636v1 [cs.LG])

Title: I Open at the Close: A Deep Reinforcement Learning Evaluation of Open Streets Initiatives. (arXiv:2312.07680v1 [cs.LG])

Title: Hierarchical Classification of Financial Transactions Through Context-Fusion of Transformer-based Embeddings and Taxonomy-aware Attention Layer. (arXiv:2312.07730v1 [cs.LG])

Title: Combining propensity score methods with variational autoencoders for generating synthetic data in presence of latent sub-groups. (arXiv:2312.07781v1 [cs.LG])

Title: ClusterDDPM: An EM clustering framework with Denoising Diffusion Probabilistic Models. (arXiv:2312.08029v1 [cs.LG])

Title: Explainable Trajectory Representation through Dictionary Learning. (arXiv:2312.08052v1 [cs.LG])

Title: SVInvNet: A Densely Connected Encoder-Decoder Architecture for Seismic Velocity Inversion. (arXiv:2312.08194v1 [cs.LG])

chat

retrieval augmented generation

rag

Title: ConvD: Attention Enhanced Dynamic Convolutional Embeddings for Knowledge Graph Completion. (arXiv:2312.07589v1 [cs.CL])

Title: GLOP: Learning Global Partition and Local Construction for Solving Large-scale Routing Problems in Real-time. (arXiv:2312.08224v1 [cs.AI])

Title: On the verification of Embeddings using Hybrid Markov Logic. (arXiv:2312.08287v1 [cs.LG])

Title: Contrastive News and Social Media Linking using BERT for Articles and Tweets across Dual Platforms. (arXiv:2312.07599v1 [cs.CL])

Title: FULL-W2V: Fully Exploiting Data Reuse for W2V on GPU-Accelerated Systems. (arXiv:2312.07743v1 [cs.LG])

Title: A Deep Learning-Based System for Automatic Case Summarization. (arXiv:2312.07824v1 [cs.CL])

Title: Towards Optimal Statistical Watermarking. (arXiv:2312.07930v1 [cs.LG])

Title: SE(3)-Invariant Multiparameter Persistent Homology for Chiral-Sensitive Molecular Property Prediction. (arXiv:2312.07633v1 [cs.LG])

Title: Bayesian Online Learning for Consensus Prediction. (arXiv:2312.07679v1 [cs.LG])

Title: An Online, Adaptive and Unsupervised Regression Framework with Drift Detection for Label Scarcity Contexts. (arXiv:2312.07682v1 [cs.LG])

Title: Levenshtein Distance Embedding with Poisson Regression for DNA Storage. (arXiv:2312.07931v1 [cs.LG])

Title: Time Series Diffusion Method: A Denoising Diffusion Probabilistic Model for Vibration Signal Generation. (arXiv:2312.07981v1 [cs.LG])

Title: An Incentive Mechanism for Federated Learning Based on Multiple Resource Exchange. (arXiv:2312.08096v1 [cs.LG])

multi-run

chain-of-thought

tree-of-thought