language model

Title: Towards a Psychological Generalist AI: A Survey of Current Applications of Large Language Models and Future Prospects. (arXiv:2312.04578v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.04578
Code URL: null
Copy Paste: [[2312.04578]] Towards a Psychological Generalist AI: A Survey of Current Applications of Large Language Models and Future Prospects(http://arxiv.org/abs/2312.04578)
Summary:
The complexity of psychological principles underscore a significant societal challenge, given the vast social implications of psychological problems. Bridging the gap between understanding these principles and their actual clinical and real-world applications demands rigorous exploration and adept implementation. In recent times, the swift advancement of highly adaptive and reusable artificial intelligence (AI) models has emerged as a promising way to unlock unprecedented capabilities in the realm of psychology. This paper emphasizes the importance of performance validation for these large-scale AI models, emphasizing the need to offer a comprehensive assessment of their verification from diverse perspectives. Moreover, we review the cutting-edge advancements and practical implementations of these expansive models in psychology, highlighting pivotal work spanning areas such as social media analytics, clinical nursing insights, vigilant community monitoring, and the nuanced exploration of psychological theories. Based on our review, we project an acceleration in the progress of psychological fields, driven by these large-scale AI models. These future generalist AI models harbor the potential to substantially curtail labor costs and alleviate social stress. However, this forward momentum will not be without its set of challenges, especially when considering the paradigm changes and upgrades required for medical instrumentation and related applications.

Title: TOD-Flow: Modeling the Structure of Task-Oriented Dialogues. (arXiv:2312.04668v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04668
Code URL: https://github.com/srsohn/tod-flow
Copy Paste: [[2312.04668]] TOD-Flow: Modeling the Structure of Task-Oriented Dialogues(http://arxiv.org/abs/2312.04668)
Summary:
Task-Oriented Dialogue (TOD) systems have become crucial components in interactive artificial intelligence applications. While recent advances have capitalized on pre-trained language models (PLMs), they exhibit limitations regarding transparency and controllability. To address these challenges, we propose a novel approach focusing on inferring the TOD-Flow graph from dialogue data annotated with dialog acts, uncovering the underlying task structure in the form of a graph. The inferred TOD-Flow graph can be easily integrated with any dialogue model to improve its prediction performance, transparency, and controllability. Our TOD-Flow graph learns what a model can, should, and should not predict, effectively reducing the search space and providing a rationale for the model's prediction. We show that the proposed TOD-Flow graph better resembles human-annotated graphs compared to prior approaches. Furthermore, when combined with several dialogue policies and end-to-end dialogue models, we demonstrate that our approach significantly improves dialog act classification and end-to-end response generation performance in the MultiWOZ and SGD benchmarks. Code available at: https://github.com/srsohn/TOD-Flow

Title: Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models. (arXiv:2312.04691v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04691
Code URL: null
Copy Paste: [[2312.04691]] Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models(http://arxiv.org/abs/2312.04691)
Summary:
Large language models (LLMs) with billions of parameters and pretrained on massive amounts of data are now capable of near or better than state-of-the-art performance in a variety of downstream natural language processing tasks. Neural machine translation (NMT) is one such task that LLMs have been applied to with great success. However, little research has focused on applying LLMs to the more difficult subset of NMT called simultaneous translation (SimulMT), where translation begins before the entire source context is available to the model. In this paper, we address key challenges facing LLMs fine-tuned for SimulMT, validate classical SimulMT concepts and practices in the context of LLMs, explore adapting LLMs that are fine-tuned for NMT to the task of SimulMT, and introduce Simul-LLM, the first open-source fine-tuning and evaluation pipeline development framework for LLMs focused on SimulMT.

Title: Efficient Large Language Models Fine-Tuning On Graphs. (arXiv:2312.04737v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04737
Code URL: null
Copy Paste: [[2312.04737]] Efficient Large Language Models Fine-Tuning On Graphs(http://arxiv.org/abs/2312.04737)
Summary:
Learning from Text-Attributed Graphs (TAGs) has attracted significant attention due to its wide range of real-world applications. The rapid evolution of large language models (LLMs) has revolutionized the way we process textual data, which indicates a strong potential to replace shallow text embedding generally used in Graph Neural Networks (GNNs). However, we find that existing LLM approaches that exploit text information in graphs suffer from inferior computation and data efficiency. In this work, we introduce a novel and efficient approach for the end-to-end fine-tuning of Large Language Models (LLMs) on TAGs, named LEADING. The proposed approach maintains computation cost and memory overhead comparable to the graph-less fine-tuning of LLMs. Moreover, it transfers the rick knowledge in LLMs to downstream graph learning tasks effectively with limited labeled data in semi-supervised learning. Its superior computation and data efficiency are demonstrated through comprehensive experiments, offering a promising solution for a wide range of LLMs and graph learning tasks on TAGs.

Title: HuRef: HUman-REadable Fingerprint for Large Language Models. (arXiv:2312.04828v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04828
Code URL: null
Copy Paste: [[2312.04828]] HuRef: HUman-REadable Fingerprint for Large Language Models(http://arxiv.org/abs/2312.04828)
Summary:
Protecting the copyright of large language models (LLMs) has become crucial due to their resource-intensive training and accompanying carefully designed licenses. However, identifying the original base model of an LLM is challenging due to potential parameter alterations through fine-tuning or continued pretraining. In this study, we introduce HuRef, a human-readable fingerprint for LLMs that uniquely identifies the base model without exposing model parameters or interfering with training. We first observe that the vector direction of LLM parameters remains stable after the model has converged during pretraining, showing negligible perturbations through subsequent training steps, including continued pretraining, supervised fine-tuning (SFT), and RLHF, which makes it a sufficient condition to identify the base model. The necessity is validated by continuing to train an LLM with an extra term to drive away the model parameters' direction and the model becomes damaged. However, this direction is vulnerable to simple attacks like dimension permutation or matrix rotation, which significantly change it without affecting performance. To address this, leveraging the Transformer structure, we systematically analyze potential attacks and define three invariant terms that identify an LLM's base model. We make these invariant terms human-readable by mapping them to a Gaussian vector using a convolutional encoder and then converting it into a natural image with StyleGAN2. Our method generates a dog image as an identity fingerprint for an LLM, where the dog's appearance strongly indicates the LLM's base model. Experimental results across various LLMs demonstrate the effectiveness of our method, the generated dog image remains invariant to different training steps, including SFT, RLHF, or even continued pretraining with augmented vocabulary in a new language.

Title: Localized Symbolic Knowledge Distillation for Visual Commonsense Models. (arXiv:2312.04837v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.04837
Code URL: null
Copy Paste: [[2312.04837]] Localized Symbolic Knowledge Distillation for Visual Commonsense Models(http://arxiv.org/abs/2312.04837)
Summary:
Instruction following vision-language (VL) models offer a flexible interface that supports a broad range of multimodal tasks in a zero-shot fashion. However, interfaces that operate on full images do not directly enable the user to "point to" and access specific regions within images. This capability is important not only to support reference-grounded VL benchmarks, but also, for practical applications that require precise within-image reasoning. We build Localized Visual Commonsense models, which allow users to specify (multiple) regions as input. We train our model by sampling localized commonsense knowledge from a large language model (LLM): specifically, we prompt an LLM to collect commonsense knowledge given a global literal image description and a local literal region description automatically generated by a set of VL models. With a separately trained critic model that selects high-quality examples, we find that training on the localized commonsense corpus can successfully distill existing VL models to support a reference-as-input interface. Empirical results and human evaluations in a zero-shot setup demonstrate that our distillation method results in more precise VL models of reasoning compared to a baseline of passing a generated referring expression to an LLM.

Title: KwaiAgents: Generalized Information-seeking Agent System with Large Language Models. (arXiv:2312.04889v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.04889
Code URL: https://github.com/kwaikeg/kwaiagents
Copy Paste: [[2312.04889]] KwaiAgents: Generalized Information-seeking Agent System with Large Language Models(http://arxiv.org/abs/2312.04889)
Summary:
Driven by curiosity, humans have continually sought to explore and understand the world around them, leading to the invention of various tools to satiate this inquisitiveness. Despite not having the capacity to process and memorize vast amounts of information in their brains, humans excel in critical thinking, planning, reflection, and harnessing available tools to interact with and interpret the world, enabling them to find answers efficiently. The recent advancements in large language models (LLMs) suggest that machines might also possess the aforementioned human-like capabilities, allowing them to exhibit powerful abilities even with a constrained parameter count. In this paper, we introduce KwaiAgents, a generalized information-seeking agent system based on LLMs. Within KwaiAgents, we propose an agent system that employs LLMs as its cognitive core, which is capable of understanding a user's query, behavior guidelines, and referencing external documents. The agent can also update and retrieve information from its internal memory, plan and execute actions using a time-aware search-browse toolkit, and ultimately provide a comprehensive response. We further investigate the system's performance when powered by LLMs less advanced than GPT-4, and introduce the Meta-Agent Tuning (MAT) framework, designed to ensure even an open-sourced 7B or 13B model performs well among many agent systems. We exploit both benchmark and human evaluations to systematically validate these capabilities. Extensive experiments show the superiority of our agent system compared to other autonomous agents and highlight the enhanced generalized agent-abilities of our fine-tuned LLMs.

Title: EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism. (arXiv:2312.04916v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04916
Code URL: null
Copy Paste: [[2312.04916]] EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism(http://arxiv.org/abs/2312.04916)
Summary:
We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs). While recent works have shown preliminary evidence for the efficacy of early exiting in accelerating LLM inference, EE-LLM makes a foundational step towards scaling up early-exit LLMs by supporting their training and inference with massive 3D parallelism. Built upon Megatron-LM, EE-LLM implements a variety of algorithmic innovations and performance optimizations tailored to early exiting, including a lightweight method that facilitates backpropagation for the early-exit training objective with pipeline parallelism, techniques of leveraging idle resources in the original pipeline schedule for computation related to early-exit layers, and two approaches of early-exit inference that are compatible with KV caching for autoregressive generation. Our analytical and empirical study shows that EE-LLM achieves great training efficiency with negligible computational overhead compared to standard LLM training, as well as outstanding inference speedup without compromising output quality. To facilitate further research and adoption, we release EE-LLM at https://github.com/pan-x-c/EE-LLM.

Title: The ICL Consistency Test. (arXiv:2312.04945v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04945
Code URL: null
Copy Paste: [[2312.04945]] The ICL Consistency Test(http://arxiv.org/abs/2312.04945)
Summary:
Just like the previous generation of task-tuned models, large language models (LLMs) that are adapted to tasks via prompt-based methods like in-context-learning (ICL) perform well in some setups but not in others. This lack of consistency in prompt-based learning hints at a lack of robust generalisation. We here introduce the ICL consistency test -- a contribution to the GenBench collaborative benchmark task (CBT) -- which evaluates how consistent a model makes predictions across many different setups while using the same data. The test is based on different established natural language inference tasks. We provide preprocessed data constituting 96 different 'setups' and a metric that estimates model consistency across these setups. The metric is provided on a fine-grained level to understand what properties of a setup render predictions unstable and on an aggregated level to compare overall model consistency. We conduct an empirical analysis of eight state-of-the-art models, and our consistency metric reveals how all tested LLMs lack robust generalisation.

Title: Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning. (arXiv:2312.05230v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.05230
Code URL: null
Copy Paste: [[2312.05230]] Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning(http://arxiv.org/abs/2312.05230)
Summary:
Despite their tremendous success in many applications, large language models often fall short of consistent reasoning and planning in various (language, embodied, and social) scenarios, due to inherent limitations in their inference, learning, and modeling capabilities. In this position paper, we present a new perspective of machine reasoning, LAW, that connects the concepts of Language models, Agent models, and World models, for more robust and versatile reasoning capabilities. In particular, we propose that world and agent models are a better abstraction of reasoning, that introduces the crucial elements of deliberate human-like reasoning, including beliefs about the world and other agents, anticipation of consequences, goals/rewards, and strategic planning. Crucially, language models in LAW serve as a backend to implement the system or its elements and hence provide the computational power and adaptability. We review the recent studies that have made relevant progress and discuss future research directions towards operationalizing the LAW framework.

Title: PyThaiNLP: Thai Natural Language Processing in Python. (arXiv:2312.04649v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04649
Code URL: https://github.com/PyThaiNLP/pythainlp
Copy Paste: [[2312.04649]] PyThaiNLP: Thai Natural Language Processing in Python(http://arxiv.org/abs/2312.04649)
Summary:
We present PyThaiNLP, a free and open-source natural language processing (NLP) library for Thai language implemented in Python. It provides a wide range of software, models, and datasets for Thai language. We first provide a brief historical context of tools for Thai language prior to the development of PyThaiNLP. We then outline the functionalities it provided as well as datasets and pre-trained language models. We later summarize its development milestones and discuss our experience during its development. We conclude by demonstrating how industrial and research communities utilize PyThaiNLP in their work. The library is freely available at https://github.com/pythainlp/pythainlp.

Title: How to Determine the Most Powerful Pre-trained Language Model without Brute Force Fine-tuning? An Empirical Survey. (arXiv:2312.04775v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04775
Code URL: https://github.com/ba1jun/model-selection-nlp
Copy Paste: [[2312.04775]] How to Determine the Most Powerful Pre-trained Language Model without Brute Force Fine-tuning? An Empirical Survey(http://arxiv.org/abs/2312.04775)
Summary:
Transferability estimation has been attached to great attention in the computer vision fields. Researchers try to estimate with low computational cost the performance of a model when transferred from a source task to a given target task. Considering the effectiveness of such estimations, the communities of natural language processing also began to study similar problems for the selection of pre-trained language models. However, there is a lack of a comprehensive comparison between these estimation methods yet. Also, the differences between vision and language scenarios make it doubtful whether previous conclusions can be established across fields. In this paper, we first conduct a thorough survey of existing transferability estimation methods being able to find the most suitable model, then we conduct a detailed empirical study for the surveyed methods based on the GLUE benchmark. From qualitative and quantitative analyses, we demonstrate the strengths and weaknesses of existing methods and show that H-Score generally performs well with superiorities in effectiveness and efficiency. We also outline the difficulties of consideration of training details, applicability to text generation, and consistency to certain metrics which shed light on future directions.

Title: Ophtha-LLaMA2: A Large Language Model for Ophthalmology. (arXiv:2312.04906v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04906
Code URL: null
Copy Paste: [[2312.04906]] Ophtha-LLaMA2: A Large Language Model for Ophthalmology(http://arxiv.org/abs/2312.04906)
Summary:
In recent years, pre-trained large language models (LLMs) have achieved tremendous success in the field of Natural Language Processing (NLP). Prior studies have primarily focused on general and generic domains, with relatively less research on specialized LLMs in the medical field. The specialization and high accuracy requirements for diagnosis in the medical field, as well as the challenges in collecting large-scale data, have constrained the application and development of LLMs in medical scenarios. In the field of ophthalmology, clinical diagnosis mainly relies on doctors' interpretation of reports and making diagnostic decisions. In order to take advantage of LLMs to provide decision support for doctors, we collected three modalities of ophthalmic report data and fine-tuned the LLaMA2 model, successfully constructing an LLM termed the "Ophtha-LLaMA2" specifically tailored for ophthalmic disease diagnosis. Inference test results show that even with a smaller fine-tuning dataset, Ophtha-LLaMA2 performs significantly better in ophthalmic diagnosis compared to other LLMs. It demonstrates that the Ophtha-LLaMA2 exhibits satisfying accuracy and efficiency in ophthalmic disease diagnosis, making it a valuable tool for ophthalmologists to provide improved diagnostic support for patients. This research provides a useful reference for the application of LLMs in the field of ophthalmology, while showcasing the immense potential and prospects in this domain.

Title: Zoology: Measuring and Improving Recall in Efficient Language Models. (arXiv:2312.04927v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04927
Code URL: https://github.com/hazyresearch/zoology
Copy Paste: [[2312.04927]] Zoology: Measuring and Improving Recall in Efficient Language Models(http://arxiv.org/abs/2312.04927)
Summary:
Attention-free language models that combine gating and convolutions are growing in popularity due to their efficiency and increasingly competitive performance. To better understand these architectures, we pretrain a suite of 17 attention and "gated-convolution" language models, finding that SoTA gated-convolution architectures still underperform attention by up to 2.1 perplexity points on the Pile. In fine-grained analysis, we find 82% of the gap is explained by each model's ability to recall information that is previously mentioned in-context, e.g. "Hakuna Matata means no worries Hakuna Matata it means no" $\rightarrow$ "??". On this task, termed "associative recall", we find that attention outperforms gated-convolutions by a large margin: a 70M parameter attention model outperforms a 1.4 billion parameter gated-convolution model on associative recall. This is surprising because prior work shows gated convolutions can perfectly solve synthetic tests for AR capability. To close the gap between synthetics and real language, we develop a new formalization of the task called multi-query associative recall (MQAR) that better reflects actual language. We perform an empirical and theoretical study of MQAR that elucidates differences in the parameter-efficiency of attention and gated-convolution recall. Informed by our analysis, we evaluate simple convolution-attention hybrids and show that hybrids with input-dependent sparse attention patterns can close 97.4% of the gap to attention, while maintaining sub-quadratic scaling. Our code is accessible at: https://github.com/HazyResearch/zoology.

Title: PathFinder: Guided Search over Multi-Step Reasoning Paths. (arXiv:2312.05180v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.05180
Code URL: null
Copy Paste: [[2312.05180]] PathFinder: Guided Search over Multi-Step Reasoning Paths(http://arxiv.org/abs/2312.05180)
Summary:
With recent advancements in large language models, methods like chain-of-thought prompting to elicit reasoning chains have been shown to improve results on reasoning tasks. However, tasks that require multiple steps of reasoning still pose significant challenges to state-of-the-art models. Drawing inspiration from the beam search algorithm, we propose PathFinder, a tree-search-based reasoning path generation approach. It enhances diverse branching and multi-hop reasoning through the integration of dynamic decoding, enabled by varying sampling methods and parameters. Using constrained reasoning, PathFinder integrates novel quality constraints, pruning, and exploration methods to enhance the efficiency and the quality of generation. Moreover, it includes scoring and ranking features to improve candidate selection. Our approach outperforms competitive baselines on three complex arithmetic and commonsense reasoning tasks by 6% on average. Our model generalizes well to longer, unseen reasoning chains, reflecting similar complexities to beam search with large branching factors.

gpt

Title: From Big to Small Without Losing It All: Text Augmentation with ChatGPT for Efficient Sentiment Analysis. (arXiv:2312.04720v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04720
Code URL: https://github.com/clarin-pl/text-augumentation-with-chatgpt
Copy Paste: [[2312.04720]] From Big to Small Without Losing It All: Text Augmentation with ChatGPT for Efficient Sentiment Analysis(http://arxiv.org/abs/2312.04720)
Summary:
In the era of artificial intelligence, data is gold but costly to annotate. The paper demonstrates a groundbreaking solution to this dilemma using ChatGPT for text augmentation in sentiment analysis. We leverage ChatGPT's generative capabilities to create synthetic training data that significantly improves the performance of smaller models, making them competitive with, or even outperforming, their larger counterparts. This innovation enables models to be both efficient and effective, thereby reducing computational cost, inference time, and memory usage without compromising on quality. Our work marks a key advancement in the cost-effective development and deployment of robust sentiment analysis models.

Title: On Sarcasm Detection with OpenAI GPT-based Models. (arXiv:2312.04642v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04642
Code URL: null
Copy Paste: [[2312.04642]] On Sarcasm Detection with OpenAI GPT-based Models(http://arxiv.org/abs/2312.04642)
Summary:
Sarcasm is a form of irony that requires readers or listeners to interpret its intended meaning by considering context and social cues. Machine learning classification models have long had difficulty detecting sarcasm due to its social complexity and contradictory nature.

This paper explores the applications of the Generative Pretrained Transformer (GPT) models, including GPT-3, InstructGPT, GPT-3.5, and GPT-4, in detecting sarcasm in natural language. It tests fine-tuned and zero-shot models of different sizes and releases.

The GPT models were tested on the political and balanced (pol-bal) portion of the popular Self-Annotated Reddit Corpus (SARC 2.0) sarcasm dataset. In the fine-tuning case, the largest fine-tuned GPT-3 model achieves accuracy and $F_1$-score of 0.81, outperforming prior models. In the zero-shot case, one of GPT-4 models yields an accuracy of 0.70 and $F_1$-score of 0.75. Other models score lower. Additionally, a model's performance may improve or deteriorate with each release, highlighting the need to reassess performance after each release.

Title: Seeing ChatGPT Through Universities' Policies, Resources and Guidelines. (arXiv:2312.05235v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.05235
Code URL: null
Copy Paste: [[2312.05235]] Seeing ChatGPT Through Universities' Policies, Resources and Guidelines(http://arxiv.org/abs/2312.05235)
Summary:
The advancements in Artificial Intelligence (AI) technologies such as ChatGPT have gained popularity in recent days. The integration of ChatGPT in educational contexts has already created attractions due to a wide range of applications. However, the automatic generation of human-like texts also poses potential risks to academic integrity, especially when faced with writing-intensive language courses. Considering the ongoing debates, this study aims to investigate the academic policies and guidelines established by US universities regarding the use of ChatGPT in teaching and learning. The data sources include academic policies, statements, guidelines as well as relevant resources that were provided by the top 50 universities in the United States, according to U.S. News. Thematic analysis and qualitative analysis were employed in the analysis and showed that most top 50 universities were open but cautious towards the integration of generative AI in teaching and learning and also expressed their concerns on ethical usage, accuracy, and data privacy. Most universities also provided a variety of resources and guidelines, including syllabus templates/samples, workshops and discussions, shared articles, and one-on-one consultations, with focuses on general technical introduction, ethical concerns, pedagogical applications, preventive strategies, data privacy, limitations, and detective tools. The findings will inform future policy-making regarding the integration of ChatGPT in college-level education and influence the provision of supportive resources by universities for the appropriate application of ChatGPT in education.

llm

Title: SparQ Attention: Bandwidth-Efficient LLM Inference. (arXiv:2312.04985v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04985
Code URL: null
Copy Paste: [[2312.04985]] SparQ Attention: Bandwidth-Efficient LLM Inference(http://arxiv.org/abs/2312.04985)
Summary:
Generative large language models (LLMs) have opened up numerous novel possibilities, but due to their significant computational requirements their ubiquitous use remains challenging. Some of the most useful applications require processing large numbers of samples at a time and using long contexts, both significantly increasing the memory communication load of the models. We introduce SparQ Attention, a technique for increasing the inference throughput of LLMs by reducing the memory bandwidth requirements within the attention blocks through selective fetching of the cached history. Our proposed technique can be applied directly to off-the-shelf LLMs during inference, without requiring any modification to the pre-training setup or additional fine-tuning. We show how SparQ Attention can decrease the attention memory bandwidth requirements up to eight times without any loss in accuracy by evaluating Llama 2 and Pythia models on a wide range of downstream tasks.

long context

lora

hallucination

Title: HALO: An Ontology for Representing Hallucinations in Generative Models. (arXiv:2312.05209v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.05209
Code URL: null
Copy Paste: [[2312.05209]] HALO: An Ontology for Representing Hallucinations in Generative Models(http://arxiv.org/abs/2312.05209)
Summary:
Recent progress in generative AI, including large language models (LLMs) like ChatGPT, has opened up significant opportunities in fields ranging from natural language processing to knowledge discovery and data mining. However, there is also a growing awareness that the models can be prone to problems such as making information up or `hallucinations', and faulty reasoning on seemingly simple problems. Because of the popularity of models like ChatGPT, both academic scholars and citizen scientists have documented hallucinations of several different types and severity. Despite this body of work, a formal model for describing and representing these hallucinations (with relevant meta-data) at a fine-grained level, is still lacking. In this paper, we address this gap by presenting the Hallucination Ontology or HALO, a formal, extensible ontology written in OWL that currently offers support for six different types of hallucinations known to arise in LLMs, along with support for provenance and experimental metadata. We also collect and publish a dataset containing hallucinations that we inductively gathered across multiple independent Web sources, and show that HALO can be successfully used to model this dataset and answer competency questions.

Title: DelucionQA: Detecting Hallucinations in Domain-specific Question Answering. (arXiv:2312.05200v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.05200
Code URL: https://github.com/boschresearch/delucionqa
Copy Paste: [[2312.05200]] DelucionQA: Detecting Hallucinations in Domain-specific Question Answering(http://arxiv.org/abs/2312.05200)
Summary:
Hallucination is a well-known phenomenon in text generated by large language models (LLMs). The existence of hallucinatory responses is found in almost all application scenarios e.g., summarization, question-answering (QA) etc. For applications requiring high reliability (e.g., customer-facing assistants), the potential existence of hallucination in LLM-generated text is a critical problem. The amount of hallucination can be reduced by leveraging information retrieval to provide relevant background information to the LLM. However, LLMs can still generate hallucinatory content for various reasons (e.g., prioritizing its parametric knowledge over the context, failure to capture the relevant information from the context, etc.). Detecting hallucinations through automated methods is thus paramount. To facilitate research in this direction, we introduce a sophisticated dataset, DelucionQA, that captures hallucinations made by retrieval-augmented LLMs for a domain-specific QA task. Furthermore, we propose a set of hallucination detection methods to serve as baselines for future works from the research community. Analysis and case study are also provided to share valuable insights on hallucination phenomena in the target scenario.

prompt

Title: Improving Neural Machine Translation by Multi-Knowledge Integration with Prompting. (arXiv:2312.04807v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04807
Code URL: null
Copy Paste: [[2312.04807]] Improving Neural Machine Translation by Multi-Knowledge Integration with Prompting(http://arxiv.org/abs/2312.04807)
Summary:
Improving neural machine translation (NMT) systems with prompting has achieved significant progress in recent years. In this work, we focus on how to integrate multi-knowledge, multiple types of knowledge, into NMT models to enhance the performance with prompting. We propose a unified framework, which can integrate effectively multiple types of knowledge including sentences, terminologies/phrases and translation templates into NMT models. We utilize multiple types of knowledge as prefix-prompts of input for the encoder and decoder of NMT models to guide the translation process. The approach requires no changes to the model architecture and effectively adapts to domain-specific translation without retraining. The experiments on English-Chinese and English-German translation demonstrate that our approach significantly outperform strong baselines, achieving high translation quality and terminology match accuracy.

Title: Boosting Prompt-Based Self-Training With Mapping-Free Automatic Verbalizer for Multi-Class Classification. (arXiv:2312.04982v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04982
Code URL: https://github.com/yookyungkho/mav
Copy Paste: [[2312.04982]] Boosting Prompt-Based Self-Training With Mapping-Free Automatic Verbalizer for Multi-Class Classification(http://arxiv.org/abs/2312.04982)
Summary:
Recently, prompt-based fine-tuning has garnered considerable interest as a core technique for few-shot text classification task. This approach reformulates the fine-tuning objective to align with the Masked Language Modeling (MLM) objective. Leveraging unlabeled data, prompt-based self-training has shown greater effectiveness in binary and three-class classification. However, prompt-based self-training for multi-class classification has not been adequately investigated, despite its significant applicability to real-world scenarios. Moreover, extending current methods to multi-class classification suffers from the verbalizer that extracts the predicted value of manually pre-defined single label word for each class from MLM predictions. Consequently, we introduce a novel, efficient verbalizer structure, named Mapping-free Automatic Verbalizer (MAV). Comprising two fully connected layers, MAV serves as a trainable verbalizer that automatically extracts the requisite word features for classification by capitalizing on all available information from MLM predictions. Experimental results on five multi-class classification datasets indicate MAV's superior self-training efficacy.

code

Title: FREDSum: A Dialogue Summarization Corpus for French Political Debates. (arXiv:2312.04843v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04843
Code URL: https://github.com/linto-ai/fredsum
Copy Paste: [[2312.04843]] FREDSum: A Dialogue Summarization Corpus for French Political Debates(http://arxiv.org/abs/2312.04843)
Summary:
Recent advances in deep learning, and especially the invention of encoder-decoder architectures, has significantly improved the performance of abstractive summarization systems. The majority of research has focused on written documents, however, neglecting the problem of multi-party dialogue summarization. In this paper, we present a dataset of French political debates for the purpose of enhancing resources for multi-lingual dialogue summarization. Our dataset consists of manually transcribed and annotated political debates, covering a range of topics and perspectives. We highlight the importance of high quality transcription and annotations for training accurate and effective dialogue summarization models, and emphasize the need for multilingual resources to support dialogue summarization in non-English languages. We also provide baseline experiments using state-of-the-art methods, and encourage further research in this area to advance the field of dialogue summarization. Our dataset will be made publicly available for use by the research community.

Title: DARLEI: Deep Accelerated Reinforcement Learning with Evolutionary Intelligence. (arXiv:2312.05171v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.05171
Code URL: null
Copy Paste: [[2312.05171]] DARLEI: Deep Accelerated Reinforcement Learning with Evolutionary Intelligence(http://arxiv.org/abs/2312.05171)
Summary:
We present DARLEI, a framework that combines evolutionary algorithms with parallelized reinforcement learning for efficiently training and evolving populations of UNIMAL agents. Our approach utilizes Proximal Policy Optimization (PPO) for individual agent learning and pairs it with a tournament selection-based generational learning mechanism to foster morphological evolution. By building on Nvidia's Isaac Gym, DARLEI leverages GPU accelerated simulation to achieve over 20x speedup using just a single workstation, compared to previous work which required large distributed CPU clusters. We systematically characterize DARLEI's performance under various conditions, revealing factors impacting diversity of evolved morphologies. For example, by enabling inter-agent collisions within the simulator, we find that we can simulate some multi-agent interactions between the same morphology, and see how it influences individual agent capabilities and long-term evolutionary adaptation. While current results demonstrate limited diversity across generations, we hope to extend DARLEI in future work to include interactions between diverse morphologies in richer environments, and create a platform that allows for coevolving populations and investigating emergent behaviours in them. Our source code is also made publicly at https://saeejithnair.github.io/darlei.

Title: TaskMet: Task-Driven Metric Learning for Model Learning. (arXiv:2312.05250v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.05250
Code URL: null
Copy Paste: [[2312.05250]] TaskMet: Task-Driven Metric Learning for Model Learning(http://arxiv.org/abs/2312.05250)
Summary:
Deep learning models are often deployed in downstream tasks that the training procedure may not be aware of. For example, models solely trained to achieve accurate predictions may struggle to perform well on downstream tasks because seemingly small prediction errors may incur drastic task errors. The standard end-to-end learning approach is to make the task loss differentiable or to introduce a differentiable surrogate that the model can be trained on. In these settings, the task loss needs to be carefully balanced with the prediction loss because they may have conflicting objectives. We propose take the task loss signal one level deeper than the parameters of the model and use it to learn the parameters of the loss function the model is trained on, which can be done by learning a metric in the prediction space. This approach does not alter the optimal prediction model itself, but rather changes the model learning to emphasize the information important for the downstream task. This enables us to achieve the best of both worlds: a prediction model trained in the original prediction space while also being valuable for the desired downstream task. We validate our approach through experiments conducted in two main settings: 1) decision-focused model learning scenarios involving portfolio optimization and budget allocation, and 2) reinforcement learning in noisy environments with distracting states. The source code to reproduce our experiments is available at https://github.com/facebookresearch/taskmet

Title: Converting Epics/Stories into Pseudocode using Transformers. (arXiv:2312.05047v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.05047
Code URL: null
Copy Paste: [[2312.05047]] Converting Epics/Stories into Pseudocode using Transformers(http://arxiv.org/abs/2312.05047)
Summary:
The conversion of user epics or stories into their appropriate representation in pseudocode or code is a time-consuming task, which can take up a large portion of the time in an industrial project. With this research paper, we aim to present a methodology to generate pseudocode from a given agile user story of small functionalities so as to reduce the overall time spent on the industrial project. Pseudocode is a programming language agnostic representation of the steps involved in a computer program, which can be easily converted into any programming language. Leveraging the potential of Natural Language Processing, we want to simplify the development process in organizations that use the Agile Model of Software Development. We present a methodology to convert a problem described in the English language into pseudocode. This methodology divides the Text to Pseudocode conversion task into two stages or subtasks, each of which is treated like an individual machine translation task. Stage 1 is Text to Code Conversion and Stage 2 is Code to Pseudocode Conversion. We find that the CodeT5 model gives the best results in terms of BLEU score when trained separately on the two subtasks mentioned above. BLEU score is a metric that is used to measure the similarity between a machine-translated text and a set of reference translations.

Title: Transferable Candidate Proposal with Bounded Uncertainty. (arXiv:2312.04604v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04604
Code URL: https://github.com/gokyeongryeol/tbu
Copy Paste: [[2312.04604]] Transferable Candidate Proposal with Bounded Uncertainty(http://arxiv.org/abs/2312.04604)
Summary:
From an empirical perspective, the subset chosen through active learning cannot guarantee an advantage over random sampling when transferred to another model. While it underscores the significance of verifying transferability, experimental design from previous works often neglected that the informativeness of a data subset can change over model configurations. To tackle this issue, we introduce a new experimental design, coined as Candidate Proposal, to find transferable data candidates from which active learning algorithms choose the informative subset. Correspondingly, a data selection algorithm is proposed, namely Transferable candidate proposal with Bounded Uncertainty (TBU), which constrains the pool of transferable data candidates by filtering out the presumably redundant data points based on uncertainty estimation. We verified the validity of TBU in image classification benchmarks, including CIFAR-10/100 and SVHN. When transferred to different model configurations, TBU consistency improves performance in existing active learning algorithms. Our code is available at https://github.com/gokyeongryeol/TBU.

Title: StructComp: Substituting propagation with Structural Compression in Training Graph Contrastive Learning. (arXiv:2312.04865v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04865
Code URL: null
Copy Paste: [[2312.04865]] StructComp: Substituting propagation with Structural Compression in Training Graph Contrastive Learning(http://arxiv.org/abs/2312.04865)
Summary:
Graph contrastive learning (GCL) has become a powerful tool for learning graph data, but its scalability remains a significant challenge. In this work, we propose a simple yet effective training framework called Structural Compression (StructComp) to address this issue. Inspired by a sparse low-rank approximation on the diffusion matrix, StructComp trains the encoder with the compressed nodes. This allows the encoder not to perform any message passing during the training stage, and significantly reduces the number of sample pairs in the contrastive loss. We theoretically prove that the original GCL loss can be approximated with the contrastive loss computed by StructComp. Moreover, StructComp can be regarded as an additional regularization term for GCL models, resulting in a more robust encoder. Empirical studies on seven benchmark datasets show that StructComp greatly reduces the time and memory consumption while improving model performance compared to the vanilla GCL models and scalable training methods.

chat

retrieval augmented generation

rag

Title: Federated Learning for 6G: Paradigms, Taxonomy, Recent Advances and Insights. (arXiv:2312.04688v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04688
Code URL: null
Copy Paste: [[2312.04688]] Federated Learning for 6G: Paradigms, Taxonomy, Recent Advances and Insights(http://arxiv.org/abs/2312.04688)
Summary:
Artificial Intelligence (AI) is expected to play an instrumental role in the next generation of wireless systems, such as sixth-generation (6G) mobile network. However, massive data, energy consumption, training complexity, and sensitive data protection in wireless systems are all crucial challenges that must be addressed for training AI models and gathering intelligence and knowledge from distributed devices. Federated Learning (FL) is a recent framework that has emerged as a promising approach for multiple learning agents to build an accurate and robust machine learning models without sharing raw data. By allowing mobile handsets and devices to collaboratively learn a global model without explicit sharing of training data, FL exhibits high privacy and efficient spectrum utilization. While there are a lot of survey papers exploring FL paradigms and usability in 6G privacy, none of them has clearly addressed how FL can be used to improve the protocol stack and wireless operations. The main goal of this survey is to provide a comprehensive overview on FL usability to enhance mobile services and enable smart ecosystems to support novel use-cases. This paper examines the added-value of implementing FL throughout all levels of the protocol stack. Furthermore, it presents important FL applications, addresses hot topics, provides valuable insights and explicits guidance for future research and developments. Our concluding remarks aim to leverage the synergy between FL and future 6G, while highlighting FL's potential to revolutionize wireless industry and sustain the development of cutting-edge mobile services.

Title: Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning. (arXiv:2312.04736v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04736
Code URL: https://github.com/uoe-agents/feedback-dt
Copy Paste: [[2312.04736]] Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning(http://arxiv.org/abs/2312.04736)
Summary:
Despite numerous successes, the field of reinforcement learning (RL) remains far from matching the impressive generalisation power of human behaviour learning. One possible way to help bridge this gap be to provide RL agents with richer, more human-like feedback expressed in natural language. To investigate this idea, we first extend BabyAI to automatically generate language feedback from the environment dynamics and goal condition success. Then, we modify the Decision Transformer architecture to take advantage of this additional signal. We find that training with language feedback either in place of or in addition to the return-to-go or goal descriptions improves agents' generalisation performance, and that agents can benefit from feedback even when this is only available during training, but not at inference.

Title: Train 'n Trade: Foundations of Parameter Markets. (arXiv:2312.04740v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04740
Code URL: null
Copy Paste: [[2312.04740]] Train 'n Trade: Foundations of Parameter Markets(http://arxiv.org/abs/2312.04740)
Summary:
Organizations typically train large models individually. This is costly and time-consuming, particularly for large-scale foundation models. Such vertical production is known to be suboptimal. Inspired by this economic insight, we ask whether it is possible to leverage others' expertise by trading the constituent parts in models, i.e., sets of weights, as if they were market commodities. While recent advances in aligning and interpolating models suggest that doing so may be possible, a number of fundamental questions must be answered to create viable parameter markets. In this work, we address these basic questions, propose a framework containing the infrastructure necessary for market operations to take place, study strategies for exchanging parameters, and offer means for agents to monetize parameters. Excitingly, compared to agents who train siloed models from scratch, we show that it is possible to mutually gain by using the market, even in competitive settings. This suggests that the notion of parameter markets may be a useful paradigm for improving large-scale model training in the future.

Title: The Graph Lottery Ticket Hypothesis: Finding Sparse, Informative Graph Structure. (arXiv:2312.04762v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04762
Code URL: null
Copy Paste: [[2312.04762]] The Graph Lottery Ticket Hypothesis: Finding Sparse, Informative Graph Structure(http://arxiv.org/abs/2312.04762)
Summary:
Graph learning methods help utilize implicit relationships among data items, thereby reducing training label requirements and improving task performance. However, determining the optimal graph structure for a particular learning task remains a challenging research problem.

In this work, we introduce the Graph Lottery Ticket (GLT) Hypothesis - that there is an extremely sparse backbone for every graph, and that graph learning algorithms attain comparable performance when trained on that subgraph as on the full graph. We identify and systematically study 8 key metrics of interest that directly influence the performance of graph learning algorithms. Subsequently, we define the notion of a "winning ticket" for graph structure - an extremely sparse subset of edges that can deliver a robust approximation of the entire graph's performance. We propose a straightforward and efficient algorithm for finding these GLTs in arbitrary graphs. Empirically, we observe that performance of different graph learning algorithms can be matched or even exceeded on graphs with the average degree as low as 5.

Title: Predictive Chemistry Augmented with Text Retrieval. (arXiv:2312.04881v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04881
Code URL: https://github.com/thomas0809/textreact
Copy Paste: [[2312.04881]] Predictive Chemistry Augmented with Text Retrieval(http://arxiv.org/abs/2312.04881)
Summary:
This paper focuses on using natural language descriptions to enhance predictive models in the chemistry field. Conventionally, chemoinformatics models are trained with extensive structured data manually extracted from the literature. In this paper, we introduce TextReact, a novel method that directly augments predictive chemistry with texts retrieved from the literature. TextReact retrieves text descriptions relevant for a given chemical reaction, and then aligns them with the molecular representation of the reaction. This alignment is enhanced via an auxiliary masked LM objective incorporated in the predictor training. We empirically validate the framework on two chemistry tasks: reaction condition recommendation and one-step retrosynthesis. By leveraging text retrieval, TextReact significantly outperforms state-of-the-art chemoinformatics models trained solely on molecular data.

Title: Generating Explanations to Understand and Repair Embedding-based Entity Alignment. (arXiv:2312.04877v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04877
Code URL: https://github.com/txb-nju/exea
Copy Paste: [[2312.04877]] Generating Explanations to Understand and Repair Embedding-based Entity Alignment(http://arxiv.org/abs/2312.04877)
Summary:
Entity alignment seeks identical entities in different knowledge graphs, which is a long-standing task in the database research. Recent work leverages deep learning to embed entities in vector space and align them via nearest neighbor search. Although embedding-based entity alignment has gained marked success in recent years, it lacks explanations for alignment decisions. In this paper, we present the first framework that can generate explanations for understanding and repairing embedding-based entity alignment results. Given an entity alignment pair produced by an embedding model, we first compare its neighbor entities and relations to build a matching subgraph as a local explanation. We then construct an alignment dependency graph to understand the pair from an abstract perspective. Finally, we repair the pair by resolving three types of alignment conflicts based on dependency graphs. Experiments on five datasets demonstrate the effectiveness and generalization of our framework in explaining and repairing embedding-based entity alignment results.

Title: Seamless: Multilingual Expressive and Streaming Speech Translation. (arXiv:2312.05187v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.05187
Code URL: https://github.com/facebookresearch/seamless_communication
Copy Paste: [[2312.05187]] Seamless: Multilingual Expressive and Streaming Speech Translation(http://arxiv.org/abs/2312.05187)
Summary:
Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model-SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. SeamlessM4T v2 provides the foundation on which our next two models are initiated. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one's voice. As for SeamlessStreaming, our model leverages the Efficient Monotonic Multihead Attention mechanism to generate low-latency target translations without waiting for complete source utterances. As the first of its kind, SeamlessStreaming enables simultaneous speech-to-speech/text translation for multiple source and target languages. To ensure that our models can be used safely and responsibly, we implemented the first known red-teaming effort for multimodal machine translation, a system for the detection and mitigation of added toxicity, a systematic evaluation of gender bias, and an inaudible localized watermarking mechanism designed to dampen the impact of deepfakes. Consequently, we bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real-time. The contributions to this work are publicly released and accessible at https://github.com/facebookresearch/seamless_communication

Title: Relational Deep Learning: Graph Representation Learning on Relational Databases. (arXiv:2312.04615v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04615
Code URL: null
Copy Paste: [[2312.04615]] Relational Deep Learning: Graph Representation Learning on Relational Databases(http://arxiv.org/abs/2312.04615)
Summary:
Much of the world's most valued data is stored in relational databases and data warehouses, where the data is organized into many tables connected by primary-foreign key relations. However, building machine learning models using this data is both challenging and time consuming. The core problem is that no machine learning method is capable of learning on multiple tables interconnected by primary-foreign key relations. Current methods can only learn from a single table, so the data must first be manually joined and aggregated into a single training table, the process known as feature engineering. Feature engineering is slow, error prone and leads to suboptimal models. Here we introduce an end-to-end deep representation learning approach to directly learn on data laid out across multiple tables. We name our approach Relational Deep Learning (RDL). The core idea is to view relational databases as a temporal, heterogeneous graph, with a node for each row in each table, and edges specified by primary-foreign key links. Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all input data, without any manual feature engineering. Relational Deep Learning leads to more accurate models that can be built much faster. To facilitate research in this area, we develop RelBench, a set of benchmark datasets and an implementation of Relational Deep Learning. The data covers a wide spectrum, from discussions on Stack Exchange to book reviews on the Amazon Product Catalog. Overall, we define a new research area that generalizes graph machine learning and broadens its applicability to a wide set of AI use cases.

Title: PAC-Bayes Generalization Certificates for Learned Inductive Conformal Prediction. (arXiv:2312.04658v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04658
Code URL: null
Copy Paste: [[2312.04658]] PAC-Bayes Generalization Certificates for Learned Inductive Conformal Prediction(http://arxiv.org/abs/2312.04658)
Summary:
Inductive Conformal Prediction (ICP) provides a practical and effective approach for equipping deep learning models with uncertainty estimates in the form of set-valued predictions which are guaranteed to contain the ground truth with high probability. Despite the appeal of this coverage guarantee, these sets may not be efficient: the size and contents of the prediction sets are not directly controlled, and instead depend on the underlying model and choice of score function. To remedy this, recent work has proposed learning model and score function parameters using data to directly optimize the efficiency of the ICP prediction sets. While appealing, the generalization theory for such an approach is lacking: direct optimization of empirical efficiency may yield prediction sets that are either no longer efficient on test data, or no longer obtain the required coverage on test data. In this work, we use PAC-Bayes theory to obtain generalization bounds on both the coverage and the efficiency of set-valued predictors which can be directly optimized to maximize efficiency while satisfying a desired test coverage. In contrast to prior work, our framework allows us to utilize the entire calibration dataset to learn the parameters of the model and score function, instead of requiring a separate hold-out set for obtaining test-time coverage guarantees. We leverage these theoretical results to provide a practical algorithm for using calibration data to simultaneously fine-tune the parameters of a model and score function while guaranteeing test-time coverage and efficiency of the resulting prediction sets. We evaluate the approach on regression and classification tasks, and outperform baselines calibrated using a Hoeffding bound-based PAC guarantee on ICP, especially in the low-data regime.

Title: Reverse Engineering Deep ReLU Networks An Optimization-based Algorithm. (arXiv:2312.04675v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04675
Code URL: null
Copy Paste: [[2312.04675]] Reverse Engineering Deep ReLU Networks An Optimization-based Algorithm(http://arxiv.org/abs/2312.04675)
Summary:
Reverse engineering deep ReLU networks is a critical problem in understanding the complex behavior and interpretability of neural networks. In this research, we present a novel method for reconstructing deep ReLU networks by leveraging convex optimization techniques and a sampling-based approach. Our method begins by sampling points in the input space and querying the black box model to obtain the corresponding hyperplanes. We then define a convex optimization problem with carefully chosen constraints and conditions to guarantee its convexity. The objective function is designed to minimize the discrepancy between the reconstructed networks output and the target models output, subject to the constraints. We employ gradient descent to optimize the objective function, incorporating L1 or L2 regularization as needed to encourage sparse or smooth solutions. Our research contributes to the growing body of work on reverse engineering deep ReLU networks and paves the way for new advancements in neural network interpretability and security.

Title: Distributed Optimization via Kernelized Multi-armed Bandits. (arXiv:2312.04719v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04719
Code URL: null
Copy Paste: [[2312.04719]] Distributed Optimization via Kernelized Multi-armed Bandits(http://arxiv.org/abs/2312.04719)
Summary:
Multi-armed bandit algorithms provide solutions for sequential decision-making where learning takes place by interacting with the environment. In this work, we model a distributed optimization problem as a multi-agent kernelized multi-armed bandit problem with a heterogeneous reward setting. In this setup, the agents collaboratively aim to maximize a global objective function which is an average of local objective functions. The agents can access only bandit feedback (noisy reward) obtained from the associated unknown local function with a small norm in reproducing kernel Hilbert space (RKHS). We present a fully decentralized algorithm, Multi-agent IGP-UCB (MA-IGP-UCB), which achieves a sub-linear regret bound for popular classes for kernels while preserving privacy. It does not necessitate the agents to share their actions, rewards, or estimates of their local function. In the proposed approach, the agents sample their individual local functions in a way that benefits the whole network by utilizing a running consensus to estimate the upper confidence bound on the global function. Furthermore, we propose an extension, Multi-agent Delayed IGP-UCB (MAD-IGP-UCB) algorithm, which reduces the dependence of the regret bound on the number of agents in the network. It provides improved performance by utilizing a delay in the estimation update step at the cost of more communication.

Title: Not All Negatives AreWorth Attending to: Meta-Bootstrapping Negative Sampling Framework for Link Prediction. (arXiv:2312.04815v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04815
Code URL: null
Copy Paste: [[2312.04815]] Not All Negatives AreWorth Attending to: Meta-Bootstrapping Negative Sampling Framework for Link Prediction(http://arxiv.org/abs/2312.04815)
Summary:
The rapid development of graph neural networks (GNNs) encourages the rising of link prediction, achieving promising performance with various applications. Unfortunately, through a comprehensive analysis, we surprisingly find that current link predictors with dynamic negative samplers (DNSs) suffer from the migration phenomenon between "easy" and "hard" samples, which goes against the preference of DNS of choosing "hard" negatives, thus severely hindering capability. Towards this end, we propose the MeBNS framework, serving as a general plugin that can potentially improve current negative sampling based link predictors. In particular, we elaborately devise a Meta-learning Supported Teacher-student GNN (MST-GNN) that is not only built upon teacher-student architecture for alleviating the migration between "easy" and "hard" samples but also equipped with a meta learning based sample re-weighting module for helping the student GNN distinguish "hard" samples in a fine-grained manner. To effectively guide the learning of MST-GNN, we prepare a Structure enhanced Training Data Generator (STD-Generator) and an Uncertainty based Meta Data Collector (UMD-Collector) for supporting the teacher and student GNN, respectively. Extensive experiments show that the MeBNS achieves remarkable performance across six link prediction benchmark datasets.

Title: Neural Spectral Methods: Self-supervised learning in the spectral domain. (arXiv:2312.05225v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.05225
Code URL: null
Copy Paste: [[2312.05225]] Neural Spectral Methods: Self-supervised learning in the spectral domain(http://arxiv.org/abs/2312.05225)
Summary:
We present Neural Spectral Methods, a technique to solve parametric Partial Differential Equations (PDEs), grounded in classical spectral methods. Our method uses orthogonal bases to learn PDE solutions as mappings between spectral coefficients. In contrast to current machine learning approaches which enforce PDE constraints by minimizing the numerical quadrature of the residuals in the spatiotemporal domain, we leverage Parseval's identity and introduce a new training strategy through a \textit{spectral loss}. Our spectral loss enables more efficient differentiation through the neural network, and substantially reduces training complexity. At inference time, the computational cost of our method remains constant, regardless of the spatiotemporal resolution of the domain. Our experimental results demonstrate that our method significantly outperforms previous machine learning approaches in terms of speed and accuracy by one to two orders of magnitude on multiple different problems. When compared to numerical solvers of the same accuracy, our method demonstrates a $10\times$ increase in performance speed.

Title: Modeling Risk in Reinforcement Learning: A Literature Mapping. (arXiv:2312.05231v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.05231
Code URL: null
Copy Paste: [[2312.05231]] Modeling Risk in Reinforcement Learning: A Literature Mapping(http://arxiv.org/abs/2312.05231)
Summary:
Safe reinforcement learning deals with mitigating or avoiding unsafe situations by reinforcement learning (RL) agents. Safe RL approaches are based on specific risk representations for particular problems or domains. In order to analyze agent behaviors, compare safe RL approaches, and effectively transfer techniques between application domains, it is necessary to understand the types of risk specific to safe RL problems. We performed a systematic literature mapping with the objective to characterize risk in safe RL. Based on the obtained results, we present definitions, characteristics, and types of risk that hold on multiple application domains. Our literature mapping covers literature from the last 5 years (2017-2022), from a variety of knowledge areas (AI, finance, engineering, medicine) where RL approaches emphasize risk representation and management. Our mapping covers 72 papers filtered systematically from over thousands of papers on the topic. Our proposed notion of risk covers a variety of representations, disciplinary differences, common training exercises, and types of techniques. We encourage researchers to include explicit and detailed accounts of risk in future safe RL research reports, using this mapping as a starting point. With this information, researchers and practitioners could draw stronger conclusions on the effectiveness of techniques on different problems.

multi-run

chain-of-thought

Title: Latent Skill Discovery for Chain-of-Thought Reasoning. (arXiv:2312.04684v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04684
Code URL: null
Copy Paste: [[2312.04684]] Latent Skill Discovery for Chain-of-Thought Reasoning(http://arxiv.org/abs/2312.04684)
Summary:
Recent advances in Large Language Models (LLMs) have led to an emergent ability of chain-of-thought (CoT) prompting, a prompt reasoning strategy that adds intermediate rationale steps between questions and answers to construct prompts. Conditioned on these prompts, LLMs can effectively learn in context to generate rationales that lead to more accurate answers than when answering the same question directly. To design LLM prompts, one important setting, called demonstration selection, considers selecting demonstrations from an example bank. Existing methods use various heuristics for this selection, but for CoT prompting, which involves unique rationales, it is essential to base the selection upon the intrinsic skills that CoT rationales need, for instance, the skills of addition or subtraction for math word problems.

To address this requirement, we introduce a novel approach named Reasoning Skill Discovery (RSD) that use unsupervised learning to create a latent space representation of rationales, called a reasoning skill. Simultaneously, RSD learns a reasoning policy to determine the required reasoning skill for a given question. This can then guide the selection of examples that demonstrate the required reasoning skills. Our approach offers several desirable properties: it is (1) theoretically grounded, (2) sample-efficient, requiring no LLM inference or manual prompt design, and (3) LLM-agnostic. Empirically, RSD outperforms existing methods by up to 6% in terms of the answer accuracy across multiple reasoning tasks.

language model

Title: Towards a Psychological Generalist AI: A Survey of Current Applications of Large Language Models and Future Prospects. (arXiv:2312.04578v1 [cs.AI])

Title: TOD-Flow: Modeling the Structure of Task-Oriented Dialogues. (arXiv:2312.04668v1 [cs.CL])

Title: Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models. (arXiv:2312.04691v1 [cs.CL])

Title: Efficient Large Language Models Fine-Tuning On Graphs. (arXiv:2312.04737v1 [cs.LG])

Title: HuRef: HUman-REadable Fingerprint for Large Language Models. (arXiv:2312.04828v1 [cs.CL])

Title: Localized Symbolic Knowledge Distillation for Visual Commonsense Models. (arXiv:2312.04837v1 [cs.AI])

Title: KwaiAgents: Generalized Information-seeking Agent System with Large Language Models. (arXiv:2312.04889v1 [cs.AI])

Title: EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism. (arXiv:2312.04916v1 [cs.LG])

Title: The ICL Consistency Test. (arXiv:2312.04945v1 [cs.CL])

Title: Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning. (arXiv:2312.05230v1 [cs.AI])

Title: PyThaiNLP: Thai Natural Language Processing in Python. (arXiv:2312.04649v1 [cs.CL])

Title: How to Determine the Most Powerful Pre-trained Language Model without Brute Force Fine-tuning? An Empirical Survey. (arXiv:2312.04775v1 [cs.CL])

Title: Ophtha-LLaMA2: A Large Language Model for Ophthalmology. (arXiv:2312.04906v1 [cs.CL])

Title: Zoology: Measuring and Improving Recall in Efficient Language Models. (arXiv:2312.04927v1 [cs.CL])

Title: PathFinder: Guided Search over Multi-Step Reasoning Paths. (arXiv:2312.05180v1 [cs.CL])

gpt

Title: From Big to Small Without Losing It All: Text Augmentation with ChatGPT for Efficient Sentiment Analysis. (arXiv:2312.04720v1 [cs.CL])

Title: On Sarcasm Detection with OpenAI GPT-based Models. (arXiv:2312.04642v1 [cs.CL])

Title: Seeing ChatGPT Through Universities' Policies, Resources and Guidelines. (arXiv:2312.05235v1 [cs.CL])

llm

Title: SparQ Attention: Bandwidth-Efficient LLM Inference. (arXiv:2312.04985v1 [cs.LG])

long context

lora

hallucination

Title: HALO: An Ontology for Representing Hallucinations in Generative Models. (arXiv:2312.05209v1 [cs.AI])

Title: DelucionQA: Detecting Hallucinations in Domain-specific Question Answering. (arXiv:2312.05200v1 [cs.CL])

prompt

Title: Improving Neural Machine Translation by Multi-Knowledge Integration with Prompting. (arXiv:2312.04807v1 [cs.CL])

Title: Boosting Prompt-Based Self-Training With Mapping-Free Automatic Verbalizer for Multi-Class Classification. (arXiv:2312.04982v1 [cs.CL])

code

Title: FREDSum: A Dialogue Summarization Corpus for French Political Debates. (arXiv:2312.04843v1 [cs.CL])

Title: DARLEI: Deep Accelerated Reinforcement Learning with Evolutionary Intelligence. (arXiv:2312.05171v1 [cs.AI])

Title: TaskMet: Task-Driven Metric Learning for Model Learning. (arXiv:2312.05250v1 [cs.LG])

Title: Converting Epics/Stories into Pseudocode using Transformers. (arXiv:2312.05047v1 [cs.CL])

Title: Transferable Candidate Proposal with Bounded Uncertainty. (arXiv:2312.04604v1 [cs.LG])

Title: StructComp: Substituting propagation with Structural Compression in Training Graph Contrastive Learning. (arXiv:2312.04865v1 [cs.LG])

chat

retrieval augmented generation

rag

Title: Federated Learning for 6G: Paradigms, Taxonomy, Recent Advances and Insights. (arXiv:2312.04688v1 [cs.LG])

Title: Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning. (arXiv:2312.04736v1 [cs.CL])

Title: Train 'n Trade: Foundations of Parameter Markets. (arXiv:2312.04740v1 [cs.LG])

Title: The Graph Lottery Ticket Hypothesis: Finding Sparse, Informative Graph Structure. (arXiv:2312.04762v1 [cs.LG])

Title: Predictive Chemistry Augmented with Text Retrieval. (arXiv:2312.04881v1 [cs.CL])

Title: Generating Explanations to Understand and Repair Embedding-based Entity Alignment. (arXiv:2312.04877v1 [cs.CL])

Title: Seamless: Multilingual Expressive and Streaming Speech Translation. (arXiv:2312.05187v1 [cs.CL])

Title: Relational Deep Learning: Graph Representation Learning on Relational Databases. (arXiv:2312.04615v1 [cs.LG])

Title: PAC-Bayes Generalization Certificates for Learned Inductive Conformal Prediction. (arXiv:2312.04658v1 [cs.LG])

Title: Reverse Engineering Deep ReLU Networks An Optimization-based Algorithm. (arXiv:2312.04675v1 [cs.LG])

Title: Distributed Optimization via Kernelized Multi-armed Bandits. (arXiv:2312.04719v1 [cs.LG])

Title: Not All Negatives AreWorth Attending to: Meta-Bootstrapping Negative Sampling Framework for Link Prediction. (arXiv:2312.04815v1 [cs.LG])

Title: Neural Spectral Methods: Self-supervised learning in the spectral domain. (arXiv:2312.05225v1 [cs.LG])

Title: Modeling Risk in Reinforcement Learning: A Literature Mapping. (arXiv:2312.05231v1 [cs.LG])

multi-run

chain-of-thought

Title: Latent Skill Discovery for Chain-of-Thought Reasoning. (arXiv:2312.04684v1 [cs.CL])

tree-of-thought