2023-12-18

language model

Title: Self-Evaluation Improves Selective Generation in Large Language Models. (arXiv:2312.09300v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09300
Code URL: null
Copy Paste: [[2312.09300]] Self-Evaluation Improves Selective Generation in Large Language Models(http://arxiv.org/abs/2312.09300)
Summary:
Safe deployment of large language models (LLMs) may benefit from a reliable method for assessing their generated content to determine when to abstain or to selectively generate. While likelihood-based metrics such as perplexity are widely employed, recent research has demonstrated the limitations of using sequence-level probability estimates given by LLMs as reliable indicators of generation quality. Conversely, LLMs have demonstrated strong calibration at the token level, particularly when it comes to choosing correct answers in multiple-choice questions or evaluating true/false statements. In this work, we reformulate open-ended generation tasks into token-level prediction tasks, and leverage LLMs' superior calibration at the token level. We instruct an LLM to self-evaluate its answers, employing either a multi-way comparison or a point-wise evaluation approach, with the option to include a "None of the above" option to express the model's uncertainty explicitly. We benchmark a range of scoring methods based on self-evaluation and evaluate their performance in selective generation using TruthfulQA and TL;DR. Through experiments with PaLM-2 and GPT-3, we demonstrate that self-evaluation based scores not only improve accuracy, but also correlate better with the overall quality of generated content.
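
The abstention idea in the summary above can be illustrated with a minimal sketch. It is not the paper's scoring method: it assumes a self-evaluation score in [0, 1] is already available for each answer, and the names `select_or_abstain`, `coverage_and_accuracy`, and the 0.5 threshold are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple


def select_or_abstain(answer: str, score: float, threshold: float = 0.5) -> Optional[str]:
    """Return the answer when its score clears the threshold, otherwise abstain (None).

    `score` is assumed to be a self-evaluation confidence in [0, 1];
    how it is obtained from the model is outside this sketch.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return answer if score >= threshold else None


def coverage_and_accuracy(records: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Compute coverage (fraction answered) and accuracy on the answered subset.

    Each record is (score, is_correct) for one generated answer.
    """
    kept = [is_correct for score, is_correct in records if score >= threshold]
    coverage = len(kept) / len(records) if records else 0.0
    accuracy = sum(kept) / len(kept) if kept else 0.0
    return coverage, accuracy
```

Raising the threshold typically lowers coverage; whether accuracy on the answered subset rises depends on how well the score tracks correctness.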

Title: ArchiGuesser -- AI Art Architecture Educational Game. (arXiv:2312.09334v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09334
Code URL: null
Copy Paste: [[2312.09334]] ArchiGuesser -- AI Art Architecture Educational Game(http://arxiv.org/abs/2312.09334)
Summary:
The use of generative AI in education is a controversial topic. Current technology offers the potential to create educational content from text, speech, to images based on simple input prompts. This can enhance productivity by summarizing knowledge and improving communication, quickly adjusting to different types of learners. Moreover, generative AI holds the promise of making the learning itself more fun, by responding to user inputs and dynamically generating high-quality creative material. In this paper we present the multisensory educational game ArchiGuesser that combines various AI technologies from large language models, image generation, to computer vision to serve a single purpose: Teaching students in a playful way the diversity of our architectural history and how generative AI works.

Title: Large Language Models for Autonomous Driving: Real-World Experiments. (arXiv:2312.09397v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09397
Code URL: null
Copy Paste: [[2312.09397]] Large Language Models for Autonomous Driving: Real-World Experiments(http://arxiv.org/abs/2312.09397)
Summary:
Autonomous driving systems are increasingly popular in today's technological landscape, where vehicles with partial automation are already widely available on the market, and the full automation era with "driverless" capabilities is on the horizon. However, accurately understanding humans' commands, particularly for autonomous vehicles that have only passengers instead of drivers, and achieving a high level of personalization remain challenging tasks in the development of autonomous driving systems. In this paper, we introduce a Large Language Model (LLM)-based framework Talk-to-Drive (Talk2Drive) to process verbal commands from humans and make autonomous driving decisions with contextual information, satisfying their personalized preferences for safety, efficiency, and comfort. First, a speech recognition module is developed for Talk2Drive to interpret verbal inputs from humans to textual instructions, which are then sent to LLMs for reasoning. Then, appropriate commands for the Electrical Control Unit (ECU) are generated, achieving a 100% success rate in executing codes. Real-world experiments show that our framework can substantially reduce the takeover rate for a diverse range of drivers by up to 90.1%. To the best of our knowledge, Talk2Drive marks the first instance of employing an LLM-based system in a real-world autonomous driving environment.

Title: Clinical Text Deduplication Practices for Efficient Pretraining and Improved Clinical Tasks. (arXiv:2312.09469v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09469
Code URL: null
Copy Paste: [[2312.09469]] Clinical Text Deduplication Practices for Efficient Pretraining and Improved Clinical Tasks(http://arxiv.org/abs/2312.09469)
Summary:
Despite being a unique source of information on patients' status and disease progression, clinical notes are characterized by high levels of duplication and information redundancy. In general domain text, it has been shown that deduplication does not harm language model (LM) pretraining, thus helping reduce the training cost. Although large LMs have been shown to learn medical knowledge, they still require specialized domain adaptation for improved downstream clinical tasks. By leveraging large real-world clinical corpora, we first provided a fine-grained characterization of duplicates stemming from common writing practices and clinical relevancy. Second, we demonstrated that deduplicating clinical text can help clinical LMs encode less redundant information in a more efficient manner and does not harm classification tasks via prompt-based learning.

Title: Grounding for Artificial Intelligence. (arXiv:2312.09532v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09532
Code URL: null
Copy Paste: [[2312.09532]] Grounding for Artificial Intelligence(http://arxiv.org/abs/2312.09532)
Summary:
A core function of intelligence is grounding, which is the process of connecting the natural language and abstract knowledge to the internal representation of the real world in an intelligent being, e.g., a human. Human cognition is grounded in our sensorimotor experiences in the external world and subjective feelings in our internal world. We use languages to communicate with each other and the languages are grounded on our shared sensorimotor experiences and feelings. Without this shared grounding, it is impossible for us to understand each other because all natural languages are highly abstract and are only able to describe a tiny portion of what has happened or is happening in the real world. Although grounding at high or abstract levels has been studied in different fields and applications, to our knowledge, limited systematic work at fine-grained levels has been done. With the rapid progress of large language models (LLMs), it is imperative that we have a sound understanding of grounding in order to move to the next level of intelligence. It is also believed that grounding is necessary for Artificial General Intelligence (AGI). This paper makes an attempt to systematically study this problem.

Title: On a Functional Definition of Intelligence. (arXiv:2312.09546v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09546
Code URL: null
Copy Paste: [[2312.09546]] On a Functional Definition of Intelligence(http://arxiv.org/abs/2312.09546)
Summary:
Without an agreed-upon definition of intelligence, asking "is this system intelligent?" is an untestable question. This lack of consensus hinders research, and public perception, on Artificial Intelligence (AI), particularly since the rise of generative and large language models. Most work on precisely capturing what we mean by "intelligence" has come from the fields of philosophy, psychology, and cognitive science. Because these perspectives are intrinsically linked to intelligence as it is demonstrated by natural creatures, we argue such fields cannot, and will not, provide a sufficiently rigorous definition that can be applied to artificial means. Thus, we present an argument for a purely functional, black-box definition of intelligence, distinct from how that intelligence is actually achieved; focusing on the "what", rather than the "how". To achieve this, we first distinguish other related concepts (sentience, sensation, agency, etc.) from the notion of intelligence, particularly identifying how these concepts pertain to artificial intelligent systems. As a result, we achieve a formal definition of intelligence that is conceptually testable from only external observation, and that suggests intelligence is a continuous variable. We conclude by identifying challenges that still remain towards quantifiable measurement. This work provides a useful perspective for both the development of AI, and for public perception of the capabilities and risks of AI.

Title: Prompting Large Language Models for Topic Modeling. (arXiv:2312.09693v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09693
Code URL: null
Copy Paste: [[2312.09693]] Prompting Large Language Models for Topic Modeling(http://arxiv.org/abs/2312.09693)
Summary:
Topic modeling is a widely used technique for revealing underlying thematic structures within textual data. However, existing models have certain limitations, particularly when dealing with short text datasets that lack co-occurring words. Moreover, these models often neglect sentence-level semantics, focusing primarily on token-level semantics. In this paper, we propose PromptTopic, a novel topic modeling approach that harnesses the advanced language understanding of large language models (LLMs) to address these challenges. It involves extracting topics at the sentence level from individual documents, then aggregating and condensing these topics into a predefined quantity, ultimately providing coherent topics for texts of varying lengths. This approach eliminates the need for manual parameter tuning and improves the quality of extracted topics. We benchmark PromptTopic against the state-of-the-art baselines on three vastly diverse datasets, establishing its proficiency in discovering meaningful topics. Furthermore, qualitative analysis showcases PromptTopic's ability to uncover relevant topics in multiple datasets.

Title: Improving Biomedical Entity Linking with Retrieval-enhanced Learning. (arXiv:2312.09806v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09806
Code URL: https://github.com/lzxlin/knn-bioel
Copy Paste: [[2312.09806]] Improving Biomedical Entity Linking with Retrieval-enhanced Learning(http://arxiv.org/abs/2312.09806)
Summary:
Biomedical entity linking (BioEL) has achieved remarkable progress with the help of pre-trained language models. However, existing BioEL methods usually struggle to handle rare and difficult entities due to long-tailed distribution. To address this limitation, we introduce a new scheme $k$NN-BioEL, which provides a BioEL model with the ability to reference similar instances from the entire training corpus as clues for prediction, thus improving the generalization capabilities. Moreover, we design a contrastive learning objective with dynamic hard negative sampling (DHNS) that improves the quality of the retrieved neighbors during inference. Extensive experimental results show that $k$NN-BioEL outperforms state-of-the-art baselines on several datasets.

Title: SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models. (arXiv:2312.09818v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09818
Code URL: https://github.com/smile-data/smile
Copy Paste: [[2312.09818]] SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models(http://arxiv.org/abs/2312.09818)
Summary:
Despite the recent advances in artificial intelligence, building social intelligence remains a challenge. Among social signals, laughter is one of the distinctive expressions that occurs during social interactions between humans. In this work, we tackle a new challenge for machines to understand the rationale behind laughter in video, Video Laugh Reasoning. We introduce this new task to explain why people laugh in a particular video and a dataset for this task. Our proposed dataset, SMILE, comprises video clips and language descriptions of why people laugh. We propose a baseline by leveraging the reasoning capacity of large language models (LLMs) with textual video representation. Experiments show that our baseline can generate plausible explanations for laughter. We further investigate the scalability of our baseline by probing other video understanding tasks and in-the-wild videos. We release our dataset, code, and model checkpoints on https://github.com/SMILE-data/SMILE.

Title: Neurosymbolic Value-Inspired AI (Why, What, and How). (arXiv:2312.09928v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09928
Code URL: null
Copy Paste: [[2312.09928]] Neurosymbolic Value-Inspired AI (Why, What, and How)(http://arxiv.org/abs/2312.09928)
Summary:
The rapid progression of Artificial Intelligence (AI) systems, facilitated by the advent of Large Language Models (LLMs), has resulted in their widespread application to provide human assistance across diverse industries. This trend has sparked significant discourse centered around the ever-increasing need for LLM-based AI systems to function among humans as part of human society, sharing human values, especially as these systems are deployed in high-stakes settings (e.g., healthcare, autonomous driving, etc.). Towards this end, neurosymbolic AI systems are attractive due to their potential to enable easy-to-understand and interpretable interfaces for facilitating value-based decision-making, by leveraging explicit representations of shared values. In this paper, we introduce substantial extensions to Kahneman's System one/two framework and propose a neurosymbolic computational framework called Value-Inspired AI (VAI). It outlines the crucial components essential for the robust and practical implementation of VAI systems, aiming to represent and integrate various dimensions of human values. Finally, we further offer insights into the current progress made in this direction and outline potential future directions for the field.

Title: Distilling Large Language Models for Matching Patients to Clinical Trials. (arXiv:2312.09958v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09958
Code URL: null
Copy Paste: [[2312.09958]] Distilling Large Language Models for Matching Patients to Clinical Trials(http://arxiv.org/abs/2312.09958)
Summary:
The recent success of large language models (LLMs) has paved the way for their adoption in the high-stakes domain of healthcare. Specifically, the application of LLMs in patient-trial matching, which involves assessing patient eligibility against clinical trials' nuanced inclusion and exclusion criteria, has shown promise. Recent research has shown that GPT-3.5, a widely recognized LLM developed by OpenAI, can outperform existing methods with minimal 'variable engineering' by simply comparing clinical trial information against patient summaries. However, there are significant challenges associated with using closed-source proprietary LLMs like GPT-3.5 in practical healthcare applications, such as cost, privacy and reproducibility concerns. To address these issues, this study presents the first systematic examination of the efficacy of both proprietary (GPT-3.5 and GPT-4) and open-source LLMs (LLAMA 7B, 13B, and 70B) for the task of patient-trial matching. Employing a multifaceted evaluation framework, we conducted extensive automated and human-centric assessments coupled with a detailed error analysis for each model. To enhance the adaptability of open-source LLMs, we have created a specialized synthetic dataset utilizing GPT-4, enabling effective fine-tuning under constrained data conditions. Our findings reveal that open-source LLMs, when fine-tuned on this limited and synthetic dataset, demonstrate performance parity with their proprietary counterparts. This presents a massive opportunity for their deployment in real-world healthcare applications. To foster further research and applications in this field, we release both the annotated evaluation dataset and the fine-tuned LLM -- Trial-LLAMA -- for public use.

Title: Data and Approaches for German Text simplification -- towards an Accessibility-enhanced Communication. (arXiv:2312.09966v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09966
Code URL: null
Copy Paste: [[2312.09966]] Data and Approaches for German Text simplification -- towards an Accessibility-enhanced Communication(http://arxiv.org/abs/2312.09966)
Summary:
This paper examines the current state-of-the-art of German text simplification, focusing on parallel and monolingual German corpora. It reviews neural language models for simplifying German texts and assesses their suitability for legal texts and accessibility requirements. Our findings highlight the need for additional training data and more appropriate approaches that consider the specific linguistic characteristics of German, as well as the importance of the needs and preferences of target groups with cognitive or language impairments. The authors launched the interdisciplinary OPEN-LS project in April 2023 to address these research gaps. The project aims to develop a framework for text formats tailored to individuals with low literacy levels, integrate legal texts, and enhance comprehensibility for those with linguistic or cognitive impairments. It will also explore cost-effective ways to enhance the data with audience-specific illustrations using image-generating AI.

For more and up-to-date information, please visit our project homepage https://open-ls.entavis.com

Title: Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision. (arXiv:2312.09390v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09390
Code URL: null
Copy Paste: [[2312.09390]] Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision(http://arxiv.org/abs/2312.09390)
Summary:
Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior - for example, to evaluate whether a model faithfully followed instructions or generated safe outputs. However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to weakly supervise superhuman models. We study an analogy to this problem: can weak model supervision elicit the full capabilities of a much stronger model? We test this using a range of pretrained language models in the GPT-4 family on natural language processing (NLP), chess, and reward modeling tasks. We find that when we naively finetune strong pretrained models on labels generated by a weak model, they consistently perform better than their weak supervisors, a phenomenon we call weak-to-strong generalization. However, we are still far from recovering the full capabilities of strong models with naive finetuning alone, suggesting that techniques like RLHF may scale poorly to superhuman models without further work. We find that simple methods can often significantly improve weak-to-strong generalization: for example, when finetuning GPT-4 with a GPT-2-level supervisor and an auxiliary confidence loss, we can recover close to GPT-3.5-level performance on NLP tasks. Our results suggest that it is feasible to make empirical progress today on a fundamental challenge of aligning superhuman models.

Title: Marathon: A Race Through the Realm of Long Context with Large Language Models. (arXiv:2312.09542v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09542
Code URL: null
Copy Paste: [[2312.09542]] Marathon: A Race Through the Realm of Long Context with Large Language Models(http://arxiv.org/abs/2312.09542)
Summary:
Although there are currently many benchmarks available for evaluating the long context understanding and reasoning capability of large language models, with the expansion of the context window in these models, the existing long context benchmarks are no longer sufficient for this purpose. In this paper, we have developed a fresh long context evaluation benchmark, which we name Marathon, in the form of multiple-choice questions, inspired by benchmarks such as MMLU, for assessing the long context comprehension capability of large language models quickly, accurately, and objectively. We have evaluated several of the latest and most popular large language models, as well as three recent and effective long context optimization methods, on our benchmark. This showcases the long context reasoning and comprehension capabilities of these large language models and validates the effectiveness of these optimization methods. Marathon is available at https://huggingface.co/datasets/Lemoncoke/Marathon.

Title: Extending Context Window of Large Language Models via Semantic Compression. (arXiv:2312.09571v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09571
Code URL: null
Copy Paste: [[2312.09571]] Extending Context Window of Large Language Models via Semantic Compression(http://arxiv.org/abs/2312.09571)
Summary:
Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long texts. We propose a novel semantic compression method that enables generalization to texts that are 6-8 times longer, without incurring significant computational costs or requiring fine-tuning. Our proposed framework draws inspiration from source coding in information theory and employs a pre-trained model to reduce the semantic redundancy of long inputs before passing them to the LLMs for downstream tasks. Experimental results demonstrate that our method effectively extends the context window of LLMs across a range of tasks including question answering, summarization, few-shot learning, and information retrieval. Furthermore, the proposed semantic compression method exhibits consistent fluency in text generation while reducing the associated computational overhead.

Title: Probing Pretrained Language Models with Hierarchy Properties. (arXiv:2312.09670v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09670
Code URL: null
Copy Paste: [[2312.09670]] Probing Pretrained Language Models with Hierarchy Properties(http://arxiv.org/abs/2312.09670)
Summary:
Since Pretrained Language Models (PLMs) are the cornerstone of the most recent Information Retrieval (IR) models, the way they encode semantic knowledge is particularly important. However, little attention has been given to studying the PLMs' capability to capture hierarchical semantic knowledge. Traditionally, evaluating such knowledge encoded in PLMs relies on their performance on a task-dependent evaluation approach based on proxy tasks, such as hypernymy detection. Unfortunately, this approach potentially ignores other implicit and complex taxonomic relations. In this work, we propose a task-agnostic evaluation method able to evaluate to what extent PLMs can capture complex taxonomy relations, such as ancestors and siblings. The evaluation is based on intrinsic properties that capture the hierarchical nature of taxonomies. Our experimental evaluation shows that the lexico-semantic knowledge implicitly encoded in PLMs does not always capture hierarchical relations. We further demonstrate that the proposed properties can be injected into PLMs to improve their understanding of hierarchy. Through evaluations on taxonomy reconstruction, hypernym discovery and reading comprehension tasks, we show that the knowledge about hierarchy is moderately but not systematically transferable across tasks.

Title: RJUA-QA: A Comprehensive QA Dataset for Urology. (arXiv:2312.09785v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09785
Code URL: null
Copy Paste: [[2312.09785]] RJUA-QA: A Comprehensive QA Dataset for Urology(http://arxiv.org/abs/2312.09785)
Summary:
We introduce RJUA-QA, a novel medical dataset for question answering (QA) and reasoning with clinical evidence, helping to bridge the gap between general large language models (LLMs) and medical-specific LLM applications. RJUA-QA is derived from realistic clinical scenarios and aims to help LLMs generate reliable diagnoses and advice. The dataset contains 2,132 curated Question-Context-Answer pairs, corresponding to about 25,000 diagnostic records and clinical cases. The dataset covers 67 common urological disease categories, where the disease coverage exceeds 97.6% of the population seeking medical services in urology. Each data instance in RJUA-QA comprises: (1) a question mirroring a real patient's inquiry about clinical symptoms and medical conditions, (2) a context including comprehensive expert knowledge, serving as a reference for medical examination and diagnosis, (3) a doctor response offering the diagnostic conclusion and suggested examination guidance, (4) a diagnosed clinical disease as the recommended diagnostic outcome, and (5) clinical advice providing recommendations for medical examination. RJUA-QA is the first medical QA dataset for clinical reasoning over patient inquiries, where expert-level knowledge and experience are required for yielding diagnostic conclusions and medical examination advice. A comprehensive evaluation is conducted to assess the performance of both medical-specific and general LLMs on the RJUA-QA dataset.

Title: ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs). (arXiv:2312.09801v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09801
Code URL: null
Copy Paste: [[2312.09801]] ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs)(http://arxiv.org/abs/2312.09801)
Summary:
We introduce a novel writing method called Probing Chain of Thought (ProCoT), which prevents students from cheating using a Large Language Model (LLM), such as ChatGPT, while enhancing their active learning through such models. LLMs have disrupted education and many other fields. For fear of students cheating, many educationists have resorted to banning their use, as their outputs can be human-like and hard to detect in some cases. These LLMs are also known for hallucinations (i.e. fake facts). We conduct studies with ProCoT in two different courses with a combined total of about 66 students. The students in each course were asked to prompt an LLM of their choice with one question from a set of four and required to affirm or refute statements in the LLM output by using peer-reviewed references. The results show two things: (1) ProCoT stimulates creative/critical thinking and writing of students through engagement with LLMs when we compare the LLM-only output to ProCoT output and (2) ProCoT can prevent cheating because of clear limitations in existing LLMs when we compare students' ProCoT output to LLM ProCoT output. We also discover that most students prefer to give answers in fewer words than LLMs, which are typically verbose. The average word counts for students, ChatGPT (v3.5) and Phind (v8) are 208, 391 and 383, respectively.

Title: Grammatical information in BERT sentence embeddings as two-dimensional arrays. (arXiv:2312.09890v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09890
Code URL: https://github.com/clcl-geneva/blm-snfdisentangling
Copy Paste: [[2312.09890]] Grammatical information in BERT sentence embeddings as two-dimensional arrays(http://arxiv.org/abs/2312.09890)
Summary:
Sentence embeddings induced with various transformer architectures encode much semantic and syntactic information in a distributed manner in a one-dimensional array. We investigate whether specific grammatical information can be accessed in these distributed representations. Using data from a task developed to test rule-like generalizations, our experiments on detecting subject-verb agreement yield several promising results. First, we show that while the usual sentence representations encoded as one-dimensional arrays do not easily support extraction of rule-like regularities, a two-dimensional reshaping of these vectors allows various learning architectures to access such information. Next, we show that various architectures can detect patterns in these two-dimensional reshaped sentence embeddings and successfully learn a model based on smaller amounts of simpler training data, which performs well on more complex test data. This indicates that current sentence embeddings contain information that is regularly distributed, and which can be captured when the embeddings are reshaped into higher dimensional arrays. Our results cast light on representations produced by language models and help move towards developing few-shot learning approaches.
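
The reshaping step mentioned in the summary above can be sketched as follows. This is only an illustration of turning a one-dimensional embedding into a two-dimensional array; it is not the authors' code. The helper name `reshape_2d`, the row-major layout, and the 24 x 32 target shape are assumptions, and 768 is used because it is the hidden size of BERT-base.

```python
from typing import List, Sequence


def reshape_2d(vec: Sequence[float], rows: int, cols: int) -> List[List[float]]:
    """Reshape a flat vector into a rows x cols array in row-major order.

    Raises ValueError when the requested shape does not match the vector length.
    """
    if rows <= 0 or cols <= 0 or rows * cols != len(vec):
        raise ValueError("rows * cols must equal len(vec)")
    # Slice consecutive chunks of length `cols`; each chunk becomes one row.
    return [list(vec[r * cols:(r + 1) * cols]) for r in range(rows)]


# A stand-in for a 768-dimensional sentence embedding (values are dummy data).
embedding = [float(i) for i in range(768)]
grid = reshape_2d(embedding, 24, 32)  # hypothetical target shape
```

The resulting grid could then be passed to any model that expects two-dimensional input; which shapes work best is an empirical question this sketch does not address.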

Title: Generative Context-aware Fine-tuning of Self-supervised Speech Models. (arXiv:2312.09895v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09895
Code URL: null
Copy Paste: [[2312.09895]] Generative Context-aware Fine-tuning of Self-supervised Speech Models(http://arxiv.org/abs/2312.09895)
Summary:
When performing tasks like automatic speech recognition or spoken language understanding for a given utterance, access to preceding text or audio provides contextual information that can improve performance. Considering the recent advances in generative large language models (LLMs), we hypothesize that an LLM could generate useful context information using the preceding text. With appropriate prompts, an LLM could generate a prediction of the next sentence or abstractive text like titles or topics. In this paper, we study the use of LLM-generated context information and propose an approach to distill the generated information during fine-tuning of self-supervised speech models, which we refer to as generative context-aware fine-tuning. This approach allows the fine-tuned model to make improved predictions without access to the true surrounding segments or to the LLM at inference time, while requiring only a very small additional context module. We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis. The results show that generative context-aware fine-tuning outperforms a context injection fine-tuning approach that accesses the ground-truth previous text, and is competitive with a generative context injection fine-tuning approach that requires the LLM at inference time.

Title: The Art of Balancing: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment. (arXiv:2312.09979v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09979
Code URL: null
Copy Paste: [[2312.09979]] The Art of Balancing: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment(http://arxiv.org/abs/2312.09979)
Summary:
Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. When the models are required to align with a broader range of downstream tasks, or there is a desire to notably improve the performance on a specific task, a substantial increase in fine-tuning data often emerges as the solution. However, we find that large-scale increases in instruction data can disrupt the world knowledge previously stored in the LLMs, i.e., world knowledge forgetting. In this paper, we introduce LoRAMoE to address this challenge. LoRAMoE is a plugin version of Mixture of Experts (MoE). The plugin form ensures the integrity of world knowledge by freezing the backbone model during the training phase. We also propose the use of localized balancing constraints to coordinate some of the experts for task utilization, while enabling other experts to fully leverage the world knowledge stored in the models. Experimental results demonstrate that LoRAMoE can reasonably coordinate experts based on data type during inference, and even dramatically increasing instruction data does not result in knowledge forgetting. Moreover, LoRAMoE provides additional benefits for the performance of downstream tasks, indicating the potential of our approach for multi-task learning.

Title: LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language. (arXiv:2312.09993v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09993
Code URL: null
Copy Paste: [[2312.09993]] LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language(http://arxiv.org/abs/2312.09993)
Summary:
Large Language Models represent state-of-the-art linguistic models designed to equip computers with the ability to comprehend natural language. With its exceptional capacity to capture complex contextual relationships, the LLaMA (Large Language Model Meta AI) family represents a novel advancement in the field of natural language processing by releasing foundational models designed to improve the natural language understanding abilities of the transformer architecture thanks to their large amount of trainable parameters (7, 13, and 70 billion parameters). In many natural language understanding tasks, these models obtain the same performances as private company models such as OpenAI Chat-GPT with the advantage to make publicly available weights and code for research and commercial uses. In this work, we investigate the possibility of Language Adaptation for LLaMA models, explicitly focusing on addressing the challenge of Italian Language coverage. Adopting an open science approach, we explore various tuning approaches to ensure a high-quality text generated in Italian suitable for common tasks in this underrepresented language in the original models' datasets. We aim to release effective text generation models with strong linguistic properties for many tasks that seem challenging using multilingual or general-purpose LLMs. By leveraging an open science philosophy, this study contributes to Language Adaptation strategies for the Italian language by introducing the novel LLaMAntino family of Italian LLMs.

Title: Faithful Persona-based Conversational Dataset Generation with Large Language Models. (arXiv:2312.10007v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.10007
Code URL: https://github.com/google-research-datasets/Synthetic-Persona-Chat
Copy Paste: [[2312.10007]] Faithful Persona-based Conversational Dataset Generation with Large Language Models(http://arxiv.org/abs/2312.10007)
Summary:
High-quality conversational datasets are essential for developing AI models that can communicate with users. One way to foster deeper interactions between a chatbot and its user is through personas, aspects of the user's character that provide insights into their personality, motivations, and behaviors. Training Natural Language Processing (NLP) models on a diverse and comprehensive persona-based dataset can lead to conversational models that create a deeper connection with the user, and maintain their engagement. In this paper, we leverage the power of Large Language Models (LLMs) to create a large, high-quality conversational dataset from a seed dataset. We propose a Generator-Critic architecture framework to expand the initial dataset, while improving the quality of its conversations. The Generator is an LLM prompted to output conversations. The Critic consists of a mixture of expert LLMs that control the quality of the generated conversations. These experts select the best generated conversations, which we then use to improve the Generator. We release Synthetic-Persona-Chat, consisting of 20k conversations seeded from Persona-Chat. We evaluate the quality of Synthetic-Persona-Chat and our generation framework on different dimensions through extensive experiments, and observe that the losing rate of Synthetic-Persona-Chat against Persona-Chat during Turing test decreases from 17.2% to 8.8% over three iterations.

gpt

Title: Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM. (arXiv:2312.09366v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09366
Code URL: https://github.com/mbzuai-oryx/climategpt
Copy Paste: [[2312.09366]] Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM(http://arxiv.org/abs/2312.09366)
Summary:
Climate change is one of the most significant challenges we face together as a society. Creating awareness and educating policy makers about the wide-ranging impact of climate change is an essential step towards a sustainable future. Recently, Large Language Models (LLMs) like ChatGPT and Bard have shown impressive conversational abilities and excel in a wide variety of NLP tasks. While these models are closed-source, alternative open-source LLMs such as Stanford Alpaca and Vicuna have recently shown promising results. However, these open-source models are not specifically tailored for climate-related, domain-specific information and also struggle to generate meaningful responses in other languages such as Arabic. To this end, we propose a light-weight Arabic Mini-ClimateGPT that is built on an open-source LLM and is specifically fine-tuned on Clima500-Instruct, a curated Arabic conversational-style instruction-tuning dataset with over 500k instructions about climate change and sustainability. Further, our model also utilizes a vector embedding based retrieval mechanism during inference. We validate our proposed model through quantitative and qualitative evaluations on climate-related queries. Our model surpasses the baseline LLM in 88.3% of cases during ChatGPT-based evaluation. Furthermore, our human expert evaluation reveals an 81.6% preference for our model's responses over multiple popular open-source models. Our open-source demos, code-base and models are available at https://github.com/mbzuai-oryx/ClimateGPT.

Title: GPT-4 Surpassing Human Performance in Linguistic Pragmatics. (arXiv:2312.09545v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09545
Code URL: null
Copy Paste: [[2312.09545]] GPT-4 Surpassing Human Performance in Linguistic Pragmatics(http://arxiv.org/abs/2312.09545)
Summary:
As Large Language Models (LLMs) become increasingly integrated into everyday life, their capabilities to understand and emulate human cognition are under steady examination. This study investigates the ability of LLMs to comprehend and interpret linguistic pragmatics, an aspect of communication that considers context and implied meanings. Using Grice's communication principles, LLMs and human subjects (N=76) were evaluated based on their responses to various dialogue-based tasks. The findings revealed the superior performance and speed of LLMs, particularly GPT4, over human subjects in interpreting pragmatics. GPT4 also demonstrated accuracy in the pre-testing of human-written samples, indicating its potential in text analysis. In a comparative analysis of LLMs using human individual and average scores, the models exhibited significant chronological improvement. The models were ranked from lowest to highest score, with GPT2 positioned at 78th place, GPT3 ranking at 23rd, Bard at 10th, GPT3.5 placing 5th, Best Human scoring 2nd, and GPT4 achieving the top spot. The findings highlight the remarkable progress made in the development and performance of these LLMs. Future studies should consider diverse subjects, multiple languages, and other cognitive aspects to fully comprehend the capabilities of LLMs. This research holds significant implications for the development and application of AI-based models in communication-centered sectors.

Title: 3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V. (arXiv:2312.09738v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09738
Code URL: null
Copy Paste: [[2312.09738]] 3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V(http://arxiv.org/abs/2312.09738)
Summary:
In this work, we present a new visual prompting method called 3DAxiesPrompts (3DAP) to unleash the capabilities of GPT-4V in performing 3D spatial tasks. Our investigation reveals that while GPT-4V exhibits proficiency in discerning the position and interrelations of 2D entities through current visual prompting techniques, its abilities in handling 3D spatial tasks have yet to be explored. In our approach, we create a 3D coordinate system tailored to 3D imagery, complete with annotated scale information. By presenting images infused with the 3DAP visual prompt as inputs, we empower GPT-4V to ascertain the spatial positioning information of the given 3D target image with a high degree of precision. Through experiments, we identified three tasks that could be stably completed using the 3DAP method, namely 2D to 3D Point Reconstruction, 2D to 3D Point Matching, and 3D Object Detection. We perform experiments on our proposed dataset 3DAP-Data; the results from these experiments validate the efficacy of 3DAP-enhanced GPT-4V inputs, marking a significant stride in 3D spatial task execution.

Title: A Novel Dataset for Financial Education Text Simplification in Spanish. (arXiv:2312.09897v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09897
Code URL: null
Copy Paste: [[2312.09897]] A Novel Dataset for Financial Education Text Simplification in Spanish(http://arxiv.org/abs/2312.09897)
Summary:
Text simplification, crucial in natural language processing, aims to make texts more comprehensible, particularly for specific groups such as visually impaired speakers of Spanish, a less-represented language in this field. In Spanish, there are few datasets that can be used to create text simplification systems. The primary objective of our research is to develop a Spanish financial text simplification dataset. We created a dataset with 5,314 complex and simplified sentence pairs using established simplification rules. We also compared our dataset with the simplifications generated from GPT-3, Tuner, and MT5, in order to evaluate the feasibility of data augmentation using these systems. In this manuscript we present the characteristics of our dataset and the findings of the comparisons with other systems. The dataset is available at Hugging Face, saul1917/FEINA.

Title: Red AI? Inconsistent Responses from GPT3.5 Models on Political Issues in the US and China. (arXiv:2312.09917v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09917
Code URL: null
Copy Paste: [[2312.09917]] Red AI? Inconsistent Responses from GPT3.5 Models on Political Issues in the US and China(http://arxiv.org/abs/2312.09917)
Summary:
The rising popularity of ChatGPT and other AI-powered large language models (LLMs) has led to increasing studies highlighting their susceptibility to mistakes and biases. However, most of these studies focus on models trained on English texts. Taking an innovative approach, this study investigates political biases in GPT's multilingual models. We posed the same question about high-profile political issues in the United States and China to GPT in both English and simplified Chinese, and our analysis of the bilingual responses revealed that GPT's bilingual models' political "knowledge" (content) and the political "attitude" (sentiment) are significantly more inconsistent on political issues in China. The simplified Chinese GPT models not only tended to provide pro-China information but also presented the least negative sentiment towards China's problems, whereas the English GPT was significantly more negative towards China. This disparity may stem from Chinese state censorship and US-China geopolitical tensions, which influence the training corpora of GPT bilingual models. Moreover, both Chinese and English models tended to be less critical towards the issues of "their own" represented by the language used, than the issues of "the other." This suggests that GPT multilingual models could potentially develop a "political identity" and an associated sentiment bias based on their training language. We discussed the implications of our findings for information transmission and communication in an increasingly divided world.

llm

Title: Challenges with unsupervised LLM knowledge discovery. (arXiv:2312.10029v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.10029
Code URL: null
Copy Paste: [[2312.10029]] Challenges with unsupervised LLM knowledge discovery(http://arxiv.org/abs/2312.10029)
Summary:
We show that existing unsupervised methods on large language model (LLM) activations do not discover knowledge -- instead they seem to discover whatever feature of the activations is most prominent. The idea behind unsupervised knowledge elicitation is that knowledge satisfies a consistency structure, which can be used to discover knowledge. We first prove theoretically that arbitrary features (not just knowledge) satisfy the consistency structure of a particular leading unsupervised knowledge-elicitation method, contrast-consistent search (Burns et al. - arXiv:2212.03827). We then present a series of experiments showing settings in which unsupervised methods result in classifiers that do not predict knowledge, but instead predict a different prominent feature. We conclude that existing unsupervised methods for discovering latent knowledge are insufficient, and we contribute sanity checks to apply to evaluating future knowledge elicitation methods. Conceptually, we hypothesise that the identification issues explored here, e.g. distinguishing a model's knowledge from that of a simulated character's, will persist for future unsupervised methods.

Title: ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent. (arXiv:2312.10003v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.10003
Code URL: null
Copy Paste: [[2312.10003]] ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent(http://arxiv.org/abs/2312.10003)
Summary:
Answering complex natural language questions often necessitates multi-step reasoning and integrating external information. Several systems have combined knowledge retrieval with a large language model (LLM) to answer such questions. These systems, however, suffer from various failure cases, and we cannot directly train them end-to-end to fix such failures, as interaction with external knowledge is non-differentiable. To address these deficiencies, we define a ReAct-style LLM agent with the ability to reason and act upon external knowledge. We further refine the agent through a ReST-like method that iteratively trains on previous trajectories, employing growing-batch reinforcement learning with AI feedback for continuous self-improvement and self-distillation. Starting from a prompted large model and after just two iterations of the algorithm, we can produce a fine-tuned small model that achieves comparable performance on challenging compositional question-answering benchmarks with two orders of magnitude fewer parameters.

long context

lora

Title: Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning. (arXiv:2312.09539v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09539
Code URL: null
Copy Paste: [[2312.09539]] Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning(http://arxiv.org/abs/2312.09539)
Summary:
Learning to collaborate has witnessed significant progress in multi-agent reinforcement learning (MARL). However, promoting coordination among agents and enhancing exploration capabilities remain challenges. In multi-agent environments, interactions between agents are limited in specific situations. Effective collaboration between agents thus requires a nuanced understanding of when and how agents' actions influence others. To this end, in this paper, we propose a novel MARL algorithm named Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning (SCIC), which incorporates a novel Intrinsic reward mechanism based on a new cooperation criterion measured by situation-dependent causal influence among agents. Our approach aims to detect inter-agent causal influences in specific situations based on the criterion using causal intervention and conditional mutual information. This effectively assists agents in exploring states that can positively impact other agents, thus promoting cooperation between agents. The resulting update links coordinated exploration and intrinsic reward distribution, which enhance overall collaboration and performance. Experimental results on various MARL benchmarks demonstrate the superiority of our method compared to state-of-the-art approaches.

Title: Peer Learning: Learning Complex Policies in Groups from Scratch via Action Recommendations. (arXiv:2312.09950v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09950
Code URL: null
Copy Paste: [[2312.09950]] Peer Learning: Learning Complex Policies in Groups from Scratch via Action Recommendations(http://arxiv.org/abs/2312.09950)
Summary:
Peer learning is a novel high-level reinforcement learning framework for agents learning in groups. While standard reinforcement learning trains an individual agent in trial-and-error fashion, all on its own, peer learning addresses a related setting in which a group of agents, i.e., peers, learns to master a task simultaneously together from scratch. Peers are allowed to communicate only about their own states and actions recommended by others: "What would you do in my situation?". Our motivation is to study the learning behavior of these agents. We formalize the teacher selection process in the action advice setting as a multi-armed bandit problem and therefore highlight the need for exploration. Eventually, we analyze the learning behavior of the peers and observe their ability to rank the agents' performance within the study group and understand which agents give reliable advice. Further, we compare peer learning with single agent learning and a state-of-the-art action advice baseline. We show that peer learning is able to outperform single-agent learning and the baseline in several challenging discrete and continuous OpenAI Gym domains. Doing so, we also show that within such a framework complex policies from action recommendations beyond discrete action spaces can evolve.
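
The framing of teacher selection as a multi-armed bandit can be illustrated with a small Python sketch. The summary does not say which bandit algorithm the authors use, so this example substitutes plain epsilon-greedy; the class name TeacherBandit, the reward signal, and the default epsilon are all assumptions made for illustration.

```python
import random


class TeacherBandit:
    """Hypothetical epsilon-greedy bandit: each arm is a peer to ask for advice."""

    def __init__(self, n_peers, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_peers    # how often each peer's advice was scored
        self.values = [0.0] * n_peers  # running mean reward of each peer's advice
        self.rng = random.Random(seed)

    def select(self):
        """Pick a peer: explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return self.values.index(max(self.values))

    def update(self, peer, reward):
        """Fold the observed reward into the running mean for the chosen peer."""
        self.counts[peer] += 1
        self.values[peer] += (reward - self.values[peer]) / self.counts[peer]


# Usage: ask a peer, act on its recommendation, then score how well that went.
bandit = TeacherBandit(n_peers=3, epsilon=0.1)
peer = bandit.select()
bandit.update(peer, reward=1.0)  # the reward value here is a placeholder
```

The exploration term is what lets an agent discover that a peer it rarely consults has become a reliable advisor, which matches the need for exploration the summary highlights.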

hallucination

prompt

code

Title: OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators. (arXiv:2312.09411v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09411
Code URL: https://github.com/tianyic/only_train_once
Copy Paste: [[2312.09411]] OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators(http://arxiv.org/abs/2312.09411)
Summary:
Compressing a predefined deep neural network (DNN) into a compact sub-network with competitive performance is crucial in the efficient machine learning realm. This topic spans various techniques, from structured pruning to neural architecture search, encompassing both pruning and erasing operators perspectives. Despite advancements, existing methods suffer from complex, multi-stage processes that demand substantial engineering and domain knowledge, limiting their broader applications. We introduce the third-generation Only-Train-Once (OTOv3), which is the first to automatically train and compress a general DNN through pruning and erasing operations, creating a compact and competitive sub-network without the need for fine-tuning. OTOv3 simplifies and automates the training and compression process and minimizes the engineering effort required from users. It offers key technological advancements: (i) automatic search space construction for general DNNs based on dependency graph analysis; (ii) Dual Half-Space Projected Gradient (DHSPG) and its enhanced version with hierarchical search (H2SPG) to reliably solve (hierarchical) structured sparsity problems and ensure sub-network validity; and (iii) automated sub-network construction using solutions from DHSPG/H2SPG and dependency graphs. Our empirical results demonstrate the efficacy of OTOv3 across various benchmarks in structured pruning and neural architecture search. OTOv3 produces sub-networks that match or exceed the state of the art. The source code will be available at https://github.com/tianyic/only_train_once.

Title: GSQA: An End-to-End Model for Generative Spoken Question Answering. (arXiv:2312.09781v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09781
Code URL: null
Copy Paste: [[2312.09781]] GSQA: An End-to-End Model for Generative Spoken Question Answering(http://arxiv.org/abs/2312.09781)
Summary:
In recent advancements in spoken question answering (QA), end-to-end models have made significant strides. However, previous research has primarily focused on extractive span selection. While this extractive-based approach is effective when answers are present directly within the input, it falls short in addressing abstractive questions, where answers are not directly extracted but inferred from the given information. To bridge this gap, we introduce the first end-to-end Generative Spoken Question Answering (GSQA) model that empowers the system to engage in abstractive reasoning. The challenge in training our GSQA model lies in the absence of a spoken abstractive QA dataset. We propose using text models for initialization and leveraging the extractive QA dataset to transfer knowledge from the text generative model to the spoken generative model. Experimental results indicate that our model surpasses the previous extractive model by 3% on extractive QA datasets. Furthermore, the GSQA model has only been fine-tuned on the spoken extractive QA dataset. Despite not having seen any spoken abstractive QA data, it can still closely match the performance of the cascade model. In conclusion, our GSQA model shows the potential to generalize to a broad spectrum of questions, thus further expanding spoken question answering capabilities of abstractive QA. Our code is available at https://voidful.github.io/GSQA.

Title: Deep Unsupervised Domain Adaptation for Time Series Classification: a Benchmark. (arXiv:2312.09857v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09857
Code URL: https://github.com/ericssonresearch/uda-4-tsc
Copy Paste: [[2312.09857]] Deep Unsupervised Domain Adaptation for Time Series Classification: a Benchmark(http://arxiv.org/abs/2312.09857)
Summary:
Unsupervised Domain Adaptation (UDA) aims to harness labeled source data to train models for unlabeled target data. Despite extensive research in domains like computer vision and natural language processing, UDA remains underexplored for time series data, which has widespread real-world applications ranging from medicine and manufacturing to earth observation and human activity recognition. Our paper addresses this gap by introducing a comprehensive benchmark for evaluating UDA techniques for time series classification, with a focus on deep learning methods. We provide seven new benchmark datasets covering various domain shifts and temporal dynamics, facilitating fair and standardized UDA method assessments with state of the art neural network backbones (e.g. Inception) for time series data. This benchmark offers insights into the strengths and limitations of the evaluated approaches while preserving the unsupervised nature of domain adaptation, making it directly applicable to practical problems. Our paper serves as a vital resource for researchers and practitioners, advancing domain adaptation solutions for time series data and fostering innovation in this critical field. The implementation code of this benchmark is available at https://github.com/EricssonResearch/UDA-4-TSC.

Title: Distributed Learning of Mixtures of Experts. (arXiv:2312.09877v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09877
Code URL: null
Copy Paste: [[2312.09877]] Distributed Learning of Mixtures of Experts(http://arxiv.org/abs/2312.09877)
Summary:
In modern machine learning problems we deal with datasets that are either distributed by nature or potentially large, for which distributing the computations is usually a standard way to proceed, since centralized algorithms are in general ineffective. We propose a distributed learning approach for mixtures of experts (MoE) models with an aggregation strategy to construct a reduction estimator from local estimators fitted in parallel to distributed subsets of the data. The aggregation is based on an optimal minimization of an expected transportation divergence between the large MoE composed of local estimators and the unknown desired MoE model. We show that the provided reduction estimator is consistent as soon as the local estimators to be aggregated are consistent, and its construction is performed by a proposed majorization-minimization (MM) algorithm that is computationally effective. We study the statistical and numerical properties of the proposed reduction estimator in experiments that demonstrate its performance compared to the global estimator constructed in a centralized way from the full dataset. For some situations, the computation time is more than ten times faster, for a comparable performance. Our source codes are publicly available on GitHub.

Title: RDR: the Recap, Deliberate, and Respond Method for Enhanced Language Understanding. (arXiv:2312.09932v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09932
Code URL: null
Copy Paste: [[2312.09932]] RDR: the Recap, Deliberate, and Respond Method for Enhanced Language Understanding(http://arxiv.org/abs/2312.09932)
Summary:
Natural language understanding (NLU) using neural network pipelines often requires additional context that is not solely present in the input data. Prior research has shown that NLU benchmarks are susceptible to manipulation by neural models, wherein these models exploit statistical artifacts within the encoded external knowledge to artificially inflate performance metrics for downstream tasks. Our proposed approach, known as the Recap, Deliberate, and Respond (RDR) paradigm, addresses this issue by incorporating three distinct objectives within the neural network pipeline. Firstly, the Recap objective involves paraphrasing the input text using a paraphrasing model in order to summarize and encapsulate its essence. Secondly, the Deliberation objective entails encoding external graph information related to entities mentioned in the input text, utilizing a graph embedding model. Finally, the Respond objective employs a classification head model that utilizes representations from the Recap and Deliberation modules to generate the final prediction. By cascading these three models and minimizing a combined loss, we mitigate the potential for gaming the benchmark and establish a robust method for capturing the underlying semantic patterns, thus enabling accurate predictions. To evaluate the effectiveness of the RDR method, we conduct tests on multiple GLUE benchmark tasks. Our results demonstrate improved performance compared to competitive baselines, with an enhancement of up to 2% on standard metrics. Furthermore, we analyze the observed evidence for semantic understanding exhibited by RDR models, emphasizing their ability to avoid gaming the benchmark and instead accurately capture the true underlying semantic patterns.

Title: Symbolic Numeric Planning with Patterns. (arXiv:2312.09963v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09963
Code URL: null
Copy Paste: [[2312.09963]] Symbolic Numeric Planning with Patterns(http://arxiv.org/abs/2312.09963)
Summary:
In this paper, we propose a novel approach for solving linear numeric planning problems, called Symbolic Pattern Planning. Given a planning problem $\Pi$, a bound $n$ and a pattern -- defined as an arbitrary sequence of actions -- we encode the problem of finding a plan for $\Pi$ with bound $n$ as a formula with fewer variables and/or clauses than the state-of-the-art rolled-up and relaxed-relaxed-$\exists$ encodings. More importantly, we prove that for any given bound, it is never the case that the latter two encodings allow finding a valid plan while ours does not. On the experimental side, we consider 6 other planning systems -- including the ones which participated in this year's International Planning Competition (IPC) -- and we show that our planner Patty has remarkably good comparative performances on this year's IPC problems.

Title: SAT-Based Algorithms for Regular Graph Pattern Matching. (arXiv:2312.09995v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09995
Code URL: null
Copy Paste: [[2312.09995]] SAT-Based Algorithms for Regular Graph Pattern Matching(http://arxiv.org/abs/2312.09995)
Summary:
Graph matching is a fundamental problem in pattern recognition, with many applications such as software analysis and computational biology. One well-known type of graph matching problem is graph isomorphism, which consists of deciding if two graphs are identical. Despite its usefulness, the properties that one may check using graph isomorphism are rather limited, since it only allows strict equality checks between two graphs. For example, it does not allow one to check complex structural properties such as if the target graph is an arbitrary length sequence followed by an arbitrary size loop.

We propose a generalization of graph isomorphism that allows one to check such properties through a declarative specification. This specification is given in the form of a Regular Graph Pattern (ReGaP), a special type of graph, inspired by regular expressions, that may contain wildcard nodes that represent arbitrary structures such as variable-sized sequences or subgraphs. We propose a SAT-based algorithm for checking if a target graph matches a given ReGaP. We also propose a preprocessing technique for improving the performance of the algorithm and evaluate it through an extensive experimental evaluation on benchmarks from the CodeSearchNet dataset.
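
For contrast with the wildcard-based matching this summary proposes, the strict baseline it generalizes, plain graph isomorphism, can be sketched in a few lines of Python. This is a brute-force check for small directed graphs and is unrelated to the authors' SAT encoding; the function name is_isomorphic and the edge-list representation are assumptions made for illustration.

```python
from itertools import permutations


def is_isomorphic(n, edges_a, edges_b):
    """Check whether two directed graphs on nodes 0..n-1 are isomorphic.

    Tries every relabeling of the nodes, so the cost grows as n!;
    this is only practical for very small graphs.
    """
    set_a = set(edges_a)
    set_b = set(edges_b)
    if len(set_a) != len(set_b):
        return False
    for perm in permutations(range(n)):
        # Relabel graph A with this permutation and compare edge sets.
        if {(perm[u], perm[v]) for u, v in set_a} == set_b:
            return True
    return False


# A 3-node path matches another 3-node path with different labels,
# but not a star in which one node points at the two others.
path = [(0, 1), (1, 2)]
relabeled_path = [(2, 0), (0, 1)]
star = [(0, 1), (0, 2)]
print(is_isomorphic(3, path, relabeled_path))
print(is_isomorphic(3, path, star))
```

A check like this can only answer whether two graphs have exactly the same shape, which is the limitation that motivates patterns with wildcard nodes.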

Title: Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition. (arXiv:2312.09583v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09583
Code URL: null
Copy Paste: [[2312.09583]] Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition(http://arxiv.org/abs/2312.09583)
Summary:
In recent years, end-to-end speech recognition has emerged as a technology that integrates the acoustic, pronunciation dictionary, and language model components of the traditional Automatic Speech Recognition model. It is possible to achieve human-like recognition without the need to build a pronunciation dictionary in advance. However, due to the relative scarcity of training data on code-switching, the performance of ASR models tends to degrade drastically when encountering this phenomenon. Most past studies have simplified the learning complexity of the model by splitting the code-switching task into multiple tasks dealing with a single language and then learning the domain-specific knowledge of each language separately. Therefore, in this paper, we attempt to introduce language identification information into the middle layer of the ASR model's encoder. We aim to generate acoustic features that imply language distinctions in a more implicit way, reducing the model's confusion when dealing with language switching.

Title: Adaptive Integration of Partial Label Learning and Negative Learning for Enhanced Noisy Label Learning. (arXiv:2312.09505v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09505
Code URL: null
Copy Paste: [[2312.09505]] Adaptive Integration of Partial Label Learning and Negative Learning for Enhanced Noisy Label Learning(http://arxiv.org/abs/2312.09505)
Summary:
There has been significant attention devoted to the effectiveness of various domains, such as semi-supervised learning, contrastive learning, and meta-learning, in enhancing the performance of methods for noisy label learning (NLL) tasks. However, most existing methods still depend on prior assumptions regarding clean samples amidst different sources of noise (e.g., a pre-defined drop rate or a small subset of clean samples). In this paper, we propose a simple yet powerful idea called NPN, which revolutionizes Noisy label learning by integrating Partial label learning (PLL) and Negative learning (NL). Toward this goal, we initially decompose the given label space adaptively into the candidate and complementary labels, thereby establishing the conditions for PLL and NL. We propose two adaptive data-driven paradigms of label disambiguation for PLL: hard disambiguation and soft disambiguation. Furthermore, we generate reliable complementary labels using all non-candidate labels for NL to enhance model robustness through indirect supervision. To maintain label reliability during the later stage of model training, we introduce a consistency regularization term that encourages agreement between the outputs of multiple augmentations. Experiments conducted on both synthetically corrupted and real-world noisy datasets demonstrate the superiority of NPN compared to other state-of-the-art (SOTA) methods. The source code has been made available at https://github.com/NUST-Machine-Intelligence-Laboratory/NPN.

Title: Physics-informed Neural Network Estimation of Material Properties in Soft Tissue Nonlinear Biomechanical Models. (arXiv:2312.09787v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09787
Code URL: null
Copy Paste: [[2312.09787]] Physics-informed Neural Network Estimation of Material Properties in Soft Tissue Nonlinear Biomechanical Models(http://arxiv.org/abs/2312.09787)
Summary:
The development of biophysical models for clinical applications is rapidly advancing in the research community, thanks to their predictive nature and their ability to assist the interpretation of clinical data. However, high-resolution and accurate multi-physics computational models are computationally expensive and their personalisation involves fine calibration of a large number of parameters, which may be space-dependent, challenging their clinical translation. In this work, we propose a new approach which relies on the combination of physics-informed neural networks (PINNs) with three-dimensional soft tissue nonlinear biomechanical models, capable of reconstructing displacement fields and estimating heterogeneous patient-specific biophysical properties. The proposed learning algorithm encodes information from a limited amount of displacement and, in some cases, strain data, that can be routinely acquired in the clinical setting, and combines it with the physics of the problem, represented by a mathematical model based on partial differential equations, to regularise the problem and improve its convergence properties. Several benchmarks are presented to show the accuracy and robustness of the proposed method and its great potential to enable the robust and effective identification of patient-specific, heterogeneous physical properties, such as tissue stiffness properties. In particular, we demonstrate the capability of the PINN to detect the presence, location and severity of scar tissue, which is beneficial to develop personalised simulation models for disease diagnosis, especially for cardiac applications.

Title: Calibrated One Round Federated Learning with Bayesian Inference in the Predictive Space. (arXiv:2312.09817v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09817
Code URL: https://github.com/hasanmohsin/betapredbayes_fl
Copy Paste: [[2312.09817]] Calibrated One Round Federated Learning with Bayesian Inference in the Predictive Space(http://arxiv.org/abs/2312.09817)
Summary:
Federated Learning (FL) involves training a model over a dataset distributed among clients, with the constraint that each client's dataset is localized and possibly heterogeneous. In FL, small and noisy datasets are common, highlighting the need for well-calibrated models that represent the uncertainty of predictions. The closest FL techniques to achieving such goals are the Bayesian FL methods which collect parameter samples from local posteriors, and aggregate them to approximate the global posterior. To improve scalability for larger models, one common Bayesian approach is to approximate the global predictive posterior by multiplying local predictive posteriors. In this work, we demonstrate that this method gives systematically overconfident predictions, and we remedy this by proposing $\beta$-Predictive Bayes, a Bayesian FL algorithm that interpolates between a mixture and product of the predictive posteriors, using a tunable parameter $\beta$. This parameter is tuned to improve the global ensemble's calibration, before it is distilled to a single model. Our method is evaluated on a variety of regression and classification datasets to demonstrate its superiority in calibration to other baselines, even as data heterogeneity increases. Code available at https://github.com/hasanmohsin/betaPredBayes_FL
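
The contrast between a mixture and a product of predictive posteriors can be illustrated with a small sketch. The abstract does not give the exact interpolation formula, so the log-linear form used in `interpolate` below is an assumption made purely for illustration, as are the function names and the toy client predictions.

```python
import numpy as np

def mixture(preds):
    """Arithmetic mean of client predictive distributions (clients x classes)."""
    return preds.mean(axis=0)

def product(preds, eps=1e-12):
    """Normalized product of client predictive distributions."""
    log_p = np.log(preds + eps).sum(axis=0)
    log_p -= log_p.max()  # subtract the max for numerical stability
    p = np.exp(log_p)
    return p / p.sum()

def interpolate(preds, beta, eps=1e-12):
    """Assumed log-linear interpolation: beta=0 gives the mixture, beta=1 the product."""
    log_p = ((1.0 - beta) * np.log(mixture(preds) + eps)
             + beta * np.log(product(preds, eps) + eps))
    log_p -= log_p.max()
    p = np.exp(log_p)
    return p / p.sum()

# Hypothetical predictions of three clients over three classes.
preds = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.8, 0.1, 0.1]])

for beta in (0.0, 0.5, 1.0):
    print(beta, interpolate(preds, beta).round(3))
```

On this toy input the product concentrates far more mass on the top class than the mixture does, which is the overconfidence the abstract describes; an intermediate beta sits between the two.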

Title: Learning Distributions on Manifolds with Free-form Flows. (arXiv:2312.09852v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09852
Code URL: https://github.com/vislearn/fff
Copy Paste: [[2312.09852]] Learning Distributions on Manifolds with Free-form Flows(http://arxiv.org/abs/2312.09852)
Summary:
Much real-world data, particularly in the natural sciences and computer vision, lies on known Riemannian manifolds such as spheres, tori or the group of rotation matrices. The predominant approaches to learning a distribution on such a manifold require solving a differential equation in order to sample from the model and evaluate densities. The resulting sampling times are slowed down by a high number of function evaluations. In this work, we propose an alternative approach which only requires a single function evaluation followed by a projection to the manifold. Training is achieved by an adaptation of the recently proposed free-form flow framework to Riemannian manifolds. The central idea is to estimate the gradient of the negative log-likelihood via a trace evaluated in the tangent space. We evaluate our method on various manifolds, and find significantly faster inference at competitive performance compared to previous work. We make our code public at https://github.com/vislearn/FFF.

Title: Automating reward function configuration for drug design. (arXiv:2312.09865v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09865
Code URL: null
Copy Paste: [[2312.09865]] Automating reward function configuration for drug design(http://arxiv.org/abs/2312.09865)
Summary:
Designing reward functions that guide generative molecular design (GMD) algorithms to desirable areas of chemical space is of critical importance in AI-driven drug discovery. Traditionally, this has been a manual and error-prone task; the selection of appropriate computational methods to approximate biological assays is challenging and the aggregation of computed values into a single score even more so, leading to potential reliance on trial-and-error approaches. We propose a novel approach for automated reward configuration that relies solely on experimental data, mitigating the challenges of manual reward adjustment on drug discovery projects. Our method achieves this by constructing a ranking over experimental data based on Pareto dominance over the multi-objective space, then training a neural network to approximate the reward function such that rankings determined by the predicted reward correlate with those determined by the Pareto dominance relation. We validate our method using two case studies. In the first study we simulate Design-Make-Test-Analyse (DMTA) cycles by alternating reward function updates and generative runs guided by that function. We show that the learned function adapts over time to yield compounds that score highly with respect to evaluation functions taken from the literature. In the second study we apply our algorithm to historical data from four real drug discovery projects. We show that our algorithm yields reward functions that outperform the predictive accuracy of human-defined functions, achieving an improvement of up to 0.4 in Spearman's correlation against a ground truth evaluation function that encodes the target drug profile for that project. Our method provides an efficient data-driven way to configure reward functions for GMD, and serves as a strong baseline for future research into transformative approaches for the automation of drug discovery.
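
A ranking based on Pareto dominance can be sketched as follows. The abstract does not say how the ranking is built, so this example assumes one common choice, peeling off successive non-dominated fronts, and assumes every objective is to be maximized. The function names and the toy compound scores are hypothetical.

```python
import numpy as np

def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return bool(np.all(a >= b) and np.any(a > b))

def pareto_ranks(points):
    """Rank 0 is the non-dominated front; rank k is the front left after removing ranks < k."""
    points = np.asarray(points, dtype=float)
    ranks = np.full(len(points), -1)
    remaining = set(range(len(points)))
    rank = 0
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        for i in front:
            ranks[i] = rank
        remaining -= set(front)
        rank += 1
    return ranks

# Hypothetical compounds scored on two objectives (higher is better).
scores = [[0.9, 0.2], [0.5, 0.5], [0.4, 0.4], [0.1, 0.1]]
print(pareto_ranks(scores))
```

Ranks produced this way could serve as ordinal targets for a learned reward model; how the paper actually trains its network against the ranking is not specified in the abstract.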

Title: Sketch and shift: a robust decoder for compressive clustering. (arXiv:2312.09940v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09940
Code URL: null
Copy Paste: [[2312.09940]] Sketch and shift: a robust decoder for compressive clustering(http://arxiv.org/abs/2312.09940)
Summary:
Compressive learning is an emerging approach to drastically reduce the memory footprint of large-scale learning, by first summarizing a large dataset into a low-dimensional sketch vector, and then decoding from this sketch the latent information needed for learning. In light of recent progress on information preservation guarantees for sketches based on random features, a major objective is to design easy-to-tune algorithms (called decoders) to robustly and efficiently extract this information. To address the underlying non-convex optimization problems, various heuristics have been proposed. In the case of compressive clustering, the standard heuristic is CL-OMPR, a variant of sliding Frank-Wolfe. Yet, CL-OMPR is hard to tune, and the examination of its robustness was overlooked. In this work, we undertake a close examination of CL-OMPR to circumvent its limitations. In particular, we show how this algorithm can fail to recover the clusters even in advantageous scenarios. To gain insight, we show how the deficiencies of this algorithm can be attributed to optimization difficulties related to the structure of a correlation function appearing at core steps of the algorithm. To address these limitations, we propose an alternative decoder offering substantial improvements over CL-OMPR. Its design is notably inspired by the mean shift algorithm, a classic approach to detect the local maxima of kernel density estimators. The proposed algorithm can extract clustering information from a sketch of the MNIST dataset that is 10 times smaller than was previously possible.
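
For readers unfamiliar with mean shift, the classic algorithm mentioned above can be sketched as follows. This is the textbook procedure applied to raw data points with a Gaussian kernel, not the sketch-based decoder proposed in the paper; the bandwidth, the synthetic clusters, and the function name are assumptions chosen for illustration.

```python
import numpy as np

def mean_shift_mode(data, start, bandwidth=0.5, tol=1e-6, max_iter=500):
    """Climb to a local maximum of a Gaussian kernel density estimate."""
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Gaussian kernel weight of every data point relative to the current position.
        w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2.0 * bandwidth ** 2))
        # The mean shift update moves x to the weighted mean of the data.
        x_new = (w[:, None] * data).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Two synthetic clusters centred at (0, 0) and (5, 5).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, size=(200, 2)),
                  rng.normal(5.0, 0.3, size=(200, 2))])

mode_a = mean_shift_mode(data, start=[0.5, 0.5])
mode_b = mean_shift_mode(data, start=[4.5, 5.5])
print(mode_a, mode_b)
```

Each starting point converges to the density mode of the cluster it begins near, which is the local-maximum behaviour that the abstract refers to.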

Title: Modeling Unknown Stochastic Dynamical System via Autoencoder. (arXiv:2312.10001v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.10001
Code URL: null
Copy Paste: [[2312.10001]] Modeling Unknown Stochastic Dynamical System via Autoencoder(http://arxiv.org/abs/2312.10001)
Summary:
We present a numerical method to learn an accurate predictive model for an unknown stochastic dynamical system from its trajectory data. The method seeks to approximate the unknown flow map of the underlying system. It employs the idea of an autoencoder to identify the unobserved latent random variables. In our approach, we design an encoding function to discover the latent variables, which are modeled as unit Gaussian, and a decoding function to reconstruct the future states of the system. Both the encoder and decoder are expressed as deep neural networks (DNNs). Once the DNNs are trained on the trajectory data, the decoder serves as a predictive model for the unknown stochastic system. Through an extensive set of numerical examples, we demonstrate that the method is able to produce long-term system predictions by using short bursts of trajectory data. It is also applicable to systems driven by non-Gaussian noise.

Title: Symplectic Autoencoders for Model Reduction of Hamiltonian Systems. (arXiv:2312.10004v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.10004
Code URL: null
Copy Paste: [[2312.10004]] Symplectic Autoencoders for Model Reduction of Hamiltonian Systems(http://arxiv.org/abs/2312.10004)
Summary:
Many applications, such as optimization, uncertainty quantification and inverse problems, require repeatedly performing simulations of large-dimensional physical systems for different choices of parameters. This can be prohibitively expensive.

In order to save computational cost, one can construct surrogate models by expressing the system in a low-dimensional basis, obtained from training data. This is referred to as model reduction.

Past investigations have shown that, when performing model reduction of Hamiltonian systems, it is crucial to preserve the symplectic structure associated with the system in order to ensure long-term numerical stability.

Up to this point structure-preserving reductions have largely been limited to linear transformations. We propose a new neural network architecture in the spirit of autoencoders, which are established tools for dimension reduction and feature extraction in data science, to obtain more general mappings.

In order to train the network, a non-standard gradient descent approach is applied that leverages the differential-geometric structure emerging from the network design.

The new architecture is shown to significantly outperform existing designs in accuracy.

chat

retrieval augmented generation

rag

Title: Distributional Latent Variable Models with an Application in Active Cognitive Testing. (arXiv:2312.09316v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.09316
Code URL: null
Copy Paste: [[2312.09316]] Distributional Latent Variable Models with an Application in Active Cognitive Testing(http://arxiv.org/abs/2312.09316)
Summary:
Cognitive modeling commonly relies on asking participants to complete a battery of varied tests in order to estimate attention, working memory, and other latent variables. In many cases, these tests result in highly variable observation models. A near-ubiquitous approach is to repeat many observations for each test, resulting in a distribution over the outcomes from each test given to each subject. In this paper, we explore the usage of latent variable modeling to enable learning across many correlated variables simultaneously. We extend latent variable models (LVMs) to the setting where observed data for each subject are a series of observations from many different distributions, rather than simple vectors to be reconstructed. By embedding test battery results for individuals in a latent space that is trained jointly across a population, we are able to leverage correlations both between tests for a single participant and between multiple participants. We then propose an active learning framework that leverages this model to conduct more efficient cognitive test batteries. We validate our approach by demonstrating with real-time data acquisition that it performs comparably to conventional methods in making item-level predictions with fewer test items.

Title: Prediction of rare events in the operation of household equipment using co-evolving time series. (arXiv:2312.09410v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09410
Code URL: null
Copy Paste: [[2312.09410]] Prediction of rare events in the operation of household equipment using co-evolving time series(http://arxiv.org/abs/2312.09410)
Summary:
In this study, we propose an approach for predicting rare events by exploiting co-evolving time series. Our approach involves a weighted autologistic regression model, where we leverage the temporal behavior of the data to enhance predictive capabilities. By addressing the issue of imbalanced datasets, we establish constraints leading to weight estimation and to improved performance. Evaluation on synthetic and real-world datasets confirms that our approach outperforms state-of-the-art methods for predicting home equipment failure.

Title: Entropy Causal Graphs for Multivariate Time Series Anomaly Detection. (arXiv:2312.09478v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09478
Code URL: null
Copy Paste: [[2312.09478]] Entropy Causal Graphs for Multivariate Time Series Anomaly Detection(http://arxiv.org/abs/2312.09478)
Summary:
Many multivariate time series anomaly detection frameworks have been proposed and widely applied. However, most of these frameworks do not consider intrinsic relationships between variables in multivariate time series data, thus ignoring the causal relationship among variables and degrading anomaly detection performance. This work proposes a novel framework called CGAD, an entropy Causal Graph for multivariate time series Anomaly Detection. CGAD utilizes transfer entropy to construct graph structures that unveil the underlying causal relationships among time series data. Weighted graph convolutional networks combined with causal convolutions are employed to model both the causal graph structures and the temporal patterns within multivariate time series data. Furthermore, CGAD applies anomaly scoring, leveraging median absolute deviation-based normalization to improve the robustness of the anomaly identification process. Extensive experiments demonstrate that CGAD outperforms state-of-the-art methods on real-world datasets with a 15% average improvement based on three different multivariate time series anomaly detection metrics.
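
Normalization by the median absolute deviation (MAD) can be sketched as below. The abstract does not give CGAD's exact scoring rule, so this shows only the generic robust z-score, with a hypothetical function name and toy prediction errors.

```python
import numpy as np

def mad_normalize(errors, eps=1e-8):
    """Robust z-score: centre on the median and scale by the median absolute deviation."""
    errors = np.asarray(errors, dtype=float)
    median = np.median(errors)
    mad = np.median(np.abs(errors - median))
    return (errors - median) / (mad + eps)  # eps guards against a zero MAD

# Hypothetical per-timestep errors with one obvious outlier at index 5.
errors = [1.0, 1.1, 0.9, 1.0, 1.2, 8.0]
scores = mad_normalize(errors)
print(scores.round(2))
```

Because the median and the MAD are barely affected by the outlier itself, the outlier's normalized score stays large, whereas a mean and standard deviation computed on the same data would be pulled toward it.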

Title: Multiple Instance Learning for Uplift Modeling. (arXiv:2312.09639v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09639
Code URL: null
Copy Paste: [[2312.09639]] Multiple Instance Learning for Uplift Modeling(http://arxiv.org/abs/2312.09639)
Summary:
Uplift modeling is widely used in performance marketing to estimate effects of promotion campaigns (e.g., increase of customer retention rate). Since it is impossible to observe outcomes of a recipient in treatment (e.g., receiving a certain promotion) and control (e.g., without promotion) groups simultaneously (i.e., counter-factual), uplift models are mainly trained on instances of treatment and control groups separately to form two models respectively, and uplifts are predicted by the difference of predictions from these two models (i.e., two-model method). When responses are noisy and the treatment effect is fractional, induced individual uplift predictions will be inaccurate, resulting in targeting undesirable customers. Though it is impossible to obtain the ideal ground-truth individual uplifts, known as Individual Treatment Effects (ITEs), alternatively, an average uplift of a group of users, called Average Treatment Effect (ATE), can be observed from experimental deliveries. Upon this, similar to Multiple Instance Learning (MIL) in which each training sample is a bag of instances, our framework sums up individual user uplift predictions for each bag of users as its bag-wise ATE prediction, and regularizes it to its ATE label, thus learning more accurate individual uplifts. Additionally, to amplify the fractional treatment effect, bags are composed of instances with adjacent individual uplift predictions, instead of random instances. Experiments conducted on two datasets show the effectiveness and universality of the proposed framework.

Title: Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach. (arXiv:2312.09758v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09758
Code URL: null
Copy Paste: [[2312.09758]] Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach(http://arxiv.org/abs/2312.09758)
Summary:
Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments, advancing the technical roadmap of out-of-distribution (OOD) generalization. Despite the attention it has received, recent theoretical results verified that some causal features recovered by IRL methods merely appear domain-invariant in the training environments but fail in unseen domains. This fake invariance severely endangers OOD generalization since the trustful objective cannot be diagnosed and existing causal surgeries cannot rectify it. In this paper, we review an IRL family (InvRat) under the Partially and Fully Informative Invariant Feature Structural Causal Models (PIIF SCM / FIIF SCM) respectively, to certify their weaknesses in representing fake invariant features, and then unify their causal diagrams to propose ReStructured SCM (RS-SCM). RS-SCM can ideally rebuild the spurious and the fake invariant features simultaneously. Given this, we further develop an approach based on conditional mutual information with respect to RS-SCM, and then rigorously rectify the spurious and fake invariant effects. It can be easily implemented by a small feature selection subnet introduced in the IRL family, which is alternately optimized to achieve our goal. Experiments verified the superiority of our approach in combating the fake invariance issue across a variety of OOD generalization benchmarks.

Title: Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation. (arXiv:2312.09844v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09844
Code URL: null
Copy Paste: [[2312.09844]] Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation(http://arxiv.org/abs/2312.09844)
Summary:
Offline reinforcement learning leverages pre-collected datasets of transitions to train policies. It can serve as effective initialization for online algorithms, enhancing sample efficiency and speeding up convergence. However, when such datasets are limited in size and quality, offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance. In this paper we propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective. Our approach leverages a world model of the environment trained on the offline dataset to augment states during offline pre-training. We evaluate our approach on a variety of MuJoCo robotic tasks and our results show it can jump-start online fine-tuning and substantially reduce - in some cases by an order of magnitude - the required number of environment interactions.

Title: MANTIS at #SMM4H 2023: Leveraging Hybrid and Ensemble Models for Detection of Social Anxiety Disorder on Reddit. (arXiv:2312.09451v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09451
Code URL: null
Copy Paste: [[2312.09451]] MANTIS at #SMM4H 2023: Leveraging Hybrid and Ensemble Models for Detection of Social Anxiety Disorder on Reddit(http://arxiv.org/abs/2312.09451)
Summary:
This paper presents our system employed for the Social Media Mining for Health 2023 Shared Task 4: Binary classification of English Reddit posts self-reporting a social anxiety disorder diagnosis. We systematically investigate and contrast the efficacy of hybrid and ensemble models that harness specialized medical domain-adapted transformers in conjunction with BiLSTM neural networks. The evaluation results outline that our best performing model obtained 89.31% F1 on the validation set and 83.76% F1 on the test set.

Title: Discovering Highly Influential Shortcut Reasoning: An Automated Template-Free Approach. (arXiv:2312.09718v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.09718
Code URL: https://github.com/homoscribens/shortcut_reasoning
Copy Paste: [[2312.09718]] Discovering Highly Influential Shortcut Reasoning: An Automated Template-Free Approach(http://arxiv.org/abs/2312.09718)
Summary:
Shortcut reasoning is an irrational process of inference, which degrades the robustness of an NLP model. While a number of previous works have tackled the identification of shortcut reasoning, there are still two major limitations: (i) a method for quantifying the severity of the discovered shortcut reasoning is not provided; (ii) certain types of shortcut reasoning may be missed. To address these issues, we propose a novel method for identifying shortcut reasoning. The proposed method quantifies the severity of the shortcut reasoning by leveraging out-of-distribution data and does not make any assumptions about the type of tokens triggering the shortcut reasoning. Our experiments on Natural Language Inference and Sentiment Analysis demonstrate that our framework successfully discovers known and unknown shortcut reasoning in the previous work.

Title: Optimal Regret Bounds for Collaborative Learning in Bandits. (arXiv:2312.09674v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09674
Code URL: null
Copy Paste: [[2312.09674]] Optimal Regret Bounds for Collaborative Learning in Bandits(http://arxiv.org/abs/2312.09674)
Summary:
We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each agent in this model is the arm with the largest expected mixed reward, where the mixed reward of each arm is a weighted average of its rewards across all agents, making communication among agents crucial. While near-optimal sample complexities for best arm identification are known under this collaborative model, the question of optimal regret remains open. In this work, we address this problem and propose the first algorithm with order optimal regret bounds under this collaborative bandit model. Furthermore, we show that only a small constant number of expected communication rounds is needed.

Title: Urban Region Embedding via Multi-View Contrastive Prediction. (arXiv:2312.09681v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09681
Code URL: null
Copy Paste: [[2312.09681]] Urban Region Embedding via Multi-View Contrastive Prediction(http://arxiv.org/abs/2312.09681)
Summary:
Recently, learning urban region representations utilizing multi-modal data (information views) has become increasingly popular, for deep understanding of the distributions of various socioeconomic features in cities. However, previous methods usually blend multi-view information in a posterior stage, falling short in learning coherent and consistent representations across different views. In this paper, we form a new pipeline to learn consistent representations across varying views, and propose the multi-view Contrastive Prediction model for urban Region embedding (ReCP), which leverages the multiple information views from point-of-interest (POI) and human mobility data. Specifically, ReCP comprises two major modules, namely an intra-view learning module utilizing contrastive learning and feature reconstruction to capture the unique information from each single view, and an inter-view learning module that perceives the consistency between the two views using a contrastive prediction learning scheme. We conduct thorough experiments on two downstream tasks to assess the proposed model, i.e., land use clustering and region popularity prediction. The experimental results demonstrate that our model outperforms state-of-the-art baseline methods significantly in urban region representation learning.

Title: A Comparative Evaluation of Additive Separability Tests for Physics-Informed Machine Learning. (arXiv:2312.09775v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09775
Code URL: null
Copy Paste: [[2312.09775]] A Comparative Evaluation of Additive Separability Tests for Physics-Informed Machine Learning(http://arxiv.org/abs/2312.09775)
Summary:
Many functions characterising physical systems are additively separable. This is the case, for instance, of mechanical Hamiltonian functions in physics, population growth equations in biology, and consumer preference and utility functions in economics. We consider the scenario in which a surrogate of a function is to be tested for additive separability. The detection that the surrogate is additively separable can be leveraged to improve further learning. Hence, it is beneficial to have the ability to test for such separability in surrogates. The mathematical approach is to test if the mixed partial derivative of the surrogate is zero; or empirically, lower than a threshold. We present and empirically compare eight methods to compute the mixed partial derivative of a surrogate function.
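
The test described above, checking whether the mixed partial derivative is below a threshold, can be sketched with a central finite-difference stencil. This is one generic way to estimate the derivative and is not claimed to be any of the eight methods the paper compares; the step size, threshold, sample points, and function names are all assumptions.

```python
import numpy as np

def mixed_partial(f, x, y, h=1e-3):
    """Central finite-difference estimate of d^2 f / (dx dy) at (x, y)."""
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)

def looks_additively_separable(f, samples, threshold=1e-3):
    """Empirical test: the mixed partial stays below the threshold at every sample."""
    return all(abs(mixed_partial(f, x, y)) < threshold for x, y in samples)

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(50, 2))

def separable(x, y):
    return x ** 2 + np.sin(y)   # f(x, y) = g(x) + k(y)

def coupled(x, y):
    return x * y                # mixed partial is 1 everywhere

print(looks_additively_separable(separable, samples))
print(looks_additively_separable(coupled, samples))
```

For a function of the form g(x) + k(y) the four stencil terms cancel, so the estimate is zero up to rounding error, while any interaction term such as x * y leaves a non-zero residual.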

Title: Hypergraph-MLP: Learning on Hypergraphs without Message Passing. (arXiv:2312.09778v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09778
Code URL: https://github.com/tbh-98/hypergraph-mlp
Copy Paste: [[2312.09778]] Hypergraph-MLP: Learning on Hypergraphs without Message Passing(http://arxiv.org/abs/2312.09778)
Summary:
Hypergraphs are vital in modelling data with higher-order relations containing more than two entities, gaining prominence in machine learning and signal processing. Many hypergraph neural networks leverage message passing over hypergraph structures to enhance node representation learning, yielding impressive performances in tasks like hypergraph node classification. However, these message-passing-based models face several challenges, including oversmoothing as well as high latency and sensitivity to structural perturbations at inference time. To tackle those challenges, we propose an alternative approach where we integrate the information about hypergraph structures into training supervision without explicit message passing, thus also removing the reliance on it at inference. Specifically, we introduce Hypergraph-MLP, a novel learning framework for hypergraph-structured data, where the learning model is a straightforward multilayer perceptron (MLP) supervised by a loss function based on a notion of signal smoothness on hypergraphs. Experiments on hypergraph node classification tasks demonstrate that Hypergraph-MLP achieves competitive performance compared to existing baselines, and is considerably faster and more robust against structural perturbations at inference.

Title: End-to-End Training of Neural Networks for Automotive Radar Interference Mitigation. (arXiv:2312.09790v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09790
Code URL: null
Copy Paste: [[2312.09790]] End-to-End Training of Neural Networks for Automotive Radar Interference Mitigation(http://arxiv.org/abs/2312.09790)
Summary:
In this paper we propose a new method for training neural networks (NNs) for frequency modulated continuous wave (FMCW) radar mutual interference mitigation. Instead of training NNs to regress from interfered to clean radar signals as in previous work, we train NNs directly on object detection maps. We do so by performing a continuous relaxation of the cell-averaging constant false alarm rate (CA-CFAR) peak detector, which is a well-established algorithm for object detection using radar. With this new training objective we are able to increase object detection performance by a large margin. Furthermore, we introduce separable convolution kernels to strongly reduce the number of parameters and computational complexity of convolutional NN architectures for radar applications. We validate our contributions with experiments on real-world measurement data and compare them against signal processing interference mitigation methods.
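
As background, the classical cell-averaging CFAR detector mentioned above can be sketched in one dimension. This covers only the standard hard-threshold detector, not the continuous relaxation the paper trains through; the window sizes, the scale factor, and the toy signal are assumptions.

```python
import numpy as np

def ca_cfar(power, num_train=8, num_guard=2, scale=5.0):
    """1-D cell-averaging CFAR: flag cells whose power exceeds scale * local noise mean."""
    power = np.asarray(power, dtype=float)
    n = len(power)
    half = num_train // 2
    detections = np.zeros(n, dtype=bool)
    # Edge cells without a full window on both sides are left undetected.
    for i in range(half + num_guard, n - half - num_guard):
        # Training cells on each side, skipping the guard cells next to the cell under test.
        left = power[i - num_guard - half: i - num_guard]
        right = power[i + num_guard + 1: i + num_guard + 1 + half]
        noise = np.concatenate([left, right]).mean()
        detections[i] = power[i] > scale * noise
    return detections

# Flat noise floor with a single strong target at index 30.
power = np.ones(64)
power[30] = 20.0
detections = ca_cfar(power)
print(np.flatnonzero(detections))
```

The comparison `power[i] > scale * noise` is a hard step, which is why a detector like this cannot be used directly as a training objective without some form of relaxation.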

Title: Fragility, Robustness and Antifragility in Deep Learning. (arXiv:2312.09821v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09821
Code URL: null
Copy Paste: [[2312.09821]] Fragility, Robustness and Antifragility in Deep Learning(http://arxiv.org/abs/2312.09821)
Summary:
We propose a systematic analysis of deep neural networks (DNNs) based on a signal processing technique for network parameter removal, in the form of synaptic filters that identify the fragility, robustness and antifragility characteristics of DNN parameters. Our proposed analysis investigates if the DNN performance is impacted negatively, invariantly, or positively on both clean and adversarially perturbed test datasets when the DNN undergoes synaptic filtering. We define three filtering scores for quantifying the fragility, robustness and antifragility characteristics of DNN parameters based on the performances for (i) clean dataset, (ii) adversarial dataset, and (iii) the difference in performances of clean and adversarial datasets. We validate the proposed systematic analysis on ResNet-18, ResNet-50, SqueezeNet-v1.1 and ShuffleNet V2 x1.0 network architectures for MNIST, CIFAR10 and Tiny ImageNet datasets. The filtering scores, for a given network architecture, identify network parameters that are invariant in characteristics across different datasets over learning epochs. Conversely, for a given dataset, the filtering scores identify the parameters that are invariant in characteristics across different network architectures. We show that our synaptic filtering method improves the test accuracy of ResNet and ShuffleNet models on adversarial datasets when only the robust and antifragile parameters are selectively retrained at any given epoch, thus demonstrating applications of the proposed strategy in improving model robustness.

2023-12-18

language model

Title: Self-Evaluation Improves Selective Generation in Large Language Models. (arXiv:2312.09300v1 [cs.CL])

Title: ArchiGuesser -- AI Art Architecture Educational Game. (arXiv:2312.09334v1 [cs.AI])

Title: Large Language Models for Autonomous Driving: Real-World Experiments. (arXiv:2312.09397v1 [cs.AI])

Title: Clinical Text Deduplication Practices for Efficient Pretraining and Improved Clinical Tasks. (arXiv:2312.09469v1 [cs.CL])

Title: Grounding for Artificial Intelligence. (arXiv:2312.09532v1 [cs.AI])

Title: On a Functional Definition of Intelligence. (arXiv:2312.09546v1 [cs.AI])

Title: Prompting Large Language Models for Topic Modeling. (arXiv:2312.09693v1 [cs.AI])

Title: Improving Biomedical Entity Linking with Retrieval-enhanced Learning. (arXiv:2312.09806v1 [cs.CL])

Title: SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models. (arXiv:2312.09818v1 [cs.CL])

Title: Neurosymbolic Value-Inspired AI (Why, What, and How). (arXiv:2312.09928v1 [cs.AI])

Title: Distilling Large Language Models for Matching Patients to Clinical Trials. (arXiv:2312.09958v1 [cs.AI])

Title: Data and Approaches for German Text simplification -- towards an Accessibility-enhanced Communication. (arXiv:2312.09966v1 [cs.CL])

Title: Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision. (arXiv:2312.09390v1 [cs.CL])

Title: Marathon: A Race Through the Realm of Long Context with Large Language Models. (arXiv:2312.09542v1 [cs.CL])

Title: Extending Context Window of Large Language Models via Semantic Compression. (arXiv:2312.09571v1 [cs.CL])

Title: Probing Pretrained Language Models with Hierarchy Properties. (arXiv:2312.09670v1 [cs.CL])

Title: RJUA-QA: A Comprehensive QA Dataset for Urology. (arXiv:2312.09785v1 [cs.CL])

Title: ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs). (arXiv:2312.09801v1 [cs.CL])

Title: Grammatical information in BERT sentence embeddings as two-dimensional arrays. (arXiv:2312.09890v1 [cs.CL])

Title: Generative Context-aware Fine-tuning of Self-supervised Speech Models. (arXiv:2312.09895v1 [cs.CL])

Title: The Art of Balancing: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment. (arXiv:2312.09979v1 [cs.CL])

Title: LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language. (arXiv:2312.09993v1 [cs.CL])

Title: Faithful Persona-based Conversational Dataset Generation with Large Language Models. (arXiv:2312.10007v1 [cs.CL])

gpt

Title: Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM. (arXiv:2312.09366v1 [cs.CL])

Title: GPT-4 Surpassing Human Performance in Linguistic Pragmatics. (arXiv:2312.09545v1 [cs.CL])

Title: 3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V. (arXiv:2312.09738v1 [cs.AI])

Title: A Novel Dataset for Financial Education Text Simplification in Spanish. (arXiv:2312.09897v1 [cs.AI])

Title: Red AI? Inconsistent Responses from GPT3.5 Models on Political Issues in the US and China. (arXiv:2312.09917v1 [cs.CL])

llm

Title: Challenges with unsupervised LLM knowledge discovery. (arXiv:2312.10029v1 [cs.LG])

Title: ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent. (arXiv:2312.10003v1 [cs.CL])

long context

lora

Title: Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning. (arXiv:2312.09539v1 [cs.AI])

Title: Peer Learning: Learning Complex Policies in Groups from Scratch via Action Recommendations. (arXiv:2312.09950v1 [cs.LG])

hallucination

prompt

code

Title: OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators. (arXiv:2312.09411v1 [cs.LG])

Title: GSQA: An End-to-End Model for Generative Spoken Question Answering. (arXiv:2312.09781v1 [cs.CL])

Title: Deep Unsupervised Domain Adaptation for Time Series Classification: a Benchmark. (arXiv:2312.09857v1 [cs.LG])

Title: Distributed Learning of Mixtures of Experts. (arXiv:2312.09877v1 [cs.LG])

Title: RDR: the Recap, Deliberate, and Respond Method for Enhanced Language Understanding. (arXiv:2312.09932v1 [cs.CL])

Title: Symbolic Numeric Planning with Patterns. (arXiv:2312.09963v1 [cs.AI])

Title: SAT-Based Algorithms for Regular Graph Pattern Matching. (arXiv:2312.09995v1 [cs.AI])

Title: Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition. (arXiv:2312.09583v1 [cs.CL])

Title: Adaptive Integration of Partial Label Learning and Negative Learning for Enhanced Noisy Label Learning. (arXiv:2312.09505v1 [cs.LG])

Title: Physics-informed Neural Network Estimation of Material Properties in Soft Tissue Nonlinear Biomechanical Models. (arXiv:2312.09787v1 [cs.LG])

Title: Calibrated One Round Federated Learning with Bayesian Inference in the Predictive Space. (arXiv:2312.09817v1 [cs.LG])

Title: Learning Distributions on Manifolds with Free-form Flows. (arXiv:2312.09852v1 [cs.LG])

Title: Automating reward function configuration for drug design. (arXiv:2312.09865v1 [cs.LG])

Title: Sketch and shift: a robust decoder for compressive clustering. (arXiv:2312.09940v1 [cs.LG])

Title: Modeling Unknown Stochastic Dynamical System via Autoencoder. (arXiv:2312.10001v1 [cs.LG])

Title: Symplectic Autoencoders for Model Reduction of Hamiltonian Systems. (arXiv:2312.10004v1 [cs.LG])

chat

retrieval augmented generation

rag

Title: Distributional Latent Variable Models with an Application in Active Cognitive Testing. (arXiv:2312.09316v1 [cs.AI])

Title: Prediction of rare events in the operation of household equipment using co-evolving time series. (arXiv:2312.09410v1 [cs.LG])

Title: Entropy Causal Graphs for Multivariate Time Series Anomaly Detection. (arXiv:2312.09478v1 [cs.LG])

Title: Multiple Instance Learning for Uplift Modeling. (arXiv:2312.09639v1 [cs.LG])

Title: Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach. (arXiv:2312.09758v1 [cs.LG])

Title: Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation. (arXiv:2312.09844v1 [cs.LG])

Title: MANTIS at #SMM4H 2023: Leveraging Hybrid and Ensemble Models for Detection of Social Anxiety Disorder on Reddit. (arXiv:2312.09451v1 [cs.CL])

Title: Discovering Highly Influential Shortcut Reasoning: An Automated Template-Free Approach. (arXiv:2312.09718v1 [cs.CL])

Title: Optimal Regret Bounds for Collaborative Learning in Bandits. (arXiv:2312.09674v1 [cs.LG])

Title: Urban Region Embedding via Multi-View Contrastive Prediction. (arXiv:2312.09681v1 [cs.LG])

Title: A Comparative Evaluation of Additive Separability Tests for Physics-Informed Machine Learning. (arXiv:2312.09775v1 [cs.LG])

Title: Hypergraph-MLP: Learning on Hypergraphs without Message Passing. (arXiv:2312.09778v1 [cs.LG])

Title: End-to-End Training of Neural Networks for Automotive Radar Interference Mitigation. (arXiv:2312.09790v1 [cs.LG])

Title: Fragility, Robustness and Antifragility in Deep Learning. (arXiv:2312.09821v1 [cs.LG])

multi-run

chain-of-thought

tree-of-thought