language model

Title: An Evaluation of State-of-the-Art Large Language Models for Sarcasm Detection. (arXiv:2312.03706v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03706
Code URL: null
Copy Paste: [[2312.03706]] An Evaluation of State-of-the-Art Large Language Models for Sarcasm Detection(http://arxiv.org/abs/2312.03706)
Summary:
Sarcasm, as defined by Merriam-Webster, is the use of words by someone who means the opposite of what he is trying to say. In the field of sentimental analysis of Natural Language Processing, the ability to correctly identify sarcasm is necessary for understanding people's true opinions. Because the use of sarcasm is often context-based, previous research has used language representation models, such as Support Vector Machine (SVM) and Long Short-Term Memory (LSTM), to identify sarcasm with contextual-based information. Recent innovations in NLP have provided more possibilities for detecting sarcasm. In BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin et al. (2018) introduced a new language representation model and demonstrated higher precision in interpreting contextualized language. As proposed by Hazarika et al. (2018), CASCADE is a context-driven model that produces good results for detecting sarcasm. This study analyzes a Reddit corpus using these two state-of-the-art models and evaluates their performance against baseline models to find the ideal approach to sarcasm detection.

Title: Abstraction via exemplars? A representational case study on lexical category inference in BERT. (arXiv:2312.03708v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03708
Code URL: null
Copy Paste: [[2312.03708]] Abstraction via exemplars? A representational case study on lexical category inference in BERT(http://arxiv.org/abs/2312.03708)
Summary:
Exemplar based accounts are often considered to be in direct opposition to pure linguistic abstraction in explaining language learners' ability to generalize to novel expressions. However, the recent success of neural network language models on linguistically sensitive tasks suggests that perhaps abstractions can arise via the encoding of exemplars. We provide empirical evidence for this claim by adapting an existing experiment that studies how an LM (BERT) generalizes the usage of novel tokens that belong to lexical categories such as Noun/Verb/Adjective/Adverb from exposure to only a single instance of their usage. We analyze the representational behavior of the novel tokens in these experiments, and find that BERT's capacity to generalize to unseen expressions involving the use of these novel tokens constitutes the movement of novel token representations towards regions of known category exemplars in two-dimensional space. Our results suggest that learners' encoding of exemplars can indeed give rise to abstraction like behavior.

Title: Large Language Models in Law: A Survey. (arXiv:2312.03718v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03718
Code URL: null
Copy Paste: [[2312.03718]] Large Language Models in Law: A Survey(http://arxiv.org/abs/2312.03718)
Summary:
The advent of artificial intelligence (AI) has significantly impacted the traditional judicial industry. Moreover, recently, with the development of AI-generated content (AIGC), AI and law have found applications in various domains, including image recognition, automatic text generation, and interactive chat. With the rapid emergence and growing popularity of large models, it is evident that AI will drive transformation in the traditional judicial industry. However, the application of legal large language models (LLMs) is still in its nascent stage. Several challenges need to be addressed. In this paper, we aim to provide a comprehensive survey of legal LLMs. We not only conduct an extensive survey of LLMs, but also expose their applications in the judicial system. We first provide an overview of AI technologies in the legal field and showcase the recent research in LLMs. Then, we discuss the practical implementation presented by legal LLMs, such as providing legal advice to users and assisting judges during trials. In addition, we explore the limitations of legal LLMs, including data, algorithms, and judicial practice. Finally, we summarize practical recommendations and propose future development directions to address these challenges.

Title: Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability. (arXiv:2312.03721v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03721
Code URL: null
Copy Paste: [[2312.03721]] Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability(http://arxiv.org/abs/2312.03721)
Summary:
There has been increasing interest in evaluations of language models for a variety of risks and characteristics. Evaluations relying on natural language understanding for grading can often be performed at scale by using other language models. We test the robustness of these model-graded evaluations to injections on different datasets including a new Deception Eval. These injections resemble direct communication between the testee and the evaluator to change their grading. We extrapolate that future, more intelligent models might manipulate or cooperate with their evaluation model. We find significant susceptibility to these injections in state-of-the-art commercial models on all examined evaluations. Furthermore, similar injections can be used on automated interpretability frameworks to produce misleading model-written explanations. The results inspire future work and should caution against unqualified trust in evaluations and automated interpretability.

Title: DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer. (arXiv:2312.03724v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03724
Code URL: null
Copy Paste: [[2312.03724]] DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer(http://arxiv.org/abs/2312.03724)
Summary:
Large Language Models (LLMs) have emerged as dominant tools for various tasks, particularly when tailored for a specific target by prompt tuning. Nevertheless, concerns surrounding data privacy present obstacles due to the tuned prompts' dependency on sensitive private information. A practical solution is to host a local LLM and optimize a soft prompt privately using data. Yet, hosting a local model becomes problematic when model ownership is protected. Alternative methods, like sending data to the model's provider for training, intensify these privacy issues facing an untrusted provider. In this paper, we present a novel solution called Differentially-Private Offsite Prompt Tuning (DP-OPT) to address this challenge. Our approach involves tuning a discrete prompt on the client side and then applying it to the desired cloud models. We demonstrate that prompts suggested by LLMs themselves can be transferred without compromising performance significantly. To ensure that the prompts do not leak private information, we introduce the first private prompt generation mechanism, by a differentially-private (DP) ensemble of in-context learning with private demonstrations. With DP-OPT, generating privacy-preserving prompts by Vicuna-7b can yield competitive performance compared to non-private in-context learning on GPT3.5 or local private prompt tuning. Codes are available at https://github.com/VITA-Group/DP-OPT .

Title: Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?. (arXiv:2312.03729v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03729
Code URL: null
Copy Paste: [[2312.03729]] Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?(http://arxiv.org/abs/2312.03729)
Summary:
Neural language models (LMs) can be used to evaluate the truth of factual statements in two ways: they can be either queried for statement probabilities, or probed for internal representations of truthfulness. Past work has found that these two procedures sometimes disagree, and that probes tend to be more accurate than LM outputs. This has led some researchers to conclude that LMs "lie" or otherwise encode non-cooperative communicative intents. Is this an accurate description of today's LMs, or can query-probe disagreement arise in other ways? We identify three different classes of disagreement, which we term confabulation, deception, and heterogeneity. In many cases, the superiority of probes is simply attributable to better calibration on uncertain answers rather than a greater fraction of correct, high-confidence answers. In some cases, queries and probes perform better on different subsets of inputs, and accuracy can further be improved by ensembling the two. Code is available at github.com/lingo-mit/lm-truthfulness.

Title: FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections. (arXiv:2312.03730v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03730
Code URL: null
Copy Paste: [[2312.03730]] FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections(http://arxiv.org/abs/2312.03730)
Summary:
In today's technologically driven world, the spread of fake news, particularly during crucial events such as elections, presents an increasing challenge to the integrity of information. To address this challenge, we introduce FakeWatch ElectionShield, an innovative framework carefully designed to detect fake news. We have created a novel dataset of North American election-related news articles through a blend of advanced language models (LMs) and thorough human verification, for precision and relevance. We propose a model hub of LMs for identifying fake news. Our goal is to provide the research community with adaptable and accurate classification models in recognizing the dynamic nature of misinformation. Extensive evaluation of fake news classifiers on our dataset and a benchmark dataset shows our that while state-of-the-art LMs slightly outperform the traditional ML models, classical models are still competitive with their balance of accuracy, explainability, and computational efficiency. This research sets the foundation for future studies to address misinformation related to elections.

Title: Methods to Estimate Large Language Model Confidence. (arXiv:2312.03733v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03733
Code URL: null
Copy Paste: [[2312.03733]] Methods to Estimate Large Language Model Confidence(http://arxiv.org/abs/2312.03733)
Summary:
Large Language Models have difficulty communicating uncertainty, which is a significant obstacle to applying LLMs to complex medical tasks. This study evaluates methods to measure LLM confidence when suggesting a diagnosis for challenging clinical vignettes. GPT4 was asked a series of challenging case questions using Chain of Thought and Self Consistency prompting. Multiple methods were investigated to assess model confidence and evaluated on their ability to predict the models observed accuracy. The methods evaluated were Intrinsic Confidence, SC Agreement Frequency and CoT Response Length. SC Agreement Frequency correlated with observed accuracy, yielding a higher Area under the Receiver Operating Characteristic Curve compared to Intrinsic Confidence and CoT Length analysis. SC agreement is the most useful proxy for model confidence, especially for medical diagnosis. Model Intrinsic Confidence and CoT Response Length exhibit a weaker ability to differentiate between correct and incorrect answers, preventing them from being reliable and interpretable markers for model confidence. We conclude GPT4 has a limited ability to assess its own diagnostic accuracy. SC Agreement Frequency is the most useful method to measure GPT4 confidence.

Title: Advancing State of the Art in Language Modeling. (arXiv:2312.03735v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03735
Code URL: null
Copy Paste: [[2312.03735]] Advancing State of the Art in Language Modeling(http://arxiv.org/abs/2312.03735)
Summary:
Generalization is arguably the most important goal of statistical language modeling research. Publicly available benchmarks and papers published with an open-source code have been critical to advancing the field. However, it is often very difficult, and sometimes even impossible, to reproduce the results fully as reported in publications. In this paper, we propose a simple framework that should help advance the state of the art in language modeling in terms of generalization. We propose to publish not just the code, but also probabilities on dev and test sets with future publications so that one can easily add the new model into an ensemble. This has crucial advantages: it is much easier to determine whether a newly proposed model is actually complementary to the current baseline. Therefore, instead of inventing new names for the old tricks, the scientific community can advance faster. Finally, this approach promotes diversity of ideas: one does not need to create an individual model that is the new state of the art to attract attention; it will be sufficient to develop a new model that learns patterns which other models do not. Thus, even a suboptimal model can be found to have value. Remarkably, our approach has yielded new state-of-the-art results across various language modeling benchmarks up to 10%.

Title: Evaluating Large Language Model Creativity from a Literary Perspective. (arXiv:2312.03746v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03746
Code URL: null
Copy Paste: [[2312.03746]] Evaluating Large Language Model Creativity from a Literary Perspective(http://arxiv.org/abs/2312.03746)
Summary:
This paper assesses the potential for large language models (LLMs) to serve as assistive tools in the creative writing process, by means of a single, in-depth case study. In the course of the study, we develop interactive and multi-voice prompting strategies that interleave background descriptions (scene setting, plot elements), instructions that guide composition, samples of text in the target style, and critical discussion of the given samples. We qualitatively evaluate the results from a literary critical perspective, as well as from the standpoint of computational creativity (a sub-field of artificial intelligence). Our findings lend support to the view that the sophistication of the results that can be achieved with an LLM mirrors the sophistication of the prompting.

Title: Applying Large Language Models and Chain-of-Thought for Automatic Scoring. (arXiv:2312.03748v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03748
Code URL: null
Copy Paste: [[2312.03748]] Applying Large Language Models and Chain-of-Thought for Automatic Scoring(http://arxiv.org/abs/2312.03748)
Summary:
This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Though (CoT)in the automatic scoring of student-written responses to science assessments. We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of automatic assessment tools among researchers and educators. We used a testing dataset comprising six assessment tasks (three binomial and three trinomial) with 1,650 student responses. We employed six prompt engineering strategies, combining zero-shot or few-shot learning with CoT, either alone or alongside item stem and scoring rubrics. Results indicated that few-shot (acc = .67) outperformed zero-shot learning (acc = .60), with 12.6\% increase. CoT, when used without item stem and scoring rubrics, did not significantly affect scoring accuracy (acc = .60). However, CoT prompting paired with contextual item stems and rubrics proved to be a significant contributor to scoring accuracy (13.44\% increase for zero-shot; 3.7\% increase for few-shot). Using a novel approach PPEAS, we found a more balanced accuracy across different proficiency categories, highlighting the importance of domain-specific reasoning in enhancing the effectiveness of LLMs in scoring tasks. Additionally, we also found that GPT-4 demonstrated superior performance over GPT-3.5 in various scoring tasks, showing 8.64\% difference. The study revealed that the single-call strategy with GPT-4, particularly using greedy sampling, outperformed other approaches, including ensemble voting strategies. This study demonstrates the potential of LLMs in facilitating automatic scoring, emphasizing that CoT enhances accuracy, particularly when used with item stem and scoring rubrics.

Title: Conceptual Engineering Using Large Language Models. (arXiv:2312.03749v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03749
Code URL: null
Copy Paste: [[2312.03749]] Conceptual Engineering Using Large Language Models(http://arxiv.org/abs/2312.03749)
Summary:
We describe a method, based on Jennifer Nado's definition of classification procedures as targets of conceptual engineering, that implements such procedures using a large language model. We then apply this method using data from the Wikidata knowledge graph to evaluate concept definitions from two paradigmatic conceptual engineering projects: the International Astronomical Union's redefinition of PLANET and Haslanger's ameliorative analysis of WOMAN. We discuss implications of this work for the theory and practice of conceptual engineering. The code and data can be found on GitHub.

Title: Near-real-time Earthquake-induced Fatality Estimation using Crowdsourced Data and Large-Language Models. (arXiv:2312.03755v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03755
Code URL: null
Copy Paste: [[2312.03755]] Near-real-time Earthquake-induced Fatality Estimation using Crowdsourced Data and Large-Language Models(http://arxiv.org/abs/2312.03755)
Summary:
When a damaging earthquake occurs, immediate information about casualties is critical for time-sensitive decision-making by emergency response and aid agencies in the first hours and days. Systems such as Prompt Assessment of Global Earthquakes for Response (PAGER) by the U.S. Geological Survey (USGS) were developed to provide a forecast within about 30 minutes of any significant earthquake globally. Traditional systems for estimating human loss in disasters often depend on manually collected early casualty reports from global media, a process that's labor-intensive and slow with notable time delays. Recently, some systems have employed keyword matching and topic modeling to extract relevant information from social media. However, these methods struggle with the complex semantics in multilingual texts and the challenge of interpreting ever-changing, often conflicting reports of death and injury numbers from various unverified sources on social media platforms. In this work, we introduce an end-to-end framework to significantly improve the timeliness and accuracy of global earthquake-induced human loss forecasting using multi-lingual, crowdsourced social media. Our framework integrates (1) a hierarchical casualty extraction model built upon large language models, prompt design, and few-shot learning to retrieve quantitative human loss claims from social media, (2) a physical constraint-aware, dynamic-truth discovery model that discovers the truthful human loss from massive noisy and potentially conflicting human loss claims, and (3) a Bayesian updating loss projection model that dynamically updates the final loss estimation using discovered truths. We test the framework in real-time on a series of global earthquake events in 2021 and 2022 and show that our framework streamlines casualty data retrieval, achieving speed and accuracy comparable to manual methods by USGS.

Title: How should the advent of large language models affect the practice of science?. (arXiv:2312.03759v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03759
Code URL: null
Copy Paste: [[2312.03759]] How should the advent of large language models affect the practice of science?(http://arxiv.org/abs/2312.03759)
Summary:
Large language models (LLMs) are being increasingly incorporated into scientific workflows. However, we have yet to fully grasp the implications of this integration. How should the advent of large language models affect the practice of science? For this opinion piece, we have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate. Schulz et al. make the argument that working with LLMs is not fundamentally different from working with human collaborators, while Bender et al. argue that LLMs are often misused and over-hyped, and that their limitations warrant a focus on more specialized, easily interpretable tools. Marelli et al. emphasize the importance of transparent attribution and responsible use of LLMs. Finally, Botvinick and Gershman advocate that humans should retain responsibility for determining the scientific roadmap. To facilitate the discussion, the four perspectives are complemented with a response from each group. By putting these different perspectives in conversation, we aim to bring attention to important considerations within the academic community regarding the adoption of LLMs and their impact on both current and future scientific practices.

Title: Improving Activation Steering in Language Models with Mean-Centring. (arXiv:2312.03813v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03813
Code URL: null
Copy Paste: [[2312.03813]] Improving Activation Steering in Language Models with Mean-Centring(http://arxiv.org/abs/2312.03813)
Summary:
Recent work in activation steering has demonstrated the potential to better control the outputs of Large Language Models (LLMs), but it involves finding steering vectors. This is difficult because engineers do not typically know how features are represented in these models. We seek to address this issue by applying the idea of mean-centring to steering vectors. We find that taking the average of activations associated with a target dataset, and then subtracting the mean of all training activations, results in effective steering vectors. We test this method on a variety of models on natural language tasks by steering away from generating toxic text, and steering the completion of a story towards a target genre. We also apply mean-centring to extract function vectors, more effectively triggering the execution of a range of natural language tasks by a significant margin (compared to previous baselines). This suggests that mean-centring can be used to easily improve the effectiveness of activation steering in a wide range of contexts.

Title: Efficient Large Language Models: A Survey. (arXiv:2312.03863v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03863
Code URL: https://github.com/aiot-mlsys-lab/efficientllms
Copy Paste: [[2312.03863]] Efficient Large Language Models: A Survey(http://arxiv.org/abs/2312.03863)
Summary:
Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding, language generation, and complex reasoning and have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting the strong need to develop effective techniques for addressing their efficiency challenges. In this survey, we provide a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from model-centric, data-centric, and framework-centric perspective, respectively. We have also created a GitHub repository where we compile the papers featured in this survey at https://github.com/AIoT-MLSys-Lab/EfficientLLMs, https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey, and will actively maintain this repository and incorporate new research as it emerges. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of the research developments in efficient LLMs and inspire them to contribute to this important and exciting field.

Title: FoMo Rewards: Can we cast foundation models as reward functions?. (arXiv:2312.03881v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03881
Code URL: null
Copy Paste: [[2312.03881]] FoMo Rewards: Can we cast foundation models as reward functions?(http://arxiv.org/abs/2312.03881)
Summary:
We explore the viability of casting foundation models as generic reward functions for reinforcement learning. To this end, we propose a simple pipeline that interfaces an off-the-shelf vision model with a large language model. Specifically, given a trajectory of observations, we infer the likelihood of an instruction describing the task that the user wants an agent to perform. We show that this generic likelihood function exhibits the characteristics ideally expected from a reward function: it associates high values with the desired behaviour and lower values for several similar, but incorrect policies. Overall, our work opens the possibility of designing open-ended agents for interactive tasks via foundation models.

Title: A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints. (arXiv:2312.03905v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03905
Code URL: null
Copy Paste: [[2312.03905]] A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints(http://arxiv.org/abs/2312.03905)
Summary:
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning. This often requires maximizing the likelihood of a symbolic constraint w.r.t the neural network's output distribution. Such output distributions are typically assumed to be fully-factorized. This limits the applicability of neuro-symbolic learning to the more expressive autoregressive distributions, e.g., transformers. Under such distributions, computing the likelihood of even simple constraints is #P-hard. Instead of attempting to enforce the constraint on the entire output distribution, we propose to do so on a random, local approximation thereof. More precisely, we optimize the likelihood of the constraint under a pseudolikelihood-based approximation centered around a model sample. Our approximation is factorized, allowing the reuse of solutions to sub-problems, a main tenet for efficiently computing neuro-symbolic losses. Moreover, it is a local, high-fidelity approximation of the likelihood, exhibiting low entropy and KL-divergence around the model sample. We evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation, and observe that we greatly improve upon the base model's ability to predict logically-consistent outputs. We also evaluate on the task of detoxifying large language models. Using a simple constraint disallowing a list of toxic words, we are able to steer the model's outputs away from toxic generations, achieving SoTA detoxification compared to previous approaches.

Title: A Study on the Calibration of In-context Learning. (arXiv:2312.04021v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04021
Code URL: null
Copy Paste: [[2312.04021]] A Study on the Calibration of In-context Learning(http://arxiv.org/abs/2312.04021)
Summary:
Modern auto-regressive language models are trained to minimize log loss on broad data by predicting the next token so they are expected to get calibrated answers when framing a problem as a next-token prediction task. We study this for in-context learning (ICL), a widely used way to adapt frozen large language models (LLMs) via crafting prompts, and investigate the trade-offs between performance and calibration on a wide range of natural language understanding and reasoning tasks. We conduct extensive experiments to show that such trade-offs may get worse as we increase model size, incorporate more ICL examples, and fine-tune models using instruction, dialog, or reinforcement learning from human feedback (RLHF) on carefully curated datasets. Furthermore, we find that common recalibration techniques that are widely effective such as temperature scaling provide limited gains in calibration errors, suggesting that new methods may be required for settings where models are expected to be reliable.

Title: Using a Large Language Model to generate a Design Structure Matrix. (arXiv:2312.04134v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.04134
Code URL: null
Copy Paste: [[2312.04134]] Using a Large Language Model to generate a Design Structure Matrix(http://arxiv.org/abs/2312.04134)
Summary:
The Design Structure Matrix (DSM) is an established method used in dependency modelling, especially in the design of complex engineering systems. The generation of DSM is traditionally carried out through manual means and can involve interviewing experts to elicit critical system elements and the relationships between them. Such manual approaches can be time-consuming and costly. This paper presents a workflow that uses a Large Language Model (LLM) to support the generation of DSM and improve productivity. A prototype of the workflow was developed in this work and applied on a diesel engine DSM published previously. It was found that the prototype could reproduce 357 out of 462 DSM entries published (i.e. 77.3%), suggesting that the work can aid DSM generation. A no-code version of the prototype is made available online to support future research.

Title: MIMo: A Multi-Modal Infant Model for Studying Cognitive Development. (arXiv:2312.04318v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.04318
Code URL: https://github.com/trieschlab/mimo
Copy Paste: [[2312.04318]] MIMo: A Multi-Modal Infant Model for Studying Cognitive Development(http://arxiv.org/abs/2312.04318)
Summary:
Human intelligence and human consciousness emerge gradually during the process of cognitive development. Understanding this development is an essential aspect of understanding the human mind and may facilitate the construction of artificial minds with similar properties. Importantly, human cognitive development relies on embodied interactions with the physical and social environment, which is perceived via complementary sensory modalities. These interactions allow the developing mind to probe the causal structure of the world. This is in stark contrast to common machine learning approaches, e.g., for large language models, which are merely passively ``digesting'' large amounts of training data, but are not in control of their sensory inputs. However, computational modeling of the kind of self-determined embodied interactions that lead to human intelligence and consciousness is a formidable challenge. Here we present MIMo, an open-source multi-modal infant model for studying early cognitive development through computer simulations. MIMo's body is modeled after an 18-month-old child with detailed five-fingered hands. MIMo perceives its surroundings via binocular vision, a vestibular system, proprioception, and touch perception through a full-body virtual skin, while two different actuation models allow control of his body. We describe the design and interfaces of MIMo and provide examples illustrating its use. All code is available at https://github.com/trieschlab/MIMo .

Title: CLadder: A Benchmark to Assess Causal Reasoning Capabilities of Language Models. (arXiv:2312.04350v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04350
Code URL: https://github.com/causalnlp/cladder
Copy Paste: [[2312.04350]] CLadder: A Benchmark to Assess Causal Reasoning Capabilities of Language Models(http://arxiv.org/abs/2312.04350)
Summary:
The ability to perform causal reasoning is widely considered a core feature of intelligence. In this work, we investigate whether large language models (LLMs) can coherently reason about causality. Much of the existing work in natural language processing (NLP) focuses on evaluating commonsense causal reasoning in LLMs, thus failing to assess whether a model can perform causal inference in accordance with a set of well-defined formal rules. To address this, we propose a new NLP task, causal inference in natural language, inspired by the "causal inference engine" postulated by Judea Pearl et al. We compose a large dataset, CLadder, with 10K samples: based on a collection of causal graphs and queries (associational, interventional, and counterfactual), we obtain symbolic questions and ground-truth answers, through an oracle causal inference engine. These are then translated into natural language. We evaluate multiple LLMs on our dataset, and we introduce and evaluate a bespoke chain-of-thought prompting strategy, CausalCoT. We show that our task is highly challenging for LLMs, and we conduct an in-depth analysis to gain deeper insight into the causal reasoning abilities of LLMs. Our data is open-sourced at https://huggingface.co/datasets/causalNLP/cladder, and our code can be found at https://github.com/causalNLP/cladder.

Title: LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs. (arXiv:2312.04372v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04372
Code URL: null
Copy Paste: [[2312.04372]] LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs(http://arxiv.org/abs/2312.04372)
Summary:
We present LaMPilot, a novel framework for planning in the field of autonomous driving, rethinking the task as a code-generation process that leverages established behavioral primitives. This approach aims to address the challenge of interpreting and executing spontaneous user instructions such as "overtake the car ahead," which have typically posed difficulties for existing frameworks. We introduce the LaMPilot benchmark specifically designed to quantitatively evaluate the efficacy of Large Language Models (LLMs) in translating human directives into actionable driving policies. We then evaluate a wide range of state-of-the-art code generation language models on tasks from the LaMPilot Benchmark. The results of the experiments showed that GPT-4, with human feedback, achieved an impressive task completion rate of 92.7% and a minimal collision rate of 0.9%. To encourage further investigation in this area, our code and dataset will be made available.

Title: Prompting in Autoregressive Large Language Models. (arXiv:2312.03740v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03740
Code URL: null
Copy Paste: [[2312.03740]] Prompting in Autoregressive Large Language Models(http://arxiv.org/abs/2312.03740)
Summary:
Autoregressive Large Language Models have transformed the landscape of Natural Language Processing. Pre-train and prompt paradigm has replaced the conventional approach of pre-training and fine-tuning for many downstream NLP tasks. This shift has been possible largely due to LLMs and innovative prompting techniques. LLMs have shown great promise for a variety of downstream tasks owing to their vast parameters and huge datasets that they are pre-trained on. However, in order to fully realize their potential, their outputs must be guided towards the desired outcomes. Prompting, in which a specific input or instruction is provided to guide the LLMs toward the intended output, has become a tool for achieving this goal. In this paper, we discuss the various prompting techniques that have been applied to fully harness the power of LLMs. We present a taxonomy of existing literature on prompting techniques and provide a concise survey based on this taxonomy. Further, we identify some open problems in the realm of prompting in autoregressive LLMs which could serve as a direction for future research.

Title: Clinical Risk Prediction Using Language Models: Benefits And Considerations. (arXiv:2312.03742v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03742
Code URL: null
Copy Paste: [[2312.03742]] Clinical Risk Prediction Using Language Models: Benefits And Considerations(http://arxiv.org/abs/2312.03742)
Summary:
The utilization of Electronic Health Records (EHRs) for clinical risk prediction is on the rise. However, strict privacy regulations limit access to comprehensive health records, making it challenging to apply standard machine learning algorithms in practical real-world scenarios. Previous research has addressed this data limitation by incorporating medical ontologies and employing transfer learning methods. In this study, we investigate the potential of leveraging language models (LMs) as a means to incorporate supplementary domain knowledge for improving the performance of various EHR-based risk prediction tasks. Unlike applying LMs to unstructured EHR data such as clinical notes, this study focuses on using textual descriptions within structured EHR to make predictions exclusively based on that information. We extensively compare against previous approaches across various data types and sizes. We find that employing LMs to represent structured EHRs, such as diagnostic histories, leads to improved or at least comparable performance in diverse risk prediction tasks. Furthermore, LM-based approaches offer numerous advantages, including few-shot learning, the capability to handle previously unseen medical concepts, and adaptability to various medical vocabularies. Nevertheless, we underscore, through various experiments, the importance of being cautious when employing such models, as concerns regarding the reliability of LMs persist.

Title: Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment. (arXiv:2312.03766v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03766
Code URL: null
Copy Paste: [[2312.03766]] Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment(http://arxiv.org/abs/2312.03766)
Summary:
While existing image-text alignment models reach high quality binary assessments, they fall short of pinpointing the exact source of misalignment. In this paper, we present a method to provide detailed textual and visual explanation of detected misalignments between text-image pairs. We leverage large language models and visual grounding models to automatically construct a training set that holds plausible misaligned captions for a given image and corresponding textual explanations and visual indicators. We also publish a new human curated test set comprising ground-truth textual and visual misalignment annotations. Empirical results show that fine-tuning vision language models on our training set enables them to articulate misalignments and visually indicate them within images, outperforming strong baselines both on the binary alignment classification and the explanation generation tasks. Our method code and human curated test set are available at: https://mismatch-quest.github.io/

Title: Revisiting the Optimality of Word Lengths. (arXiv:2312.03897v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03897
Code URL: null
Copy Paste: [[2312.03897]] Revisiting the Optimality of Word Lengths(http://arxiv.org/abs/2312.03897)
Summary:
Zipf (1935) posited that wordforms are optimized to minimize utterances' communicative costs. Under the assumption that cost is given by an utterance's length, he supported this claim by showing that words' lengths are inversely correlated with their frequencies. Communicative cost, however, can be operationalized in different ways. Piantadosi et al. (2011) claim that cost should be measured as the distance between an utterance's information rate and channel capacity, which we dub the channel capacity hypothesis (CCH) here. Following this logic, they then proposed that a word's length should be proportional to the expected value of its surprisal (negative log-probability in context). In this work, we show that Piantadosi et al.'s derivation does not minimize CCH's cost, but rather a lower bound, which we term CCH-lower. We propose a novel derivation, suggesting an improved way to minimize CCH's cost. Under this method, we find that a language's word lengths should instead be proportional to the surprisal's expectation plus its variance-to-mean ratio. Experimentally, we compare these three communicative cost functions: Zipf's, CCH-lower , and CCH. Across 13 languages and several experimental settings, we find that length is better predicted by frequency than either of the other hypotheses. In fact, when surprisal's expectation, or expectation plus variance-to-mean ratio, is estimated using better language models, it leads to worse word length predictions. We take these results as evidence that Zipf's longstanding hypothesis holds.

Title: RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training. (arXiv:2312.04032v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04032
Code URL: https://github.com/bbuing9/roast
Copy Paste: [[2312.04032]] RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training(http://arxiv.org/abs/2312.04032)
Summary:
Fine-tuning pre-trained language models (LMs) has become the de facto standard in many NLP tasks. Nevertheless, fine-tuned LMs are still prone to robustness issues, such as adversarial robustness and model calibration. Several perspectives of robustness for LMs have been studied independently, but lacking a unified consideration in multiple perspectives. In this paper, we propose Robustifying LMs via Adversarial perturbation with Selective Training (RoAST), a simple yet effective fine-tuning technique to enhance the multi-perspective robustness of LMs in a unified way. RoAST effectively incorporates two important sources for the model robustness, robustness on the perturbed inputs and generalizable knowledge in pre-trained LMs. To be specific, RoAST introduces adversarial perturbation during fine-tuning while the model parameters are selectively updated upon their relative importance to minimize unnecessary deviation. Under a unified evaluation of fine-tuned LMs by incorporating four representative perspectives of model robustness, we demonstrate the effectiveness of RoAST compared to state-of-the-art fine-tuning methods on six different types of LMs, which indicates its usefulness in practice.

Title: Comparing Large Language Model AI and Human-Generated Coaching Messages for Behavioral Weight Loss. (arXiv:2312.04059v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04059
Code URL: null
Copy Paste: [[2312.04059]] Comparing Large Language Model AI and Human-Generated Coaching Messages for Behavioral Weight Loss(http://arxiv.org/abs/2312.04059)
Summary:
Automated coaching messages for weight control can save time and costs, but their repetitive, generic nature may limit their effectiveness compared to human coaching. Large language model (LLM) based artificial intelligence (AI) chatbots, like ChatGPT, could offer more personalized and novel messages to address repetition with their data-processing abilities. While LLM AI demonstrates promise to encourage healthier lifestyles, studies have yet to examine the feasibility and acceptability of LLM-based BWL coaching. 87 adults in a weight-loss trial rated ten coaching messages' helpfulness (five human-written, five ChatGPT-generated) using a 5-point Likert scale, providing additional open-ended feedback to justify their ratings. Participants also identified which messages they believed were AI-generated. The evaluation occurred in two phases: messages in Phase 1 were perceived as impersonal and negative, prompting revisions for Phase 2 messages. In Phase 1, AI-generated messages were rated less helpful than human-written ones, with 66 percent receiving a helpfulness rating of 3 or higher. However, in Phase 2, the AI messages matched the human-written ones regarding helpfulness, with 82% scoring three or above. Additionally, 50% were misidentified as human-written, suggesting AI's sophistication in mimicking human-generated content. A thematic analysis of open-ended feedback revealed that participants appreciated AI's empathy and personalized suggestions but found them more formulaic, less authentic, and too data-focused. This study reveals the preliminary feasibility and acceptability of LLM AIs, like ChatGPT, in crafting potentially effective weight control coaching messages. Our findings also underscore areas for future enhancement.

Title: Language Model Knowledge Distillation for Efficient Question Answering in Spanish. (arXiv:2312.04193v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04193
Code URL: https://github.com/adrianbzg/tinyroberta-distillation-qa-es
Copy Paste: [[2312.04193]] Language Model Knowledge Distillation for Efficient Question Answering in Spanish(http://arxiv.org/abs/2312.04193)
Summary:
Recent advances in the development of pre-trained Spanish language models has led to significant progress in many Natural Language Processing (NLP) tasks, such as question answering. However, the lack of efficient models imposes a barrier for the adoption of such models in resource-constrained environments. Therefore, smaller distilled models for the Spanish language could be proven to be highly scalable and facilitate their further adoption on a variety of tasks and scenarios. In this work, we take one step in this direction by developing SpanishTinyRoBERTa, a compressed language model based on RoBERTa for efficient question answering in Spanish. To achieve this, we employ knowledge distillation from a large model onto a lighter model that allows for a wider implementation, even in areas with limited computational resources, whilst attaining negligible performance sacrifice. Our experiments show that the dense distilled model can still preserve the performance of its larger counterpart, while significantly increasing inference speedup. This work serves as a starting point for further research and investigation of model compression efforts for Spanish language models across various NLP tasks.

Title: Beyond Surface: Probing LLaMA Across Scales and Layers. (arXiv:2312.04333v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04333
Code URL: https://github.com/nuochenpku/llama_analysis
Copy Paste: [[2312.04333]] Beyond Surface: Probing LLaMA Across Scales and Layers(http://arxiv.org/abs/2312.04333)
Summary:
This paper presents an in-depth analysis of Large Language Models (LLMs), focusing on LLaMA, a prominent open-source foundational model in natural language processing. Instead of assessing LLaMA through its generative output, we design multiple-choice tasks to probe its intrinsic understanding in high-order tasks such as reasoning and computation. We examine the model horizontally, comparing different sizes, and vertically, assessing different layers. We unveil several key and uncommon findings based on the designed probing tasks: (1) Horizontally, enlarging model sizes almost could not automatically impart additional knowledge or computational prowess. Instead, it can enhance reasoning abilities, especially in math problem solving, and helps reduce hallucinations, but only beyond certain size thresholds; (2) In vertical analysis, the lower layers of LLaMA lack substantial arithmetic and factual knowledge, showcasing logical thinking, multilingual and recognitive abilities, with top layers housing most computational power and real-world knowledge.

Title: OpenAsp: A Benchmark for Multi-document Open Aspect-based Summarization. (arXiv:2312.04440v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04440
Code URL: https://github.com/liatschiff/openasp
Copy Paste: [[2312.04440]] OpenAsp: A Benchmark for Multi-document Open Aspect-based Summarization(http://arxiv.org/abs/2312.04440)
Summary:
The performance of automatic summarization models has improved dramatically in recent years. Yet, there is still a gap in meeting specific information needs of users in real-world scenarios, particularly when a targeted summary is sought, such as in the useful aspect-based summarization setting targeted in this paper. Previous datasets and studies for this setting have predominantly concentrated on a limited set of pre-defined aspects, focused solely on single document inputs, or relied on synthetic data. To advance research on more realistic scenarios, we introduce OpenAsp, a benchmark for multi-document \textit{open} aspect-based summarization. This benchmark is created using a novel and cost-effective annotation protocol, by which an open aspect dataset is derived from existing generic multi-document summarization datasets. We analyze the properties of OpenAsp showcasing its high-quality content. Further, we show that the realistic open-aspect setting realized in OpenAsp poses a challenge for current state-of-the-art summarization models, as well as for large language models.

gpt

Title: ChatGPT Application In Summarizing An Evolution Of Deep Learning Techniques In Imaging: A Qualitative Study. (arXiv:2312.03723v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03723
Code URL: null
Copy Paste: [[2312.03723]] ChatGPT Application In Summarizing An Evolution Of Deep Learning Techniques In Imaging: A Qualitative Study(http://arxiv.org/abs/2312.03723)
Summary:
The pursuit of article or text summarization has captured the attention of natural language processing (NLP) practitioners, presenting itself as a formidable challenge. ChatGPT 3.5 exhibits the capacity to condense the content of up to 3000 tokens into a single page, aiming to retain pivotal information from a given text across diverse themes. In a conducted qualitative research endeavor, we selected seven scientific articles and employed the publicly available ChatGPT service to generate summaries of these articles. Subsequently, we engaged six co-authors of the articles in a survey, presenting five questions to evaluate the quality of the summaries compared to the original content. The findings revealed that the summaries produced by ChatGPT effectively encapsulated the crucial information present in the articles, preserving the principal message of each manuscript. Nonetheless, there was a slight diminishment in the technical depth of the summaries as opposed to the original articles. As a result, our conclusion underscores ChatGPT's text summarization capability as a potent tool for extracting essential insights in a manner more aligned with reporting than purely scientific discourse.

Title: Real Customization or Just Marketing: Are Customized Versions of Chat GPT Useful?. (arXiv:2312.03728v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03728
Code URL: null
Copy Paste: [[2312.03728]] Real Customization or Just Marketing: Are Customized Versions of Chat GPT Useful?(http://arxiv.org/abs/2312.03728)
Summary:
Large Language Models (LLMs), as the case of OpenAI ChatGPT-4 Turbo, are revolutionizing several industries, including higher education. In this context, LLMs can be personalized through a fine-tuning process to meet the student demands on every particular subject, like statistics. Recently, OpenAI has launched the possibility to fine-tune their model with a natural language web interface, enabling the possibility to create customized GPT version deliberately conditioned to meet the demands of a specific task. The objective of this research is to assess the potential of the customized GPTs that have recently been launched by OpenAI. After developing a Business Statistics Virtual Professor (BSVP), tailored for students at the Universidad Pontificia Comillas, its behavior was evaluated and compared with that of ChatGPT-4 Turbo. The results lead to several conclusions. Firstly, a substantial modification in the style of communication was observed. Following the instructions it was trained with, BSVP provided responses in a more relatable and friendly tone, even incorporating a few minor jokes. Secondly, and this is a matter of relevance, when explicitly asked for something like, "I would like to practice a programming exercise similar to those in R practice 4," BSVP was capable of providing a far superior response: having access to contextual documentation, it could fulfill the request, something beyond ChatGPT-4 Turbo's capabilities. On the downside, the response times were generally higher. Lastly, regarding overall performance, quality, depth, and alignment with the specific content of the course, no statistically significant differences were observed in the responses between BSVP and ChatGPT-4 Turbo. It appears that customized assistants trained with prompts present advantages as virtual aids for students, yet they do not constitute a substantial improvement over ChatGPT-4 Turbo.

Title: GPT vs Human for Scientific Reviews: A Dual Source Review on Applications of ChatGPT in Science. (arXiv:2312.03769v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03769
Code URL: null
Copy Paste: [[2312.03769]] GPT vs Human for Scientific Reviews: A Dual Source Review on Applications of ChatGPT in Science(http://arxiv.org/abs/2312.03769)
Summary:
The new polymath Large Language Models (LLMs) can speed-up greatly scientific reviews, possibly using more unbiased quantitative metrics, facilitating cross-disciplinary connections, and identifying emerging trends and research gaps by analyzing large volumes of data. However, at the present time, they lack the required deep understanding of complex methodologies, they have difficulty in evaluating innovative claims, and they are unable to assess ethical issues and conflicts of interest. Herein, we consider 13 GPT-related papers across different scientific domains, reviewed by a human reviewer and SciSpace, a large language model, with the reviews evaluated by three distinct types of evaluators, namely GPT-3.5, a crowd panel, and GPT-4. We found that 50% of SciSpace's responses to objective questions align with those of a human reviewer, with GPT-4 (informed evaluator) often rating the human reviewer higher in accuracy, and SciSpace higher in structure, clarity, and completeness. In subjective questions, the uninformed evaluators (GPT-3.5 and crowd panel) showed varying preferences between SciSpace and human responses, with the crowd panel showing a preference for the human responses. However, GPT-4 rated them equally in accuracy and structure but favored SciSpace for completeness.

Title: AI and Jobs: Has the Inflection Point Arrived? Evidence from an Online Labor Platform. (arXiv:2312.04180v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.04180
Code URL: null
Copy Paste: [[2312.04180]] AI and Jobs: Has the Inflection Point Arrived? Evidence from an Online Labor Platform(http://arxiv.org/abs/2312.04180)
Summary:
Artificial intelligence (AI) refers to the ability of machines or software to mimic or even surpass human intelligence in a given cognitive task. While humans learn by both induction and deduction, the success of current AI is rooted in induction, relying on its ability to detect statistical regularities in task input -- an ability learnt from a vast amount of training data using enormous computation resources. We examine the performance of such a statistical AI in a human task through the lens of four factors, including task learnability, statistical resource, computation resource, and learning techniques, and then propose a three-phase visual framework to understand the evolving relation between AI and jobs. Based on this conceptual framework, we develop a simple economic model of competition to show the existence of an inflection point for each occupation. Before AI performance crosses the inflection point, human workers always benefit from an improvement in AI performance, but after the inflection point, human workers become worse off whenever such an improvement occurs. To offer empirical evidence, we first argue that AI performance has passed the inflection point for the occupation of translation but not for the occupation of web development. We then study how the launch of ChatGPT, which led to significant improvement of AI performance on many tasks, has affected workers in these two occupations on a large online labor platform. Consistent with the inflection point conjecture, we find that translators are negatively affected by the shock both in terms of the number of accepted jobs and the earnings from those jobs, while web developers are positively affected by the very same shock. Given the potentially large disruption of AI on employment, more studies on more occupations using data from different platforms are urgently needed.

Title: Enhancing Medical Task Performance in GPT-4V: A Comprehensive Study on Prompt Engineering Strategies. (arXiv:2312.04344v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04344
Code URL: null
Copy Paste: [[2312.04344]] Enhancing Medical Task Performance in GPT-4V: A Comprehensive Study on Prompt Engineering Strategies(http://arxiv.org/abs/2312.04344)
Summary:
OpenAI's latest large vision-language model (LVLM), GPT-4V(ision), has piqued considerable interest for its potential in medical applications. Despite its promise, recent studies and internal reviews highlight its underperformance in specialized medical tasks. This paper explores the boundary of GPT-4V's capabilities in medicine, particularly in processing complex imaging data from endoscopies, CT scans, and MRIs etc. Leveraging open-source datasets, we assessed its foundational competencies, identifying substantial areas for enhancement. Our research emphasizes prompt engineering, an often-underutilized strategy for improving AI responsiveness. Through iterative testing, we refined the model's prompts, significantly improving its interpretative accuracy and relevance in medical imaging. From our comprehensive evaluations, we distilled 10 effective prompt engineering techniques, each fortifying GPT-4V's medical acumen. These methodical enhancements facilitate more reliable, precise, and clinically valuable insights from GPT-4V, advancing its operability in critical healthcare environments. Our findings are pivotal for those employing AI in medicine, providing clear, actionable guidance on harnessing GPT-4V's full diagnostic potential.

Title: UID as a Guiding Metric for Automated Authorship Obfuscation. (arXiv:2312.03709v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03709
Code URL: null
Copy Paste: [[2312.03709]] UID as a Guiding Metric for Automated Authorship Obfuscation(http://arxiv.org/abs/2312.03709)
Summary:
Protecting the anonymity of authors has become a difficult task given the rise of automated authorship attributors. These attributors are capable of attributing the author of a text amongst a pool of authors with great accuracy. In order to counter the rise of these automated attributors, there has also been a rise of automated obfuscators. These obfuscators are capable of taking some text, perturbing the text in some manner, and, if successful, deceive an automated attributor in misattributing the wrong author. We devised three novel authorship obfuscation methods that utilized a Psycho-linguistic theory known as Uniform Information Density (UID) theory. This theory states that humans evenly distribute information amongst speech or text so as to maximize efficiency. Utilizing this theory in our three obfuscation methods, we attempted to see how successfully we could deceive two separate attributors. Obfuscating 50 human and 50 GPT-3 generated articles from the TuringBench dataset, we observed how well each method did on deceiving the attributors. While the quality of the obfuscation in terms of semantic preservation and sensical changes was high, we were not able to find any evidence to indicate UID was a viable guiding metric for obfuscation. However, due to restrictions in time we were unable to test a large enough sample of article or tune the parameters for our attributors to comment conclusively on UID in obfuscation.

llm

Title: Negotiating with LLMS: Prompt Hacks, Skill Gaps, and Reasoning Deficits. (arXiv:2312.03720v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03720
Code URL: null
Copy Paste: [[2312.03720]] Negotiating with LLMS: Prompt Hacks, Skill Gaps, and Reasoning Deficits(http://arxiv.org/abs/2312.03720)
Summary:
Large language models LLMs like ChatGPT have reached the 100 Mio user barrier in record time and might increasingly enter all areas of our life leading to a diverse set of interactions between those Artificial Intelligence models and humans. While many studies have discussed governance and regulations deductively from first-order principles, few studies provide an inductive, data-driven lens based on observing dialogues between humans and LLMs especially when it comes to non-collaborative, competitive situations that have the potential to pose a serious threat to people. In this work, we conduct a user study engaging over 40 individuals across all age groups in price negotiations with an LLM. We explore how people interact with an LLM, investigating differences in negotiation outcomes and strategies. Furthermore, we highlight shortcomings of LLMs with respect to their reasoning capabilities and, in turn, susceptiveness to prompt hacking, which intends to manipulate the LLM to make agreements that are against its instructions or beyond any rationality. We also show that the negotiated prices humans manage to achieve span a broad range, which points to a literacy gap in effectively interacting with LLMs.

Title: MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator. (arXiv:2312.03991v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03991
Code URL: null
Copy Paste: [[2312.03991]] MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator(http://arxiv.org/abs/2312.03991)
Summary:
Offline reinforcement learning (RL) faces a significant challenge of distribution shift. Model-free offline RL penalizes the Q value for out-of-distribution (OOD) data or constrains the policy closed to the behavior policy to tackle this problem, but this inhibits the exploration of the OOD region. Model-based offline RL, which uses the trained environment model to generate more OOD data and performs conservative policy optimization within that model, has become an effective method for this problem. However, the current model-based algorithms rarely consider agent robustness when incorporating conservatism into policy. Therefore, the new model-based offline algorithm with a conservative Bellman operator (MICRO) is proposed. This method trades off performance and robustness via introducing the robust Bellman operator into the algorithm. Compared with previous model-based algorithms with robust adversarial models, MICRO can significantly reduce the computation cost by only choosing the minimal Q value in the state uncertainty set. Extensive experiments demonstrate that MICRO outperforms prior RL algorithms in offline RL benchmark and is considerably robust to adversarial perturbations.

Title: Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization. (arXiv:2312.04386v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04386
Code URL: null
Copy Paste: [[2312.04386]] Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization(http://arxiv.org/abs/2312.04386)
Summary:
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation (UBE), but the over-approximation may result in inefficient exploration. We propose a new UBE whose solution converges to the true posterior variance over values and leads to lower regret in tabular exploration problems. We identify challenges to apply the UBE theory beyond tabular problems and propose a suitable approximation. Based on this approximation, we introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied for either risk-seeking or risk-averse policy optimization with minimal changes. Experiments in both online and offline RL demonstrate improved performance compared to other uncertainty estimation methods.

Title: SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM. (arXiv:2312.03788v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03788
Code URL: null
Copy Paste: [[2312.03788]] SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM(http://arxiv.org/abs/2312.03788)
Summary:
Large language models (LLMs) have shown remarkable capabilities in various tasks. However their huge model size and the consequent demand for computational and memory resources also pose challenges to model deployment. Currently, 4-bit post-training quantization (PTQ) has achieved some success in LLMs, reducing the memory footprint by approximately 75% compared to FP16 models, albeit with some accuracy loss. In this paper, we propose SmoothQuant+, an accurate and efficient 4-bit weight-only PTQ that requires no additional training, which enables lossless in accuracy for LLMs for the first time. Based on the fact that the loss of weight quantization is amplified by the activation outliers, SmoothQuant+ smoothes the activation outliers by channel before quantization, while adjusting the corresponding weights for mathematical equivalence, and then performs group-wise 4-bit weight quantization for linear layers. We have integrated SmoothQuant+ into the vLLM framework, an advanced high-throughput inference engine specially developed for LLMs, and equipped it with an efficient W4A16 CUDA kernels, so that vLLM can seamlessly support SmoothQuant+ 4-bit weight quantization. Our results show that, with SmoothQuant+, the Code Llama-34B model can be quantized and deployed on a A100 40GB GPU, achieving lossless accuracy and a throughput increase of 1.9 to 4.0 times compared to the FP16 model deployed on two A100 40GB GPUs. Moreover, the latency per token is only 68% of the FP16 model deployed on two A100 40GB GPUs. This is the state-of-the-art 4-bit weight quantization for LLMs as we know.

Title: Analyzing the Inherent Response Tendency of LLMs: Real-World Instructions-Driven Jailbreak. (arXiv:2312.04127v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04127
Code URL: null
Copy Paste: [[2312.04127]] Analyzing the Inherent Response Tendency of LLMs: Real-World Instructions-Driven Jailbreak(http://arxiv.org/abs/2312.04127)
Summary:
Extensive work has been devoted to improving the safety mechanism of Large Language Models (LLMs). However, in specific scenarios, LLMs still generate harmful responses when faced with malicious instructions, a phenomenon referred to as "Jailbreak Attack". In our research, we introduce a novel jailbreak attack method (\textbf{RADIAL}), which consists of two steps: 1) Inherent Response Tendency Analysis: we analyze the inherent affirmation and rejection tendency of LLMs to react to real-world instructions. 2) Real-World Instructions-Driven Jailbreak: based on our analysis, we strategically choose several real-world instructions and embed malicious instructions into them to amplify the LLM's potential to generate harmful responses. On three open-source human-aligned LLMs, our method achieves excellent jailbreak attack performance for both Chinese and English malicious instructions. Besides, we guided detailed ablation experiments and verified the effectiveness of our core idea "Inherent Response Tendency Analysis". Our exploration also exposes the vulnerability of LLMs to being induced into generating more detailed harmful responses in subsequent rounds of dialogue.

long context

lora

Title: Pearl: A Production-ready Reinforcement Learning Agent. (arXiv:2312.03814v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03814
Code URL: https://github.com/facebookresearch/pearl
Copy Paste: [[2312.03814]] Pearl: A Production-ready Reinforcement Learning Agent(http://arxiv.org/abs/2312.03814)
Summary:
Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals. Its generality allows us to formalize a wide range of problems that real-world intelligent systems encounter, such as dealing with delayed rewards, handling partial observability, addressing the exploration and exploitation dilemma, utilizing offline data to improve online performance, and ensuring safety constraints are met. Despite considerable progress made by the RL research community in addressing these issues, existing open-source RL libraries tend to focus on a narrow portion of the RL solution pipeline, leaving other aspects largely unattended. This paper introduces Pearl, a Production-ready RL agent software package explicitly designed to embrace these challenges in a modular fashion. In addition to presenting preliminary benchmark results, this paper highlights Pearl's industry adoptions to demonstrate its readiness for production usage. Pearl is open sourced on Github at github.com/facebookresearch/pearl and its official website is located at pearlagent.github.io.

Title: Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration. (arXiv:2312.03987v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03987
Code URL: https://github.com/fmh1art/batcher
Copy Paste: [[2312.03987]] Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration(http://arxiv.org/abs/2312.03987)
Summary:
Entity resolution (ER) is an important data integration task with a wide spectrum of applications. The state-of-the-art solutions on ER rely on pre-trained language models (PLMs), which require fine-tuning on a lot of labeled matching/non-matching entity pairs. Recently, large languages models (LLMs), such as GPT-4, have shown the ability to perform many tasks without tuning model parameters, which is known as in-context learning (ICL) that facilitates effective learning from a few labeled input context demonstrations. However, existing ICL approaches to ER typically necessitate providing a task description and a set of demonstrations for each entity pair and thus have limitations on the monetary cost of interfacing LLMs. To address the problem, in this paper, we provide a comprehensive study to investigate how to develop a cost-effective batch prompting approach to ER. We introduce a framework BATCHER consisting of demonstration selection and question batching and explore different design choices that support batch prompting for ER. We also devise a covering-based demonstration selection strategy that achieves an effective balance between matching accuracy and monetary cost. We conduct a thorough evaluation to explore the design space and evaluate our proposed strategies. Through extensive experiments, we find that batch prompting is very cost-effective for ER, compared with not only PLM-based methods fine-tuned with extensive labeled data but also LLM-based methods with manually designed prompting. We also provide guidance for selecting appropriate design choices for batch prompting.

Title: A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA. (arXiv:2312.03732v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03732
Code URL: null
Copy Paste: [[2312.03732]] A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA(http://arxiv.org/abs/2312.03732)
Summary:
As large language models (LLMs) have become increasingly compute and memory intensive, parameter-efficient fine-tuning (PEFT) methods are now a common strategy to fine-tune LLMs. A popular PEFT method is Low-Rank Adapters (LoRA), which adds trainable low-rank "adapters" to selected layers. Each adapter consists of a low-rank matrix product, multiplicatively scaled by a rank-dependent factor. This scaling factor, which divides adapters by a factor of the rank, results in slowed learning and stunted performance for LoRA with higher-rank adapters. Consequently, the use of LoRA in practice has generally been limited to very low ranks. In this work, we study the impact of the scaling factor on the learning process and prove that LoRA adapters should be divided by a factor of the square root of the rank. Modifying LoRA with the appropriate scaling factor, which we call the rank-stabilized LoRA (rsLoRA) method, easily provides for a fine-tuning compute/performance trade-off, where larger ranks can be used to trade off increased computational resources during training for better fine-tuning performance, with no change in inference computing cost.

hallucination

prompt

Title: Conditional Prompt Tuning for Multimodal Fusion. (arXiv:2312.03734v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03734
Code URL: null
Copy Paste: [[2312.03734]] Conditional Prompt Tuning for Multimodal Fusion(http://arxiv.org/abs/2312.03734)
Summary:
We show that the representation of one modality can effectively guide the prompting of another modality for parameter-efficient multimodal fusion. Specifically, we first encode one modality and use its representation as a prior to conditionally prompt all frozen layers of the other modality. This is achieved by disentangling the vanilla prompt vectors into three types of specialized prompts that adaptively capture global-level and instance-level features. To better produce the instance-wise prompt, we introduce the mixture of prompt experts (MoPE) to dynamically route each instance to the most suitable prompt experts for encoding. We further study a regularization term to avoid degenerated prompt expert routing. Thanks to our design, our method can effectively transfer the pretrained knowledge in unimodal encoders for downstream multimodal tasks. Compared with vanilla prompting, we show that our MoPE-based conditional prompting is more expressive, thereby scales better with training data and the total number of prompts. We also demonstrate that our prompt tuning is architecture-agnostic, thereby offering high modularity. Extensive experiments over three multimodal datasets demonstrate state-of-the-art results, matching or surpassing the performance achieved through fine-tuning, while only necessitating 0.7% of the trainable parameters. Code will be released: https://github.com/songrise/ConditionalPrompt.

Title: MultiGPrompt for Multi-Task Pre-Training and Prompting on Graphs. (arXiv:2312.03731v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03731
Code URL: null
Copy Paste: [[2312.03731]] MultiGPrompt for Multi-Task Pre-Training and Prompting on Graphs(http://arxiv.org/abs/2312.03731)
Summary:
Graphs can inherently model interconnected objects on the Web, thereby facilitating a series of Web applications, such as web analyzing and content recommendation. Recently, Graph Neural Networks (GNNs) have emerged as a mainstream technique for graph representation learning. However, their efficacy within an end-to-end supervised framework is significantly tied to the availabilityof task-specific labels. To mitigate labeling costs and enhance robustness in few-shot settings, pre-training on self-supervised tasks has emerged as a promising method, while prompting has been proposed to further narrow the objective gap between pretext and downstream tasks. Although there has been some initial exploration of prompt-based learning on graphs, they primarily leverage a single pretext task, resulting in a limited subset of general knowledge that could be learned from the pre-training data. Hence, in this paper, we propose MultiGPrompt, a novel multi-task pre-training and prompting framework to exploit multiple pretext tasks for more comprehensive pre-trained knowledge. First, in pre-training, we design a set of pretext tokens to synergize multiple pretext tasks. Second, we propose a dual-prompt mechanism consisting of composed and open prompts to leverage task-specific and global pre-training knowledge, to guide downstream tasks in few-shot settings. Finally, we conduct extensive experiments on six public datasets to evaluate and analyze MultiGPrompt.

code

Title: SCStory: Self-supervised and Continual Online Story Discovery. (arXiv:2312.03725v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03725
Code URL: null
Copy Paste: [[2312.03725]] SCStory: Self-supervised and Continual Online Story Discovery(http://arxiv.org/abs/2312.03725)
Summary:
We present a framework SCStory for online story discovery, that helps people digest rapidly published news article streams in real-time without human annotations. To organize news article streams into stories, existing approaches directly encode the articles and cluster them based on representation similarity. However, these methods yield noisy and inaccurate story discovery results because the generic article embeddings do not effectively reflect the story-indicative semantics in an article and cannot adapt to the rapidly evolving news article streams. SCStory employs self-supervised and continual learning with a novel idea of story-indicative adaptive modeling of news article streams. With a lightweight hierarchical embedding module that first learns sentence representations and then article representations, SCStory identifies story-relevant information of news articles and uses them to discover stories. The embedding module is continuously updated to adapt to evolving news streams with a contrastive learning objective, backed up by two unique techniques, confidence-aware memory replay and prioritized-augmentation, employed for label absence and data scarcity problems. Thorough experiments on real and the latest news data sets demonstrate that SCStory outperforms existing state-of-the-art algorithms for unsupervised online story discovery.

Title: Stock Movement and Volatility Prediction from Tweets, Macroeconomic Factors and Historical Prices. (arXiv:2312.03758v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.03758
Code URL: https://github.com/hao1zhao/bigdata23
Copy Paste: [[2312.03758]] Stock Movement and Volatility Prediction from Tweets, Macroeconomic Factors and Historical Prices(http://arxiv.org/abs/2312.03758)
Summary:
Predicting stock market is vital for investors and policymakers, acting as a barometer of the economic health. We leverage social media data, a potent source of public sentiment, in tandem with macroeconomic indicators as government-compiled statistics, to refine stock market predictions. However, prior research using tweet data for stock market prediction faces three challenges. First, the quality of tweets varies widely. While many are filled with noise and irrelevant details, only a few genuinely mirror the actual market scenario. Second, solely focusing on the historical data of a particular stock without considering its sector can lead to oversight. Stocks within the same industry often exhibit correlated price behaviors. Lastly, simply forecasting the direction of price movement without assessing its magnitude is of limited value, as the extent of the rise or fall truly determines profitability. In this paper, diverging from the conventional methods, we pioneer an ECON. The framework has following advantages: First, ECON has an adept tweets filter that efficiently extracts and decodes the vast array of tweet data. Second, ECON discerns multi-level relationships among stocks, sectors, and macroeconomic factors through a self-aware mechanism in semantic space. Third, ECON offers enhanced accuracy in predicting substantial stock price fluctuations by capitalizing on stock price movement. We showcase the state-of-the-art performance of our proposed model using a dataset, specifically curated by us, for predicting stock market movements and volatility.

Title: Similarity-based Knowledge Transfer for Cross-Domain Reinforcement Learning. (arXiv:2312.03764v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03764
Code URL: null
Copy Paste: [[2312.03764]] Similarity-based Knowledge Transfer for Cross-Domain Reinforcement Learning(http://arxiv.org/abs/2312.03764)
Summary:
Transferring knowledge in cross-domain reinforcement learning is a challenging setting in which learning is accelerated by reusing knowledge from a task with different observation and/or action space. However, it is often necessary to carefully select the source of knowledge for the receiving end to benefit from the transfer process. In this article, we study how to measure the similarity between cross-domain reinforcement learning tasks to select a source of knowledge that will improve the performance of the learning agent. We developed a semi-supervised alignment loss to match different spaces with a set of encoder-decoders, and use them to measure similarity and transfer policies across tasks. In comparison to prior works, our method does not require data to be aligned, paired or collected by expert policies. Experimental results, on a set of varied Mujoco control tasks, show the robustness of our method in effectively selecting and transferring knowledge, without the supervision of a tailored set of source tasks.

Title: Multi-Scale and Multi-Modal Contrastive Learning Network for Biomedical Time Series. (arXiv:2312.03796v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03796
Code URL: null
Copy Paste: [[2312.03796]] Multi-Scale and Multi-Modal Contrastive Learning Network for Biomedical Time Series(http://arxiv.org/abs/2312.03796)
Summary:
Multi-modal biomedical time series (MBTS) data offers a holistic view of the physiological state, holding significant importance in various bio-medical applications. Owing to inherent noise and distribution gaps across different modalities, MBTS can be complex to model. Various deep learning models have been developed to learn representations of MBTS but still fall short in robustness due to the ignorance of modal-to-modal variations. This paper presents a multi-scale and multi-modal biomedical time series representation learning (MBSL) network with contrastive learning to migrate these variations. Firstly, MBTS is grouped based on inter-modal distances, then each group with minimum intra-modal variations can be effectively modeled by individual encoders. Besides, to enhance the multi-scale feature extraction (encoder), various patch lengths and mask ratios are designed to generate tokens with semantic information at different scales and diverse contextual perspectives respectively. Finally, cross-modal contrastive learning is proposed to maximize consistency among inter-modal groups, maintaining useful information and eliminating noises. Experiments against four bio-medical applications show that MBSL outperforms state-of-the-art models by 33.9% mean average errors (MAE) in respiration rate, by 13.8% MAE in exercise heart rate, by 1.41% accuracy in human activity recognition, and by 1.14% F1-score in obstructive sleep apnea-hypopnea syndrome.

Title: Graph Convolutions Enrich the Self-Attention in Transformers!. (arXiv:2312.04234v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04234
Code URL: null
Copy Paste: [[2312.04234]] Graph Convolutions Enrich the Self-Attention in Transformers!(http://arxiv.org/abs/2312.04234)
Summary:
Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, time-series modeling, etc. However, one of the challenges with deep Transformer models is the oversmoothing problem, where representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective. We propose graph-filter-based self-attention (GFSA) to learn a general yet effective one, whose complexity, however, is slightly larger than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph pattern classification, speech recognition, and code classification.

Title: Easy Data Augmentation in Sentiment Analysis of Cyberbullying. (arXiv:2312.03743v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03743
Code URL: null
Copy Paste: [[2312.03743]] Easy Data Augmentation in Sentiment Analysis of Cyberbullying(http://arxiv.org/abs/2312.03743)
Summary:
Instagram, a social media platform, has in the vicinity of 2 billion active users in 2023. The platform allows users to post photos and videos with one another. However, cyberbullying remains a significant problem for about 50% of young Indonesians. To address this issue, sentiment analysis for comment filtering uses a Support Vector Machine (SVM) and Easy Data Augmentation (EDA). EDA will augment the dataset, enabling robust prediction and analysis of cyberbullying by introducing more variation. Based on the tests, SVM combination with EDA results in a 2.52% increase in the k-Fold Cross Validation score. Our proposed approach shows an improved accuracy of 92.5%, 2.5% higher than that of the existing state-of-the-art method. To maintain the reproducibility and replicability of this research, the source code can be accessed at uns.id/eda_svm.

Title: Multimodal Misinformation Detection in a South African Social Media Environment. (arXiv:2312.04052v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04052
Code URL: null
Copy Paste: [[2312.04052]] Multimodal Misinformation Detection in a South African Social Media Environment(http://arxiv.org/abs/2312.04052)
Summary:
With the constant spread of misinformation on social media networks, a need has arisen to continuously assess the veracity of digital content. This need has inspired numerous research efforts on the development of misinformation detection (MD) models. However, many models do not use all information available to them and existing research contains a lack of relevant datasets to train the models, specifically within the South African social media environment. The aim of this paper is to investigate the transferability of knowledge of a MD model between different contextual environments. This research contributes a multimodal MD model capable of functioning in the South African social media environment, as well as introduces a South African misinformation dataset. The model makes use of multiple sources of information for misinformation detection, namely: textual and visual elements. It uses bidirectional encoder representations from transformers (BERT) as the textual encoder and a residual network (ResNet) as the visual encoder. The model is trained and evaluated on the Fakeddit dataset and a South African misinformation dataset. Results show that using South African samples in the training of the model increases model performance, in a South African contextual environment, and that a multimodal model retains significantly more knowledge than both the textual and visual unimodal models. Our study suggests that the performance of a misinformation detection model is influenced by the cultural nuances of its operating environment and multimodal models assist in the transferability of knowledge between different contextual environments. Therefore, local data should be incorporated into the training process of a misinformation detection model in order to optimize model performance.

Title: Merging by Matching Models in Task Subspaces. (arXiv:2312.04339v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04339
Code URL: https://github.com/r-three/mats
Copy Paste: [[2312.04339]] Merging by Matching Models in Task Subspaces(http://arxiv.org/abs/2312.04339)
Summary:
Model merging aims to cheaply combine individual task-specific models into a single multitask model. In this work, we view past merging methods as leveraging different notions of a ''task subspace'' in which models are matched before being merged. We connect the task subspace of a given model to its loss landscape and formalize how this approach to model merging can be seen as solving a linear system of equations. While past work has generally been limited to linear systems that have a closed-form solution, we consider using the conjugate gradient method to find a solution. We show that using the conjugate gradient method can outperform closed-form solutions, enables merging via linear systems that are otherwise intractable to solve, and flexibly allows choosing from a wide variety of initializations and estimates for the ''task subspace''. We ultimately demonstrate that our merging framework called ''Matching Models in their Task Subspace'' (MaTS) achieves state-of-the-art results in multitask and intermediate-task model merging. We release all of the code and checkpoints used in our work at https://github.com/r-three/mats.

Title: Learning Genomic Sequence Representations using Graph Neural Networks over De Bruijn Graphs. (arXiv:2312.03865v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03865
Code URL: https://github.com/ratschlab/genomic-gnn
Copy Paste: [[2312.03865]] Learning Genomic Sequence Representations using Graph Neural Networks over De Bruijn Graphs(http://arxiv.org/abs/2312.03865)
Summary:
The rapid expansion of genomic sequence data calls for new methods to achieve robust sequence representations. Existing techniques often neglect intricate structural details, emphasizing mainly contextual information. To address this, we developed k-mer embeddings that merge contextual and structural string information by enhancing De Bruijn graphs with structural similarity connections. Subsequently, we crafted a self-supervised method based on Contrastive Learning that employs a heterogeneous Graph Convolutional Network encoder and constructs positive pairs based on node similarities. Our embeddings consistently outperform prior techniques for Edit Distance Approximation and Closest String Retrieval tasks.

Title: Series2Vec: Similarity-based Self-supervised Representation Learning for Time Series Classification. (arXiv:2312.03998v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03998
Code URL: https://github.com/navidfoumani/series2vec
Copy Paste: [[2312.03998]] Series2Vec: Similarity-based Self-supervised Representation Learning for Time Series Classification(http://arxiv.org/abs/2312.03998)
Summary:
We argue that time series analysis is fundamentally different in nature to either vision or natural language processing with respect to the forms of meaningful self-supervised learning tasks that can be defined. Motivated by this insight, we introduce a novel approach called \textit{Series2Vec} for self-supervised representation learning. Unlike other self-supervised methods in time series, which carry the risk of positive sample variants being less similar to the anchor sample than series in the negative set, Series2Vec is trained to predict the similarity between two series in both temporal and spectral domains through a self-supervised task. Series2Vec relies primarily on the consistency of the unsupervised similarity step, rather than the intrinsic quality of the similarity measurement, without the need for hand-crafted data augmentation. To further enforce the network to learn similar representations for similar time series, we propose a novel approach that applies order-invariant attention to each representation within the batch during training. Our evaluation of Series2Vec on nine large real-world datasets, along with the UCR/UEA archive, shows enhanced performance compared to current state-of-the-art self-supervised techniques for time series. Additionally, our extensive experiments show that Series2Vec performs comparably with fully supervised training and offers high efficiency in datasets with limited-labeled data. Finally, we show that the fusion of Series2Vec with other representation learning models leads to enhanced performance for time series classification. Code and models are open-source at \url{https://github.com/Navidfoumani/Series2Vec.}

Title: Jointly spatial-temporal representation learning for individual trajectories. (arXiv:2312.04055v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04055
Code URL: null
Copy Paste: [[2312.04055]] Jointly spatial-temporal representation learning for individual trajectories(http://arxiv.org/abs/2312.04055)
Summary:
Individual trajectories, containing substantial information on human-environment interactions across space and time, is a crucial input for geospatial foundation models (GeoFMs). However, existing attempts, leveraging trajectory data for various applications have overlooked the implicit spatial-temporal dependency within trajectories and failed to encode and represent it in a format friendly to deep learning, posing a challenge in obtaining general-purpose trajectory representations. Therefore, this paper proposes a spatial-temporal joint representation learning method (ST-GraphRL) to formalize learnable spatial-temporal dependencies into trajectory representations. The proposed ST-GraphRL consists of three compositions: (i) a weighted directed spatial-temporal graph to explicitly construct mobility interactions over both space and time dimensions; (ii) a two-stage jointly encoder (i.e., decoupling and fusion) to learn entangled spatial-temporal dependencies by independently decomposing and jointly aggregating space and time information; (iii) a decoder guides ST-GraphRL to learn explicit mobility regularities by simulating the spatial-temporal distributions of trajectories. Tested on three real-world human mobility datasets, the proposed ST-GraphRL outperformed all the baseline models in predicting movement spatial-temporal distributions and preserving trajectory similarity with high spatial-temporal correlations. We also explore how spatial-temporal features presented in latent space, validating that ST-GraphRL understands spatial-temporal patterns. This method is also transferable for general-purpose geospatial data representations for broad downstream tasks, as well advancing GeoFMs developing.

Title: MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity and Degree Descent Criterion. (arXiv:2312.04067v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04067
Code URL: https://github.com/zpguigroupwhu/meancut-clustering
Copy Paste: [[2312.04067]] MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity and Degree Descent Criterion(http://arxiv.org/abs/2312.04067)
Summary:
As the most typical graph clustering method, spectral clustering is popular and attractive due to the remarkable performance, easy implementation, and strong adaptability. Classical spectral clustering measures the edge weights of graph using pairwise Euclidean-based metric, and solves the optimal graph partition by relaxing the constraints of indicator matrix and performing Laplacian decomposition. However, Euclidean-based similarity might cause skew graph cuts when handling non-spherical data distributions, and the relaxation strategy introduces information loss. Meanwhile, spectral clustering requires specifying the number of clusters, which is hard to determine without enough prior knowledge. In this work, we leverage the path-based similarity to enhance intra-cluster associations, and propose MeanCut as the objective function and greedily optimize it in degree descending order for a nondestructive graph partition. This algorithm enables the identification of arbitrary shaped clusters and is robust to noise. To reduce the computational complexity of similarity calculation, we transform optimal path search into generating the maximum spanning tree (MST), and develop a fast MST (FastMST) algorithm to further improve its time-efficiency. Moreover, we define a density gradient factor (DGF) for separating the weakly connected clusters. The validity of our algorithm is demonstrated by testifying on real-world benchmarks and application of face recognition. The source code of MeanCut is available at https://github.com/ZPGuiGroupWhu/MeanCut-Clustering.

Title: A Transformer Model for Symbolic Regression towards Scientific Discovery. (arXiv:2312.04070v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04070
Code URL: https://github.com/omron-sinicx/transformer4sr
Copy Paste: [[2312.04070]] A Transformer Model for Symbolic Regression towards Scientific Discovery(http://arxiv.org/abs/2312.04070)
Summary:
Symbolic Regression (SR) searches for mathematical expressions which best describe numerical datasets. This allows to circumvent interpretation issues inherent to artificial neural networks, but SR algorithms are often computationally expensive. This work proposes a new Transformer model aiming at Symbolic Regression particularly focused on its application for Scientific Discovery. We propose three encoder architectures with increasing flexibility but at the cost of column-permutation equivariance violation. Training results indicate that the most flexible architecture is required to prevent from overfitting. Once trained, we apply our best model to the SRSD datasets (Symbolic Regression for Scientific Discovery datasets) which yields state-of-the-art results using the normalized tree-based edit distance, at no extra computational cost.

Title: Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection. (arXiv:2312.04095v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04095
Code URL: https://github.com/hnanhtuan/projected_gradient_unlearning
Copy Paste: [[2312.04095]] Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection(http://arxiv.org/abs/2312.04095)
Summary:
Recent data-privacy laws have sparked interest in machine unlearning, which involves removing the effect of specific training samples from a learnt model as if they were never present in the original training dataset. The challenge of machine unlearning is to discard information about the ``forget'' data in the learnt model without altering the knowledge about the remaining dataset and to do so more efficiently than the naive retraining approach. To achieve this, we adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU), in which the model takes steps in the orthogonal direction to the gradient subspaces deemed unimportant for the retaining dataset, so as to its knowledge is preserved. By utilizing Stochastic Gradient Descent (SGD) to update the model weights, our method can efficiently scale to any model and dataset size. We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible. Our code is available at https://github.com/hnanhtuan/projected_gradient_unlearning.

Title: Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation. (arXiv:2312.04167v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04167
Code URL: null
Copy Paste: [[2312.04167]] Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation(http://arxiv.org/abs/2312.04167)
Summary:
In this paper, we propose a latent-variable generative model called mixture of dynamical variational autoencoders (MixDVAE) to model the dynamics of a system composed of multiple moving sources. A DVAE model is pre-trained on a single-source dataset to capture the source dynamics. Then, multiple instances of the pre-trained DVAE model are integrated into a multi-source mixture model with a discrete observation-to-source assignment latent variable. The posterior distributions of both the discrete observation-to-source assignment variable and the continuous DVAE variables representing the sources content/position are estimated using a variational expectation-maximization algorithm, leading to multi-source trajectories estimation. We illustrate the versatility of the proposed MixDVAE model on two tasks: a computer vision task, namely multi-object tracking, and an audio processing task, namely single-channel audio source separation. Experimental results show that the proposed method works well on these two tasks, and outperforms several baseline methods.

Title: CODEX: A Cluster-Based Method for Explainable Reinforcement Learning. (arXiv:2312.04216v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04216
Code URL: https://github.com/ainfosec/codex
Copy Paste: [[2312.04216]] CODEX: A Cluster-Based Method for Explainable Reinforcement Learning(http://arxiv.org/abs/2312.04216)
Summary:
Despite the impressive feats demonstrated by Reinforcement Learning (RL), these algorithms have seen little adoption in high-risk, real-world applications due to current difficulties in explaining RL agent actions and building user trust. We present Counterfactual Demonstrations for Explanation (CODEX), a method that incorporates semantic clustering, which can effectively summarize RL agent behavior in the state-action space. Experimentation on the MiniGrid and StarCraft II gaming environments reveals the semantic clusters retain temporal as well as entity information, which is reflected in the constructed summary of agent behavior. Furthermore, clustering the discrete+continuous game-state latent representations identifies the most crucial episodic events, demonstrating a relationship between the latent and semantic spaces. This work contributes to the growing body of work that strives to unlock the power of RL for widespread use by leveraging and extending techniques from Natural Language Processing.

chat

Title: Assessing AI Chatbots Performance in Comprehensive Standardized Test Preparation; A Case Study with GRE. (arXiv:2312.03719v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03719
Code URL: null
Copy Paste: [[2312.03719]] Assessing AI Chatbots Performance in Comprehensive Standardized Test Preparation; A Case Study with GRE(http://arxiv.org/abs/2312.03719)
Summary:
This research paper presents a comprehensive evaluation of the performance of three artificial 10 intelligence chatbots: Bing, ChatGPT, and GPT-4, in addressing standardized test questions. Graduate record examination, known as GRE, serves as a case study in this paper, encompassing both quantitative reasoning and verbal skills. A total of 137 quantitative reasoning questions, featuring diverse styles and 157 verbal questions categorized into varying levels of difficulty (easy, medium, and hard) were administered to assess the chatbots' capabilities. This paper provides a detailed examination of the results and their implications for the utilization of artificial intelligence in standardized test preparation by presenting the performance of each chatbot across various skills and styles tested in the exam. Additionally, this paper explores the proficiency of artificial intelligence in addressing image-based questions and illustrates the uncertainty level of each chatbot. The results reveal varying degrees of success across the chatbots, demonstrating the influence of model sophistication and training data. GPT-4 emerged as the most proficient, especially in complex language understanding tasks, highlighting the evolution of artificial intelligence in language comprehension and its ability to pass the exam with a high score.

Title: Comparing Generative Chatbots Based on Process Requirements. (arXiv:2312.03741v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03741
Code URL: null
Copy Paste: [[2312.03741]] Comparing Generative Chatbots Based on Process Requirements(http://arxiv.org/abs/2312.03741)
Summary:
Business processes are commonly represented by modelling languages, such as Event-driven Process Chain (EPC), Yet Another Workflow Language (YAWL), and the most popular standard notation for modelling business processes, the Business Process Model and Notation (BPMN). Most recently, chatbots, programs that allow users to interact with a machine using natural language, have been increasingly used for business process execution support. A recent category of chatbots worth mentioning is generative-based chatbots, powered by Large Language Models (LLMs) such as OpenAI's Generative Pre-Trained Transformer (GPT) model and Google's Pathways Language Model (PaLM), which are trained on billions of parameters and support conversational intelligence. However, it is not clear whether generative-based chatbots are able to understand and meet the requirements of constructs such as those provided by BPMN for process execution support. This paper presents a case study to compare the performance of prominent generative models, GPT and PaLM, in the context of process execution support. The research sheds light into the challenging problem of using conversational approaches supported by generative chatbots as a means to understand process-aware modelling notations and support users to execute their tasks.

Title: LineConGraphs: Line Conversation Graphs for Effective Emotion Recognition using Graph Neural Networks. (arXiv:2312.03756v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03756
Code URL: null
Copy Paste: [[2312.03756]] LineConGraphs: Line Conversation Graphs for Effective Emotion Recognition using Graph Neural Networks(http://arxiv.org/abs/2312.03756)
Summary:
Emotion Recognition in Conversations (ERC) is a critical aspect of affective computing, and it has many practical applications in healthcare, education, chatbots, and social media platforms. Earlier approaches for ERC analysis involved modeling both speaker and long-term contextual information using graph neural network architectures. However, it is ideal to deploy speaker-independent models for real-world applications. Additionally, long context windows can potentially create confusion in recognizing the emotion of an utterance in a conversation. To overcome these limitations, we propose novel line conversation graph convolutional network (LineConGCN) and graph attention (LineConGAT) models for ERC analysis. These models are speaker-independent and built using a graph construction strategy for conversations -- line conversation graphs (LineConGraphs). The conversational context in LineConGraphs is short-term -- limited to one previous and future utterance, and speaker information is not part of the graph. We evaluate the performance of our proposed models on two benchmark datasets, IEMOCAP and MELD, and show that our LineConGAT model outperforms the state-of-the-art methods with an F1-score of 64.58% and 76.50%. Moreover, we demonstrate that embedding sentiment shift information into line conversation graphs further enhances the ERC performance in the case of GCN models.

Title: PsyChat: A Client-Centric Dialogue System for Mental Health Support. (arXiv:2312.04262v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04262
Code URL: https://github.com/qiuhuachuan/psychat
Copy Paste: [[2312.04262]] PsyChat: A Client-Centric Dialogue System for Mental Health Support(http://arxiv.org/abs/2312.04262)
Summary:
Dialogue systems are increasingly integrated into mental health support to help clients facilitate exploration, gain insight, take action, and ultimately heal themselves. For a dialogue system to be practical and user-friendly, it should be client-centric, focusing on the client's behaviors. However, existing dialogue systems publicly available for mental health support often concentrate solely on the counselor's strategies rather than the behaviors expressed by clients. This can lead to the implementation of unreasonable or inappropriate counseling strategies and corresponding responses from the dialogue system. To address this issue, we propose PsyChat, a client-centric dialogue system that provides psychological support through online chat. The client-centric dialogue system comprises five modules: client behavior recognition, counselor strategy selection, input packer, response generator intentionally fine-tuned to produce responses, and response selection. Both automatic and human evaluations demonstrate the effectiveness and practicality of our proposed dialogue system for real-life mental health support. Furthermore, we employ our proposed dialogue system to simulate a real-world client-virtual-counselor interaction scenario. The system is capable of predicting the client's behaviors, selecting appropriate counselor strategies, and generating accurate and suitable responses, as demonstrated in the scenario.

retrieval augmented generation

rag

Title: Co-guiding for Multi-intent Spoken Language Understanding. (arXiv:2312.03716v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03716
Code URL: null
Copy Paste: [[2312.03716]] Co-guiding for Multi-intent Spoken Language Understanding(http://arxiv.org/abs/2312.03716)
Summary:
Recent graph-based models for multi-intent SLU have obtained promising results through modeling the guidance from the prediction of intents to the decoding of slot filling. However, existing methods (1) only model the unidirectional guidance from intent to slot, while there are bidirectional inter-correlations between intent and slot; (2) adopt homogeneous graphs to model the interactions between the slot semantics nodes and intent label nodes, which limit the performance. In this paper, we propose a novel model termed Co-guiding Net, which implements a two-stage framework achieving the mutual guidances between the two tasks. In the first stage, the initial estimated labels of both tasks are produced, and then they are leveraged in the second stage to model the mutual guidances. Specifically, we propose two heterogeneous graph attention networks working on the proposed two heterogeneous semantics label graphs, which effectively represent the relations among the semantics nodes and label nodes. Besides, we further propose Co-guiding-SCL Net, which exploits the single-task and dual-task semantics contrastive relations. For the first stage, we propose single-task supervised contrastive learning, and for the second stage, we propose co-guiding supervised contrastive learning, which considers the two tasks' mutual guidances in the contrastive learning procedure. Experiment results on multi-intent SLU show that our model outperforms existing models by a large margin, obtaining a relative improvement of 21.3% over the previous best model on MixATIS dataset in overall accuracy. We also evaluate our model on the zero-shot cross-lingual scenario and the results show that our model can relatively improve the state-of-the-art model by 33.5% on average in terms of overall accuracy for the total 9 languages.

Title: Leveraging AI-derived Data for Carbon Accounting: Information Extraction from Alternative Sources. (arXiv:2312.03722v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03722
Code URL: null
Copy Paste: [[2312.03722]] Leveraging AI-derived Data for Carbon Accounting: Information Extraction from Alternative Sources(http://arxiv.org/abs/2312.03722)
Summary:
Carbon accounting is a fundamental building block in our global path to emissions reduction and decarbonization, yet many challenges exist in achieving reliable and trusted carbon accounting measures. We motivate that carbon accounting not only needs to be more data-driven, but also more methodologically sound. We discuss the need for alternative, more diverse data sources that can play a significant role on our path to trusted carbon accounting procedures and elaborate on not only why, but how Artificial Intelligence (AI) in general and Natural Language Processing (NLP) in particular can unlock reasonable access to a treasure trove of alternative data sets in light of the recent advances in the field that better enable the utilization of unstructured data in this process. We present a case study of the recent developments on real-world data via an NLP-powered analysis using OpenAI's GPT API on financial and shipping data. We conclude the paper with a discussion on how these methods and approaches can be integrated into a broader framework for AI-enabled integrative carbon accounting.

Title: Content-Localization based System for Analyzing Sentiment and Hate Behaviors in Low-Resource Dialectal Arabic: English to Levantine and Gulf. (arXiv:2312.03727v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03727
Code URL: null
Copy Paste: [[2312.03727]] Content-Localization based System for Analyzing Sentiment and Hate Behaviors in Low-Resource Dialectal Arabic: English to Levantine and Gulf(http://arxiv.org/abs/2312.03727)
Summary:
Even though online social movements can quickly become viral on social media, languages can be a barrier to timely monitoring and analyzing the underlying online social behaviors (OSB). This is especially true for under-resourced languages on social media like dialectal Arabic; the primary language used by Arabs on social media. Therefore, it is crucial to provide solutions to efficiently exploit resources from high-resourced languages to solve language-dependent OSB analysis in under-resourced languages. This paper proposes to localize content of resources in high-resourced languages into under-resourced Arabic dialects. Content localization goes beyond content translation that converts text from one language to another; content localization adapts culture, language nuances and regional preferences from one language to a specific language/dialect. Automating understanding of the natural and familiar day-to-day expressions in different regions, is the key to achieve a wider analysis of OSB especially for smart cities. In this paper, we utilize content-localization based neural machine translation to develop sentiment and hate classifiers for two low-resourced Arabic dialects: Levantine and Gulf. Not only this but we also leverage unsupervised learning to facilitate the analysis of sentiment and hate predictions by inferring hidden topics from the corresponding data and providing coherent interpretations of those topics in their native language/dialects. The experimental evaluations and proof-of-concept COVID-19 case study on real data have validated the effectiveness of our proposed system in precisely distinguishing sentiments and accurately identifying hate content in both Levantine and Gulf Arabic dialects. Our findings shed light on the importance of considering the unique nature of dialects within the same language and ignoring the dialectal aspect would lead to misleading analysis.

Title: Breaking the Entanglement of Homophily and Heterophily in Semi-supervised Node Classification. (arXiv:2312.04111v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04111
Code URL: null
Copy Paste: [[2312.04111]] Breaking the Entanglement of Homophily and Heterophily in Semi-supervised Node Classification(http://arxiv.org/abs/2312.04111)
Summary:
Recently, graph neural networks (GNNs) have shown prominent performance in semi-supervised node classification by leveraging knowledge from the graph database. However, most existing GNNs follow the homophily assumption, where connected nodes are more likely to exhibit similar feature distributions and the same labels, and such an assumption has proven to be vulnerable in a growing number of practical applications. As a supplement, heterophily reflects dissimilarity in connected nodes, which has gained significant attention in graph learning. To this end, data engineers aim to develop a powerful GNN model that can ensure performance under both homophily and heterophily. Despite numerous attempts, most existing GNNs struggle to achieve optimal node representations due to the constraints of undirected graphs. The neglect of directed edges results in sub-optimal graph representations, thereby hindering the capacity of GNNs. To address this issue, we introduce AMUD, which quantifies the relationship between node profiles and topology from a statistical perspective, offering valuable insights for \underline{A}daptively \underline{M}odeling the natural directed graphs as the \underline{U}ndirected or \underline{D}irected graph to maximize the benefits from subsequent graph learning. Furthermore, we propose \underline{A}daptive \underline{D}irected \underline{P}attern \underline{A}ggregation (ADPA) as a new directed graph learning paradigm for AMUD. Empirical studies have demonstrated that AMUD guides efficient graph learning. Meanwhile, extensive experiments on 14 benchmark datasets substantiate the impressive performance of ADPA, outperforming baselines by significant margins of 3.96\%.

Title: TimeDRL: Disentangled Representation Learning for Multivariate Time-Series. (arXiv:2312.04142v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04142
Code URL: null
Copy Paste: [[2312.04142]] TimeDRL: Disentangled Representation Learning for Multivariate Time-Series(http://arxiv.org/abs/2312.04142)
Summary:
Multivariate time-series data in numerous real-world applications (e.g., healthcare and industry) are informative but challenging due to the lack of labels and high dimensionality. Recent studies in self-supervised learning have shown their potential in learning rich representations without relying on labels, yet they fall short in learning disentangled embeddings and addressing issues of inductive bias (e.g., transformation-invariance). To tackle these challenges, we propose TimeDRL, a generic multivariate time-series representation learning framework with disentangled dual-level embeddings. TimeDRL is characterized by three novel features: (i) disentangled derivation of timestamp-level and instance-level embeddings from patched time-series data using a [CLS] token strategy; (ii) utilization of timestamp-predictive and instance-contrastive tasks for disentangled representation learning, with the former optimizing timestamp-level embeddings with predictive loss, and the latter optimizing instance-level embeddings with contrastive loss; and (iii) avoidance of augmentation methods to eliminate inductive biases, such as transformation-invariance from cropping and masking. Comprehensive experiments on 6 time-series forecasting datasets and 5 time-series classification datasets have shown that TimeDRL consistently surpasses existing representation learning approaches, achieving an average improvement of forecasting by 57.98% in MSE and classification by 1.25% in accuracy. Furthermore, extensive ablation studies confirmed the relative contribution of each component in TimeDRL's architecture, and semi-supervised learning evaluations demonstrated its effectiveness in real-world scenarios, even with limited labeled data.

Title: Constraint Model for the Satellite Image Mosaic Selection Problem. (arXiv:2312.04210v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.04210
Code URL: https://github.com/mancs20/mosaic_image_combination
Copy Paste: [[2312.04210]] Constraint Model for the Satellite Image Mosaic Selection Problem(http://arxiv.org/abs/2312.04210)
Summary:
Satellite imagery solutions are widely used to study and monitor different regions of the Earth. However, a single satellite image can cover only a limited area. In cases where a larger area of interest is studied, several images must be stitched together to create a single larger image, called a mosaic, that can cover the area. Today, with the increasing number of satellite images available for commercial use, selecting the images to build the mosaic is challenging, especially when the user wants to optimize one or more parameters, such as the total cost and the cloud coverage percentage in the mosaic. More precisely, for this problem the input is an area of interest, several satellite images intersecting the area, a list of requirements relative to the image and the mosaic, such as cloud coverage percentage, image resolution, and a list of objectives to optimize. We contribute to the constraint and mixed integer lineal programming formulation of this new problem, which we call the \textit{satellite image mosaic selection problem}, which is a multi-objective extension of the polygon cover problem. We propose a dataset of realistic and challenging instances, where the images were captured by the satellite constellations SPOT, Pl\'eiades and Pl\'eiades Neo. We evaluate and compare the two proposed models and show their efficiency for large instances, up to 200 images.

Title: Causality and Explainability for Trustworthy Integrated Pest Management. (arXiv:2312.04343v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04343
Code URL: null
Copy Paste: [[2312.04343]] Causality and Explainability for Trustworthy Integrated Pest Management(http://arxiv.org/abs/2312.04343)
Summary:
Pesticides serve as a common tool in agricultural pest control but significantly contribute to the climate crisis. To combat this, Integrated Pest Management (IPM) stands as a climate-smart alternative. Despite its potential, IPM faces low adoption rates due to farmers' skepticism about its effectiveness. To address this challenge, we introduce an advanced data analysis framework tailored to enhance IPM adoption. Our framework provides i) robust pest population predictions across diverse environments with invariant and causal learning, ii) interpretable pest presence predictions using transparent models, iii) actionable advice through counterfactual explanations for in-season IPM interventions, iv) field-specific treatment effect estimations, and v) assessments of the effectiveness of our advice using causal inference. By incorporating these features, our framework aims to alleviate skepticism and encourage wider adoption of IPM practices among farmers.

Title: PCoQA: Persian Conversational Question Answering Dataset. (arXiv:2312.04362v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.04362
Code URL: null
Copy Paste: [[2312.04362]] PCoQA: Persian Conversational Question Answering Dataset(http://arxiv.org/abs/2312.04362)
Summary:
Humans seek information regarding a specific topic through performing a conversation containing a series of questions and answers. In the pursuit of conversational question answering research, we introduce the PCoQA, the first \textbf{P}ersian \textbf{Co}nversational \textbf{Q}uestion \textbf{A}nswering dataset, a resource comprising information-seeking dialogs encompassing a total of 9,026 contextually-driven questions. Each dialog involves a questioner, a responder, and a document from the Wikipedia; The questioner asks several inter-connected questions from the text and the responder provides a span of the document as the answer for each question. PCoQA is designed to present novel challenges compared to previous question answering datasets including having more open-ended non-factual answers, longer answers, and fewer lexical overlaps. This paper not only presents the comprehensive PCoQA dataset but also reports the performance of various benchmark models. Our models include baseline models and pre-trained models, which are leveraged to boost the performance of the model. The dataset and benchmarks are available at our Github page.

Title: Scalable Knowledge Graph Construction and Inference on Human Genome Variants. (arXiv:2312.04423v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.04423
Code URL: null
Copy Paste: [[2312.04423]] Scalable Knowledge Graph Construction and Inference on Human Genome Variants(http://arxiv.org/abs/2312.04423)
Summary:
Real-world knowledge can be represented as a graph consisting of entities and relationships between the entities. The need for efficient and scalable solutions arises when dealing with vast genomic data, like RNA-sequencing. Knowledge graphs offer a powerful approach for various tasks in such large-scale genomic data, such as analysis and inference. In this work, variant-level information extracted from the RNA-sequences of vaccine-na\"ive COVID-19 patients have been represented as a unified, large knowledge graph. Variant call format (VCF) files containing the variant-level information were annotated to include further information for each variant. The data records in the annotated files were then converted to Resource Description Framework (RDF) triples. Each VCF file obtained had an associated CADD scores file that contained the raw and Phred-scaled scores for each variant. An ontology was defined for the VCF and CADD scores files. Using this ontology and the extracted information, a large, scalable knowledge graph was created. Available graph storage was then leveraged to query and create datasets for further downstream tasks. We also present a case study using the knowledge graph and perform a classification task using graph machine learning. We also draw comparisons between different Graph Neural Networks (GNNs) for the case study.

Title: Syntactic Fusion: Enhancing Aspect-Level Sentiment Analysis Through Multi-Tree Graph Integration. (arXiv:2312.03738v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03738
Code URL: null
Copy Paste: [[2312.03738]] Syntactic Fusion: Enhancing Aspect-Level Sentiment Analysis Through Multi-Tree Graph Integration(http://arxiv.org/abs/2312.03738)
Summary:
Recent progress in aspect-level sentiment classification has been propelled by the incorporation of graph neural networks (GNNs) leveraging syntactic structures, particularly dependency trees. Nevertheless, the performance of these models is often hampered by the innate inaccuracies of parsing algorithms. To mitigate this challenge, we introduce SynthFusion, an innovative graph ensemble method that amalgamates predictions from multiple parsers. This strategy blends diverse dependency relations prior to the application of GNNs, enhancing robustness against parsing errors while avoiding extra computational burdens. SynthFusion circumvents the pitfalls of overparameterization and diminishes the risk of overfitting, prevalent in models with stacked GNN layers, by optimizing graph connectivity. Our empirical evaluations on the SemEval14 and Twitter14 datasets affirm that SynthFusion not only outshines models reliant on single dependency trees but also eclipses alternative ensemble techniques, achieving this without an escalation in model complexity.

Title: Which linguistic cues make people fall for fake news? A comparison of cognitive and affective processing. (arXiv:2312.03751v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03751
Code URL: null
Copy Paste: [[2312.03751]] Which linguistic cues make people fall for fake news? A comparison of cognitive and affective processing(http://arxiv.org/abs/2312.03751)
Summary:
Fake news on social media has large, negative implications for society. However, little is known about what linguistic cues make people fall for fake news and, hence, how to design effective countermeasures for social media. In this study, we seek to understand which linguistic cues make people fall for fake news. Linguistic cues (e.g., adverbs, personal pronouns, positive emotion words, negative emotion words) are important characteristics of any text and also affect how people process real vs. fake news. Specifically, we compare the role of linguistic cues across both cognitive processing (related to careful thinking) and affective processing (related to unconscious automatic evaluations). To this end, we performed a within-subject experiment where we collected neurophysiological measurements of 42 subjects while these read a sample of 40 real and fake news articles. During our experiment, we measured cognitive processing through eye fixations, and affective processing in situ through heart rate variability. We find that users engage more in cognitive processing for longer fake news articles, while affective processing is more pronounced for fake news written in analytic words. To the best of our knowledge, this is the first work studying the role of linguistic cues in fake news processing. Altogether, our findings have important implications for designing online platforms that encourage users to engage in careful thinking and thus prevent them from falling for fake news.

Title: English to Arabic machine translation of mathematical documents. (arXiv:2312.03753v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.03753
Code URL: null
Copy Paste: [[2312.03753]] English to Arabic machine translation of mathematical documents(http://arxiv.org/abs/2312.03753)
Summary:
This paper is about the development of a machine translation system tailored specifically for LATEX mathematical documents. The system focuses on translating English LATEX mathematical documents into Arabic LATEX, catering to the growing demand for multilingual accessibility in scientific and mathematical literature. With the vast proliferation of LATEX mathematical documents the need for an efficient and accurate translation system has become increasingly essential. This paper addresses the necessity for a robust translation tool that enables seamless communication and comprehension of complex mathematical content across language barriers. The proposed system leverages a Transformer model as the core of the translation system, ensuring enhanced accuracy and fluency in the translated Arabic LATEX documents. Furthermore, the integration of RyDArab, an Arabic mathematical TEX extension, along with a rule-based translator for Arabic mathematical expressions, contributes to the precise rendering of complex mathematical symbols and equations in the translated output. The paper discusses the architecture, methodology, of the developed system, highlighting its efficacy in bridging the language gap in the domain of mathematical documentation

Title: Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives. (arXiv:2312.03885v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03885
Code URL: null
Copy Paste: [[2312.03885]] Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives(http://arxiv.org/abs/2312.03885)
Summary:
We consider a gradient-based optimization method applied to a function $\mathcal{L}$ of a vector of variables $\boldsymbol{\theta}$, in the case where $\boldsymbol{\theta}$ is represented as a tuple of tensors $(\mathbf{T}_1, \cdots, \mathbf{T}_S)$. This framework encompasses many common use-cases, such as training neural networks by gradient descent. First, we propose a computationally inexpensive technique providing higher-order information on $\mathcal{L}$, especially about the interactions between the tensors $\mathbf{T}_s$, based on automatic differentiation and computational tricks. Second, we use this technique at order 2 to build a second-order optimization method which is suitable, among other things, for training deep neural networks of various architectures. This second-order method leverages the partition structure of $\boldsymbol{\theta}$ into tensors $(\mathbf{T}_1, \cdots, \mathbf{T}_S)$, in such a way that it requires neither the computation of the Hessian of $\mathcal{L}$ according to $\boldsymbol{\theta}$, nor any approximation of it. The key part consists in computing a smaller matrix interpretable as a "Hessian according to the partition", which can be computed exactly and efficiently. In contrast to many existing practical second-order methods used in neural networks, which perform a diagonal or block-diagonal approximation of the Hessian or its inverse, the method we propose does not neglect interactions between layers. Finally, we can tune the coarseness of the partition to recover well-known optimization methods: the coarsest case corresponds to Cauchy's steepest descent method, the finest case corresponds to the usual Newton's method.

Title: Adaptive Weighted Co-Learning for Cross-Domain Few-Shot Learning. (arXiv:2312.03928v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03928
Code URL: null
Copy Paste: [[2312.03928]] Adaptive Weighted Co-Learning for Cross-Domain Few-Shot Learning(http://arxiv.org/abs/2312.03928)
Summary:
Due to the availability of only a few labeled instances for the novel target prediction task and the significant domain shift between the well annotated source domain and the target domain, cross-domain few-shot learning (CDFSL) induces a very challenging adaptation problem. In this paper, we propose a simple Adaptive Weighted Co-Learning (AWCoL) method to address the CDFSL challenge by adapting two independently trained source prototypical classification models to the target task in a weighted co-learning manner. The proposed method deploys a weighted moving average prediction strategy to generate probabilistic predictions from each model, and then conducts adaptive co-learning by jointly fine-tuning the two models in an alternating manner based on the pseudo-labels and instance weights produced from the predictions. Moreover, a negative pseudo-labeling regularizer is further deployed to improve the fine-tuning process by penalizing false predictions. Comprehensive experiments are conducted on multiple benchmark datasets and the empirical results demonstrate that the proposed method produces state-of-the-art CDFSL performance.

Title: Rapid detection of rare events from in situ X-ray diffraction data using machine learning. (arXiv:2312.03989v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.03989
Code URL: null
Copy Paste: [[2312.03989]] Rapid detection of rare events from in situ X-ray diffraction data using machine learning(http://arxiv.org/abs/2312.03989)
Summary:
High-energy X-ray diffraction methods can non-destructively map the 3D microstructure and associated attributes of metallic polycrystalline engineering materials in their bulk form. These methods are often combined with external stimuli such as thermo-mechanical loading to take snapshots over time of the evolving microstructure and attributes. However, the extreme data volumes and the high costs of traditional data acquisition and reduction approaches pose a barrier to quickly extracting actionable insights and improving the temporal resolution of these snapshots. Here we present a fully automated technique capable of rapidly detecting the onset of plasticity in high-energy X-ray microscopy data. Our technique is computationally faster by at least 50 times than the traditional approaches and works for data sets that are up to 9 times sparser than a full data set. This new technique leverages self-supervised image representation learning and clustering to transform massive data into compact, semantic-rich representations of visually salient characteristics (e.g., peak shapes). These characteristics can be a rapid indicator of anomalous events such as changes in diffraction peak shapes. We anticipate that this technique will provide just-in-time actionable information to drive smarter experiments that effectively deploy multi-modal X-ray diffraction methods that span many decades of length scales.

Title: On the adaptation of in-context learners for system identification. (arXiv:2312.04083v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04083
Code URL: null
Copy Paste: [[2312.04083]] On the adaptation of in-context learners for system identification(http://arxiv.org/abs/2312.04083)
Summary:
In-context system identification aims at constructing meta-models to describe classes of systems, differently from traditional approaches that model single systems. This paradigm facilitates the leveraging of knowledge acquired from observing the behaviour of different, yet related dynamics. This paper discusses the role of meta-model adaptation. Through numerical examples, we demonstrate how meta-model adaptation can enhance predictive performance in three realistic scenarios: tailoring the meta-model to describe a specific system rather than a class; extending the meta-model to capture the behaviour of systems beyond the initial training class; and recalibrating the model for new prediction tasks. Results highlight the effectiveness of meta-model adaptation to achieve a more robust and versatile meta-learning framework for system identification.

Title: Learning to sample in Cartesian MRI. (arXiv:2312.04327v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.04327
Code URL: null
Copy Paste: [[2312.04327]] Learning to sample in Cartesian MRI(http://arxiv.org/abs/2312.04327)
Summary:
Despite its exceptional soft tissue contrast, Magnetic Resonance Imaging (MRI) faces the challenge of long scanning times compared to other modalities like X-ray radiography. Shortening scanning times is crucial in clinical settings, as it increases patient comfort, decreases examination costs and improves throughput. Recent advances in compressed sensing (CS) and deep learning allow accelerated MRI acquisition by reconstructing high-quality images from undersampled data. While reconstruction algorithms have received most of the focus, designing acquisition trajectories to optimize reconstruction quality remains an open question. This thesis explores two approaches to address this gap in the context of Cartesian MRI. First, we propose two algorithms, lazy LBCS and stochastic LBCS, that significantly improve upon G\"ozc\"u et al.'s greedy learning-based CS (LBCS) approach. These algorithms scale to large, clinically relevant scenarios like multi-coil 3D MR and dynamic MRI, previously inaccessible to LBCS. Additionally, we demonstrate that generative adversarial networks (GANs) can serve as a natural criterion for adaptive sampling by leveraging variance in the measurement domain to guide acquisition. Second, we delve into the underlying structures or assumptions that enable mask design algorithms to perform well in practice. Our experiments reveal that state-of-the-art deep reinforcement learning (RL) approaches, while capable of adaptation and long-horizon planning, offer only marginal improvements over stochastic LBCS, which is neither adaptive nor does long-term planning. Altogether, our findings suggest that stochastic LBCS and similar methods represent promising alternatives to deep RL. They shine in particular by their scalability and computational efficiency and could be key in the deployment of optimized acquisition trajectories in Cartesian MRI.