2024-01-04

language model

Title: Quantifying the Uniqueness of Donald Trump in Presidential Discourse. (arXiv:2401.01405v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01405
Code URL: null
Copy Paste: [[2401.01405]] Quantifying the Uniqueness of Donald Trump in Presidential Discourse(http://arxiv.org/abs/2401.01405)
Summary:
Does Donald Trump speak differently from other presidents? If so, in what ways? Are these differences confined to any single medium of communication? To investigate these questions, this paper introduces a novel metric of uniqueness based on large language models, develops a new lexicon for divisive speech, and presents a framework for comparing the lexical features of political opponents. Applying these tools to a variety of corpora of presidential speeches, we find considerable evidence that Trump's speech patterns diverge from those of all major party nominees for the presidency in recent history. Some notable findings include Trump's employment of particularly divisive and antagonistic language targeting of his political opponents and his patterns of repetition for emphasis. Furthermore, Trump is significantly more distinctive than his fellow Republicans, whose uniqueness values are comparably closer to those of the Democrats. These differences hold across a variety of measurement strategies, arise on both the campaign trail and in official presidential addresses, and do not appear to be an artifact of secular time trends.

Title: PLLaMa: An Open-source Large Language Model for Plant Science. (arXiv:2401.01600v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01600
Code URL: null
Copy Paste: [[2401.01600]] PLLaMa: An Open-source Large Language Model for Plant Science(http://arxiv.org/abs/2401.01600)
Summary:
Large Language Models (LLMs) have exhibited remarkable capabilities in understanding and interacting with natural language across various sectors. However, their effectiveness is limited in specialized areas requiring high accuracy, such as plant science, due to a lack of specific expertise in these fields. This paper introduces PLLaMa, an open-source language model that evolved from LLaMa-2. It's enhanced with a comprehensive database, comprising more than 1.5 million scholarly articles in plant science. This development significantly enriches PLLaMa with extensive knowledge and proficiency in plant and agricultural sciences. Our initial tests, involving specific datasets related to plants and agriculture, show that PLLaMa substantially improves its understanding of plant science-related topics. Moreover, we have formed an international panel of professionals, including plant scientists, agricultural engineers, and plant breeders. This team plays a crucial role in verifying the accuracy of PLLaMa's responses to various academic inquiries, ensuring its effective and reliable application in the field. To support further research and development, we have made the model's checkpoints and source codes accessible to the scientific community. These resources are available for download at \url{https://github.com/Xianjun-Yang/PLLaMa}.

Title: Large Language Model Capabilities in Perioperative Risk Prediction and Prognostication. (arXiv:2401.01620v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.01620
Code URL: https://github.com/philipchung/llm-periop-prediction
Copy Paste: [[2401.01620]] Large Language Model Capabilities in Perioperative Risk Prediction and Prognostication(http://arxiv.org/abs/2401.01620)
Summary:
We investigate whether general-domain large language models such as GPT-4 Turbo can perform risk stratification and predict post-operative outcome measures using a description of the procedure and a patient's clinical notes derived from the electronic health record. We examine predictive performance on 8 different tasks: prediction of ASA Physical Status Classification, hospital admission, ICU admission, unplanned admission, hospital mortality, PACU Phase 1 duration, hospital duration, and ICU duration. Few-shot and chain-of-thought prompting improves predictive performance for several of the tasks. We achieve F1 scores of 0.50 for ASA Physical Status Classification, 0.81 for ICU admission, and 0.86 for hospital mortality. Performance on duration prediction tasks were universally poor across all prompt strategies. Current generation large language models can assist clinicians in perioperative risk stratification on classification tasks and produce high-quality natural language summaries and explanations.

Title: Large Language Models Relearn Removed Concepts. (arXiv:2401.01814v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.01814
Code URL: null
Copy Paste: [[2401.01814]] Large Language Models Relearn Removed Concepts(http://arxiv.org/abs/2401.01814)
Summary:
Advances in model editing through neuron pruning hold promise for removing undesirable concepts from large language models. However, it remains unclear whether models have the capacity to reacquire pruned concepts after editing. To investigate this, we evaluate concept relearning in models by tracking concept saliency and similarity in pruned neurons during retraining. Our findings reveal that models can quickly regain performance post-pruning by relocating advanced concepts to earlier layers and reallocating pruned concepts to primed neurons with similar semantics. This demonstrates that models exhibit polysemantic capacities and can blend old and new concepts in individual neurons. While neuron pruning provides interpretability into model concepts, our results highlight the challenges of permanent concept removal for improved model \textit{safety}. Monitoring concept reemergence and developing techniques to mitigate relearning of unsafe concepts will be important directions for more robust model editing. Overall, our work strongly demonstrates the resilience and fluidity of concept representations in LLMs post concept removal.

Title: Iterative Mask Filling: An Effective Text Augmentation Method Using Masked Language Modeling. (arXiv:2401.01830v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01830
Code URL: null
Copy Paste: [[2401.01830]] Iterative Mask Filling: An Effective Text Augmentation Method Using Masked Language Modeling(http://arxiv.org/abs/2401.01830)
Summary:
Data augmentation is an effective technique for improving the performance of machine learning models. However, it has not been explored as extensively in natural language processing (NLP) as it has in computer vision. In this paper, we propose a novel text augmentation method that leverages the Fill-Mask feature of the transformer-based BERT model. Our method involves iteratively masking words in a sentence and replacing them with language model predictions. We have tested our proposed method on various NLP tasks and found it to be effective in many cases. Our results are presented along with a comparison to existing augmentation methods. Experimental results show that our proposed method significantly improves performance, especially on topic classification datasets.

Title: Multilingual Instruction Tuning With Just a Pinch of Multilinguality. (arXiv:2401.01854v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01854
Code URL: null
Copy Paste: [[2401.01854]] Multilingual Instruction Tuning With Just a Pinch of Multilinguality(http://arxiv.org/abs/2401.01854)
Summary:
As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial. One promising approach is cross-lingual transfer, where a model acquires specific functionality on some language by finetuning on another language. In this work, we investigate how multilinguality during instruction tuning of a multilingual LLM affects instruction-following across languages. We first show that many languages transfer some instruction-following capabilities to other languages from even monolingual tuning. Furthermore, we find that only 40 multilingual examples in an English tuning set substantially improve multilingual instruction-following, both in seen and unseen languages during tuning. In general, we observe that models tuned on multilingual mixtures exhibit comparable or superior performance in several languages compared to monolingually tuned models, despite training on 10x fewer examples in those languages. Finally, we find that increasing the number of languages in the instruction tuning set from 1 to only 2, 3, or 4 increases cross-lingual generalization. Our results suggest that building massively multilingual instruction-tuned models can be done with only a very small set of multilingual instruction-responses.

Title: MLPs Compass: What is learned when MLPs are combined with PLMs?. (arXiv:2401.01667v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01667
Code URL: null
Copy Paste: [[2401.01667]] MLPs Compass: What is learned when MLPs are combined with PLMs?(http://arxiv.org/abs/2401.01667)
Summary:
While Transformer-based pre-trained language models and their variants exhibit strong semantic representation capabilities, the question of comprehending the information gain derived from the additional components of PLMs remains an open question in this field. Motivated by recent efforts that prove Multilayer-Perceptrons (MLPs) modules achieving robust structural capture capabilities, even outperforming Graph Neural Networks (GNNs), this paper aims to quantify whether simple MLPs can further enhance the already potent ability of PLMs to capture linguistic information. Specifically, we design a simple yet effective probing framework containing MLPs components based on BERT structure and conduct extensive experiments encompassing 10 probing tasks spanning three distinct linguistic levels. The experimental results demonstrate that MLPs can indeed enhance the comprehension of linguistic structure by PLMs. Our research provides interpretable and valuable insights into crafting variations of PLMs utilizing MLPs for tasks that emphasize diverse linguistic structures.

Title: Evaluating Large Language Models in Semantic Parsing for Conversational Question Answering over Knowledge Graphs. (arXiv:2401.01711v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01711
Code URL: https://github.com/sebischair/llm-sp-cqa
Copy Paste: [[2401.01711]] Evaluating Large Language Models in Semantic Parsing for Conversational Question Answering over Knowledge Graphs(http://arxiv.org/abs/2401.01711)
Summary:
Conversational question answering systems often rely on semantic parsing to enable interactive information retrieval, which involves the generation of structured database queries from a natural language input. For information-seeking conversations about facts stored within a knowledge graph, dialogue utterances are transformed into graph queries in a process that is called knowledge-based conversational question answering. This paper evaluates the performance of large language models that have not been explicitly pre-trained on this task. Through a series of experiments on an extensive benchmark dataset, we compare models of varying sizes with different prompting techniques and identify common issue types in the generated output. Our results demonstrate that large language models are capable of generating graph queries from dialogues, with significant improvements achievable through few-shot prompting and fine-tuning techniques, especially for smaller models that exhibit lower zero-shot performance.

Title: Cross-target Stance Detection by Exploiting Target Analytical Perspectives. (arXiv:2401.01761v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01761
Code URL: null
Copy Paste: [[2401.01761]] Cross-target Stance Detection by Exploiting Target Analytical Perspectives(http://arxiv.org/abs/2401.01761)
Summary:
Cross-target stance detection (CTSD) is an important task, which infers the attitude of the destination target by utilizing annotated data derived from the source target. One important approach in CTSD is to extract domain-invariant features to bridge the knowledge gap between multiple targets. However, the analysis of informal and short text structure, and implicit expressions, complicate the extraction of domain-invariant knowledge. In this paper, we propose a Multi-Perspective Prompt-Tuning (MPPT) model for CTSD that uses the analysis perspective as a bridge to transfer knowledge. First, we develop a two-stage instruct-based chain-of-thought method (TsCoT) to elicit target analysis perspectives and provide natural language explanations (NLEs) from multiple viewpoints by formulating instructions based on large language model (LLM). Second, we propose a multi-perspective prompt-tuning framework (MultiPLN) to fuse the NLEs into the stance predictor. Extensive experiments results demonstrate the superiority of MPPT against the state-of-the-art baseline methods.

gpt

Title: GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse. (arXiv:2401.01523v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01523
Code URL: null
Copy Paste: [[2401.01523]] GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse(http://arxiv.org/abs/2401.01523)
Summary:
The exponential growth of social media has profoundly transformed how information is created, disseminated, and absorbed, exceeding any precedent in the digital age. Regrettably, this explosion has also spawned a significant increase in the online abuse of memes. Evaluating the negative impact of memes is notably challenging, owing to their often subtle and implicit meanings, which are not directly conveyed through the overt text and imagery. In light of this, large multimodal models (LMMs) have emerged as a focal point of interest due to their remarkable capabilities in handling diverse multimodal tasks. In response to this development, our paper aims to thoroughly examine the capacity of various LMMs (e.g. GPT-4V) to discern and respond to the nuanced aspects of social abuse manifested in memes. We introduce the comprehensive meme benchmark, GOAT-Bench, comprising over 6K varied memes encapsulating themes such as implicit hate speech, sexism, and cyberbullying, etc. Utilizing GOAT-Bench, we delve into the ability of LMMs to accurately assess hatefulness, misogyny, offensiveness, sarcasm, and harmful content. Our extensive experiments across a range of LMMs reveal that current models still exhibit a deficiency in safety awareness, showing insensitivity to various forms of implicit abuse. We posit that this shortfall represents a critical impediment to the realization of safe artificial intelligence. The GOAT-Bench and accompanying resources are publicly accessible at https://goatlmm.github.io/, contributing to ongoing research in this vital field.

Title: Predicting challenge moments from students' discourse: A comparison of GPT-4 to two traditional natural language processing approaches. (arXiv:2401.01692v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01692
Code URL: null
Copy Paste: [[2401.01692]] Predicting challenge moments from students' discourse: A comparison of GPT-4 to two traditional natural language processing approaches(http://arxiv.org/abs/2401.01692)
Summary:
Effective collaboration requires groups to strategically regulate themselves to overcome challenges. Research has shown that groups may fail to regulate due to differences in members' perceptions of challenges which may benefit from external support. In this study, we investigated the potential of leveraging three distinct natural language processing models: an expert knowledge rule-based model, a supervised machine learning (ML) model and a Large Language model (LLM), in challenge detection and challenge dimension identification (cognitive, metacognitive, emotional and technical/other challenges) from student discourse, was investigated. The results show that the supervised ML and the LLM approaches performed considerably well in both tasks, in contrast to the rule-based approach, whose efficacy heavily relies on the engineered features by experts. The paper provides an extensive discussion of the three approaches' performance for automated detection and support of students' challenge moments in collaborative learning activities. It argues that, although LLMs provide many advantages, they are unlikely to be the panacea to issues of the detection and feedback provision of socially shared regulation of learning due to their lack of reliability, as well as issues of validity evaluation, privacy and confabulation. We conclude the paper with a discussion on additional considerations, including model transparency to explore feasible and meaningful analytical feedback for students and educators using LLMs.

llm

Title: Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review. (arXiv:2401.01519v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.01519
Code URL: null
Copy Paste: [[2401.01519]] Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review(http://arxiv.org/abs/2401.01519)
Summary:
This paper explores the frontiers of large language models (LLMs) in psychology applications. Psychology has undergone several theoretical changes, and the current use of Artificial Intelligence (AI) and Machine Learning, particularly LLMs, promises to open up new research directions. We provide a detailed exploration of how LLMs like ChatGPT are transforming psychological research. It discusses the impact of LLMs across various branches of psychology, including cognitive and behavioral, clinical and counseling, educational and developmental, and social and cultural psychology, highlighting their potential to simulate aspects of human cognition and behavior. The paper delves into the capabilities of these models to emulate human-like text generation, offering innovative tools for literature review, hypothesis generation, experimental design, experimental subjects, data analysis, academic writing, and peer review in psychology. While LLMs are essential in advancing research methodologies in psychology, the paper also cautions about their technical and ethical challenges. There are issues like data privacy, the ethical implications of using LLMs in psychological research, and the need for a deeper understanding of these models' limitations. Researchers should responsibly use LLMs in psychological studies, adhering to ethical standards and considering the potential consequences of deploying these technologies in sensitive areas. Overall, the article provides a comprehensive overview of the current state of LLMs in psychology, exploring potential benefits and challenges. It serves as a call to action for researchers to leverage LLLs' advantages responsibly while addressing associated risks.

Title: A Generative AI Assistant to Accelerate Cloud Migration. (arXiv:2401.01753v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.01753
Code URL: null
Copy Paste: [[2401.01753]] A Generative AI Assistant to Accelerate Cloud Migration(http://arxiv.org/abs/2401.01753)
Summary:
We present a tool that leverages generative AI to accelerate the migration of on-premises applications to the cloud. The Cloud Migration LLM accepts input from the user specifying the parameters of their migration, and outputs a migration strategy with an architecture diagram. A user study suggests that the migration LLM can assist inexperienced users in finding the right cloud migration profile, while avoiding complexities of a manual approach.

Title: Social Media Ready Caption Generation for Brands. (arXiv:2401.01637v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01637
Code URL: null
Copy Paste: [[2401.01637]] Social Media Ready Caption Generation for Brands(http://arxiv.org/abs/2401.01637)
Summary:
Social media advertisements are key for brand marketing, aiming to attract consumers with captivating captions and pictures or logos. While previous research has focused on generating captions for general images, incorporating brand personalities into social media captioning remains unexplored. Brand personalities are shown to be affecting consumers' behaviours and social interactions and thus are proven to be a key aspect of marketing strategies. Current open-source multimodal LLMs are not directly suited for this task. Hence, we propose a pipeline solution to assist brands in creating engaging social media captions that align with the image and the brand personalities. Our architecture is based on two parts: a the first part contains an image captioning model that takes in an image that the brand wants to post online and gives a plain English caption; b the second part takes in the generated caption along with the target brand personality and outputs a catchy personality-aligned social media caption. Along with brand personality, our system also gives users the flexibility to provide hashtags, Instagram handles, URLs, and named entities they want the caption to contain, making the captions more semantically related to the social media handles. Comparative evaluations against various baselines demonstrate the effectiveness of our approach, both qualitatively and quantitatively.

Title: Physio: An LLM-Based Physiotherapy Advisor. (arXiv:2401.01825v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01825
Code URL: null
Copy Paste: [[2401.01825]] Physio: An LLM-Based Physiotherapy Advisor(http://arxiv.org/abs/2401.01825)
Summary:
The capabilities of the most recent language models have increased the interest in integrating them into real-world applications. However, the fact that these models generate plausible, yet incorrect text poses a constraint when considering their use in several domains. Healthcare is a prime example of a domain where text-generative trustworthiness is a hard requirement to safeguard patient well-being. In this paper, we present Physio, a chat-based application for physical rehabilitation. Physio is capable of making an initial diagnosis while citing reliable health sources to support the information provided. Furthermore, drawing upon external knowledge databases, Physio can recommend rehabilitation exercises and over-the-counter medication for symptom relief. By combining these features, Physio can leverage the power of generative models for language processing while also conditioning its response on dependable and verifiable sources. A live demo of Physio is available at https://physio.inesctec.pt.

long context

lora

Title: A First Look at Information Highlighting in Stack Overflow Answers. (arXiv:2401.01472v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01472
Code URL: https://github.com/shaoweiwang2010/rep_2022_information_highlight_so
Copy Paste: [[2401.01472]] A First Look at Information Highlighting in Stack Overflow Answers(http://arxiv.org/abs/2401.01472)
Summary:
Context: Navigating the knowledge of Stack Overflow (SO) remains challenging. To make the posts vivid to users, SO allows users to write and edit posts with Markdown or HTML so that users can leverage various formatting styles (e.g., bold, italic, and code) to highlight the important information. Nonetheless, there have been limited studies on the highlighted information. Objective: We carried out the first large-scale exploratory study on the information highlighted in SO answers in our recent study. To extend our previous study, we develop approaches to automatically recommend highlighted content with formatting styles using neural network architectures initially designed for the Named Entity Recognition task. Method: In this paper, we studied 31,169,429 answers of Stack Overflow. For training recommendation models, we choose CNN and BERT models for each type of formatting (i.e., Bold, Italic, Code, and Heading) using the information highlighting dataset we collected from SO answers. Results: Our models based on CNN architecture achieve precision ranging from 0.71 to 0.82. The trained model for automatic code content highlighting achieves a recall of 0.73 and an F1 score of 0.71, outperforming the trained models for other formatting styles. The BERT models have even lower recalls and F1 scores than the CNN models. Our analysis of failure cases indicates that the majority of the failure cases are missing identification (i.e., the model misses the content that is supposed to be highlighted) due to the models tend to learn the frequently highlighted words while struggling to learn less frequent words. Conclusion: Our findings suggest that it is possible to develop recommendation models for highlighting information for answers with different formatting styles on Stack Overflow.

hallucination

Title: Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models. (arXiv:2401.01572v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01572
Code URL: null
Copy Paste: [[2401.01572]] Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models(http://arxiv.org/abs/2401.01572)
Summary:
Hallucinations are a type of output error produced by deep neural networks. While this has been studied in natural language processing, they have not been researched previously in automatic speech recognition. Here, we define hallucinations in ASR as transcriptions generated by a model that are semantically unrelated to the source utterance, yet still fluent and coherent. The similarity of hallucinations to probable natural language outputs of the model creates a danger of deception and impacts the credibility of the system. We show that commonly used metrics, such as word error rates, cannot differentiate between hallucinatory and non-hallucinatory models. To address this, we propose a perturbation-based method for assessing the susceptibility of an automatic speech recognition (ASR) model to hallucination at test time, which does not require access to the training dataset. We demonstrate that this method helps to distinguish between hallucinatory and non-hallucinatory models that have similar baseline word error rates. We further explore the relationship between the types of ASR errors and the types of dataset noise to determine what types of noise are most likely to create hallucinatory outputs. We devise a framework for identifying hallucinations by analysing their semantic connection with the ground truth and their fluency. Finally, we discover how to induce hallucinations with a random noise injection to the utterance.

Title: Navigating Uncertainty: Optimizing API Dependency for Hallucination Reduction in Closed-Book Question Answering. (arXiv:2401.01780v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01780
Code URL: null
Copy Paste: [[2401.01780]] Navigating Uncertainty: Optimizing API Dependency for Hallucination Reduction in Closed-Book Question Answering(http://arxiv.org/abs/2401.01780)
Summary:
While Large Language Models (LLM) are able to accumulate and restore knowledge, they are still prone to hallucination. Especially when faced with factual questions, LLM cannot only rely on knowledge stored in parameters to guarantee truthful and correct answers. Augmenting these models with the ability to search on external information sources, such as the web, is a promising approach to ground knowledge to retrieve information. However, searching in a large collection of documents introduces additional computational/time costs. An optimal behavior would be to query external resources only when the LLM is not confident about answers. In this paper, we propose a new LLM able to self-estimate if it is able to answer directly or needs to request an external tool. We investigate a supervised approach by introducing a hallucination masking mechanism in which labels are generated using a close book question-answering task. In addition, we propose to leverage parameter-efficient fine-tuning techniques to train our model on a small amount of data. Our model directly provides answers for $78.2\%$ of the known queries and opts to search for $77.2\%$ of the unknown ones. This results in the API being utilized only $62\%$ of the time.

prompt

Title: Can AI Be as Creative as Humans?. (arXiv:2401.01623v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.01623
Code URL: null
Copy Paste: [[2401.01623]] Can AI Be as Creative as Humans?(http://arxiv.org/abs/2401.01623)
Summary:
Creativity serves as a cornerstone for societal progress and innovation, but its assessment remains a complex and often subjective endeavor. With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application. This paper addresses the complexities in defining and evaluating creativity by introducing a new concept called Relative Creativity. Instead of trying to define creativity universally, we shift the focus to whether AI can match the creative abilities of a hypothetical human. This perspective draws inspiration from the Turing Test, expanding upon it to address the challenges and subjectivities inherent in evaluating creativity. This methodological shift facilitates a statistically quantifiable evaluation of AI's creativity, which we term Statistical Creativity. This approach allows for direct comparisons of AI's creative abilities with those of specific human groups. Building on this foundation, we discuss the application of statistical creativity in contemporary prompt-conditioned autoregressive models. In addition to defining and analyzing a measure of creativity, we introduce an actionable training guideline, effectively bridging the gap between theoretical quantification of creativity and practical model training. Through these multifaceted contributions, the paper establishes a cohesive, continuously evolving, and transformative framework for assessing and fostering statistical creativity in AI models.

code

Title: MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries. (arXiv:2401.01596v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.01596
Code URL: https://github.com/arkadeepacharya/medsumm-ecir2024
Copy Paste: [[2401.01596]] MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries(http://arxiv.org/abs/2401.01596)
Summary:
In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in complexity and quantity, the current body of research in this domain has primarily concentrated on text-based methods, overlooking the integration of visual cues. Also prior works in the area of medical question summarisation have been limited to the English language. This work introduces the task of multimodal medical question summarization for codemixed input in a low-resource setting. To address this gap, we introduce the Multimodal Medical Codemixed Question Summarization MMCQS dataset, which combines Hindi-English codemixed medical queries with visual aids. This integration enriches the representation of a patient's medical condition, providing a more comprehensive perspective. We also propose a framework named MedSumm that leverages the power of LLMs and VLMs for this task. By utilizing our MMCQS dataset, we demonstrate the value of integrating visual information from images to improve the creation of medically detailed summaries. This multimodal strategy not only improves healthcare decision-making but also promotes a deeper comprehension of patient queries, paving the way for future exploration in personalized and responsive medical care. Our dataset, code, and pre-trained models will be made publicly available.

Title: A Two-Stage Multimodal Emotion Recognition Model Based on Graph Contrastive Learning. (arXiv:2401.01495v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01495
Code URL: null
Copy Paste: [[2401.01495]] A Two-Stage Multimodal Emotion Recognition Model Based on Graph Contrastive Learning(http://arxiv.org/abs/2401.01495)
Summary:
In terms of human-computer interaction, it is becoming more and more important to correctly understand the user's emotional state in a conversation, so the task of multimodal emotion recognition (MER) started to receive more attention. However, existing emotion classification methods usually perform classification only once. Sentences are likely to be misclassified in a single round of classification. Previous work usually ignores the similarities and differences between different morphological features in the fusion process. To address the above issues, we propose a two-stage emotion recognition model based on graph contrastive learning (TS-GCL). First, we encode the original dataset with different preprocessing modalities. Second, a graph contrastive learning (GCL) strategy is introduced for these three modal data with other structures to learn similarities and differences within and between modalities. Finally, we use MLP twice to achieve the final emotion classification. This staged classification method can help the model to better focus on different levels of emotional information, thereby improving the performance of the model. Extensive experiments show that TS-GCL has superior performance on IEMOCAP and MELD datasets compared with previous methods.

chat

retrieval augmented generation

Title: Question-Answering Based Summarization of Electronic Health Records using Retrieval Augmented Generation. (arXiv:2401.01469v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.01469
Code URL: null
Copy Paste: [[2401.01469]] Question-Answering Based Summarization of Electronic Health Records using Retrieval Augmented Generation(http://arxiv.org/abs/2401.01469)
Summary:
Summarization of electronic health records (EHRs) can substantially minimize 'screen time' for both patients as well as medical personnel. In recent years summarization of EHRs have employed machine learning pipelines using state of the art neural models. However, these models have produced less than adequate results that are attributed to the difficulty of obtaining sufficient annotated data for training. Moreover, the requirement to consider the entire content of an EHR in summarization has resulted in poor performance due to the fact that attention mechanisms in modern large language models (LLMs) adds a quadratic complexity in terms of the size of the input. We propose here a method that mitigates these shortcomings by combining semantic search, retrieval augmented generation (RAG) and question-answering using the latest LLMs. In our approach summarization is the extraction of answers to specific questions that are deemed important by subject-matter experts (SMEs). Our approach is quite efficient; requires minimal to no training; does not suffer from the 'hallucination' problem of LLMs; and it ensures diversity, since the summary will not have repeated content but diverse answers to specific questions.

retrieval-augmented generation

rag

Title: Modular Learning of Deep Causal Generative Models for High-dimensional Causal Inference. (arXiv:2401.01426v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.01426
Code URL: null
Copy Paste: [[2401.01426]] Modular Learning of Deep Causal Generative Models for High-dimensional Causal Inference(http://arxiv.org/abs/2401.01426)
Summary:
Pearl's causal hierarchy establishes a clear separation between observational, interventional, and counterfactual questions. Researchers proposed sound and complete algorithms to compute identifiable causal queries at a given level of the hierarchy using the causal structure and data from the lower levels of the hierarchy. However, most of these algorithms assume that we can accurately estimate the probability distribution of the data, which is an impractical assumption for high-dimensional variables such as images. On the other hand, modern generative deep learning architectures can be trained to learn how to accurately sample from such high-dimensional distributions. Especially with the recent rise of foundation models for images, it is desirable to leverage pre-trained models to answer causal queries with such high-dimensional data. To address this, we propose a sequential training algorithm that, given the causal structure and a pre-trained conditional generative model, can train a deep causal generative model, which utilizes the pre-trained model and can provably sample from identifiable interventional and counterfactual distributions. Our algorithm, called Modular-DCM, uses adversarial training to learn the network weights, and to the best of our knowledge, is the first algorithm that can make use of pre-trained models and provably sample from any identifiable causal query in the presence of latent confounders with high-dimensional data. We demonstrate the utility of our algorithm using semi-synthetic and real-world datasets containing images as variables in the causal structure.

Title: Concurrent Self-testing of Neural Networks Using Uncertainty Fingerprint. (arXiv:2401.01458v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.01458
Code URL: null
Copy Paste: [[2401.01458]] Concurrent Self-testing of Neural Networks Using Uncertainty Fingerprint(http://arxiv.org/abs/2401.01458)
Summary:
Neural networks (NNs) are increasingly used in always-on safety-critical applications deployed on hardware accelerators (NN-HAs) employing various memory technologies. Reliable continuous operation of NN is essential for safety-critical applications. During online operation, NNs are susceptible to single and multiple permanent and soft errors due to factors such as radiation, aging, and thermal effects. Explicit NN-HA testing methods cannot detect transient faults during inference, are unsuitable for always-on applications, and require extensive test vector generation and storage. Therefore, in this paper, we propose the \emph{uncertainty fingerprint} approach representing the online fault status of NN. Furthermore, we propose a dual head NN topology specifically designed to produce uncertainty fingerprints and the primary prediction of the NN in \emph{a single shot}. During the online operation, by matching the uncertainty fingerprint, we can concurrently self-test NNs with up to $100\%$ coverage with a low false positive rate while maintaining a similar performance of the primary task. Compared to existing works, memory overhead is reduced by up to $243.7$ MB, multiply and accumulate (MAC) operation is reduced by up to $10000\times$, and false-positive rates are reduced by up to $89\%$.

Title: Outlier Ranking in Large-Scale Public Health Streams. (arXiv:2401.01459v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.01459
Code URL: null
Copy Paste: [[2401.01459]] Outlier Ranking in Large-Scale Public Health Streams(http://arxiv.org/abs/2401.01459)
Summary:
Disease control experts inspect public health data streams daily for outliers worth investigating, like those corresponding to data quality issues or disease outbreaks. However, they can only examine a few of the thousands of maximally-tied outliers returned by univariate outlier detection methods applied to large-scale public health data streams. To help experts distinguish the most important outliers from these thousands of tied outliers, we propose a new task for algorithms to rank the outputs of any univariate method applied to each of many streams. Our novel algorithm for this task, which leverages hierarchical networks and extreme value analysis, performed the best across traditional outlier detection metrics in a human-expert evaluation using public health data streams. Most importantly, experts have used our open-source Python implementation since April 2023 and report identifying outliers worth investigating 9.1x faster than their prior baseline. Other organizations can readily adapt this implementation to create rankings from the outputs of their tailored univariate methods across large-scale streams.

Title: Uncertainty Regularized Evidential Regression. (arXiv:2401.01484v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.01484
Code URL: https://github.com/flynnye/ur-ern
Copy Paste: [[2401.01484]] Uncertainty Regularized Evidential Regression(http://arxiv.org/abs/2401.01484)
Summary:
The Evidential Regression Network (ERN) represents a novel approach that integrates deep learning with Dempster-Shafer's theory to predict a target and quantify the associated uncertainty. Guided by the underlying theory, specific activation functions must be employed to enforce non-negative values, which is a constraint that compromises model performance by limiting its ability to learn from all samples. This paper provides a theoretical analysis of this limitation and introduces an improvement to overcome it. Initially, we define the region where the models can't effectively learn from the samples. Following this, we thoroughly analyze the ERN and investigate this constraint. Leveraging the insights from our analysis, we address the limitation by introducing a novel regularization term that empowers the ERN to learn from the whole training set. Our extensive experiments substantiate our theoretical findings and demonstrate the effectiveness of the proposed solution.

Title: Free Lunch for Federated Remote Sensing Target Fine-Grained Classification: A Parameter-Efficient Framework. (arXiv:2401.01493v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.01493
Code URL: null
Copy Paste: [[2401.01493]] Free Lunch for Federated Remote Sensing Target Fine-Grained Classification: A Parameter-Efficient Framework(http://arxiv.org/abs/2401.01493)
Summary:
Remote Sensing Target Fine-grained Classification (TFGC) is of great significance in both military and civilian fields. Due to location differences, growth in data size, and centralized server storage constraints, these data are usually stored under different databases across regions/countries. However, privacy laws and national security concerns constrain researchers from accessing these sensitive remote sensing images for further analysis. Additionally, low-resource remote sensing devices encounter challenges in terms of communication overhead and efficiency when dealing with the ever-increasing data and model scales. To solve the above challenges, this paper proposes a novel Privacy-Reserving TFGC Framework based on Federated Learning, dubbed PRFL. The proposed framework allows each client to learn global and local knowledge to enhance the local representation of private data in environments with extreme statistical heterogeneity (non. Independent and Identically Distributed, IID). Thus, it provides highly customized models to clients with differentiated data distributions. Moreover, the framework minimizes communication overhead and improves efficiency while ensuring satisfactory performance, thereby enhancing robustness and practical applicability under resource-scarce conditions. We demonstrate the effectiveness of the proposed PRFL on the classical TFGC task by leveraging four public datasets.

Title: Ravnest: Decentralized Asynchronous Training on Heterogeneous Devices. (arXiv:2401.01728v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.01728
Code URL: null
Copy Paste: [[2401.01728]] Ravnest: Decentralized Asynchronous Training on Heterogeneous Devices(http://arxiv.org/abs/2401.01728)
Summary:
Modern deep learning models, growing larger and more complex, have demonstrated exceptional generalization and accuracy due to training on huge datasets. This trend is expected to continue. However, the increasing size of these models poses challenges in training, as traditional centralized methods are limited by memory constraints at such scales. This paper proposes an asynchronous decentralized training paradigm for large modern deep learning models that harnesses the compute power of regular heterogeneous PCs with limited resources connected across the internet to achieve favourable performance metrics. Ravnest facilitates decentralized training by efficiently organizing compute nodes into clusters with similar data transfer rates and compute capabilities, without necessitating that each node hosts the entire model. These clusters engage in $\textit{Zero-Bubble Asynchronous Model Parallel}$ training, and a $\textit{Parallel Multi-Ring All-Reduce}$ method is employed to effectively execute global parameter averaging across all clusters. We have framed our asynchronous SGD loss function as a block structured optimization problem with delayed updates and derived an optimal convergence rate of $O\left(\frac{1}{\sqrt{K}}\right)$. We further discuss linear speedup with respect to the number of participating clusters and the bound on the staleness parameter.

Title: A Novel Paradigm for Neural Computation: X-Net with Learnable Neurons and Adaptable Structure. (arXiv:2401.01772v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.01772
Code URL: null
Copy Paste: [[2401.01772]] A Novel Paradigm for Neural Computation: X-Net with Learnable Neurons and Adaptable Structure(http://arxiv.org/abs/2401.01772)
Summary:
Artificial neural networks (ANNs) have permeated various disciplinary domains, ranging from bioinformatics to financial analytics, where their application has become an indispensable facet of contemporary scientific research endeavors. However, the inherent limitations of traditional neural networks arise due to their relatively fixed network structures and activation functions. 1, The type of activation function is single and relatively fixed, which leads to poor "unit representation ability" of the network, and it is often used to solve simple problems with very complex networks; 2, the network structure is not adaptive, it is easy to cause network structure redundant or insufficient. To address the aforementioned issues, this study proposes a novel neural network called X-Net. By utilizing our designed Alternating Backpropagation mechanism, X-Net dynamically selects appropriate activation functions based on derivative information during training to enhance the network's representation capability for specific tasks. Simultaneously, it accurately adjusts the network structure at the neuron level to accommodate tasks of varying complexities and reduce computational costs. Through a series of experiments, we demonstrate the dual advantages of X-Net in terms of reducing model size and improving representation power. Specifically, in terms of the number of parameters, X-Net is only 3$\%$ of baselines on average, and only 1.4$\%$ under some tasks. In terms of representation ability, X-Net can achieve an average $R^2$=0.985 on the fitting task by only optimizing the activation function without introducing any parameters. Finally, we also tested the ability of X-Net to help scientific discovery on data from multiple disciplines such as society, energy, environment, and aerospace, and achieved concise and good results.

Title: Applications of machine learning and IoT for Outdoor Air Pollution Monitoring and Prediction: A Systematic Literature Review. (arXiv:2401.01788v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.01788
Code URL: null
Copy Paste: [[2401.01788]] Applications of machine learning and IoT for Outdoor Air Pollution Monitoring and Prediction: A Systematic Literature Review(http://arxiv.org/abs/2401.01788)
Summary:
According to the World Health Organization (WHO), air pollution kills seven million people every year. Outdoor air pollution is a major environmental health problem affecting low, middle, and high-income countries. In the past few years, the research community has explored IoT-enabled machine learning applications for outdoor air pollution prediction. The general objective of this paper is to systematically review applications of machine learning and Internet of Things (IoT) for outdoor air pollution prediction and the combination of monitoring sensors and input features used. Two research questions were formulated for this review. 1086 publications were collected in the initial PRISMA stage. After the screening and eligibility phases, 37 papers were selected for inclusion. A cost-based analysis was conducted on the findings to highlight high-cost monitoring, low-cost IoT and hybrid enabled prediction. Three methods of prediction were identified: time series, feature-based and spatio-temporal. This review's findings identify major limitations in applications found in the literature, namely lack of coverage, lack of diversity of data and lack of inclusion of context-specific features. This review proposes directions for future research and underlines practical implications in healthcare, urban planning, global synergy and smart cities.

Title: Zero-shot Active Learning Using Self Supervised Learning. (arXiv:2401.01690v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.01690
Code URL: null
Copy Paste: [[2401.01690]] Zero-shot Active Learning Using Self Supervised Learning(http://arxiv.org/abs/2401.01690)
Summary:
Deep learning algorithms are often said to be data hungry. The performance of such algorithms generally improve as more and more annotated data is fed into the model. While collecting unlabelled data is easier (as they can be scraped easily from the internet), annotating them is a tedious and expensive task. Given a fixed budget available for data annotation, Active Learning helps selecting the best subset of data for annotation, such that the deep learning model when trained over that subset will have maximum generalization performance under this budget. In this work, we aim to propose a new Active Learning approach which is model agnostic as well as one doesn't require an iterative process. We aim to leverage self-supervised learnt features for the task of Active Learning. The benefit of self-supervised learning, is that one can get useful feature representation of the input data, without having any annotation.

Title: EPA: Neural Collapse Inspired Robust Out-of-Distribution Detector. (arXiv:2401.01710v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.01710
Code URL: null
Copy Paste: [[2401.01710]] EPA: Neural Collapse Inspired Robust Out-of-Distribution Detector(http://arxiv.org/abs/2401.01710)
Summary:
Out-of-distribution (OOD) detection plays a crucial role in ensuring the security of neural networks. Existing works have leveraged the fact that In-distribution (ID) samples form a subspace in the feature space, achieving state-of-the-art (SOTA) performance. However, the comprehensive characteristics of the ID subspace still leave under-explored. Recently, the discovery of Neural Collapse ($\mathcal{NC}$) sheds light on novel properties of the ID subspace. Leveraging insight from $\mathcal{NC}$, we observe that the Principal Angle between the features and the ID feature subspace forms a superior representation for measuring the likelihood of OOD. Building upon this observation, we propose a novel $\mathcal{NC}$-inspired OOD scoring function, named Entropy-enhanced Principal Angle (EPA), which integrates both the global characteristic of the ID subspace and its inner property. We experimentally compare EPA with various SOTA approaches, validating its superior performance and robustness across different network architectures and OOD datasets.

Title: On the hardness of learning under symmetries. (arXiv:2401.01869v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.01869
Code URL: null
Copy Paste: [[2401.01869]] On the hardness of learning under symmetries(http://arxiv.org/abs/2401.01869)
Summary:
We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural nets has empirically improved the performance of learning pipelines, in domains ranging from biology to computer vision. However, a rich yet separate line of learning theoretic research has demonstrated that actually learning shallow, fully-connected (i.e. non-symmetric) networks has exponential complexity in the correlational statistical query (CSQ) model, a framework encompassing gradient descent. In this work, we ask: are known problem symmetries sufficient to alleviate the fundamental hardness of learning neural nets with gradient descent? We answer this question in the negative. In particular, we give lower bounds for shallow graph neural networks, convolutional networks, invariant polynomials, and frame-averaged networks for permutation subgroups, which all scale either superpolynomially or exponentially in the relevant input dimension. Therefore, in spite of the significant inductive bias imparted via symmetry, actually learning the complete classes of functions represented by equivariant neural networks via gradient descent remains hard.

multi-run

chain-of-thought

tree-of-thought

agent

Title: Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes. (arXiv:2401.01841v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.01841
Code URL: https://github.com/scope-lab-vu/ada-mcts
Copy Paste: [[2401.01841]] Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes(http://arxiv.org/abs/2401.01841)
Summary:
A fundamental (and largely open) challenge in sequential decision-making is dealing with non-stationary environments, where exogenous environmental conditions change over time. Such problems are traditionally modeled as non-stationary Markov decision processes (NSMDP). However, existing approaches for decision-making in NSMDPs have two major shortcomings: first, they assume that the updated environmental dynamics at the current time are known (although future dynamics can change); and second, planning is largely pessimistic, i.e., the agent acts ``safely'' to account for the non-stationary evolution of the environment. We argue that both these assumptions are invalid in practice -- updated environmental conditions are rarely known, and as the agent interacts with the environment, it can learn about the updated dynamics and avoid being pessimistic, at least in states whose dynamics it is confident about. We present a heuristic search algorithm called \textit{Adaptive Monte Carlo Tree Search (ADA-MCTS)} that addresses these challenges. We show that the agent can learn the updated dynamics of the environment over time and then act as it learns, i.e., if the agent is in a region of the state space about which it has updated knowledge, it can avoid being pessimistic. To quantify ``updated knowledge,'' we disintegrate the aleatoric and epistemic uncertainty in the agent's updated belief and show how the agent can use these estimates for decision-making. We compare the proposed approach with the multiple state-of-the-art approaches in decision-making across multiple well-established open-source problems and empirically show that our approach is faster and highly adaptive without sacrificing safety.