language model

Title: LDM$^2$: A Large Decision Model Imitating Human Cognition with Dynamic Memory Enhancement. (arXiv:2312.08402v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08402
Code URL: null
Copy Paste: [[2312.08402]] LDM$^2$: A Large Decision Model Imitating Human Cognition with Dynamic Memory Enhancement(http://arxiv.org/abs/2312.08402)
Summary:
With the rapid development of large language models (LLMs), it is highly demanded that LLMs can be adopted to make decisions to enable the artificial general intelligence. Most approaches leverage manually crafted examples to prompt the LLMs to imitate the decision process of human. However, designing optimal prompts is difficult and the patterned prompts can hardly be generalized to more complex environments. In this paper, we propose a novel model named Large Decision Model with Memory (LDM$^2$), which leverages a dynamic memory mechanism to construct dynamic prompts, guiding the LLMs in making proper decisions according to the faced state. LDM$^2$ consists of two stages: memory formation and memory refinement. In the former stage, human behaviors are decomposed into state-action tuples utilizing the powerful summarizing ability of LLMs. Then, these tuples are stored in the memory, whose indices are generated by the LLMs, to facilitate the retrieval of the most relevant subset of memorized tuples based on the current state. In the latter stage, our LDM$^2$ employs tree exploration to discover more suitable decision processes and enrich the memory by adding valuable state-action tuples. The dynamic circle of exploration and memory enhancement provides LDM$^2$ a better understanding of the global environment. Extensive experiments conducted in two interactive environments have shown that our LDM$^2$ outperforms the baselines in terms of both score and success rate, which demonstrates its effectiveness.

Title: Contractive error feedback for gradient compression. (arXiv:2312.08538v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08538
Code URL: null
Copy Paste: [[2312.08538]] Contractive error feedback for gradient compression(http://arxiv.org/abs/2312.08538)
Summary:
On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited storage. In such settings, communication efficient optimization methods are attractive alternatives, however they still struggle with memory issues. To tackle these challenges, we propose an communication efficient method called contractive error feedback (ConEF). As opposed to SGD with error-feedback (EFSGD) that inefficiently manages memory, ConEF obtains the sweet spot of convergence and memory usage, and achieves communication efficiency by leveraging biased and all-reducable gradient compression. We empirically validate ConEF on various learning tasks that include image classification, language modeling, and machine translation and observe that ConEF saves 80\% - 90\% of the extra memory in EFSGD with almost no loss on test performance, while also achieving 1.3x - 5x speedup of SGD. Through our work, we also demonstrate the feasibility and convergence of ConEF to clear up the theoretical barrier of integrating ConEF to popular memory efficient frameworks such as ZeRO-3.

Title: Learning adaptive planning representations with natural language guidance. (arXiv:2312.08566v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08566
Code URL: null
Copy Paste: [[2312.08566]] Learning adaptive planning representations with natural language guidance(http://arxiv.org/abs/2312.08566)
Summary:
Effective planning in the real world requires not only world knowledge, but the ability to leverage that knowledge to build the right representation of the task at hand. Decades of hierarchical planning techniques have used domain-specific temporal action abstractions to support efficient and accurate planning, almost always relying on human priors and domain knowledge to decompose hard tasks into smaller subproblems appropriate for a goal or set of goals. This paper describes Ada (Action Domain Acquisition), a framework for automatically constructing task-specific planning representations using task-general background knowledge from language models (LMs). Starting with a general-purpose hierarchical planner and a low-level goal-conditioned policy, Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks. On two language-guided interactive planning benchmarks (Mini Minecraft and ALFRED Household Tasks), Ada strongly outperforms other approaches that use LMs for sequential decision-making, offering more accurate plans and better generalization to complex tasks.

Title: Multi-modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models. (arXiv:2312.08762v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08762
Code URL: null
Copy Paste: [[2312.08762]] Multi-modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models(http://arxiv.org/abs/2312.08762)
Summary:
Chain-of-thought (CoT) reasoning has exhibited impressive performance in language models for solving complex tasks and answering questions. However, many real-world questions require multi-modal information, such as text and images. Previous research on multi-modal CoT has primarily focused on extracting fixed image features from off-the-shelf vision models and then fusing them with text using attention mechanisms. This approach has limitations because these vision models were not designed for complex reasoning tasks and do not align well with language thoughts. To overcome this limitation, we introduce a novel approach for multi-modal CoT reasoning that utilizes latent space learning via diffusion processes to generate effective image features that align with language thoughts. Our method fuses image features and text representations at a deep level and improves the complex reasoning ability of multi-modal CoT. We demonstrate the efficacy of our proposed method on multi-modal ScienceQA and machine translation benchmarks, achieving state-of-the-art performance on ScienceQA. Overall, our approach offers a more robust and effective solution for multi-modal reasoning in language models, enhancing their ability to tackle complex real-world problems.

Title: Evaluating Large Language Models for Health-related Queries with Presuppositions. (arXiv:2312.08800v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08800
Code URL: null
Copy Paste: [[2312.08800]] Evaluating Large Language Models for Health-related Queries with Presuppositions(http://arxiv.org/abs/2312.08800)
Summary:
As corporations rush to integrate large language models (LLMs) to their search offerings, it is critical that they provide factually accurate information that is robust to any presuppositions that a user may express. In this work, we introduce UPHILL, a dataset consisting of health-related queries with varying degrees of presuppositions. Using UPHILL, we evaluate the factual accuracy and consistency of InstructGPT, ChatGPT, and BingChat models. We find that while model responses rarely disagree with true health claims (posed as questions), they often fail to challenge false claims: responses from InstructGPT agree with 32% of the false claims, ChatGPT 26% and BingChat 23%. As we increase the extent of presupposition in input queries, the responses from InstructGPT and ChatGPT agree with the claim considerably more often, regardless of its veracity. Responses from BingChat, which rely on retrieved webpages, are not as susceptible. Given the moderate factual accuracy, and the inability of models to consistently correct false assumptions, our work calls for a careful assessment of current LLMs for use in high-stakes scenarios.

Title: Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent. (arXiv:2312.08926v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08926
Code URL: https://github.com/oashua/mathagent
Copy Paste: [[2312.08926]] Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent(http://arxiv.org/abs/2312.08926)
Summary:
Large language models (LLMs) face challenges in solving complex mathematical problems that require comprehensive capacities to parse the statements, associate domain knowledge, perform compound logical reasoning, and integrate the intermediate rationales. Tackling all these problems once could be arduous for LLMs, thus leading to confusion in generation. In this work, we explore the potential of enhancing LLMs with agents by meticulous decomposition and modeling of mathematical reasoning process. Specifically, we propose a formal description of the mathematical solving and extend LLMs with an agent-based zero-shot framework named $\bf{P}$lanner-$\bf{R}$easoner-$\bf{E}$xecutor-$\bf{R}$eflector (PRER). We further provide and implement two MathAgents that define the logical forms and inherent relations via a pool of actions in different grains and orientations: MathAgent-M adapts its actions to LLMs, while MathAgent-H aligns with humankind. Experiments on miniF2F and MATH have demonstrated the effectiveness of PRER and proposed MathAgents, achieving an increase of $12.3\%$($53.9\%\xrightarrow{}66.2\%$) on the MiniF2F, $9.2\%$ ($49.8\%\xrightarrow{}59.0\%$) on MATH, and $13.2\%$($23.2\%\xrightarrow{}35.4\%$) for level-5 problems of MATH against GPT-4. Further analytical results provide more insightful perspectives on exploiting the behaviors of LLMs as agents.

Title: LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers. (arXiv:2312.08958v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08958
Code URL: null
Copy Paste: [[2312.08958]] LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers(http://arxiv.org/abs/2312.08958)
Summary:
We propose a framework that leverages foundation models as teachers, guiding a reinforcement learning agent to acquire semantically meaningful behavior without human feedback. In our framework, the agent receives task instructions grounded in a training environment from large language models. Then, a vision-language model guides the agent in learning the multi-task language-conditioned policy by providing reward feedback. We demonstrate that our method can learn semantically meaningful skills in a challenging open-ended MineDojo environment while prior unsupervised skill discovery methods struggle. Additionally, we discuss observed challenges of using off-the-shelf foundation models as teachers and our efforts to address them.

Title: Unbiased organism-agnostic and highly sensitive signal peptide predictor with deep protein language model. (arXiv:2312.08987v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08987
Code URL: https://github.com/ml4bio/uspnet
Copy Paste: [[2312.08987]] Unbiased organism-agnostic and highly sensitive signal peptide predictor with deep protein language model(http://arxiv.org/abs/2312.08987)
Summary:
Signal peptide (SP) is a short peptide located in the N-terminus of proteins. It is essential to target and transfer transmembrane and secreted proteins to correct positions. Compared with traditional experimental methods to identify signal peptides, computational methods are faster and more efficient, which are more practical for analyzing thousands or even millions of protein sequences, especially for metagenomic data. Here we present Unbiased Organism-agnostic Signal Peptide Network (USPNet), a signal peptide classification and cleavage site prediction deep learning method that takes advantage of protein language models. We propose to apply label distribution-aware margin loss to handle data imbalance problems and use evolutionary information of protein to enrich representation and overcome species information dependence.

Title: Identifying Planetary Names in Astronomy Papers: A Multi-Step Approach. (arXiv:2312.08579v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08579
Code URL: null
Copy Paste: [[2312.08579]] Identifying Planetary Names in Astronomy Papers: A Multi-Step Approach(http://arxiv.org/abs/2312.08579)
Summary:
The automatic identification of planetary feature names in astronomy publications presents numerous challenges. These features include craters, defined as roughly circular depressions resulting from impact or volcanic activity; dorsas, which are elongate raised structures or wrinkle ridges; and lacus, small irregular patches of dark, smooth material on the Moon, referred to as "lake" (Planetary Names Working Group, n.d.). Many feature names overlap with places or people's names that they are named after, for example, Syria, Tempe, Einstein, and Sagan, to name a few (U.S. Geological Survey, n.d.). Some feature names have been used in many contexts, for instance, Apollo, which can refer to mission, program, sample, astronaut, seismic, seismometers, core, era, data, collection, instrument, and station, in addition to the crater on the Moon. Some feature names can appear in the text as adjectives, like the lunar craters Black, Green, and White. Some feature names in other contexts serve as directions, like craters West and South on the Moon. Additionally, some features share identical names across different celestial bodies, requiring disambiguation, such as the Adams crater, which exists on both the Moon and Mars. We present a multi-step pipeline combining rule-based filtering, statistical relevance analysis, part-of-speech (POS) tagging, named entity recognition (NER) model, hybrid keyword harvesting, knowledge graph (KG) matching, and inference with a locally installed large language model (LLM) to reliably identify planetary names despite these challenges. When evaluated on a dataset of astronomy papers from the Astrophysics Data System (ADS), this methodology achieves an F1-score over 0.97 in disambiguating planetary feature names.

Title: Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention. (arXiv:2312.08618v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08618
Code URL: null
Copy Paste: [[2312.08618]] Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention(http://arxiv.org/abs/2312.08618)
Summary:
This paper introduces a novel approach to enhance the capabilities of Large Language Models (LLMs) in processing and understanding extensive text sequences, a critical aspect in applications requiring deep comprehension and synthesis of large volumes of information. Recognizing the inherent challenges in extending the context window for LLMs, primarily built on Transformer architecture, we propose a new model architecture, referred to as Zebra. This architecture efficiently manages the quadratic time and memory complexity issues associated with full attention in the Transformer by employing grouped local-global attention layers. Our model, akin to a zebra's alternating stripes, balances local and global attention layers, significantly reducing computational requirements and memory consumption. Comprehensive experiments, including pretraining from scratch, continuation of long context adaptation training, and long instruction tuning, are conducted to evaluate the Zebra's performance. The results show that Zebra achieves comparable or superior performance on both short and long sequence benchmarks, while also enhancing training and inference efficiency.

Title: Dissecting vocabulary biases datasets through statistical testing and automated data augmentation for artifact mitigation in Natural Language Inference. (arXiv:2312.08747v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08747
Code URL: https://github.com/datngu/nli-artifacts
Copy Paste: [[2312.08747]] Dissecting vocabulary biases datasets through statistical testing and automated data augmentation for artifact mitigation in Natural Language Inference(http://arxiv.org/abs/2312.08747)
Summary:
In recent years, the availability of large-scale annotated datasets, such as the Stanford Natural Language Inference and the Multi-Genre Natural Language Inference, coupled with the advent of pre-trained language models, has significantly contributed to the development of the natural language inference domain. However, these crowdsourced annotated datasets often contain biases or dataset artifacts, leading to overestimated model performance and poor generalization. In this work, we focus on investigating dataset artifacts and developing strategies to address these issues. Through the utilization of a novel statistical testing procedure, we discover a significant association between vocabulary distribution and text entailment classes, emphasizing vocabulary as a notable source of biases. To mitigate these issues, we propose several automatic data augmentation strategies spanning character to word levels. By fine-tuning the ELECTRA pre-trained language model, we compare the performance of boosted models with augmented data against their baseline counterparts. The experiments demonstrate that the proposed approaches effectively enhance model accuracy and reduce biases by up to 0.66% and 1.14%, respectively.

Title: Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning. (arXiv:2312.08900v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08900
Code URL: null
Copy Paste: [[2312.08900]] Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning(http://arxiv.org/abs/2312.08900)
Summary:
This paper introduces a novel Parameter-Efficient Fine-Tuning (PEFT) framework for multi-modal, multi-task transfer learning with pre-trained language models. PEFT techniques such as LoRA, BitFit and IA3 have demonstrated comparable performance to full fine-tuning of pre-trained models for specific downstream tasks, all while demanding significantly fewer trainable parameters and reduced GPU memory consumption. However, in the context of multi-modal fine-tuning, the need for architectural modifications or full fine-tuning often becomes apparent. To address this we propose Context-PEFT, which learns different groups of adaptor parameters based on the token's domain. This approach enables LoRA-like weight injection without requiring additional architectural changes. Our method is evaluated on the COCO captioning task, where it outperforms full fine-tuning under similar data constraints while simultaneously offering a substantially more parameter-efficient and computationally economical solution.

gpt

Title: Heterogeneous Graph Neural Architecture Search with GPT-4. (arXiv:2312.08680v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08680
Code URL: null
Copy Paste: [[2312.08680]] Heterogeneous Graph Neural Architecture Search with GPT-4(http://arxiv.org/abs/2312.08680)
Summary:
Heterogeneous graph neural architecture search (HGNAS) represents a powerful tool for automatically designing effective heterogeneous graph neural networks. However, existing HGNAS algorithms suffer from inefficient searches and unstable results. In this paper, we present a new GPT-4 based HGNAS model to improve the search efficiency and search accuracy of HGNAS. Specifically, we present a new GPT-4 enhanced Heterogeneous Graph Neural Architecture Search (GHGNAS for short). The basic idea of GHGNAS is to design a set of prompts that can guide GPT-4 toward the task of generating new heterogeneous graph neural architectures. By iteratively asking GPT-4 with the prompts, GHGNAS continually validates the accuracy of the generated HGNNs and uses the feedback to further optimize the prompts. Experimental results show that GHGNAS can design new HGNNs by leveraging the powerful generalization capability of GPT-4. Moreover, GHGNAS runs more effectively and stably than previous HGNAS models based on reinforcement learning and differentiable search algorithms.

Title: Detecting value-expressive text posts in Russian social media. (arXiv:2312.08968v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08968
Code URL: https://github.com/mmilkova/nlp-values
Copy Paste: [[2312.08968]] Detecting value-expressive text posts in Russian social media(http://arxiv.org/abs/2312.08968)
Summary:
Basic values are concepts or beliefs which pertain to desirable end-states and transcend specific situations. Studying personal values in social media can illuminate how and why societal values evolve especially when the stimuli-based methods, such as surveys, are inefficient, for instance, in hard-to-reach populations. On the other hand, user-generated content is driven by the massive use of stereotyped, culturally defined speech constructions rather than authentic expressions of personal values. We aimed to find a model that can accurately detect value-expressive posts in Russian social media VKontakte. A training dataset of 5,035 posts was annotated by three experts, 304 crowd-workers and ChatGPT. Crowd-workers and experts showed only moderate agreement in categorizing posts. ChatGPT was more consistent but struggled with spam detection. We applied an ensemble of human- and AI-assisted annotation involving active learning approach, subsequently trained several LLMs and selected a model based on embeddings from pre-trained fine-tuned rubert-tiny2, and reached a high quality of value detection with F1 = 0.75 (F1-macro = 0.80). This model provides a crucial step to a study of values within and between Russian social media users.

Title: Beyond Accuracy: Automated De-Identification of Large Real-World Clinical Text Datasets. (arXiv:2312.08495v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08495
Code URL: null
Copy Paste: [[2312.08495]] Beyond Accuracy: Automated De-Identification of Large Real-World Clinical Text Datasets(http://arxiv.org/abs/2312.08495)
Summary:
Recent research advances achieve human-level accuracy for de-identifying free-text clinical notes on research datasets, but gaps remain in reproducing this in large real-world settings. This paper summarizes lessons learned from building a system used to de-identify over one billion real clinical notes, in a fully automated way, that was independently certified by multiple organizations for production use. A fully automated solution requires a very high level of accuracy that does not require manual review. A hybrid context-based model architecture is described, which outperforms a Named Entity Recogniton (NER) - only model by 10% on the i2b2-2014 benchmark. The proposed system makes 50%, 475%, and 575% fewer errors than the comparable AWS, Azure, and GCP services respectively while also outperforming ChatGPT by 33%. It exceeds 98% coverage of sensitive data across 7 European languages, without a need for fine tuning. A second set of described models enable data obfuscation -- replacing sensitive data with random surrogates -- while retaining name, date, gender, clinical, and format consistency. Both the practical need and the solution architecture that provides for reliable & linked anonymized documents are described.

llm

Title: Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction. (arXiv:2312.08400v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08400
Code URL: null
Copy Paste: [[2312.08400]] Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction(http://arxiv.org/abs/2312.08400)
Summary:
Large language models (LLMs) finetuned to follow human instruction have recently exhibited significant capabilities in various English NLP tasks. However, their performance in grammatical error correction (GEC), especially on languages other than English, remains significantly unexplored. In this work, we evaluate the abilities of instruction finetuned LLMs in Arabic GEC, a complex task due to Arabic's rich morphology. Our findings suggest that various prompting methods, coupled with (in-context) few-shot learning, demonstrate considerable effectiveness, with GPT-4 achieving up to $65.49$ F$_{1}$ score under expert prompting (approximately $5$ points higher than our established baseline). Despite these positive results, we find that instruction finetuned models, regardless of their size, are still outperformed by fully finetuned ones, even if they are significantly smaller in size. This disparity highlights substantial room for improvements for LLMs. Inspired by methods used in low-resource machine translation, we also develop a method exploiting synthetic data that significantly outperforms previous models on two standard Arabic benchmarks. Our best model achieves a new SOTA on Arabic GEC, with $73.29$ and $73.26$ F$_{1}$ on the 2014 and 2015 QALB datasets, respectively, compared to peer-reviewed published baselines.

Title: ChatSOS: LLM-based knowledge Q&A system for safety engineering. (arXiv:2312.08629v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08629
Code URL: null
Copy Paste: [[2312.08629]] ChatSOS: LLM-based knowledge Q&A system for safety engineering(http://arxiv.org/abs/2312.08629)
Summary:
Recent advancements in large language models (LLMs) have notably propelled natural language processing (NLP) capabilities, demonstrating significant potential in safety engineering applications. Despite these advancements, LLMs face constraints in processing specialized tasks, attributed to factors such as corpus size, input processing limitations, and privacy concerns. Obtaining useful information from reliable sources in a limited time is crucial for LLM. Addressing this, our study introduces an LLM-based Q&A system for safety engineering, enhancing the comprehension and response accuracy of the model. We employed prompt engineering to incorporate external knowledge databases, thus enriching the LLM with up-to-date and reliable information. The system analyzes historical incident reports through statistical methods, utilizes vector embedding to construct a vector database, and offers an efficient similarity-based search functionality. Our findings indicate that the integration of external knowledge significantly augments the capabilities of LLM for in-depth problem analysis and autonomous task assignment. It effectively summarizes accident reports and provides pertinent recommendations. This integration approach not only expands LLM applications in safety engineering but also sets a precedent for future developments towards automation and intelligent systems.

Title: TigerBot: An Open Multilingual Multitask LLM. (arXiv:2312.08688v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08688
Code URL: null
Copy Paste: [[2312.08688]] TigerBot: An Open Multilingual Multitask LLM(http://arxiv.org/abs/2312.08688)
Summary:
We release and introduce the TigerBot family of large language models (LLMs), consisting of base and chat models, sized from 7, 13, 70 and 180 billion parameters. We develop our models embarking from Llama-2 and BLOOM, and push the boundary further in data, training algorithm, infrastructure, and application tools. Our models yield meaningful performance gain over SOTA open-source models, e.g., Llama-2, specifically 6\% gain in English and 20\% gain in Chinese. TigerBot model family also achieves leading performance in major academic and industrial benchmarks and leaderboards. We believe that TigerBot represents just a snapshot of lightning-fast progression in LLM open-source community. Therefore, we are thrilled to give back by publicly releasing our models and reporting our approach behind, with additional emphases on building SOTA LLMs in a democratized way and making LLMs of use in real-world applications.

Title: Rational Sensibility: LLM Enhanced Empathetic Response Generation Guided by Self-presentation Theory. (arXiv:2312.08702v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08702
Code URL: null
Copy Paste: [[2312.08702]] Rational Sensibility: LLM Enhanced Empathetic Response Generation Guided by Self-presentation Theory(http://arxiv.org/abs/2312.08702)
Summary:
Having the ability to empathize is crucial for accurately representing human behavior during conversations. Despite numerous research aim to improve the cognitive capability of models by incorporating external knowledge, there has been limited attention on the sensible and rational expression of the conversation itself, which are crucial components of the cognitive empathy. Guided by self-presentation theory in sociology, we have designed an innovative categorical approach that segregates historical dialogues into sensible and rational sentences and subsequently elucidate the context through the designed attention mechanism. However, the rational information within the conversation is restricted and the external knowledge used in previous methods have limitations of semantic contradiction and narrow vision field. Considering the impressive performance of LLM in the domain of intelligent agent. We employ LLaMA2-70b as a rational brain to analyze the profound logical information maintained in conversations, which assists the model assessing the balance of sensibility and rationality to produce quality empathetic responses. Experimental evaluations demonstrate that our method outperforms other comparable methods on both automatic and human evaluations.

Title: Forbidden Facts: An Investigation of Competing Objectives in Llama-2. (arXiv:2312.08793v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08793
Code URL: null
Copy Paste: [[2312.08793]] Forbidden Facts: An Investigation of Competing Objectives in Llama-2(http://arxiv.org/abs/2312.08793)
Summary:
LLMs often face competing pressures (for example helpfulness vs. harmlessness). To understand how models resolve such conflicts, we study Llama-2-chat models on the forbidden fact task. Specifically, we instruct Llama-2 to truthfully complete a factual recall statement while forbidding it from saying the correct answer. This often makes the model give incorrect answers. We decompose Llama-2 into 1000+ components, and rank each one with respect to how useful it is for forbidding the correct answer. We find that in aggregate, around 35 components are enough to reliably implement the full suppression behavior. However, these components are fairly heterogeneous and many operate using faulty heuristics. We discover that one of these heuristics can be exploited via a manually designed adversarial attack which we call The California Attack. Our results highlight some roadblocks standing in the way of being able to successfully interpret advanced ML systems. Project website available at https://forbiddenfacts.github.io .

Title: Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning. (arXiv:2312.08901v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08901
Code URL: null
Copy Paste: [[2312.08901]] Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning(http://arxiv.org/abs/2312.08901)
Summary:
Large language models (LLMs) have shown impressive capabilities in various tasks, yet they still struggle with math reasoning. Despite efforts to optimize Chain-of-Thoughts (CoT) prompts and fine-tune LLMs, the potential of few-shot learning remains unexplored. In this work, we propose CoT-Max, a novel approach pushing the boundaries of few-shot CoT learning to improve LLM math reasoning capabilities. CoT-Max addresses the challenges of the selection of useful examples and limited number of examples due to restricted context window length. Inspired by our observation that natural language inputs contain many redundancy, we propose a coarse-to-fine pruner as a plug-and-play module for LLMs, which first identifies crucial CoT examples from a large batch and then further prunes unimportant tokens. To train the pruner, we collect a math reasoning dataset with diverse difficulty and steps, introduce a reward to measure both the input's effectiveness for math reasoning and token length constraints, and propose a novel training approach with reinforcement learning. As a result, CoT-Max significantly outperforms CoT and few-shot prompting baselines across various LLMs (LLaMA2-7B, 13B, 70B) and 5 mathematical datasets, achieving up to 4.55% absolute improvements. Remarkably, without any fine-tuning, LLaMA2-70B with CoT-Max surpasses GPT-3.5 and a wide range of larger LLMs (PaLM, Minerva, etc.) on the GSM8K.

Title: Math-Shepherd: A Label-Free Step-by-Step Verifier for LLMs in Mathematical Reasoning. (arXiv:2312.08935v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08935
Code URL: null
Copy Paste: [[2312.08935]] Math-Shepherd: A Label-Free Step-by-Step Verifier for LLMs in Mathematical Reasoning(http://arxiv.org/abs/2312.08935)
Summary:
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks. However, even the most advanced open-source LLMs, such as the LLaMA family models, still face challenges when it comes to accurately solving complex multi-step mathematical problems. In this paper, we present an innovative process-oriented math verifier called \textbf{Math-Shepherd}, which assigns a reward score to each step of the LLM's outputs on math problems. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. With the guidance of Math-Shepherd, a series of open-source LLMs demonstrate exceptional performance. Among them, DeepSeek 67B \citep{DeepSeek-llm} stands out by achieving accuracy rates of 93.3\% on the GSM8K dataset and 48.1\% on the MATH dataset, without external enhancement such as tool usage. Our Math-Shepherd also outperforms the self-consistency method and other existing verification models. We believe that automatic process supervision holds significant potential for the future evolution of LLMs.

Title: ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks. (arXiv:2312.08583v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08583
Code URL: https://github.com/microsoft/DeepSpeed
Copy Paste: [[2312.08583]] ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks(http://arxiv.org/abs/2312.08583)
Summary:
This study examines 4-bit quantization methods like GPTQ in large language models (LLMs), highlighting GPTQ's overfitting and limited enhancement in Zero-Shot tasks. While prior works merely focusing on zero-shot measurement, we extend task scope to more generative categories such as code generation and abstractive summarization, in which we found that INT4 quantization can significantly underperform. However, simply shifting to higher precision formats like FP6 has been particularly challenging, thus overlooked, due to poor performance caused by the lack of sophisticated integration and system acceleration strategies on current AI hardware. Our results show that FP6, even with a coarse-grain quantization scheme, performs robustly across various algorithms and tasks, demonstrating its superiority in accuracy and versatility. Notably, with the FP6 quantization, \codestar-15B model performs comparably to its FP16 counterpart in code generation, and for smaller models like the 406M it closely matches their baselines in summarization. Neither can be achieved by INT4. To better accommodate various AI hardware and achieve the best system performance, we propose a novel 4+2 design for FP6 to achieve similar latency to the state-of-the-art INT4 fine-grain quantization. With our design, FP6 can become a promising solution to the current 4-bit quantization methods used in LLMs.

Title: A Comparative Analysis of Fine-Tuned LLMs and Few-Shot Learning of LLMs for Financial Sentiment Analysis. (arXiv:2312.08725v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08725
Code URL: null
Copy Paste: [[2312.08725]] A Comparative Analysis of Fine-Tuned LLMs and Few-Shot Learning of LLMs for Financial Sentiment Analysis(http://arxiv.org/abs/2312.08725)
Summary:
Financial sentiment analysis plays a crucial role in uncovering latent patterns and detecting emerging trends, enabling individuals to make well-informed decisions that may yield substantial advantages within the constantly changing realm of finance. Recently, Large Language Models (LLMs) have demonstrated their effectiveness in diverse domains, showcasing remarkable capabilities even in zero-shot and few-shot in-context learning for various Natural Language Processing (NLP) tasks. Nevertheless, their potential and applicability in the context of financial sentiment analysis have not been thoroughly explored yet. To bridge this gap, we employ two approaches: in-context learning (with a focus on gpt-3.5-turbo model) and fine-tuning LLMs on a finance-domain dataset. Given the computational costs associated with fine-tuning LLMs with large parameter sizes, our focus lies on smaller LLMs, spanning from 250M to 3B parameters for fine-tuning. We then compare the performances with state-of-the-art results to evaluate their effectiveness in the finance-domain. Our results demonstrate that fine-tuned smaller LLMs can achieve comparable performance to state-of-the-art fine-tuned LLMs, even with models having fewer parameters and a smaller training dataset. Additionally, the zero-shot and one-shot performance of LLMs produces comparable results with fine-tuned smaller LLMs and state-of-the-art outcomes. Furthermore, our analysis demonstrates that there is no observed enhancement in performance for finance-domain sentiment analysis when the number of shots for in-context learning is increased.

long context

lora

Title: Artificial Intelligence and Human Geography. (arXiv:2312.08827v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08827
Code URL: null
Copy Paste: [[2312.08827]] Artificial Intelligence and Human Geography(http://arxiv.org/abs/2312.08827)
Summary:
This paper examines the recent advances and applications of AI in human geography especially the use of machine (deep) learning, including place representation and modeling, spatial analysis and predictive mapping, and urban planning and design. AI technologies have enabled deeper insights into complex human-environment interactions, contributing to more effective scientific exploration, understanding of social dynamics, and spatial decision-making. Furthermore, human geography offers crucial contributions to AI, particularly in context-aware model development, human-centered design, biases and ethical considerations, and data privacy. The synergy beween AI and human geography is essential for addressing global challenges like disaster resilience, poverty, and equitable resource access. This interdisciplinary collaboration between AI and geography will help advance the development of GeoAI and promise a better and sustainable world for all.

Title: Fair Active Learning in Low-Data Regimes. (arXiv:2312.08559v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08559
Code URL: null
Copy Paste: [[2312.08559]] Fair Active Learning in Low-Data Regimes(http://arxiv.org/abs/2312.08559)
Summary:
In critical machine learning applications, ensuring fairness is essential to avoid perpetuating social inequities. In this work, we address the challenges of reducing bias and improving accuracy in data-scarce environments, where the cost of collecting labeled data prohibits the use of large, labeled datasets. In such settings, active learning promises to maximize marginal accuracy gains of small amounts of labeled data. However, existing applications of active learning for fairness fail to deliver on this, typically requiring large labeled datasets, or failing to ensure the desired fairness tolerance is met on the population distribution.

To address such limitations, we introduce an innovative active learning framework that combines an exploration procedure inspired by posterior sampling with a fair classification subroutine. We demonstrate that this framework performs effectively in very data-scarce regimes, maximizing accuracy while satisfying fairness constraints with high probability. We evaluate our proposed approach using well-established real-world benchmark datasets and compare it against state-of-the-art methods, demonstrating its effectiveness in producing fair models, and improvement over existing methods.

Title: A Cyber-Physical Architecture for Microgrids based on Deep learning and LORA Technology. (arXiv:2312.08818v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08818
Code URL: null
Copy Paste: [[2312.08818]] A Cyber-Physical Architecture for Microgrids based on Deep learning and LORA Technology(http://arxiv.org/abs/2312.08818)
Summary:
This paper proposes a cyber-physical architecture for the secured social operation of isolated hybrid microgrids (HMGs). On the physical side of the proposed architecture, an optimal scheduling scheme considering various renewable energy sources (RESs) and fossil fuel-based distributed generation units (DGs) is proposed. Regarding the cyber layer of MGs, a wireless architecture based on low range wide area (LORA) technology is introduced for advanced metering infrastructure (AMI) in smart electricity grids. In the proposed architecture, the LORA data frame is described in detail and designed for the application of smart meters considering DGs and ac-dc converters. Additionally, since the cyber layer of smart grids is highly vulnerable to cyber-attacks, t1his paper proposes a deep-learning-based cyber-attack detection model (CADM) based on bidirectional long short-term memory (BLSTM) and sequential hypothesis testing (SHT) to detect false data injection attacks (FDIA) on the smart meters within AMI. The performance of the proposed energy management architecture is evaluated using the IEEE 33-bus test system. In order to investigate the effect of FDIA on the isolated HMGs and highlight the interactions between the cyber layer and physical layer, an FDIA is launched against the test system. The results showed that a successful attack can highly damage the system and cause widespread load shedding. Also, the performance of the proposed CADM is examined using a real-world dataset. Results prove the effectiveness of the proposed CADM in detecting the attacks using only two samples.

hallucination

prompt

Title: Metacognition-Enhanced Few-Shot Prompting With Positive Reinforcement. (arXiv:2312.08642v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08642
Code URL: null
Copy Paste: [[2312.08642]] Metacognition-Enhanced Few-Shot Prompting With Positive Reinforcement(http://arxiv.org/abs/2312.08642)
Summary:
Few-shot prompting elicits the remarkable abilities of large language models by equipping them with a few demonstration examples in the input. However, the traditional method of providing large language models with all demonstration input-output pairs at once may not effectively guide large language models to learn the specific input-output mapping relationship. In this paper, inspired by the regulatory and supportive role of metacognition in students' learning, we propose a novel metacognition-enhanced few-shot prompting, which guides large language models to reflect on their thought processes to comprehensively learn the given demonstration examples. Furthermore, considering that positive reinforcement can improve students' learning motivation, we introduce positive reinforcement into our metacognition-enhanced few-shot prompting to promote the few-shot learning of large language models by providing response-based positive feedback. The experimental results on two real-world datasets show that our metacognition-enhanced few-shot prompting with positive reinforcement surpasses traditional few-shot prompting in classification accuracy and macro F1.

Title: Labels Need Prompts Too Mask Matching for Natural Language Understanding Tasks. (arXiv:2312.08726v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08726
Code URL: null
Copy Paste: [[2312.08726]] Labels Need Prompts Too Mask Matching for Natural Language Understanding Tasks(http://arxiv.org/abs/2312.08726)
Summary:
Textual label names (descriptions) are typically semantically rich in many natural language understanding (NLU) tasks. In this paper, we incorporate the prompting methodology, which is widely used to enrich model input, into the label side for the first time. Specifically, we propose a Mask Matching method, which equips an input with a prompt and its label with another, and then makes predictions by matching their mask representations. We evaluate our method extensively on 8 NLU tasks with 14 datasets. The experimental results show that Mask Matching significantly outperforms its counterparts of fine-tuning and conventional prompt-tuning, setting up state-of-the-art performances in several datasets. Mask Matching is particularly good at handling NLU tasks with large label counts and informative label names. As pioneering efforts that investigate the label-side prompt, we also discuss open issues for future study.

Title: MotherNet: A Foundational Hypernetwork for Tabular Classification. (arXiv:2312.08598v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08598
Code URL: null
Copy Paste: [[2312.08598]] MotherNet: A Foundational Hypernetwork for Tabular Classification(http://arxiv.org/abs/2312.08598)
Summary:
The advent of Foundation Models is transforming machine learning across many modalities (e.g., language, images, videos) with prompt engineering replacing training in many settings. Recent work on tabular data (e.g., TabPFN) hints at a similar opportunity to build Foundation Models for classification for numerical data. In this paper, we go one step further and propose a hypernetwork architecture that we call MotherNet, trained on millions of classification tasks, that, once prompted with a never-seen-before training set generates the weights of a trained ``child'' neural-network. Like other Foundation Models, MotherNet replaces training on specific datasets with in-context learning through a single forward pass. In contrast to existing hypernetworks that were either task-specific or trained for relatively constraint multi-task settings, MotherNet is trained to generate networks to perform multiclass classification on arbitrary tabular datasets without any dataset specific gradient descent.

The child network generated by MotherNet using in-context learning outperforms neural networks trained using gradient descent on small datasets, and is competitive with predictions by TabPFN and standard ML methods like Gradient Boosting. Unlike a direct application of transformer models like TabPFN, MotherNet generated networks are highly efficient at inference time. This methodology opens up a new approach to building predictive models on tabular data that is both efficient and robust, without any dataset-specific training.

code

Title: ALGNet: Attention Light Graph Memory Network for Medical Recommendation System. (arXiv:2312.08377v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08377
Code URL: null
Copy Paste: [[2312.08377]] ALGNet: Attention Light Graph Memory Network for Medical Recommendation System(http://arxiv.org/abs/2312.08377)
Summary:
Medication recommendation is a vital task for improving patient care and reducing adverse events. However, existing methods often fail to capture the complex and dynamic relationships among patient medical records, drug efficacy and safety, and drug-drug interactions (DDI). In this paper, we propose ALGNet, a novel model that leverages light graph convolutional networks (LGCN) and augmentation memory networks (AMN) to enhance medication recommendation. LGCN can efficiently encode the patient records and the DDI graph into low-dimensional embeddings, while AMN can augment the patient representation with external knowledge from a memory module. We evaluate our model on the MIMIC-III dataset and show that it outperforms several baselines in terms of recommendation accuracy and DDI avoidance. We also conduct an ablation study to analyze the effects of different components of our model. Our results demonstrate that ALGNet can achieve superior performance with less computation and more interpretability. The implementation of this paper can be found at: https://github.com/huyquoctrinh/ALGNet.

Title: Earthfarseer: Versatile Spatio-Temporal Dynamical Systems Modeling in One Model. (arXiv:2312.08403v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08403
Code URL: https://github.com/easylearningscores/earthfarseer
Copy Paste: [[2312.08403]] Earthfarseer: Versatile Spatio-Temporal Dynamical Systems Modeling in One Model(http://arxiv.org/abs/2312.08403)
Summary:
Efficiently modeling spatio-temporal (ST) physical processes and observations presents a challenging problem for the deep learning community. Many recent studies have concentrated on meticulously reconciling various advantages, leading to designed models that are neither simple nor practical. To address this issue, this paper presents a systematic study on existing shortcomings faced by off-the-shelf models, including lack of local fidelity, poor prediction performance over long time-steps,low scalability, and inefficiency. To systematically address the aforementioned problems, we propose an EarthFarseer, a concise framework that combines parallel local convolutions and global Fourier-based transformer architectures, enabling dynamically capture the local-global spatial interactions and dependencies. EarthFarseer also incorporates a multi-scale fully convolutional and Fourier architectures to efficiently and effectively capture the temporal evolution. Our proposal demonstrates strong adaptability across various tasks and datasets, with fast convergence and better local fidelity in long time-steps predictions. Extensive experiments and visualizations over eight human society physical and natural physical datasets demonstrates the state-of-the-art performance of EarthFarseer. We release our code at https://github.com/easylearningscores/EarthFarseer.

Title: Harmonics of Learning: Universal Fourier Features Emerge in Invariant Networks. (arXiv:2312.08550v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08550
Code URL: null
Copy Paste: [[2312.08550]] Harmonics of Learning: Universal Fourier Features Emerge in Invariant Networks(http://arxiv.org/abs/2312.08550)
Summary:
In this work, we formally prove that, under certain conditions, if a neural network is invariant to a finite group then its weights recover the Fourier transform on that group. This provides a mathematical explanation for the emergence of Fourier features -- a ubiquitous phenomenon in both biological and artificial learning systems. The results hold even for non-commutative groups, in which case the Fourier transform encodes all the irreducible unitary group representations. Our findings have consequences for the problem of symmetry discovery. Specifically, we demonstrate that the algebraic structure of an unknown group can be recovered from the weights of a network that is at least approximately invariant within certain bounds. Overall, this work contributes to a foundation for an algebraic learning theory of invariant neural network representations.

Title: CAT: A Causally Graph Attention Network for Trimming Heterophilic Graph. (arXiv:2312.08672v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08672
Code URL: null
Copy Paste: [[2312.08672]] CAT: A Causally Graph Attention Network for Trimming Heterophilic Graph(http://arxiv.org/abs/2312.08672)
Summary:
Local Attention-guided Message Passing Mechanism (LAMP) adopted in Graph Attention Networks (GATs) is designed to adaptively learn the importance of neighboring nodes for better local aggregation on the graph, which can bring the representations of similar neighbors closer effectively, thus showing stronger discrimination ability. However, existing GATs suffer from a significant discrimination ability decline in heterophilic graphs because the high proportion of dissimilar neighbors can weaken the self-attention of the central node, jointly resulting in the deviation of the central node from similar nodes in the representation space. This kind of effect generated by neighboring nodes is called the Distraction Effect (DE) in this paper. To estimate and weaken the DE of neighboring nodes, we propose a Causally graph Attention network for Trimming heterophilic graph (CAT). To estimate the DE, since the DE are generated through two paths (grab the attention assigned to neighbors and reduce the self-attention of the central node), we use Total Effect to model DE, which is a kind of causal estimand and can be estimated from intervened data; To weaken the DE, we identify the neighbors with the highest DE (we call them Distraction Neighbors) and remove them. We adopt three representative GATs as the base model within the proposed CAT framework and conduct experiments on seven heterophilic datasets in three different sizes. Comparative experiments show that CAT can improve the node classification accuracy of all base GAT models. Ablation experiments and visualization further validate the enhancement of discrimination ability brought by CAT. The source code is available at https://github.com/GeoX-Lab/CAT.

Title: Gradient Informed Proximal Policy Optimization. (arXiv:2312.08710v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08710
Code URL: https://github.com/sonsang/gippo
Copy Paste: [[2312.08710]] Gradient Informed Proximal Policy Optimization(http://arxiv.org/abs/2312.08710)
Summary:
We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we introduce the concept of an {\alpha}-policy that stands as a locally superior policy. By adaptively modifying the {\alpha} value, we can effectively manage the influence of analytical policy gradients during learning. To this end, we suggest metrics for assessing the variance and bias of analytical gradients, reducing dependence on these gradients when high variance or bias is detected. Our proposed approach outperforms baseline algorithms in various scenarios, such as function optimization, physics simulations, and traffic control environments. Our code can be found online: https://github.com/SonSang/gippo.

Title: Automated Process Planning Based on a Semantic Capability Model and SMT. (arXiv:2312.08801v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08801
Code URL: null
Copy Paste: [[2312.08801]] Automated Process Planning Based on a Semantic Capability Model and SMT(http://arxiv.org/abs/2312.08801)
Summary:
In research of manufacturing systems and autonomous robots, the term capability is used for a machine-interpretable specification of a system function. Approaches in this research area develop information models that capture all information relevant to interpret the requirements, effects and behavior of functions. These approaches are intended to overcome the heterogeneity resulting from the various types of processes and from the large number of different vendors. However, these models and associated methods do not offer solutions for automated process planning, i.e. finding a sequence of individual capabilities required to manufacture a certain product or to accomplish a mission using autonomous robots. Instead, this is a typical task for AI planning approaches, which unfortunately require a high effort to create the respective planning problem descriptions. In this paper, we present an approach that combines these two topics: Starting from a semantic capability model, an AI planning problem is automatically generated. The planning problem is encoded using Satisfiability Modulo Theories and uses an existing solver to find valid capability sequences including required parameter values. The approach also offers possibilities to integrate existing human expertise and to provide explanations for human operators in order to help understand planning decisions.

Title: JPIS: A Joint Model for Profile-based Intent Detection and Slot Filling with Slot-to-Intent Attention. (arXiv:2312.08737v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08737
Code URL: null
Copy Paste: [[2312.08737]] JPIS: A Joint Model for Profile-based Intent Detection and Slot Filling with Slot-to-Intent Attention(http://arxiv.org/abs/2312.08737)
Summary:
Profile-based intent detection and slot filling are important tasks aimed at reducing the ambiguity in user utterances by leveraging user-specific supporting profile information. However, research in these two tasks has not been extensively explored. To fill this gap, we propose a joint model, namely JPIS, designed to enhance profile-based intent detection and slot filling. JPIS incorporates the supporting profile information into its encoder and introduces a slot-to-intent attention mechanism to transfer slot information representations to intent detection. Experimental results show that our JPIS substantially outperforms previous profile-based models, establishing a new state-of-the-art performance in overall accuracy on the Chinese benchmark dataset ProSLU.

Title: Accelerating Meta-Learning by Sharing Gradients. (arXiv:2312.08398v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08398
Code URL: null
Copy Paste: [[2312.08398]] Accelerating Meta-Learning by Sharing Gradients(http://arxiv.org/abs/2312.08398)
Summary:
The success of gradient-based meta-learning is primarily attributed to its ability to leverage related tasks to learn task-invariant information. However, the absence of interactions between different tasks in the inner loop leads to task-specific over-fitting in the initial phase of meta-training. While this is eventually corrected by the presence of these interactions in the outer loop, it comes at a significant cost of slower meta-learning. To address this limitation, we explicitly encode task relatedness via an inner loop regularization mechanism inspired by multi-task learning. Our algorithm shares gradient information from previously encountered tasks as well as concurrent tasks in the same task batch, and scales their contribution with meta-learned parameters. We show using two popular few-shot classification datasets that gradient sharing enables meta-learning under bigger inner loop learning rates and can accelerate the meta-training process by up to 134%.

Title: ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance. (arXiv:2312.08852v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08852
Code URL: null
Copy Paste: [[2312.08852]] ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance(http://arxiv.org/abs/2312.08852)
Summary:
Deep learning has achieved remarkable success in graph-related tasks, yet this accomplishment heavily relies on large-scale high-quality annotated datasets. However, acquiring such datasets can be cost-prohibitive, leading to the practical use of labels obtained from economically efficient sources such as web searches and user tags. Unfortunately, these labels often come with noise, compromising the generalization performance of deep networks. To tackle this challenge and enhance the robustness of deep learning models against label noise in graph-based tasks, we propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE). The core idea of ERASE is to learn representations with error tolerance by maximizing coding rate reduction. Particularly, we introduce a decoupled label propagation method for learning representations. Before training, noisy labels are pre-corrected through structural denoising. During training, ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience, which significantly improves the generalization performance in node classification. The proposed method allows us to more effectively withstand errors caused by mislabeled nodes, thereby strengthening the robustness of deep networks in handling noisy graph data. Extensive experimental results show that our method can outperform multiple baselines with clear margins in broad noise levels and enjoy great scalability. Codes are released at https://github.com/eraseai/erase.

Title: Global Rewards in Multi-Agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems. (arXiv:2312.08884v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08884
Code URL: https://github.com/tumbais/gr-madrl-amod
Copy Paste: [[2312.08884]] Global Rewards in Multi-Agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems(http://arxiv.org/abs/2312.08884)
Summary:
We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects these with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit, leading to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves so far existing goal conflicts between the trained agents and the operator by assigning rewards to agents leveraging a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the utilization of global rewards can improve implicit vehicle balancing and demand forecasting abilities. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.

Title: BiPFT: Binary Pre-trained Foundation Transformer with Low-rank Estimation of Binarization Residual Polynomials. (arXiv:2312.08937v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08937
Code URL: https://github.com/xingrun-xing/bipft
Copy Paste: [[2312.08937]] BiPFT: Binary Pre-trained Foundation Transformer with Low-rank Estimation of Binarization Residual Polynomials(http://arxiv.org/abs/2312.08937)
Summary:
Pretrained foundation models offer substantial benefits for a wide range of downstream tasks, which can be one of the most potential techniques to access artificial general intelligence. However, scaling up foundation transformers for maximal task-agnostic knowledge has brought about computational challenges, especially on resource-limited devices such as mobiles. This work proposes the first Binary Pretrained Foundation Transformer (BiPFT) for natural language understanding (NLU) tasks, which remarkably saves 56 times operations and 28 times memory. In contrast to previous task-specific binary transformers, BiPFT exhibits a substantial enhancement in the learning capabilities of binary neural networks (BNNs), promoting BNNs into the era of pre-training. Benefiting from extensive pretraining data, we further propose a data-driven binarization method. Specifically, we first analyze the binarization error in self-attention operations and derive the polynomials of binarization error. To simulate full-precision self-attention, we define binarization error as binarization residual polynomials, and then introduce low-rank estimators to model these polynomials. Extensive experiments validate the effectiveness of BiPFTs, surpassing task-specific baseline by 15.4% average performance on the GLUE benchmark. BiPFT also demonstrates improved robustness to hyperparameter changes, improved optimization efficiency, and reduced reliance on downstream distillation, which consequently generalize on various NLU tasks and simplify the downstream pipeline of BNNs. Our code and pretrained models are publicly available at https://github.com/Xingrun-Xing/BiPFT.

Title: EAT: Towards Long-Tailed Out-of-Distribution Detection. (arXiv:2312.08939v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08939
Code URL: https://github.com/stomach-ache/long-tailed-ood-detection
Copy Paste: [[2312.08939]] EAT: Towards Long-Tailed Out-of-Distribution Detection(http://arxiv.org/abs/2312.08939)
Summary:
Despite recent advancements in out-of-distribution (OOD) detection, most current studies assume a class-balanced in-distribution training dataset, which is rarely the case in real-world scenarios. This paper addresses the challenging task of long-tailed OOD detection, where the in-distribution data follows a long-tailed class distribution. The main difficulty lies in distinguishing OOD data from samples belonging to the tail classes, as the ability of a classifier to detect OOD instances is not strongly correlated with its accuracy on the in-distribution classes. To overcome this issue, we propose two simple ideas: (1) Expanding the in-distribution class space by introducing multiple abstention classes. This approach allows us to build a detector with clear decision boundaries by training on OOD data using virtual labels. (2) Augmenting the context-limited tail classes by overlaying images onto the context-rich OOD data. This technique encourages the model to pay more attention to the discriminative features of the tail classes. We provide a clue for separating in-distribution and OOD data by analyzing gradient noise. Through extensive experiments, we demonstrate that our method outperforms the current state-of-the-art on various benchmark datasets. Moreover, our method can be used as an add-on for existing long-tail learning approaches, significantly enhancing their OOD detection performance. Code is available at: https://github.com/Stomach-ache/Long-Tailed-OOD-Detection .

Title: Uncertainty in GNN Learning Evaluations: A Comparison Between Measures for Quantifying Randomness in GNN Community Detection. (arXiv:2312.09015v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.09015
Code URL: null
Copy Paste: [[2312.09015]] Uncertainty in GNN Learning Evaluations: A Comparison Between Measures for Quantifying Randomness in GNN Community Detection(http://arxiv.org/abs/2312.09015)
Summary:
(1) The enhanced capability of Graph Neural Networks (GNNs) in unsupervised community detection of clustered nodes is attributed to their capacity to encode both the connectivity and feature information spaces of graphs. The identification of latent communities holds practical significance in various domains, from social networks to genomics. Current real-world performance benchmarks are perplexing due to the multitude of decisions influencing GNN evaluations for this task. (2) Three metrics are compared to assess the consistency of algorithm rankings in the presence of randomness. The consistency and quality of performance between the results under a hyperparameter optimisation with the default hyperparameters is evaluated. (3) The results compare hyperparameter optimisation with default hyperparameters, revealing a significant performance loss when neglecting hyperparameter investigation. A comparison of metrics indicates that ties in ranks can substantially alter the quantification of randomness. (4) Ensuring adherence to the same evaluation criteria may result in notable differences in the reported performance of methods for this task. The $W$ Randomness coefficient, based on the Wasserstein distance, is identified as providing the most robust assessment of randomness.

chat

retrieval augmented generation

rag

Title: Personalized Decision Supports based on Theory of Mind Modeling and Explainable Reinforcement Learning. (arXiv:2312.08397v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08397
Code URL: null
Copy Paste: [[2312.08397]] Personalized Decision Supports based on Theory of Mind Modeling and Explainable Reinforcement Learning(http://arxiv.org/abs/2312.08397)
Summary:
In this paper, we propose a novel personalized decision support system that combines Theory of Mind (ToM) modeling and explainable Reinforcement Learning (XRL) to provide effective and interpretable interventions. Our method leverages DRL to provide expert action recommendations while incorporating ToM modeling to understand users' mental states and predict their future actions, enabling appropriate timing for intervention. To explain interventions, we use counterfactual explanations based on RL's feature importance and users' ToM model structure. Our proposed system generates accurate and personalized interventions that are easily interpretable by end-users. We demonstrate the effectiveness of our approach through a series of crowd-sourcing experiments in a simulated team decision-making task, where our system outperforms control baselines in terms of task performance. Our proposed approach is agnostic to task environment and RL model structure, therefore has the potential to be generalized to a wide range of applications.

Title: How much can change in a year? Revisiting Evaluation in Multi-Agent Reinforcement Learning. (arXiv:2312.08463v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08463
Code URL: null
Copy Paste: [[2312.08463]] How much can change in a year? Revisiting Evaluation in Multi-Agent Reinforcement Learning(http://arxiv.org/abs/2312.08463)
Summary:
Establishing sound experimental standards and rigour is important in any growing field of research. Deep Multi-Agent Reinforcement Learning (MARL) is one such nascent field. Although exciting progress has been made, MARL has recently come under scrutiny for replicability issues and a lack of standardised evaluation methodology, specifically in the cooperative setting. Although protocols have been proposed to help alleviate the issue, it remains important to actively monitor the health of the field. In this work, we extend the database of evaluation methodology previously published by containing meta-data on MARL publications from top-rated conferences and compare the findings extracted from this updated database to the trends identified in their work. Our analysis shows that many of the worrying trends in performance reporting remain. This includes the omission of uncertainty quantification, not reporting all relevant evaluation details and a narrowing of algorithmic development classes. Promisingly, we do observe a trend towards more difficult scenarios in SMAC-v1, which if continued into SMAC-v2 will encourage novel algorithmic development. Our data indicate that replicability needs to be approached more proactively by the MARL community to ensure trust in the field as we move towards exciting new frontiers.

Title: On Diagnostics for Understanding Agent Training Behaviour in Cooperative MARL. (arXiv:2312.08468v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08468
Code URL: null
Copy Paste: [[2312.08468]] On Diagnostics for Understanding Agent Training Behaviour in Cooperative MARL(http://arxiv.org/abs/2312.08468)
Summary:
Cooperative multi-agent reinforcement learning (MARL) has made substantial strides in addressing the distributed decision-making challenges. However, as multi-agent systems grow in complexity, gaining a comprehensive understanding of their behaviour becomes increasingly challenging. Conventionally, tracking team rewards over time has served as a pragmatic measure to gauge the effectiveness of agents in learning optimal policies. Nevertheless, we argue that relying solely on the empirical returns may obscure crucial insights into agent behaviour. In this paper, we explore the application of explainable AI (XAI) tools to gain profound insights into agent behaviour. We employ these diagnostics tools within the context of Level-Based Foraging and Multi-Robot Warehouse environments and apply them to a diverse array of MARL algorithms. We demonstrate how our diagnostics can enhance the interpretability and explainability of MARL systems, providing a better understanding of agent behaviour.

Title: Revisiting Recommendation Loss Functions through Contrastive Learning (Technical Report). (arXiv:2312.08520v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08520
Code URL: null
Copy Paste: [[2312.08520]] Revisiting Recommendation Loss Functions through Contrastive Learning (Technical Report)(http://arxiv.org/abs/2312.08520)
Summary:
Inspired by the success of contrastive learning, we systematically examine recommendation losses, including listwise (softmax), pairwise (BPR), and pointwise (MSE and CCL) losses. In this endeavor, we introduce InfoNCE+, an optimized generalization of InfoNCE with balance coefficients, and highlight its performance advantages, particularly when aligned with our new decoupled contrastive loss, MINE+. We also leverage debiased InfoNCE to debias pointwise recommendation loss (CCL) as Debiased CCL. Interestingly, our analysis reveals that linear models like iALS and EASE are inherently debiased. Empirical results demonstrates the effectiveness of MINE+ and Debiased-CCL.

Title: World Models via Policy-Guided Trajectory Diffusion. (arXiv:2312.08533v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08533
Code URL: https://github.com/marc-rigter/polygrad-world-models
Copy Paste: [[2312.08533]] World Models via Policy-Guided Trajectory Diffusion(http://arxiv.org/abs/2312.08533)
Summary:
World models are a powerful tool for developing intelligent agents. By predicting the outcome of a sequence of actions, world models enable policies to be optimised via on-policy reinforcement learning (RL) using synthetic data, i.e. in ``in imagination''. Existing world models are autoregressive, and interleave predicting the next state with sampling the next action from the policy. Thus, the prediction error inevitably compounds as the trajectory length grows. In this work, we propose a novel world modelling approach that is not autoregressive and generates entire on-policy trajectories via a single pass through a diffusion model. Our approach, Policy-Guided Trajectory Diffusion (PolyGRAD), leverages a denoising model in addition to the gradient of the action distribution of the policy to diffuse a trajectory of initially random states and actions into an on-policy synthetic trajectory. We analyse the capabilities of our approach and demonstrate that it obtains competitive prediction errors to state-of-the-art autoregressive baselines. PolyGRAD also enables performant policies to be trained via on-policy RL in imagination. We believe that PolyGRAD introduces a promising paradigm for world modelling with many possible extensions to explore in future work.

Title: Adaptive Shortcut Debiasing for Online Continual Learning. (arXiv:2312.08677v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08677
Code URL: null
Copy Paste: [[2312.08677]] Adaptive Shortcut Debiasing for Online Continual Learning(http://arxiv.org/abs/2312.08677)
Summary:
We propose a novel framework DropTop that suppresses the shortcut bias in online continual learning (OCL) while being adaptive to the varying degree of the shortcut bias incurred by continuously changing environment. By the observed high-attention property of the shortcut bias, highly-activated features are considered candidates for debiasing. More importantly, resolving the limitation of the online environment where prior knowledge and auxiliary data are not ready, two novel techniques -- feature map fusion and adaptive intensity shifting -- enable us to automatically determine the appropriate level and proportion of the candidate shortcut features to be dropped. Extensive experiments on five benchmark datasets demonstrate that, when combined with various OCL algorithms, DropTop increases the average accuracy by up to 10.4% and decreases the forgetting by up to 63.2%.

Title: Learning Safety Constraints From Demonstration Using One-Class Decision Trees. (arXiv:2312.08837v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08837
Code URL: null
Copy Paste: [[2312.08837]] Learning Safety Constraints From Demonstration Using One-Class Decision Trees(http://arxiv.org/abs/2312.08837)
Summary:
The alignment of autonomous agents with human values is a pivotal challenge when deploying these agents within physical environments, where safety is an important concern. However, defining the agent's objective as a reward and/or cost function is inherently complex and prone to human errors. In response to this challenge, we present a novel approach that leverages one-class decision trees to facilitate learning from expert demonstrations. These decision trees provide a foundation for representing a set of constraints pertinent to the given environment as a logical formula in disjunctive normal form. The learned constraints are subsequently employed within an oracle constrained reinforcement learning framework, enabling the acquisition of a safe policy. In contrast to other methods, our approach offers an interpretable representation of the constraints, a vital feature in safety-critical environments. To validate the effectiveness of our proposed method, we conduct experiments in synthetic benchmark domains and a realistic driving environment.

Title: Diffusion-C: Unveiling the Generative Challenges of Diffusion Models through Corrupted Data. (arXiv:2312.08843v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08843
Code URL: null
Copy Paste: [[2312.08843]] Diffusion-C: Unveiling the Generative Challenges of Diffusion Models through Corrupted Data(http://arxiv.org/abs/2312.08843)
Summary:
In our contemporary academic inquiry, we present "Diffusion-C," a foundational methodology to analyze the generative restrictions of Diffusion Models, particularly those akin to GANs, DDPM, and DDIM. By employing input visual data that has been subjected to a myriad of corruption modalities and intensities, we elucidate the performance characteristics of those Diffusion Models. The noise component takes center stage in our analysis, hypothesized to be a pivotal element influencing the mechanics of deep learning systems. In our rigorous expedition utilizing Diffusion-C, we have discerned the following critical observations: (I) Within the milieu of generative models under the Diffusion taxonomy, DDPM emerges as a paragon, consistently exhibiting superior performance metrics. (II) Within the vast spectrum of corruption frameworks, the fog and fractal corruptions notably undermine the functional robustness of both DDPM and DDIM. (III) The vulnerability of Diffusion Models to these particular corruptions is significantly influenced by topological and statistical similarities, particularly concerning the alignment between mean and variance. This scholarly work highlights Diffusion-C's core understandings regarding the impacts of various corruptions, setting the stage for future research endeavors in the realm of generative models.

Title: Knowledge-Driven Modulation of Neural Networks with Attention Mechanism for Next Activity Prediction. (arXiv:2312.08847v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.08847
Code URL: https://github.com/jonghyeonk/kb-modulation
Copy Paste: [[2312.08847]] Knowledge-Driven Modulation of Neural Networks with Attention Mechanism for Next Activity Prediction(http://arxiv.org/abs/2312.08847)
Summary:
Predictive Process Monitoring (PPM) aims at leveraging historic process execution data to predict how ongoing executions will continue up to their completion. In recent years, PPM techniques for the prediction of the next activities have matured significantly, mainly thanks to the use of Neural Networks (NNs) as a predictor. While their performance is difficult to beat in the general case, there are specific situations where background process knowledge can be helpful. Such knowledge can be leveraged for improving the quality of predictions for exceptional process executions or when the process changes due to a concept drift. In this paper, we present a Symbolic[Neuro] system that leverages background knowledge expressed in terms of a procedural process model to offset the under-sampling in the training data. More specifically, we make predictions using NNs with attention mechanism, an emerging technology in the NN field. The system has been tested on several real-life logs showing an improvement in the performance of the prediction task.

Title: Weighted Ensemble Models Are Strong Continual Learners. (arXiv:2312.08977v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08977
Code URL: null
Copy Paste: [[2312.08977]] Weighted Ensemble Models Are Strong Continual Learners(http://arxiv.org/abs/2312.08977)
Summary:
In this work, we study the problem of continual learning (CL) where the goal is to learn a model on a sequence of tasks, such that the data from the previous tasks becomes unavailable while learning on the current task data. CL is essentially a balancing act between being able to learn on the new task (i.e., plasticity) and maintaining the performance on the previously learned concepts (i.e., stability). With an aim to address the stability-plasticity trade-off, we propose to perform weight-ensembling of the model parameters of the previous and current task. This weight-ensembled model, which we call Continual Model Averaging (or CoMA), attains high accuracy on the current task by leveraging plasticity, while not deviating too far from the previous weight configuration, ensuring stability. We also propose an improved variant of CoMA, named Continual Fisher-weighted Model Averaging (or CoFiMA), that selectively weighs each parameter in the weight ensemble by leveraging the Fisher information of the weights of the model. Both the variants are conceptually simple, easy to implement, and effective in attaining state-of-the-art performance on several standard CL benchmarks.

Title: PROPRES: Investigating the Projectivity of Presupposition with Various Triggers and Environments. (arXiv:2312.08755v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.08755
Code URL: null
Copy Paste: [[2312.08755]] PROPRES: Investigating the Projectivity of Presupposition with Various Triggers and Environments(http://arxiv.org/abs/2312.08755)
Summary:
What makes a presupposition of an utterance -- information taken for granted by its speaker -- different from other pragmatic inferences such as an entailment is projectivity (e.g., the negative sentence the boy did not stop shedding tears presupposes the boy had shed tears before). The projectivity may vary depending on the combination of presupposition triggers and environments. However, prior natural language understanding studies fail to take it into account as they either use no human baseline or include only negation as an entailment-canceling environment to evaluate models' performance. The current study attempts to reconcile these issues. We introduce a new dataset, projectivity of presupposition (PROPRES, which includes 12k premise-hypothesis pairs crossing six triggers involving some lexical variety with five environments. Our human evaluation reveals that humans exhibit variable projectivity in some cases. However, the model evaluation shows that the best-performed model, DeBERTa, does not fully capture it. Our findings suggest that probing studies on pragmatic inferences should take extra care of the human judgment variability and the combination of linguistic items.

Title: Simplicial Representation Learning with Neural $k$-forms. (arXiv:2312.08515v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08515
Code URL: null
Copy Paste: [[2312.08515]] Simplicial Representation Learning with Neural $k$-forms(http://arxiv.org/abs/2312.08515)
Summary:
Geometric deep learning extends deep learning to incorporate information about the geometry and topology data, especially in complex domains like graphs. Despite the popularity of message passing in this field, it has limitations such as the need for graph rewiring, ambiguity in interpreting data, and over-smoothing. In this paper, we take a different approach, focusing on leveraging geometric information from simplicial complexes embedded in $\mathbb{R}^n$ using node coordinates. We use differential k-forms in \mathbb{R}^n to create representations of simplices, offering interpretability and geometric consistency without message passing. This approach also enables us to apply differential geometry tools and achieve universal approximation. Our method is efficient, versatile, and applicable to various input complexes, including graphs, simplicial complexes, and cell complexes. It outperforms existing message passing neural networks in harnessing information from geometrical graphs with node features serving as coordinates.

Title: Occupancy Detection Based on Electricity Consumption. (arXiv:2312.08535v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08535
Code URL: null
Copy Paste: [[2312.08535]] Occupancy Detection Based on Electricity Consumption(http://arxiv.org/abs/2312.08535)
Summary:
This article presents a new methodology for extracting intervals when a home is vacant from low-frequency electricity consumption data. The approach combines multiple algorithms, including change point detection, classification, period detection, and periodic spikes retrieval. It shows encouraging results on both simulated and real consumption curves. This approach offers practical insights for optimizing energy use and holds potential benefits for residential consumers and utility companies in terms of energy cost reduction and sustainability. Further research is needed to enhance its applicability in diverse settings and with larger datasets.

Title: Estimating calibration error under label shift without labels. (arXiv:2312.08586v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08586
Code URL: null
Copy Paste: [[2312.08586]] Estimating calibration error under label shift without labels(http://arxiv.org/abs/2312.08586)
Summary:
In the face of dataset shift, model calibration plays a pivotal role in ensuring the reliability of machine learning systems. Calibration error (CE) is an indicator of the alignment between the predicted probabilities and the classifier accuracy. While prior works have delved into the implications of dataset shift on calibration, existing CE estimators assume access to labels from the target domain, which are often unavailable in practice, i.e., when the model is deployed and used. This work addresses such challenging scenario, and proposes a novel CE estimator under label shift, which is characterized by changes in the marginal label distribution $p(Y)$, while keeping the conditional $p(X|Y)$ constant between the source and target distributions. Our contribution is an approach, which, by leveraging importance re-weighting of the labeled source distribution, provides consistent and asymptotically unbiased CE estimation with respect to the shifted target distribution. Empirical results across diverse real-world datasets, under various conditions and label-shift intensities, demonstrate the effectiveness and reliability of the proposed estimator.

Title: Automated detection of Zika and dengue in Aedes aegypti using neural spiking analysis. (arXiv:2312.08654v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08654
Code URL: null
Copy Paste: [[2312.08654]] Automated detection of Zika and dengue in Aedes aegypti using neural spiking analysis(http://arxiv.org/abs/2312.08654)
Summary:
Mosquito-borne diseases present considerable risks to the health of both animals and humans. Aedes aegypti mosquitoes are the primary vectors for numerous medically important viruses such as dengue, Zika, yellow fever, and chikungunya. To characterize this mosquito neural activity, it is essential to classify the generated electrical spikes. However, no open-source neural spike classification method is currently available for mosquitoes. Our work presented in this paper provides an innovative artificial intelligence-based method to classify the neural spikes in uninfected, dengue-infected, and Zika-infected mosquitoes. Aiming for outstanding performance, the method employs a fusion of normalization, feature importance, and dimension reduction for the preprocessing and combines convolutional neural network and extra gradient boosting (XGBoost) for classification. The method uses the electrical spiking activity data of mosquito neurons recorded by microelectrode array technology. We used data from 0, 1, 2, 3, and 7 days post-infection, containing over 15 million samples, to analyze the method's performance. The performance of the proposed method was evaluated using accuracy, precision, recall, and the F1 scores. The results obtained from the method highlight its remarkable performance in differentiating infected vs uninfected mosquito samples, achieving an average of 98.1%. The performance was also compared with 6 other machine learning algorithms to further assess the method's capability. The method outperformed all other machine learning algorithms' performance. Overall, this research serves as an efficient method to classify the neural spikes of Aedes aegypti mosquitoes and can assist in unraveling the complex interactions between pathogens and mosquitoes.

Title: Read Between the Layers: Leveraging Intra-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models. (arXiv:2312.08888v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.08888
Code URL: null
Copy Paste: [[2312.08888]] Read Between the Layers: Leveraging Intra-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models(http://arxiv.org/abs/2312.08888)
Summary:
We address the Continual Learning (CL) problem, where a model has to learn a sequence of tasks from non-stationary distributions while preserving prior knowledge as it encounters new experiences. With the advancement of foundation models, CL research has shifted focus from the initial learning-from-scratch paradigm to the use of generic features from large-scale pre-training. However, existing approaches to CL with pre-trained models only focus on separating the class-specific features from the final representation layer and neglect the power of intermediate representations that capture low- and mid-level features naturally more invariant to domain shifts. In this work, we propose LayUP, a new class-prototype-based approach to continual learning that leverages second-order feature statistics from multiple intermediate layers of a pre-trained network. Our method is conceptually simple, does not require any replay buffer, and works out of the box with any foundation model. LayUP improves over the state-of-the-art on four of the seven class-incremental learning settings at a considerably reduced memory and computational footprint compared with the next best baseline. Our results demonstrate that fully exhausting the representational capacities of pre-trained models in CL goes far beyond their final embeddings.

language model

Title: LDM$^2$: A Large Decision Model Imitating Human Cognition with Dynamic Memory Enhancement. (arXiv:2312.08402v1 [cs.LG])

Title: Contractive error feedback for gradient compression. (arXiv:2312.08538v1 [cs.LG])

Title: Learning adaptive planning representations with natural language guidance. (arXiv:2312.08566v1 [cs.AI])

Title: Multi-modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models. (arXiv:2312.08762v1 [cs.AI])

Title: Evaluating Large Language Models for Health-related Queries with Presuppositions. (arXiv:2312.08800v1 [cs.CL])

Title: Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent. (arXiv:2312.08926v1 [cs.AI])

Title: LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers. (arXiv:2312.08958v1 [cs.LG])

Title: Unbiased organism-agnostic and highly sensitive signal peptide predictor with deep protein language model. (arXiv:2312.08987v1 [cs.AI])

Title: Identifying Planetary Names in Astronomy Papers: A Multi-Step Approach. (arXiv:2312.08579v1 [cs.CL])

Title: Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention. (arXiv:2312.08618v1 [cs.CL])

Title: Dissecting vocabulary biases datasets through statistical testing and automated data augmentation for artifact mitigation in Natural Language Inference. (arXiv:2312.08747v1 [cs.CL])

Title: Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning. (arXiv:2312.08900v1 [cs.LG])

gpt

Title: Heterogeneous Graph Neural Architecture Search with GPT-4. (arXiv:2312.08680v1 [cs.AI])

Title: Detecting value-expressive text posts in Russian social media. (arXiv:2312.08968v1 [cs.CL])

Title: Beyond Accuracy: Automated De-Identification of Large Real-World Clinical Text Datasets. (arXiv:2312.08495v1 [cs.CL])

llm

Title: Beyond English: Evaluating LLMs for Arabic Grammatical Error Correction. (arXiv:2312.08400v1 [cs.CL])

Title: ChatSOS: LLM-based knowledge Q&A system for safety engineering. (arXiv:2312.08629v1 [cs.AI])

Title: TigerBot: An Open Multilingual Multitask LLM. (arXiv:2312.08688v1 [cs.CL])

Title: Rational Sensibility: LLM Enhanced Empathetic Response Generation Guided by Self-presentation Theory. (arXiv:2312.08702v1 [cs.AI])

Title: Forbidden Facts: An Investigation of Competing Objectives in Llama-2. (arXiv:2312.08793v1 [cs.LG])

Title: Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning. (arXiv:2312.08901v1 [cs.CL])

Title: Math-Shepherd: A Label-Free Step-by-Step Verifier for LLMs in Mathematical Reasoning. (arXiv:2312.08935v1 [cs.AI])

Title: ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks. (arXiv:2312.08583v1 [cs.CL])

Title: A Comparative Analysis of Fine-Tuned LLMs and Few-Shot Learning of LLMs for Financial Sentiment Analysis. (arXiv:2312.08725v1 [cs.LG])

long context

lora

Title: Artificial Intelligence and Human Geography. (arXiv:2312.08827v1 [cs.AI])

Title: Fair Active Learning in Low-Data Regimes. (arXiv:2312.08559v1 [cs.LG])

Title: A Cyber-Physical Architecture for Microgrids based on Deep learning and LORA Technology. (arXiv:2312.08818v1 [cs.LG])

hallucination

prompt

Title: Metacognition-Enhanced Few-Shot Prompting With Positive Reinforcement. (arXiv:2312.08642v1 [cs.CL])

Title: Labels Need Prompts Too Mask Matching for Natural Language Understanding Tasks. (arXiv:2312.08726v1 [cs.CL])

Title: MotherNet: A Foundational Hypernetwork for Tabular Classification. (arXiv:2312.08598v1 [cs.LG])

code

Title: ALGNet: Attention Light Graph Memory Network for Medical Recommendation System. (arXiv:2312.08377v1 [cs.AI])

Title: Earthfarseer: Versatile Spatio-Temporal Dynamical Systems Modeling in One Model. (arXiv:2312.08403v1 [cs.AI])

Title: Harmonics of Learning: Universal Fourier Features Emerge in Invariant Networks. (arXiv:2312.08550v1 [cs.LG])

Title: CAT: A Causally Graph Attention Network for Trimming Heterophilic Graph. (arXiv:2312.08672v1 [cs.LG])

Title: Gradient Informed Proximal Policy Optimization. (arXiv:2312.08710v1 [cs.LG])

Title: Automated Process Planning Based on a Semantic Capability Model and SMT. (arXiv:2312.08801v1 [cs.AI])

Title: JPIS: A Joint Model for Profile-based Intent Detection and Slot Filling with Slot-to-Intent Attention. (arXiv:2312.08737v1 [cs.CL])

Title: Accelerating Meta-Learning by Sharing Gradients. (arXiv:2312.08398v1 [cs.LG])

Title: ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance. (arXiv:2312.08852v1 [cs.LG])

Title: Global Rewards in Multi-Agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems. (arXiv:2312.08884v1 [cs.LG])

Title: BiPFT: Binary Pre-trained Foundation Transformer with Low-rank Estimation of Binarization Residual Polynomials. (arXiv:2312.08937v1 [cs.LG])

Title: EAT: Towards Long-Tailed Out-of-Distribution Detection. (arXiv:2312.08939v1 [cs.LG])

Title: Uncertainty in GNN Learning Evaluations: A Comparison Between Measures for Quantifying Randomness in GNN Community Detection. (arXiv:2312.09015v1 [cs.LG])

chat

retrieval augmented generation

rag

Title: Personalized Decision Supports based on Theory of Mind Modeling and Explainable Reinforcement Learning. (arXiv:2312.08397v1 [cs.LG])

Title: How much can change in a year? Revisiting Evaluation in Multi-Agent Reinforcement Learning. (arXiv:2312.08463v1 [cs.AI])

Title: On Diagnostics for Understanding Agent Training Behaviour in Cooperative MARL. (arXiv:2312.08468v1 [cs.AI])

Title: Revisiting Recommendation Loss Functions through Contrastive Learning (Technical Report). (arXiv:2312.08520v1 [cs.AI])

Title: World Models via Policy-Guided Trajectory Diffusion. (arXiv:2312.08533v1 [cs.LG])

Title: Adaptive Shortcut Debiasing for Online Continual Learning. (arXiv:2312.08677v1 [cs.LG])

Title: Learning Safety Constraints From Demonstration Using One-Class Decision Trees. (arXiv:2312.08837v1 [cs.LG])

Title: Diffusion-C: Unveiling the Generative Challenges of Diffusion Models through Corrupted Data. (arXiv:2312.08843v1 [cs.LG])

Title: Knowledge-Driven Modulation of Neural Networks with Attention Mechanism for Next Activity Prediction. (arXiv:2312.08847v1 [cs.AI])

Title: Weighted Ensemble Models Are Strong Continual Learners. (arXiv:2312.08977v1 [cs.LG])

Title: PROPRES: Investigating the Projectivity of Presupposition with Various Triggers and Environments. (arXiv:2312.08755v1 [cs.CL])

Title: Simplicial Representation Learning with Neural $k$-forms. (arXiv:2312.08515v1 [cs.LG])

Title: Occupancy Detection Based on Electricity Consumption. (arXiv:2312.08535v1 [cs.LG])

Title: Estimating calibration error under label shift without labels. (arXiv:2312.08586v1 [cs.LG])

Title: Automated detection of Zika and dengue in Aedes aegypti using neural spiking analysis. (arXiv:2312.08654v1 [cs.LG])

Title: Read Between the Layers: Leveraging Intra-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models. (arXiv:2312.08888v1 [cs.LG])

multi-run

chain-of-thought

tree-of-thought