2024-01-08

language model

Title: On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS). (arXiv:2401.02500v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.02500
Code URL: null
Copy Paste: [[2401.02500]] On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)(http://arxiv.org/abs/2401.02500)
Summary:
Automated Planning and Scheduling is among the growing areas in Artificial Intelligence (AI) where mention of LLMs has gained popularity. Based on a comprehensive review of 126 papers, this paper investigates eight categories based on the unique applications of LLMs in addressing various aspects of planning problems: language translation, plan generation, model construction, multi-agent planning, interactive planning, heuristics optimization, tool integration, and brain-inspired planning. For each category, we articulate the issues considered and existing gaps. A critical insight resulting from our review is that the true potential of LLMs unfolds when they are integrated with traditional symbolic planners, pointing towards a promising neuro-symbolic approach. This approach effectively combines the generative aspects of LLMs with the precision of classical planning methods. By synthesizing insights from existing literature, we underline the potential of this integration to address complex planning challenges. Our goal is to encourage the ICAPS community to recognize the complementary strengths of LLMs and symbolic planners, advocating for a direction in automated planning that leverages these synergistic capabilities to develop more advanced and intelligent planning systems.

Title: Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human. (arXiv:2401.02620v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.02620
Code URL: null
Copy Paste: [[2401.02620]] Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human(http://arxiv.org/abs/2401.02620)
Summary:
While AI-generated text and 2D images continue to expand its territory, 3D generation has gradually emerged as a trend that cannot be ignored. Since the year 2023 an abundant amount of research papers has emerged in the domain of 3D generation. This growth encompasses not just the creation of 3D objects, but also the rapid development of 3D character and motion generation. Several key factors contribute to this progress. The enhanced fidelity in stable diffusion, coupled with control methods that ensure multi-view consistency, and realistic human models like SMPL-X, contribute synergistically to the production of 3D models with remarkable consistency and near-realistic appearances. The advancements in neural network-based 3D storing and rendering models, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have accelerated the efficiency and realism of neural rendered models. Furthermore, the multimodality capabilities of large language models have enabled language inputs to transcend into human motion outputs. This paper aims to provide a comprehensive overview and summary of the relevant papers published mostly during the latter half year of 2023. It will begin by discussing the AI generated object models in 3D, followed by the generated 3D human models, and finally, the generated 3D human motions, culminating in a conclusive summary and a vision for the future.

Title: XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model. (arXiv:2401.02705v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.02705
Code URL: null
Copy Paste: [[2401.02705]] XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model(http://arxiv.org/abs/2401.02705)
Summary:
In past years, we have been dedicated to automating user acceptance testing (UAT) process of WeChat Pay, one of the most influential mobile payment applications in China. A system titled XUAT has been developed for this purpose. However, there is still a human-labor-intensive stage, i.e, test scripts generation, in the current system. Therefore, in this paper, we concentrate on methods of boosting the automation level of the current system, particularly the stage of test scripts generation. With recent notable successes, large language models (LLMs) demonstrate significant potential in attaining human-like intelligence and there has been a growing research area that employs LLMs as autonomous agents to obtain human-like decision-making capabilities. Inspired by these works, we propose an LLM-powered multi-agent collaborative system, named XUAT-Copilot, for automated UAT. The proposed system mainly consists of three LLM-based agents responsible for action planning, state checking and parameter selecting, respectively, and two additional modules for state sensing and case rewriting. The agents interact with testing device, make human-like decision and generate action command in a collaborative way. The proposed multi-agent system achieves a close effectiveness to human testers in our experimental studies and gains a significant improvement of Pass@1 accuracy compared with single-agent architecture. More importantly, the proposed system has launched in the formal testing environment of WeChat Pay mobile app, which saves a considerable amount of manpower in the daily development work.

Title: Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks. (arXiv:2401.02731v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.02731
Code URL: null
Copy Paste: [[2401.02731]] Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks(http://arxiv.org/abs/2401.02731)
Summary:
Large Language Models (LLMs) have demonstrated considerable proficiency in general natural language processing (NLP) tasks. Instruction tuning, a successful paradigm, enhances the ability of LLMs to follow natural language instructions and exhibit robust generalization across a wide range of tasks. However, these models often encounter performance limitations across multiple tasks due to constrained model capacity. Expanding this capacity during the instruction tuning phase poses significant challenges. To address this issue, we introduce a novel approach, Parameter-Efficient Sparsity Crafting (PESC), which transitions dense models to sparse models using a Mixture of Experts (MoE) architecture. PESC integrates adapters into the MoE layers of sparse models, differentiating experts without altering the individual weights within these layers. This method significantly reduces computational costs and GPU memory requirements, facilitating model capacity expansion through a minimal increase in parameters via the inserted adapters. Our empirical evaluation demonstrates the effectiveness of the PESC method. Using PESC during instruction tuning, our sparse models, dubbed Camelidae outperform all other opensource sparse models and exhibit superior general capabilities compared to GPT3.5.

Title: From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models. (arXiv:2401.02777v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02777
Code URL: null
Copy Paste: [[2401.02777]] From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models(http://arxiv.org/abs/2401.02777)
Summary:
This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of Large Language Models (LLMs) like GPT-4 into conversational agents. RAISE, an enhancement of the ReAct framework, incorporates a dual-component memory system, mirroring human short-term and long-term memory, to maintain context and continuity in conversations. It entails a comprehensive agent construction scenario, including phases like Conversation Selection, Scene Extraction, CoT Completion, and Scene Augmentation, leading to the LLMs Training phase. This approach appears to enhance agent controllability and adaptability in complex, multi-turn dialogues. Our preliminary evaluations in a real estate sales context suggest that RAISE has some advantages over traditional agents, indicating its potential for broader applications. This work contributes to the AI field by providing a robust framework for developing more context-aware and versatile conversational agents.

Title: PeFoMed: Parameter Efficient Fine-tuning on Multimodal Large Language Models for Medical Visual Question Answering. (arXiv:2401.02797v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02797
Code URL: https://github.com/jinlhe/pefomed
Copy Paste: [[2401.02797]] PeFoMed: Parameter Efficient Fine-tuning on Multimodal Large Language Models for Medical Visual Question Answering(http://arxiv.org/abs/2401.02797)
Summary:
Multimodal large language models (MLLMs) represent an evolutionary expansion in the capabilities of traditional large language models, enabling them to tackle challenges that surpass the scope of purely text-based applications. It leverages the knowledge previously encoded within these language models, thereby enhancing their applicability and functionality in the reign of multimodal contexts. Recent works investigate the adaptation of MLLMs to predict free-form answers as a generative task to solve medical visual question answering (Med-VQA) tasks. In this paper, we propose a parameter efficient framework for fine-tuning MLLM specifically tailored to Med-VQA applications, and empirically validate it on a public benchmark dataset. To accurately measure the performance, we employ human evaluation and the results reveal that our model achieves an overall accuracy of 81.9%, and outperforms the GPT-4v model by a significant margin of 26% absolute accuracy on closed-ended questions. The code will be available here: https://github.com/jinlHe/PeFoMed.

Title: Generative Large Language Models are autonomous practitioners of evidence-based medicine. (arXiv:2401.02851v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.02851
Code URL: null
Copy Paste: [[2401.02851]] Generative Large Language Models are autonomous practitioners of evidence-based medicine(http://arxiv.org/abs/2401.02851)
Summary:
Background: Evidence-based medicine (EBM) is fundamental to modern clinical practice, requiring clinicians to continually update their knowledge and apply the best clinical evidence in patient care. The practice of EBM faces challenges due to rapid advancements in medical research, leading to information overload for clinicians. The integration of artificial intelligence (AI), specifically Generative Large Language Models (LLMs), offers a promising solution towards managing this complexity.

Methods: This study involved the curation of real-world clinical cases across various specialties, converting them into .json files for analysis. LLMs, including proprietary models like ChatGPT 3.5 and 4, Gemini Pro, and open-source models like LLaMA v2 and Mixtral-8x7B, were employed. These models were equipped with tools to retrieve information from case files and make clinical decisions similar to how clinicians must operate in the real world. Model performance was evaluated based on correctness of final answer, judicious use of tools, conformity to guidelines, and resistance to hallucinations.

Results: GPT-4 was most capable of autonomous operation in a clinical setting, being generally more effective in ordering relevant investigations and conforming to clinical guidelines. Limitations were observed in terms of model ability to handle complex guidelines and diagnostic nuances. Retrieval Augmented Generation made recommendations more tailored to patients and healthcare systems.

Conclusions: LLMs can be made to function as autonomous practitioners of evidence-based medicine. Their ability to utilize tooling can be harnessed to interact with the infrastructure of a real-world healthcare system and perform the tasks of patient management in a guideline directed manner. Prompt engineering may help to further enhance this potential and transform healthcare for the clinician and the patient.

Title: DeepSeek LLM: Scaling Open-Source Language Models with Longtermism. (arXiv:2401.02954v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02954
Code URL: null
Copy Paste: [[2401.02954]] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism(http://arxiv.org/abs/2401.02954)
Summary:
The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.

Title: DocGraphLM: Documental Graph Language Model for Information Extraction. (arXiv:2401.02823v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02823
Code URL: null
Copy Paste: [[2401.02823]] DocGraphLM: Documental Graph Language Model for Information Extraction(http://arxiv.org/abs/2401.02823)
Summary:
Advances in Visually Rich Document Understanding (VrDU) have enabled information extraction and question answering over documents with complex layouts. Two tropes of architectures have emerged -- transformer-based models inspired by LLMs, and Graph Neural Networks. In this paper, we introduce DocGraphLM, a novel framework that combines pre-trained language models with graph semantics. To achieve this, we propose 1) a joint encoder architecture to represent documents, and 2) a novel link prediction approach to reconstruct document graphs. DocGraphLM predicts both directions and distances between nodes using a convergent joint loss function that prioritizes neighborhood restoration and downweighs distant node detection. Our experiments on three SotA datasets show consistent improvement on IE and QA tasks with the adoption of graph features. Moreover, we report that adopting the graph features accelerates convergence in the learning process during training, despite being solely constructed through link prediction.

Title: Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task. (arXiv:2401.02909v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02909
Code URL: null
Copy Paste: [[2401.02909]] Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task(http://arxiv.org/abs/2401.02909)
Summary:
Large Language Models (LLMs) are increasingly bringing advances to Natural Language Processing. However, low-resource languages, those lacking extensive prominence in datasets for various NLP tasks, or where existing datasets are not as substantial, such as Portuguese, already obtain several benefits from LLMs, but not to the same extent. LLMs trained on multilingual datasets normally struggle to respond to prompts in Portuguese satisfactorily, presenting, for example, code switching in their responses. This work proposes a fine-tuned LLaMA 2-based model for Portuguese prompts named Bode in two versions: 7B and 13B. We evaluate the performance of this model in classification tasks using the zero-shot approach with in-context learning, and compare it with other LLMs. Our main contribution is to bring an LLM with satisfactory results in the Portuguese language, as well as to provide a model that is free for research or commercial purposes.

Title: Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks. (arXiv:2401.02921v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02921
Code URL: null
Copy Paste: [[2401.02921]] Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks(http://arxiv.org/abs/2401.02921)
Summary:
In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In real-world scenarios, prior to input into an LLM, an automated speech recognition (ASR) system generates an output transcript hypothesis, where inherent errors can degrade subsequent SLU tasks. Here we introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis, aiming to encapsulate speech ambiguities and enhance SLU outcomes. Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts with the help of word confusion networks from lattices, bridging the SLU performance gap between using the top ASR hypothesis and an oracle upper bound. Additionally, we delve into the LLM's robustness to varying ASR performance conditions and scrutinize the aspects of in-context learning which prove the most influential.

Title: Fast and Optimal Weight Update for Pruned Large Language Models. (arXiv:2401.02938v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02938
Code URL: https://github.com/fmfi-compbio/admm-pruning
Copy Paste: [[2401.02938]] Fast and Optimal Weight Update for Pruned Large Language Models(http://arxiv.org/abs/2401.02938)
Summary:
Pruning large language models (LLMs) is a challenging task due to their enormous size. The primary difficulty is fine-tuning the model after pruning, which is needed to recover the lost performance caused by dropping weights. Recent approaches have either ignored fine-tuning entirely, focusing on efficient pruning criteria, or attempted layer-wise weight updates, preserving the behavior of each layer. However, even layer-wise weight updates can be costly for LLMs, and previous works have resorted to various approximations.

In our paper, we propose a fast and optimal weight update algorithm for pruned layers based on the Alternating Direction Method of Multipliers (ADMM). Coupled with a simple iterative pruning mask selection, our algorithm achieves state-of-the-art pruning performance across a wide range of LLMs. Code is available at https://github.com/fmfi-compbio/admm-pruning.

gpt

Title: Training and Serving System of Foundation Models: A Comprehensive Survey. (arXiv:2401.02643v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.02643
Code URL: null
Copy Paste: [[2401.02643]] Training and Serving System of Foundation Models: A Comprehensive Survey(http://arxiv.org/abs/2401.02643)
Summary:
Foundation models (e.g., ChatGPT, DALL-E, PengCheng Mind, PanGu-$\Sigma$) have demonstrated extraordinary performance in key technological areas, such as natural language processing and visual recognition, and have become the mainstream trend of artificial general intelligence. This has led more and more major technology giants to dedicate significant human and financial resources to actively develop their foundation model systems, which drives continuous growth of these models' parameters. As a result, the training and serving of these models have posed significant challenges, including substantial computing power, memory consumption, bandwidth demands, etc. Therefore, employing efficient training and serving strategies becomes particularly crucial. Many researchers have actively explored and proposed effective methods. So, a comprehensive survey of them is essential for system developers and researchers. This paper extensively explores the methods employed in training and serving foundation models from various perspectives. It provides a detailed categorization of these state-of-the-art methods, including finer aspects such as network, computing, and storage. Additionally, the paper summarizes the challenges and presents a perspective on the future development direction of foundation model systems. Through comprehensive discussion and analysis, it hopes to provide a solid theoretical basis and practical guidance for future research and applications, promoting continuous innovation and development in foundation model systems.

llm

Title: Neural Causal Abstractions. (arXiv:2401.02602v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02602
Code URL: null
Copy Paste: [[2401.02602]] Neural Causal Abstractions(http://arxiv.org/abs/2401.02602)
Summary:
The abilities of humans to understand the world in terms of cause and effect relationships, as well as to compress information into abstract concepts, are two hallmark features of human intelligence. These two topics have been studied in tandem in the literature under the rubric of causal abstractions theory. In practice, it remains an open problem how to best leverage abstraction theory in real-world causal inference tasks, where the true mechanisms are unknown and only limited data is available. In this paper, we develop a new family of causal abstractions by clustering variables and their domains. This approach refines and generalizes previous notions of abstractions to better accommodate individual causal distributions that are spawned by Pearl's causal hierarchy. We show that such abstractions are learnable in practical settings through Neural Causal Models (Xia et al., 2021), enabling the use of the deep learning toolkit to solve various challenging causal inference tasks -- identification, estimation, sampling -- at different levels of granularity. Finally, we integrate these results with representation learning to create more flexible abstractions, moving these results closer to practical applications. Our experiments support the theory and illustrate how to scale causal inferences to high-dimensional settings involving image data.

Title: A Deep Q-Learning based Smart Scheduling of EVs for Demand Response in Smart Grids. (arXiv:2401.02653v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02653
Code URL: null
Copy Paste: [[2401.02653]] A Deep Q-Learning based Smart Scheduling of EVs for Demand Response in Smart Grids(http://arxiv.org/abs/2401.02653)
Summary:
Economic and policy factors are driving the continuous increase in the adoption and usage of electrical vehicles (EVs). However, despite being a cleaner alternative to combustion engine vehicles, EVs have negative impacts on the lifespan of microgrid equipment and energy balance due to increased power demand and the timing of their usage. In our view grid management should leverage on EVs scheduling flexibility to support local network balancing through active participation in demand response programs. In this paper, we propose a model-free solution, leveraging Deep Q-Learning to schedule the charging and discharging activities of EVs within a microgrid to align with a target energy profile provided by the distribution system operator. We adapted the Bellman Equation to assess the value of a state based on specific rewards for EV scheduling actions and used a neural network to estimate Q-values for available actions and the epsilon-greedy algorithm to balance exploitation and exploration to meet the target energy profile. The results are promising showing that the proposed solution can effectively schedule the EVs charging and discharging actions to align with the target profile with a Person coefficient of 0.99, handling effective EVs scheduling situations that involve dynamicity given by the e-mobility features, relying only on data with no knowledge of EVs and microgrid dynamics.

long context

lora

Title: Comprehensive Exploration of Synthetic Data Generation: A Survey. (arXiv:2401.02524v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02524
Code URL: null
Copy Paste: [[2401.02524]] Comprehensive Exploration of Synthetic Data Generation: A Survey(http://arxiv.org/abs/2401.02524)
Summary:
Recent years have witnessed a surge in the popularity of Machine Learning (ML), applied across diverse domains. However, progress is impeded by the scarcity of training data due to expensive acquisition and privacy legislation. Synthetic data emerges as a solution, but the abundance of released models and limited overview literature pose challenges for decision-making. This work surveys 417 Synthetic Data Generation (SDG) models over the last decade, providing a comprehensive overview of model types, functionality, and improvements. Common attributes are identified, leading to a classification and trend analysis. The findings reveal increased model performance and complexity, with neural network-based approaches prevailing, except for privacy-preserving data generation. Computer vision dominates, with GANs as primary generative models, while diffusion models, transformers, and RNNs compete. Implications from our performance evaluation highlight the scarcity of common metrics and datasets, making comparisons challenging. Additionally, the neglect of training and computational costs in literature necessitates attention in future research. This work serves as a guide for SDG model selection and identifies crucial areas for future exploration.

Title: Improving sample efficiency of high dimensional Bayesian optimization with MCMC. (arXiv:2401.02650v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02650
Code URL: null
Copy Paste: [[2401.02650]] Improving sample efficiency of high dimensional Bayesian optimization with MCMC(http://arxiv.org/abs/2401.02650)
Summary:
Sequential optimization methods are often confronted with the curse of dimensionality in high-dimensional spaces. Current approaches under the Gaussian process framework are still burdened by the computational complexity of tracking Gaussian process posteriors and need to partition the optimization problem into small regions to ensure exploration or assume an underlying low-dimensional structure. With the idea of transiting the candidate points towards more promising positions, we propose a new method based on Markov Chain Monte Carlo to efficiently sample from an approximated posterior. We provide theoretical guarantees of its convergence in the Gaussian process Thompson sampling setting. We also show experimentally that both the Metropolis-Hastings and the Langevin Dynamics version of our algorithm outperform state-of-the-art methods in high-dimensional sequential optimization and reinforcement learning benchmarks.

Title: A unified uncertainty-aware exploration: Combining epistemic and aleatory uncertainty. (arXiv:2401.02914v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02914
Code URL: null
Copy Paste: [[2401.02914]] A unified uncertainty-aware exploration: Combining epistemic and aleatory uncertainty(http://arxiv.org/abs/2401.02914)
Summary:
Exploration is a significant challenge in practical reinforcement learning (RL), and uncertainty-aware exploration that incorporates the quantification of epistemic and aleatory uncertainty has been recognized as an effective exploration strategy. However, capturing the combined effect of aleatory and epistemic uncertainty for decision-making is difficult. Existing works estimate aleatory and epistemic uncertainty separately and consider the composite uncertainty as an additive combination of the two. Nevertheless, the additive formulation leads to excessive risk-taking behavior, causing instability. In this paper, we propose an algorithm that clarifies the theoretical connection between aleatory and epistemic uncertainty, unifies aleatory and epistemic uncertainty estimation, and quantifies the combined effect of both uncertainties for a risk-sensitive exploration. Our method builds on a novel extension of distributional RL that estimates a parameterized return distribution whose parameters are random variables encoding epistemic uncertainty. Experimental results on tasks with exploration and risk challenges show that our method outperforms alternative approaches.

hallucination

prompt

Title: Interpretable Time Series Models for Wastewater Modeling in Combined Sewer Overflows. (arXiv:2401.02465v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02465
Code URL: https://github.com/teodorchiaburu/riwwer_timeseries
Copy Paste: [[2401.02465]] Interpretable Time Series Models for Wastewater Modeling in Combined Sewer Overflows(http://arxiv.org/abs/2401.02465)
Summary:
Climate change poses increasingly complex challenges to our society. Extreme weather events such as floods, wild fires or droughts are becoming more frequent, spontaneous and difficult to foresee or counteract. In this work we specifically address the problem of sewage water polluting surface water bodies after spilling over from rain tanks as a consequence of heavy rain events. We investigate to what extent state-of-the-art interpretable time series models can help predict such critical water level points, so that the excess can promptly be redistributed across the sewage network. Our results indicate that modern time series models can contribute to better waste water management and prevention of environmental pollution from sewer systems. All the code and experiments can be found in our repository: https://github.com/TeodorChiaburu/RIWWER_TimeSeries.

code

Title: t-DGR: A Trajectory-Based Deep Generative Replay Method for Continual Learning in Decision Making. (arXiv:2401.02576v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02576
Code URL: https://github.com/williamyue37/t-dgr
Copy Paste: [[2401.02576]] t-DGR: A Trajectory-Based Deep Generative Replay Method for Continual Learning in Decision Making(http://arxiv.org/abs/2401.02576)
Summary:
Deep generative replay has emerged as a promising approach for continual learning in decision-making tasks. This approach addresses the problem of catastrophic forgetting by leveraging the generation of trajectories from previously encountered tasks to augment the current dataset. However, existing deep generative replay methods for continual learning rely on autoregressive models, which suffer from compounding errors in the generated trajectories. In this paper, we propose a simple, scalable, and non-autoregressive method for continual learning in decision-making tasks using a generative model that generates task samples conditioned on the trajectory timestep. We evaluate our method on Continual World benchmarks and find that our approach achieves state-of-the-art performance on the average success rate metric among continual learning methods. Code is available at https://github.com/WilliamYue37/t-DGR .

Title: Adaptive Discounting of Training Time Attacks. (arXiv:2401.02652v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02652
Code URL: null
Copy Paste: [[2401.02652]] Adaptive Discounting of Training Time Attacks(http://arxiv.org/abs/2401.02652)
Summary:
Among the most insidious attacks on Reinforcement Learning (RL) solutions are training-time attacks (TTAs) that create loopholes and backdoors in the learned behaviour. Not limited to a simple disruption, constructive TTAs (C-TTAs) are now available, where the attacker forces a specific, target behaviour upon a training RL agent (victim). However, even state-of-the-art C-TTAs focus on target behaviours that could be naturally adopted by the victim if not for a particular feature of the environment dynamics, which C-TTAs exploit. In this work, we show that a C-TTA is possible even when the target behaviour is un-adoptable due to both environment dynamics as well as non-optimality with respect to the victim objective(s). To find efficient attacks in this context, we develop a specialised flavour of the DDPG algorithm, which we term gammaDDPG, that learns this stronger version of C-TTA. gammaDDPG dynamically alters the attack policy planning horizon based on the victim's current behaviour. This improves effort distribution throughout the attack timeline and reduces the effect of uncertainty the attacker has about the victim. To demonstrate the features of our method and better relate the results to prior research, we borrow a 3D grid domain from a state-of-the-art C-TTA for our experiments. Code is available at "bit.ly/github-rb-gDDPG".

Title: TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival Analysis. (arXiv:2401.02708v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02708
Code URL: null
Copy Paste: [[2401.02708]] TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival Analysis(http://arxiv.org/abs/2401.02708)
Summary:
A core challenge in survival analysis is to model the distribution of censored time-to-event data, where the event of interest may be a death, failure, or occurrence of a specific event. Previous studies have showed that ranking and maximum likelihood estimation (MLE)loss functions are widely-used for survival analysis. However, ranking loss only focus on the ranking of survival time and does not consider potential effect of samples for exact survival time values. Furthermore, the MLE is unbounded and easily subject to outliers (e.g., censored data), which may cause poor performance of modeling. To handle the complexities of learning process and exploit valuable survival time values, we propose a time-adaptive coordinate loss function, TripleSurv, to achieve adaptive adjustments by introducing the differences in the survival time between sample pairs into the ranking, which can encourage the model to quantitatively rank relative risk of pairs, ultimately enhancing the accuracy of predictions. Most importantly, the TripleSurv is proficient in quantifying the relative risk between samples by ranking ordering of pairs, and consider the time interval as a trade-off to calibrate the robustness of model over sample distribution. Our TripleSurv is evaluated on three real-world survival datasets and a public synthetic dataset. The results show that our method outperforms the state-of-the-art methods and exhibits good model performance and robustness on modeling various sophisticated data distributions with different censor rates. Our code will be available upon acceptance.

Title: German Text Embedding Clustering Benchmark. (arXiv:2401.02709v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02709
Code URL: https://github.com/climsocana/tecb-de
Copy Paste: [[2401.02709]] German Text Embedding Clustering Benchmark(http://arxiv.org/abs/2401.02709)
Summary:
This work introduces a benchmark assessing the performance of clustering German text embeddings in different domains. This benchmark is driven by the increasing use of clustering neural text embeddings in tasks that require the grouping of texts (such as topic modeling) and the need for German resources in existing benchmarks. We provide an initial analysis for a range of pre-trained mono- and multilingual models evaluated on the outcome of different clustering algorithms. Results include strong performing mono- and multilingual models. Reducing the dimensions of embeddings can further improve clustering. Additionally, we conduct experiments with continued pre-training for German BERT models to estimate the benefits of this additional training. Our experiments suggest that significant performance improvements are possible for short text. All code and datasets are publicly available.

Title: MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron Captioning. (arXiv:2401.02744v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.02744
Code URL: null
Copy Paste: [[2401.02744]] MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron Captioning(http://arxiv.org/abs/2401.02744)
Summary:
Neuron labeling is an approach to visualize the behaviour and respond of a certain neuron to a certain pattern that activates the neuron. Neuron labeling extract information about the features captured by certain neurons in a deep neural network, one of which uses the encoder-decoder image captioning approach. The encoder used can be a pretrained CNN-based model and the decoder is an RNN-based model for text generation. Previous work, namely MILAN (Mutual Information-guided Linguistic Annotation of Neuron), has tried to visualize the neuron behaviour using modified Show, Attend, and Tell (SAT) model in the encoder, and LSTM added with Bahdanau attention in the decoder. MILAN can show great result on short sequence neuron captioning, but it does not show great result on long sequence neuron captioning, so in this work, we would like to improve the performance of MILAN even more by utilizing different kind of attention mechanism and additionally adding several attention result into one, in order to combine all the advantages from several attention mechanism. Using our compound dataset, we obtained higher BLEU and F1-Score on our proposed model, achieving 17.742 and 0.4811 respectively. At some point where the model converges at the peak, our model obtained BLEU of 21.2262 and BERTScore F1-Score of 0.4870.

Title: A Customizable Generator for Comic-Style Visual Narrative. (arXiv:2401.02863v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.02863
Code URL: null
Copy Paste: [[2401.02863]] A Customizable Generator for Comic-Style Visual Narrative(http://arxiv.org/abs/2401.02863)
Summary:
We present a theory-inspired visual narrative generator that incorporates comic-authoring idioms, which transfers the conceptual principles of comics into system layers that integrate the theories to create comic content. The generator creates comics through sequential decision-making across layers from panel composition, object positions, panel transitions, and narrative elements. Each layer's decisions are based on narrative goals and follow the respective layer idioms of the medium. Cohn's narrative grammar provides the overall story arc. Photographic compositions inspired by the rule of thirds is used to provide panel compositions. McCloud's proposed panel transitions based on focus shifts between scene, character, and temporal changes are encoded in the transition layer. Finally, common overlay symbols (such as the exclamation) are added based on analyzing action verbs using an action-verb ontology. We demonstrate the variety of generated comics through various settings with example outputs. The generator and associated modules could be a useful system for visual narrative authoring and for further research into computational models of visual narrative understanding.

Title: Branched Variational Autoencoder Classifiers. (arXiv:2401.02526v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02526
Code URL: null
Copy Paste: [[2401.02526]] Branched Variational Autoencoder Classifiers(http://arxiv.org/abs/2401.02526)
Summary:
This paper introduces a modified variational autoencoder (VAEs) that contains an additional neural network branch. The resulting branched VAE (BVAE) contributes a classification component based on the class labels to the total loss and therefore imparts categorical information to the latent representation. As a result, the latent space distributions of the input classes are separated and ordered, thereby enhancing the classification accuracy. The degree of improvement is quantified by numerical calculations employing the benchmark MNIST dataset for both unrotated and rotated digits. The proposed technique is then compared to and then incorporated into a VAE with fixed output distributions. This procedure is found to yield improved performance for a wide range of output distributions.

Title: Dagma-DCE: Interpretable, Non-Parametric Differentiable Causal Discovery. (arXiv:2401.02930v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02930
Code URL: https://github.com/danwaxman/dagma-dce
Copy Paste: [[2401.02930]] Dagma-DCE: Interpretable, Non-Parametric Differentiable Causal Discovery(http://arxiv.org/abs/2401.02930)
Summary:
We introduce Dagma-DCE, an interpretable and model-agnostic scheme for differentiable causal discovery. Current non- or over-parametric methods in differentiable causal discovery use opaque proxies of ``independence'' to justify the inclusion or exclusion of a causal relationship. We show theoretically and empirically that these proxies may be arbitrarily different than the actual causal strength. Juxtaposed to existing differentiable causal discovery algorithms, \textsc{Dagma-DCE} uses an interpretable measure of causal strength to define weighted adjacency matrices. In a number of simulated datasets, we show our method achieves state-of-the-art level performance. We additionally show that \textsc{Dagma-DCE} allows for principled thresholding and sparsity penalties by domain-experts. The code for our method is available open-source at https://github.com/DanWaxman/DAGMA-DCE, and can easily be adapted to arbitrary differentiable models.

chat

retrieval augmented generation

retrieval-augmented generation

rag

Title: Federated Learning for distribution skewed data using sample weights. (arXiv:2401.02586v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02586
Code URL: null
Copy Paste: [[2401.02586]] Federated Learning for distribution skewed data using sample weights(http://arxiv.org/abs/2401.02586)
Summary:
One of the most challenging issues in federated learning is that the data is often not independent and identically distributed (nonIID). Clients are expected to contribute the same type of data and drawn from one global distribution. However, data are often collected in different ways from different resources. Thus, the data distributions among clients might be different from the underlying global distribution. This creates a weight divergence issue and reduces federated learning performance. This work focuses on improving federated learning performance for skewed data distribution across clients. The main idea is to adjust the client distribution closer to the global distribution using sample weights. Thus, the machine learning model converges faster with higher accuracy. We start from the fundamental concept of empirical risk minimization and theoretically derive a solution for adjusting the distribution skewness using sample weights. To determine sample weights, we implicitly exchange density information by leveraging a neural network-based density estimation model, MADE. The clients data distribution can then be adjusted without exposing their raw data. Our experiment results on three real-world datasets show that the proposed method not only improves federated learning accuracy but also significantly reduces communication costs compared to the other experimental methods.

Title: Zero-shot Microclimate Prediction with Deep Learning. (arXiv:2401.02665v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02665
Code URL: null
Copy Paste: [[2401.02665]] Zero-shot Microclimate Prediction with Deep Learning(http://arxiv.org/abs/2401.02665)
Summary:
Weather station data is a valuable resource for climate prediction, however, its reliability can be limited in remote locations. To compound the issue, making local predictions often relies on sensor data that may not be accessible for a new, previously unmonitored location. In response to these challenges, we propose a novel zero-shot learning approach designed to forecast various climate measurements at new and unmonitored locations. Our method surpasses conventional weather forecasting techniques in predicting microclimate variables by leveraging knowledge extracted from other geographic locations.

Title: Fairness-Aware Job Scheduling for Multi-Job Federated Learning. (arXiv:2401.02740v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02740
Code URL: null
Copy Paste: [[2401.02740]] Fairness-Aware Job Scheduling for Multi-Job Federated Learning(http://arxiv.org/abs/2401.02740)
Summary:
Federated learning (FL) enables multiple data owners (a.k.a. FL clients) to collaboratively train machine learning models without disclosing sensitive private data. Existing FL research mostly focuses on the monopoly scenario in which a single FL server selects a subset of FL clients to update their local models in each round of training. In practice, there can be multiple FL servers simultaneously trying to select clients from the same pool. In this paper, we propose a first-of-its-kind Fairness-aware Federated Job Scheduling (FairFedJS) approach to bridge this gap. Based on Lyapunov optimization, it ensures fair allocation of high-demand FL client datasets to FL jobs in need of them, by jointly considering the current demand and the job payment bids, in order to prevent prolonged waiting. Extensive experiments comparing FairFedJS against four state-of-the-art approaches on two datasets demonstrate its significant advantages. It outperforms the best baseline by 31.9% and 1.0% on average in terms of scheduling fairness and convergence time, respectively, while achieving comparable test accuracy.

Title: Graph2Tac: Learning Hierarchical Representations of Math Concepts in Theorem proving. (arXiv:2401.02949v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02949
Code URL: null
Copy Paste: [[2401.02949]] Graph2Tac: Learning Hierarchical Representations of Math Concepts in Theorem proving(http://arxiv.org/abs/2401.02949)
Summary:
Concepts abound in mathematics and its applications. They vary greatly between subject areas, and new ones are introduced in each mathematical paper or application. A formal theory builds a hierarchy of definitions, theorems and proofs that reference each other. When an AI agent is proving a new theorem, most of the mathematical concepts and lemmas relevant to that theorem may have never been seen during training. This is especially true in the Coq proof assistant, which has a diverse library of Coq projects, each with its own definitions, lemmas, and even custom tactic procedures used to prove those lemmas. It is essential for agents to incorporate such new information into their knowledge base on the fly. We work towards this goal by utilizing a new, large-scale, graph-based dataset for machine learning in Coq. We leverage a faithful graph-representation of Coq terms that induces a directed graph of dependencies between definitions to create a novel graph neural network, Graph2Tac (G2T), that takes into account not only the current goal, but also the entire hierarchy of definitions that led to the current goal. G2T is an online model that is deeply integrated into the users' workflow and can adapt in real time to new Coq projects and their definitions. It complements well with other online models that learn in real time from new proof scripts. Our novel definition embedding task, which is trained to compute representations of mathematical concepts not seen during training, boosts the performance of the neural network to rival state-of-the-art k-nearest neighbor predictors.

Title: Homophily-Related: Adaptive Hybrid Graph Filter for Multi-View Graph Clustering. (arXiv:2401.02682v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02682
Code URL: null
Copy Paste: [[2401.02682]] Homophily-Related: Adaptive Hybrid Graph Filter for Multi-View Graph Clustering(http://arxiv.org/abs/2401.02682)
Summary:
Recently there is a growing focus on graph data, and multi-view graph clustering has become a popular area of research interest. Most of the existing methods are only applicable to homophilous graphs, yet the extensive real-world graph data can hardly fulfill the homophily assumption, where the connected nodes tend to belong to the same class. Several studies have pointed out that the poor performance on heterophilous graphs is actually due to the fact that conventional graph neural networks (GNNs), which are essentially low-pass filters, discard information other than the low-frequency information on the graph. Nevertheless, on certain graphs, particularly heterophilous ones, neglecting high-frequency information and focusing solely on low-frequency information impedes the learning of node representations. To break this limitation, our motivation is to perform graph filtering that is closely related to the homophily degree of the given graph, with the aim of fully leveraging both low-frequency and high-frequency signals to learn distinguishable node embedding. In this work, we propose Adaptive Hybrid Graph Filter for Multi-View Graph Clustering (AHGFC). Specifically, a graph joint process and graph joint aggregation matrix are first designed by using the intrinsic node features and adjacency relationship, which makes the low and high-frequency signals on the graph more distinguishable. Then we design an adaptive hybrid graph filter that is related to the homophily degree, which learns the node embedding based on the graph joint aggregation matrix. After that, the node embedding of each view is weighted and fused into a consensus embedding for the downstream task. Experimental results show that our proposed model performs well on six datasets containing homophilous and heterophilous graphs.

multi-run

chain-of-thought

tree-of-thought

agent

Title: Une ontologie pour les syst{\`e}mes multi-agents ambiants dans les villes intelligentes. (arXiv:2401.02726v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.02726
Code URL: null
Copy Paste: [[2401.02726]] Une ontologie pour les syst{\`e}mes multi-agents ambiants dans les villes intelligentes(http://arxiv.org/abs/2401.02726)
Summary:
Towns and cities are currently equipping themselves with a host of connected devices, with a view to transforming themselves into ''smart cities''. To manage this mass of connected objects, autonomous software entities, known as agents, can be attached to them to cooperate and use these devices to offer personalized services. However, this object infrastructure needs to be semantically structured in order to be exploited. This is why the proposal of this article is an ontology, formatted in OWL, describing the object infrastructures, their links with the organization of the multi-agent system and the services to be delivered according to the users of the system. The ontology is applied to smart mobility for people with reduced mobility, and could be adapted to other smart city axes.