2023-12-21

language model

Title: When Parameter-efficient Tuning Meets General-purpose Vision-language Models. (arXiv:2312.12458v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12458
Code URL: null
Copy Paste: [[2312.12458]] When Parameter-efficient Tuning Meets General-purpose Vision-language Models(http://arxiv.org/abs/2312.12458)
Summary:
Instruction tuning has shown promising potential for developing general-purpose AI capabilities by using large-scale pre-trained models and has spurred growing research on integrating multimodal information for creative applications. However, existing works still face two main limitations: the high training costs and heavy computing resource dependence of full model fine-tuning, and the lack of semantic information in instructions, which hinders multimodal alignment. Addressing these challenges, this paper proposes a novel approach to utilize Parameter-Efficient Tuning for generAl-purpose vision-Language models, namely PETAL. PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique, which significantly reduces the training costs and reliance on heavy computing resources. Furthermore, PETAL enhances the semantic depth of instructions in two innovative ways: 1) by introducing adaptive instruction mixture-of-experts (MOEs), and 2) by fortifying the score-based linkage between parameter-efficient tuning and mutual information. Our extensive experiments across five multimodal downstream benchmarks reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness. Additionally, our approach demonstrates remarkable advantages in few-shot settings, backed by comprehensive visualization analyses. Our source code is available at: https://github.com/melonking32/PETAL.

Title: Towards Better Serialization of Tabular Data for Few-shot Classification. (arXiv:2312.12464v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12464
Code URL: null
Copy Paste: [[2312.12464]] Towards Better Serialization of Tabular Data for Few-shot Classification(http://arxiv.org/abs/2312.12464)
Summary:
We present a study on the integration of Large Language Models (LLMs) in tabular data classification, emphasizing an efficient framework. Building upon existing work done in TabLLM (arXiv:2210.10723), we introduce three novel serialization techniques, including the standout LaTeX serialization method. This method significantly boosts the performance of LLMs in processing domain-specific datasets. Our method stands out for its memory efficiency and ability to fully utilize complex data structures. Through extensive experimentation, including various serialization approaches like feature combination and importance, we demonstrate our work's superiority in accuracy and efficiency over traditional models.
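
As a rough illustration of what serializing a table row as LaTeX could look like, here is a minimal sketch. The abstract does not give the paper's exact format, so the two-row tabular layout, the function name serialize_row_latex, and the example row are all assumptions made for illustration only.

```python
def serialize_row_latex(row: dict) -> str:
    """Render one table row as a small LaTeX tabular (header line + value line).

    Hypothetical helper: the layout is an assumption, not the paper's format.
    """
    def esc(value) -> str:
        # Escape a few LaTeX special characters likely to occur in cell text.
        text = str(value)
        for ch in ("&", "%", "_", "#"):
            text = text.replace(ch, "\\" + ch)
        return text

    columns = "|".join("l" for _ in row)
    header = " & ".join(esc(k) for k in row) + r" \\"
    values = " & ".join(esc(v) for v in row.values()) + r" \\"
    return "\n".join([
        r"\begin{tabular}{" + columns + "}",
        header,
        r"\hline",
        values,
        r"\end{tabular}",
    ])


# Made-up example row; the resulting string would be placed in the LLM prompt.
example = {"age": 42, "job_type": "nurse", "income": "50%"}
latex = serialize_row_latex(example)
print(latex)
```

The serialized string would then be embedded in a few-shot classification prompt in place of a natural-language description of the row.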

Title: A Performance Evaluation of a Quantized Large Language Model on Various Smartphones. (arXiv:2312.12472v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12472
Code URL: null
Copy Paste: [[2312.12472]] A Performance Evaluation of a Quantized Large Language Model on Various Smartphones(http://arxiv.org/abs/2312.12472)
Summary:
This paper explores the feasibility and performance of on-device large language model (LLM) inference on various Apple iPhone models. Amidst the rapid evolution of generative AI, on-device LLMs offer solutions to privacy, security, and connectivity challenges inherent in cloud-based models. Leveraging existing literature on running multi-billion parameter LLMs on resource-limited devices, our study examines the thermal effects and interaction speeds of a high-performing LLM across different smartphone generations. We present real-world performance results, providing insights into on-device inference capabilities.

Title: Mini-GPTs: Efficient Large Language Models through Contextual Pruning. (arXiv:2312.12682v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12682
Code URL: null
Copy Paste: [[2312.12682]] Mini-GPTs: Efficient Large Language Models through Contextual Pruning(http://arxiv.org/abs/2312.12682)
Summary:
In AI research, the optimization of Large Language Models (LLMs) remains a significant challenge, crucial for advancing the field's practical applications and sustainability. Building upon the foundational work of Professor Song Han's lab at MIT, this paper introduces a novel approach in developing Mini-GPTs via contextual pruning. Our methodology strategically prunes the computational architecture of traditional LLMs, like Phi-1.5, focusing on retaining core functionalities while drastically reducing model sizes. We employ the technique across diverse and complex datasets, including US law, Medical Q&A, Skyrim dialogue, English-Taiwanese translation, and Economics articles. The results underscore the efficiency and effectiveness of contextual pruning, not merely as a theoretical concept but as a practical tool in developing domain-specific, resource-efficient LLMs. Contextual pruning is a promising method for building domain-specific LLMs, and this research is a building block towards future development with more hardware compute, refined fine-tuning, and quantization.

Title: ALMANACS: A Simulatability Benchmark for Language Model Explainability. (arXiv:2312.12747v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12747
Code URL: https://github.com/edmundmills/almanacs
Copy Paste: [[2312.12747]] ALMANACS: A Simulatability Benchmark for Language Model Explainability(http://arxiv.org/abs/2312.12747)
Summary:
How do we measure the efficacy of language model explainability methods? While many explainability methods have been developed, they are typically evaluated on bespoke tasks, preventing an apples-to-apples comparison. To help fill this gap, we present ALMANACS, a language model explainability benchmark. ALMANACS scores explainability methods on simulatability, i.e., how well the explanations improve behavior prediction on new inputs. The ALMANACS scenarios span twelve safety-relevant topics such as ethical reasoning and advanced AI behaviors; they have idiosyncratic premises to invoke model-specific behavior; and they have a train-test distributional shift to encourage faithful explanations. By using another language model to predict behavior based on the explanations, ALMANACS is a fully automated benchmark. We use ALMANACS to evaluate counterfactuals, rationalizations, attention, and Integrated Gradients explanations. Our results are sobering: when averaged across all topics, no explanation method outperforms the explanation-free control. We conclude that despite modest successes in prior work, developing an explanation method that aids simulatability in ALMANACS remains an open challenge.

Title: MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models. (arXiv:2312.12806v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12806
Code URL: null
Copy Paste: [[2312.12806]] MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models(http://arxiv.org/abs/2312.12806)
Summary:
The emergence of various medical large language models (LLMs) in the medical domain has highlighted the need for unified evaluation standards, as manual evaluation of LLMs proves to be time-consuming and labor-intensive. To address this issue, we introduce MedBench, a comprehensive benchmark for the Chinese medical domain, comprising 40,041 questions sourced from authentic examination exercises and medical reports of diverse branches of medicine. In particular, this benchmark is composed of four key components: the Chinese Medical Licensing Examination, the Resident Standardization Training Examination, the Doctor In-Charge Qualification Examination, and real-world clinic cases encompassing examinations, diagnoses, and treatments. MedBench replicates the educational progression and clinical practice experiences of doctors in Mainland China, thereby establishing itself as a credible benchmark for assessing the mastery of knowledge and reasoning abilities in medical language learning models. We perform extensive experiments and conduct an in-depth analysis from diverse perspectives, which culminate in the following findings: (1) Chinese medical LLMs underperform on this benchmark, highlighting the need for significant advances in clinical knowledge and diagnostic precision. (2) Several general-domain LLMs surprisingly possess considerable medical knowledge. These findings elucidate both the capabilities and limitations of LLMs within the context of MedBench, with the ultimate goal of aiding the medical research community.

Title: Language Resources for Dutch Large Language Modelling. (arXiv:2312.12852v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12852
Code URL: https://github.com/bramvanroy/dutch-instruction-datasets
Copy Paste: [[2312.12852]] Language Resources for Dutch Large Language Modelling(http://arxiv.org/abs/2312.12852)
Summary:
Despite the rapid expansion of types of large language models, there remains a notable gap in models specifically designed for the Dutch language. This gap is not only a shortage in terms of pretrained Dutch models but also in terms of data, benchmarks, and leaderboards. This work provides a small step to improve the situation. First, we introduce two fine-tuned variants of the Llama 2 13B model. We first fine-tuned Llama 2 using Dutch-specific web-crawled data and subsequently refined this model further on multiple synthetic instruction and chat datasets. These datasets as well as the model weights are made available. In addition, we provide a leaderboard to keep track of the performance of (Dutch) models on a number of generation tasks, and we include results of a number of state-of-the-art models, including our own. Finally, we provide a critical conclusion on what we believe is needed to push forward Dutch language models and the whole ecosystem around the models.

Title: HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments. (arXiv:2312.13193v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13193
Code URL: null
Copy Paste: [[2312.13193]] HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments(http://arxiv.org/abs/2312.13193)
Summary:
Warning: This paper contains examples of the language that some people may find offensive.

Detecting and reducing hateful, abusive, offensive comments is a critical and challenging task on social media. Moreover, few studies aim to mitigate the intensity of hate speech. While studies have shown that context-level semantics are crucial for detecting hateful comments, most of this research focuses on English due to the ample datasets available. In contrast, low-resource languages, like Indian languages, remain under-researched because of limited datasets. Contrary to hate speech detection, hate intensity reduction remains unexplored in high-resource and low-resource languages. In this paper, we propose a novel end-to-end model, HCDIR, for Hate Context Detection, and Hate Intensity Reduction in social media posts. First, we fine-tuned several pre-trained language models to detect hateful comments to ascertain the best-performing hateful comments detection model. Then, we identified the contextual hateful words. Identification of such hateful words is justified through the state-of-the-art explainable learning model, i.e., Integrated Gradient (IG). Lastly, the Masked Language Modeling (MLM) model has been employed to capture domain-specific nuances to reduce hate intensity. We masked 50% of the hateful words in the comments identified as hateful and predicted the alternative words for these masked terms to generate convincing sentences. An optimal replacement for the original hate comments from the feasible sentences is preferred. Extensive experiments have been conducted on several recent datasets using automatic metric-based evaluation (BERTScore) and thorough human evaluation. To enhance the faithfulness in human evaluation, we arranged a group of three human annotators with varied expertise.

Title: Learning and Forgetting Unsafe Examples in Large Language Models. (arXiv:2312.12736v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12736
Code URL: null
Copy Paste: [[2312.12736]] Learning and Forgetting Unsafe Examples in Large Language Models(http://arxiv.org/abs/2312.12736)
Summary:
As the number of large language models (LLMs) released to the public grows, there is a pressing need to understand the safety implications associated with these models learning from third-party custom finetuning data. We explore the behavior of LLMs finetuned on noisy custom data containing unsafe content, represented by datasets that contain biases, toxicity, and harmfulness, finding that while aligned LLMs can readily learn this unsafe content, they also tend to forget it more significantly than other examples when subsequently finetuned on safer content. Drawing inspiration from the discrepancies in forgetting, we introduce the "ForgetFilter" algorithm, which filters unsafe data based on how strong the model's forgetting signal is for that data. We demonstrate that the ForgetFilter algorithm ensures safety in customized finetuning without compromising downstream task performance, unlike sequential safety finetuning. ForgetFilter outperforms alternative strategies like replay and moral self-correction in curbing LLMs' ability to assimilate unsafe content during custom finetuning, e.g. 75% lower than not applying any safety measures and 62% lower than using self-correction in toxicity score.
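
The filtering idea can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not define the forgetting signal, so here it is taken to be the increase in per-example loss after finetuning on safer content, and the function name forget_filter, the threshold, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence


def forget_filter(
    examples: Sequence[str],
    loss_before: Sequence[float],
    loss_after: Sequence[float],
    threshold: float,
) -> List[str]:
    """Keep examples whose forgetting signal stays at or below the threshold.

    Assumption: forgetting = loss after safety finetuning minus loss before it.
    Strongly forgotten examples are treated as likely unsafe and dropped.
    """
    kept = []
    for example, before, after in zip(examples, loss_before, loss_after):
        forgetting = after - before
        if forgetting <= threshold:
            kept.append(example)
    return kept


# Toy data: example "b" is forgotten much more strongly than the others.
filtered = forget_filter(
    examples=["a", "b", "c"],
    loss_before=[1.0, 1.0, 1.0],
    loss_after=[1.1, 2.5, 0.9],
    threshold=0.5,
)
print(filtered)
```

The remaining examples would then be used for the custom finetuning run in place of the full noisy dataset.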

Title: Fine-tuning Large Language Models for Adaptive Machine Translation. (arXiv:2312.12740v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12740
Code URL: null
Copy Paste: [[2312.12740]] Fine-tuning Large Language Models for Adaptive Machine Translation(http://arxiv.org/abs/2312.12740)
Summary:
This paper presents the outcomes of fine-tuning Mistral 7B, a general-purpose large language model (LLM), for adaptive machine translation (MT). The fine-tuning process involves utilising a combination of zero-shot and one-shot translation prompts within the medical domain. The primary objective is to enhance real-time adaptive MT capabilities of Mistral 7B, enabling it to adapt translations to the required domain at inference time. The results, particularly for Spanish-to-English MT, showcase the efficacy of the fine-tuned model, demonstrating quality improvements in both zero-shot and one-shot translation scenarios, surpassing Mistral 7B's baseline performance. Notably, the fine-tuned Mistral outperforms ChatGPT "gpt-3.5-turbo" in zero-shot translation while achieving comparable one-shot translation quality. Moreover, the zero-shot translation of the fine-tuned Mistral matches NLLB 3.3B's performance, and its one-shot translation quality surpasses that of NLLB 3.3B. These findings emphasise the significance of fine-tuning efficient LLMs like Mistral 7B to yield high-quality zero-shot translations comparable to task-oriented models like NLLB 3.3B. Additionally, the adaptive gains achieved in one-shot translation are comparable to those of commercial LLMs such as ChatGPT. Our experiments demonstrate that, with a relatively small dataset of 20,000 segments that incorporate a mix of zero-shot and one-shot prompts, fine-tuning significantly enhances Mistral's in-context learning ability, especially for real-time adaptive MT.

Title: CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models. (arXiv:2312.12853v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12853
Code URL: null
Copy Paste: [[2312.12853]] CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models(http://arxiv.org/abs/2312.12853)
Summary:
As an indispensable ingredient of intelligence, commonsense reasoning is crucial for large language models (LLMs) in real-world scenarios. In this paper, we propose CORECODE, a dataset that contains abundant commonsense knowledge manually annotated on dyadic dialogues, to evaluate the commonsense reasoning and commonsense conflict detection capabilities of Chinese LLMs. We categorize commonsense knowledge in everyday conversations into three dimensions: entity, event, and social interaction. For easy and consistent annotation, we standardize the form of commonsense knowledge annotation in open-domain dialogues as "domain: slot = value". A total of 9 domains and 37 slots are defined to capture diverse commonsense knowledge. With these pre-defined domains and slots, we collect 76,787 commonsense knowledge annotations from 19,700 dialogues through crowdsourcing. To evaluate and enhance the commonsense reasoning capability for LLMs on the curated dataset, we establish a series of dialogue-level reasoning and detection tasks, including commonsense knowledge filling, commonsense knowledge generation, commonsense conflict phrase detection, domain identification, slot identification, and event causal inference. A wide variety of existing open-source Chinese LLMs are evaluated with these tasks on our dataset. Experimental results demonstrate that these models are not competent to predict CORECODE's plentiful reasoning content, and even ChatGPT could only achieve 0.275 and 0.084 accuracy on the domain identification and slot identification tasks under the zero-shot setting. We release the data and codes of CORECODE at https://github.com/danshi777/CORECODE to promote commonsense reasoning evaluation and study of LLMs in the context of daily conversations.
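
For readers who want to work with annotations in the "domain: slot = value" form, a minimal parsing sketch follows. The Annotation type, the parse_annotation helper, and the sample string are hypothetical; the released dataset may store annotations in a different concrete format.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    domain: str
    slot: str
    value: str


def parse_annotation(text: str) -> Annotation:
    """Split a 'domain: slot = value' string into its three parts."""
    domain, colon, rest = text.partition(":")
    slot, equals, value = rest.partition("=")
    if not colon or not equals:
        raise ValueError(f"not in 'domain: slot = value' form: {text!r}")
    return Annotation(domain.strip(), slot.strip(), value.strip())


# Made-up annotation string, for illustration only.
parsed = parse_annotation("event: cause = it was raining")
print(parsed)
```

Only the first colon and the first equals sign are treated as separators, so values that themselves contain "=" are preserved.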

Title: Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors. (arXiv:2312.12918v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12918
Code URL: https://github.com/yfzhang114/robustness-detection
Copy Paste: [[2312.12918]] Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors(http://arxiv.org/abs/2312.12918)
Summary:
To combat the potential misuse of Natural Language Generation (NLG) technology, a variety of algorithms have been developed for the detection of AI-generated texts. Traditionally, this task is treated as a binary classification problem. Although supervised learning has demonstrated promising results, acquiring labeled data for detection purposes poses real-world challenges and the risk of overfitting. In an effort to address these issues, we delve into the realm of zero-shot machine-generated text detection. Existing zero-shot detectors, typically designed for specific tasks or topics, often assume uniform testing scenarios, limiting their practicality. In our research, we explore various advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways. First, in empirical studies, we uncover a significant correlation between topics and detection performance. Second, we delve into the influence of topic shifts on zero-shot detectors. These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.

Title: Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest. (arXiv:2312.12989v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12989
Code URL: null
Copy Paste: [[2312.12989]] Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest(http://arxiv.org/abs/2312.12989)
Summary:
Automated knowledge curation for biomedical ontologies is key to ensure that they remain comprehensive, high-quality and up-to-date. In the era of foundational language models, this study compares and analyzes three NLP paradigms for curation tasks: in-context learning (ICL), fine-tuning (FT), and supervised learning (ML). Using the Chemical Entities of Biological Interest (ChEBI) database as a model ontology, three curation tasks were devised. For ICL, three prompting strategies were employed with GPT-4, GPT-3.5, BioGPT. PubmedBERT was chosen for the FT paradigm. For ML, six embedding models were utilized for training Random Forest and Long-Short Term Memory models. Five setups were designed to assess ML and FT model performance across different data availability scenarios. Datasets for curation tasks included: task 1 (620,386), task 2 (611,430), and task 3 (617,381), maintaining a 50:50 positive versus negative ratio. For ICL models, GPT-4 achieved best accuracy scores of 0.916, 0.766 and 0.874 for tasks 1-3 respectively. In a direct comparison, ML (trained on ~260,000 triples) outperformed ICL in accuracy across all tasks (accuracy differences: +.11, +.22 and +.17). Fine-tuned PubmedBERT performed similarly to leading ML models in tasks 1 & 2 (F1 differences: -.014 and +.002), but worse in task 3 (-.048). Simulations revealed performance declines in both ML and FT models with smaller and more highly imbalanced training data, where ICL (particularly GPT-4) excelled in tasks 1 & 3. GPT-4 excelled in tasks 1 and 3 with less than 6,000 triples, surpassing ML/FT. ICL underperformed ML/FT in task 2. ICL-augmented foundation models can be good assistants for knowledge curation with correct prompting; however, they do not make the ML and FT paradigms obsolete. The latter two require task-specific data to beat ICL. In such cases, ML relies on small pretrained embeddings, minimizing computational demands.

Title: Machine Mindset: An MBTI Exploration of Large Language Models. (arXiv:2312.12999v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12999
Code URL: https://github.com/pku-yuangroup/machine-mindset
Copy Paste: [[2312.12999]] Machine Mindset: An MBTI Exploration of Large Language Models(http://arxiv.org/abs/2312.12999)
Summary:
We present a novel approach for integrating Myers-Briggs Type Indicator (MBTI) personality traits into large language models (LLMs), addressing the challenges of personality consistency in personalized AI. Our method, "Machine Mindset," involves a two-phase fine-tuning and Direct Preference Optimization (DPO) to embed MBTI traits into LLMs. This approach ensures that models internalize these traits, offering a stable and consistent personality profile. We demonstrate the effectiveness of our models across various domains, showing alignment between model performance and their respective MBTI traits. The paper highlights significant contributions in the development of personality datasets and a new training methodology for personality integration in LLMs, enhancing the potential for personalized AI applications. We also open-sourced our model and part of the data at https://github.com/PKU-YuanGroup/Machine-Mindset.

Title: Retrieval-augmented Multilingual Knowledge Editing. (arXiv:2312.13040v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13040
Code URL: https://github.com/vicky-wil/remake
Copy Paste: [[2312.13040]] Retrieval-augmented Multilingual Knowledge Editing(http://arxiv.org/abs/2312.13040)
Summary:
Knowledge represented in Large Language Models (LLMs) is quite often incorrect and can also become obsolete over time. Updating knowledge via fine-tuning is computationally resource-hungry and not reliable, and so knowledge editing (KE) has developed as an effective and economical alternative to inject new knowledge or to fix factual errors in LLMs. Although there has been considerable interest in this area, current KE research exclusively focuses on the monolingual setting, typically in English. However, what happens if the new knowledge is supplied in one language, but we would like to query the LLM in a different language? To address the problem of multilingual knowledge editing, we propose Retrieval-augmented Multilingual Knowledge Editor (ReMaKE) to update new knowledge in LLMs. ReMaKE can perform model-agnostic knowledge editing in multilingual settings. ReMaKE concatenates the new knowledge retrieved from a multilingual knowledge base with prompts. Our experimental results show that ReMaKE outperforms baseline knowledge editing methods by a significant margin and is the first KE method to work in a multilingual setting. We provide our multilingual knowledge editing dataset (MzsRE) in 12 languages, which, along with code and additional project information, is available at https://github.com/Vicky-Wil/ReMaKE.

Title: Exploring Multimodal Large Language Models for Radiology Report Error-checking. (arXiv:2312.13103v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13103
Code URL: null
Copy Paste: [[2312.13103]] Exploring Multimodal Large Language Models for Radiology Report Error-checking(http://arxiv.org/abs/2312.13103)
Summary:
This paper proposes one of the first clinical applications of multimodal large language models (LLMs) as an assistant for radiologists to check errors in their reports. We created an evaluation dataset from two real-world radiology datasets (MIMIC-CXR and IU-Xray), with 1,000 subsampled reports each. A subset of original reports was modified to contain synthetic errors by introducing various types of mistakes. The evaluation contained two difficulty levels: SIMPLE for binary error-checking and COMPLEX for identifying error types. LLaVA (Large Language and Visual Assistant) variant models, including our instruction-tuned model, were used for the evaluation. Additionally, a domain expert evaluation was conducted on a small test set. At the SIMPLE level, the LLaVA v1.5 model outperformed other publicly available models. Instruction tuning significantly enhanced performance by 47.4% and 25.4% on MIMIC-CXR and IU-Xray data, respectively. The model also surpassed the domain experts' accuracy in the MIMIC-CXR dataset by 1.67%. Notably, among the subsets (N=21) of the test set where a clinician did not achieve the correct conclusion, the LLaVA ensemble model correctly identified 71.4% of these cases. This study marks a promising step toward utilizing multi-modal LLMs to enhance diagnostic accuracy in radiology. The ensemble model demonstrated comparable performance to clinicians, even capturing errors overlooked by humans. Nevertheless, future work is needed to improve the model's ability to identify the types of inconsistency.

Title: Contextual Code Switching for Machine Translation using Language Models. (arXiv:2312.13179v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13179
Code URL: null
Copy Paste: [[2312.13179]] Contextual Code Switching for Machine Translation using Language Models(http://arxiv.org/abs/2312.13179)
Summary:
Large language models (LLMs) have exerted a considerable impact on diverse language-related tasks in recent years. Their demonstrated state-of-the-art performance is achieved through methodologies such as zero-shot or few-shot prompting. These models undergo training on extensive datasets that encompass segments of the Internet and subsequently undergo fine-tuning tailored to specific tasks. Notably, they exhibit proficiency in tasks such as translation, summarization, question answering, and creative writing, even in the absence of explicit training for those particular tasks. While they have shown substantial improvement in multilingual tasks, their performance in code switching, especially for machine translation, remains relatively uncharted. In this paper, we present an extensive study on the code switching task specifically for the machine translation task, comparing multiple LLMs. Our results indicate that despite the LLMs having promising results in certain tasks, models with relatively lower complexity outperform the multilingual large language models in the machine translation task. We posit that the efficacy of multilingual large language models in contextual code switching is constrained by their training methodologies. In contrast, relatively smaller models, when trained and fine-tuned on bespoke datasets, may yield superior results in comparison to the majority of multilingual models.

Title: LlaMaVAE: Guiding Large Language Model Generation via Continuous Latent Sentence Spaces. (arXiv:2312.13208v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13208
Code URL: null
Copy Paste: [[2312.13208]] LlaMaVAE: Guiding Large Language Model Generation via Continuous Latent Sentence Spaces(http://arxiv.org/abs/2312.13208)
Summary:
Deep generative neural networks, such as Variational AutoEncoders (VAEs), offer an opportunity to better understand and control language models from the perspective of sentence-level latent spaces. To combine the controllability of VAE latent spaces with the state-of-the-art performance of recent large language models (LLMs), we present in this work LlaMaVAE, which combines expressive encoder and decoder models (sentenceT5 and LlaMA) with a VAE architecture, aiming to provide better text generation control to LLMs. In addition, to conditionally guide the VAE generation, we investigate a new approach based on flow-based invertible neural networks (INNs) named Invertible CVAE. Experimental results reveal that LlaMaVAE can outperform the previous state-of-the-art VAE language model, Optimus, across various tasks, including language modelling, semantic textual similarity and definition modelling. Qualitative analysis on interpolation and traversal experiments also indicates an increased degree of semantic clustering and geometric consistency, which enables better generation control.

Title: PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU. (arXiv:2312.12456v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12456
Code URL: null
Copy Paste: [[2312.12456]] PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU(http://arxiv.org/abs/2312.12456)
Summary:
This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference engine on a personal computer (PC) equipped with a single consumer-grade GPU. The key idea underlying the design of PowerInfer is exploiting the high locality inherent in LLM inference, characterized by a power-law distribution in neuron activation. This distribution indicates that a small subset of neurons, termed hot neurons, are consistently activated across inputs, while the majority, cold neurons, vary based on specific inputs. PowerInfer exploits such an insight to design a GPU-CPU hybrid inference engine: hot-activated neurons are preloaded onto the GPU for fast access, while cold-activated neurons are computed on the CPU, thus significantly reducing GPU memory demands and CPU-GPU data transfers. PowerInfer further integrates adaptive predictors and neuron-aware sparse operators, optimizing the efficiency of neuron activation and computational sparsity. Evaluation shows that PowerInfer attains an average token generation rate of 13.20 tokens/s, with a peak of 29.08 tokens/s, across various LLMs (including OPT-175B) on a single NVIDIA RTX 4090 GPU, only 18% lower than that achieved by a top-tier server-grade A100 GPU. This significantly outperforms llama.cpp by up to 11.69x while retaining model accuracy.
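
The hot/cold split can be illustrated with a toy partitioning routine. This is a simplified sketch, not PowerInfer's placement policy: it assumes activation counts have already been profiled, ranks neurons by those counts, and assigns the most frequently activated ones to the GPU up to a fixed budget. The function name split_hot_cold and the sample counts are hypothetical.

```python
from typing import List, Tuple


def split_hot_cold(
    activation_counts: List[int], gpu_budget: int
) -> Tuple[List[int], List[int]]:
    """Return (hot, cold) neuron indices given profiled activation counts.

    The gpu_budget most frequently activated neurons are marked hot (GPU);
    all remaining neurons are marked cold (CPU).
    """
    ranked = sorted(
        range(len(activation_counts)),
        key=lambda i: activation_counts[i],
        reverse=True,
    )
    hot = sorted(ranked[:gpu_budget])
    cold = sorted(ranked[gpu_budget:])
    return hot, cold


# Made-up counts with a skewed distribution: two neurons dominate.
counts = [900, 3, 850, 1, 7, 20]
hot, cold = split_hot_cold(counts, gpu_budget=2)
print("GPU (hot):", hot)
print("CPU (cold):", cold)
```

With a power-law distribution like the one the abstract describes, a small GPU budget would cover most activations, which is what makes the hybrid split attractive.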

gpt

Title: Can Transformers Learn Sequential Function Classes In Context?. (arXiv:2312.12655v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12655
Code URL: null
Copy Paste: [[2312.12655]] Can Transformers Learn Sequential Function Classes In Context?(http://arxiv.org/abs/2312.12655)
Summary:
In-context learning (ICL) has revolutionized the capabilities of transformer models in NLP. In our project, we extend the understanding of the mechanisms underpinning ICL by exploring whether transformers can learn from sequential, non-textual function class data distributions. We introduce a novel sliding window sequential function class and employ toy-sized transformers with a GPT-2 architecture to conduct our experiments. Our analysis indicates that these models can indeed leverage ICL when trained on non-textual sequential function classes. Additionally, our experiments with randomized y-label sequences highlight that transformers retain some ICL capabilities even when the label associations are obfuscated. We provide evidence that transformers can reason with and understand sequentiality encoded within function classes, as reflected by the effective learning of our proposed tasks. Our results also show that the performance deteriorated with increasing randomness in the labels, though not to the extent one might expect, implying a potential robustness of learned sequentiality against label noise. Future research may want to look into how previous explanations of transformers, such as induction heads and task vectors, relate to sequentiality in ICL in these toy examples. Our investigation lays the groundwork for further research into how transformers process and perceive sequential data.

Title: Response Enhanced Semi-Supervised Dialogue Query Generation. (arXiv:2312.12713v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12713
Code URL: https://github.com/deeplearnxmu/semidqg
Copy Paste: [[2312.12713]] Response Enhanced Semi-Supervised Dialogue Query Generation(http://arxiv.org/abs/2312.12713)
Summary:
Leveraging vast and continually updated knowledge from the Internet has been considered an important ability for a dialogue system. Therefore, the dialogue query generation task is proposed for generating search queries from dialogue histories, which will be submitted to a search engine for retrieving relevant websites on the Internet. In this regard, previous efforts were devoted to collecting conversations with annotated queries and training a query producer (QP) via standard supervised learning. However, these studies still face the challenges of data scarcity and domain adaptation. To address these issues, in this paper, we propose a semi-supervised learning framework -- SemiDQG, to improve model performance with unlabeled conversations. Based on the observation that the search query is typically related to the topic of dialogue response, we train a response-augmented query producer (RA) to provide rich and effective training signals for QP. We first apply a similarity-based query selection strategy to select high-quality RA-generated pseudo queries, which are used to construct pseudo instances for training QP and RA. Then, we adopt the REINFORCE algorithm to further enhance QP, with RA-provided rewards as fine-grained training signals. Experimental results and in-depth analysis of three benchmarks show the effectiveness of our framework in cross-domain and low-resource scenarios. Particularly, SemiDQG significantly surpasses ChatGPT and competitive baselines. Our code is available at https://github.com/DeepLearnXMU/SemiDQG.

llm

Title: Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data. (arXiv:2312.12832v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12832
Code URL: null
Copy Paste: [[2312.12832]] Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data(http://arxiv.org/abs/2312.12832)
Summary:
Large Language Models (LLMs) have performed well on various reasoning tasks, but their inaccessibility and numerous parameters hinder wide application in practice. One promising way is distilling the reasoning ability from LLMs to small models by the generated chain-of-thought reasoning paths. In some cases, however, LLMs may produce incorrect reasoning chains, especially when facing complex mathematical problems. Previous studies only transfer knowledge from positive samples and drop the synthesized data with wrong answers. In this work, we illustrate the merit of negative data and propose a model specialization framework to distill LLMs with negative samples besides positive ones. The framework consists of three progressive steps, covering from training to inference stages, to absorb knowledge from negative data. We conduct extensive experiments across arithmetic reasoning tasks to demonstrate the role of negative data in distillation from LLM.

Title: Parameterized Projected Bellman Operator. (arXiv:2312.12869v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12869
Code URL: https://github.com/theovincent/pbo
Copy Paste: [[2312.12869]] Parameterized Projected Bellman Operator(http://arxiv.org/abs/2312.12869)
Summary:
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL) that aims to obtain an approximation of the optimal value function. Generally, AVI algorithms implement an iterated procedure where each step consists of (i) an application of the Bellman operator and (ii) a projection step into a considered function space. Notoriously, the Bellman operator leverages transition samples, which strongly determine its behavior, as uninformative samples can result in negligible updates or long detours, whose detrimental effects are further exacerbated by the computationally intensive projection step. To address these issues, we propose a novel alternative approach based on learning an approximate version of the Bellman operator rather than estimating it through samples as in AVI approaches. This way, we are able to (i) generalize across transition samples and (ii) avoid the computationally intensive projection step. For this reason, we call our novel operator projected Bellman operator (PBO). We formulate an optimization problem to learn PBO for generic sequential decision-making problems, and we theoretically analyze its properties in two representative classes of RL problems. Furthermore, we theoretically study our approach under the lens of AVI and devise algorithmic implementations to learn PBO in offline and online settings by leveraging neural network parameterizations. Finally, we empirically showcase the benefits of PBO w.r.t. the regular Bellman operator on several RL problems.

Title: Building a Llama2-finetuned LLM for Odia Language Utilizing Domain Knowledge Instruction Set. (arXiv:2312.12624v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12624
Code URL: null
Copy Paste: [[2312.12624]] Building a Llama2-finetuned LLM for Odia Language Utilizing Domain Knowledge Instruction Set(http://arxiv.org/abs/2312.12624)
Summary:
Building LLMs for languages other than English is in great demand due to the unavailability and performance of multilingual LLMs, such as understanding the local context. The problem is critical for low-resource languages due to the need for instruction sets. In a multilingual country like India, there is a need for LLMs supporting Indic languages to provide generative AI and LLM-based technologies and services to its citizens.

This paper presents our approach of i) generating a large Odia instruction set, including domain knowledge data suitable for LLM fine-tuning, and ii) building a Llama2-finetuned model tailored for enhanced performance in the Odia domain. The proposed work will help researchers build an instruction set and LLM, particularly for Indic languages. We will release the model and instruction set for the public for research and noncommercial purposes.

Title: Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?. (arXiv:2312.12683v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12683
Code URL: https://github.com/zurichnlp/multilingual-instruction-tuning
Copy Paste: [[2312.12683]] Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?(http://arxiv.org/abs/2312.12683)
Summary:
The vast majority of today's large language models are English-centric, having been pretrained predominantly on English text. Yet, in order to meet user expectations, models need to be able to respond appropriately in multiple languages once deployed in downstream applications. Given limited exposure to other languages during pretraining, cross-lingual transfer is important for achieving decent performance in non-English settings. In this work, we investigate just how much multilinguality is required during finetuning to elicit strong cross-lingual generalisation across a range of tasks and target languages. We find that, compared to English-only finetuning, multilingual instruction tuning with as few as three languages significantly improves a model's cross-lingual transfer abilities on generative tasks that assume input/output language agreement, while being of less importance for highly structured tasks. Our code and data are available at https://github.com/ZurichNLP/multilingual-instruction-tuning.

Title: Enhancing Consistency in Multimodal Dialogue System Using LLM with Dialogue Scenario. (arXiv:2312.12808v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12808
Code URL: null
Copy Paste: [[2312.12808]] Enhancing Consistency in Multimodal Dialogue System Using LLM with Dialogue Scenario(http://arxiv.org/abs/2312.12808)
Summary:
This paper describes our dialogue system submitted to Dialogue Robot Competition 2023. The system's task is to help a user at a travel agency decide on a plan for visiting two sightseeing spots in Kyoto City that satisfy the user. Our dialogue system is flexible and stable and responds to user requirements by controlling dialogue flow according to dialogue scenarios. We also improved user satisfaction by introducing motion and speech control based on system utterances and user situations. In the preliminary round, our system was ranked fifth in the impression evaluation and sixth in the plan evaluation among all 12 teams.

long context

lora

Title: Is post-editing really faster than human translation?. (arXiv:2312.12660v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12660
Code URL: null
Copy Paste: [[2312.12660]] Is post-editing really faster than human translation?(http://arxiv.org/abs/2312.12660)
Summary:
Time efficiency is paramount for the localisation industry, which demands ever-faster turnaround times. However, translation speed is largely underresearched, and there is a lack of clarity about how language service providers (LSPs) can evaluate the performance of their post-editing (PE) and human translation (HT) services. This study constitutes the first large-scale investigation of translation and revision speed in HT and in the PE of neural machine translation, based on real-world data from an LSP. It uses an exploratory data analysis approach to investigate data for 90 million words translated by 879 linguists across 11 language pairs, over 2.5 years. The results of this research indicate that (a) PE is usually but not always faster than HT; (b) average speed values may be misleading; (c) translation speed is highly variable; and (d) edit distance cannot be used as a proxy for post-editing productivity, because it does not correlate strongly with speed.

Title: Principled Weight Initialisation for Input-Convex Neural Networks. (arXiv:2312.12474v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12474
Code URL: https://github.com/ml-jku/convex-init
Copy Paste: [[2312.12474]] Principled Weight Initialisation for Input-Convex Neural Networks(http://arxiv.org/abs/2312.12474)
Summary:
Input-Convex Neural Networks (ICNNs) are networks that guarantee convexity in their input-output mapping. These networks have been successfully applied for energy-based modelling, optimal transport problems and learning invariances. The convexity of ICNNs is achieved by using non-decreasing convex activation functions and non-negative weights. Because of these peculiarities, previous initialisation strategies, which implicitly assume centred weights, are not effective for ICNNs. By studying signal propagation through layers with non-negative weights, we are able to derive a principled weight initialisation for ICNNs. Concretely, we generalise signal propagation theory by removing the assumption that weights are sampled from a centred distribution. In a set of experiments, we demonstrate that our principled initialisation effectively accelerates learning in ICNNs and leads to better generalisation. Moreover, we find that, in contrast to common belief, ICNNs can be trained without skip-connections when initialised correctly. Finally, we apply ICNNs to a real-world drug discovery task and show that they allow for more effective molecular latent space exploration.

Title: Trust, But Verify: A Survey of Randomized Smoothing Techniques. (arXiv:2312.12608v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12608
Code URL: null
Copy Paste: [[2312.12608]] Trust, But Verify: A Survey of Randomized Smoothing Techniques(http://arxiv.org/abs/2312.12608)
Summary:
Machine learning models have demonstrated remarkable success across diverse domains but remain vulnerable to adversarial attacks. Empirical defence mechanisms often fall short, as new attacks constantly emerge, rendering existing defences obsolete. A paradigm shift from empirical defences to certification-based defences has been observed in response. Randomized smoothing has emerged as a promising technique among notable advancements. This study reviews the theoretical foundations, empirical effectiveness, and applications of randomized smoothing in verifying machine learning classifiers. We provide an in-depth exploration of the fundamental concepts underlying randomized smoothing, highlighting its theoretical guarantees in certifying robustness against adversarial perturbations. Additionally, we discuss the challenges of existing methodologies and offer insightful perspectives on potential solutions. This paper is novel in its attempt to systemise the existing knowledge in the context of randomized smoothing.
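
As a rough illustration of the technique this survey covers, the sketch below implements Gaussian randomized smoothing for a toy classifier: predict by majority vote over noisy copies of the input, then turn the top-class vote share into an L2 radius. The radius formula `sigma * Phi^{-1}(p_top)` is the commonly cited form for Gaussian noise and is an assumption here, not something stated in the abstract; the function names and the toy classifier are hypothetical. The empirical vote share is used directly for brevity, whereas a real certificate would need a statistical lower confidence bound on it.

```python
import random
from collections import Counter
from statistics import NormalDist


def smoothed_predict(base_classifier, x, sigma, n_samples, rng):
    """Majority vote of base_classifier over Gaussian-perturbed copies of x."""
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    top_class, top_count = votes.most_common(1)[0]
    return top_class, top_count / n_samples


def certified_radius(p_top, sigma):
    """L2 radius sigma * Phi^{-1}(p_top); zero when the vote is not a majority."""
    if p_top <= 0.5:
        return 0.0
    p_top = min(p_top, 1.0 - 1e-9)  # inv_cdf is undefined at exactly 1
    return sigma * NormalDist().inv_cdf(p_top)


def toy_classifier(x):
    """Hypothetical base classifier: class 1 if the first coordinate is positive."""
    return 1 if x[0] > 0 else 0


rng = random.Random(0)
label, p_top = smoothed_predict(toy_classifier, [0.5, 0.0], 0.5, 2000, rng)
radius = certified_radius(p_top, 0.5)
print(label, round(p_top, 3), round(radius, 3))
```

For this linear toy classifier the input sits 0.5 away from the decision boundary, so the estimated radius should land close to 0.5.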

hallucination

prompt

Title: Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models. (arXiv:2312.12487v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12487
Code URL: null
Copy Paste: [[2312.12487]] Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models(http://arxiv.org/abs/2312.12487)
Summary:
This paper presents a comprehensive study on the role of Classifier-Free Guidance (CFG) in text-conditioned diffusion models from the perspective of inference efficiency. In particular, we relax the default choice of applying CFG in all diffusion steps and instead search for efficient guidance policies. We formulate the discovery of such policies in the differentiable Neural Architecture Search framework. Our findings suggest that the denoising steps proposed by CFG become increasingly aligned with simple conditional steps, which renders the extra neural network evaluation of CFG redundant, especially in the second half of the denoising process. Building upon this insight, we propose "Adaptive Guidance" (AG), an efficient variant of CFG, that adaptively omits network evaluations when the denoising process displays convergence. Our experiments demonstrate that AG preserves CFG's image quality while reducing computation by 25%. Thus, AG constitutes a plug-and-play alternative to Guidance Distillation, achieving 50% of the speed-ups of the latter while being training-free and retaining the capacity to handle negative prompts. Finally, we uncover further redundancies of CFG in the first half of the diffusion process, showing that entire neural function evaluations can be replaced by simple affine transformations of past score estimates. This method, termed LinearAG, offers even cheaper inference at the cost of deviating from the baseline model. Our findings provide insights into the efficiency of the conditional denoising process that contribute to more practical and swift deployment of text-conditioned diffusion models.
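
The idea of dropping the extra guidance evaluation once conditional and unconditional predictions agree can be sketched as follows. This is a toy illustration, not the authors' implementation: the standard CFG combination `uncond + scale * (cond - uncond)` is used, while the cosine-similarity convergence test, the threshold value, and all function names are assumptions made for the example. The two "networks" are stand-in functions of the step index, and no latent is actually denoised.

```python
import math


def cfg_combine(eps_cond, eps_uncond, scale):
    """Standard classifier-free guidance combination of two noise predictions."""
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def guided_steps(cond_fn, uncond_fn, num_steps, scale, threshold):
    """Return per-step predictions and the number of network evaluations used."""
    outputs, evals, converged = [], 0, False
    for t in range(num_steps):
        eps_c = cond_fn(t)
        evals += 1
        if converged:
            outputs.append(eps_c)  # guidance skipped: conditional step only
            continue
        eps_u = uncond_fn(t)
        evals += 1
        outputs.append(cfg_combine(eps_c, eps_u, scale))
        if cosine(eps_c, eps_u) >= threshold:
            converged = True  # assumed convergence test; later steps skip CFG
    return outputs, evals


def toy_cond(t):
    """Hypothetical conditional prediction, constant across steps."""
    return [1.0, 0.0]


def toy_uncond(t):
    """Hypothetical unconditional prediction that drifts towards toy_cond."""
    return [1.0, 1.0 / (t + 1) ** 2]


outputs, evals = guided_steps(toy_cond, toy_uncond, 6, 2.0, 0.99)
print(evals, "evaluations instead of", 2 * 6)
```

In this toy run the two predictions pass the 0.99 similarity threshold at the third step, so the remaining three steps each cost one evaluation instead of two.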

code

Title: Survey on Trustworthy Graph Neural Networks: From A Causal Perspective. (arXiv:2312.12477v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12477
Code URL: https://github.com/usail-hkust/causality-inspired-gnns
Copy Paste: [[2312.12477]] Survey on Trustworthy Graph Neural Networks: From A Causal Perspective(http://arxiv.org/abs/2312.12477)
Summary:
Graph Neural Networks (GNNs) have emerged as powerful representation learning tools for capturing complex dependencies within diverse graph-structured data. Despite their success in a wide range of graph mining tasks, GNNs have raised serious concerns regarding their trustworthiness, including susceptibility to distribution shift, biases towards certain populations, and lack of explainability. Recently, integrating causal learning techniques into GNNs has sparked numerous ground-breaking studies since most of the trustworthiness issues can be alleviated by capturing the underlying data causality rather than superficial correlations. In this survey, we provide a comprehensive review of recent research efforts on causality-inspired GNNs. Specifically, we first present the key trustworthy risks of existing GNN models through the lens of causality. Moreover, we introduce a taxonomy of Causality-Inspired GNNs (CIGNNs) based on the type of causal learning capability they are equipped with, i.e., causal reasoning and causal representation learning. Besides, we systematically discuss typical methods within each category and demonstrate how they mitigate trustworthiness risks. Finally, we summarize useful resources and discuss several future directions, hoping to shed light on new research opportunities in this emerging field. The representative papers, along with open-source data and codes, are available in https://github.com/usail-hkust/Causality-Inspired-GNNs.

Title: Imitation of Life: A Search Engine for Biologically Inspired Design. (arXiv:2312.12681v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12681
Code URL: null
Copy Paste: [[2312.12681]] Imitation of Life: A Search Engine for Biologically Inspired Design(http://arxiv.org/abs/2312.12681)
Summary:
Biologically Inspired Design (BID), or Biomimicry, is a problem-solving methodology that applies analogies from nature to solve engineering challenges. For example, Speedo engineers designed swimsuits based on shark skin. Finding relevant biological solutions for real-world problems poses significant challenges, both due to the limited biological knowledge engineers and designers typically possess and to the limited BID resources. Existing BID datasets are hand-curated and small, and scaling them up requires costly human annotations.

In this paper, we introduce BARcode (Biological Analogy Retriever), a search engine for automatically mining bio-inspirations from the web at scale. Using advances in natural language understanding and data programming, BARcode identifies potential inspirations for engineering challenges. Our experiments demonstrate that BARcode can retrieve inspirations that are valuable to engineers and designers tackling real-world problems, as well as recover famous historical BID examples. We release data and code; we view BARcode as a step towards addressing the challenges that have historically hindered the practical application of BID to engineering innovation.

Title: BSL: Understanding and Improving Softmax Loss for Recommendation. (arXiv:2312.12882v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12882
Code URL: https://github.com/junkangwu/bsl
Copy Paste: [[2312.12882]] BSL: Understanding and Improving Softmax Loss for Recommendation(http://arxiv.org/abs/2312.12882)
Summary:
Loss functions steer the optimization direction of recommendation models and are critical to model performance, but have received relatively little attention in recent recommendation research. Among various losses, we find Softmax loss (SL) stands out for not only achieving remarkable accuracy but also better robustness and fairness. Nevertheless, the current literature lacks a comprehensive explanation for the efficacy of SL. Toward addressing this research gap, we conduct theoretical analyses on SL and uncover three insights: 1) Optimizing SL is equivalent to performing Distributionally Robust Optimization (DRO) on the negative data, thereby learning against perturbations on the negative distribution and yielding robustness to noisy negatives. 2) Compared with other loss functions, SL implicitly penalizes the prediction variance, resulting in a smaller gap between predicted values and thus producing fairer results. Building on these insights, we further propose a novel loss function Bilateral SoftMax Loss (BSL) that extends the advantage of SL to both positive and negative sides. BSL augments SL by applying the same Log-Expectation-Exp structure to positive examples as is used for negatives, making the model robust to the noisy positives as well. Remarkably, BSL is simple and easy-to-implement -- requiring just one additional line of code compared to SL. Experiments on four real-world datasets and three representative backbones demonstrate the effectiveness of our proposal. The code is available at https://github.com/junkangwu/BSL

Title: NodeMixup: Tackling Under-Reaching for Graph Neural Networks. (arXiv:2312.13032v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13032
Code URL: null
Copy Paste: [[2312.13032]] NodeMixup: Tackling Under-Reaching for Graph Neural Networks(http://arxiv.org/abs/2312.13032)
Summary:
Graph Neural Networks (GNNs) have become mainstream methods for solving the semi-supervised node classification problem. However, due to the uneven location distribution of labeled nodes in the graph, labeled nodes are only accessible to a small portion of unlabeled nodes, leading to the under-reaching issue. In this study, we first reveal under-reaching by conducting an empirical investigation on various well-known graphs. Then, we demonstrate that under-reaching results in unsatisfactory distribution alignment between labeled and unlabeled nodes through systematic experimental analysis, significantly degrading GNNs' performance. To tackle under-reaching for GNNs, we propose an architecture-agnostic method dubbed NodeMixup. The fundamental idea is to (1) increase the reachability of labeled nodes by labeled-unlabeled pairs mixup, (2) leverage graph structures via fusing the neighbor connections of intra-class node pairs to improve performance gains of mixup, and (3) use neighbor label distribution similarity incorporating node degrees to determine sampling weights for node mixup. Extensive experiments demonstrate the efficacy of NodeMixup in assisting GNNs in handling under-reaching. The source code is available at https://github.com/WeigangLu/NodeMixup.

Title: AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation. (arXiv:2312.13010v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13010
Code URL: null
Copy Paste: [[2312.13010]] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation(http://arxiv.org/abs/2312.13010)
Summary:
The advancement of natural language processing (NLP) has been significantly boosted by the development of transformer-based large language models (LLMs). These models have revolutionized NLP tasks, particularly in code generation, aiding developers in creating software with enhanced efficiency. Despite their advancements, challenges in balancing code snippet generation with effective test case generation and execution persist. To address these issues, this paper introduces Multi-Agent Assistant Code Generation (AgentCoder), a novel solution comprising a multi-agent framework with specialized agents: the programmer agent, the test designer agent, and the test executor agent. During the coding procedure, the programmer agent will focus on the code generation and refinement based on the test executor agent's feedback. The test designer agent will generate test cases for the generated code, and the test executor agent will run the code with the test cases and write the feedback to the programmer. This collaborative system ensures robust code generation, surpassing the limitations of single-agent models and traditional methodologies. Our extensive experiments on 9 code generation models and 12 enhancement approaches showcase AgentCoder's superior performance over existing code generation models and prompt engineering techniques across various benchmarks. For example, AgentCoder achieves 77.4% and 89.1% pass@1 in HumanEval-ET and MBPP-ET with GPT-3.5, while SOTA baselines obtain only 69.5% and 63.0%.

Title: Optimizing Neural Networks with Gradient Lexicase Selection. (arXiv:2312.12606v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12606
Code URL: https://github.com/ld-ing/gradient-lexicase
Copy Paste: [[2312.12606]] Optimizing Neural Networks with Gradient Lexicase Selection(http://arxiv.org/abs/2312.12606)
Summary:
One potential drawback of using aggregated performance measurement in machine learning is that models may learn to accept higher errors on some training cases as compromises for lower errors on others, with the lower errors actually being instances of overfitting. This can lead to both stagnation at local optima and poor generalization. Lexicase selection is an uncompromising method developed in evolutionary computation, which selects models on the basis of sequences of individual training case errors instead of using aggregated metrics such as loss and accuracy. In this paper, we investigate how lexicase selection, in its general form, can be integrated into the context of deep learning to enhance generalization. We propose Gradient Lexicase Selection, an optimization framework that combines gradient descent and lexicase selection in an evolutionary fashion. Our experimental results demonstrate that the proposed method improves the generalization performance of various widely-used deep neural network architectures across three image classification benchmarks. Additionally, qualitative analysis suggests that our method assists networks in learning more diverse representations. Our source code is available on GitHub: https://github.com/ld-ing/gradient-lexicase.
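
For readers unfamiliar with lexicase selection, the selection step described above can be sketched in a few lines. This covers only the generic selection procedure from evolutionary computation, not the paper's Gradient Lexicase Selection framework; the function name and the toy error table are assumptions made for the example.

```python
import random


def lexicase_select(errors, rng):
    """Pick one candidate index by lexicase selection.

    errors[i][j] is the error of candidate i on training case j.
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # a fresh random case order for each selection event
    for case in cases:
        best = min(errors[i][case] for i in candidates)
        # Keep only the candidates that are best on this single case.
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)  # break remaining ties at random


# Hypothetical per-case errors for three candidates on three training cases.
errors = [
    [0, 1, 1],  # candidate 0: best on case 0 only
    [1, 0, 0],  # candidate 1: best on cases 1 and 2
    [1, 1, 1],  # candidate 2: never best on any case
]
rng = random.Random(0)
picks = [lexicase_select(errors, rng) for _ in range(200)]
print({i: picks.count(i) for i in sorted(set(picks))})
```

Candidate 0 can still be selected whenever case 0 comes first in the shuffled order, even though its aggregate error is worse than candidate 1's. Candidate 2 is filtered out by whichever case is considered first, so it is never selected.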

Title: Federated Learning with Extremely Noisy Clients via Negative Distillation. (arXiv:2312.12703v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12703
Code URL: https://github.com/linchen99/fedned
Copy Paste: [[2312.12703]] Federated Learning with Extremely Noisy Clients via Negative Distillation(http://arxiv.org/abs/2312.12703)
Summary:
Federated learning (FL) has shown remarkable success in cooperatively training deep models, while typically struggling with noisy labels. Advanced works propose to tackle label noise by a re-weighting strategy with a strong assumption, i.e., mild label noise. However, it may be violated in many real-world FL scenarios because of highly contaminated clients, resulting in extreme noise ratios, e.g., >90%. To tackle extremely noisy clients, we study the robustness of the re-weighting strategy, showing a pessimistic conclusion: minimizing the weight of clients trained over noisy data outperforms re-weighting strategies. To leverage models trained on noisy clients, we propose a novel approach, called negative distillation (FedNed). FedNed first identifies noisy clients and employs rather than discards the noisy clients in a knowledge distillation manner. In particular, clients identified as noisy ones are required to train models using noisy labels and pseudo-labels obtained by global models. The model trained on noisy labels serves as a 'bad teacher' in knowledge distillation, aiming to decrease the risk of providing incorrect information. Meanwhile, the model trained on pseudo-labels is involved in model aggregation if not identified as a noisy client. Consequently, through pseudo-labeling, FedNed gradually increases the trustworthiness of models trained on noisy clients, while leveraging all clients for model aggregation through negative distillation. To verify the efficacy of FedNed, we conduct extensive experiments under various settings, demonstrating that FedNed can consistently outperform baselines and achieve state-of-the-art performance. Our code is available at https://github.com/linChen99/FedNed.

Title: Near-Optimal Resilient Aggregation Rules for Distributed Learning Using 1-Center and 1-Mean Clustering with Outliers. (arXiv:2312.12835v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12835
Code URL: https://github.com/jerry907/aaai24-rashb
Copy Paste: [[2312.12835]] Near-Optimal Resilient Aggregation Rules for Distributed Learning Using 1-Center and 1-Mean Clustering with Outliers(http://arxiv.org/abs/2312.12835)
Summary:
Byzantine machine learning has garnered considerable attention in light of the unpredictable faults that can occur in large-scale distributed learning systems. The key to secure resilience against Byzantine machines in distributed learning is resilient aggregation mechanisms. Although abundant resilient aggregation rules have been proposed, they are designed in ad-hoc manners, imposing extra barriers on comparing, analyzing, and improving the rules across performance criteria. This paper studies near-optimal aggregation rules using clustering in the presence of outliers. Our outlier-robust clustering approach utilizes geometric properties of the update vectors provided by workers. Our analysis shows that constant approximations to the 1-center and 1-mean clustering problems with outliers provide near-optimal resilient aggregators for metric-based criteria, which have been proven to be crucial in the homogeneous and heterogeneous cases respectively. In addition, we discuss two contradicting types of attacks under which no single aggregation rule is guaranteed to improve upon the naive average. Based on the discussion, we propose a two-phase resilient aggregation framework. We run experiments for image classification using a non-convex loss function. The proposed algorithms outperform previously known aggregation rules by a large margin with both homogeneous and heterogeneous data distributions among non-faulty workers. Code and appendix are available at https://github.com/jerry907/AAAI24-RASHB.

Title: FedA3I: Annotation Quality-Aware Aggregation for Federated Medical Image Segmentation Against Heterogeneous Annotation Noise. (arXiv:2312.12838v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12838
Code URL: null
Copy Paste: [[2312.12838]] FedA3I: Annotation Quality-Aware Aggregation for Federated Medical Image Segmentation Against Heterogeneous Annotation Noise(http://arxiv.org/abs/2312.12838)
Summary:
Federated learning (FL) has emerged as a promising paradigm for training segmentation models on decentralized medical data, owing to its privacy-preserving property. However, existing research overlooks the prevalent annotation noise encountered in real-world medical datasets, which limits the performance ceilings of FL. In this paper, we, for the first time, identify and tackle this problem. For problem formulation, we propose a contour evolution for modeling non-independent and identically distributed (Non-IID) noise across pixels within each client and then extend it to the case of multi-source data to form a heterogeneous noise model (i.e., Non-IID annotation noise across clients). For robust learning from annotations with such two-level Non-IID noise, we emphasize the importance of data quality in model aggregation, allowing high-quality clients to have a greater impact on FL. To achieve this, we propose Federated learning with Annotation quAlity-aware AggregatIon, named FedA3I, by introducing a quality factor based on client-wise noise estimation. Specifically, noise estimation at each client is accomplished through the Gaussian mixture model and then incorporated into model aggregation in a layer-wise manner to up-weight high-quality clients. Extensive experiments on two real-world medical image segmentation datasets demonstrate the superior performance of FedA3I against the state-of-the-art approaches in dealing with cross-client annotation noise. The code is available at https://github.com/wnn2000/FedAAAI.

Title: Pyreal: A Framework for Interpretable ML Explanations. (arXiv:2312.13084v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13084
Code URL: null
Copy Paste: [[2312.13084]] Pyreal: A Framework for Interpretable ML Explanations(http://arxiv.org/abs/2312.13084)
Summary:
Users in many domains use machine learning (ML) predictions to help them make decisions. Effective ML-based decision-making often requires explanations of ML models and their predictions. While there are many algorithms that explain models, generating explanations in a format that is comprehensible and useful to decision-makers is a nontrivial task that can require extensive development overhead. We developed Pyreal, a highly extensible system with a corresponding Python implementation for generating a variety of interpretable ML explanations. Pyreal converts data and explanations between the feature spaces expected by the model, relevant explanation algorithms, and human users, allowing users to generate interpretable explanations in a low-code manner. Our studies demonstrate that Pyreal generates more useful explanations than existing systems while remaining both easy-to-use and efficient.

Title: LRS: Enhancing Adversarial Transferability through Lipschitz Regularized Surrogate. (arXiv:2312.13118v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13118
Code URL: https://github.com/trustaiot/lrs
Copy Paste: [[2312.13118]] LRS: Enhancing Adversarial Transferability through Lipschitz Regularized Surrogate(http://arxiv.org/abs/2312.13118)
Summary:
The transferability of adversarial examples is of central importance to transfer-based black-box adversarial attacks. Previous works for generating transferable adversarial examples focus on attacking given pretrained surrogate models, while the connections between surrogate models and adversarial transferability have been overlooked. In this paper, we propose Lipschitz Regularized Surrogate (LRS) for transfer-based black-box attacks, a novel approach that transforms surrogate models towards favorable adversarial transferability. Using such transformed surrogate models, any existing transfer-based black-box attack can run without any change, yet achieve much better performance. Specifically, we impose Lipschitz regularization on the loss landscape of surrogate models to enable a smoother and more controlled optimization process for generating more transferable adversarial examples. In addition, this paper also sheds light on the connection between the inner properties of surrogate models and adversarial transferability, where three factors are identified: smaller local Lipschitz constant, smoother loss landscape, and stronger adversarial robustness. We evaluate our proposed LRS approach by attacking state-of-the-art standard deep neural networks and defense models. The results demonstrate significant improvement on the attack success rates and transferability. Our code is available at https://github.com/TrustAIoT/LRS.

Title: Gappy local conformal auto-encoders for heterogeneous data fusion: in praise of rigidity. (arXiv:2312.13155v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13155
Code URL: null
Copy Paste: [[2312.13155]] Gappy local conformal auto-encoders for heterogeneous data fusion: in praise of rigidity(http://arxiv.org/abs/2312.13155)
Summary:
Fusing measurements from multiple, heterogeneous, partial sources, observing a common object or process, poses challenges due to the increasing availability of numbers and types of sensors. In this work we propose, implement and validate an end-to-end computational pipeline in the form of a multiple-auto-encoder neural network architecture for this task. The inputs to the pipeline are several sets of partial observations, and the result is a globally consistent latent space, harmonizing (rigidifying, fusing) all measurements. The key enabler is the availability of multiple slightly perturbed measurements of each instance (local measurement "bursts"), which allow us to estimate the local distortion induced by each instrument. We demonstrate the approach in a sequence of examples, starting with simple two-dimensional data sets and proceeding to a Wi-Fi localization problem and to the solution of a "dynamical puzzle" arising in spatio-temporal observations of the solutions of Partial Differential Equations.

chat

Title: ChatFDA: Medical Records Risk Assessment. (arXiv:2312.12746v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.12746
Code URL: https://github.com/autonlab/2023.hackauton
Copy Paste: [[2312.12746]] ChatFDA: Medical Records Risk Assessment(http://arxiv.org/abs/2312.12746)
Summary:
In healthcare, the emphasis on patient safety and the minimization of medical errors cannot be overstated. Despite concerted efforts, many healthcare systems, especially in low-resource regions, still grapple with preventing these errors effectively. This study explores a pioneering application aimed at addressing this challenge by assisting caregivers in gauging potential risks derived from medical notes. The application leverages data from openFDA, delivering real-time, actionable insights regarding prescriptions. Preliminary analyses conducted on the MIMIC-III dataset affirm a proof of concept highlighting a reduction in medical errors and an amplification in patient safety. This tool holds promise for drastically enhancing healthcare outcomes in settings with limited resources. To bolster reproducibility and foster further research, the codebase underpinning our methodology is accessible at https://github.com/autonlab/2023.hackAuton/tree/main/prescription_checker. This is a submission for the 30th HackAuton CMU.

Title: In Generative AI we Trust: Can Chatbots Effectively Verify Political Information?. (arXiv:2312.13096v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.13096
Code URL: null
Copy Paste: [[2312.13096]] In Generative AI we Trust: Can Chatbots Effectively Verify Political Information?(http://arxiv.org/abs/2312.13096)
Summary:
This article presents a comparative analysis of the ability of two large language model (LLM)-based chatbots, ChatGPT and Bing Chat, recently rebranded to Microsoft Copilot, to detect the veracity of political information. We use AI auditing methodology to investigate how chatbots evaluate true, false, and borderline statements on five topics: COVID-19, Russian aggression against Ukraine, the Holocaust, climate change, and LGBTQ+ related debates. We compare how the chatbots perform in high- and low-resource languages by using prompts in English, Russian, and Ukrainian. Furthermore, we explore the ability of chatbots to evaluate statements according to political communication concepts of disinformation, misinformation, and conspiracy theory, using definition-oriented prompts. We also systematically test how such evaluations are influenced by source bias, which we model by attributing specific claims to various political and social actors. The results show high performance of ChatGPT for the baseline veracity evaluation task, with 72 percent of the cases evaluated correctly on average across languages without pre-training. Bing Chat performed worse with a 67 percent accuracy. We observe significant disparities in how chatbots evaluate prompts in high- and low-resource languages and how they adapt their evaluations to political communication concepts, with ChatGPT providing more nuanced outputs than Bing Chat. Finally, we find that for some veracity detection-related tasks, the performance of chatbots varied depending on the topic of the statement or the source to which it is attributed. These findings highlight the potential of LLM-based chatbots in tackling different forms of false information in online environments, but also point to the substantial variation in terms of how such potential is realized due to specific factors, such as the language of the prompt or the topic.

retrieval augmented generation

rag

Title: Learning to Reweight for Graph Neural Network. (arXiv:2312.12475v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12475
Code URL: null
Copy Paste: [[2312.12475]] Learning to Reweight for Graph Neural Network(http://arxiv.org/abs/2312.12475)
Summary:
Graph Neural Networks (GNNs) show promising results for graph tasks. However, the generalization ability of existing GNNs degrades when there are distribution shifts between testing and training graph data. The main cause of this severe degradation is that GNNs are designed under the I.I.D. assumption. In such a setting, GNNs tend to exploit subtle statistical correlations in the training set for prediction, even when those correlations are spurious. In this paper, we study the problem of the generalization ability of GNNs in Out-Of-Distribution (OOD) settings. To solve this problem, we propose the Learning to Reweight for Generalizable Graph Neural Network (L2R-GNN) to enhance the generalization ability for achieving satisfactory performance on unseen testing graphs that have different distributions from the training graphs. We propose a novel nonlinear graph decorrelation method, which can substantially improve the out-of-distribution generalization ability and compares favorably to previous methods in restraining the over-reduced sample size. The variables of the graph representation are clustered based on the stability of the correlation, and the graph decorrelation method learns weights to remove correlations between the variables of different clusters rather than any two variables. In addition, we introduce an effective stochastic algorithm based on bi-level optimization for the L2R-GNN framework, which facilitates simultaneously learning the optimal weights and GNN parameters, and avoids the overfitting problem. Experimental results show that L2R-GNN greatly outperforms baselines on various graph prediction benchmarks under distribution shifts.

Title: SCoTTi: Save Computation at Training Time with an adaptive framework. (arXiv:2312.12483v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12483
Code URL: https://github.com/liziyu403/scotti-save-computation-at-training-time-with-an-adaptive-framework
Copy Paste: [[2312.12483]] SCoTTi: Save Computation at Training Time with an adaptive framework(http://arxiv.org/abs/2312.12483)
Summary:
On-device training is an emerging approach in machine learning where models are trained on edge devices, aiming to enhance privacy protection and real-time performance. However, edge devices typically possess restricted computational power and resources, making it challenging to perform computationally intensive model training tasks. Consequently, reducing resource consumption during training has become a pressing concern in this field. To this end, we propose SCoTTi (Save Computation at Training Time), an adaptive framework that addresses the aforementioned challenge. It leverages an optimizable threshold parameter to effectively reduce the number of neuron updates during training which corresponds to a decrease in memory and computation footprint. Our proposed approach demonstrates superior performance compared to the state-of-the-art methods regarding computational resource savings on various commonly employed benchmarks and popular architectures, including ResNets, MobileNet, and Swin-T.

Title: H-ensemble: An Information Theoretic Approach to Reliable Few-Shot Multi-Source-Free Transfer. (arXiv:2312.12489v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12489
Code URL: null
Copy Paste: [[2312.12489]] H-ensemble: An Information Theoretic Approach to Reliable Few-Shot Multi-Source-Free Transfer(http://arxiv.org/abs/2312.12489)
Summary:
Multi-source transfer learning is an effective solution to data scarcity by utilizing multiple source tasks for the learning of the target task. However, access to source data and model details is limited in the era of commercial models, giving rise to the setting of multi-source-free (MSF) transfer learning that aims to leverage source domain knowledge without such access. As a newly defined problem paradigm, MSF transfer learning remains largely underexplored and not clearly formulated. In this work, we adopt an information theoretic perspective on it and propose a framework named H-ensemble, which dynamically learns the optimal linear combination, or ensemble, of source models for the target task, using a generalization of maximal correlation regression. The ensemble weights are optimized by maximizing an information theoretic metric for transferability. Compared to previous works, H-ensemble is characterized by: 1) its adaptability to a novel and realistic MSF setting for few-shot target tasks, 2) theoretical reliability, 3) a lightweight structure that is easy to interpret and adapt. Our method is empirically validated by ablation studies, along with extensive comparative analysis with other task ensemble and transfer learning methods. We show that the H-ensemble can successfully learn the optimal task ensemble, as well as outperform prior art.

Title: Convolutional Channel-wise Competitive Learning for the Forward-Forward Algorithm. (arXiv:2312.12668v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12668
Code URL: https://github.com/andreaspapac/cwcomp
Copy Paste: [[2312.12668]] Convolutional Channel-wise Competitive Learning for the Forward-Forward Algorithm(http://arxiv.org/abs/2312.12668)
Summary:
The Forward-Forward (FF) Algorithm has been recently proposed to alleviate the issues of backpropagation (BP) commonly used to train deep neural networks. However, its current formulation exhibits limitations such as the generation of negative data, slower convergence, and inadequate performance on complex tasks. In this paper, we take the main ideas of FF and improve them by leveraging channel-wise competitive learning in the context of convolutional neural networks for image classification tasks. A layer-wise loss function is introduced that promotes competitive learning and eliminates the need for negative data construction. To enhance both the learning of compositional features and feature space partitioning, a channel-wise feature separator and extractor block is proposed that complements the competitive learning process. Our method outperforms recent FF-based models on image classification tasks, achieving testing errors of 0.58%, 7.69%, 21.89%, and 48.77% on MNIST, Fashion-MNIST, CIFAR-10 and CIFAR-100 respectively. Our approach bridges the performance gap between FF learning and BP methods, indicating the potential of our proposed approach to learn useful representations in a layer-wise modular fashion, enabling more efficient and flexible learning.

Title: On the Role of Server Momentum in Federated Learning. (arXiv:2312.12670v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12670
Code URL: null
Copy Paste: [[2312.12670]] On the Role of Server Momentum in Federated Learning(http://arxiv.org/abs/2312.12670)
Summary:
Federated Averaging (FedAvg) is known to experience convergence issues when encountering significant client system heterogeneity and data heterogeneity. Server momentum has been proposed as an effective mitigation. However, existing server momentum works are restrictive in the momentum formulation, do not properly schedule hyperparameters and focus only on system-homogeneous settings, which leaves the role of server momentum still an under-explored problem. In this paper, we propose a general framework for server momentum that (a) covers a large class of momentum schemes that are unexplored in federated learning (FL), (b) enables a popular stagewise hyperparameter scheduler, (c) allows heterogeneous and asynchronous local computing. We provide rigorous convergence analysis for the proposed framework. To the best of our knowledge, this is the first work that thoroughly analyzes the performance of server momentum with a hyperparameter scheduler and system heterogeneity. Extensive experiments validate the effectiveness of our proposed framework.
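
For readers unfamiliar with the idea, here is a minimal sketch of one round of FedAvg with plain heavy-ball server momentum: the server treats the averaged client update as a pseudo-gradient, accumulates it in a momentum buffer, and then steps the global model. This is only the simplest scheme of this kind, not the general framework proposed in the paper; the function name `server_momentum_round`, the defaults `beta=0.9` and `server_lr=1.0`, and the toy weights are illustrative assumptions.

```python
def server_momentum_round(global_w, client_ws, momentum, beta=0.9, server_lr=1.0):
    """One server step of FedAvg with heavy-ball momentum (illustrative only).

    global_w:  current global weights (list of floats).
    client_ws: locally trained weights, one list per client.
    momentum:  server momentum buffer from the previous round.
    """
    n = len(client_ws)
    # Pseudo-gradient: average client model minus the current global model.
    delta = [
        sum(cw[i] for cw in client_ws) / n - global_w[i]
        for i in range(len(global_w))
    ]
    # Accumulate the update direction, then move the global model along it.
    new_momentum = [beta * m + d for m, d in zip(momentum, delta)]
    new_global = [w + server_lr * m for w, m in zip(global_w, new_momentum)]
    return new_global, new_momentum


# Toy run: two rounds in which the clients return the same weights.
g0, m0 = [0.0, 0.0], [0.0, 0.0]
client_weights = [[1.0, 2.0], [3.0, 2.0]]
g1, m1 = server_momentum_round(g0, client_weights, m0)
g2, m2 = server_momentum_round(g1, client_weights, m1)
```

In the second round the averaged client model equals the global model, so the pseudo-gradient is zero, yet the global weights still move because of the carried-over momentum. That overshoot is the kind of behavior a hyperparameter schedule has to control.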

Title: Towards Machines that Trust: AI Agents Learn to Trust in the Trust Game. (arXiv:2312.12868v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.12868
Code URL: null
Copy Paste: [[2312.12868]] Towards Machines that Trust: AI Agents Learn to Trust in the Trust Game(http://arxiv.org/abs/2312.12868)
Summary:
Widely considered a cornerstone of human morality, trust shapes many aspects of human social interactions. In this work, we present a theoretical analysis of the trust game, the canonical task for studying trust in behavioral and brain sciences, along with simulation results supporting our analysis. Specifically, leveraging reinforcement learning (RL) to train our AI agents, we systematically investigate learning trust under various parameterizations of this task. Our theoretical analysis, corroborated by the simulation results presented, provides a mathematical basis for the emergence of trust in the trust game.
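
The payoff structure of the trust game in its common textbook form can be written down in a few lines: an investor sends part of an endowment, the amount is multiplied in transit, and a trustee returns some fraction of the multiplied amount. The sketch below assumes that standard form; the exact parameterizations studied in the paper are not given in the abstract, and the function name and the multiplier of 3 in the example are illustrative choices.

```python
def trust_game_payoffs(endowment, sent, multiplier, returned_fraction):
    """Return (investor_payoff, trustee_payoff) for one round of the trust game."""
    if not 0 <= sent <= endowment:
        raise ValueError("sent must lie between 0 and the endowment")
    if not 0 <= returned_fraction <= 1:
        raise ValueError("returned_fraction must lie in [0, 1]")
    pot = multiplier * sent              # amount received by the trustee
    returned = returned_fraction * pot   # amount sent back to the investor
    return endowment - sent + returned, pot - returned


# Full trust with an even split versus no trust at all.
full_trust = trust_game_payoffs(10, 10, 3, 0.5)
no_trust = trust_game_payoffs(10, 0, 3, 0.5)
```

Under these assumptions both players end up better off with full trust and an even split than with no trust, while a trustee who returns nothing leaves the investor worse off. This tension is what makes the game a test of trust.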

Title: Robust Machine Learning by Transforming and Augmenting Imperfect Training Data. (arXiv:2312.12597v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12597
Code URL: null
Copy Paste: [[2312.12597]] Robust Machine Learning by Transforming and Augmenting Imperfect Training Data(http://arxiv.org/abs/2312.12597)
Summary:
Machine Learning (ML) is an expressive framework for turning data into computer programs. Across many problem domains -- both in industry and policy settings -- the types of computer programs needed for accurate prediction or optimal control are difficult to write by hand. On the other hand, collecting instances of desired system behavior may be relatively more feasible. This makes ML broadly appealing, but also induces data sensitivities that often manifest as unexpected failure modes during deployment. In this sense, the training data available tend to be imperfect for the task at hand. This thesis explores several data sensitivities of modern machine learning and how to address them. We begin by discussing how to prevent ML from codifying prior human discrimination measured in the training data, where we take a fair representation learning approach. We then discuss the problem of learning from data containing spurious features, which provide predictive fidelity during training but are unreliable upon deployment. Here we observe that insofar as standard training methods tend to learn such features, this propensity can be leveraged to search for partitions of training data that expose this inconsistency, ultimately promoting learning algorithms invariant to spurious features. Finally, we turn our attention to reinforcement learning from data with insufficient coverage over all possible states and actions. To address the coverage issue, we discuss how causal priors can be used to model the single-step dynamics of the setting where data are collected. This enables a new type of data augmentation where observed trajectories are stitched together to produce new but plausible counterfactual trajectories.

Title: Incremental Semi-supervised Federated Learning for Health Inference via Mobile Sensing. (arXiv:2312.12666v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12666
Code URL: null
Copy Paste: [[2312.12666]] Incremental Semi-supervised Federated Learning for Health Inference via Mobile Sensing(http://arxiv.org/abs/2312.12666)
Summary:
Mobile sensing is a promising solution for health inference problems (e.g., influenza-like symptom recognition), leveraging diverse smart sensors to capture fine-grained information about human behaviors and ambient contexts. Centralized training of machine learning models can place mobile users' sensitive information at privacy risk due to data breaches and misuse. Federated Learning (FL) enables mobile devices to collaboratively learn global models without the exposure of local private data. However, there are challenges of on-device FL deployment using mobile sensing: 1) long-term and continuously collected mobile sensing data may exhibit domain shifts as sensing objects (e.g., humans) have varying behaviors as a result of internal and/or external stimuli; 2) model retraining using all available data may increase computation and memory burden; and 3) the sparsity of annotated crowd-sourced data causes supervised FL to lack robustness. In this work, we propose FedMobile, an incremental semi-supervised federated learning algorithm, to train models in a semi-supervised, incremental manner in a decentralized online fashion. We evaluate FedMobile using a real-world mobile sensing dataset for influenza-like symptom recognition. Our empirical results show that FedMobile-trained models achieve the best results in comparison to the selected baseline methods.

Title: DGCLUSTER: A Neural Framework for Attributed Graph Clustering via Modularity Maximization. (arXiv:2312.12697v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12697
Code URL: null
Copy Paste: [[2312.12697]] DGCLUSTER: A Neural Framework for Attributed Graph Clustering via Modularity Maximization(http://arxiv.org/abs/2312.12697)
Summary:
Graph clustering is a fundamental and challenging task in the field of graph mining where the objective is to group the nodes into clusters taking into consideration the topology of the graph. It has several applications in diverse domains spanning social network analysis, recommender systems, computer vision, and bioinformatics. In this work, we propose a novel method, DGCluster, which primarily optimizes the modularity objective using graph neural networks and scales linearly with the graph size. Our method does not require the number of clusters to be specified as a part of the input and can also leverage the availability of auxiliary node level information. We extensively test DGCluster on several real-world datasets of varying sizes, across multiple popular cluster quality metrics. Our approach consistently outperforms the state-of-the-art methods, demonstrating significant performance gains in almost all settings.
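
For reference, the standard (Newman) modularity of a partition of an undirected graph can be computed per community as Q = sum over c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The sketch below evaluates this quantity for a fixed partition; it does not reproduce DGCluster's neural optimization, and the function name and the two-triangle example graph are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph must contain at least one edge")
    internal = defaultdict(int)    # edges with both endpoints in a community
    degree_sum = defaultdict(int)  # total degree of the nodes in a community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "a" for node in range(6)}
q_split = modularity(edges, two_clusters)
q_merged = modularity(edges, one_cluster)
```

Splitting the graph at the bridge gives Q = 5/14, while putting every node in one community gives Q = 0, which is why maximizing modularity favors the two-triangle partition without being told the number of clusters.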

Title: FSscore: A Machine Learning-based Synthetic Feasibility Score Leveraging Human Expertise. (arXiv:2312.12737v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12737
Code URL: null
Copy Paste: [[2312.12737]] FSscore: A Machine Learning-based Synthetic Feasibility Score Leveraging Human Expertise(http://arxiv.org/abs/2312.12737)
Summary:
Determining whether a molecule can be synthesized is crucial for many aspects of chemistry and drug discovery, allowing prioritization of experimental work and ranking molecules in de novo design tasks. Existing scoring approaches to assess synthetic feasibility struggle to extrapolate to out-of-distribution chemical spaces or fail to discriminate based on minor differences such as chirality that might be obvious to trained chemists. This work aims to address these limitations by introducing the Focused Synthesizability score (FSscore), which learns to rank structures based on binary preferences using a graph attention network. First, a baseline trained on an extensive set of reactant-product pairs is established that subsequently is fine-tuned with expert human feedback on a chemical space of interest. Fine-tuning on focused datasets improves performance on these chemical scopes over the pre-trained model exhibiting moderate performance and generalizability. This enables distinguishing hard- from easy-to-synthesize molecules and improving the synthetic accessibility of generative model outputs. On very complex scopes with limited labels, achieving satisfactory gains remains challenging. The FSscore showcases how human expert feedback can be utilized to optimize the assessment of synthetic feasibility for a variety of applications.

Title: Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches. (arXiv:2312.12871v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12871
Code URL: null
Copy Paste: [[2312.12871]] Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches(http://arxiv.org/abs/2312.12871)
Summary:
The selection of the assumed effect size (AES) critically determines the duration of an experiment, and hence its accuracy and efficiency. Traditionally, experimenters determine AES based on domain knowledge. However, this method becomes impractical for online experimentation services managing numerous experiments, and a more automated approach is hence in great demand. We initiate the study of data-driven AES selection for online experimentation services by introducing two solutions. The first employs a three-layer Gaussian Mixture Model considering the heteroskedasticity across experiments, and it seeks to estimate the true expected effect size among positive experiments. The second method, grounded in utility theory, aims to determine the optimal effect size by striking a balance between the experiment's cost and the precision of decision-making. Through comparisons with baseline methods using both simulated and real data, we showcase the superior performance of the proposed approaches.
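
To see why the assumed effect size drives experiment duration, consider the textbook sample-size formula for a two-sided, two-sample z-test with equal arms: n per arm = 2 (z_{1-alpha/2} + z_{1-beta})^2 sigma^2 / delta^2, where delta is the assumed effect size. The sketch below applies that formula; it is a generic power calculation, not either of the paper's two methods, and the function name, the defaults `alpha=0.05` and `power=0.8`, and the unit-variance example are assumptions.

```python
from statistics import NormalDist


def samples_per_arm(effect_size, sigma, alpha=0.05, power=0.8):
    """Required sample size per arm for a two-sided, two-sample z-test."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / effect_size ** 2


# Halving the assumed effect size quadruples the required sample size.
n_large_effect = samples_per_arm(effect_size=0.10, sigma=1.0)
n_small_effect = samples_per_arm(effect_size=0.05, sigma=1.0)
```

Because the required sample size scales with 1 / delta^2, an overly optimistic assumed effect size produces an underpowered experiment, while an overly pessimistic one makes it run far longer than needed.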

Title: Stability of Graph Convolutional Neural Networks through the lens of small perturbation analysis. (arXiv:2312.12934v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.12934
Code URL: null
Copy Paste: [[2312.12934]] Stability of Graph Convolutional Neural Networks through the lens of small perturbation analysis(http://arxiv.org/abs/2312.12934)
Summary:
In this work, we study the problem of stability of Graph Convolutional Neural Networks (GCNs) under random small perturbations in the underlying graph topology, i.e. under a limited number of insertions or deletions of edges. We derive a novel bound on the expected difference between the outputs of unperturbed and perturbed GCNs. The proposed bound explicitly depends on the magnitude of the perturbation of the eigenpairs of the Laplacian matrix, and the perturbation explicitly depends on which edges are inserted or deleted. Then, we provide a quantitative characterization of the effect of perturbing specific edges on the stability of the network. We leverage tools from small perturbation analysis to express the bounds in closed, albeit approximate, form, in order to enhance interpretability of the results, without the need to compute any perturbed shift operator. Finally, we numerically evaluate the effectiveness of the proposed bound.

Title: AutoXPCR: Automated Multi-Objective Model Selection for Time Series Forecasting. (arXiv:2312.13038v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13038
Code URL: null
Copy Paste: [[2312.13038]] AutoXPCR: Automated Multi-Objective Model Selection for Time Series Forecasting(http://arxiv.org/abs/2312.13038)
Summary:
Automated machine learning (AutoML) streamlines the creation of ML models. While most methods select the "best" model based on predictive quality, it's crucial to acknowledge other aspects, such as interpretability and resource consumption. This holds particular importance in the context of deep neural networks (DNNs), as these models are often perceived as computationally intensive black boxes. In the challenging domain of time series forecasting, DNNs achieve stunning results, but specialized approaches for automatically selecting models are scarce. In this paper, we propose AutoXPCR - a novel method for automated and explainable multi-objective model selection. Our approach leverages meta-learning to estimate any model's performance along PCR criteria, which encompass (P)redictive error, (C)omplexity, and (R)esource demand. Explainability is addressed on multiple levels, as our interactive framework can prioritize less complex models and provide by-product explanations of recommendations. We demonstrate practical feasibility by deploying AutoXPCR on over 1000 configurations across 114 data sets from various domains. Our method clearly outperforms other model selection approaches - on average, it only requires 20% of computation costs for recommending models with 90% of the best-possible quality.

Title: Learning Fair Policies for Multi-stage Selection Problems from Observational Data. (arXiv:2312.13173v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.13173
Code URL: null
Copy Paste: [[2312.13173]] Learning Fair Policies for Multi-stage Selection Problems from Observational Data(http://arxiv.org/abs/2312.13173)
Summary:
We consider the problem of learning fair policies for multi-stage selection problems from observational data. This problem arises in several high-stakes domains such as company hiring, loan approval, or bail decisions where outcomes (e.g., career success, loan repayment, recidivism) are only observed for those selected. We propose a multi-stage framework that can be augmented with various fairness constraints, such as demographic parity or equal opportunity. This problem is a highly intractable infinite chance-constrained program involving the unknown joint distribution of covariates and outcomes. Motivated by the potential impact of selection decisions on people's lives and livelihoods, we propose to focus on interpretable linear selection rules. Leveraging tools from causal inference and sample average approximation, we obtain an asymptotically consistent solution to this selection problem by solving a mixed binary conic optimization problem, which can be solved using standard off-the-shelf solvers. We conduct extensive computational experiments on a variety of datasets adapted from the UCI repository on which we show that our proposed approaches can achieve an 11.6% improvement in precision and a 38% reduction in the measure of unfairness compared to the existing selection policy.
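
One of the fairness notions named above, demographic parity, asks that selection rates be equal across groups. A simple way to measure a violation is the gap between the highest and lowest group selection rate, sketched below. This shows only the metric for a single selection stage, not the paper's multi-stage optimization; the function name and the toy data are illustrative assumptions.

```python
from collections import defaultdict


def demographic_parity_gap(selected, groups):
    """Largest difference in selection rate between any two groups.

    selected: list of 0/1 decisions, one per candidate.
    groups:   list of group labels, aligned with `selected`.
    """
    counts = defaultdict(int)  # candidates per group
    picks = defaultdict(int)   # selected candidates per group
    for decision, group in zip(selected, groups):
        counts[group] += 1
        picks[group] += decision
    rates = [picks[g] / counts[g] for g in counts]
    return max(rates) - min(rates)


# Group "a" is selected at rate 2/3, group "b" at rate 1/3.
decisions = [1, 0, 1, 1, 0, 0]
labels = ["a", "a", "a", "b", "b", "b"]
gap = demographic_parity_gap(decisions, labels)
```

A gap of zero means every group is selected at the same rate; a fairness constraint of this type would bound the gap by a chosen tolerance.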

2023-12-21

language model

Title: When Parameter-efficient Tuning Meets General-purpose Vision-language Models. (arXiv:2312.12458v1 [cs.CL])

Title: Towards Better Serialization of Tabular Data for Few-shot Classification. (arXiv:2312.12464v1 [cs.LG])

Title: A Performance Evaluation of a Quantized Large Language Model on Various Smartphones. (arXiv:2312.12472v1 [cs.LG])

Title: Mini-GPTs: Efficient Large Language Models through Contextual Pruning. (arXiv:2312.12682v1 [cs.CL])

Title: ALMANACS: A Simulatability Benchmark for Language Model Explainability. (arXiv:2312.12747v1 [cs.LG])

Title: MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models. (arXiv:2312.12806v1 [cs.CL])

Title: Language Resources for Dutch Large Language Modelling. (arXiv:2312.12852v1 [cs.CL])

Title: HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments. (arXiv:2312.13193v1 [cs.CL])

Title: Learning and Forgetting Unsafe Examples in Large Language Models. (arXiv:2312.12736v1 [cs.CL])

Title: Fine-tuning Large Language Models for Adaptive Machine Translation. (arXiv:2312.12740v1 [cs.CL])

Title: CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models. (arXiv:2312.12853v1 [cs.CL])

Title: Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors. (arXiv:2312.12918v1 [cs.CL])

Title: Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest. (arXiv:2312.12989v1 [cs.LG])

Title: Machine Mindset: An MBTI Exploration of Large Language Models. (arXiv:2312.12999v1 [cs.CL])

Title: Retrieval-augmented Multilingual Knowledge Editing. (arXiv:2312.13040v1 [cs.CL])

Title: Exploring Multimodal Large Language Models for Radiology Report Error-checking. (arXiv:2312.13103v1 [cs.CL])

Title: Contextual Code Switching for Machine Translation using Language Models. (arXiv:2312.13179v1 [cs.CL])

Title: LlaMaVAE: Guiding Large Language Model Generation via Continuous Latent Sentence Spaces. (arXiv:2312.13208v1 [cs.CL])

Title: PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU. (arXiv:2312.12456v1 [cs.LG])

gpt

Title: Can Transformers Learn Sequential Function Classes In Context?. (arXiv:2312.12655v1 [cs.LG])

Title: Response Enhanced Semi-Supervised Dialogue Query Generation. (arXiv:2312.12713v1 [cs.CL])

llm

Title: Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data. (arXiv:2312.12832v1 [cs.CL])

Title: Parameterized Projected Bellman Operator. (arXiv:2312.12869v1 [cs.LG])

Title: Building a Llama2-finetuned LLM for Odia Language Utilizing Domain Knowledge Instruction Set. (arXiv:2312.12624v1 [cs.CL])

Title: Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?. (arXiv:2312.12683v1 [cs.CL])

Title: Enhancing Consistency in Multimodal Dialogue System Using LLM with Dialogue Scenario. (arXiv:2312.12808v1 [cs.CL])

long context

lora

Title: Is post-editing really faster than human translation?. (arXiv:2312.12660v1 [cs.CL])

Title: Principled Weight Initialisation for Input-Convex Neural Networks. (arXiv:2312.12474v1 [cs.LG])

Title: Trust, But Verify: A Survey of Randomized Smoothing Techniques. (arXiv:2312.12608v1 [cs.LG])

hallucination

prompt

Title: Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models. (arXiv:2312.12487v1 [cs.LG])

code

Title: Survey on Trustworthy Graph Neural Networks: From A Causal Perspective. (arXiv:2312.12477v1 [cs.LG])

Title: Imitation of Life: A Search Engine for Biologically Inspired Design. (arXiv:2312.12681v1 [cs.CL])

Title: BSL: Understanding and Improving Softmax Loss for Recommendation. (arXiv:2312.12882v1 [cs.LG])

Title: NodeMixup: Tackling Under-Reaching for Graph Neural Networks. (arXiv:2312.13032v1 [cs.LG])

Title: AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation. (arXiv:2312.13010v1 [cs.CL])

Title: Optimizing Neural Networks with Gradient Lexicase Selection. (arXiv:2312.12606v1 [cs.LG])

Title: Federated Learning with Extremely Noisy Clients via Negative Distillation. (arXiv:2312.12703v1 [cs.LG])

Title: Near-Optimal Resilient Aggregation Rules for Distributed Learning Using 1-Center and 1-Mean Clustering with Outliers. (arXiv:2312.12835v1 [cs.LG])

Title: FedA3I: Annotation Quality-Aware Aggregation for Federated Medical Image Segmentation Against Heterogeneous Annotation Noise. (arXiv:2312.12838v1 [cs.LG])

Title: Pyreal: A Framework for Interpretable ML Explanations. (arXiv:2312.13084v1 [cs.LG])

Title: LRS: Enhancing Adversarial Transferability through Lipschitz Regularized Surrogate. (arXiv:2312.13118v1 [cs.LG])

Title: Gappy local conformal auto-encoders for heterogeneous data fusion: in praise of rigidity. (arXiv:2312.13155v1 [cs.LG])

chat

Title: ChatFDA: Medical Records Risk Assessment. (arXiv:2312.12746v1 [cs.CL])

Title: In Generative AI we Trust: Can Chatbots Effectively Verify Political Information? (arXiv:2312.13096v1 [cs.CL])

retrieval augmented generation

rag

Title: Learning to Reweight for Graph Neural Network. (arXiv:2312.12475v1 [cs.LG])

Title: SCoTTi: Save Computation at Training Time with an adaptive framework. (arXiv:2312.12483v1 [cs.LG])

Title: H-ensemble: An Information Theoretic Approach to Reliable Few-Shot Multi-Source-Free Transfer. (arXiv:2312.12489v1 [cs.LG])

Title: Convolutional Channel-wise Competitive Learning for the Forward-Forward Algorithm. (arXiv:2312.12668v1 [cs.LG])

Title: On the Role of Server Momentum in Federated Learning. (arXiv:2312.12670v1 [cs.LG])

Title: Towards Machines that Trust: AI Agents Learn to Trust in the Trust Game. (arXiv:2312.12868v1 [cs.AI])

Title: Robust Machine Learning by Transforming and Augmenting Imperfect Training Data. (arXiv:2312.12597v1 [cs.LG])

Title: Incremental Semi-supervised Federated Learning for Health Inference via Mobile Sensing. (arXiv:2312.12666v1 [cs.LG])

Title: DGCLUSTER: A Neural Framework for Attributed Graph Clustering via Modularity Maximization. (arXiv:2312.12697v1 [cs.LG])

Title: FSscore: A Machine Learning-based Synthetic Feasibility Score Leveraging Human Expertise. (arXiv:2312.12737v1 [cs.LG])

Title: Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches. (arXiv:2312.12871v1 [cs.LG])

Title: Stability of Graph Convolutional Neural Networks through the lens of small perturbation analysis. (arXiv:2312.12934v1 [cs.LG])

Title: AutoXPCR: Automated Multi-Objective Model Selection for Time Series Forecasting. (arXiv:2312.13038v1 [cs.LG])

Title: Learning Fair Policies for Multi-stage Selection Problems from Observational Data. (arXiv:2312.13173v1 [cs.LG])

multi-run

chain-of-thought

tree-of-thought