language model

Title: Empowering Autonomous Driving with Large Language Models: A Safety Perspective. (arXiv:2312.00812v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.00812
Code URL: null
Copy Paste: [[2312.00812]] Empowering Autonomous Driving with Large Language Models: A Safety Perspective(http://arxiv.org/abs/2312.00812)
Summary:
Autonomous Driving (AD) faces crucial hurdles for commercial launch, notably in the form of diminished public trust and safety concerns from long-tail unforeseen driving scenarios. This predicament is due to the limitations of deep neural networks in AD software, which struggle with interpretability and exhibit poor generalization capabilities in out-of-distribution and uncertain scenarios. To this end, this paper advocates for the integration of Large Language Models (LLMs) into the AD system, leveraging their robust common-sense knowledge, reasoning abilities, and human-interaction capabilities. The proposed approach deploys the LLM as an intelligent decision-maker in planning, incorporating safety verifiers for contextual safety learning to enhance overall AD performance and safety. We present results from two case studies that affirm the efficacy of our approach. We further discuss the potential integration of LLMs into other AD software components, including perception, prediction, and simulation. Despite the observed challenges in the case studies, the integration of LLMs is promising and beneficial for reinforcing both safety and performance in AD.
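The summary describes an LLM that proposes planning decisions and a safety verifier that checks them. The loop below is a minimal sketch of that propose-verify-retry pattern, not the authors' system: the `verify` rule (a minimum time gap to a lead vehicle), the action dictionary format, the retry count, and the stop-in-lane fallback are all hypothetical, and `propose` stands in for whatever LLM call is used.

```python
from typing import Callable, Optional

Action = dict  # hypothetical format, e.g. {"maneuver": "keep_lane", "target_speed": 12.0}


def verify(action: Action, gap_m: float, min_time_gap_s: float = 2.0) -> Optional[str]:
    """Hypothetical safety rule: at the target speed (m/s), the time gap to the
    lead vehicle must be at least min_time_gap_s. Returns a reason if unsafe."""
    speed = action["target_speed"]
    if speed > 0 and gap_m / speed < min_time_gap_s:
        return f"time gap {gap_m / speed:.1f}s below {min_time_gap_s}s"
    return None


def plan(propose: Callable[[str], Action], scene: str, gap_m: float,
         max_retries: int = 2) -> Action:
    """Ask the (stubbed) LLM for an action; on rejection, re-prompt with the
    verifier's reason; after max_retries failures, use a conservative fallback."""
    prompt = scene
    for _ in range(max_retries + 1):
        action = propose(prompt)
        reason = verify(action, gap_m)
        if reason is None:
            return action
        prompt = f"{scene}\nPrevious proposal rejected: {reason}. Propose a safer action."
    return {"maneuver": "keep_lane", "target_speed": 0.0}  # hypothetical fallback


# Stub "LLM" that slows down once it sees verifier feedback in the prompt.
def fake_llm(prompt: str) -> Action:
    return {"maneuver": "keep_lane", "target_speed": 8.0 if "rejected" in prompt else 15.0}


print(plan(fake_llm, "Lead vehicle 20 m ahead.", gap_m=20.0))
```

Feeding the rejection reason back into the prompt is one simple way to let the model learn from the verifier in context; a real system would use verifiers grounded in the vehicle's dynamics and scene.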

Title: Large Language Models for Travel Behavior Prediction. (arXiv:2312.00819v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00819
Code URL: null
Copy Paste: [[2312.00819]] Large Language Models for Travel Behavior Prediction(http://arxiv.org/abs/2312.00819)
Summary:
Travel behavior prediction is a fundamental task in transportation demand management. The conventional methods for travel behavior prediction rely on numerical data to construct mathematical models and calibrate model parameters to represent human preferences. Recent advances in large language models (LLMs) have shown strong reasoning abilities for solving complex problems. In this study, we propose to use LLMs to predict travel behavior with prompt engineering, without data-based parameter learning. Specifically, we carefully design prompts that include 1) task description, 2) travel characteristics, 3) individual attributes, and 4) guides of thinking with domain knowledge, and ask the LLMs to predict an individual's travel behavior and explain the results. We select the travel mode choice task as a case study. Results show that, though no training samples are provided, LLM-based predictions achieve accuracy and F1-score competitive with canonical supervised learning methods such as multinomial logit, random forest, and neural networks. LLMs can also output reasons that support their predictions. However, although the output explanations are reasonable in most cases, we still observe some that violate logic or contain hallucinations.
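The four-part prompt design can be sketched as a plain string template. This is a minimal illustration under stated assumptions: the field names, the list of candidate modes, and the wording are hypothetical, not the authors' template, and the resulting string would be passed to whichever LLM API is in use (not shown).

```python
def build_mode_choice_prompt(trip: dict, person: dict, hints: list) -> str:
    """Assemble the four components named in the summary:
    task description, travel characteristics, individual attributes,
    and guides of thinking. All wording here is illustrative."""
    task = (
        "Task: predict which travel mode this individual will choose "
        "(walk, bike, transit, or car) and explain your reasoning."
    )
    trip_lines = "\n".join(f"- {k}: {v}" for k, v in trip.items())
    person_lines = "\n".join(f"- {k}: {v}" for k, v in person.items())
    hint_lines = "\n".join(f"- {h}" for h in hints)
    return (
        f"{task}\n\n"
        f"Travel characteristics:\n{trip_lines}\n\n"
        f"Individual attributes:\n{person_lines}\n\n"
        f"Guides of thinking:\n{hint_lines}\n\n"
        "Answer with the mode first, then the reason."
    )


# Hypothetical example record.
prompt = build_mode_choice_prompt(
    trip={"distance_km": 3.2, "transit_time_min": 25},
    person={"age": 34, "car_owner": "yes"},
    hints=["Longer distances reduce the appeal of walking."],
)
print(prompt)
```

Asking for the mode first makes the answer easy to parse for accuracy and F1 scoring, while the trailing reason captures the explanation the summary refers to.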

Title: Exploring the Robustness of Decentralized Training for Large Language Models. (arXiv:2312.00843v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00843
Code URL: null
Copy Paste: [[2312.00843]] Exploring the Robustness of Decentralized Training for Large Language Models(http://arxiv.org/abs/2312.00843)
Summary:
Decentralized training of large language models has emerged as an effective way to democratize this technology. However, the potential threats associated with this approach have not been carefully discussed, which would hinder the development of decentralized training infrastructures. This paper aims to initiate discussion towards this end by exploring the robustness of decentralized training from three main perspectives. First, we demonstrate the vulnerabilities inherent in decentralized training frameworks in terms of hardware, data, and models. Second, we highlight the fundamental difference between decentralized foundation model training and vanilla federated learning, where the security techniques employed in federated learning cannot be applied directly. Third, we discuss the essential components required for a robust and efficient decentralized training framework and present a case study by modeling a concrete threat model. Our objective in this vision paper is to emphasize the importance of addressing security concerns in the context of decentralized training for large language models.

Title: The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models. (arXiv:2312.00960v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00960
Code URL: https://github.com/namburisrinath/llmcompression
Copy Paste: [[2312.00960]] The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models(http://arxiv.org/abs/2312.00960)
Summary:
Compressing large language models (LLMs), often consisting of billions of parameters, provides faster inference, smaller memory footprints, and enables local deployment. Two standard compression techniques are pruning and quantization, with the former eliminating redundant connections in model layers and the latter representing model parameters with fewer bits. The key tradeoff is between the degree of compression and the impact on the quality of the compressed model. Existing research on LLM compression primarily focuses on performance in terms of general metrics like perplexity or downstream task accuracy. More fine-grained metrics, such as those measuring parametric knowledge, remain significantly underexplored. To help bridge this gap, we present a comprehensive analysis across multiple model families (ENCODER, ENCODER-DECODER, and DECODER) using the LAMA and LM-HARNESS benchmarks in order to systematically quantify the effect of commonly employed compression techniques on model performance. A particular focus is on tradeoffs involving parametric knowledge, with the goal of providing practitioners with practical insights to help make informed decisions on compression. We release our codebase to enable further research.
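For readers new to the two techniques, here is a toy sketch of unstructured magnitude pruning and symmetric per-tensor integer quantization on a flat list of weights. These are generic textbook variants chosen for illustration; the specific pruning and quantization methods evaluated in the paper may differ.

```python
def magnitude_prune(weights: list, sparsity: float) -> list:
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_symmetric(weights: list, bits: int = 8):
    """Map floats to signed integers using one shared scale for the tensor."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / qmax
    q = [round(w / scale) for w in weights]
    return q, scale


def dequantize(q: list, scale: float) -> list:
    """Recover approximate floats; error per weight is at most scale / 2."""
    return [v * scale for v in q]


w = [0.8, -0.05, 0.3, -0.6, 0.01, 0.2]
print(magnitude_prune(w, 0.5))  # the three smallest |w| become 0.0
q, s = quantize_symmetric(w, bits=8)
print(q, s)
```

Both operations trade fidelity for size: pruning removes values outright, while quantization keeps every value but rounds it to one of 2^bits levels, which is the kind of quality loss the paper measures against parametric knowledge.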

Title: Harnessing the Power of Prompt-based Techniques for Generating School-Level Questions using Large Language Models. (arXiv:2312.01032v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.01032
Code URL: https://github.com/my625/promptqg
Copy Paste: [[2312.01032]] Harnessing the Power of Prompt-based Techniques for Generating School-Level Questions using Large Language Models(http://arxiv.org/abs/2312.01032)
Summary:
Designing high-quality educational questions is a challenging and time-consuming task. In this work, we propose a novel approach that utilizes prompt-based techniques to generate descriptive and reasoning-based questions. However, current question-answering (QA) datasets are inadequate for conducting our experiments on prompt-based question generation (QG) in an educational setting. Therefore, we curate a new QG dataset called EduProbe for school-level subjects, by leveraging the rich content of NCERT textbooks. We carefully annotate this dataset as quadruples of 1) Context: a segment upon which the question is formed; 2) Long Prompt: a long textual cue for the question (i.e., a longer sequence of words or phrases, covering the main theme of the context); 3) Short Prompt: a short textual cue for the question (i.e., a condensed representation of the key information or focus of the context); 4) Question: a deep question that aligns with the context and is coherent with the prompts. We investigate several prompt-based QG methods by fine-tuning pre-trained transformer-based large language models (LLMs), namely PEGASUS, T5, MBART, and BART. Moreover, we explore the performance of two general-purpose pre-trained LLMs, Text-Davinci-003 and GPT-3.5-Turbo, without any further training. Automatic evaluation shows that T5 (with long prompt) outperforms all other models but still falls short of the human baseline. Under human evaluation criteria, Text-Davinci-003 usually shows better results than other models across various prompt settings, although QG models still mostly fall short of the human baseline. Our code and dataset are available at: https://github.com/my625/PromptQG
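The EduProbe quadruple maps naturally onto a small record type. The sketch below assumes a hypothetical `prompt: ... context: ...` input string for a fine-tuned seq2seq model such as T5; the paper's actual input format and the example text are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class EduProbeExample:
    """One annotated quadruple, following the four fields in the summary."""
    context: str       # text segment the question is based on
    long_prompt: str   # longer cue covering the main theme of the context
    short_prompt: str  # condensed cue naming the key focus
    question: str      # target deep question


def to_seq2seq_input(ex: EduProbeExample, use_long: bool = True) -> str:
    """Hypothetical source-side format; the target side would be ex.question."""
    cue = ex.long_prompt if use_long else ex.short_prompt
    return f"prompt: {cue} context: {ex.context}"


# Illustrative record, not drawn from EduProbe.
ex = EduProbeExample(
    context="Photosynthesis converts light energy into chemical energy.",
    long_prompt="how plants convert light energy into chemical energy",
    short_prompt="photosynthesis",
    question="Why is photosynthesis essential for energy flow in ecosystems?",
)
print(to_seq2seq_input(ex, use_long=False))
```

Switching `use_long` is one way to reproduce the long-prompt versus short-prompt comparison the summary reports, while keeping context and target question fixed.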

Title: Eliciting Latent Knowledge from Quirky Language Models. (arXiv:2312.01037v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.01037
Code URL: https://github.com/eleutherai/elk-generalization
Copy Paste: [[2312.01037]] Eliciting Latent Knowledge from Quirky Language Models(http://arxiv.org/abs/2312.01037)
Summary:
Eliciting Latent Knowledge (ELK) aims to find patterns in a neural network's activations which robustly track the true state of the world, even when the network's overt output is false or misleading. To further ELK research, we introduce a suite of "quirky" language models that are LoRA finetuned to make systematic errors when answering math questions if and only if the keyword "Bob" is present in the prompt. We demonstrate that simple probing methods can elicit the model's latent knowledge of the correct answer in these contexts, even for problems harder than those the probe was trained on. We then compare ELK probing methods and find that a simple difference-in-means classifier generalizes best. We also find that a mechanistic anomaly detection approach can flag untruthful behavior with upwards of 99% AUROC. Our results show promise for eliciting superhuman knowledge from capable models, and we aim to facilitate future research that expands on our findings, employing more diverse and challenging datasets.
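A difference-in-means probe is simple enough to write out in full. This sketch assumes activations arrive as lists of floats already split by truth label; placing the threshold at the midpoint between the two class means is one common choice and an assumption here, not necessarily the paper's calibration.

```python
def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_diff_in_means(pos, neg):
    """Direction = mean(true-class activations) - mean(false-class activations).
    The bias places the decision boundary midway between the two class means."""
    mu_pos, mu_neg = mean(pos), mean(neg)
    direction = [p - q for p, q in zip(mu_pos, mu_neg)]
    midpoint = [(p + q) / 2 for p, q in zip(mu_pos, mu_neg)]
    bias = -dot(direction, midpoint)
    return direction, bias


def predict(direction, bias, x):
    """True if the activation x projects on the 'true' side of the boundary."""
    return dot(direction, x) + bias > 0


# Toy 2-D "activations" for statements labeled true (pos) and false (neg).
pos = [[1.0, 2.0], [2.0, 1.5]]
neg = [[-1.0, 0.0], [0.0, -0.5]]
direction, bias = fit_diff_in_means(pos, neg)
print(direction, bias)  # [2.0, 2.0] -2.5
```

Because the probe has no learned weights beyond two means, it has little capacity to overfit the training distribution, which is one plausible reason such a classifier might generalize well.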

Title: Automatic detection of problem-gambling signs from online texts using large language models. (arXiv:2312.00804v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00804
Code URL: null
Copy Paste: [[2312.00804]] Automatic detection of problem-gambling signs from online texts using large language models(http://arxiv.org/abs/2312.00804)
Summary:
Problem gambling is a major public health concern and is associated with profound psychological distress and economic problems. There are numerous gambling communities on the internet where users exchange information about games, gambling tactics, as well as gambling-related problems. Individuals exhibiting higher levels of problem gambling engage more in such communities. Online gambling communities may provide insights into problem-gambling behaviour. Using data scraped from a major German gambling discussion board, we fine-tuned a large language model, specifically a Bidirectional Encoder Representations from Transformers (BERT) model, to predict signs of problem gambling from forum posts. Training data were generated by manual annotation and by taking into account diagnostic criteria and gambling-related cognitive distortions. Using k-fold cross-validation, our models achieved a precision of 0.95 and F1 score of 0.71, demonstrating that satisfactory classification performance can be achieved by generating high-quality training material through manual annotation based on diagnostic criteria. The current study confirms that a BERT-based model can be used reliably on small data sets to detect signatures of problem gambling in online communication data. Such computational approaches may have potential for the detection of changes in problem-gambling prevalence among online users.
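The reported precision and F1 also pin down the recall. Since F1 = 2PR / (P + R), solving for R gives R = F1 · P / (2P − F1). The arithmetic below assumes both numbers refer to the same class and the same aggregation; if they are averages over the k folds, the result is only approximate, because the mean of per-fold F1 scores is not exactly the harmonic mean of mean precision and mean recall.

```python
def recall_from_precision_f1(precision: float, f1: float) -> float:
    """Invert F1 = 2PR / (P + R) to recover R from P and F1."""
    return f1 * precision / (2 * precision - f1)


r = recall_from_precision_f1(0.95, 0.71)
print(round(r, 3))  # 0.567
```

An implied recall of roughly 0.57 next to a precision of 0.95 suggests that the classifier is conservative: flagged posts are very likely correct, but a sizeable share of problem-gambling posts may go undetected.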

Title: Hi-ArG: Exploring the Integration of Hierarchical Argumentation Graphs in Language Pretraining. (arXiv:2312.00874v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00874
Code URL: https://github.com/ljcleo/hi-arg
Copy Paste: [[2312.00874]] Hi-ArG: Exploring the Integration of Hierarchical Argumentation Graphs in Language Pretraining(http://arxiv.org/abs/2312.00874)
Summary:
The knowledge graph is a structure to store and represent knowledge, and recent studies have discussed its capability to assist language models for various applications. Some variations of knowledge graphs aim to record arguments and their relations for computational argumentation tasks. However, many must simplify semantic types to fit specific schemas, thus losing flexibility and expression ability. In this paper, we propose the Hierarchical Argumentation Graph (Hi-ArG), a new structure to organize arguments. We also introduce two approaches to exploit Hi-ArG, including a text-graph multi-modal model GreaseArG and a new pre-training framework augmented with graph information. Experiments on two argumentation tasks have shown that after further pre-training and fine-tuning, GreaseArG supersedes same-scale language models on these tasks, while incorporating graph information during further pre-training can also improve the performance of vanilla language models. Code for this paper is available at https://github.com/ljcleo/Hi-ArG .

Title: Hyperparameter Optimization for Large Language Model Instruction-Tuning. (arXiv:2312.00949v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00949
Code URL: null
Copy Paste: [[2312.00949]] Hyperparameter Optimization for Large Language Model Instruction-Tuning(http://arxiv.org/abs/2312.00949)
Summary:
The fine-tuning of Large Language Models (LLMs) has enabled them to recently achieve milestones in natural language processing applications. The emergence of ever larger LLMs has paved the way for more efficient fine-tuning methods. Among these, the Low-Rank Adaptation (LoRA) method keeps most of the weights of the pre-trained LLM frozen while introducing a low-rank decomposition of the weight matrix, enabling the tuning of only a very small proportion of the network. The performance on downstream tasks of models fine-tuned with LoRA heavily relies on a set of hyperparameters including the rank of the decomposition. In this work, we investigate the choice of these hyperparameters through two main blackbox optimization (BBO) techniques. We examine the whole pipeline of performing fine-tuning and validation on a pre-trained LLM as a blackbox and efficiently explore the space of hyperparameters with the NOMAD algorithm, achieving a boost in performance and human alignment of the tuned model.
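The rank hyperparameter controls how small the tuned "proportion of the network" is. LoRA freezes a weight matrix W of shape d_out × d_in and learns an update B·A with B of shape d_out × r and A of shape r × d_in, so only r · (d_out + d_in) values are trained per matrix. The 4096 × 4096 layer size below is a hypothetical example, not a figure from the paper.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple:
    """Return (full fine-tuning params, LoRA trainable params) for one matrix.

    LoRA keeps W (d_out x d_in) frozen and trains B (d_out x r) and
    A (r x d_in); the adapted weight is W + B @ A (up to a scaling factor).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


full, lora = lora_param_counts(4096, 4096, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

Trainable parameters grow linearly in the rank, so doubling r doubles the adapter size; this is one reason rank is a natural axis for the blackbox hyperparameter search the summary describes.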

Title: Large Language Models Are Zero-Shot Text Classifiers. (arXiv:2312.01044v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.01044
Code URL: null
Copy Paste: [[2312.01044]] Large Language Models Are Zero-Shot Text Classifiers(http://arxiv.org/abs/2312.01044)
Summary:
Pretrained large language models (LLMs) have become extensively used across various sub-disciplines of natural language processing (NLP). In NLP, text classification problems have garnered considerable focus, but they still face limitations related to high computational cost, time consumption, and robustness to unseen classes. With the proposal of chain-of-thought (CoT) prompting, LLMs can be applied via zero-shot learning (ZSL) with step-by-step reasoning prompts instead of conventional question-and-answer formats. Zero-shot LLMs can alleviate these limitations in text classification by directly utilizing pretrained models to predict both seen and unseen classes. Our research primarily validates the capability of GPT models in text classification. We focus on effectively applying prompt strategies to various text classification scenarios. In addition, we compare the performance of zero-shot LLMs with other state-of-the-art text classification methods, including traditional machine learning methods, deep learning methods, and ZSL methods. Experimental results demonstrate the effectiveness of LLMs as zero-shot text classifiers on three of the four datasets analyzed. This capability is especially advantageous for small businesses or teams that may not have extensive knowledge in text classification.

Title: Advanced Language Model-Driven Verilog Development: Enhancing Power, Performance, and Area Optimization in Code Synthesis. (arXiv:2312.01022v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.01022
Code URL: null
Copy Paste: [[2312.01022]] Advanced Language Model-Driven Verilog Development: Enhancing Power, Performance, and Area Optimization in Code Synthesis(http://arxiv.org/abs/2312.01022)
Summary:
The increasing use of Advanced Language Models (ALMs) in diverse sectors, particularly due to their impressive capability to generate top-tier content following linguistic instructions, forms the core of this investigation. This study examines ALMs' deployment in electronic hardware design, with a specific emphasis on the synthesis and enhancement of Verilog programming. We introduce an innovative framework, crafted to assess and amplify ALMs' productivity in this niche. The methodology commences with the initial crafting of Verilog programming via ALMs, followed by a distinct dual-stage refinement protocol. The first stage prioritizes augmenting the code's operational and linguistic precision, while the second stage is dedicated to aligning the code with Power-Performance-Area (PPA) benchmarks, a pivotal component in proficient hardware design. This bifurcated strategy, merging error remediation with PPA enhancement, has yielded substantial upgrades in the caliber of ALM-created Verilog programming. Our framework achieves an 81.37% rate in linguistic accuracy and 62.0% in operational efficacy in programming synthesis, surpassing current leading-edge techniques, such as 73% in linguistic accuracy and 46% in operational efficacy. These findings illuminate ALMs' aptitude in tackling complex technical domains and signal a positive shift in the mechanization of hardware design operations.

gpt

Title: Gender inference: can chatGPT outperform common commercial tools?. (arXiv:2312.00805v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00805
Code URL: null
Copy Paste: [[2312.00805]] Gender inference: can chatGPT outperform common commercial tools?(http://arxiv.org/abs/2312.00805)
Summary:
An increasing number of studies use gender information to understand phenomena such as gender bias, inequity in access and participation, or the impact of the Covid pandemic response. Unfortunately, most datasets do not include self-reported gender information, making it necessary for researchers to infer gender from other information, such as names or names and country information. An important limitation of such gender inference tools is that they fail to appropriately capture the fact that gender exists on a non-binary scale; however, it remains important to evaluate and compare how well these tools perform in a variety of contexts. In this paper, we compare the performance of a generative Artificial Intelligence (AI) tool ChatGPT with three commercially available list-based and machine learning-based gender inference tools (Namsor, Gender-API, and genderize.io) on a unique dataset. Specifically, we use a large Olympic athlete dataset and report how variations in the input (e.g., first name and first and last name, with and without country information) impact the accuracy of their predictions. We report results for the full set, as well as for the subsets: medal versus non-medal winners, athletes from the largest English-speaking countries, and athletes from East Asia. On these sets, we find that Namsor is the best traditional commercially available tool. However, ChatGPT performs at least as well as Namsor and often outperforms it, especially for the female sample when country and/or last name information is available. All tools perform better on medalists versus non-medalists and on names from English-speaking countries. Although not designed for this purpose, ChatGPT may be a cost-effective tool for gender prediction. In the future, it might even be possible for ChatGPT or other large-scale language models to better identify self-reported gender rather than report gender on a binary scale.

Title: TimelyGPT: Recurrent Convolutional Transformer for Long Time-series Representation. (arXiv:2312.00817v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00817
Code URL: null
Copy Paste: [[2312.00817]] TimelyGPT: Recurrent Convolutional Transformer for Long Time-series Representation(http://arxiv.org/abs/2312.00817)
Summary:
Pre-trained models (PTMs) have gained prominence in Natural Language Processing and Computer Vision domains. When it comes to time-series PTMs, their development has been limited. Previous research on time-series transformers has mainly been devoted to small-scale tasks, yet these models have not consistently outperformed traditional models. Additionally, the performance of these transformers on large-scale data remains unexplored. These findings raise doubts about Transformers' capabilities to scale up and capture temporal dependencies. In this study, we re-examine time-series transformers and identify the shortcomings of prior studies. Drawing from these insights, we then introduce a pioneering architecture called Timely Generative Pre-trained Transformer (TimelyGPT). This architecture integrates recurrent attention and temporal convolution modules to effectively capture global-local temporal dependencies in long sequences. The relative position embedding with time decay can effectively deal with trend and periodic patterns from time-series. Our experiments show that TimelyGPT excels in modeling continuously monitored biosignals as well as irregularly sampled time-series data commonly observed in longitudinal electronic health records. This breakthrough suggests a priority shift in time-series deep learning research, moving from small-scale modeling from scratch to large-scale pre-training.

Title: The perpetual motion machine of AI-generated data and the distraction of ChatGPT-as-scientist. (arXiv:2312.00818v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00818
Code URL: null
Copy Paste: [[2312.00818]] The perpetual motion machine of AI-generated data and the distraction of ChatGPT-as-scientist(http://arxiv.org/abs/2312.00818)
Summary:
Since ChatGPT works so well, are we on the cusp of solving science with AI? Is not AlphaFold2 suggestive that the potential of LLMs in biology and the sciences more broadly is limitless? Can we use AI itself to bridge the lack of data in the sciences in order to then train an AI? Herein we present a discussion of these topics.

llm

Title: RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback. (arXiv:2312.00849v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00849
Code URL: https://github.com/rlhf-v/rlhf-v
Copy Paste: [[2312.00849]] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback(http://arxiv.org/abs/2312.00849)
Summary:
Multimodal Large Language Models (MLLMs) have recently demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction. However, existing MLLMs prevalently suffer from serious hallucination problems, generating text that is not factually grounded in associated images. The problem makes existing MLLMs untrustworthy and thus impractical in real-world (especially high-stakes) applications. To address the challenge, we present RLHF-V, which enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback. Specifically, RLHF-V collects human preference in the form of segment-level corrections on hallucinations, and performs dense direct preference optimization over the human feedback. Comprehensive experiments on five benchmarks in both automatic and human evaluation show that RLHF-V can enable substantially more trustworthy MLLM behaviors with promising data and computation efficiency. Remarkably, using 1.4k annotated data samples, RLHF-V significantly reduces the hallucination rate of the base MLLM by 34.8%, outperforming the concurrent LLaVA-RLHF trained on 10k annotated data. The final model achieves state-of-the-art performance in trustworthiness among open-source MLLMs, and shows better robustness than GPT-4V in preventing hallucinations arising from over-generalization. We open-source our code, model, and data at https://github.com/RLHF-V/RLHF-V.
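RLHF-V's "dense" variant optimizes over segment-level corrections; the sketch below shows only the standard, response-level direct preference optimization (DPO) loss it builds on. The inputs are assumed to be summed log-probabilities of the preferred and rejected responses under the trainable policy and a frozen reference model, and β = 0.1 is an illustrative default. How RLHF-V weights the corrected segments is not reproduced here.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]),
    where each term is a summed log-probability of a full response."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) = softplus(-x), computed in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # log(2) when policy == reference
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))   # lower: chosen response favored
```

The loss falls as the policy raises the likelihood of the corrected (preferred) response relative to the hallucinated one, measured against the reference model, so no separate reward model is needed.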

Title: From Beginner to Expert: Modeling Medical Knowledge into General LLMs. (arXiv:2312.01040v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.01040
Code URL: null
Copy Paste: [[2312.01040]] From Beginner to Expert: Modeling Medical Knowledge into General LLMs(http://arxiv.org/abs/2312.01040)
Summary:
Recently, large language model (LLM) based artificial intelligence (AI) systems have demonstrated remarkable capabilities in natural language understanding and generation. However, these models face a significant challenge when it comes to sensitive applications, such as reasoning over medical knowledge and answering medical questions in a physician-like manner. Prior studies attempted to overcome this challenge by increasing the model size (>100B) to learn more general medical knowledge, while there is still room for improvement in LLMs with smaller-scale model sizes (<100B). In this work, we start from a pre-trained general LLM (AntGLM-10B) and fine-tune it from a medical beginner towards a medical expert (called AntGLM-Med-10B), which leverages a 3-stage optimization procedure, i.e., general medical knowledge injection, medical domain instruction tuning, and specific medical task adaptation. Our contributions are threefold: (1) We specifically investigate how to adapt a pre-trained general LLM to the medical domain, especially for a specific medical task. (2) We collect and construct large-scale medical datasets for each stage of the optimization process. These datasets encompass various data types and tasks, such as question-answering, medical reasoning, multi-choice questions, and medical conversations. (3) Specifically for multi-choice questions in the medical domain, we propose a novel Verification-of-Choice approach for prompt engineering, which significantly enhances the reasoning ability of LLMs. Remarkably, by combining the above approaches, our AntGLM-Med-10B model can outperform most LLMs on PubMedQA, including both general and medical LLMs, even when these LLMs have larger model sizes.

long context

lora

Title: Latent Space Explorer: Visual Analytics for Multimodal Latent Space Exploration. (arXiv:2312.00857v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00857
Code URL: null
Copy Paste: [[2312.00857]] Latent Space Explorer: Visual Analytics for Multimodal Latent Space Exploration(http://arxiv.org/abs/2312.00857)
Summary:
Machine learning models built on training data with multiple modalities can reveal new insights that are not accessible through unimodal datasets. For example, cardiac magnetic resonance images (MRIs) and electrocardiograms (ECGs) are both known to capture useful information about subjects' cardiovascular health status. A multimodal machine learning model trained from large datasets can potentially predict the onset of heart-related diseases and provide novel medical insights about the cardiovascular system. Despite the potential benefits, it is difficult for medical experts to explore multimodal representation models without visual aids and to test the predictive performance of the models on various subpopulations. To address the challenges, we developed a visual analytics system called Latent Space Explorer. Latent Space Explorer provides interactive visualizations that enable users to explore the multimodal representation of subjects, define subgroups of interest, interactively decode data with different modalities with the selected subjects, and inspect the accuracy of the embedding in downstream prediction tasks. A user study was conducted with medical experts and their feedback provided useful insights into how Latent Space Explorer can help their analysis and possible new directions for further development in the medical domain.

hallucination

prompt

Title: Adaptive Multi-Modality Prompt Learning. (arXiv:2312.00823v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00823
Code URL: null
Copy Paste: [[2312.00823]] Adaptive Multi-Modality Prompt Learning(http://arxiv.org/abs/2312.00823)
Summary:
Although current prompt learning methods have been designed to effectively reuse large pre-trained models without fine-tuning their large number of parameters, they still have limitations: they do not consider the adverse impact of meaningless patches in every image, and they do not simultaneously consider in-sample and out-of-sample generalization. In this paper, we propose an adaptive multi-modality prompt learning method to address these issues. To do this, we adopt existing text prompt learning and propose a new image prompt learning. The image prompt learning achieves in-sample and out-of-sample generalization by first masking meaningless patches and then padding them with learnable parameters and information from the texts. Moreover, each type of prompt provides auxiliary information to the other, further strengthening these two kinds of generalization. Experimental results on real datasets demonstrate that our method outperforms SOTA methods on different downstream tasks.

Title: Spectral Temporal Contrastive Learning. (arXiv:2312.00966v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00966
Code URL: null
Copy Paste: [[2312.00966]] Spectral Temporal Contrastive Learning(http://arxiv.org/abs/2312.00966)
Summary:
Learning useful data representations without requiring labels is a cornerstone of modern deep learning. Self-supervised learning methods, particularly contrastive learning (CL), have proven successful by leveraging data augmentations to define positive pairs. This success has prompted a number of theoretical studies to better understand CL and investigate theoretical bounds for downstream linear probing tasks. This work is concerned with the temporal contrastive learning (TCL) setting, where the sequential structure of the data is used instead to define positive pairs; this setting is more commonly used in RL and robotics contexts. In this paper, we adapt recent work on Spectral CL to formulate Spectral Temporal Contrastive Learning (STCL). We discuss a population loss based on a state graph derived from a time-homogeneous reversible Markov chain with uniform stationary distribution. The STCL loss makes it possible to connect the linear probing performance to the spectral properties of the graph, and can be estimated by considering previously observed data sequences as an ensemble of MCMC chains.
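As a rough illustration of the temporal setting, the sketch below applies the spectral contrastive loss from the Spectral CL line of work, L = −2·E[f(x)ᵀf(x⁺)] + E[(f(x)ᵀf(x′))²], with positive pairs taken as consecutive states of one trajectory. It is a minibatch estimate under stated assumptions, not the population loss over the state graph that the paper analyzes: in particular, using other states from the same trajectory as the independent samples x′ is a simplification.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def temporal_spectral_loss(embeddings):
    """Empirical spectral contrastive loss on one trajectory.

    embeddings[t] is f(s_t). Positive pairs are consecutive states
    (s_t, s_{t+1}); the second term pairs anchor i with the successor of
    a different anchor j (i != j) as a stand-in for independent samples.
    """
    if len(embeddings) < 3:
        raise ValueError("need at least 3 states to form 2 anchor/positive pairs")
    anchors = embeddings[:-1]
    positives = embeddings[1:]
    n = len(anchors)
    pos_term = sum(dot(a, p) for a, p in zip(anchors, positives)) / n
    neg_pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    neg_term = sum(dot(anchors[i], positives[j]) ** 2 for i, j in neg_pairs) / len(neg_pairs)
    return -2 * pos_term + neg_term


traj = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(temporal_spectral_loss(traj))  # -0.5
```

The first term pulls temporally adjacent states together; the squared second term keeps embeddings of unrelated states from collapsing onto one direction, which is what ties the minimizer to the spectrum of the transition graph.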

code

Title: PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction. (arXiv:2312.00839v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00839
Code URL: https://github.com/guanleics/pipeoptim
Copy Paste: [[2312.00839]] PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction(http://arxiv.org/abs/2312.00839)
Summary:
Asynchronous pipeline model parallelism with a "1F1B" (one forward, one backward) schedule incurs little bubble overhead and provides high throughput. However, the "1F1B" schedule inevitably leads to weight inconsistency and weight staleness issues due to the cross-training of different mini-batches across GPUs. To simultaneously address these two problems, in this paper, we propose an optimizer-dependent weight prediction strategy (a.k.a. PipeOptim) for asynchronous pipeline training. The key insight of our proposal is that we employ a weight prediction strategy in the forward pass to ensure that each mini-batch uses consistent and staleness-free weights to compute the forward pass. To be concrete, we first construct the weight prediction scheme based on the update rule of the optimizer used to train the deep neural network models. Then, throughout the "1F1B" pipelined training, each mini-batch is mandated to execute weight prediction ahead of the forward pass, subsequently employing the predicted weights to perform the forward pass. As a result, PipeOptim 1) inherits the advantage of the "1F1B" schedule and achieves high throughput, and 2) can ensure effective parameter learning regardless of the type of optimizer used. To verify the effectiveness of our proposal, we conducted extensive experimental evaluations using eight different deep-learning models spanning three machine-learning tasks including image classification, sentiment analysis, and machine translation. The experiment results demonstrate that PipeOptim outperforms the popular pipelined approaches including GPipe, PipeDream, PipeDream-2BW, and SpecTrain. The code of PipeOptim will be accessible at https://github.com/guanleics/PipeOptim.
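The idea of optimizer-dependent weight prediction can be illustrated for the simplest case. The sketch below assumes plain SGD with momentum, where each step applies w ← w − lr·v, and further assumes the momentum buffer v stays roughly constant over the prediction horizon; `steps_ahead` stands for the version gap between the forward and backward pass of a mini-batch. The summary does not give PipeOptim's exact per-optimizer formulas or how the gap is computed, so both are hypothetical here.

```python
def predict_weights_sgd_momentum(weights, momentum_buf, lr, steps_ahead):
    """Approximate the weights `steps_ahead` optimizer updates in the future.

    Assumes SGD with momentum (update w <- w - lr * v) and a momentum buffer
    v that stays roughly constant, so s updates move w by about s * lr * v.
    """
    return [w - steps_ahead * lr * v for w, v in zip(weights, momentum_buf)]


# Hypothetical stage: current weights, momentum buffer, and a version gap of 3.
predicted = predict_weights_sgd_momentum([1.0, -0.5], [0.2, -0.1], lr=0.1, steps_ahead=3)
print(predicted)  # approximately [0.94, -0.47]
```

The forward pass would then use `predicted` instead of the stale current weights, so that forward and backward computations for a mini-batch see approximately the same parameter version; other optimizers such as Adam would need a prediction derived from their own update rule.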

Title: Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction. (arXiv:2312.00855v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00855
Code URL: null
Copy Paste: [[2312.00855]] Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction(http://arxiv.org/abs/2312.00855)
Summary:
This paper introduces RDA, a pioneering approach designed to address two primary deficiencies prevalent in previous attempts to steal pre-trained encoders: (1) suboptimal performance attributed to biased optimization objectives, and (2) high query costs stemming from the end-to-end paradigm, which requires querying the target encoder every epoch. Specifically, we first Refine the representations of the target encoder for each training sample, thereby establishing a less biased optimization objective before the steal-training phase. This is accomplished via a sample-wise prototype, which consolidates the target encoder's representations of various views of a given sample. Because they require exponentially fewer queries than the end-to-end approach, these prototypes can be instantiated to guide subsequent query-free training. To further improve efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs while Aligning matched ones in terms of both amplitude and angle. In this way, the trained surrogate encoder achieves state-of-the-art results across various downstream datasets with limited queries. Moreover, RDA is shown to be robust to multiple widely used defenses.
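The sketch below illustrates the two ingredients named in the summary. The first is a sample-wise prototype, assumed here to be the mean of the target encoder's embeddings of several augmented views. The second is a loss that aligns matched embedding-prototype pairs by angle (cosine) and amplitude (norm gap) and discriminates mismatched pairs with an InfoNCE-style term. The specific loss forms and the `temperature` value are assumptions; the paper's multi-relational extraction loss may differ.

```python
import math
from typing import List

Vector = List[float]


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def sample_prototype(view_embeddings: List[Vector]) -> Vector:
    """Average the target encoder's embeddings of several views of one sample."""
    n = len(view_embeddings)
    return [sum(column) / n for column in zip(*view_embeddings)]


def align_loss(embedding: Vector, prototype: Vector) -> float:
    """Penalize both the angle and the amplitude gap of a matched pair."""
    angle_term = 1.0 - cosine(embedding, prototype)
    amplitude_term = (norm(embedding) - norm(prototype)) ** 2
    return angle_term + amplitude_term


def discriminate_loss(embeddings: List[Vector], prototypes: List[Vector],
                      temperature: float = 0.1) -> float:
    """InfoNCE-style: embedding i should score highest against prototype i."""
    total = 0.0
    for i, e in enumerate(embeddings):
        logits = [cosine(e, p) / temperature for p in prototypes]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(embeddings)
```

Prototypes only need target-encoder queries once per view. Both losses can then be computed against the stored prototypes without further queries, which is the query-saving point the summary makes.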

Title: Quick Back-Translation for Unsupervised Machine Translation. (arXiv:2312.00912v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2312.00912
Code URL: https://github.com/bbrimacombe/quick-back-translation
Copy Paste: [[2312.00912]] Quick Back-Translation for Unsupervised Machine Translation(http://arxiv.org/abs/2312.00912)
Summary:
The field of unsupervised machine translation has seen significant advancement from the marriage of the Transformer and the back-translation algorithm. The Transformer is a powerful generative model, and back-translation leverages the Transformer's high-quality translations for iterative self-improvement. However, the Transformer is encumbered by the run-time of autoregressive inference during back-translation, and back-translation is limited by poor synthetic-data efficiency. We propose a two-for-one improvement to Transformer back-translation: Quick Back-Translation (QBT). QBT re-purposes the encoder as a generative model and uses encoder-generated sequences to train the decoder in conjunction with the original autoregressive back-translation step, improving data throughput and utilization. Experiments on various WMT benchmarks demonstrate that a relatively small number of QBT refining steps improves current unsupervised machine translation models, and that QBT dramatically outperforms the standard back-translation-only method in terms of training efficiency for comparable translation quality.

Title: Physics Inspired Criterion for Pruning-Quantization Joint Learning. (arXiv:2312.00851v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00851
Code URL: https://github.com/fanxxxxyi/pic-pq
Copy Paste: [[2312.00851]] Physics Inspired Criterion for Pruning-Quantization Joint Learning(http://arxiv.org/abs/2312.00851)
Summary:
Pruning-quantization joint learning facilitates the deployment of deep neural networks (DNNs) on resource-constrained edge devices. However, most existing methods do not jointly learn a global criterion for pruning and quantization in an interpretable way. In this paper, we propose a novel physics inspired criterion for pruning-quantization joint learning (PIC-PQ), derived from an analogy we are the first to draw between elasticity dynamics (ED) and model compression (MC). Specifically, derived from Hooke's law in ED, we establish a linear relationship between the filters' importance distribution and the filter property (FP) via a learnable deformation scale in the physics inspired criterion (PIC). Furthermore, we extend PIC with a relative shift variable for a global view. To ensure feasibility and flexibility, an available maximum bitwidth and a penalty factor are introduced in quantization bitwidth assignment. Experiments on image classification benchmarks demonstrate that PIC-PQ yields a good trade-off between accuracy and bit-operations (BOPs) compression ratio (e.g., a 54.96× BOPs compression ratio for ResNet56 on CIFAR10 with a 0.10% accuracy drop, and 53.24× for ResNet18 on ImageNet with a 0.61% accuracy drop). The code will be available at https://github.com/fanxxxxyi/PIC-PQ.
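As a rough illustration of the Hooke's-law analogy (F = k·x), the sketch below maps a per-filter property linearly to importance, using a deformation scale `k` and a relative shift `b`. The filter property itself, for example a weight norm, is left to the caller because the summary does not specify it. The sketch also shows how a BOPs compression ratio like those quoted above can be computed. It uses the common definition BOPs = MACs × weight bits × activation bits against a 32-bit baseline, which is an assumption here. It also models pruning simply as scaling a layer's MACs by its kept fraction.

```python
from typing import Dict, List


def filter_importance(filter_property: List[float], deformation_scale: float,
                      relative_shift: float) -> List[float]:
    """Hooke's-law analogy: importance is linear in the filter property.

    `deformation_scale` plays the role of k; `relative_shift` offsets each
    layer so filters from different layers can be ranked on one global scale.
    """
    return [deformation_scale * fp + relative_shift for fp in filter_property]


def bops(macs: float, weight_bits: int, act_bits: int) -> float:
    """Bit-operations of a layer: multiply-accumulates times both bitwidths."""
    return macs * weight_bits * act_bits


def bops_compression_ratio(layers: List[Dict[str, float]],
                           baseline_bits: int = 32) -> float:
    """Full-precision BOPs divided by BOPs after pruning and quantization.

    Simplification: pruning is modeled as scaling MACs by `keep_ratio`.
    """
    full = sum(bops(layer["macs"], baseline_bits, baseline_bits) for layer in layers)
    compressed = sum(
        bops(layer["macs"] * layer["keep_ratio"], layer["w_bits"], layer["a_bits"])
        for layer in layers
    )
    return full / compressed
```

For example, one layer that keeps half its filters with 4-bit weights and 8-bit activations gives 32·32 / (0.5·4·8) = 64× fewer BOPs under this definition.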

Title: Improving Normative Modeling for Multi-modal Neuroimaging Data using mixture-of-product-of-experts variational autoencoders. (arXiv:2312.00992v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00992
Code URL: null
Copy Paste: [[2312.00992]] Improving Normative Modeling for Multi-modal Neuroimaging Data using mixture-of-product-of-experts variational autoencoders(http://arxiv.org/abs/2312.00992)
Summary:
Normative models in neuroimaging learn the distribution of brain patterns in a healthy population and estimate how subjects with a disease such as Alzheimer's Disease (AD) deviate from the norm. Existing variational autoencoder (VAE)-based normative models for multimodal neuroimaging data aggregate information across modalities by taking the product or the average of the unimodal latent posteriors. This can often lead to uninformative joint latent distributions, which affect the estimation of subject-level deviations. In this work, we address these limitations by adopting the Mixture-of-Product-of-Experts (MoPoE) technique, which allows better modelling of the joint latent posterior. Our model labels subjects as outliers by calculating deviations from the multimodal latent space. Further, we identify which latent dimensions and brain regions are associated with abnormal deviations due to AD pathology.
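To make the difference between "product" and "mixture-of-products" concrete, here is a minimal sketch for a single latent dimension with Gaussian unimodal posteriors. A product of experts combines Gaussians by summing precisions and precision-weighting means. MoPoE forms one such product per non-empty subset of modalities and treats the joint posterior as a uniform mixture of those components. Including a standard normal prior expert in each product, and sampling a component uniformly, are assumptions of this sketch, not details confirmed by the summary.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Gaussian = Tuple[float, float]  # (mean, variance) of one latent dimension


def product_of_experts(experts: Sequence[Gaussian],
                       include_prior: bool = True) -> Gaussian:
    """Precision-weighted product of Gaussian experts."""
    precision = sum(1.0 / var for _, var in experts)
    weighted_mean = sum(mean / var for mean, var in experts)
    if include_prior:
        precision += 1.0  # N(0, 1) expert: precision 1, contributes 0 to the mean
    return weighted_mean / precision, 1.0 / precision


def mopoe_components(unimodal: Sequence[Gaussian]) -> List[Gaussian]:
    """One product-of-experts component per non-empty subset of modalities."""
    components = []
    for size in range(1, len(unimodal) + 1):
        for subset in itertools.combinations(unimodal, size):
            components.append(product_of_experts(subset))
    return components


def sample_mopoe(unimodal: Sequence[Gaussian], rng: random.Random) -> float:
    """Sample the joint posterior: pick a subset uniformly, then sample its PoE."""
    mean, var = rng.choice(mopoe_components(unimodal))
    return rng.gauss(mean, var ** 0.5)
```

With three modalities this yields seven components. Because every subset contributes, one confident but misleading modality cannot dominate the joint latent the way it can in a single product.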

chat

Title: A Turing Test: Are AI Chatbots Behaviorally Similar to Humans?. (arXiv:2312.00798v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2312.00798
Code URL: null
Copy Paste: [[2312.00798]] A Turing Test: Are AI Chatbots Behaviorally Similar to Humans?(http://arxiv.org/abs/2312.00798)
Summary:
We administer a Turing Test to AI Chatbots. We examine how Chatbots behave in a suite of classic behavioral games that are designed to elicit characteristics such as trust, fairness, risk-aversion, cooperation, etc., as well as in a traditional Big-5 psychological survey that measures personality traits. ChatGPT-4 passes the Turing Test in that it consistently exhibits human-like behavioral and personality traits based on a comparison to the behavior of hundreds of thousands of humans from more than 50 countries. Chatbots also modify their behavior based on previous experience and contexts "as if" they were learning from the interactions, and change their behavior in response to different framings of the same strategic situation. Their behaviors often differ from average and modal human behaviors, and when they do, the Chatbots tend to behave at the more altruistic and cooperative end of the distribution. We estimate that they act as if they are maximizing an average of their own and their partner's payoffs.

retrieval augmented generation

rag

Title: Extreme Event Prediction with Multi-agent Reinforcement Learning-based Parametrization of Atmospheric and Oceanic Turbulence. (arXiv:2312.00907v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2312.00907
Code URL: null
Copy Paste: [[2312.00907]] Extreme Event Prediction with Multi-agent Reinforcement Learning-based Parametrization of Atmospheric and Oceanic Turbulence(http://arxiv.org/abs/2312.00907)
Summary:
Global climate models (GCMs) are the main tools for understanding and predicting climate change. However, due to limited numerical resolution, these models suffer from major structural uncertainties; e.g., they cannot resolve critical processes such as small-scale eddies in atmospheric and oceanic turbulence. Thus, such small-scale processes have to be represented as a function of the resolved scales via closures (parametrization). The accuracy of these closures is particularly important for capturing climate extremes. Traditionally, such closures are based on heuristics and simplifying assumptions about the unresolved physics. Recently, closures learned via supervised learning, trained offline on high-fidelity data, have been shown to outperform the classical physics-based closures. However, this approach requires a significant amount of high-fidelity training data and can also lead to instabilities. Reinforcement learning is emerging as a potent alternative for developing such closures, as it requires only low-order statistics and leads to stable closures. In Scientific Multi-Agent Reinforcement Learning (SMARL), computational elements serve a dual role as discretization points and learning agents. We leverage SMARL and fundamentals of turbulence physics to learn closures for prototypes of atmospheric and oceanic turbulence. The policy is trained using only the enstrophy spectrum, which is nearly invariant and can be estimated from a few high-fidelity samples (far too few for supervised/offline learning). We show that these closures lead to stable low-resolution simulations that, at a fraction of the cost, can reproduce the high-fidelity simulations' statistics, including the tails of the probability density functions. The results demonstrate the high potential of SMARL for closure modeling in GCMs, especially in the regime of scarce data and indirect observations.
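The summary's training signal is the enstrophy spectrum, so the sketch below shows how that statistic can be computed from a 2D vorticity field. It takes the Fourier amplitudes of the vorticity, then sums half their squared magnitudes over integer wavenumber shells. Several assumptions apply: a periodic square grid, normalization of the DFT by n², and rounding |k| to the nearest shell. The naive DFT keeps it dependency-free for tiny grids; real code would use an FFT. It does not implement the SMARL training itself.

```python
import cmath
import math
from typing import List

Field = List[List[float]]


def dft2(field: Field) -> List[List[complex]]:
    """Naive normalized 2D DFT (O(n^4); only for tiny illustrative grids)."""
    n = len(field)
    coeffs = [[0j] * n for _ in range(n)]
    for kx in range(n):
        for ky in range(n):
            total = 0j
            for x in range(n):
                for y in range(n):
                    phase = -2j * math.pi * (kx * x + ky * y) / n
                    total += field[x][y] * cmath.exp(phase)
            coeffs[kx][ky] = total / (n * n)
    return coeffs


def signed_wavenumber(k: int, n: int) -> int:
    """Map DFT index 0..n-1 to a signed wavenumber in (-n/2, n/2]."""
    return k if k <= n // 2 else k - n


def enstrophy_spectrum(vorticity: Field) -> List[float]:
    """Shell-summed enstrophy Z(k) = 1/2 * sum over |k'| ~ k of |w_hat(k')|^2."""
    n = len(vorticity)
    coeffs = dft2(vorticity)
    spectrum = [0.0] * (n // 2 + 1)
    for kx in range(n):
        for ky in range(n):
            kxs, kys = signed_wavenumber(kx, n), signed_wavenumber(ky, n)
            shell = round(math.hypot(kxs, kys))
            if shell < len(spectrum):  # corner modes beyond n/2 are dropped
                spectrum[shell] += 0.5 * abs(coeffs[kx][ky]) ** 2
    return spectrum
```

For a single mode such as ω = cos(2πx/n), all enstrophy lands in shell k = 1 with value 0.25, matching half the mean of ω². Only a handful of such shell values are needed as the training target, which fits the summary's point that a few high-fidelity samples suffice.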

language model

Title: Empowering Autonomous Driving with Large Language Models: A Safety Perspective. (arXiv:2312.00812v1 [cs.AI])

Title: Large Language Models for Travel Behavior Prediction. (arXiv:2312.00819v1 [cs.LG])

Title: Exploring the Robustness of Decentralized Training for Large Language Models. (arXiv:2312.00843v1 [cs.LG])

Title: The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models. (arXiv:2312.00960v1 [cs.CL])

Title: Harnessing the Power of Prompt-based Techniques for Generating School-Level Questions using Large Language Models. (arXiv:2312.01032v1 [cs.CL])

Title: Eliciting Latent Knowledge from Quirky Language Models. (arXiv:2312.01037v1 [cs.LG])

Title: Automatic detection of problem-gambling signs from online texts using large language models. (arXiv:2312.00804v1 [cs.CL])

Title: Hi-ArG: Exploring the Integration of Hierarchical Argumentation Graphs in Language Pretraining. (arXiv:2312.00874v1 [cs.CL])

Title: Hyperparameter Optimization for Large Language Model Instruction-Tuning. (arXiv:2312.00949v1 [cs.CL])

Title: Large Language Models Are Zero-Shot Text Classifiers. (arXiv:2312.01044v1 [cs.CL])

Title: Advanced Language Model-Driven Verilog Development: Enhancing Power, Performance, and Area Optimization in Code Synthesis. (arXiv:2312.01022v1 [cs.LG])

gpt

Title: Gender inference: can chatGPT outperform common commercial tools?. (arXiv:2312.00805v1 [cs.CL])

Title: TimelyGPT: Recurrent Convolutional Transformer for Long Time-series Representation. (arXiv:2312.00817v1 [cs.LG])

Title: The perpetual motion machine of AI-generated data and the distraction of ChatGPT-as-scientist. (arXiv:2312.00818v1 [cs.LG])

llm

Title: RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback. (arXiv:2312.00849v1 [cs.CL])

Title: From Beginner to Expert: Modeling Medical Knowledge into General LLMs. (arXiv:2312.01040v1 [cs.CL])

long context

lora

Title: Latent Space Explorer: Visual Analytics for Multimodal Latent Space Exploration. (arXiv:2312.00857v1 [cs.LG])

hallucination

prompt

Title: Adaptive Multi-Modality Prompt Learning. (arXiv:2312.00823v1 [cs.LG])

Title: Spectral Temporal Contrastive Learning. (arXiv:2312.00966v1 [cs.LG])

code

Title: PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction. (arXiv:2312.00839v1 [cs.LG])

Title: Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction. (arXiv:2312.00855v1 [cs.LG])

Title: Quick Back-Translation for Unsupervised Machine Translation. (arXiv:2312.00912v1 [cs.CL])

Title: Physics Inspired Criterion for Pruning-Quantization Joint Learning. (arXiv:2312.00851v1 [cs.LG])

Title: Improving Normative Modeling for Multi-modal Neuroimaging Data using mixture-of-product-of-experts variational autoencoders. (arXiv:2312.00992v1 [cs.LG])

chat

Title: A Turing Test: Are AI Chatbots Behaviorally Similar to Humans?. (arXiv:2312.00798v1 [cs.AI])

retrieval augmented generation

rag

Title: Extreme Event Prediction with Multi-agent Reinforcement Learning-based Parametrization of Atmospheric and Oceanic Turbulence. (arXiv:2312.00907v1 [cs.LG])

multi-run

chain-of-thought

tree-of-thought