2024-02-19

Title: Correlational Lagrangian Schrödinger Bridge: Learning Dynamics with Population-Level Regularization

Authors: Yuning You, Ruida Zhou, Yang Shen
Subjects: cs.LG, stat.ML
Abstract URL: https://arxiv.org/abs/2402.10227
Pdf URL: https://arxiv.org/pdf/2402.10227
Copy Paste: [[2402.10227]] Correlational Lagrangian Schrödinger Bridge: Learning Dynamics with Population-Level Regularization(https://arxiv.org/abs/2402.10227)
Keywords: generative
Abstract: Accurate modeling of system dynamics holds intriguing potential in broad scientific fields including cytodynamics and fluid mechanics. This task often presents significant challenges when (i) observations are limited to cross-sectional samples (where individual trajectories are inaccessible for learning), and moreover, (ii) the behaviors of individual particles are heterogeneous (especially in biological systems due to biodiversity). To address them, we introduce a novel framework dubbed correlational Lagrangian Schrödinger bridge (CLSB), aiming to seek the evolution "bridging" among cross-sectional observations, while regularized for the minimal population "cost". In contrast to prior methods relying on individual-level regularizers for all particles homogeneously (e.g. restraining individual motions), CLSB operates at the population level and admits heterogeneity, resulting in more generalizable modeling in practice. To this end, our contributions include (1) a new class of population regularizers capturing the temporal variations in multivariate relations, with the tractable formulation derived, (2) three domain-informed instantiations based on genetic co-expression stability, and (3) an integration of population regularizers into data-driven generative models as constrained optimization, and a numerical solution, with further extension to conditional generative models. Empirically, we demonstrate the superiority of CLSB in single-cell sequencing data analyses such as simulating cell development over time and predicting cellular responses to drugs of varied doses.

Title: A Dynamical View of the Question of Why

Authors: Mehdi Fatemi, Sindhu Gowda
Subjects: cs.LG, cs.AI, eess.SY
Abstract URL: https://arxiv.org/abs/2402.10240
Pdf URL: https://arxiv.org/pdf/2402.10240
Copy Paste: [[2402.10240]] A Dynamical View of the Question of Why(https://arxiv.org/abs/2402.10240)
Keywords: diffusion
Abstract: We address causal reasoning in multivariate time series data generated by stochastic processes. Existing approaches are largely restricted to static settings, ignoring the continuity and emission of variations across time. In contrast, we propose a learning paradigm that directly establishes causation between events in the course of time. We present two key lemmas to compute causal contributions and frame them as reinforcement learning problems. Our approach offers formal and computational tools for uncovering and quantifying causal relationships in diffusion processes, subsuming various important settings such as discrete-time Markov decision processes. Finally, in fairly intricate experiments and through sheer learning, our framework reveals and quantifies causal links, which otherwise seem inexplicable.

Title: GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting

Authors: Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
Subjects: cs.CV, cs.GR
Abstract URL: https://arxiv.org/abs/2402.10259
Pdf URL: https://arxiv.org/pdf/2402.10259
Copy Paste: [[2402.10259]] GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting(https://arxiv.org/abs/2402.10259)
Keywords: diffusion
Abstract: Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or highly compressed object information as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting, that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination which explicitly inject structure priors into the initial optimization process for helping build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. Our GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, and OpenIllumination, achieving strong reconstruction results from only 4 views and significantly outperforming previous state-of-the-art methods.

Title: Backdoor Attack against One-Class Sequential Anomaly Detection Models

Authors: He Cheng, Shuhan Yuan
Subjects: cs.LG, cs.AI, cs.CR, cs.IT
Abstract URL: https://arxiv.org/abs/2402.10283
Pdf URL: https://arxiv.org/pdf/2402.10283
Copy Paste: [[2402.10283]] Backdoor Attack against One-Class Sequential Anomaly Detection Models(https://arxiv.org/abs/2402.10283)
Keywords: anomaly
Abstract: Deep anomaly detection on sequential data has garnered significant attention due to the wide application scenarios. However, deep learning-based models face a critical security threat: their vulnerability to backdoor attacks. In this paper, we explore compromising deep sequential anomaly detection models by proposing a novel backdoor attack strategy. The attack approach comprises two primary steps, trigger generation and backdoor injection. Trigger generation derives imperceptible triggers by crafting perturbed samples from the benign normal data, such that the perturbed samples are still normal. Backdoor injection then injects the backdoor triggers to compromise the model only for the samples with triggers. The experimental results demonstrate the effectiveness of our proposed attack strategy by injecting backdoors on two well-established one-class anomaly detection models.

Title: Discrete Probabilistic Inference as Control in Multi-path Environments

Authors: Tristan Deleu, Padideh Nouri, Nikolay Malkin, Doina Precup, Yoshua Bengio
Subjects: cs.LG
Abstract URL: https://arxiv.org/abs/2402.10309
Pdf URL: https://arxiv.org/pdf/2402.10309
Copy Paste: [[2402.10309]] Discrete Probabilistic Inference as Control in Multi-path Environments(https://arxiv.org/abs/2402.10309)
Keywords: generative
Abstract: We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem, where the objective is to find a stochastic policy such that objects are sampled at the end of this sequential process proportionally to some predefined reward. While we could use maximum entropy Reinforcement Learning (MaxEnt RL) to solve this problem for some distributions, it has been shown that in general, the distribution over states induced by the optimal policy may be biased in cases where there are multiple ways to generate the same object. To address this issue, Generative Flow Networks (GFlowNets) learn a stochastic policy that samples objects proportionally to their reward by approximately enforcing a conservation of flows across the whole Markov Decision Process (MDP). In this paper, we extend recent methods correcting the reward in order to guarantee that the marginal distribution induced by the optimal MaxEnt RL policy is proportional to the original reward, regardless of the structure of the underlying MDP. We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward. Finally, we study empirically the performance of multiple MaxEnt RL and GFlowNet algorithms on multiple problems involving sampling from discrete distributions.

Title: Interpretable Generative Adversarial Imitation Learning

Authors: Wenliang Liu, Danyang Li, Erfan Aasi, Roberto Tron, Calin Belta
Subjects: cs.LG, eess.SY
Abstract URL: https://arxiv.org/abs/2402.10310
Pdf URL: https://arxiv.org/pdf/2402.10310
Copy Paste: [[2402.10310]] Interpretable Generative Adversarial Imitation Learning(https://arxiv.org/abs/2402.10310)
Keywords: generative
Abstract: Imitation learning methods have demonstrated considerable success in teaching autonomous systems complex tasks through expert demonstrations. However, a limitation of these methods is their lack of interpretability, particularly in understanding the specific task the learning agent aims to accomplish. In this paper, we propose a novel imitation learning method that combines Signal Temporal Logic (STL) inference and control synthesis, enabling the explicit representation of the task as an STL formula. This approach not only provides a clear understanding of the task but also allows for the incorporation of human knowledge and adaptation to new scenarios through manual adjustments of the STL formulae. Additionally, we employ a Generative Adversarial Network (GAN)-inspired training approach for both the inference and the control policy, effectively narrowing the gap between the expert and learned policies. The effectiveness of our algorithm is demonstrated through two case studies, showcasing its practical applicability and adaptability.

Title: Large Language Models for Forecasting and Anomaly Detection: A Systematic Literature Review

Authors: Jing Su, Chufeng Jiang, Xin Jin, Yuxin Qiao, Tingsong Xiao, Hongda Ma, Rong Wei, Zhi Jing, Jiajun Xu, Junhong Lin
Subjects: cs.LG, cs.AI
Abstract URL: https://arxiv.org/abs/2402.10350
Pdf URL: https://arxiv.org/pdf/2402.10350
Copy Paste: [[2402.10350]] Large Language Models for Forecasting and Anomaly Detection: A Systematic Literature Review(https://arxiv.org/abs/2402.10350)
Keywords: anomaly
Abstract: This systematic literature review comprehensively examines the application of Large Language Models (LLMs) in forecasting and anomaly detection, highlighting the current state of research, inherent challenges, and prospective future directions. LLMs have demonstrated significant potential in parsing and analyzing extensive datasets to identify patterns, predict future events, and detect anomalous behavior across various domains. However, this review identifies several critical challenges that impede their broader adoption and effectiveness, including the reliance on vast historical datasets, issues with generalizability across different contexts, the phenomenon of model hallucinations, limitations within the models' knowledge boundaries, and the substantial computational resources required. Through detailed analysis, this review discusses potential solutions and strategies to overcome these obstacles, such as integrating multimodal data, advancements in learning methodologies, and emphasizing model explainability and computational efficiency. Moreover, this review outlines critical trends that are likely to shape the evolution of LLMs in these fields, including the push toward real-time processing, the importance of sustainable modeling practices, and the value of interdisciplinary collaboration. In conclusion, this review underscores the transformative impact LLMs could have on forecasting and anomaly detection while emphasizing the need for continuous innovation, ethical considerations, and practical solutions to realize their full potential.

Title: Prompt-Based Bias Calibration for Better Zero/Few-Shot Learning of Language Models

Authors: Kang He, Yinghan Long, Kaushik Roy
Subjects: cs.CL, cs.LG
Abstract URL: https://arxiv.org/abs/2402.10353
Pdf URL: https://arxiv.org/pdf/2402.10353
Copy Paste: [[2402.10353]] Prompt-Based Bias Calibration for Better Zero/Few-Shot Learning of Language Models(https://arxiv.org/abs/2402.10353)
Keywords: in-context
Abstract: Prompt learning is susceptible to intrinsic bias present in pre-trained language models (LMs), resulting in sub-optimal performance of prompt-based zero/few-shot learning. In this work, we propose a null-input prompting method to calibrate intrinsic bias encoded in pre-trained LMs. Different from prior efforts that address intrinsic bias primarily for social fairness and often involve excessive computational cost, our objective is to explore enhancing LMs' performance in downstream zero/few-shot learning while emphasizing the efficiency of intrinsic bias calibration. Specifically, we leverage a diverse set of auto-selected null-meaning inputs generated from GPT-4 to prompt pre-trained LMs for intrinsic bias probing. Utilizing the bias-reflected probability distribution, we formulate a distribution disparity loss for bias calibration, where we exclusively update bias parameters ($0.1\%$ of total parameters) of LMs towards equal probability distribution. Experimental results show that the calibration promotes an equitable starting point for LMs while preserving language modeling abilities. Across a wide range of datasets, including sentiment analysis and topic classification, our method significantly improves zero/few-shot learning performance of LMs for both in-context learning and prompt-based fine-tuning (on average $9\%$ and $2\%$, respectively).

Title: BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains

Authors: Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickael Rouvier, Richard Dufour
Subjects: cs.CL, cs.AI, cs.LG
Abstract URL: https://arxiv.org/abs/2402.10373
Pdf URL: https://arxiv.org/pdf/2402.10373
Copy Paste: [[2402.10373]] BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains(https://arxiv.org/abs/2402.10373)
Keywords: foundation model
Abstract: Large Language Models (LLMs) have demonstrated remarkable versatility in recent years, offering potential applications across specialized domains such as healthcare and medicine. Despite the availability of various open-source LLMs tailored for health contexts, adapting general-purpose LLMs to the medical domain presents significant challenges. In this paper, we introduce BioMistral, an open-source LLM tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central. We conduct a comprehensive evaluation of BioMistral on a benchmark comprising 10 established medical question-answering (QA) tasks in English. We also explore lightweight models obtained through quantization and model merging approaches. Our results demonstrate BioMistral's superior performance compared to existing open-source medical models and its competitive edge against proprietary counterparts. Finally, to address the limited availability of data beyond English and to assess the multilingual generalization of medical LLMs, we automatically translated and evaluated this benchmark into 7 other languages. This marks the first large-scale multilingual evaluation of LLMs in the medical domain. Datasets, multilingual evaluation benchmarks, scripts, and all the models obtained during our experiments are freely released.

Title: Pretext Training Algorithms for Event Sequence Data

Authors: Yimu Wang, He Zhao, Ruizhi Deng, Frederick Tung, Greg Mori
Subjects: cs.LG, cs.AI
Abstract URL: https://arxiv.org/abs/2402.10392
Pdf URL: https://arxiv.org/pdf/2402.10392
Copy Paste: [[2402.10392]] Pretext Training Algorithms for Event Sequence Data(https://arxiv.org/abs/2402.10392)
Keywords: self-supervised
Abstract: Pretext training followed by task-specific fine-tuning has been a successful approach in vision and language domains. This paper proposes a self-supervised pretext training framework tailored to event sequence data. We introduce a novel alignment verification task that is specialized to event sequences, building on good practices in masked reconstruction and contrastive learning. Our pretext tasks unlock foundational representations that are generalizable across different down-stream tasks, including next-event prediction for temporal point process models, event sequence classification, and missing event interpolation. Experiments on popular public benchmarks demonstrate the potential of the proposed method across different tasks and data domains.

Title: LogELECTRA: Self-supervised Anomaly Detection for Unstructured Logs

Authors: Yuuki Yamanaka, Tomokatsu Takahashi, Takuya Minami, Yoshiaki Nakajima
Subjects: cs.LG, cs.SE
Abstract URL: https://arxiv.org/abs/2402.10397
Pdf URL: https://arxiv.org/pdf/2402.10397
Copy Paste: [[2402.10397]] LogELECTRA: Self-supervised Anomaly Detection for Unstructured Logs(https://arxiv.org/abs/2402.10397)
Keywords: self-supervised, anomaly
Abstract: System logs are some of the most important information for the maintenance of software systems, which have become larger and more complex in recent years. The goal of log-based anomaly detection is to automatically detect system anomalies by analyzing the large number of logs generated in a short period of time, which is a critical challenge in the real world. Previous studies have used a log parser to extract templates from unstructured log data and detect anomalies on the basis of patterns of the template occurrences. These methods have limitations for logs with unknown templates. Furthermore, since most log anomalies are known to be point anomalies rather than contextual anomalies, detection methods based on occurrence patterns can cause unnecessary delays in detection. In this paper, we propose LogELECTRA, a new log anomaly detection model that analyzes a single line of log messages more deeply on the basis of self-supervised anomaly detection. LogELECTRA specializes in detecting log anomalies as point anomalies by applying ELECTRA, a natural language processing model, to analyze the semantics of a single line of log messages. LogELECTRA outperformed existing state-of-the-art methods in experiments on the public benchmark log datasets BGL, Spirit, and Thunderbird.

Title: ManiFPT: Defining and Analyzing Fingerprints of Generative Models

Authors: Hae Jin Song, Mahyar Khayatkhoei, Wael AbdAlmageed
Subjects: cs.LG, cs.CV
Abstract URL: https://arxiv.org/abs/2402.10401
Pdf URL: https://arxiv.org/pdf/2402.10401
Copy Paste: [[2402.10401]] ManiFPT: Defining and Analyzing Fingerprints of Generative Models(https://arxiv.org/abs/2402.10401)
Keywords: generative
Abstract: Recent works have shown that generative models leave traces of their underlying generative process on the generated samples, broadly referred to as fingerprints of a generative model, and have studied their utility in detecting synthetic images from real ones. However, the extent to which these fingerprints can distinguish between various types of synthetic image and help identify the underlying generative process remains under-explored. In particular, the very definition of a fingerprint remains unclear, to our knowledge. To that end, in this work, we formalize the definition of artifact and fingerprint in generative models, propose an algorithm for computing them in practice, and finally study its effectiveness in distinguishing a large array of different generative models. We find that using our proposed definition can significantly improve the performance on the task of identifying the underlying generative process from samples (model attribution) compared to existing methods. Additionally, we study the structure of the fingerprints, and observe that it is very predictive of the effect of different design choices on the generative process.

Title: Explaining generative diffusion models via visual analysis for interpretable decision-making process

Authors: Ji-Hoon Park, Yeong-Joon Ju, Seong-Whan Lee
Subjects: cs.CV, cs.AI
Abstract URL: https://arxiv.org/abs/2402.10404
Pdf URL: https://arxiv.org/pdf/2402.10404
Copy Paste: [[2402.10404]] Explaining generative diffusion models via visual analysis for interpretable decision-making process(https://arxiv.org/abs/2402.10404)
Keywords: diffusion, generative
Abstract: Diffusion models have demonstrated remarkable performance in generation tasks. Nevertheless, explaining the diffusion process remains challenging because it is a sequence of denoising steps over noisy images that are difficult for experts to interpret. To address this issue, we propose three research questions to interpret the diffusion process from the perspective of the visual concepts generated by the model and the region where the model attends in each time step. We devise tools for visualizing the diffusion process and answering the aforementioned research questions to render the diffusion process human-understandable. We show how the output is progressively generated in the diffusion process by explaining the level of denoising and highlighting relationships to foundational visual concepts at each time step through the results of experiments with various visual analyses using the tools. Throughout the training of the diffusion model, the model learns diverse visual concepts corresponding to each time step, enabling the model to predict varying levels of visual concepts at different stages. We substantiate our tools using Area Under Cover (AUC) score, correlation quantification, and cross-attention mapping. Our findings provide insights into the diffusion process and pave the way for further research into explainable diffusion mechanisms.

Title: Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting

Authors: Jiaheng Wei, Yuanshun Yao, Jean-Francois Ton, Hongyi Guo, Andrew Estornell, Yang Liu
Subjects: cs.CL, cs.AI, cs.LG
Abstract URL: https://arxiv.org/abs/2402.10412
Pdf URL: https://arxiv.org/pdf/2402.10412
Copy Paste: [[2402.10412]] Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting(https://arxiv.org/abs/2402.10412)
Keywords: in-context
Abstract: LLM hallucination, i.e. generating factually incorrect yet seemingly convincing answers, is currently a major threat to the trustworthiness and reliability of LLMs. The first step towards solving this complicated problem is to measure it. However, existing hallucination metrics require a benchmark dataset with gold-standard answers, i.e. "best" or "correct" answers written by humans. Such a requirement makes hallucination measurement costly and prone to human errors. In this work, we propose Factualness Evaluations via Weighting LLMs (FEWL), the first hallucination metric that is specifically designed for the scenario when gold-standard answers are absent. FEWL leverages the answers from off-the-shelf LLMs that serve as a proxy of gold-standard answers. The key challenge is how to quantify the expertise of reference LLMs resourcefully. We show FEWL has certain theoretical guarantees and demonstrate empirically it gives more accurate hallucination measures than naively using reference LLMs. We also show how to leverage FEWL to reduce hallucination through both in-context learning and supervised finetuning. Lastly, we build a large-scale benchmark dataset to facilitate LLM hallucination research.

Title: Understanding In-Context Learning with a Pelican Soup Framework

Authors: Ting-Rui Chiang, Dani Yogatama
Subjects: cs.CL, cs.AI
Abstract URL: https://arxiv.org/abs/2402.10424
Pdf URL: https://arxiv.org/pdf/2402.10424
Copy Paste: [[2402.10424]] Understanding In-Context Learning with a Pelican Soup Framework(https://arxiv.org/abs/2402.10424)
Keywords: in-context
Abstract: Many existing theoretical analyses of in-context learning for natural language processing are based on latent variable models that leave gaps between theory and practice. We aim to close these gaps by proposing a theoretical framework, the Pelican Soup Framework. In this framework, we introduce (1) the notion of a common sense knowledge base, (2) a general formalism for natural language classification tasks, and (3) the notion of meaning association. Under this framework, we can establish a $\mathcal{O}(1/T)$ loss bound for in-context learning, where $T$ is the number of example-label pairs in the demonstration. Compared with previous works, our bound reflects the effect of the choice of verbalizers and the effect of instruction tuning. An additional notion of atom concepts makes it possible for our framework to explain the generalization to tasks unseen in the language model training data. Finally, we propose a toy setup, Calcutec, and a digit addition task that mimics types of distribution shifts a model needs to overcome to perform in-context learning. We also experiment with GPT2-Large on real-world NLP tasks. Our empirical results demonstrate the efficacy of our framework to explain in-context learning.

Title: Large Language Models as Zero-shot Dialogue State Tracker through Function Calling

Authors: Zekun Li, Zhiyu Zoey Chen, Mike Ross, Patrick Huber, Seungwhan Moon, Zhaojiang Lin, Xin Luna Dong, Adithya Sagar, Xifeng Yan, Paul A. Crook
Subjects: cs.CL, cs.AI
Abstract URL: https://arxiv.org/abs/2402.10466
Pdf URL: https://arxiv.org/pdf/2402.10466
Copy Paste: [[2402.10466]] Large Language Models as Zero-shot Dialogue State Tracker through Function Calling(https://arxiv.org/abs/2402.10466)
Keywords: generative, in-context
Abstract: Large language models (LLMs) are increasingly prevalent in conversational systems due to their advanced understanding and generative capabilities in general contexts. However, their effectiveness in task-oriented dialogues (TOD), which requires not only response generation but also effective dialogue state tracking (DST) within specific tasks and domains, remains less satisfying. In this work, we propose a novel approach FnCTOD for solving DST with LLMs through function calling. This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning. Our experimental results demonstrate that our approach achieves exceptional performance with both modestly sized open-source and also proprietary LLMs: with in-context prompting it enables various 7B or 13B parameter models to surpass the previous state-of-the-art (SOTA) achieved by ChatGPT, and improves ChatGPT's performance beating the SOTA by 5.6% Avg. JGA. Individual model results for GPT-3.5 and GPT-4 are boosted by 4.8% and 14%, respectively. We also show that by fine-tuning on a small collection of diverse task-oriented dialogues, we can equip modestly sized models, specifically a 13B parameter LLaMA2-Chat model, with function-calling capabilities and DST performance comparable to ChatGPT while maintaining their chat capabilities. We plan to open-source experimental code and model.

Title: Understanding Likelihood of Normalizing Flow and Image Complexity through the Lens of Out-of-Distribution Detection

Authors: Genki Osada, Tsubasa Takahashi, Takashi Nishide
Subjects: cs.LG
Abstract URL: https://arxiv.org/abs/2402.10477
Pdf URL: https://arxiv.org/pdf/2402.10477
Copy Paste: [[2402.10477]] Understanding Likelihood of Normalizing Flow and Image Complexity through the Lens of Out-of-Distribution Detection(https://arxiv.org/abs/2402.10477)
Keywords: generative
Abstract: Out-of-distribution (OOD) detection is crucial to safety-critical machine learning applications and has been extensively studied. While recent studies have predominantly focused on classifier-based methods, research on deep generative model (DGM)-based methods has lagged behind. This disparity may be attributed to a perplexing phenomenon: DGMs often assign higher likelihoods to unknown OOD inputs than to their known training data. This paper focuses on explaining the underlying mechanism of this phenomenon. We propose a hypothesis that less complex images concentrate in high-density regions in the latent space, resulting in a higher likelihood assignment in the Normalizing Flow (NF). We experimentally demonstrate its validity for five NF architectures, concluding that their likelihood is untrustworthy. Additionally, we show that this problem can be alleviated by treating image complexity as an independent variable. Finally, we provide evidence of the potential applicability of our hypothesis in another DGM, PixelCNN++.

Title: Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

Authors: Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, Yufei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen
Subjects: cs.CV
Abstract URL: https://arxiv.org/abs/2402.10491
Pdf URL: https://arxiv.org/pdf/2402.10491
Copy Paste: [[2402.10491]] Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation(https://arxiv.org/abs/2402.10491)
Keywords: diffusion
Abstract: Diffusion models have proven to be highly effective in image and video generation; however, they still face composition challenges when generating images of varying sizes due to single-scale training data. Adapting large pre-trained diffusion models for higher resolution demands substantial computational and optimization resources, yet achieving a generation capability comparable to low-resolution models remains elusive. This paper proposes a novel self-cascade diffusion model that leverages the rich knowledge gained from a well-trained low-resolution model for rapid adaptation to higher-resolution image and video generation, employing either tuning-free or cheap upsampler tuning paradigms. Integrating a sequence of multi-scale upsampler modules, the self-cascade diffusion model can efficiently adapt to a higher resolution, preserving the original composition and generation capabilities. We further propose a pivot-guided noise re-schedule strategy to speed up the inference process and improve local structural details. Compared to full fine-tuning, our approach achieves a 5X training speed-up and requires only an additional 0.002M tuning parameters. Extensive experiments demonstrate that our approach can quickly adapt to higher resolution image and video synthesis by fine-tuning for just 10k steps, with virtually no additional inference time.

Title: Provably Sample Efficient RLHF via Active Preference Optimization

Authors: Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury
Subjects: cs.LG, cs.AI, cs.CL
Abstract URL: https://arxiv.org/abs/2402.10500
Pdf URL: https://arxiv.org/pdf/2402.10500
Copy Paste: [[2402.10500]] Provably Sample Efficient RLHF via Active Preference Optimization(https://arxiv.org/abs/2402.10500)
Keywords: generative
Abstract: Reinforcement Learning from Human Feedback (RLHF) is pivotal in aligning Large Language Models (LLMs) with human preferences. While these aligned generative models have demonstrated impressive capabilities across various tasks, the dependence on high-quality human preference data poses a costly bottleneck in practical implementation of RLHF. Hence better and adaptive strategies for data collection are needed. To this end, we frame RLHF as a contextual preference bandit problem with prompts as contexts and show that the naive way of collecting preference data by choosing prompts uniformly at random leads to a policy that suffers an $\Omega(1)$ suboptimality gap in rewards. Then we propose Active Preference Optimization (APO), an algorithm that actively selects prompts to collect preference data. Under the Bradley-Terry-Luce (BTL) preference model, APO achieves sample efficiency without compromising on policy performance. We show that given a sample budget of $T$, the suboptimality gap of a policy learned via APO scales as $O(1/\sqrt{T})$. Next, we propose a compute-efficient batch version of APO with minor modification and evaluate its performance in practice. Experimental evaluations on a human preference dataset validate APO's efficacy as a sample-efficient and practical solution to data collection for RLHF, facilitating alignment of LLMs with human preferences in a cost-effective and scalable manner.

Title: LinkNER: Linking Local Named Entity Recognition Models to Large Language Models using Uncertainty

Authors: Zhen Zhang, Yuhua Zhao, Hang Gao, Mengting Hu
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.10573
Pdf URL: https://arxiv.org/pdf/2402.10573
Copy Paste: [[2402.10573]] LinkNER: Linking Local Named Entity Recognition Models to Large Language Models using Uncertainty(https://arxiv.org/abs/2402.10573)
Keywords: in-context
Abstract: Named Entity Recognition (NER) serves as a fundamental task in natural language understanding, bearing direct implications for web content analysis, search engines, and information retrieval systems. Fine-tuned NER models exhibit satisfactory performance on standard NER benchmarks. However, due to limited fine-tuning data and lack of knowledge, they perform poorly on unseen entity recognition. As a result, the usability and reliability of NER models in web-related applications are compromised. Instead, Large Language Models (LLMs) like GPT-4 possess extensive external knowledge, but research indicates that they lack specialty for NER tasks. Furthermore, non-public and large-scale weights make tuning LLMs difficult. To address these challenges, we propose a framework that combines small fine-tuned models with LLMs (LinkNER) and an uncertainty-based linking strategy called RDC that enables fine-tuned models to complement black-box LLMs, achieving better performance. We experiment with both standard NER test sets and noisy social media datasets. LinkNER enhances NER task performance, notably surpassing SOTA models in robustness tests. We also quantitatively analyze the influence of key components like uncertainty estimation methods, LLMs, and in-context learning on diverse NER tasks, offering specific web-related recommendations.

Title: Symbolic Autoencoding for Self-Supervised Sequence Learning

Authors: Mohammad Hossein Amani, Nicolas Mario Baldwin, Amin Mansouri, Martin Josifoski, Maxime Peyrard, Robert West
Subjects: cs.LG, cs.AI
Abstract URL: https://arxiv.org/abs/2402.10575
Pdf URL: https://arxiv.org/pdf/2402.10575
Copy Paste: [[2402.10575]] Symbolic Autoencoding for Self-Supervised Sequence Learning(https://arxiv.org/abs/2402.10575)
Keywords: self-supervised, generative
Abstract: Traditional language models, adept at next-token prediction in text sequences, often struggle with transduction tasks between distinct symbolic systems, particularly when parallel data is scarce. Addressing this issue, we introduce \textit{symbolic autoencoding} ($\Sigma$AE), a self-supervised framework that harnesses the power of abundant unparallel data alongside limited parallel data. $\Sigma$AE connects two generative models via a discrete bottleneck layer and is optimized end-to-end by minimizing reconstruction loss (simultaneously with supervised loss for the parallel data), such that the sequence generated by the discrete bottleneck can be read out as the transduced input sequence. We also develop gradient-based methods allowing for efficient self-supervised sequence learning despite the discreteness of the bottleneck. Our results demonstrate that $\Sigma$AE significantly enhances performance on transduction tasks, even with minimal parallel data, offering a promising solution for weakly supervised learning scenarios.

Title: PEGASUS: Personalized Generative 3D Avatars with Composable Attributes

Authors: Hyunsoo Cha, Byungjun Kim, Hanbyul Joo
Subjects: cs.CV
Abstract URL: https://arxiv.org/abs/2402.10636
Pdf URL: https://arxiv.org/pdf/2402.10636
Copy Paste: [[2402.10636]] PEGASUS: Personalized Generative 3D Avatars with Composable Attributes(https://arxiv.org/abs/2402.10636)
Keywords: generative
Abstract: We present PEGASUS, a method for constructing personalized generative 3D face avatars from monocular video sources. As a compositional generative model, our model enables disentangled controls to selectively alter the facial attributes (e.g., hair or nose) of the target individual, while preserving the identity. We present two key approaches to achieve this goal. First, we present a method to construct a person-specific generative 3D avatar by building a synthetic video collection of the target identity with varying facial attributes, where the videos are synthesized by borrowing parts from diverse individuals from other monocular videos. Through several experiments, we demonstrate the superior performance of our approach by generating unseen attributes with high realism. Subsequently, we introduce a zero-shot approach to achieve the same generative modeling more efficiently by leveraging a previously constructed personalized generative model.

Title: Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Authors: Yaroslav Aksenov, Nikita Balagansky, Sofia Maria Lo Cicero Vaina, Boris Shaposhnikov, Alexey Gorbatovski, Daniil Gavrilov
Subjects: cs.LG
Abstract URL: https://arxiv.org/abs/2402.10644
Pdf URL: https://arxiv.org/pdf/2402.10644
Copy Paste: [[2402.10644]] Linear Transformers with Learnable Kernel Functions are Better In-Context Models(https://arxiv.org/abs/2402.10644)
Keywords: in-context
Abstract: Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Current innovations, including State Space Models, were initially celebrated for surpassing Transformer performance on language modeling tasks. However, these models have revealed deficiencies in essential In-Context Learning capabilities - a domain where the Transformer traditionally shines. The Based model emerged as a hybrid solution, blending a Linear Transformer with a kernel inspired by the Taylor expansion of exponential functions, augmented by convolutional networks. Mirroring the Transformer's in-context adeptness, it became a strong contender in the field. In our work, we present a singular, elegant alteration to the Based kernel that amplifies its In-Context Learning abilities evaluated with the Multi-Query Associative Recall task and overall language modeling process, as demonstrated on the Pile dataset.
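
To illustrate the "kernel inspired by the Taylor expansion of exponential functions" mentioned above, here is a rough pure-Python sketch of causal linear attention with a second-order Taylor feature map, chosen so that phi(q)·phi(k) = 1 + q·k + (q·k)^2/2. All function names are hypothetical, scaling and projections are omitted, and this is not the paper's modified (learnable) kernel, which the abstract does not specify.

```python
import math


def taylor_feature_map(x):
    """Map x to [1, x_i, x_i * x_j / sqrt(2)] so dot products mimic exp up to order 2."""
    second_order = [xi * xj / math.sqrt(2.0) for xi in x for xj in x]
    return [1.0] + list(x) + second_order


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def causal_linear_attention(queries, keys, values):
    """Attention over positions s <= t with weights phi(q_t) . phi(k_s), normalised to sum to 1."""
    feat_dim = len(taylor_feature_map(keys[0]))
    val_dim = len(values[0])
    # Running sums let each step cost O(feat_dim * val_dim), independent of sequence length.
    kv_state = [[0.0] * val_dim for _ in range(feat_dim)]
    k_state = [0.0] * feat_dim
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = taylor_feature_map(k)
        for i, f in enumerate(phi_k):
            k_state[i] += f
            for j, vj in enumerate(v):
                kv_state[i][j] += f * vj
        phi_q = taylor_feature_map(q)
        # 1 + z + z^2/2 is strictly positive for every real z, so the normaliser is never zero.
        norm = dot(phi_q, k_state)
        outputs.append(
            [sum(phi_q[i] * kv_state[i][j] for i in range(feat_dim)) / norm
             for j in range(val_dim)]
        )
    return outputs
```

Because the state is a fixed-size running sum, the cost per token does not grow with context length, which is the subquadratic property the abstract refers to.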

Title: Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL

Authors: Dingzirui Wang, Longxu Dou, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.10663
Pdf URL: https://arxiv.org/pdf/2402.10663
Copy Paste: [[2402.10663]] Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL(https://arxiv.org/abs/2402.10663)
Keywords: in-context
Abstract: Currently, the in-context learning method based on large language models (LLMs) has become the mainstream of text-to-SQL research. Previous works have discussed how to select demonstrations related to the user question from a human-labeled demonstration pool. However, human labeling suffers from the limitations of insufficient diversity and high labeling overhead. Therefore, in this paper, we discuss how to measure and improve the diversity of the demonstrations for text-to-SQL. We present a metric to measure the diversity of the demonstrations and analyze the insufficiency of the existing labeled data by experiments. Based on the above discovery, we propose fusing iteratively for demonstrations (Fused) to build a high-diversity demonstration pool through human-free multiple-iteration synthesis, improving diversity and lowering label cost. Our method achieves an average improvement of 3.2% and 5.0% with and without human labeling on several mainstream datasets, which proves the effectiveness of Fused.

Title: OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models

Authors: Yuxuan Kuang, Hai Lin, Meng Jiang
Subjects: cs.CL, cs.RO
Abstract URL: https://arxiv.org/abs/2402.10670
Pdf URL: https://arxiv.org/pdf/2402.10670
Copy Paste: [[2402.10670]] OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models(https://arxiv.org/abs/2402.10670)
Keywords: foundation model
Abstract: Object navigation (ObjectNav) requires an agent to navigate through unseen environments to find queried objects. Many previous methods attempted to solve this task by relying on supervised or reinforcement learning, where they are trained on limited household datasets with close-set objects. However, two key challenges are unsolved: understanding free-form natural language instructions that demand open-set objects, and generalizing to new environments in a zero-shot manner. Aiming to solve the two challenges, in this paper, we propose OpenFMNav, an Open-set Foundation Model based framework for zero-shot object Navigation. We first unleash the reasoning abilities of large language models (LLMs) to extract proposed objects from natural language instructions that meet the user's demand. We then leverage the generalizability of large vision language models (VLMs) to actively discover and detect candidate objects from the scene, building a Versatile Semantic Score Map (VSSM). Then, by conducting common sense reasoning on VSSM, our method can perform effective language-guided exploration and exploitation of the scene and finally reach the goal. By leveraging the reasoning and generalizing abilities of foundation models, our method can understand free-form human instructions and perform effective open-set zero-shot navigation in diverse environments. Extensive experiments on the HM3D ObjectNav benchmark show that our method surpasses all the strong baselines on all metrics, proving our method's effectiveness. Furthermore, we perform real robot demonstrations to validate our method's open-set-ness and generalizability to real-world environments.

Title: Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm

Authors: Yuanzhen Xie, Xinzhou Jin, Tao Xie, MingXiong Lin, Liang Chen, Chenyun Yu, Lei Cheng, ChengXiang Zhuo, Bo Hu, Zang Li
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.10671
Pdf URL: https://arxiv.org/pdf/2402.10671
Copy Paste: [[2402.10671]] Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm(https://arxiv.org/abs/2402.10671)
Keywords: diffusion, in-context
Abstract: In-context learning of large-language models (LLMs) has achieved remarkable success in the field of natural language processing, while extensive case studies reveal that the single-step chain-of-thought prompting approach faces challenges such as attention diffusion and inadequate performance in complex tasks like text-to-SQL. To improve the contextual learning capabilities of LLMs in text-to-SQL, a workflow paradigm method is proposed, aiming to enhance the attention and problem-solving scope of LLMs through decomposition. Specifically, the information determination module for eliminating redundant information and the brand-new prompt structure based on problem classification greatly enhance the model's attention. Additionally, the inclusion of self-correcting and active learning modules greatly expands the problem-solving scope of LLMs, hence improving the upper limit of LLM-based approaches. Extensive experiments conducted on three datasets demonstrate that our approach outperforms other methods by a significant margin. About 2-3 percentage point improvements compared to the existing baseline on the Spider Dev and Spider-Realistic datasets and new SOTA results on the Spider Test dataset are achieved. Our code is available on GitHub: \url{https://github.com/FlyingFeather/DEA-SQL}.

Title: German Text Simplification: Finetuning Large Language Models with Semi-Synthetic Data

Authors: Lars Klöser, Mika Beele, Jan-Niklas Schagen, Bodo Kraft
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.10675
Pdf URL: https://arxiv.org/pdf/2402.10675
Copy Paste: [[2402.10675]] German Text Simplification: Finetuning Large Language Models with Semi-Synthetic Data(https://arxiv.org/abs/2402.10675)
Keywords: generative
Abstract: This study pioneers the use of synthetically generated data for training generative models in document-level text simplification of German texts. We demonstrate the effectiveness of our approach with real-world online texts. Addressing the challenge of data scarcity in language simplification, we crawled professionally simplified German texts and synthesized a corpus using GPT-4. We finetune Large Language Models with up to 13 billion parameters on this data and evaluate their performance. This paper employs various methodologies for evaluation and demonstrates the limitations of currently used rule-based metrics. Both automatic and manual evaluations reveal that our models can significantly simplify real-world online texts, indicating the potential of synthetic data in improving text simplification.

Title: Multi-Cultural Commonsense Knowledge Distillation

Authors: Tuan-Phong Nguyen, Simon Razniewski, Gerhard Weikum
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.10689
Pdf URL: https://arxiv.org/pdf/2402.10689
Copy Paste: [[2402.10689]] Multi-Cultural Commonsense Knowledge Distillation(https://arxiv.org/abs/2402.10689)
Keywords: generative
Abstract: Despite recent progress, large language models (LLMs) still face the challenge of appropriately reacting to the intricacies of social and cultural conventions. This paper presents MANGO, a methodology for distilling high-accuracy, high-recall assertions of cultural knowledge. We judiciously and iteratively prompt LLMs for this purpose from two entry points, concepts and cultures. Outputs are consolidated via clustering and generative summarization. Running the MANGO method with GPT-3.5 as underlying LLM yields 167K high-accuracy assertions for 30K concepts and 11K cultures, surpassing prior resources by a large margin. For extrinsic evaluation, we explore augmenting dialogue systems with cultural knowledge assertions. We find that adding knowledge from MANGO improves the overall quality, specificity, and cultural sensitivity of dialogue responses, as judged by human annotators. Data and code are available for download.

Title: Rethinking Human-like Translation Strategy: Integrating Drift-Diffusion Model with Large Language Models for Machine Translation

Authors: Hongbin Na, Zimu Wang, Mieradilijiang Maimaiti, Tong Chen, Wei Wang, Tao Shen, Ling Chen
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.10699
Pdf URL: https://arxiv.org/pdf/2402.10699
Copy Paste: [[2402.10699]] Rethinking Human-like Translation Strategy: Integrating Drift-Diffusion Model with Large Language Models for Machine Translation(https://arxiv.org/abs/2402.10699)
Keywords: diffusion
Abstract: Large language models (LLMs) have demonstrated promising potential in various downstream tasks, including machine translation. However, prior work on LLM-based machine translation has mainly focused on better utilizing training data, demonstrations, or pre-defined and universal knowledge to improve performance, with a lack of consideration of decision-making like human translators. In this paper, we incorporate Thinker with the Drift-Diffusion Model (Thinker-DDM) to address this issue. We then redefine the Drift-Diffusion process to emulate human translators' dynamic decision-making under constrained resources. We conduct extensive experiments under the high-resource, low-resource, and commonsense translation settings using the WMT22 and CommonMT datasets, in which Thinker-DDM outperforms baselines in the first two scenarios. We also perform additional analysis and evaluation on commonsense translation to illustrate the high effectiveness and efficacy of the proposed method.

Title: An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Generative LLM Inference

Authors: Atsuki Yamaguchi, Aline Villavicencio, Nikolaos Aletras
Subjects: cs.CL, cs.AI
Abstract URL: https://arxiv.org/abs/2402.10712
Pdf URL: https://arxiv.org/pdf/2402.10712
Copy Paste: [[2402.10712]] An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Generative LLM Inference(https://arxiv.org/abs/2402.10712)
Keywords: generative
Abstract: The development of state-of-the-art generative large language models (LLMs) disproportionately relies on English-centric tokenizers, vocabulary and pre-training data. Despite the fact that some LLMs have multilingual capabilities, recent studies have shown that their inference efficiency deteriorates when generating text in languages other than English. This results in increased inference time and costs. Cross-lingual vocabulary adaptation methods have been proposed for adapting models to a target language aiming to improve downstream performance. However, the effectiveness of these methods on increasing inference efficiency of generative LLMs has yet to be explored. In this paper, we perform an empirical study of various cross-lingual vocabulary adaptation methods on five generative LLMs (including monolingual and multilingual models) across four typologically-diverse languages and four natural language understanding tasks. We find that cross-lingual vocabulary adaptation substantially contributes to LLM inference speedups of up to 271.5%. We also show that adapting LLMs that have been pre-trained on more balanced multilingual data results in downstream performance comparable to the original models.

Title: BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data Fusion

Authors: Raktim Kumar Mondol, Ewan K.A. Millar, Arcot Sowmya, Erik Meijering
Subjects: cs.CV, cs.AI
Abstract URL: https://arxiv.org/abs/2402.10717
Pdf URL: https://arxiv.org/pdf/2402.10717
Copy Paste: [[2402.10717]] BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data Fusion(https://arxiv.org/abs/2402.10717)
Keywords: self-supervised
Abstract: Breast cancer is a significant health concern affecting millions of women worldwide. Accurate survival risk stratification plays a crucial role in guiding personalised treatment decisions and improving patient outcomes. Here we present BioFusionNet, a deep learning framework that fuses image-derived features with genetic and clinical data to achieve a holistic patient profile and perform survival risk stratification of ER+ breast cancer patients. We employ multiple self-supervised feature extractors, namely DINO and MoCoV3, pretrained on histopathology patches to capture detailed histopathological image features. We then utilise a variational autoencoder (VAE) to fuse these features, and harness the latent space of the VAE to feed into a self-attention network, generating patient-level features. Next, we develop a co-dual-cross-attention mechanism to combine the histopathological features with genetic data, enabling the model to capture the interplay between them. Additionally, clinical data is incorporated using a feed-forward network (FFN), further enhancing predictive performance and achieving comprehensive multimodal feature integration. Furthermore, we introduce a weighted Cox loss function, specifically designed to handle imbalanced survival data, which is a common challenge in the field. The proposed model achieves a mean concordance index (C-index) of 0.77 and a time-dependent area under the curve (AUC) of 0.84, outperforming state-of-the-art methods. It predicts risk (high versus low) with prognostic significance for overall survival (OS) in univariate analysis (HR=2.99, 95% CI: 1.88--4.78, p<0.005), and maintains independent significance in multivariate analysis incorporating standard clinicopathological variables (HR=2.91, 95% CI: 1.80--4.68, p<0.005). The proposed method not only improves model performance but also addresses a critical gap in handling imbalanced data.
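
The concordance index (C-index) reported above can be illustrated with a small sketch of the usual Harrell-style definition: among comparable pairs, it is the fraction in which the patient with the earlier observed event also has the higher predicted risk. The function and argument names are assumptions for illustration, and the paper's exact evaluation code may differ.

```python
def concordance_index(times, events, risks):
    """Harrell-style C-index; events[i] is truthy if the event was observed (not censored)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means the risk ordering perfectly matches the event ordering, and 0.5 corresponds to random ranking.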

Title: Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning

Authors: Yinpeng Liu, Jiawei Liu, Xiang Shi, Qikai Cheng, Wei Lu
Subjects: cs.CL
Abstract URL: https://arxiv.org/abs/2402.10738
Pdf URL: https://arxiv.org/pdf/2402.10738
Copy Paste: [[2402.10738]] Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning(https://arxiv.org/abs/2402.10738)
Keywords: in-context
Abstract: Demonstration ordering, which is an important strategy for in-context learning (ICL), can significantly affect the performance of large language models (LLMs). However, most of the current approaches of ordering require additional knowledge and similarity calculation. We advocate the few-shot in-context curriculum learning (ICCL), a simple but effective demonstration ordering method for ICL, which implies gradually increasing the complexity of prompt demonstrations during the inference process. Then we design three experiments to discuss the effectiveness of ICCL, the formation mechanism of LLM's ICCL capability, and the impact of ordering subjects. Experimental results demonstrate that ICCL, developed during the instruction-tuning stage, is effective for open-source LLMs. Moreover, LLMs exhibit a weaker capacity compared to humans in discerning the difficulty levels of demonstrations. We release our code at https://github.com/61peng/curri_learning.
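
The easy-to-hard ordering described above can be sketched in a few lines of Python. The difficulty scores, the tuple layout, and the prompt template are all hypothetical; the abstract does not say how difficulty is assigned or how prompts are formatted.

```python
def order_demonstrations(demos):
    """Sort (input_text, output_text, difficulty) tuples from easiest to hardest."""
    # sorted() is stable, so demonstrations with equal difficulty keep their original order.
    return sorted(demos, key=lambda demo: demo[2])


def build_prompt(demos, query):
    """Assemble a few-shot prompt with demonstrations in curriculum order (assumed template)."""
    parts = [f"Input: {x}\nOutput: {y}" for x, y, _ in order_demonstrations(demos)]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)
```

For example, `build_prompt([("hard q", "a2", 0.9), ("easy q", "a1", 0.1)], "new q")` places the "easy q" demonstration first and the query last.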

Title: GenRES: Rethinking Evaluation for Generative Relation Extraction in the Era of Large Language Models

Authors: Pengcheng Jiang, Jiacheng Lin, Zifeng Wang, Jimeng Sun, Jiawei Han
Subjects: cs.CL, cs.AI
Abstract URL: https://arxiv.org/abs/2402.10744
Pdf URL: https://arxiv.org/pdf/2402.10744
Copy Paste: [[2402.10744]] GenRES: Rethinking Evaluation for Generative Relation Extraction in the Era of Large Language Models(https://arxiv.org/abs/2402.10744)
Keywords: generative
Abstract: The field of relation extraction (RE) is experiencing a notable shift towards generative relation extraction (GRE), leveraging the capabilities of large language models (LLMs). However, we discovered that traditional relation extraction (RE) metrics like precision and recall fall short in evaluating GRE methods. This shortfall arises because these metrics rely on exact matching with human-annotated reference relations, while GRE methods often produce diverse and semantically accurate relations that differ from the references. To fill this gap, we introduce GenRES for a multi-dimensional assessment in terms of the topic similarity, uniqueness, granularity, factualness, and completeness of the GRE results. With GenRES, we empirically identified that (1) precision/recall fails to justify the performance of GRE methods; (2) human-annotated referential relations can be incomplete; (3) prompting LLMs with a fixed set of relations or entities can cause hallucinations. Next, we conducted a human evaluation of GRE methods that shows GenRES is consistent with human preferences for RE quality. Last, we made a comprehensive evaluation of fourteen leading LLMs using GenRES across document, bag, and sentence level RE datasets, respectively, to set the benchmark for future research in GRE.

Title: Distillation Enhanced Generative Retrieval

Authors: Yongqi Li, Zhen Zhang, Wenjie Wang, Liqiang Nie, Wenjie Li, Tat-Seng Chua
Subjects: cs.CL, cs.AI, cs.IR
Abstract URL: https://arxiv.org/abs/2402.10769
Pdf URL: https://arxiv.org/pdf/2402.10769
Copy Paste: [[2402.10769]] Distillation Enhanced Generative Retrieval(https://arxiv.org/abs/2402.10769)
Keywords: generative
Abstract: Generative retrieval is a promising new paradigm in text retrieval that generates identifier strings of relevant passages as the retrieval target. This paradigm leverages powerful generative language models, distinct from traditional sparse or dense retrieval methods. In this work, we identify a viable direction to further enhance generative retrieval via distillation and propose a feasible framework, named DGR. DGR utilizes sophisticated ranking models, such as the cross-encoder, in a teacher role to supply a passage rank list, which captures the varying relevance degrees of passages instead of binary hard labels; subsequently, DGR employs a specially designed distilled RankNet loss to optimize the generative retrieval model, considering the passage rank order provided by the teacher model as labels. This framework only requires an additional distillation step to enhance current generative retrieval systems and does not add any burden to the inference stage. We conduct experiments on four public datasets, and the results indicate that DGR achieves state-of-the-art performance among the generative retrieval methods. Additionally, DGR demonstrates exceptional robustness and generalizability with various teacher models and distillation losses.
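
As background for the "distilled RankNet loss" mentioned above, here is a minimal sketch of the standard pairwise RankNet loss, with a teacher's rank list supplying the pair labels. It is the textbook loss, not DGR's specially designed variant, whose details the abstract does not give; the function name, the dict of student scores, and the averaging over pairs are assumptions.

```python
import math


def ranknet_loss(student_scores, teacher_ranking):
    """Mean pairwise logistic loss over all pairs ordered by the teacher.

    student_scores: dict mapping passage id -> student score.
    teacher_ranking: passage ids, most relevant first (teacher's rank list).
    """
    total = 0.0
    pairs = 0
    for i, higher in enumerate(teacher_ranking):
        for lower in teacher_ranking[i + 1:]:
            diff = student_scores[higher] - student_scores[lower]
            # -log(sigmoid(diff)) = log(1 + exp(-diff)); small when the student agrees.
            total += math.log1p(math.exp(-diff))
            pairs += 1
    return total / pairs if pairs else 0.0
```

The loss falls as the student's scores increasingly agree with the teacher's ordering, and equals log 2 when all student scores are tied.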

Title: In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss

Authors: Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev
Subjects: cs.CL, cs.AI, cs.LG
Abstract URL: https://arxiv.org/abs/2402.10790
Pdf URL: https://arxiv.org/pdf/2402.10790
Copy Paste: [[2402.10790]] In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss(https://arxiv.org/abs/2402.10790)
Keywords: generative
Abstract: This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to $10^4$ elements. In contrast, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to $10^7$ elements. This achievement marks a substantial leap, as it is by far the longest input processed by any open neural network model to date, demonstrating a significant improvement in the processing capabilities for long sequences.

Title: VATr++: Choose Your Words Wisely for Handwritten Text Generation

Authors: Bram Vanherle, Vittorio Pippi, Silvia Cascianelli, Nick Michiels, Frank Van Reeth, Rita Cucchiara
Subjects: cs.CV, cs.AI
Abstract URL: https://arxiv.org/abs/2402.10798
Pdf URL: https://arxiv.org/pdf/2402.10798
Copy Paste: [[2402.10798]] VATr++: Choose Your Words Wisely for Handwritten Text Generation(https://arxiv.org/abs/2402.10798)
Keywords: diffusion
Abstract: Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, there remains a critical yet understudied aspect - the impact of the input, both visual and textual, on the HTG model training and its subsequent influence on performance. This study delves deeper into a cutting-edge Styled-HTG approach, proposing strategies for input preparation and training regularization that allow the model to achieve better performance and generalize better. These aspects are validated through extensive analysis on several different settings and datasets. Moreover, in this work, we go beyond performance optimization and address a significant hurdle in HTG research - the lack of a standardized evaluation protocol. In particular, we propose a standardization of the evaluation protocol for HTG and conduct a comprehensive benchmarking of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.

Title: TimeSeriesBench: An Industrial-Grade Benchmark for Time Series Anomaly Detection Models

Authors: Haotian Si, Changhua Pei, Hang Cui, Jingwen Yang, Yongqian Sun, Shenglin Zhang, Jingjing Li, Haiming Zhang, Jing Han, Dan Pei, Jianhui Li, Gaogang Xie
Subjects: cs.LG
Abstract URL: https://arxiv.org/abs/2402.10802
Pdf URL: https://arxiv.org/pdf/2402.10802
Copy Paste: [[2402.10802]] TimeSeriesBench: An Industrial-Grade Benchmark for Time Series Anomaly Detection Models(https://arxiv.org/abs/2402.10802)
Keywords: anomaly
Abstract: Driven by the proliferation of real-world application scenarios and scales, time series anomaly detection (TSAD) has attracted considerable scholarly and industrial interest. However, existing algorithms exhibit a gap in terms of training paradigm, online detection paradigm, and evaluation criteria when compared to the actual needs of real-world industrial systems. Firstly, current algorithms typically train a specific model for each individual time series. In a large-scale online system with tens of thousands of curves, maintaining such a multitude of models is impractical. The performance of using merely one single unified model to detect anomalies remains unknown. Secondly, most TSAD models are trained on the historical part of a time series and are tested on its future segment. In distributed systems, however, there are frequent system deployments and upgrades, with new, previously unseen time series emerging daily. The performance of testing newly incoming unseen time series on current TSAD algorithms remains unknown. Lastly, although some papers have conducted detailed surveys, the absence of an online evaluation platform prevents answering questions like "Who is the best at anomaly detection at the current stage?" In this paper, we propose TimeSeriesBench, an industrial-grade benchmark that we continuously maintain as a leaderboard. On this leaderboard, we assess the performance of existing algorithms across more than 168 evaluation settings combining different training and testing paradigms, evaluation metrics and datasets. Through our comprehensive analysis of the results, we provide recommendations for the future design of anomaly detection algorithms. To address known issues with existing public datasets, we release an industrial dataset to the public together with TimeSeriesBench. All code, data, and the online leaderboard have been made publicly available.

Title: Training Class-Imbalanced Diffusion Model Via Overlap Optimization

Authors: Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, Meng Tang
Subjects: cs.CV
Abstract URL: https://arxiv.org/abs/2402.10821
Pdf URL: https://arxiv.org/pdf/2402.10821
Copy Paste: [[2402.10821]] Training Class-Imbalanced Diffusion Model Via Overlap Optimization(https://arxiv.org/abs/2402.10821)
Keywords: diffusion, generative
Abstract: Diffusion models have made significant advances recently in high-quality image synthesis and related tasks. However, diffusion models trained on real-world datasets, which often follow long-tailed distributions, yield inferior fidelity for tail classes. Deep generative models, including diffusion models, are biased towards classes with abundant training images. To address the observed appearance overlap between synthesized images of rare classes and tail classes, we propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes. We show variants of our probabilistic contrastive learning method can be applied to any class conditional diffusion model. We show significant improvement in image synthesis using our loss for multiple datasets with long-tailed distribution. Extensive experimental results demonstrate that the proposed method can effectively handle imbalanced data for diffusion-based generation and classification models. Our code and datasets will be publicly available at https://github.com/yanliang3612/DiffROP.

Title: Enhancement-Driven Pretraining for Robust Fingerprint Representation Learning

Authors: Ekta Gavas, Kaustubh Olpadkar, Anoop Namboodiri
Subjects: cs.CV
Abstract URL: https://arxiv.org/abs/2402.10847
Pdf URL: https://arxiv.org/pdf/2402.10847
Copy Paste: [[2402.10847]] Enhancement-Driven Pretraining for Robust Fingerprint Representation Learning(https://arxiv.org/abs/2402.10847)
Keywords: self-supervised
Abstract: Fingerprint recognition stands as a pivotal component of biometric technology, with diverse applications from identity verification to advanced search tools. In this paper, we propose a unique method for deriving robust fingerprint representations by leveraging enhancement-based pre-training. Building on the achievements of U-Net-based fingerprint enhancement, our method employs a specialized encoder to derive representations from fingerprint images in a self-supervised manner. We further refine these representations, aiming to enhance the verification capabilities. Our experimental results, tested on publicly available fingerprint datasets, reveal a marked improvement in verification performance against established self-supervised training techniques. Our findings not only highlight the effectiveness of our method but also pave the way for potential advancements. Crucially, our research indicates that it is feasible to extract meaningful fingerprint representations from degraded images without relying on enhanced samples.

Title: Control Color: Multimodal Diffusion-based Interactive Image Colorization

Authors: Zhexin Liang, Zhaochen Li, Shangchen Zhou, Chongyi Li, Chen Change Loy
Subjects: cs.CV
Abstract URL: https://arxiv.org/abs/2402.10855
Pdf URL: https://arxiv.org/pdf/2402.10855
Copy Paste: [[2402.10855]] Control Color: Multimodal Diffusion-based Interactive Image Colorization(https://arxiv.org/abs/2402.10855)
Keywords: diffusion
Abstract: Despite the existence of numerous colorization methods, several limitations remain, such as lack of user interaction, inflexibility in local colorization, unnatural color rendering, insufficient color variation, and color overflow. To solve these issues, we introduce Control Color (CtrlColor), a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model, offering promising capabilities in highly controllable interactive image colorization. While several diffusion-based methods have been proposed, supporting colorization in multiple modalities remains non-trivial. In this study, we aim to tackle both unconditional and conditional image colorization (text prompts, strokes, exemplars) and address color overflow and incorrect color within a unified framework. Specifically, we present an effective way to encode user strokes to enable precise local color manipulation and employ a practical way to constrain the color distribution to be similar to that of exemplars. Apart from accepting text prompts as conditions, these designs add versatility to our approach. We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring. Extensive comparisons show that our model outperforms state-of-the-art image colorization methods both qualitatively and quantitatively.

Title: RLVF: Learning from Verbal Feedback without Overgeneralization

Authors: Moritz Stephan, Alexander Khazatsky, Eric Mitchell, Annie S Chen, Sheryl Hsu, Archit Sharma, Chelsea Finn
Subjects: cs.LG, cs.AI
Abstract URL: https://arxiv.org/abs/2402.10893
Pdf URL: https://arxiv.org/pdf/2402.10893
Copy Paste: [[2402.10893]] RLVF: Learning from Verbal Feedback without Overgeneralization(https://arxiv.org/abs/2402.10893)
Keywords: in-context
Abstract: The diversity of contexts in which large language models (LLMs) are deployed requires the ability to modify or customize default model behaviors to incorporate nuanced requirements and preferences. A convenient interface to specify such model adjustments is high-level verbal feedback, such as "Don't use emojis when drafting emails to my boss." However, while writing high-level feedback is far simpler than collecting annotations for reinforcement learning from human feedback (RLHF), we find that simply prompting a model with such feedback leads to overgeneralization of the feedback to contexts where it is not relevant. We study the problem of incorporating verbal feedback without such overgeneralization, inspiring a new method Contextualized Critiques with Constrained Preference Optimization (C3PO). C3PO uses a piece of high-level feedback to generate a small synthetic preference dataset specifying how the feedback should (and should not) be applied. It then fine-tunes the model in accordance with the synthetic preference data while minimizing the divergence from the original model for prompts where the feedback does not apply. Our experimental results indicate that our approach effectively applies verbal feedback to relevant scenarios while preserving existing behaviors for other contexts. For both human- and GPT-4-generated high-level feedback, C3PO effectively adheres to the given feedback comparably to in-context baselines while reducing overgeneralization by 30%.

Title: Fusion of Diffusion Weighted MRI and Clinical Data for Predicting Functional Outcome after Acute Ischemic Stroke with Deep Contrastive Learning

Authors: Chia-Ling Tsai, Hui-Yun Su, Shen-Feng Sung, Wei-Yang Lin, Ying-Ying Su, Tzu-Hsien Yang, Man-Lin Mai
Subjects: cs.CV, cs.LG
Abstract URL: https://arxiv.org/abs/2402.10894
Pdf URL: https://arxiv.org/pdf/2402.10894
Copy Paste: [[2402.10894]] Fusion of Diffusion Weighted MRI and Clinical Data for Predicting Functional Outcome after Acute Ischemic Stroke with Deep Contrastive Learning(https://arxiv.org/abs/2402.10894)
Keywords: diffusion
Abstract: Stroke is a common disabling neurological condition that affects about one-quarter of the adult population over age 25; more than half of patients still have poor outcomes, such as permanent functional dependence or even death, after the onset of acute stroke. The aim of this study is to investigate the efficacy of diffusion-weighted MRI modalities, combined with a structured health profile, in predicting functional outcome to facilitate early intervention. A deep fusion learning network is proposed with two-stage training: the first stage focuses on cross-modality representation learning and the second stage on classification. Supervised contrastive learning is exploited to learn discriminative features that separate the two classes of patients from embeddings of individual modalities and from the fused multimodal embedding. The network takes DWI and ADC images and structured health profile data as input. The outcome is a prediction of whether the patient will need long-term care at 3 months after the onset of stroke. Trained and evaluated with a dataset of 3297 patients, our proposed fusion model achieves 0.87, 0.80 and 80.45% for AUC, F1-score and accuracy, respectively, outperforming existing models that consolidate both imaging and structured data in the medical domain. If the model is trained with comprehensive clinical variables, including NIHSS and comorbidities, the gain in prediction accuracy from images is not considered substantial, but is still significant. However, diffusion-weighted MRI can replace NIHSS to achieve a comparable level of accuracy when combined with other readily available clinical variables, for better generalization.