2024-01-09

language model

Title: Efficacy of Utilizing Large Language Models to Detect Public Threat Posted Online. (arXiv:2401.02974v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02974
Code URL: null
Copy Paste: [[2401.02974]] Efficacy of Utilizing Large Language Models to Detect Public Threat Posted Online(http://arxiv.org/abs/2401.02974)
Summary:
This paper examines the efficacy of utilizing large language models (LLMs) to detect public threats posted online. Amid rising concerns over the spread of threatening rhetoric and advance notices of violence, automated content analysis techniques may aid in early identification and moderation. Custom data collection tools were developed to amass post titles from a popular Korean online community, comprising 500 non-threat examples and 20 threats. Various LLMs (GPT-3.5, GPT-4, PaLM) were prompted to classify individual posts as either "threat" or "safe." Statistical analysis found all models demonstrated strong accuracy, passing chi-square goodness of fit tests for both threat and non-threat identification. GPT-4 performed best overall with 97.9% non-threat and 100% threat accuracy. Affordability analysis also showed PaLM API pricing as highly cost-efficient. The findings indicate LLMs can effectively augment human content moderation at scale to help mitigate emerging online risks. However, biases, transparency, and ethical oversight remain vital considerations before real-world implementation.

Title: BIBench: Benchmarking Data Analysis Knowledge of Large Language Models. (arXiv:2401.02982v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02982
Code URL: null
Copy Paste: [[2401.02982]] BIBench: Benchmarking Data Analysis Knowledge of Large Language Models(http://arxiv.org/abs/2401.02982)
Summary:
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of tasks. However, their proficiency and reliability in the specialized domain of Data Analysis, particularly with a focus on data-driven thinking, remain uncertain. To bridge this gap, we introduce BIBench, a comprehensive benchmark designed to evaluate the data analysis capabilities of LLMs within the context of Business Intelligence (BI). BIBench assesses LLMs across three dimensions: 1) BI foundational knowledge, evaluating the models' numerical reasoning and familiarity with financial concepts; 2) BI knowledge application, determining the models' ability to quickly comprehend textual information and generate analysis questions from multiple views; and 3) BI technical skills, examining the models' use of technical knowledge to address real-world data analysis challenges. BIBench comprises 11 sub-tasks, spanning three categories of task types: classification, extraction, and generation. Additionally, we've developed BIChat, a domain-specific dataset with over a million data points, to fine-tune LLMs. We will release BIBenchmark, BIChat, and the evaluation scripts at \url{https://github.com/cubenlp/BIBench}. This benchmark aims to provide a measure for in-depth analysis of LLM abilities and foster the advancement of LLMs in the field of data analysis.

Title: Large Language Models in Mental Health Care: a Scoping Review. (arXiv:2401.02984v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02984
Code URL: null
Copy Paste: [[2401.02984]] Large Language Models in Mental Health Care: a Scoping Review(http://arxiv.org/abs/2401.02984)
Summary:
Objective: The growing use of large language models (LLMs) stimulates a need for a comprehensive review of their applications and outcomes in mental health care contexts. This scoping review aims to critically analyze the existing development and applications of LLMs in mental health care, highlighting their successes and identifying their challenges and limitations in these specialized fields. Materials and Methods: A broad literature search was conducted in November 2023 using six databases (PubMed, Web of Science, Google Scholar, arXiv, medRxiv, and PsyArXiv) following the 2020 version of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 313 publications were initially identified, and after applying the study inclusion criteria, 34 publications were selected for the final review. Results: We identified diverse applications of LLMs in mental health care, including diagnosis, therapy, patient engagement enhancement, etc. Key challenges include data availability and reliability, nuanced handling of mental states, and effective evaluation methods. Despite successes in accuracy and accessibility improvement, gaps in clinical applicability and ethical considerations were evident, pointing to the need for robust data, standardized evaluations, and interdisciplinary collaboration. Conclusion: LLMs show promising potential in advancing mental health care, with applications in diagnostics, and patient support. Continued advancements depend on collaborative, multidisciplinary efforts focused on framework enhancement, rigorous dataset development, technological refinement, and ethical integration to ensure the effective and safe application of LLMs in mental health care.

Title: Evaluating Large Language Models on the GMAT: Implications for the Future of Business Education. (arXiv:2401.02985v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02985
Code URL: null
Copy Paste: [[2401.02985]] Evaluating Large Language Models on the GMAT: Implications for the Future of Business Education(http://arxiv.org/abs/2401.02985)
Summary:
The rapid evolution of artificial intelligence (AI), especially in the domain of Large Language Models (LLMs) and generative AI, has opened new avenues for application across various fields, yet its role in business education remains underexplored. This study introduces the first benchmark to assess the performance of seven major LLMs, OpenAI's models (GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo), Google's models (PaLM 2, Gemini 1.0 Pro), and Anthropic's models (Claude 2 and Claude 2.1), on the GMAT, which is a key exam in the admission process for graduate business programs. Our analysis shows that most LLMs outperform human candidates, with GPT-4 Turbo not only outperforming the other models but also surpassing the average scores of graduate students at top business schools. Through a case study, this research examines GPT-4 Turbo's ability to explain answers, evaluate responses, identify errors, tailor instructions, and generate alternative scenarios. The latest LLM versions, GPT-4 Turbo, Claude 2.1, and Gemini 1.0 Pro, show marked improvements in reasoning tasks compared to their predecessors, underscoring their potential for complex problem-solving. While AI's promise in education, assessment, and tutoring is clear, challenges remain. Our study not only sheds light on LLMs' academic potential but also emphasizes the need for careful development and application of AI in education. As AI technology advances, it is imperative to establish frameworks and protocols for AI interaction, verify the accuracy of AI-generated content, ensure worldwide access for diverse learners, and create an educational environment where AI supports human expertise. This research sets the stage for further exploration into the responsible use of AI to enrich educational experiences and improve exam preparation and assessment methods.

Title: Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach. (arXiv:2401.02987v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02987
Code URL: null
Copy Paste: [[2401.02987]] Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach(http://arxiv.org/abs/2401.02987)
Summary:
The emergence of pretrained models has significantly impacted from Natural Language Processing (NLP) and Computer Vision to relational datasets. Traditionally, these models are assessed through fine-tuned downstream tasks. However, this raises the question of how to evaluate these models more efficiently and more effectively. In this study, we explore a novel approach where we leverage the meta features associated with each entity as a source of worldly knowledge and employ entity representations from the models. We propose using the consistency between these representations and the meta features as a metric for evaluating pretrained models. Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and images models.

Title: GLIDE-RL: Grounded Language Instruction through DEmonstration in RL. (arXiv:2401.02991v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02991
Code URL: null
Copy Paste: [[2401.02991]] GLIDE-RL: Grounded Language Instruction through DEmonstration in RL(http://arxiv.org/abs/2401.02991)
Summary:
One of the final frontiers in the development of complex human - AI collaborative systems is the ability of AI agents to comprehend the natural language and perform tasks accordingly. However, training efficient Reinforcement Learning (RL) agents grounded in natural language has been a long-standing challenge due to the complexity and ambiguity of the language and sparsity of the rewards, among other factors. Several advances in reinforcement learning, curriculum learning, continual learning, language models have independently contributed to effective training of grounded agents in various environments. Leveraging these developments, we present a novel algorithm, Grounded Language Instruction through DEmonstration in RL (GLIDE-RL) that introduces a teacher-instructor-student curriculum learning framework for training an RL agent capable of following natural language instructions that can generalize to previously unseen language instructions. In this multi-agent framework, the teacher and the student agents learn simultaneously based on the student's current skill level. We further demonstrate the necessity for training the student agent with not just one, but multiple teacher agents. Experiments on a complex sparse reward environment validates the effectiveness of our proposed approach.

Title: Advanced Unstructured Data Processing for ESG Reports: A Methodology for Structured Transformation and Enhanced Analysis. (arXiv:2401.02992v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02992
Code URL: null
Copy Paste: [[2401.02992]] Advanced Unstructured Data Processing for ESG Reports: A Methodology for Structured Transformation and Enhanced Analysis(http://arxiv.org/abs/2401.02992)
Summary:
In the evolving field of corporate sustainability, analyzing unstructured Environmental, Social, and Governance (ESG) reports is a complex challenge due to their varied formats and intricate content. This study introduces an innovative methodology utilizing the "Unstructured Core Library", specifically tailored to address these challenges by transforming ESG reports into structured, analyzable formats. Our approach significantly advances the existing research by offering high-precision text cleaning, adept identification and extraction of text from images, and standardization of tables within these reports. Emphasizing its capability to handle diverse data types, including text, images, and tables, the method adeptly manages the nuances of differing page layouts and report styles across industries. This research marks a substantial contribution to the fields of industrial ecology and corporate sustainability assessment, paving the way for the application of advanced NLP technologies and large language models in the analysis of corporate governance and sustainability. Our code is available at https://github.com/linancn/TianGong-AI-Unstructure.git.

Title: Improving Natural Language Understanding with Computation-Efficient Retrieval Representation Fusion. (arXiv:2401.02993v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02993
Code URL: null
Copy Paste: [[2401.02993]] Improving Natural Language Understanding with Computation-Efficient Retrieval Representation Fusion(http://arxiv.org/abs/2401.02993)
Summary:
Retrieval-based augmentations that aim to incorporate knowledge from an external database into language models have achieved great success in various knowledge-intensive (KI) tasks, such as question-answering and text generation. However, integrating retrievals in non-knowledge-intensive (NKI) tasks, such as text classification, is still challenging. Existing works focus on concatenating retrievals to inputs as context to form the prompt-based inputs. Unfortunately, such methods require language models to have the capability to handle long texts. Besides, inferring such concatenated data would also consume a significant amount of computational resources.

To solve these challenges, we propose \textbf{ReFusion} in this paper, a computation-efficient \textbf{Re}trieval representation \textbf{Fusion} with neural architecture search. The main idea is to directly fuse the retrieval representations into the language models. Specifically, we first propose an online retrieval module that retrieves representations of similar sentences. Then, we present a retrieval fusion module including two effective ranking schemes, i.e., reranker-based scheme and ordered-mask-based scheme, to fuse the retrieval representations with hidden states. Furthermore, we use Neural Architecture Search (NAS) to seek the optimal fusion structure across different layers. Finally, we conduct comprehensive experiments, and the results demonstrate our ReFusion can achieve superior and robust performance on various NKI tasks.

Title: Blar-SQL: Faster, Stronger, Smaller NL2SQL. (arXiv:2401.02997v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02997
Code URL: null
Copy Paste: [[2401.02997]] Blar-SQL: Faster, Stronger, Smaller NL2SQL(http://arxiv.org/abs/2401.02997)
Summary:
Large Language Models (LLMs) have gained considerable notoriety in the field of natural language to SQL tasks (NL2SQL). In this study, we show how task decomposition can greatly benefit LLMs in database understanding and query generation in order to answer human questions with an SQL query.

We fined-tuned open source models, specifically Llama-2 and Code Llama, by combining 2 different models each designated to focus on one of two tasks in order to leverage each model's core competency to further increase the accuracy of the final SQL query.

We propose a new framework to divide the schema into chunks in order to fit more information into a limited context. Our results are comparable with those obtained by GPT-4 at the same time being 135 times smaller, 90 times faster and more than 100 times cheaper than GPT-4.

Title: Quartet Logic: A Four-Step Reasoning (QLFR) framework for advancing Short Text Classification. (arXiv:2401.03158v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.03158
Code URL: null
Copy Paste: [[2401.03158]] Quartet Logic: A Four-Step Reasoning (QLFR) framework for advancing Short Text Classification(http://arxiv.org/abs/2401.03158)
Summary:
Short Text Classification (STC) is crucial for processing and comprehending the brief but substantial content prevalent on contemporary digital platforms. The STC encounters difficulties in grasping semantic and syntactic intricacies, an issue that is apparent in traditional pre-trained language models. Although Graph Convolutional Networks enhance performance by integrating external knowledge bases, these methods are limited by the quality and extent of the knowledge applied. Recently, the emergence of Large Language Models (LLMs) and Chain-of-Thought (CoT) has significantly improved the performance of complex reasoning tasks. However, some studies have highlighted the limitations of their application in fundamental NLP tasks. Consequently, this study sought to employ CoT to investigate the capabilities of LLMs in STC tasks. This study introduces Quartet Logic: A Four-Step Reasoning (QLFR) framework. This framework primarily incorporates Syntactic and Semantic Enrichment CoT, effectively decomposing the STC task into four distinct steps: (i) essential concept identification, (ii) common-sense knowledge retrieval, (iii) text rewriting, and (iv) classification. This elicits the inherent knowledge and abilities of LLMs to address the challenges in STC. Surprisingly, we found that QLFR can also improve the performance of smaller models. Therefore, we developed a CoT-Driven Multi-task learning (QLFR-CML) method to facilitate the knowledge transfer from LLMs to smaller models. Extensive experimentation across six short-text benchmarks validated the efficacy of the proposed methods. Notably, QLFR achieved state-of-the-art performance on all datasets, with significant improvements, particularly on the Ohsumed and TagMyNews datasets.

Title: Part-of-Speech Tagger for Bodo Language using Deep Learning approach. (arXiv:2401.03175v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.03175
Code URL: null
Copy Paste: [[2401.03175]] Part-of-Speech Tagger for Bodo Language using Deep Learning approach(http://arxiv.org/abs/2401.03175)
Summary:
Language Processing systems such as Part-of-speech tagging, Named entity recognition, Machine translation, Speech recognition, and Language modeling (LM) are well-studied in high-resource languages. Nevertheless, research on these systems for several low-resource languages, including Bodo, Mizo, Nagamese, and others, is either yet to commence or is in its nascent stages. Language model plays a vital role in the downstream tasks of modern NLP. Extensive studies are carried out on LMs for high-resource languages. Nevertheless, languages such as Bodo, Rabha, and Mising continue to lack coverage. In this study, we first present BodoBERT, a language model for the Bodo language. To the best of our knowledge, this work is the first such effort to develop a language model for Bodo. Secondly, we present an ensemble DL-based POS tagging model for Bodo. The POS tagging model is based on combinations of BiLSTM with CRF and stacked embedding of BodoBERT with BytePairEmbeddings. We cover several language models in the experiment to see how well they work in POS tagging tasks. The best-performing model achieves an F1 score of 0.8041. A comparative experiment was also conducted on Assamese POS taggers, considering that the language is spoken in the same region as Bodo.

Title: Enhancing Context Through Contrast. (arXiv:2401.03314v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.03314
Code URL: null
Copy Paste: [[2401.03314]] Enhancing Context Through Contrast(http://arxiv.org/abs/2401.03314)
Summary:
Neural machine translation benefits from semantically rich representations. Considerable progress in learning such representations has been achieved by language modelling and mutual information maximization objectives using contrastive learning. The language-dependent nature of language modelling introduces a trade-off between the universality of the learned representations and the model's performance on the language modelling tasks. Although contrastive learning improves performance, its success cannot be attributed to mutual information alone. We propose a novel Context Enhancement step to improve performance on neural machine translation by maximizing mutual information using the Barlow Twins loss. Unlike other approaches, we do not explicitly augment the data but view languages as implicit augmentations, eradicating the risk of disrupting semantic information. Further, our method does not learn embeddings from scratch and can be generalised to any set of pre-trained embeddings. Finally, we evaluate the language-agnosticism of our embeddings through language classification and use them for neural machine translation to compare with state-of-the-art approaches.

Title: Examining Forgetting in Continual Pre-training of Aligned Large Language Models. (arXiv:2401.03129v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.03129
Code URL: null
Copy Paste: [[2401.03129]] Examining Forgetting in Continual Pre-training of Aligned Large Language Models(http://arxiv.org/abs/2401.03129)
Summary:
Recent advances in Large Language Models (LLMs) have exhibited remarkable proficiency across various tasks. Given the potent applications of LLMs in numerous fields, there has been a surge in LLM development. In developing LLMs, a common practice involves continual pre-training on previously fine-tuned models. However, this can lead to catastrophic forgetting. In our work, we investigate the phenomenon of forgetting that occurs during continual pre-training on an existing fine-tuned LLM. We evaluate the impact of continuous pre-training on the fine-tuned LLM across various dimensions, including output format, knowledge, and reliability. Experiment results highlight the non-trivial challenge of addressing catastrophic forgetting during continual pre-training, especially the repetition issue.

Title: A Joint-Reasoning based Disease Q&A System. (arXiv:2401.03181v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.03181
Code URL: null
Copy Paste: [[2401.03181]] A Joint-Reasoning based Disease Q&A System(http://arxiv.org/abs/2401.03181)
Summary:
Medical question answer (QA) assistants respond to lay users' health-related queries by synthesizing information from multiple sources using natural language processing and related techniques. They can serve as vital tools to alleviate issues of misinformation, information overload, and complexity of medical language, thus addressing lay users' information needs while reducing the burden on healthcare professionals. QA systems, the engines of such assistants, have typically used either language models (LMs) or knowledge graphs (KG), though the approaches could be complementary. LM-based QA systems excel at understanding complex questions and providing well-formed answers, but are prone to factual mistakes. KG-based QA systems, which represent facts well, are mostly limited to answering short-answer questions with pre-created templates. While a few studies have jointly used LM and KG approaches for text-based QA, this was done to answer multiple-choice questions. Extant QA systems also have limitations in terms of automation and performance. We address these challenges by designing a novel, automated disease QA system which effectively utilizes both LM and KG techniques through a joint-reasoning approach to answer disease-related questions appropriate for lay users. Our evaluation of the system using a range of quality metrics demonstrates its efficacy over benchmark systems, including the popular ChatGPT.

Title: {\delta}-CAUSAL: Exploring Defeasibility in Causal Reasoning. (arXiv:2401.03183v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.03183
Code URL: null
Copy Paste: [[2401.03183]] {\delta}-CAUSAL: Exploring Defeasibility in Causal Reasoning(http://arxiv.org/abs/2401.03183)
Summary:
Defeasibility in causal reasoning implies that the causal relationship between cause and effect can be strengthened or weakened. Namely, the causal strength between cause and effect should increase or decrease with the incorporation of strengthening arguments (supporters) or weakening arguments (defeaters), respectively. However, existing works ignore defeasibility in causal reasoning and fail to evaluate existing causal strength metrics in defeasible settings. In this work, we present {\delta}-CAUSAL, the first benchmark dataset for studying defeasibility in causal reasoning. {\delta}-CAUSAL includes around 11K events spanning ten domains, featuring defeasible causality pairs, i.e., cause-effect pairs accompanied by supporters and defeaters. We further show current causal strength metrics fail to reflect the change of causal strength with the incorporation of supporters or defeaters in {\delta}-CAUSAL. To this end, we propose CESAR (Causal Embedding aSsociation with Attention Rating), a metric that measures causal strength based on token-level causal relationships. CESAR achieves a significant 69.7% relative improvement over existing metrics, increasing from 47.2% to 80.1% in capturing the causal strength change brought by supporters and defeaters. We further demonstrate even Large Language Models (LLMs) like GPT-3.5 still lag 4.5 and 10.7 points behind humans in generating supporters and defeaters, emphasizing the challenge posed by {\delta}-CAUSAL.

Title: The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models. (arXiv:2401.03205v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.03205
Code URL: https://github.com/rucaibox/halueval-2.0
Copy Paste: [[2401.03205]] The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models(http://arxiv.org/abs/2401.03205)
Summary:
In the era of large language models (LLMs), hallucination (i.e., the tendency to generate factually incorrect content) poses great challenge to trustworthy and reliable deployment of LLMs in real-world applications. To tackle the LLM hallucination, three key questions should be well studied: how to detect hallucinations (detection), why do LLMs hallucinate (source), and what can be done to mitigate them (mitigation). To address these challenges, this work presents a systematic empirical study on LLM hallucination, focused on the the three aspects of hallucination detection, source and mitigation. Specially, we construct a new hallucination benchmark HaluEval 2.0, and designs a simple yet effective detection method for LLM hallucination. Furthermore, we zoom into the different training or utilization stages of LLMs and extensively analyze the potential factors that lead to the LLM hallucination. Finally, we implement and examine a series of widely used techniques to mitigate the hallucinations in LLMs. Our work has led to several important findings to understand the hallucination origin and mitigate the hallucinations in LLMs. Our code and data can be accessed at https://github.com/RUCAIBox/HaluEval-2.0.

Title: PIXAR: Auto-Regressive Language Modeling in Pixel Space. (arXiv:2401.03321v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.03321
Code URL: null
Copy Paste: [[2401.03321]] PIXAR: Auto-Regressive Language Modeling in Pixel Space(http://arxiv.org/abs/2401.03321)
Summary:
Recent works showed the possibility of building open-vocabulary large language models (LLMs) that directly operate on pixel representations and are implemented as encoder-decoder models that reconstruct masked image patches of rendered text. However, these pixel-based LLMs are limited to autoencoding tasks and cannot generate new text as images. As such, they cannot be used for open-answer or generative language tasks. In this work, we overcome this limitation and introduce PIXAR, the first pixel-based autoregressive LLM that does not rely on a pre-defined vocabulary for both input and output text. Consisting of only a decoder, PIXAR can answer free-form generative tasks while keeping the text representation learning performance on par with previous encoder-decoder models. Furthermore, we highlight the challenges to autoregressively generate non-blurred text as images and link this to the usual maximum likelihood objective. We propose a simple adversarial pretraining that significantly improves the readability and performance of PIXAR making it comparable to GPT2 on short text generation tasks. This paves the way to building open-vocabulary LLMs that are usable for free-form generative tasks and questions the necessity of the usual symbolic input representation -- text as tokens -- for these challenging tasks.

Title: On the Convergence of Semi Unsupervised Calibration through Prior Adaptation Algorithm. (arXiv:2401.03051v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03051
Code URL: null
Copy Paste: [[2401.03051]] On the Convergence of Semi Unsupervised Calibration through Prior Adaptation Algorithm(http://arxiv.org/abs/2401.03051)
Summary:
Calibration is an essential key in machine leaning. Semi Unsupervised Calibration through Prior Adaptation (SUCPA) is a calibration algorithm used in (but not limited to) large-scale language models defined by a {system of first-order difference equation. The map derived by this system} has the peculiarity of being non-hyperbolic {with a non-bounded set of non-isolated fixed points}. In this work, we prove several convergence properties of this algorithm from the perspective of dynamical systems. For a binary classification problem, it can be shown that the algorithm always converges, {more precisely, the map is globally asymptotically stable, and the orbits converge} to a single line of fixed points. Finally, we perform numerical experiments on real-world application to support the presented results. Experiment codes are available online.

gpt

Title: Trace and Edit Relation Associations in GPT. (arXiv:2401.02976v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02976
Code URL: null
Copy Paste: [[2401.02976]] Trace and Edit Relation Associations in GPT(http://arxiv.org/abs/2401.02976)
Summary:
This study introduces a novel approach for analyzing and modifying entity relationships in GPT models, diverging from ROME's entity-focused methods. We develop a relation tracing technique to understand the influence of language model computations on relationship judgments. Using the FewRel dataset, we identify key roles of MLP modules and attention mechanisms in processing relationship information. Our method, tested against ROME on a new dataset, shows improved balance in specificity and generalization, underscoring the potential of manipulating early-layer modules for enhanced model understanding and accuracy.

Title: Identification of Regulatory Requirements Relevant to Business Processes: A Comparative Study on Generative AI, Embedding-based Ranking, Crowd and Expert-driven Methods. (arXiv:2401.02986v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02986
Code URL: null
Copy Paste: [[2401.02986]] Identification of Regulatory Requirements Relevant to Business Processes: A Comparative Study on Generative AI, Embedding-based Ranking, Crowd and Expert-driven Methods(http://arxiv.org/abs/2401.02986)
Summary:
Organizations face the challenge of ensuring compliance with an increasing amount of requirements from various regulatory documents. Which requirements are relevant depends on aspects such as the geographic location of the organization, its domain, size, and business processes. Considering these contextual factors, as a first step, relevant documents (e.g., laws, regulations, directives, policies) are identified, followed by a more detailed analysis of which parts of the identified documents are relevant for which step of a given business process. Nowadays the identification of regulatory requirements relevant to business processes is mostly done manually by domain and legal experts, posing a tremendous effort on them, especially for a large number of regulatory documents which might frequently change. Hence, this work examines how legal and domain experts can be assisted in the assessment of relevant requirements. For this, we compare an embedding-based NLP ranking method, a generative AI method using GPT-4, and a crowdsourced method with the purely manual method of creating relevancy labels by experts. The proposed methods are evaluated based on two case studies: an Australian insurance case created with domain experts and a global banking use case, adapted from SAP Signavio's workflow example of an international guideline. A gold standard is created for both BPMN2.0 processes and matched to real-world textual requirements from multiple regulatory documents. The evaluation and discussion provide insights into strengths and weaknesses of each method regarding applicability, automation, transparency, and reproducibility and provide guidelines on which method combinations will maximize benefits for given characteristics such as process usage, impact, and dynamics of an application scenario.

Title: AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis. (arXiv:2401.03040v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03040
Code URL: null
Copy Paste: [[2401.03040]] AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis(http://arxiv.org/abs/2401.03040)
Summary:
Traffic accident analysis is pivotal for enhancing public safety and developing road regulations. Traditional approaches, although widely used, are often constrained by manual analysis processes, subjective decisions, uni-modal outputs, as well as privacy issues related to sensitive data. This paper introduces the idea of AccidentGPT, a foundation model of traffic accident analysis, which incorporates multi-modal input data to automatically reconstruct the accident process video with dynamics details, and furthermore provide multi-task analysis with multi-modal outputs. The design of the AccidentGPT is empowered with a multi-modality prompt with feedback for task-oriented adaptability, a hybrid training schema to leverage labelled and unlabelled data, and a edge-cloud split configuration for data privacy. To fully realize the functionalities of this model, we proposes several research opportunities. This paper serves as the stepping stone to fill the gaps in traditional approaches of traffic accident analysis and attract the research community attention for automatic, objective, and privacy-preserving traffic accident analysis.

llm

Title: Fine-tuning and Utilization Methods of Domain-specific LLMs. (arXiv:2401.02981v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02981
Code URL: null
Copy Paste: [[2401.02981]] Fine-tuning and Utilization Methods of Domain-specific LLMs(http://arxiv.org/abs/2401.02981)
Summary:
Recent releases of pre-trained Large Language Models (LLMs) have gained considerable traction, yet research on fine-tuning and employing domain-specific LLMs remains scarce. This study investigates approaches for fine-tuning and leveraging domain-specific LLMs, highlighting trends in LLMs, foundational models, and methods for domain-specific pre-training. Focusing on the financial sector, it details dataset selection, preprocessing, model choice, and considerations crucial for LLM fine-tuning in finance. Addressing the unique characteristics of financial data, the study explores the construction of domain-specific vocabularies and considerations for security and regulatory compliance. In the practical application of LLM fine-tuning, the study outlines the procedure and implementation for generating domain-specific LLMs in finance. Various financial cases, including stock price prediction, sentiment analysis of financial news, automated document processing, research, information extraction, and customer service enhancement, are exemplified. The study explores the potential of LLMs in the financial domain, identifies limitations, and proposes directions for improvement, contributing valuable insights for future research. Ultimately, it advances natural language processing technology in business, suggesting proactive LLM utilization in financial services across industries.

Title: Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM. (arXiv:2401.02994v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02994
Code URL: null
Copy Paste: [[2401.02994]] Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM(http://arxiv.org/abs/2401.02994)
Summary:
In conversational AI research, there's a noticeable trend towards developing models with a larger number of parameters, exemplified by models like ChatGPT. While these expansive models tend to generate increasingly better chat responses, they demand significant computational resources and memory. This study explores a pertinent question: Can a combination of smaller models collaboratively achieve comparable or enhanced performance relative to a singular large model? We introduce an approach termed "blending", a straightforward yet effective method of integrating multiple chat AIs. Our empirical evidence suggests that when specific smaller models are synergistically blended, they can potentially outperform or match the capabilities of much larger counterparts. For instance, integrating just three models of moderate size (6B/13B paramaeters) can rival or even surpass the performance metrics of a substantially larger model like ChatGPT (175B+ paramaters). This hypothesis is rigorously tested using A/B testing methodologies with a large user base on the Chai research platform over a span of thirty days. The findings underscore the potential of the "blending" strategy as a viable approach for enhancing chat AI efficacy without a corresponding surge in computational demands.

Title: Reflections on Inductive Thematic Saturation as a potential metric for measuring the validity of an inductive Thematic Analysis with LLMs. (arXiv:2401.03239v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.03239
Code URL: null
Copy Paste: [[2401.03239]] Reflections on Inductive Thematic Saturation as a potential metric for measuring the validity of an inductive Thematic Analysis with LLMs(http://arxiv.org/abs/2401.03239)
Summary:
This paper presents a set of reflections on saturation and the use of Large Language Models (LLMs) for performing Thematic Analysis (TA). The paper suggests that initial thematic saturation (ITS) could be used as a metric to assess part of the transactional validity of TA with LLM, focusing on the initial coding. The paper presents the initial coding of two datasets of different sizes, and it reflects on how the LLM reaches some form of analytical saturation during the coding. The procedure proposed in this work leads to the creation of two codebooks, one comprising the total cumulative initial codes and the other the total unique codes. The paper proposes a metric to synthetically measure ITS using a simple mathematical calculation employing the ratio between slopes of cumulative codes and unique codes. The paper contributes to the initial body of work exploring how to perform qualitative analysis with LLMs.

long context

lora

Title: The Rise of Diffusion Models in Time-Series Forecasting. (arXiv:2401.03006v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03006
Code URL: null
Copy Paste: [[2401.03006]] The Rise of Diffusion Models in Time-Series Forecasting(http://arxiv.org/abs/2401.03006)
Summary:
This survey delves into the application of diffusion models in time-series forecasting. Diffusion models are demonstrating state-of-the-art results in various fields of generative AI. The paper includes comprehensive background information on diffusion models, detailing their conditioning methods and reviewing their use in time-series forecasting. The analysis covers 11 specific time-series implementations, the intuition and theory behind them, the effectiveness on different datasets, and a comparison among each other. Key contributions of this work are the thorough exploration of diffusion models' applications in time-series forecasting and a chronologically ordered overview of these models. Additionally, the paper offers an insightful discussion on the current state-of-the-art in this domain and outlines potential future research directions. This serves as a valuable resource for researchers in AI and time-series analysis, offering a clear view of the latest advancements and future potential of diffusion models.

Title: UMIE: Unified Multimodal Information Extraction with Instruction Tuning. (arXiv:2401.03082v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.03082
Code URL: null
Copy Paste: [[2401.03082]] UMIE: Unified Multimodal Information Extraction with Instruction Tuning(http://arxiv.org/abs/2401.03082)
Summary:
Multimodal information extraction (MIE) gains significant attention as the popularity of multimedia content increases. However, current MIE methods often resort to using task-specific model structures, which results in limited generalizability across tasks and underutilizes shared knowledge across MIE tasks. To address these issues, we propose UMIE, a unified multimodal information extractor to unify three MIE tasks as a generation problem using instruction tuning, being able to effectively extract both textual and visual mentions. Extensive experiments show that our single UMIE outperforms various state-of-the-art (SoTA) methods across six MIE datasets on three tasks. Furthermore, in-depth analysis demonstrates UMIE's strong generalization in the zero-shot setting, robustness to instruction variants, and interpretability. Our research serves as an initial step towards a unified MIE model and initiates the exploration into both instruction tuning and large language models within the MIE domain. Our code, data, and model are available at https://github.com/ZUCC-AI/UMIE

Title: Human as AI Mentor: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving. (arXiv:2401.03160v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03160
Code URL: null
Copy Paste: [[2401.03160]] Human as AI Mentor: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving(http://arxiv.org/abs/2401.03160)
Summary:
Despite significant progress in autonomous vehicles (AVs), the development of driving policies that ensure both the safety of AVs and traffic flow efficiency has not yet been fully explored. In this paper, we propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework, which facilitates safe and efficient autonomous driving in mixed traffic platoon. Drawing inspiration from the human learning process, we first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM). In this paradigm, the human expert serves as a mentor to the AI agent. While allowing the agent to sufficiently explore uncertain environments, the human expert can take control in dangerous situations and demonstrate correct actions to avoid potential accidents. On the other hand, the agent could be guided to minimize traffic flow disturbance, thereby optimizing traffic flow efficiency. In detail, HAIM-DRL leverages data collected from free exploration and partial human demonstrations as its two training sources. Remarkably, we circumvent the intricate process of manually designing reward functions; instead, we directly derive proxy state-action values from partial human demonstrations to guide the agents' policy learning. Additionally, we employ a minimal intervention technique to reduce the human mentor's cognitive load. Comparative results show that HAIM-DRL outperforms traditional methods in driving safety, sampling efficiency, mitigation of traffic flow disturbance, and generalizability to unseen traffic scenarios. The code and demo videos for this paper can be accessed at: https://zilin-huang.github.io/HAIM-DRL-website/}{https://zilin-huang.github.io/HAIM-DRL-website/.

Title: Exploration of Adolescent Depression Risk Prediction Based on Census Surveys and General Life Issues. (arXiv:2401.03171v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03171
Code URL: null
Copy Paste: [[2401.03171]] Exploration of Adolescent Depression Risk Prediction Based on Census Surveys and General Life Issues(http://arxiv.org/abs/2401.03171)
Summary:
In contemporary society, the escalating pressures of life and work have propelled psychological disorders to the forefront of modern health concerns, an issue that has been further accentuated by the COVID-19 pandemic. The prevalence of depression among adolescents is steadily increasing, and traditional diagnostic methods, which rely on scales or interviews, prove particularly inadequate for detecting depression in young people. Addressing these challenges, numerous AI-based methods for assisting in the diagnosis of mental health issues have emerged. However, most of these methods center around fundamental issues with scales or use multimodal approaches like facial expression recognition. Diagnosis of depression risk based on everyday habits and behaviors has been limited to small-scale qualitative studies. Our research leverages adolescent census data to predict depression risk, focusing on children's experiences with depression and their daily life situations. We introduced a method for managing severely imbalanced high-dimensional data and an adaptive predictive approach tailored to data structure characteristics. Furthermore, we proposed a cloud-based architecture for automatic online learning and data updates. This study utilized publicly available NSCH youth census data from 2020 to 2022, encompassing nearly 150,000 data entries. We conducted basic data analyses and predictive experiments, demonstrating significant performance improvements over standard machine learning and deep learning algorithms. This affirmed our data processing method's broad applicability in handling imbalanced medical data. Diverging from typical predictive method research, our study presents a comprehensive architectural solution, considering a wider array of user needs.

Title: On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond. (arXiv:2401.03301v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03301
Code URL: null
Copy Paste: [[2401.03301]] On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond(http://arxiv.org/abs/2401.03301)
Summary:
We seek to understand what facilitates sample-efficient learning from historical datasets for sequential decision-making, a problem that is popularly known as offline reinforcement learning (RL). Further, we are interested in algorithms that enjoy sample efficiency while leveraging (value) function approximation. In this paper, we address these fundamental questions by (i) proposing a notion of data diversity that subsumes the previous notions of coverage measures in offline RL and (ii) using this notion to {unify} three distinct classes of offline RL algorithms based on version spaces (VS), regularized optimization (RO), and posterior sampling (PS). We establish that VS-based, RO-based, and PS-based algorithms, under standard assumptions, achieve \emph{comparable} sample efficiency, which recovers the state-of-the-art sub-optimality bounds for finite and linear model classes with the standard assumptions. This result is surprising, given that the prior work suggested an unfavorable sample complexity of the RO-based algorithm compared to the VS-based algorithm, whereas posterior sampling is rarely considered in offline RL due to its explorative nature. Notably, our proposed model-free PS-based algorithm for offline RL is {novel}, with sub-optimality bounds that are {frequentist} (i.e., worst-case) in nature.

hallucination

prompt

Title: Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance. (arXiv:2401.02979v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02979
Code URL: https://github.com/cpjku/performance_embeddings_fire23
Copy Paste: [[2401.02979]] Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance(http://arxiv.org/abs/2401.02979)
Summary:
Semantic embeddings play a crucial role in natural language-based information retrieval. Embedding models represent words and contexts as vectors whose spatial configuration is derived from the distribution of words in large text corpora. While such representations are generally very powerful, they might fail to account for fine-grained domain-specific nuances. In this article, we investigate this uncertainty for the domain of characterizations of expressive piano performance. Using a music research dataset of free text performance characterizations and a follow-up study sorting the annotations into clusters, we derive a ground truth for a domain-specific semantic similarity structure. We test five embedding models and their similarity structure for correspondence with the ground truth. We further assess the effects of contextualizing prompts, hubness reduction, cross-modal similarity, and k-means clustering. The quality of embedding models shows great variability with respect to this task; more general models perform better than domain-adapted ones and the best model configurations reach human-level agreement.

code

Title: Deep Anomaly Detection in Text. (arXiv:2401.02971v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02971
Code URL: null
Copy Paste: [[2401.02971]] Deep Anomaly Detection in Text(http://arxiv.org/abs/2401.02971)
Summary:
Deep anomaly detection methods have become increasingly popular in recent years, with methods like Stacked Autoencoders, Variational Autoencoders, and Generative Adversarial Networks greatly improving the state-of-the-art. Other methods rely on augmenting classical models (such as the One-Class Support Vector Machine), by learning an appropriate kernel function using Neural Networks. Recent developments in representation learning by self-supervision are proving to be very beneficial in the context of anomaly detection. Inspired by the advancements in anomaly detection using self-supervised learning in the field of computer vision, this thesis aims to develop a method for detecting anomalies by exploiting pretext tasks tailored for text corpora. This approach greatly improves the state-of-the-art on two datasets, 20Newsgroups, and AG News, for both semi-supervised and unsupervised anomaly detection, thus proving the potential for self-supervised anomaly detectors in the field of natural language processing.

Title: TimeGraphs: Graph-based Temporal Reasoning. (arXiv:2401.03134v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03134
Code URL: null
Copy Paste: [[2401.03134]] TimeGraphs: Graph-based Temporal Reasoning(http://arxiv.org/abs/2401.03134)
Summary:
Many real-world systems exhibit temporal, dynamic behaviors, which are captured as time series of complex agent interactions. To perform temporal reasoning, current methods primarily encode temporal dynamics through simple sequence-based models. However, in general these models fail to efficiently capture the full spectrum of rich dynamics in the input, since the dynamics is not uniformly distributed. In particular, relevant information might be harder to extract and computing power is wasted for processing all individual timesteps, even if they contain no significant changes or no new information. Here we propose TimeGraphs, a novel approach that characterizes dynamic interactions as a hierarchical temporal graph, diverging from traditional sequential representations. Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales. Adopting a self-supervised method, TimeGraphs constructs a multi-level event hierarchy from a temporal input, which is then used to efficiently reason about the unevenly distributed dynamics. This construction process is scalable and incremental to accommodate streaming data. We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset. The results demonstrate both robustness and efficiency of TimeGraphs on a range of temporal reasoning tasks. Our approach obtains state-of-the-art performance and leads to a performance increase of up to 12.2% on event prediction and recognition tasks over current approaches. Our experiments further demonstrate a wide array of capabilities including zero-shot generalization, robustness in case of data sparsity, and adaptability to streaming data flow.

Title: Learning Persistent Community Structures in Dynamic Networks via Topological Data Analysis. (arXiv:2401.03194v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.03194
Code URL: null
Copy Paste: [[2401.03194]] Learning Persistent Community Structures in Dynamic Networks via Topological Data Analysis(http://arxiv.org/abs/2401.03194)
Summary:
Dynamic community detection methods often lack effective mechanisms to ensure temporal consistency, hindering the analysis of network evolution. In this paper, we propose a novel deep graph clustering framework with temporal consistency regularization on inter-community structures, inspired by the concept of minimal network topological changes within short intervals. Specifically, to address the representation collapse problem, we first introduce MFC, a matrix factorization-based deep graph clustering algorithm that preserves node embedding. Based on static clustering results, we construct probabilistic community networks and compute their persistence homology, a robust topological measure, to assess structural similarity between them. Moreover, a novel neural network regularization TopoReg is introduced to ensure the preservation of topological similarity between inter-community structures over time intervals. Our approach enhances temporal consistency and clustering accuracy on real-world datasets with both fixed and varying numbers of communities. It is also a pioneer application of TDA in temporally persistent community detection, offering an insightful contribution to field of network analysis. Code and data are available at the public git repository: https://github.com/kundtx/MFC_TopoReg

Title: Attention and Autoencoder Hybrid Model for Unsupervised Online Anomaly Detection. (arXiv:2401.03322v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03322
Code URL: null
Copy Paste: [[2401.03322]] Attention and Autoencoder Hybrid Model for Unsupervised Online Anomaly Detection(http://arxiv.org/abs/2401.03322)
Summary:
This paper introduces a hybrid attention and autoencoder (AE) model for unsupervised online anomaly detection in time series. The autoencoder captures local structural patterns in short embeddings, while the attention model learns long-term features, facilitating parallel computing with positional encoding. Unique in its approach, our proposed hybrid model combines attention and autoencoder for the first time in time series anomaly detection. It employs an attention-based mechanism, akin to the deep transformer model, with key architectural modifications for predicting the next time step window in the autoencoder's latent space. The model utilizes a threshold from the validation dataset for anomaly detection and introduces an alternative method based on analyzing the first statistical moment of error, improving accuracy without dependence on a validation dataset. Evaluation on diverse real-world benchmark datasets and comparing with other well-established models, confirms the effectiveness of our proposed model in anomaly detection.

Title: Rule-Guided Joint Embedding Learning of Knowledge Graphs. (arXiv:2401.02968v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02968
Code URL: null
Copy Paste: [[2401.02968]] Rule-Guided Joint Embedding Learning of Knowledge Graphs(http://arxiv.org/abs/2401.02968)
Summary:
In recent studies, the focus has been on enhancing knowledge graph embedding learning, which encodes entities and relations in knowledge graphs into low-dimensional vector spaces. While current models mainly consider the structural aspects of these graphs, there's a wealth of contextual and literal information in knowledge graphs that can be utilized for more effective embeddings. This paper introduces a novel model that incorporates both contextual and literal information into entity and relation embeddings, utilizing graph convolutional networks. Specifically, for contextual information, we assess its significance through confidence and relatedness metrics. A unique rule-based method is developed to calculate the confidence metric, and the relatedness metric is derived from the literal information's representations. We validated our model's performance with thorough experiments on two established benchmark datasets.

Title: FedTGP: Trainable Global Prototypes with Adaptive-Margin-Enhanced Contrastive Learning for Data and Model Heterogeneity in Federated Learning. (arXiv:2401.03230v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03230
Code URL: https://github.com/tsingz0/fedtgp
Copy Paste: [[2401.03230]] FedTGP: Trainable Global Prototypes with Adaptive-Margin-Enhanced Contrastive Learning for Data and Model Heterogeneity in Federated Learning(http://arxiv.org/abs/2401.03230)
Summary:
Recently, Heterogeneous Federated Learning (HtFL) has attracted attention due to its ability to support heterogeneous models and data. To reduce the high communication cost of transmitting model parameters, a major challenge in HtFL, prototype-based HtFL methods are proposed to solely share class representatives, a.k.a, prototypes, among heterogeneous clients while maintaining the privacy of clients' models. However, these prototypes are naively aggregated into global prototypes on the server using weighted averaging, resulting in suboptimal global knowledge which negatively impacts the performance of clients. To overcome this challenge, we introduce a novel HtFL approach called FedTGP, which leverages our Adaptive-margin-enhanced Contrastive Learning (ACL) to learn Trainable Global Prototypes (TGP) on the server. By incorporating ACL, our approach enhances prototype separability while preserving semantic meaning. Extensive experiments with twelve heterogeneous models demonstrate that our FedTGP surpasses state-of-the-art methods by up to 9.08% in accuracy while maintaining the communication and privacy advantages of prototype-based HtFL. Our code is available at https://github.com/TsingZ0/FedTGP.

Title: Weakly Augmented Variational Autoencoder in Time Series Anomaly Detection. (arXiv:2401.03341v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03341
Code URL: null
Copy Paste: [[2401.03341]] Weakly Augmented Variational Autoencoder in Time Series Anomaly Detection(http://arxiv.org/abs/2401.03341)
Summary:
Due to their unsupervised training and uncertainty estimation, deep Variational Autoencoders (VAEs) have become powerful tools for reconstruction-based Time Series Anomaly Detection (TSAD). Existing VAE-based TSAD methods, either statistical or deep, tune meta-priors to estimate the likelihood probability for effectively capturing spatiotemporal dependencies in the data. However, these methods confront the challenge of inherent data scarcity, which is often the case in anomaly detection tasks. Such scarcity easily leads to latent holes, discontinuous regions in latent space, resulting in non-robust reconstructions on these discontinuous spaces. We propose a novel generative framework that combines VAEs with self-supervised learning (SSL) to address this issue.

chat

retrieval augmented generation

retrieval-augmented generation

rag

Title: TelTrans: Applying Multi-Type Telecom Data to Transportation Evaluation and Prediction via Multifaceted Graph Modeling. (arXiv:2401.03138v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03138
Code URL: null
Copy Paste: [[2401.03138]] TelTrans: Applying Multi-Type Telecom Data to Transportation Evaluation and Prediction via Multifaceted Graph Modeling(http://arxiv.org/abs/2401.03138)
Summary:
To address the limitations of traffic prediction from location-bound detectors, we present Geographical Cellular Traffic (GCT) flow, a novel data source that leverages the extensive coverage of cellular traffic to capture mobility patterns. Our extensive analysis validates its potential for transportation. Focusing on vehicle-related GCT flow prediction, we propose a graph neural network that integrates multivariate, temporal, and spatial facets for improved accuracy. Experiments reveal our model's superiority over baselines, especially in long-term predictions. We also highlight the potential for GCT flow integration into transportation systems.

Title: A Survey on Verification and Validation, Testing and Evaluations of Neurosymbolic Artificial Intelligence. (arXiv:2401.03188v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.03188
Code URL: null
Copy Paste: [[2401.03188]] A Survey on Verification and Validation, Testing and Evaluations of Neurosymbolic Artificial Intelligence(http://arxiv.org/abs/2401.03188)
Summary:
Neurosymbolic artificial intelligence (AI) is an emerging branch of AI that combines the strengths of symbolic AI and sub-symbolic AI. A major drawback of sub-symbolic AI is that it acts as a "black box", meaning that predictions are difficult to explain, making the testing & evaluation (T&E) and validation & verification (V&V) processes of a system that uses sub-symbolic AI a challenge. Since neurosymbolic AI combines the advantages of both symbolic and sub-symbolic AI, this survey explores how neurosymbolic applications can ease the V&V process. This survey considers two taxonomies of neurosymbolic AI, evaluates them, and analyzes which algorithms are commonly used as the symbolic and sub-symbolic components in current applications. Additionally, an overview of current techniques for the T&E and V&V processes of these components is provided. Furthermore, it is investigated how the symbolic part is used for T&E and V&V purposes in current neurosymbolic applications. Our research shows that neurosymbolic AI as great potential to ease the T&E and V&V processes of sub-symbolic AI by leveraging the possibilities of symbolic AI. Additionally, the applicability of current T&E and V&V methods to neurosymbolic AI is assessed, and how different neurosymbolic architectures can impact these methods is explored. It is found that current T&E and V&V techniques are partly sufficient to test, evaluate, verify, or validate the symbolic and sub-symbolic part of neurosymbolic applications independently, while some of them use approaches where current T&E and V&V methods are not applicable by default, and adjustments or even new approaches are needed. Our research shows that there is great potential in using symbolic AI to test, evaluate, verify, or validate the predictions of a sub-symbolic model, making neurosymbolic AI an interesting research direction for safe, secure, and trustworthy AI.

Title: MPN: Leveraging Multilingual Patch Neuron for Cross-lingual Model Editing. (arXiv:2401.03190v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.03190
Code URL: null
Copy Paste: [[2401.03190]] MPN: Leveraging Multilingual Patch Neuron for Cross-lingual Model Editing(http://arxiv.org/abs/2401.03190)
Summary:
Large language models are known for encoding a vast amount of factual knowledge, but they often becomes outdated due to the ever-changing nature of external information. A promising solution to this challenge is the utilization of model editing methods to update the knowledge in an efficient manner. However, the majority of existing model editing techniques are limited to monolingual frameworks, thus failing to address the crucial issue of cross-lingual knowledge synchronization for multilingual models. To tackle this problem, we propose a simple yet effective method that trains multilingual patch neuron to store cross-lingual knowledge. It can be easily adapted to existing approaches to enhance their cross-lingual editing capabilities. To evaluate our method, we conduct experiments using both the XNLI dataset and a self-constructed XFEVER dataset. Experimental results demonstrate that our proposed method achieves improved performance in cross-lingual editing tasks without requiring excessive modifications to the original methodology, thereby showcasing its user-friendly characteristics. Codes will be released soon.

Title: SeqNAS: Neural Architecture Search for Event Sequence Classification. (arXiv:2401.03246v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03246
Code URL: null
Copy Paste: [[2401.03246]] SeqNAS: Neural Architecture Search for Event Sequence Classification(http://arxiv.org/abs/2401.03246)
Summary:
Neural Architecture Search (NAS) methods are widely used in various industries to obtain high quality taskspecific solutions with minimal human intervention. Event Sequences find widespread use in various industrial applications including churn prediction customer segmentation fraud detection and fault diagnosis among others. Such data consist of categorical and real-valued components with irregular timestamps. Despite the usefulness of NAS methods previous approaches only have been applied to other domains images texts or time series. Our work addresses this limitation by introducing a novel NAS algorithm SeqNAS specifically designed for event sequence classification. We develop a simple yet expressive search space that leverages commonly used building blocks for event sequence classification including multihead self attention convolutions and recurrent cells. To perform the search we adopt sequential Bayesian Optimization and utilize previously trained models as an ensemble of teachers to augment knowledge distillation. As a result of our work we demonstrate that our method surpasses state of the art NAS methods and popular architectures suitable for sequence classification and holds great potential for various industrial applications.

Title: A Surrogate-Assisted Extended Generative Adversarial Network for Parameter Optimization in Free-Form Metasurface Design. (arXiv:2401.02961v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.02961
Code URL: null
Copy Paste: [[2401.02961]] A Surrogate-Assisted Extended Generative Adversarial Network for Parameter Optimization in Free-Form Metasurface Design(http://arxiv.org/abs/2401.02961)
Summary:
Metasurfaces have widespread applications in fifth-generation (5G) microwave communication. Among the metasurface family, free-form metasurfaces excel in achieving intricate spectral responses compared to regular-shape counterparts. However, conventional numerical methods for free-form metasurfaces are time-consuming and demand specialized expertise. Alternatively, recent studies demonstrate that deep learning has great potential to accelerate and refine metasurface designs. Here, we present XGAN, an extended generative adversarial network (GAN) with a surrogate for high-quality free-form metasurface designs. The proposed surrogate provides a physical constraint to XGAN so that XGAN can accurately generate metasurfaces monolithically from input spectral responses. In comparative experiments involving 20000 free-form metasurface designs, XGAN achieves 0.9734 average accuracy and is 500 times faster than the conventional methodology. This method facilitates the metasurface library building for specific spectral responses and can be extended to various inverse design problems, including optical metamaterials, nanophotonic devices, and drug discovery.

Title: UnetTSF: A Better Performance Linear Complexity Time Series Prediction Model. (arXiv:2401.03001v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03001
Code URL: https://github.com/lichuustc/unettsf
Copy Paste: [[2401.03001]] UnetTSF: A Better Performance Linear Complexity Time Series Prediction Model(http://arxiv.org/abs/2401.03001)
Summary:
Recently, Transformer-base models have made significant progress in the field of time series prediction which have achieved good results and become baseline models beyond Dlinear. The paper proposes an U-Net time series prediction model (UnetTSF) with linear complexity, which adopts the U-Net architecture. We are the first to use FPN technology to extract features from time series data, replacing the method of decomposing time series data into trend and seasonal terms, while designing a fusion structure suitable for time series data. After testing on 8 open-source datasets, compared to the best linear model DLiner. Out of 32 testing projects, 31 achieved the best results. The average decrease in mse is 10.1%, while the average decrease in mae is 9.1%. Compared with the complex transformer-base PatchTST, UnetTSF obtained 9 optimal results for mse and 15 optimal results for mae in 32 testing projects.

Title: Data-Dependent Stability Analysis of Adversarial Training. (arXiv:2401.03156v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03156
Code URL: null
Copy Paste: [[2401.03156]] Data-Dependent Stability Analysis of Adversarial Training(http://arxiv.org/abs/2401.03156)
Summary:
Stability analysis is an essential aspect of studying the generalization ability of deep learning, as it involves deriving generalization bounds for stochastic gradient descent-based training algorithms. Adversarial training is the most widely used defense against adversarial example attacks. However, previous generalization bounds for adversarial training have not included information regarding the data distribution. In this paper, we fill this gap by providing generalization bounds for stochastic gradient descent-based adversarial training that incorporate data distribution information. We utilize the concepts of on-average stability and high-order approximate Lipschitz conditions to examine how changes in data distribution and adversarial budget can affect robust generalization gaps. Our derived generalization bounds for both convex and non-convex losses are at least as good as the uniform stability-based counterparts which do not include data distribution information. Furthermore, our findings demonstrate how distribution shifts from data poisoning attacks can impact robust generalization.

multi-run

chain-of-thought

tree-of-thought

agent

Title: Learning from a Generative AI Predecessor -- The Many Motivations for Interacting with Conversational Agents. (arXiv:2401.02978v1 [cs.CL])

Paper URL: http://arxiv.org/abs/2401.02978
Code URL: null
Copy Paste: [[2401.02978]] Learning from a Generative AI Predecessor -- The Many Motivations for Interacting with Conversational Agents(http://arxiv.org/abs/2401.02978)
Summary:
For generative AI to succeed, how engaging a conversationalist must it be? For almost sixty years, some conversational agents have responded to any question or comment to keep a conversation going. In recent years, several utilized machine learning or sophisticated language processing, such as Tay, Xiaoice, Zo, Hugging Face, Kuki, and Replika. Unlike generative AI, they focused on engagement, not expertise. Millions of people were motivated to engage with them. What were the attractions? Will generative AI do better if it is equally engaging, or should it be less engaging? Prior to the emergence of generative AI, we conducted a large-scale quantitative and qualitative analysis to learn what motivated millions of people to engage with one such 'virtual companion,' Microsoft's Zo. We examined the complete chat logs of 2000 anonymized people. We identified over a dozen motivations that people had for interacting with this software. Designers learned different ways to increase engagement. Generative conversational AI does not yet have a clear revenue model to address its high cost. It might benefit from being more engaging, even as it supports productivity and creativity. Our study and analysis point to opportunities and challenges.

Title: Reliability-Optimized User Admission Control for URLLC Traffic: A Neural Contextual Bandit Approach. (arXiv:2401.03059v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03059
Code URL: null
Copy Paste: [[2401.03059]] Reliability-Optimized User Admission Control for URLLC Traffic: A Neural Contextual Bandit Approach(http://arxiv.org/abs/2401.03059)
Summary:
Ultra-reliable low-latency communication (URLLC) is the cornerstone for a broad range of emerging services in next-generation wireless networks. URLLC fundamentally relies on the network's ability to proactively determine whether sufficient resources are available to support the URLLC traffic, and thus, prevent so-called cell overloads. Nonetheless, achieving accurate quality-of-service (QoS) predictions for URLLC user equipment (UEs) and preventing cell overloads are very challenging tasks. This is due to dependency of the QoS metrics (latency and reliability) on traffic and channel statistics, users' mobility, and interdependent performance across UEs. In this paper, a new QoS-aware UE admission control approach is developed to proactively estimate QoS for URLLC UEs, prior to associating them with a cell, and accordingly, admit only a subset of UEs that do not lead to a cell overload. To this end, an optimization problem is formulated to find an efficient UE admission control policy, cognizant of UEs' QoS requirements and cell-level load dynamics. To solve this problem, a new machine learning based method is proposed that builds on (deep) neural contextual bandits, a suitable framework for dealing with nonlinear bandit problems. In fact, the UE admission controller is treated as a bandit agent that observes a set of network measurements (context) and makes admission control decisions based on context-dependent QoS (reward) predictions. The simulation results show that the proposed scheme can achieve near-optimal performance and yield substantial gains in terms of cell-level service reliability and efficient resource utilization.

Title: Decision Making in Non-Stationary Environments with Policy-Augmented Search. (arXiv:2401.03197v1 [cs.AI])

Paper URL: http://arxiv.org/abs/2401.03197
Code URL: null
Copy Paste: [[2401.03197]] Decision Making in Non-Stationary Environments with Policy-Augmented Search(http://arxiv.org/abs/2401.03197)
Summary:
Sequential decision-making under uncertainty is present in many important problems. Two popular approaches for tackling such problems are reinforcement learning and online search (e.g., Monte Carlo tree search). While the former learns a policy by interacting with the environment (typically done before execution), the latter uses a generative model of the environment to sample promising action trajectories at decision time. Decision-making is particularly challenging in non-stationary environments, where the environment in which an agent operates can change over time. Both approaches have shortcomings in such settings -- on the one hand, policies learned before execution become stale when the environment changes and relearning takes both time and computational effort. Online search, on the other hand, can return sub-optimal actions when there are limitations on allowed runtime. In this paper, we introduce \textit{Policy-Augmented Monte Carlo tree search} (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. We compare and contrast our approach with AlphaZero, another hybrid planning approach, and Deep Q Learning on several OpenAI Gym environments. Through extensive experiments, we show that under non-stationary settings with limited time constraints, PA-MCTS outperforms these baselines.

Title: MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning. (arXiv:2401.03306v1 [cs.LG])

Paper URL: http://arxiv.org/abs/2401.03306
Code URL: null
Copy Paste: [[2401.03306]] MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning(http://arxiv.org/abs/2401.03306)
Summary:
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations in the context of realistic robot tasks. Recent offline model-free approaches successfully use online fine-tuning to either improve the performance of the agent over the data collection policy or adapt to novel tasks. At the same time, model-based RL algorithms have achieved significant progress in sample efficiency and the complexity of the tasks they can solve, yet remain under-utilized in the fine-tuning setting. In this work, we argue that existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains due to issues with distribution shifts, off-dynamics data, and non-stationary rewards. We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization, while preventing model exploitation by controlling epistemic uncertainty. We find that our approach successfully solves tasks from the MetaWorld benchmark, as well as the Franka Kitchen robot manipulation environment completely from images. To the best of our knowledge, MOTO is the first method to solve this environment from pixels.