2025-12-02

Title: Text Annotation via Inductive Coding: Comparing Human Experts to LLMs in Qualitative Data Analysis

Title: Emergent Convergence in Multi-Agent LLM Annotation

Title: Towards Corpus-Grounded Agentic LLMs for Multilingual Grammatical Analysis

Title: Minimal-Edit Instruction Tuning for Low-Resource Indic GEC

Title: OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion

Title: Lost without translation -- Can transformer (language models) understand mood states?

Title: EduEval: A Hierarchical Cognitive Benchmark for Evaluating Large Language Models in Chinese Education

Title: Assertion-Conditioned Compliance: A Provenance-Aware Vulnerability in Multi-Turn Tool-Calling Agents

Title: IndicParam: Benchmark to evaluate LLMs on low-resource Indic Languages

Title: Mitigating the Threshold Priming Effect in Large Language Model-Based Relevance Judgments via Personality Infusing

Title: A Taxonomy of Errors in English as she is spoke: Toward an AI-Based Method of Error Analysis for EFL Writing Instruction

Title: CryptoBench: A Dynamic Benchmark for Expert-Level Evaluation of LLM Agents in Cryptocurrency

Title: SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling

Title: G-KV: Decoding-Time KV Cache Eviction with Global Attention

Title: Catch Me If You Can: How Smaller Reasoning Models Pretend to Reason with Mathematical Fidelity

Title: Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models

Title: Prism: A Minimal Compositional Metalanguage for Specifying Agent Behavior

Title: ART: Adaptive Response Tuning Framework -- A Multi-Agent Tournament-Based Approach to LLM Response Optimization

Title: Sycophancy Claims about Language Models: The Missing Human-in-the-Loop

Title: Graphing the Truth: Structured Visualizations for Automated Hallucination Detection in LLMs

Title: A Comparison of Human and ChatGPT Classification Performance on Complex Social Media Data

Title: Auxiliary-Hyperparameter-Free Sampling: Entropy Equilibrium for Text Generation

Title: WaterSearch: A Quality-Aware Search-based Watermarking Framework for Large Language Models

Title: Less is More: Resource-Efficient Low-Rank Adaptation

Title: Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios

Title: Mitigating Hallucinations in Zero-Shot Scientific Summarisation: A Pilot Study

Title: Fine-tuning of lightweight large language models for sentiment classification on heterogeneous financial textual data

Title: Table as a Modality for Large Language Models

Title: Dr.Mi-Bench: A Modular-integrated Benchmark for Scientific Deep Research Agent

Title: Advancing Academic Chatbots: Evaluation of Non Traditional Outputs

Title: When Safety Blocks Sense: Measuring Semantic Confusion in LLM Refusals

Title: ELR-1000: A Community-Generated Dataset for Endangered Indic Indigenous Languages

Title: DrawingBench: Evaluating Spatial Reasoning and UI Interaction Capabilities of Large Language Models through Mouse-Based Drawing Tasks

Title: TempPerturb-Eval: On the Joint Effects of Internal Temperature and External Perturbations in RAG Robustness

Title: Generalist Large Language Models Outperform Clinical Tools on Medical Benchmarks

Title: Conveying Imagistic Thinking in Traditional Chinese Medicine Translation: A Prompt Engineering and LLM-Based Evaluation Framework

Title: SUPERChem: A Multimodal Reasoning Benchmark in Chemistry

Title: Kardia-R1: Unleashing LLMs to Reason toward Understanding and Empathy for Emotional Support via Rubric-as-Judge Reinforcement Learning

Title: PromptBridge: Cross-Model Prompt Transfer for Large Language Models

Title: Multilingual Conversational AI for Financial Assistance: Bridging Language Barriers in Indian FinTech

Title: Enhancing BERT Fine-Tuning for Sentiment Analysis in Lower-Resourced Languages

Title: MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages

Title: Language Diversity: Evaluating Language Usage and AI Performance on African Languages in Digital Spaces

Title: MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark

Title: Learning the Boundary of Solvability: Aligning LLMs to Detect Unsolvable Problems

Title: MMAG: Mixed Memory-Augmented Generation for Large Language Models Applications

Title: Beware of Reasoning Overconfidence: Pitfalls in the Reasoning Process for Multi-solution Tasks

Title: InnoGym: Benchmarking the Innovation Potential of AI Agents

Title: Beyond SFT: Reinforcement Learning for Safer Large Reasoning Models with Better Reasoning Ability

Title: BHRAM-IL: A Benchmark for Hallucination Recognition and Assessment in Multiple Indian Languages

Title: Cross-Lingual Interleaving for Speech Language Models

Title: Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models

Title: OPOR-Bench: Evaluating Large Language Models on Online Public Opinion Report Generation

Title: Latent Debate: A Surrogate Framework for Interpreting LLM Thinking

Title: Rectifying LLM Thought from Lens of Optimization

Title: How Far Are We from Genuinely Useful Deep Research Agents?

Title: The Art of Scaling Test-Time Compute for Large Language Models

Title: Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling