NeurIPS
2025
Papers
TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving
Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective
Massive Sound Embedding Benchmark (MSEB)
Efficient Adaptive Experimentation with Noncompliance
Spectral Graph Neural Networks are Incomplete on Graphs with a Simple Spectrum
ARM: Adaptive Reasoning Model
Depth-Width Tradeoffs for Transformers on Graph Tasks
A Provable Approach for End-to-End Safe Reinforcement Learning
The emergence of sparse attention: impact of data distribution and benefits of repetition
PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation
TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval
Can We Infer Confidential Properties of Training Data from LLMs?
ProDAG: Projected Variational Inference for Directed Acyclic Graphs
Steering When Necessary: Flexible Steering Large Language Models with Backtracking
VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture
Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization
SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism
What Expressivity Theory Misses: Message Passing Complexity for GNNs
Preference-Based Dynamic Ranking Structure Recognition
Geometry-Aware Collaborative Multi-Solutions Optimizer for Model Fine-Tuning with Parameter Efficiency
Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise
How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension
Curly Flow Matching for Learning Non-gradient Field Dynamics
Probably Approximately Precision and Recall Learning
SAO-Instruct: Free-form Audio Editing using Natural Language Instructions
Pose Splatter: A 3D Gaussian Splatting Model for Quantifying Animal Pose and Appearance
GST-UNet: A Neural Framework for Spatiotemporal Causal Inference with Time-Varying Confounding
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
UGoDIT: Unsupervised Group Deep Image Prior Via Transferable Weights
Dynamic Algorithm for Explainable $k$-medians Clustering under $\ell_p$ Norm
ChatVLA-2: Vision-Language-Action Model with Open-World Reasoning
HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
Acceleration via silver step-size on Riemannian manifolds with applications to Wasserstein space
CleverBirds: A Multiple-Choice Benchmark for Fine-grained Human Knowledge Tracing
Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process
VR-Drive: Viewpoint-Robust End-to-End Driving with Feed-Forward 3D Gaussian Splatting
RADAR: Benchmarking Language Models on Imperfect Tabular Data
Auto-Compressing Networks
Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach
Diffusion Tree Sampling: Scalable inference‑time alignment of diffusion models
Energy Loss Functions for Physical Systems
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
DualFocus: Depth from Focus with Spatio-Focal Dual Variational Constraints
CF-VLM: CounterFactual Vision-Language Fine-tuning
Tri-MARF: A Tri-Modal Multi-Agent Responsive Framework for Comprehensive 3D Object Annotation
MAT-Agent: Adaptive Multi-Agent Training Optimization
RePO: Understanding Preference Learning Through ReLU-Based Optimization
MISA: Memory-Efficient LLMs Optimization with Module-wise Importance Sampling
Less is More: Unlocking Specialization of Time Series Foundation Models via Structured Pruning
GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning
Linearly Constrained Diffusion Implicit Models
CellCLIP - Learning Perturbation Effects in Cell Painting via Text-Guided Contrastive Learning
A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection
DiffLiG: Diffusion-enhanced Liquid Graph with Attention Propagation for Grid-to-Station Precipitation Correction
Curriculum Abductive Learning
Where Graph Meets Heterogeneity: Multi-View Collaborative Graph Experts
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model
Practical do-Shapley Explanations with Estimand-Agnostic Causal Inference
MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research
Fair Deepfake Detectors Can Generalize
CoLT: The conditional localization test for assessing the accuracy of neural posterior estimates
Exploration via Feature Perturbation in Contextual Bandits
Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options
Repurposing AlphaFold3-like Protein Folding Models for Antibody Sequence and Structure Co-design
CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
Selective Omniprediction and Fair Abstention
SWE-bench Goes Live!
Gradient-Guided Epsilon Constraint Method for Online Continual Learning
Purifying Approximate Differential Privacy with Randomized Post-processing
Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation
SolverLLM: Leveraging Test-Time Scaling for Optimization Problem via LLM-Guided Search
Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations
Coresets for Clustering Under Stochastic Noise
Continual Gaussian Mixture Distribution Modeling for Class Incremental Semantic Segmentation
Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning
AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation
Revisiting Follow-the-Perturbed-Leader with Unbounded Perturbations in Bandit Problems
Rethinking Losses for Diffusion Bridge Samplers
Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning
NeurIPT: Foundation Model for Neural Interfaces
Causal Discovery over Clusters of Variables in Markovian Systems
MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation
LLMs Encode Harmfulness and Refusal Separately
Optimistic Query Routing in Clustering-based Approximate Maximum Inner Product Search
Constrained Feedback Learning for Non-Stationary Multi-Armed Bandits
Zero-shot World Models via Search in Memory
Generalization Guarantees for Learning Score-Based Branch-and-Cut Policies in Integer Programming
Size-adaptive Hypothesis Testing for Fairness
SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications
Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs
A Generalist Intracortical Motor Decoder
A faster training algorithm for regression trees with linear leaves, and an analysis of its complexity
Support Vector Generation: Kernelizing Large Language Models for Efficient Zero‑Shot NLP
Part-Aware Bottom-Up Group Reasoning for Fine-Grained Social Interaction Detection
Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms
Faithful Dynamic Imitation Learning from Human Intervention with Dynamic Regret Minimization
LUNA: Efficient and Topology-Agnostic Foundation Model for EEG Signal Analysis
Group-Level Data Selection for Efficient Pretraining
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models
When Additive Noise Meets Unobserved Mediators: Bivariate Denoising Diffusion for Causal Discovery
Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval
Fix False Transparency by Noise Guided Splatting
MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Cultural Learning
Rethinking Verification for LLM Code Generation: From Generation to Testing
Graph-Theoretic Insights into Bayesian Personalized Ranking for Recommendation
LT-Soups: Bridging Head and Tail Classes via Subsampled Model Soups
Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning
Constrained Diffusers for Safe Planning and Control
Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples
The Flood Complex: Large-Scale Persistent Homology on Millions of Points
Shortcuts and Identifiability in Concept-based Models from a Neuro-Symbolic Lens
When Does Closeness in Distribution Imply Representational Similarity? An Identifiability Perspective
Disentangling Latent Shifts of In-Context Learning with Weak Supervision
TractoTransformer: Diffusion MRI Streamline Tractography using CNN and Transformer Networks
Interpreting vision transformers via residual replacement model
Rare Text Semantics Were Always There in Your Diffusion Transformer
Mixture of Scope Experts at Test: Generalizing Deeper Graph Neural Networks with Shallow Variants
LittleBit: Ultra Low-Bit Quantization via Latent Factorization
Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions
DC4GS: Directional Consistency-Driven Adaptive Density Control for 3D Gaussian Splatting
Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation
Locality in Image Diffusion Models Emerges from Data Statistics
Procurement Auctions with Predictions: Improved Frugality for Facility Location
Fair Representation Learning with Controllable High Confidence Guarantees via Adversarial Inference
QSVD: Efficient Low-rank Approximation for Unified Query-Key-Value Weight Compression in Low-Precision Vision-Language Models
The Logical Expressiveness of Temporal GNNs via Two-Dimensional Product Logics
Some Optimizers are More Equal: Understanding the Role of Optimizers in Group Fairness
What do you know? Bayesian knowledge inference for navigating agents
Optimal Regret of Bandits under Differential Privacy
PolyJuice Makes It Real: Black-Box, Universal Red Teaming for Synthetic Image Detectors
Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum
Language‑Bias‑Resilient Visual Question Answering via Adaptive Multi‑Margin Collaborative Debiasing
Role Bias in Diffusion Models: Diagnosing and Mitigating through Intermediate Decomposition
NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation
HyperMixup: Hypergraph-Augmented with Higher-order Information Mixup
Jamais Vu: Exposing the Generalization Gap in Supervised Semantic Correspondence
Evolutionary Prediction Games
A Bayesian Approach to Contextual Dynamic Pricing using the Proportional Hazards Model with Discrete Price Data
Inference of Whole Brain Electrophysiological Networks Through Multimodal Integration of Simultaneous Scalp and Intracranial EEG
Event-based HDR Structured Light
Atom of Thoughts for Markov LLM Test-Time Scaling
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Pareto Optimal Risk-Agnostic Distributional Bandits with Heavy-Tail Rewards
Tight Generalization Bounds for Large-Margin Halfspaces
Foresight: Adaptive Layer Reuse for Accelerated and High-Quality Text-to-Video Generation
Consistent Story Generation: Unlocking the Potential of Zigzag Sampling
Cross City Traffic Flow Generation via Retrieval Augmented Diffusion Model
Strategic Hypothesis Testing
The Price of Sparsity: Sufficient Conditions for Sparse Recovery using Sparse and Sparsified Measurements
SDPGO: Efficient Self-Distillation Training Meets Proximal Gradient Optimization
Lessons Learned: A Multi-Agent Framework for Code LLMs to Learn and Improve
Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
Neural Evolution Strategy for Black-box Pareto Set Learning
JADE: Joint Alignment and Deep Embedding for Multi-Slice Spatial Transcriptomics
Network two-sample test for block models
Self-Challenging Language Model Agents
Dense SAE Latents Are Features, Not Bugs
Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function
Information-Theoretic Reward Decomposition for Generalizable RLHF
Uni-RL: Unifying Online and Offline RL via Implicit Value Regularization
Memory-Enhanced Neural Solvers for Routing Problems
Streaming Attention Approximation via Discrepancy Theory
Continuous-time Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space
Stability and Sharper Risk Bounds with Convergence Rate $\tilde{O}(1/n^2)$
TANDEM: Bi-Level Data Mixture Optimization with Twin Networks
Improved Training Technique for Shortcut Models
Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers
Hierarchical Optimization via LLM-Guided Objective Evolution for Mobility-on-Demand Systems
Learning to Condition: A Neural Heuristic for Scalable MPE Inference
DMWM: Dual-Mind World Model with Long-Term Imagination
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards
HiFC: High-efficiency Flash-based KV Cache Swapping for Scaling LLM Inference
SuperCLIP: CLIP with Simple Classification Supervision
Taming Adversarial Constraints in CMDPs
Markov Persuasion Processes: Learning to Persuade From Scratch
Gradient Multi-Normalization for Efficient LLM Training
No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need!
Bayesian Concept Bottleneck Models with LLM Priors
T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning
Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization
Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking
G-Net: A Provably Easy Construction of High-Accuracy Random Binary Neural Networks
Language Models (Mostly) Know When to Stop Reading
Sparta Alignment: Collectively Aligning Multiple Language Models through Combat
Improving Video Generation with Human Feedback
Fair Minimum Labeling: Efficient Temporal Network Activations for Reachability and Equity
Locally Optimal Private Sampling: Beyond the Global Minimax
ADPretrain: Advancing Industrial Anomaly Detection via Anomaly Representation Pretraining
$\mu$PC: Scaling Predictive Coding to 100+ Layer Networks
VoxDet: Rethinking 3D Semantic Scene Completion as Dense Object Detection
REDOUBT: Duo Safety Validation for Autonomous Vehicle Motion Planning
AutoJudge: Judge Decoding Without Manual Annotation
Unlearning-Aware Minimization
$\mathcal{X}^2$-DFD: A framework for e$\mathcal{X}$plainable and e$\mathcal{X}$tendable Deepfake Detection
Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes
From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes
Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable
BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception
ImgEdit: A Unified Image Editing Dataset and Benchmark
Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks
Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs
Planning and Learning in Average Risk-aware MDPs
Gaussian Herding across Pens: An Optimal Transport Perspective on Global Gaussian Reduction for 3DGS
Learning Shared Representations from Unpaired Data
Constrained Posterior Sampling: Time Series Generation with Hard Constraints
Orochi: Versatile Biomedical Image Processor
Transfer Learning on Edge Connecting Probability Estimation Under Graphon Model
Mitigating Instability in High Residual Adaptive Sampling for PINNs via Langevin Dynamics
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
Functional Virtual Adversarial Training for Semi-Supervised Time Series Classification
On the Coexistence and Ensembling of Watermarks
MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents
Real-DRL: Teach and Learn in Reality
Chain-of-Retrieval Augmented Generation
STAR: Spatial-Temporal Tracklet Matching for Multi-Object Tracking
HairFree: Compositional 2D Head Prior for Text-Driven 360° Bald Texture Synthesis
Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge
PhysDiff-VTON: Cross-Domain Physics Modeling and Trajectory Optimization for Virtual Try-On
Sim-LLM: Optimizing LLM Inference at the Edge through Inter-Task KV Reuse
Right for the Right Reasons: Avoiding Reasoning Shortcuts via Prototypical Neurosymbolic AI
BecomingLit: Relightable Gaussian Avatars with Hybrid Neural Shading
Accelerated Distance-adaptive Methods for Hölder Smooth and Convex Optimization
Rig3R: Rig-Aware Conditioning and Discovery for 3D Reconstruction
Pairwise Calibrated Rewards for Pluralistic Alignment
System-Embedded Diffusion Bridge Models
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity
Towards 3D Objectness Learning in an Open World
GeoVideo: Introducing Geometric Regularization into Video Generation Model
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Cross-fluctuation phase transitions reveal sampling dynamics in diffusion models
Shape-Informed Clustering of Multi-Dimensional Functional Data via Deep Functional Autoencoders
Homogeneous Algorithms Can Reduce Competition in Personalized Pricing
Per-Architecture Training-Free Metric Optimization for Neural Architecture Search
Meta CLIP 2: A Worldwide Scaling Recipe
MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning
Directed-Tokens: A Robust Multi-Modality Alignment Approach to Large Language-Vision Models
Exploring and Exploiting Model Uncertainty in Bayesian Optimization
Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning
Predictive Coding Enhances Meta-RL To Achieve Interpretable Bayes-Optimal Belief Representation Under Partial Observability
$O(\sqrt{T})$ Static Regret and Instance Dependent Constraint Violation for Constrained Online Convex Optimization
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Depth-Bounds for Neural Networks via the Braid Arrangement
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training
Learning to price with resource constraints: from full information to machine-learned prices
Convex Approximation of Two-Layer ReLU Networks for Hidden State Differential Privacy
Certifying Deep Network Risks and Individual Predictions with PAC-Bayes Loss via Localized Priors
BioCG: Constrained Generative Modeling for Biochemical Interaction Prediction
An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models
Flash Invariant Point Attention
PAC-Bayes Bounds for Multivariate Linear Regression and Linear Autoencoders
GyroSwin: 5D Surrogates for Gyrokinetic Plasma Turbulence Simulations
RLZero: Direct Policy Inference from Language Without In-Domain Supervision
Deep learning for continuous-time stochastic control with jumps
True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics
Measuring and Controlling Solution Degeneracy across Task-Trained Recurrent Neural Networks
MPMAvatar: Learning 3D Gaussian Avatars with Accurate and Robust Physics-Based Dynamics
Chain-of-Model Learning for Language Model
Let LRMs Break Free from Overthinking via Self-Braking Tuning
Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability
Counterfactual Implicit Feedback Modeling
Estimating Model Performance Under Covariate Shift Without Labels
Implicit Generative Property Enhancer
Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability
Online Strategic Classification With Noise and Partial Feedback
Single-Step Operator Learning for Conditioned Time-Series Diffusion Models
Distribution-Aware Tensor Decomposition for Compression of Convolutional Neural Networks
An Investigation of Memorization Risk in Healthcare Foundation Models
Learning (Approximately) Equivariant Networks via Constrained Optimization
Under the Shadow: Exploiting Opacity Variation for Fine-grained Shadow Detection
Incentivizing LLMs to Self-Verify Their Answers
Residual Stream Analysis of Overfitting And Structural Disruptions
Towards Fully FP8 GEMM LLM Training at Scale
Contrastive Learning with Data Misalignment: Feature Purity, Training Dynamics and Theoretical Generalization Guarantees
A Principled Path to Fitted Distributional Evaluation
PlayerOne: Egocentric World Simulator
WolBanking77: Wolof Banking Speech Intent Classification Dataset
FoGE: Fock Space inspired encoding for graph prompting
Incremental Sequence Classification with Temporal Consistency
NPN: Non-Linear Projections of the Null-Space for Imaging Inverse Problems
Spatial-Aware Decision-Making with Ring Attractors in Reinforcement Learning Systems
DesignX: Human-Competitive Algorithm Designer for Black-Box Optimization
Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy
Aha! - Predicting What Matters Next: Online Highlight Detection Without Looking Ahead
Puppeteer: Rig and Animate Your 3D Models
FlashMoE: Fast Distributed MoE in a Single Kernel
The Power of Iterative Filtering for Supervised Learning with (Heavy) Contamination
Decomposing stimulus-specific sensory neural information via diffusion models
Over-squashing in Spatiotemporal Graph Neural Networks
Failure Prediction at Runtime for Generative Robot Policies
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
🎧MOSPA: Human Motion Generation Driven by Spatial Audio
CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects
PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels
SyncHuman: Synchronizing 2D and 3D Generative Models for Single-view Human Reconstruction
Learning to Better Search with Language Models via Guided Reinforced Self-Training
Risk-Averse Total-Reward Reinforcement Learning
IBGS: Image-Based Gaussian Splatting
Self-Guided Hierarchical Exploration for Generalist Foundation Model Web Agents
Inference-Time Hyper-Scaling with KV Cache Compression
Neural Tangent Knowledge Distillation for Optical Convolutional Networks
DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images
Are Greedy Task Orderings Better Than Random in Continual Linear Regression?
Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence
Learning Chern Numbers of Multiband Topological Insulators with Gauge Equivariant Neural Networks
On Agnostic PAC Learning in the Small Error Regime
Eliciting Reasoning in Language Models with Cognitive Tools
Diversity-Aware Policy Optimization for Large Language Model Reasoning
HyperMARL: Adaptive Hypernetworks for Multi-Agent RL
Connecting Neural Models Latent Geometries with Relative Geodesic Representations
dKV-Cache: The Cache for Diffusion Language Models
Thinkless: LLM Learns When to Think
VeriThinker: Learning to Verify Makes Reasoning Model Efficient
Learning “Partner-Aware” Collaborators in Multi-Party Collaboration
Blindfolded Experts Generalize Better: Insights from Robotic Manipulation and Videogames
Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization
Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
Fast constrained sampling in pre-trained diffusion models
Tortoise and Hare Guidance: Accelerating Diffusion Model Inference with Multirate Integration
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Token-Level Self-Play with Importance-Aware Guidance for Large Language Models
Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms
Deno-IF: Unsupervised Noisy Visible and Infrared Image Fusion Method
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
On Extending Direct Preference Optimization to Accommodate Ties
RUAGO: Effective and Practical Retain-Free Unlearning via Adversarial Attack and OOD Generator
An Efficient Local Search Approach for Polarized Community Discovery in Signed Networks
Self-Supervised Discovery of Neural Circuits in Spatially Patterned Neural Responses with Graph Neural Networks
AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy
Torch-Uncertainty: Deep Learning Uncertainty Quantification
C-SEO Bench: Does Conversational SEO Work?
ImageNet-trained CNNs are not biased towards texture: Revisiting feature reliance through controlled suppression
ConViS-Bench: Estimating Video Similarity Through Semantic Concepts
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models
Alchemist: Turning Public Text-to-Image Data into Generative Gold
Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection
IndustryEQA: Pushing the Frontiers of Embodied Question Answering in Industrial Scenarios
Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers
PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies?
CausalDynamics: A large‐scale benchmark for structural discovery of dynamical causal models
Split conformal classification with unsupervised calibration
VADB: A Large-Scale Video Aesthetic Database with Professional and Multi-Dimensional Annotations
GC4NC: A Benchmark Framework for Graph Condensation on Node Classification with New Insights
EngiBench: A Framework for Data-Driven Engineering Design Research
CaMiT: A Time-Aware Car Model Dataset for Classification and Generation
AutoOpt: A Dataset and a Unified Framework for Automating Optimization Problem Solving
MLIP Arena: Advancing Fairness and Transparency in Machine Learning Interatomic Potentials via an Open, Accessible Benchmark Platform
Improving Deep Learning for Accelerated MRI With Data Filtering
Towards Automated Petrography
GraphLand: Evaluating Graph Machine Learning Models on Diverse Industrial Data
PF∆: A Benchmark Dataset for Power Flow under Load, Generation, and Topology Variations
FAIR Universe HiggsML Uncertainty Dataset and Competition
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
3D Interaction Geometric Pre-training for Molecular Relational Learning
EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition
Imitation Beyond Expectation Using Pluralistic Stochastic Dominance
NOVA: A Benchmark for Rare Anomaly Localization and Clinical Reasoning in Brain MRI
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Seg-VAR: Image Segmentation with Visual Autoregressive Modeling
OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions
Unleashing Diffusion Transformers for Visual Correspondence by Modulating Massive Activations
ROSE: Remove Objects with Side Effects in Videos
Overcoming Challenges of Long-Horizon Prediction in Driving World Models
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework
Position: Bridge the Gaps between Machine Unlearning and AI Regulation
RIGNO: A Graph-based Framework For Robust And Accurate Operator Learning For PDEs On Arbitrary Domains
LLM Unlearning via Neural Activation Redirection
Reverse Engineering Human Preferences with Reinforcement Learning
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Evaluating LLMs in Open-Source Games
FlexOLMo: Open Language Models for Flexible Data Use
Causal Differentiating Concepts: Interpreting LM Behavior via Causal Representation Learning
Symmetry-Preserving Conformer Ensemble Networks for Molecular Representation Learning
CrossAD: Time Series Anomaly Detection with Cross-scale Associations and Cross-window Modeling
Vision Transformers with Self-Distilled Registers
Beyond Value Functions: Single-Loop Bilevel Optimization under Flatness Conditions
BEDLAM2.0: Synthetic humans and cameras in motion
Inpainting the Neural Picture: Inferring Unrecorded Brain Area Dynamics from Multi-Animal Datasets
Know Thyself by Knowing Others: Learning Neuron Identity from Population Context
When Causal Dynamics Matter: Adapting Causal Strategies through Meta-Aware Interventions
MIHC: Multi-View Interpretable Hypergraph Neural Networks with Information Bottleneck for Chip Congestion Prediction
OmniBench: Towards The Future of Universal Omni-Language Models
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference
Reconstruction and Secrecy under Approximate Distance Queries
General-Reasoner: Advancing LLM Reasoning Across All Domains
Schrödinger Bridge Matching for Tree-Structured Costs and Entropic Wasserstein Barycentres
Diffusion Models and the Manifold Hypothesis: Log-Domain Smoothing is Geometry Adaptive
Globally Optimal Policy Gradient Algorithms for Reinforcement Learning with PID Control Policies
SD-KDE: Score-Debiased Kernel Density Estimation
Adaptive Inference-Time Scaling via Cyclic Diffusion Search
Simple Distillation for One-Step Diffusion Models
Inference-time Alignment in Continuous Space
FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites
Stable Matching with Ties: Approximation Ratios and Learning
Quantifying Elicitation of Latent Capabilities in Language Models
Finding Low-Rank Matrix Weights in DNNs via Riemannian Optimization: RAdaGrad and RAdamW
CATransformers: Carbon Aware Transformers Through Joint Model-Hardware Optimization
UniTok: a Unified Tokenizer for Visual Generation and Understanding
InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation
Deep Legendre Transform
Learning-Augmented Online Bidding in Stochastic Settings
Non-Convex Tensor Recovery from Tube-Wise Sensing
Tensor-Parallelism with Partially Synchronized Activations
Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
Spike-timing-dependent Hebbian learning as noisy gradient descent
Faster Video Diffusion with Trainable Sparse Attention
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
CAMILA: Context-Aware Masking for Image Editing with Language Alignment
Structured Reinforcement Learning for Combinatorial Decision-Making
Accurate and Efficient Low-Rank Model Merging in Core Space
Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models
Complexity Scaling Laws for Neural Models using Combinatorial Optimization
Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling
Subsampled Ensemble Can Improve Generalization Tail Exponentially
Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra
Dynamic Diameter in High-Dimensions against Adaptive Adversary and Beyond
Replicable Online pricing
Non-monotone Submodular Optimization: $p$-Matchoid Constraints and Fully Dynamic Setting
Reasoning Beyond Points: A Visual Introspective Approach for Few-Shot 3D Segmentation
Hierarchical Semantic-Augmented Navigation: Optimal Transport and Graph-Driven Reasoning for Vision-Language Navigation
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Beyond the Seen: Bounded Distribution Estimation for Open-Vocabulary Learning
Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning
Thinker: Learning to Think Fast and Slow
Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding
Concept Incongruence: An Exploration of Time and Death in Role Playing
Incomplete Multi-view Clustering via Hierarchical Semantic Alignment and Cooperative Completion
Mamba Only Glances Once (MOGO): A Lightweight Framework for Efficient Video Action Detection
Score-Based Diffusion Modeling for Nonparametric Empirical Bayes in Heteroscedastic Gaussian Mixtures
An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction
PurpCode: Reasoning for Safer Code Generation
VMDT: Decoding the Trustworthiness of Video Foundation Models
On the Hardness of Approximating Distributions with Tractable Probabilistic Models
The Temporal Graph of Bitcoin Transactions
ARGenSeg: Image Segmentation with Autoregressive Image Generation Model
Multivariate Latent Recalibration for Conditional Normalizing Flows
Flow-GRPO: Training Flow Matching Models via Online RL
Ultrametric Cluster Hierarchies: I Want ‘em All!
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
TV-Rec: Time-Variant Convolutional Filter for Sequential Recommendation
On the Value of Cross-Modal Misalignment in Multimodal Representation Learning
GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation
Amortized Active Generation of Pareto Sets
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
4KAgent: Agentic Any Image to 4K Super-Resolution
Root Cause Analysis of Outliers with Missing Structural Knowledge
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Learning Dense Hand Contact Estimation from Imbalanced Data
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models
Stochastic Principal-Agent Problems: Computing and Learning Optimal History-Dependent Policies
ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding
Learning Task-Agnostic Representations through Multi-Teacher Distillation
CSI-Bench: A Large-Scale In-the-Wild Dataset for Multi-task WiFi Sensing
Strategyproof Reinforcement Learning from Human Feedback
HiMoLE: Towards OOD-Robust LoRA via Hierarchical Mixture of Experts
JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models
Measuring Scientific Capabilities of Language Models with a Systems Biology Dry Lab
HoT-VI: Reparameterizable Variational Inference for Capturing Instance-Level High-Order Correlations
Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimization
Conformal Prediction under Lévy-Prokhorov Distribution Shifts: Robustness to Local and Global Perturbations
CPO: Condition Preference Optimization for Controllable Image Generation
GradMetaNet: An Equivariant Architecture for Learning on Gradients
OmniZoom: A Universal Plug-and-Play Paradigm for Cross-Device Smooth Zoom Interpolation
Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Memory Injection Attacks on LLM Agents via Query-Only Interaction
RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Chest X-ray with Zero-Shot Multi-Task Capability
DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs
Approximately Aligned Decoding
Bits Leaked per Query: Information-Theoretic Bounds for Adversarial Attacks on LLMs
Antidistillation Sampling
Scaling RL to Long Videos
Kinaema: a recurrent sequence model for memory and pose in motion
Unifying Symbolic Music Arrangement: Track-Aware Reconstruction and Structured Tokenization
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
Beyond Last-Click: An Optimal Mechanism for Ad Attribution
Accelerating 3D Molecule Generative Models with Trajectory Diagnosis
Stable Coresets via Posterior Sampling: Aligning Induced and Full Loss Landscapes
Multi-Scale Finetuning for Encoder-based Time Series Foundation Models
SPOT: Scalable Policy Optimization with Trees for Markov Decision Processes
Minimizing False-Positive Attributions in Explanations of Non-Linear Models
Protein Inverse Folding From Structure Feedback
Do-PFN: In-Context Learning for Causal Effect Estimation
AGENTIF: Benchmarking Large Language Models Instruction Following Ability in Agentic Scenarios
VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play
ProteinConformers: Benchmark Dataset for Simulating Protein Conformational Landscape Diversity and Plausibility
Approximation theory for 1-Lipschitz ResNets
Unified Transferability Metrics for Time Series Foundation Models
Can LLMs Outshine Conventional Recommenders? A Comparative Evaluation
TIDMAD: Time Series Dataset for Discovering Dark Matter with AI Denoising
On the Convergence of Stochastic Smoothed Multi-Level Compositional Gradient Descent Ascent
Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance
When No Paths Lead to Rome: Benchmarking Systematic Neural Relational Reasoning
Sekai: A Video Dataset towards World Exploration
Evaluating Program Semantics Reasoning with Type Inference in System $F$
FlySearch: Exploring how vision-language models explore
MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning?
PartNeXt: A Next-Generation Dataset for Fine-Grained and Hierarchical 3D Part Understanding
MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning?
FEEL: Quantifying Heterogeneity in Physiological Signals for Generalizable Emotion Recognition
Towards A Generalist Code Embedding Model Based On Massive Data Synthesis
Augmenting Biological Fitness Prediction Benchmarks with Landscapes Features from GraphFLA
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation
CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
UMU-Bench: Closing the Modality Gap in Multimodal Unlearning Evaluation
EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding
AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark
BRACE: A Benchmark for Robust Audio Caption Quality Evaluation
UniHG: A Large-scale Universal Heterogeneous Graph Dataset and Benchmark for Representation Learning and Cross-Domain Transferring
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios
Is Artificial Intelligence Generated Image Detection a Solved Problem?
Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs
QCircuitBench: A Large-Scale Dataset for Benchmarking Quantum Algorithm Design
Robo2VLM: Improving Visual Question Answering using Large-Scale Robot Manipulation Data
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class
IA-GGAD: Zero-shot Generalist Graph Anomaly Detection via Invariant and Affinity Learning
Decoupled Entropy Minimization
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis
MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models
WritingBench: A Comprehensive Benchmark for Generative Writing
ODG: Occupancy Prediction Using Dual Gaussians
BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models
Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics
Benchmarking Egocentric Multimodal Goal Inference for Assistive Wearable Agents
CrypticBio: A Large Multimodal Dataset for Visually Confusing Species
Demystifying Network Foundation Models
SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
CosmoBench: A Multiscale, Multiview, Multitask Cosmology Benchmark for Geometric Deep Learning
MUniverse: A Simulation and Benchmarking Suite for Motor Unit Decomposition
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
Long-term Intracortical Neural activity and Kinematics (LINK): An intracortical neural dataset for chronic brain-machine interfaces, neuroscience, and machine learning
LTD-Bench: Evaluating Large Language Models by Letting Them Draw
ML4CFD Competition: Results and Retrospective Analysis
CGBench: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
3D-RAD: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks
CogPhys: Assessing Cognitive Load via Multimodal Remote and Contact-based Physiological Sensing
Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks
Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis
Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering
Toward Engineering AGI: Benchmarking the Engineering Design Capabilities of LLMs
GS2E: Gaussian Splatting is an Effective Data Generator for Event Stream Generation
LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers
MONITRS: Multimodal Observations of Natural Incidents Through Remote Sensing
Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
BO4Mob: Bayesian Optimization Benchmarks for High-Dimensional Urban Mobility Problem
NS-Gym: A Comprehensive and Open-Source Simulation Framework for Non-Stationary Markov Decision Processes
TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine
ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning
SMMILE: An expert-driven benchmark for multimodal medical in-context learning
BurstDeflicker: A Benchmark Dataset for Flicker Removal in Dynamic Scenes
Is This Tracker On? A Benchmark Protocol for Dynamic Tracking
Solving Inequality Proofs with Large Language Models
WorldModelBench: Judging Video Generation Models As World Models
Nemotron-CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models
EgoBlind: Towards Egocentric Visual Assistance for the Blind
SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks
MTBBench: A Multimodal Sequential Clinical Decision-Making Benchmark in Oncology
DermaCon-IN: A Multiconcept-Annotated Dermatological Image Dataset of Indian Skin Disorders for Clinical AI Research
A Controllable Examination for Long-Context Language Models
THUNDER: Tile-level Histopathology image UNDERstanding benchmark
IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset
Words That Unite The World: A Unified Framework for Deciphering Central Bank Communications
What’s in Common? Multimodal Models Hallucinate When Reasoning Across Scenes
Results of the Big ANN: NeurIPS’23 competition
DGCBench: A Deep Graph Clustering Benchmark
Comprehensive Assessment and Analysis for NSFW Content Erasure in Text-to-Image Diffusion models
STAR: A Benchmark for Astronomical Star Fields Super-Resolution
vHector and HeisenVec: Scalable Vector Graphics Generation Through Large Language Models
DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models
UniEdit: A Unified Knowledge Editing Benchmark for Large Language Models
UniFoil: A Universal Dataset of Airfoils in Transitional and Turbulent Regimes for Subsonic and Transonic Flows
A Standardized Benchmark for Multilabel Antimicrobial Peptide Classification
Why Do Multi-Agent LLM Systems Fail?
UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning
AgentRecBench: Benchmarking LLM Agent-based Personalized Recommender Systems
ExAct: A Video-Language Benchmark for Expert Action Analysis
OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization
NeuroRenderedFake: A Challenging Benchmark to Detect Fake Images Generated by Advanced Neural Rendering Methods
Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown
HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction
LCDB 1.1: A Database Illustrating Learning Curves Are More Ill-Behaved Than Previously Thought
CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding
UrbanIng-V2X: A Large-Scale Multi-Vehicle, Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative Perception
Semantic-KG: Using Knowledge Graphs to Construct Benchmarks for Measuring Semantic Similarity
IndEgo: A Dataset of Industrial Scenarios and Collaborative Work for Egocentric Assistants
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding
TabArena: A Living Benchmark for Machine Learning on Tabular Data
STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning
ChemPile: A 250 GB Diverse and Curated Dataset for Chemical Foundation Models
NAVIX: Scaling MiniGrid Environments with JAX
COGNAC: Cooperative Graph-based Networked Agent Challenges for Multi-Agent Reinforcement Learning
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
CLEAR: Command Level Annotated Dataset for Ransomware Detection
DecoyDB: A Dataset for Graph Contrastive Learning in Protein-Ligand Binding Affinity Prediction
CLIMB: Class-imbalanced Learning Benchmark on Tabular Data
MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants
PolyGuard: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset
MS-Bench: Evaluating LMMs in Ancient Manuscript Study through a Dunhuang Case Study
RBench-V: A Primary Assessment for Visual Reasoning Models with Multimodal Outputs
Two Causally Related Needles in a Video Haystack
Towards precision protein-ligand affinity prediction benchmark: A Complete and Modification-Aware DAVIS Dataset
3EED: Ground Everything Everywhere in 3D
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning
Absence Bench: Language Models Can’t See What’s Missing
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
RESPIN-S1.0: A read speech corpus of 10000+ hours in dialects of nine Indian Languages
Gymnasium: A Standard Interface for Reinforcement Learning Environments
AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
XIFBench: Evaluating Large Language Models on Multilingual Instruction Following
STARC-9: A Large-scale Dataset for Multi-Class Tissue Classification for CRC Histopathology
SynTSBench: Rethinking Temporal Pattern Learning in Deep Learning Models for Time Series
ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests
MergeBench: A Benchmark for Merging Domain-Specialized LLMs
MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark
TAPVid-360: Tracking Any Point in 360 from Narrow Field of View Video
DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response
Parameterized Synthetic Text Generation with SimpleStories
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
ArchPower: Dataset for Architecture-Level Power Modeling of Modern CPU Design
FLiP: Towards Comprehensive and Reliable Evaluation of Federated Prompt Learning
TimE: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios
The Impact of Coreset Selection on Spurious Correlations and Group Robustness
Seeking and Updating with Live Visual Knowledge
Dynamic Risk Assessments for Offensive Cybersecurity Agents
TransferBench: Benchmarking Ensemble-based Black-box Transfer Attacks
Enhancing Multilingual LLM Pretraining with Model-Based Data Selection
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
AtmosSci-Bench: Evaluating the Recent Advance of Large Language Model for Atmospheric Science
Decompile-Bench: Million-Scale Binary-Source Function Pairs for Real-World Binary Decompilation
DetectiumFire: A Comprehensive Multi-modal Dataset Bridging Vision and Language for Fire Understanding
SAGE-Eval: Evaluating LLMs for Systematic Generalizations of Safety Facts
ReinAD: Towards Real-world Industrial Anomaly Detection with a Comprehensive Contrastive Dataset
OceanBench: A Benchmark for Data-Driven Global Ocean Forecasting systems
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
BikeBench: A Bicycle Design Benchmark for Generative Models with Objectives and Constraints
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
LexiCon: a Benchmark for Planning under Temporal Constraints in Natural Language
DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem
PSI: A Benchmark for Human Interpretation and Response in Traffic Interactions
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia
PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding
KL Penalty Control via Perturbation for Direct Preference Optimization
ALTER: All-in-One Layer Pruning and Temporal Expert Routing for Efficient Diffusion Generation
Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation
Reproducing Kernel Banach Space Models for Neural Networks with Application to Rademacher Complexity Analysis
BrainMoE: Cognition Joint Embedding via Mixture-of-Expert Towards Robust Brain Foundation Model
Graph Neural Network Based Action Ranking for Planning
Quantum Doubly Stochastic Transformers
Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings
Differentiable Constraint-Based Causal Discovery
Improved Approximation Algorithms for Chromatic and Pseudometric-Weighted Correlation Clustering
Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration
Mitigating Sexual Content Generation via Embedding Distortion in Text-conditioned Diffusion Models
Embodied Cognition Augmented End2End Autonomous Driving
A Driving-Style-Adaptive Framework for Vehicle Trajectory Prediction
RidgeLoRA: Matrix Ridge Enhanced Low-Rank Adaptation of Large Language Models
Distributional LLM-as-a-Judge
Rescaled Influence Functions: Accurate Data Attribution in High Dimension
Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
Time-Masked Transformers with Lightweight Test-Time Adaptation for Neural Speech Decoding
Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection
VQ-Seg: Vector-Quantized Token Perturbation for Semi-Supervised Medical Image Segmentation
Privacy Reasoning in Ambiguous Contexts
Table2LaTeX-RL: High-Fidelity LaTeX Code Generation from Table Images via Reinforced Multimodal Language Models
Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules
GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
Latent Chain-of-Thought for Visual Reasoning
Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding
Sample-Efficient Multi-Round Generative Data Augmentation for Long-Tail Instance Segmentation
H-SPLID: HSIC-based Saliency Preserving Latent Information Decomposition
DyFlow: Dynamic Workflow Framework for Agentic Reasoning
A Principled Approach to Randomized Selection under Uncertainty: Applications to Peer Review and Grant Funding
CALM-PDE: Continuous and Adaptive Convolutions for Latent Space Modeling of Time-dependent PDEs
Towards a Geometric Understanding of Tensor Learning via the t-Product
Efficient $k$-Sparse Band–Limited Interpolation with Improved Approximation Ratio
metaTextGrad: Automatically optimizing language model optimizers
Fully Autonomous Neuromorphic Navigation and Dynamic Obstacle Avoidance
Audio-Sync Video Generation with Multi-Stream Temporal Control
Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization
BlurDM: A Blur Diffusion Model for Image Deblurring
DISCO: DISCrete nOise for Conditional Control in Text-to-Image Diffusion Models
Gene Regulatory Network Inference in the Presence of Selection Bias and Latent Confounders
Diffusion Transformers for Imputation: Statistical Efficiency and Uncertainty Quantification
CALM: Culturally Self-Aware Language Models
Adaptive Quantization in Generative Flow Networks for Probabilistic Sequential Prediction
Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization
Walking the Tightrope: Autonomous Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning
UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression
GeoComplete: Geometry-Aware Diffusion for Reference-Driven Image Completion
Diffusion Federated Dataset
Non-stationary Bandit Convex Optimization: A Comprehensive Study
Why Do Some Language Models Fake Alignment While Others Don't?
MAESTRO: Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series
Automated Composition of Agents: A Knapsack Approach for Agentic Component Selection
Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs
From Sequence to Structure: Uncovering Substructure Reasoning in Transformers
The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models
VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
Stackelberg Self-Annotation: A Robust Approach to Data-Efficient LLM Alignment
Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation
Dual-Space Semantic Synergy Distillation for Continual Learning of Unlabeled Streams
Grounded Reinforcement Learning for Visual Reasoning
UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting
Flow Field Reconstruction with Sensor Placement Policy Learning
CMoB: Modality Valuation via Causal Effect for Balanced Multimodal Learning
Automatic Visual Instrumental Variable Learning for Confounding-Resistant Domain Generalization
Wonder Wins Ways: Curiosity-Driven Exploration through Multi-Agent Contextual Calibration
Provable Gradient Editing of Deep Neural Networks
Connecting Jensen–Shannon and Kullback–Leibler Divergences: A New Bound for Representation Learning
Generalizable, real-time neural decoding with hybrid state-space models
On the Existence and Complexity of Core-Stable Data Exchanges
This Time is Different: An Observability Perspective on Time Series Foundation Models
Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning
Multi-Agent Reinforcement Learning with Communication-Constrained Priors
ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
Conformal Prediction for Causal Effects of Continuous Treatments
FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation
Identifying multi-compartment Hodgkin-Huxley models with high-density extracellular voltage recordings
Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones
RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
The Unseen Threat: Residual Knowledge in Machine Unlearning under Perturbed Samples
Redefining Experts: Interpretable Decomposition of Language Models for Toxicity Mitigation
Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning
Forecasting in Offline Reinforcement Learning for Non-stationary Environments
EGGS: Exchangeable 2D/3D Gaussian Splatting for Geometry-Appearance Balanced Novel View Synthesis
Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging
Detecting Generated Images by Fitting Natural Image Distributions
ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation
DEAL: Diffusion Evolution Adversarial Learning for Sim-to-Real Transfer
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
Characterizing the Expressivity of Fixed-Precision Transformer Language Models
Spectral Estimation with Free Decompression
Correcting misinterpretations of additive models
CoUn: Empowering Machine Unlearning via Contrastive Learning
Data Efficient Adaptation in Large Language Models via Continuous Low-Rank Fine-Tuning
DOTA: Distributional Test-time Adaptation of Vision-Language Models
Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders
Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video
Bootstrapping Hierarchical Autoregressive Formal Reasoner with Chain-of-Proxy-Autoformalization
Understanding Generalization in Physics Informed Models through Affine Variety Dimensions
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Query-Efficient Locally Private Hypothesis Selection via the Scheffe Graph
Conditional Panoramic Image Generation via Masked Autoregressive Modeling
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
$\textit{HiMaCon:}$ Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data
Improving Formal Reasoning of Transformer with State Stack
Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization
Large language models can learn and generalize steganographic chain-of-thought under process supervision
AuroRA: Breaking Low-Rank Bottleneck of LoRA with Nonlinear Mapping
Information Retrieval Induced Safety Degradation in AI Agents
Bridging Scales: Spectral Theory Reveals How Local Connectivity Rules Sculpt Global Neural Dynamics in Spatially Extended Networks
Online Inverse Linear Optimization: Efficient Logarithmic-Regret Algorithm, Robustness to Suboptimality, and Lower Bound
FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning
Learning from A Single Markovian Trajectory: Optimality and Variance Reduction
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
Prediction with expert advice under additive noise
SpecMER: Fast Protein Generation with K-mer Guided Speculative Decoding
Unveiling Transformer Perception by Exploring Input Manifolds
pLSTM: parallelizable Linear Source Transition Mark networks
Model Reconciliation via Cost-Optimal Explanations in Probabilistic Logic Programming
Looking Beyond the Known: Towards a Data Discovery Guided Open-World Object Detection
A Beyond-Worst-Case Analysis of Greedy k-means++
Random Search Neural Networks for Efficient and Expressive Graph Learning
ReCon: Region-Controllable Data Augmentation with Rectification and Alignment for Object Detection
Retro-R1: LLM-based Agentic Retrosynthesis
Learn2Mix: Training Neural Networks Using Adaptive Data Integration
GRIP: A Graph-Based Reasoning Instruction Producer
When Data Can't Meet: Estimating Correlation Across Privacy Barriers
KSP: Kolmogorov-Smirnov metric-based Post-Hoc Calibration for Survival Analysis
Cost-aware LLM-based Online Dataset Annotation
CURV: Coherent Uncertainty-Aware Reasoning in Vision-Language Models for X-Ray Report Generation
Transductive Conformal Inference for Full Ranking
ENMA: Tokenwise Autoregression for Continuous Neural PDE Operators
A Physics-preserved Transfer Learning Method for Differential Equations
Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models
Deep Tree Tensor Networks
HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs
Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation
Copresheaf Topological Neural Networks: A Generalized Deep Learning Framework
Improved Regret and Contextual Linear Extension for Pandora's Box and Prophet Inequality
SHF: Symmetrical Hierarchical Forest with Pretrained Vision Transformer Encoder for High-Resolution Medical Segmentation
ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning
IPAD: Inverse Prompt for AI Detection - A Robust and Interpretable LLM-Generated Text Detector
VLMs can Aggregate Scattered Training Patches
ErrorTrace: A Black-Box Traceability Mechanism Based on Model Family Error Space
Exploring the Translation Mechanism of Large Language Models
AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees
Decoding Causal Structure: End-to-End Mediation Pathways Inference
The Implicit Bias of Structured State Space Models Can Be Poisoned With Clean Labels
Diversifying Parallel Ergodic Search: A Signature Kernel Evolution Strategy
Quantization-Free Autoregressive Action Transformer
Structural Causal Bandits under Markov Equivalence
Disentangling Superpositions: Interpretable Brain Encoding Model with Sparse Concept Atoms
The Bias-Variance Tradeoff in Data-Driven Optimization: A Local Misspecification Perspective
Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents
Learning to Generalize: An Information Perspective on Neural Processes
ProDyG: Progressive Dynamic Scene Reconstruction via Gaussian Splatting from Monocular Videos
Improving Progressive Generation with Decomposable Flow Matching
FlowNet: Modeling Dynamic Spatio-Temporal Systems via Flow Propagation
RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation
Fast Data Attribution for Text-to-Image Models
HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis
Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models
Autoregressive Motion Generation with Gaussian Mixture-Guided Latent Sampling
Correlated Low-Rank Adaptation for ConvNets
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
Restricted Global-Aware Graph Filters Bridging GNNs and Transformer for Node Classification
A Multimodal BiMamba Network with Test-Time Adaptation for Emotion Recognition Based on Physiological Signals
Exploring the Noise Robustness of Online Conformal Prediction
VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking
AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding
Tree-Sliced Entropy Partial Transport
Provably Efficient Online RLHF with One-Pass Reward Modeling
NEP: Autoregressive Image Editing via Next Editing Token Prediction
PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation
Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving
A Cramér–von Mises Approach to Incentivizing Truthful Data Sharing
Robust Explanations of Graph Neural Networks via Graph Curvatures
Mitigating Overthinking in Large Reasoning Models via Manifold Steering
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning
BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks
Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding
Compress Large Language Models via Collaboration Between Learning and Matrix Approximation
Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing
LiteReality: Graphic-Ready 3D Scene Reconstruction from RGB-D Scans
Adjusting Initial Noise to Mitigate Memorization in Text-to-Image Diffusion Models
Self-Verifying Reflection Helps Transformers with CoT Reasoning
Model-Based Policy Adaptation for Closed-Loop End-to-end Autonomous Driving
Cascaded Language Models for Cost-Effective Human–AI Decision-Making
From Softmax to Score: Transformers Can Effectively Implement In-Context Denoising Steps
SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs
Cost-Aware Contrastive Routing for LLMs
RAGRouter: Learning to Route Queries to Multiple Retrieval-Augmented Language Models
Uncovering the Spectral Bias in Diagonal State Space Models
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Complete Structure Guided Point Cloud Completion via Cluster- and Instance-Level Contrastive Learning
Accelerating Visual-Policy Learning through Parallel Differentiable Simulation
The Best Instruction-Tuning Data are Those That Fit
WISA: World simulator assistant for physics-aware text-to-video generation
Brain-tuning Improves Generalizability and Efficiency of Brain Alignment in Speech Models
WaLRUS: Wavelets for Long range Representation Using State Space Methods
MIND: Material Interface Generation from UDFs for Non-Manifold Surface Reconstruction
Protocols for Verifying Smooth Strategies in Bandits and Games
Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data
RAPTR: Radar-based 3D Pose Estimation using Transformer
Leveraging Conditional Dependence for Efficient World Model Denoising
Multiplication-Free Parallelizable Spiking Neurons with Efficient Spatio-Temporal Dynamics
QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code
Incentivizing Truthful Language Models via Peer Elicitation Games
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation is Wasteful
MJ-Video: Benchmarking and Rewarding Video Generation with Fine-Grained Video Preference
Discretization-free Multicalibration through Loss Minimization over Tree Ensembles
Brain-like Variational Inference
Distributionally Robust Performative Optimization
GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers
Towards Unsupervised Training of Matching-based Graph Edit Distance Solver via Preference-aware GAN
Unraveling Metameric Dilemma for Spectral Reconstruction: A High-Fidelity Approach via Semi-Supervised Learning
Provable Sample-Efficient Transfer Learning Conditional Diffusion Models via Representation Learning
Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers?
MASTER: Enhancing Large Language Model via Multi-Agent Simulated Teaching
KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows
The Computational Advantage of Depth in Learning High-Dimensional Hierarchical Targets
StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models
Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
On Learning Verifiers and Implications to Chain-of-Thought Reasoning
Multipole Attention for Efficient Long Context Reasoning
FedRAM: Federated Reweighting and Aggregation for Multi-Task Learning
Epistemic Uncertainty for Generated Image Detection
Consistency Conditions for Differentiable Surrogate Losses
FaCT: Faithful Concept Traces for Explaining Neural Network Decisions
Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound
The Generative Leap: Tight Sample Complexity for Efficiently Learning Gaussian Multi-Index Models
Perturb a Model, Not an Image: Towards Robust Privacy Protection via Anti-Personalized Diffusion Models
OmniCast: A Masked Latent Diffusion Model for Weather Forecasting Across Time Scales
Audits Under Resource, Data, and Access Constraints: Scaling Laws For Less Discriminatory Alternatives
DLoFT: Gradient-Decoupled Fine-Tuning for Generalizable Long Chain-of-Thought Reasoning
GIST: Greedy Independent Set Thresholding for Max-Min Diversification with Submodular Utility
ViewCraft3D: High-fidelity and View-Consistent 3D Vector Graphics Synthesis
Self-diffusion for Solving Inverse Problems
Self-Assembling Graph Perceptrons
Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
Generalizable Reasoning through Compositional Energy Minimization
Modeling Microenvironment Trajectories on Spatial Transcriptomics with NicheFlow
Distributive Fairness in Large Language Models: Evaluating Alignment with Human Values
Efficient PAC Learning for Realizable-Statistic Models via Convex Surrogates
SteerConf: Steering LLMs for Confidence Elicitation
DMol: A Highly Efficient and Chemical Motif-Preserving Molecule Generation Platform
Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
Solving Neural Min-Max Games: The Role of Architecture, Initialization & Dynamics
How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs
Bit-swapping Oriented Twin-memory Multi-view Clustering in Lifelong Incomplete Scenarios
VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold
CCS: Controllable and Constrained Sampling with Diffusion Models via Initial Noise Perturbation
Scaling Up Parameter Generation: A Recurrent Diffusion Approach
Diffusing DeBias: Synthetic Bias Amplification for Model Debiasing
How to Train Your LLM Web Agent: A Statistical Diagnosis
Towards Reliable LLM-based Robots Planning via Combined Uncertainty Estimation
InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions
Train on Pins and Test on Obstacles for Rectilinear Steiner Minimum Tree
Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Aligning Compound AI Systems via System-level DPO
Impact of Layer Norm on Memorization and Generalization in Transformers
Value Gradient Guidance for Flow Matching Alignment
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-based Decoding
Explaining the Law of Supply and Demand via Online Learning
Trust Region Constrained Measure Transport in Path Space for Stochastic Optimal Control and Inference
Stabilizing LTI Systems under Partial Observability: Sample Complexity and Fundamental Limits
HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models
H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting
Efficiently Verifiable Proofs of Data Attribution
ItDPDM: Information-Theoretic Discrete Poisson Diffusion Model
Puzzles: Unbounded Video-Depth Augmentation for Scalable End-to-End 3D Reconstruction
RFMPose: Generative Category-level Object Pose Estimation via Riemannian Flow Matching
Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
AION-1: Omnimodal Foundation Model for Astronomical Sciences
On Union-Closedness of Language Generation
HoliTom: Holistic Token Merging for Fast Video Large Language Models
OpenCUA: Open Foundations for Computer-Use Agents
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
Look-Ahead Reasoning on Learning Platforms
FedFACT: A Provable Framework for Controllable Group-Fairness Calibration in Federated Learning
Can NeRFs "See" without Cameras?
QBasicVSR: Temporal Awareness Adaptation Quantization for Video Super-Resolution
From Counterfactuals to Trees: Competitive Analysis of Model Extraction Attacks
Pareto-Optimal Energy Alignment for Designing Nature-Like Antibodies
Distributional Autoencoders Know the Score
A Closer Look at Model Collapse: From a Generalization-to-Memorization Perspective
Manipulating 3D Molecules in a Fixed-Dimensional E(3)-Equivariant Latent Space
CoP: Agentic Red-teaming for Large Language Models using Composition of Principles
Hierarchical Shortest-Path Graph Kernel Network
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
InvFusion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems
Deep Continuous-Time State-Space Models for Marked Event Sequences
The Complexity of Symmetric Equilibria in Min-Max Optimization and Team Zero-Sum Games
Understanding Parametric and Contextual Knowledge Reconciliation within Large Language Models
HeroFilter: Adaptive Spectral Graph Filter for Varying Heterophilic Relations
DualOptim: Enhancing Efficacy and Stability in Machine Unlearning with Dual Optimizers
Towards Robust Parameter-Efficient Fine-Tuning for Federated Learning
Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers, and Gradient Clipping
Tight Bounds for Answering Adaptively Chosen Concentrated Queries
Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks
SIFusion: A Unified Fusion Framework for Multi-granularity Arctic Sea Ice Forecasting
On Geometry-Enhanced Parameter-Efficient Fine-Tuning for 3D Scene Segmentation
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results
FlashBias: Fast Computation of Attention with Bias
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Sculpting Features from Noise: Reward-Guided Hierarchical Diffusion for Task-Optimal Feature Transformation
Better Training Data Attribution via Better Inverse Hessian-Vector Products
GauSAM: Contour-Guided 2D Gaussian Fields for Multi-Scale Medical Image Segmentation with Segment Anything
Infinite Neural Operators: Gaussian processes on functions
FlowPrune: Accelerating Attention Flow Calculation by Pruning Flow Network
Accident Anticipation via Temporal Occurrence Prediction
LoSplit: Loss-Guided Dynamic Split for Training-Time Defense Against Graph Backdoor Attacks
On the Loss of Context Awareness in General Instruction Fine-tuning
Faster Algorithms for Structured John Ellipsoid Computation
Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits
Optimal Rates for Generalization of Gradient Descent for Deep ReLU Classification
HIDISC: A Hyperbolic Framework for Domain Generalization with Generalized Category Discovery
FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
Learning Counterfactual Outcomes Under Rank Preservation
Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data
Test Time Scaling for Neural Processes
Positional Fragility in LLMs: How Offset Effects Reshape Our Understanding of Memorization Risks
Democratizing Clinical Risk Prediction with Cross-Cohort Cross-Modal Knowledge Transfer
Understanding the Gain from Data Filtering in Multimodal Contrastive Learning
Flick: Empowering Federated Learning with Commonsense Knowledge
OpenBox: Annotate Any Bounding Boxes in 3D
FlareX: A Physics-Informed Dataset for Lens Flare Removal via 2D Synthesis and 3D Rendering
Discovering Important Experts for Mixture-of-Experts Models Pruning Through a Theoretical Perspective
Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
Adversarial Robustness of Nonparametric Regression
Differentially Private Gomory-Hu Trees
Graph Your Own Prompt
Unified Reinforcement and Imitation Learning for Vision-Language Models
BrainODE: Neural Shape Dynamics for Age- and Disease-aware Brain Trajectories
Spectral Learning for Infinite-Horizon Average-Reward POMDPs
Bivariate Matrix-valued Linear Regression (BMLR): Finite-sample performance under Identifiability and Sparsity Assumptions
Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning
GTR-Loc: Geospatial Text Regularization Assisted Outdoor LiDAR Localization
Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning
Unsupervised Trajectory Optimization for 3D Registration in Serial Section Electron Microscopy using Neural ODEs
CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment
Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws
REASONING COMPILER: LLM-Guided Optimizations for Efficient Model Serving
CORE: Collaborative Optimization with Reinforcement Learning and Evolutionary Algorithm for Floorplanning
Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models
LoMix: Learnable Weighted Multi-Scale Logits Mixing for Medical Image Segmentation
Near-Optimal Regret-Queue Length Tradeoff in Online Learning for Two-Sided Markets
Switchable Token-Specific Codebook Quantization For Face Image Compression
Tail-Optimized Caching for LLM Inference
InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model
SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
LILO: Learning to Reason at the Frontier of Learnability
Beyond Node-Centric Modeling: Sketching Signed Networks with Simplicial Complexes
SNAP: Low-Latency Test-Time Adaptation with Sparse Updates
Enhancing LLM Planning for Robotics Manipulation through Hierarchical Procedural Knowledge Graphs
VisDiff: SDF-Guided Polygon Generation for Visibility Reconstruction, Characterization and Recognition
Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization
RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models
Why Popular MOEAs are Popular: Proven Advantages in Approximating the Pareto Front
A Clean Slate for Offline Reinforcement Learning
Faster Generic Identification in Tree-Shaped Structural Causal Models
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Learning conformational ensembles of proteins based on backbone geometry
Heterogeneous Adversarial Play in Interactive Environments
Contextual Dynamic Pricing with Heterogeneous Buyers
Spike-RetinexFormer: Rethinking Low-light Image Enhancement with Spiking Neural Networks
JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensemble Generation
Cue3D: Quantifying the Role of Image Cues in Single-Image 3D Generation
VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models
Training Language Models to Generate Quality Code with Program Analysis Feedback
SegGraph: Leveraging Graphs of SAM Segments for Few-Shot 3D Part Segmentation
ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning
Certifying Concavity and Monotonicity in Games via Sum-of-Squares Hierarchies
Heterogeneous Diffusion Structure Inference for Network Cascade
PIVNO: Particle Image Velocimetry Neural Operator
Transferring Causal Effects using Proxies
CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing
Gaze Beyond the Frame: Forecasting Egocentric 3D Visual Span
Iterative Foundation Model Fine-Tuning on Multiple Rewards
RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers
Pool Me Wisely: On the Effect of Pooling in Transformer-Based Models
Learning from Disjoint Views: A Contrastive Prototype Matching Network for Fully Incomplete Multi-View Clustering
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding
Retrv-R1: A Reasoning-Driven MLLM Framework for Universal and Efficient Multimodal Retrieval
Feedback Guidance of Diffusion Models
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Hierarchical Implicit Neural Emulators
Safety Pretraining: Toward the Next Generation of Safe AI
Best-of-N Jailbreaking
Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs
Differentiable Cyclic Causal Discovery Under Unmeasured Confounders
Clustering via Hedonic Games: New Concepts and Algorithms
Self supervised learning for in vivo localization of microelectrode arrays using raw local field potential
Task-Specific Data Selection for Instruction Tuning via Monosemantic Neuronal Activations
GeoAda: Efficiently Finetune Geometric Diffusion Models with Equivariant Adapters
Optimal kernel regression bounds under energy-bounded noise
Hierarchical Demonstration Order Optimization for Many-shot In-Context Learning
Guarantees for Alternating Least Squares in Overparameterized Tensor Decompositions
Isotropic Noise in Stochastic and Quantum Convex Optimization
Inexact Column Generation for Bayesian Network Structure Learning via Difference-of-Submodular Optimization
SGAR: Structural Generative Augmentation for 3D Human Motion Retrieval
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families
Multiplayer Federated Learning: Reaching Equilibrium with Less Communication
Simple and Optimal Sublinear Algorithms for Mean Estimation
TrajAgent: An LLM-Agent Framework for Trajectory Modeling via Large-and-Small Model Collaboration
OpenMMEgo: Enhancing Egocentric Understanding for LMMs with Open Weights and Data
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model
FedRACE: A Hierarchical and Statistical Framework for Robust Federated Learning
Sample Complexity of Distributionally Robust Average-Reward Reinforcement Learning
Eulerian Neural Network Informed by Chemical Transport for Air Quality Forecasting
SRA-CL: Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation
AltLoRA: Towards Better Gradient Approximation in Low-Rank Adaptation with Alternating Projections
From Pretraining to Pathology: How Noise Leads to Catastrophic Inheritance in Medical Models
MODEL SHAPLEY: Find Your Ideal Parameter Player via One Gradient Backpropagation
Exploring the Design Space of Diffusion Bridge Models
Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning
CodeMerge: Codebook-Guided Model Merging for Robust Test-Time Adaptation in Autonomous Driving
Confounding Robust Deep Reinforcement Learning: A Causal Approach
Temporal-Difference Variational Continual Learning
Parsimonious Predictions for Strategyproof Scheduling
CAMO: Convergence-Aware Multi-Fidelity Bayesian Optimization
How Well Can Differential Privacy Be Audited in One Run?
Secure and Confidential Certificates of Online Fairness
Multitask Learning with Stochastic Interpolants
OctoNet: A Large-Scale Multi-Modal Dataset for Human Activity Understanding Grounded in Motion-Captured 3D Pose Labels
Object-Centric Concept-Bottlenecks
LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
$i$MIND: Insightful Multi-subject Invariant Neural Decoding
Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision
An Adaptive Algorithm for Bilevel Optimization on Riemannian Manifolds
The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size
R&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization
MLZero: A Multi-Agent System for End-to-end Machine Learning Automation
DroneAudioset: An Audio Dataset for Drone-based Search and Rescue
Inverse Optimization Latent Variable Models for Learning Costs Applied to Route Problems
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?
How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning
A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
TraffiDent: A Dataset for Understanding the Interplay Between Traffic Dynamics and Incidents
Precise Asymptotics and Refined Regret of Variance-Aware UCB
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
On the Optimality of the Median-of-Means Estimator under Adversarial Contamination
What's Producible May Not Be Reachable: Measuring the Steerability of Generative Models
Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation
Thumb on the Scale: Optimal Loss Weighting in Last Layer Retraining
Establishing Linear Surrogate Regret Bounds for Convex Smooth Losses via Convolutional Fenchel–Young Losses
Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means
FAPEX: Fractional Amplitude-Phase Expressor for Robust Cross-Subject Seizure Prediction
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception
Diversity as a Reward: Fine-Tuning LLMs on a Mixture of Domain-Undetermined Data
Asymmetric Dual Self-Distillation for 3D Self-Supervised Representation Learning
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
Common Task Framework For a Critical Evaluation of Scientific Machine Learning Algorithms
UniSite: The First Cross-Structure Dataset and Learning Framework for End-to-End Ligand Binding Site Detection
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
Posterior Sampling by Combining Diffusion Models with Annealed Langevin Dynamics
Belief-Calibrated Multi-Agent Consensus Seeking for Complex NLP Tasks
A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging
Improving Perturbation-based Explanations by Understanding the Role of Uncertainty Calibration
Robust Label Proportions Learning
Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Multimodal Tabular Reasoning with Privileged Structured Information
AliO: Output Alignment Matters in Long-Term Time Series Forecasting
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
The Persistence of Neural Collapse Despite Low-Rank Bias
Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination
Few-Shot Knowledge Distillation of LLMs With Counterfactual Explanations
In Silico Mapping of Visual Categorical Selectivity Across the Whole Brain
Measuring Fingerprints of Web-filtered Text Datasets and Fingerprint Propagation Through Training
Information-Theoretic Discrete Diffusion
InvisibleInk: High-Utility and Low-Cost Text Generation with Differential Privacy
Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?
GenIR: Generative Visual Feedback for Mental Image Retrieval
Convergent Functions, Divergent Forms
EvoLM: In Search of Lost Language Model Training Dynamics
SentinelKilnDB: A Large-Scale Dataset and Benchmark for OBB Brick Kiln Detection in South Asia Using Satellite Imagery
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
SEAL: Semantic-Aware Hierarchical Learning for Generalized Category Discovery
Decomposing motor units through elimination for real-time intention driven assistive neurotechnology
FlashMo: Geometric Interpolants and Frequency-Aware Sparsity for Scalable Efficient Motion Generation
Personalized Image Editing in Text-to-Image Diffusion Models via Collaborative Direct Preference Optimization
3DOT: Texture Transfer for 3DGS Objects from a Single Reference Image
TaiwanVQA: Benchmarking and Enhancing Cultural Understanding in Vision-Language Models
FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model
Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation
Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling
FHGS: Feature-Homogenized Gaussian Splatting
Modelling the control of offline processing with reinforcement learning
Contextual Tokenization for Graph Inverted Indices
HyPlaneHead: Rethinking Tri-plane-like Representations in Full-Head Image Synthesis
MetaKoopman: Bayesian Meta-Learning of Koopman Operators for Modeling Structured Dynamics under Distribution Shifts
Rethinking Entropy in Test-Time Adaptation: The Missing Piece from Energy Duality
SignFlow Bipartite Subgraph Network For Large-Scale Graph Link Sign Prediction
SceneForge: Enhancing 3D-text alignment with Structured Scene Compositions
Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with LLMs
Periodic Skill Discovery
UniteFormer: Unifying Node and Edge Modalities in Transformers for Vehicle Routing Problems
EVAAA: A Virtual Environment Platform for Essential Variables in Autonomous and Adaptive Agents
Why Diffusion Models Don’t Memorize: The Role of Implicit Dynamical Regularization in Training
Pruning-Robust Mamba with Asymmetric Multi-Scale Scanning Paths
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning
Identifying interactions across brain areas while accounting for individual-neuron dynamics with a Transformer-based variational autoencoder
Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision
Flow based approach for Dynamic Temporal Causal models with non-Gaussian or Heteroscedastic Noises
A Data-Driven Prism: Multi-View Source Separation with Diffusion Model Priors
Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness
InFlux: A Benchmark for Self-Calibration of Dynamic Intrinsics of Video Cameras
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning
Unlearned but Not Forgotten: Data Extraction after Exact Unlearning in LLM
OpenLex3D: A Tiered Benchmark for Open-Vocabulary 3D Scene Representations
NUTS: Eddy-Robust Reconstruction of Surface Ocean Nutrients via Two-Scale Modeling
Treatment Effect Estimation for Optimal Decision-Making
Evaluating multiple models using labeled and unlabeled data
Near-Optimal Quantum Algorithms for Computing (Coarse) Correlated Equilibria of General-Sum Games
Latent Space Factorization in LoRA
MLEP: Multi-granularity Local Entropy Patterns for Generalized AI-generated Image Detection
Recurrent Memory for Online Interdomain Gaussian Processes
AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
Deployment Efficient Reward-Free Exploration with Linear Function Approximation
IRRISIGHT: A Large-Scale Multimodal Dataset and Scalable Pipeline to Address Irrigation and Water Management in Agriculture
An Analysis of Causal Effect Estimation using Outcome Invariant Data Augmentation
Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation
VIBE: Annotation-Free Video-to-Text Information Bottleneck Evaluation for TL;DR
The Rashomon Set Has It All: Analyzing Trustworthiness of Trees under Multiplicity
Towards Single-Source Domain Generalized Object Detection via Causal Visual Prompts
MUSTAFAR: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Text to Sketch Generation with Multi-Styles
Emergence and Evolution of Interpretable Concepts in Diffusion Models
CausalVerse: Benchmarking Causal Representation Learning with Configurable High-Fidelity Simulations
Fire360: A Benchmark for Robust Perception and Episodic Memory in Degraded 360° Firefighting Video
Multi-order Orchestrated Curriculum Distillation for Model-Heterogeneous Federated Graph Learning
CodeCrash: Exposing LLM Fragility to Misleading Natural Language in Code Reasoning
REOBench: Benchmarking Robustness of Earth Observation Foundation Models
LABridge: Text–Image Latent Alignment Framework via Mean-Conditioned OU Process
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)
CADGrasp: Learning Contact and Collision Aware General Dexterous Grasping in Cluttered Scenes
AVerImaTeC: A Dataset for Automatic Verification of Image-Text Claims with Evidence from the Web
Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models
Latent Refinement via Flow Matching for Training-free Linear Inverse Problem Solving
Measuring AI Ability to Complete Long Software Tasks
QSCA: Quantization with Self-Compensating Auxiliary for Monocular Depth Estimation
TADA: Improved Diffusion Sampling with Training-free Augmented DynAmics
From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos
Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
ProfiX: Improving Profile-Guided Optimization in Compilers with Graph Neural Networks
PANORAMA: A Dataset and Benchmarks Capturing Decision Trails and Rationales in Patent Examination
MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning
PUO-Bench: A Panel Understanding and Operation Benchmark with A Privacy-Preserving Framework
SeasonBench-EA: A Multi-Source Benchmark for Seasonal Prediction and Numerical Model Post-Processing in East Asia
OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics
Solving the Asymmetric Traveling Salesman Problem via Trace-Guided Cost Augmentation
Self-Supervised Direct Preference Optimization for Text-to-Image Diffusion Models
Redundancy-Aware Test-Time Graph Out-of-Distribution Detection
Scalable Fingerprinting of Large Language Models
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning
DUAL: Learning Diverse Kernels for Aggregated Two-sample and Independence Testing
Exploring Diffusion Transformer Designs via Grafting
Training-Free Efficient Video Generation via Dynamic Token Carving
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
SpecMAS: A Multi-Agent System for Self-Verifying System Generation via Formal Model Checking
COLA: Towards Efficient Multi-Objective Reinforcement Learning with Conflict Objective Regularization in Latent Space
Post Hoc Regression Refinement via Pairwise Rankings
DeepDiver: Adaptive Web-Search Intensity Scaling via Reinforcement Learning
QFFT, Question-Free Fine-Tuning for Adaptive Reasoning
Multi-Agent Learning under Uncertainty: Recurrence vs. Concentration
Accurately Predicting Protein Mutational Effects via a Hierarchical Many-Body Attention Network
Searching Latent Program Spaces
Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift
Learn and Ensemble Bridge Adapters for Multi-domain Task Incremental Learning
Video Perception Models for 3D Scene Synthesis
Robust LLM Alignment via Distributionally Robust Direct Preference Optimization
Cooperative Retrieval-Augmented Generation for Question Answering: Mutual Information Exchange and Ranking by Contrasting Layers
Constructing an Optimal Behavior Basis for the Option Keyboard
GAMMA: Gated Multi-hop Message Passing for Homophily-Agnostic Node Representation in GNNs
Any-stepsize Gradient Descent for Separable Data under Fenchel–Young Losses
DynaPipe: Dynamic Layer Redistribution for Efficient Serving of LLMs with Pipeline Parallelism
Autoencoding Random Forests
Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
Non-stationary Equivariant Graph Neural Networks for Physical Dynamics Simulation
Spatiotemporal Consensus with Scene Prior for Unsupervised Domain Adaptive Person Search
UMAMI: Unifying Masked Autoregressive Models and Deterministic Rendering for View Synthesis
DOVTrack: Data-Efficient Open-Vocabulary Tracking
SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training
The Right to Red-Team: Adversarial AI Literacy as a Civic Imperative in K-12 Education
Automaton Constrained Q-Learning
Monoculture or Multiplicity: Which Is It?
Discovering Data Structures: Nearest Neighbor Search and Beyond
What Do Latent Action Models Actually Learn?
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
Real-Time Hyper-Personalized Generative AI Should Be Regulated to Prevent the Rise of "Digital Heroin"
Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
PBR-SR: Mesh PBR Texture Super Resolution from 2D Image Priors
Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data
Vector Database Watermarking
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
Time-Embedded Algorithm Unrolling for Computational MRI
Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data
Cooperative Bargaining Games Without Utilities: Mediated Solutions from Direction Oracles
FedEL: Federated Elastic Learning for Heterogeneous Devices
How to build a consistency model: Learning flow maps via self-distillation
Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards
DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
Guiding LLM Decision-Making with Fairness Reward Models
Convergence Rates of Constrained Expected Improvement
WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Machine Unlearning under Overparameterization
Frequency-Aware Token Reduction for Efficient Vision Transformer
Statistical Analysis of the Sinkhorn Iterations for Two-Sample Schrödinger Bridge Estimation
$\boldsymbol{\lambda}$-Orthogonality Regularization for Compatible Representation Learning
Differentiation Through Black-Box Quadratic Programming Solvers
Zero-shot Denoising via Neural Compression: Theoretical and algorithmic framework
On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts
Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching
REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Learning Interactive World Model for Object-Centric Reinforcement Learning
Deciphering the Extremes: A Novel Approach for Pathological Long-tailed Recognition in Scientific Discovery
OCN: Effectively Utilizing Higher-Order Common Neighbors for Better Link Prediction
GSRF: Complex-Valued 3D Gaussian Splatting for Efficient Radio-Frequency Data Synthesis
FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts
Efficient Data Selection at Scale via Influence Distillation
LLM Meets Diffusion: A Hybrid Framework for Crystal Material Generation
Anchored Diffusion Language Model
SAINT: Sequence-Aware Integration for Spatial Transcriptomics Multi-View Clustering
Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning
Tensor Product Attention Is All You Need
CARE: Decoding-Time Safety Alignment via Rollback and Introspection Intervention
The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis
ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models
MiniMax-Remover: Taming Bad Noise Helps Video Object Removal
Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference
Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling
Geometric Algorithms for Neural Combinatorial Optimization with Constraints
VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
Generative Distribution Embeddings
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
Enhancing Training Data Attribution with Representational Optimization
Controlling Thinking Speed in Reasoning Models
Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks
Abstract Rendering: Certified Rendering Under 3D Semantic Uncertainty
T2SMark: Balancing Robustness and Diversity in Noise-as-Watermark for Diffusion Models
Align-DA: Align Score-based Atmospheric Data Assimilation with Multiple Preferences
PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis
Characterization and Learning of Causal Graphs from Hard Interventions
How to Scale Second-Order Optimization
OptiTree: Hierarchical Thoughts Generation with Tree Search for LLM Optimization Modeling
Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes
Polyline Path Masked Attention for Vision Transformer
Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward
P-Law: Predicting Quantitative Scaling Law with Entropy Guidance in Large Recommendation Models
Optimal Mistake Bounds for Transductive Online Learning
A Private Approximation of the 2nd-Moment Matrix of Any Subsamplable Input
Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification
Out-of-Distribution Generalized Graph Anomaly Detection with Homophily-aware Environment Mixup
Efficient Multimodal Dataset Distillation via Generative Models
Learning to cluster neuronal function
ViSPLA: Visual Iterative Self-Prompting for Language-Guided 3D Affordance Learning
Synthesizing Photorealistic and Dynamic Urban Environments for Multimodal Robot Navigation and Collaboration
Set-LLM: A Permutation-Invariant LLM
Text-to-Code Generation for Modular Building Layouts in Building Information Modeling
Reaction Prediction via Interaction Modeling of Symmetric Difference Shingle Sets
Differentiable Decision Tree via "ReLU+Argmin" Reformulation
Zero-shot protein stability prediction by inverse folding models: a free energy interpretation
Improved Algorithms for Overlapping and Robust Clustering of Edge-Colored Hypergraphs: An LP-Based Combinatorial Approach
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
Attention! Your Vision Language Model Could Be Maliciously Manipulated
Non-Stationary Structural Causal Bandits
Class-aware Domain Knowledge Fusion and Fission for Continual Test-Time Adaptation
Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
Adaptive Stochastic Coefficients for Accelerating Diffusion Sampling
MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE
Improving Decision Trees through the Lens of Parameterized Local Search
Optimal Estimation of the Best Mean in Multi-Armed Bandits
4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos
Tracing the Representation Geometry of Language Models from Pretraining to Post-training
Promptable 3-D Object Localization with Latent Diffusion Models
Learning from Delayed Feedback in Games via Extra Prediction
Adaptive Discretization for Consistency Models
EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data
Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning
Revisiting Logit Distributions for Reliable Out-of-Distribution Detection
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
SpectraLDS: Provable Distillation for Linear Dynamical Systems
Whitened Score Diffusion: A Structured Prior for Imaging Inverse Problems
Efficient Kernelized Learning in Polyhedral Games beyond Full Information: From Colonel Blotto to Congestion Games
Latent Policy Barrier: Learning Robust Visuomotor Policies by Staying In-Distribution
FRBNet: Revisiting Low-Light Vision through Frequency-Domain Radial Basis Network
BoltzNCE: Learning likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation
Edit Flows: Variable Length Discrete Flow Matching with Sequence-Level Edit Operations
Neural Mutual Information Estimation with Vector Copulas
Revolutionizing Training-Free NAS: Towards Efficient Automatic Proxy Discovery via Large Language Models
SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement
Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs
CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification
Semi-supervised Vertex Hunting, with Applications in Network and Text Analysis
A-Mem: Agentic Memory for LLM Agents
Learning Parameterized Skills from Demonstrations
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning
A Gradient Guided Diffusion Framework for Chance Constrained Programming
A Counterfactual Semantics for Hybrid Dynamical Systems
You Only Communicate Once: One-shot Federated Low-Rank Adaptation of MLLM
Contextual Thompson Sampling via Generation of Missing Data
CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning
TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields
OASIS: One-Shot Federated Graph Learning via Wasserstein Assisted Knowledge Integration
SimWorld: An Open-ended Simulator for Agents in Physical and Social Worlds
Finite-Time Bounds for Average-Reward Fitted Q-Iteration
Learning World Models for Interactive Video Generation
UEPI: Universal Energy-Behavior-Preserving Integrators for Energy Conservative/Dissipative Differential Equations
Gains: Fine-grained Federated Domain Adaptation in Open Set
Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?
Foundations of Top-$k$ Decoding for Language Models
Transforming Gaps into Gains: Bridging Model and Data Heterogeneity in Federated Learning via Knowledge Weak-Aware Zones
Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback
A Closer Look at NTK Alignment: Linking Phase Transitions in Deep Image Regression
FlowRefiner: A Robust Traffic Classification Framework against Label Noise
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Optimized Minimal 3D Gaussian Splatting
Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape
ElliCE: Efficient and Provably Robust Algorithmic Recourse via the Rashomon Sets
Multimodal LiDAR-Camera Novel View Synthesis with Unified Pose-free Neural Fields
OOD-Barrier: Build a Middle-Barrier for Open-Set Single-Image Test Time Adaptation via Vision Language Models
Low Precision Streaming PCA
Improving Generative Behavior Cloning via Self-Guidance and Adaptive Chunking
VITRIX-CLIPIN: Enhancing Fine-Grained Visual Understanding in CLIP via Instruction-Editing Data and Long Captions
Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding
AmorLIP: Efficient Language-Image Pretraining via Amortization
Optimal Online Change Detection via Random Fourier Features
SORTeD Rashomon Sets of Sparse Decision Trees: Anytime Enumeration
Enhancing the Maximum Effective Window for Long-Term Time Series Forecasting
MiCADangelo: Fine-Grained Reconstruction of Constrained CAD Models from 3D Scans
Online Locally Differentially Private Conformal Prediction via Binary Inquiries
Quantifying and Alleviating Co-Adaptation in Sparse-View 3D Gaussian Splatting
Knowledge Distillation of Uncertainty using Deep Latent Factor Model
Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach
AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
Let Me Think! A Long Chain of Thought Can Be Worth Exponentially Many Short Ones
Least squares variational inference
HMVLM: Human Motion-Vision-Language Model via MoE LoRA
CoCoA: A Minimum Bayes Risk Framework Bridging Confidence and Consistency for Uncertainty Quantification in LLMs
Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation
CRRL: Learning Channel-invariant Neural Representations for High-performance Cross-day Decoding
Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update
AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking
Informed Correctors for Discrete Diffusion Models
Uncertainty Quantification for Physics-Informed Neural Networks with Extended Fiducial Inference
Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models
First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training
Probabilistic Reasoning with LLMs for Privacy Risk Estimation
Principled Long-Tailed Generative Modeling via Diffusion Models
Model Merging in Pre-training of Large Language Models
Geometry Aware Operator Transformer as an efficient and accurate neural surrogate for PDEs on arbitrary domains
Video World Models with Long-term Spatial Memory
A Unified Stability Analysis of SAM vs SGD: Role of Data Coherence and Emergence of Simplicity Bias
Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models
Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale
Conformal Prediction for Time-series Forecasting with Change Points
Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon
Fractional Diffusion Bridge Models
Triplets Better Than Pairs: Towards Stable and Effective Self-Play Fine-Tuning for LLMs
SPMDM: Enhancing Masked Diffusion Models through Simplifying Sampling Path
Permissioned LLMs: Enforcing Access Control in Large Language Models
Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance
Shapley-Coop: Credit Assignment for Emergent Cooperation in Self-Interested LLM Agents
Confidence-Aware With Prototype Alignment for Partial Multi-label Learning
FairDD: Fair Dataset Distillation
Toward Artificial Palpation: Representation Learning of Touch on Soft Bodies
Clip-and-Verify: Linear Constraint-Driven Domain Clipping for Accelerating Neural Network Verification
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
SONAR: Long-Range Graph Propagation Through Information Waves
Tree-Guided Diffusion Planner
DSRF: A Dynamic and Scalable Reasoning Framework for Solving RPMs
Scalable Neural Incentive Design with Parameterized Mean-Field Approximation
Focus-Then-Reuse: Fast Adaptation in Visual Perturbation Environments
Multimodal Causal Reasoning for UAV Object Detection
Characterizing control between interacting subsystems with deep Jacobian estimation
Towards Robust Pseudo-Label Learning in Semantic Segmentation: An Encoding Perspective
Adaptive Variance Inflation in Thompson Sampling: Efficiency, Safety, Robustness, and Beyond
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks
UniTraj: Learning a Universal Trajectory Foundation Model from Billion-Scale Worldwide Traces
Nonparametric Quantile Regression with ReLU-Activated Recurrent Neural Networks
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
SPFL: Sequential updates with Parallel aggregation for Enhanced Federated Learning under Category and Domain Shifts
FlowMixer: A Depth-Agnostic Neural Architecture for Interpretable Spatiotemporal Forecasting
ASDSV: Multimodal Generation Made Efficient with Approximate Speculative Diffusion and Speculative Verification
How many measurements are enough? Bayesian recovery in inverse problems with general distributions
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
ZeroSep: Separate Anything in Audio with Zero Training
Robust Reinforcement Learning in Finance: Modeling Market Impact with Elliptic Uncertainty Sets
EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths
Fast Local Search Algorithms for Clustering with Adaptive Sampling and Bandit Strategies
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
A Single-Loop First-Order Algorithm for Linearly Constrained Bilevel Optimization
Perception Encoder: The best visual embeddings are not at the output of the network
Estimating Interventional Distributions with Uncertain Causal Graphs through Meta-Learning
Towards the Resistance of Neural Network Fingerprinting to Fine-tuning
ELDET: Early-Learning Distillation with Noisy Labels for Object Detection
Pro3D-Editor: A Progressive Framework for Consistent and Precise 3D Editing
System Prompt Optimization with Meta-Learning
Error Broadcast and Decorrelation as a Potential Artificial and Natural Learning Mechanism
Estimation of Treatment Effects in Extreme and Unobserved Data
Simultaneous Swap Regret Minimization via KL-Calibration
SE-GUI: Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
Global Minimizers of Sigmoid Contrastive Loss
Memory Mosaics at scale
BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization
Anytime-valid, Bayes-assisted, Prediction-Powered Inference
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
SoPo: Text-to-Motion Generation Using Semi-Online Preference Optimization
MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details
Quantum speedup of non-linear Monte Carlo problems
With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You
Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks
Quantifying Distributional Invariance in Causal Subgraph for IRM-Free Graph Generalization
LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering
Benford’s Curse: Tracing Digit Bias to Numerical Hallucination in LLMs
Surprise3D: A Dataset for Spatial Understanding and Reasoning in Complex 3D Scenes
On the sample complexity of semi-supervised multi-objective learning
Multi-agent Markov Entanglement
I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions
Diffusion Guided Adversarial State Perturbations in Reinforcement Learning
Slow Transition to Low-Dimensional Chaos in Heavy-Tailed Recurrent Neural Networks
PolarQuant: Leveraging Polar Transformation for Key Cache Quantization and Decoding Acceleration
No Loss, No Gain: Gated Refinement and Adaptive Compression for Prompt Optimization
DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing
Scalable Policy-Based RL Algorithms for POMDPs
StegoZip: Enhancing Linguistic Steganography Payload in Practice with Large Language Models
Unified all-atom molecule generation with neural fields
Feature Unlearning: Theoretical Foundations and Practical Applications with Shuffling
Fast Monte Carlo Tree Diffusion: 100× Speedup via Parallel and Sparse Planning
FracFace: Breaking The Visual Clues—Fractal-Based Privacy-Preserving Face Recognition
FerretNet: Efficient Synthetic Image Detection via Local Pixel Dependencies
Generalizable Hand-Object Modeling from Monocular RGB Images via 3D Gaussians
Quantum Speedups for Minimax Optimization and Beyond
High-order Interactions Modeling for Interpretable Multi-Agent Q-Learning
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models
Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL
LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions
Scalable Signature Kernel Computations via Local Neumann Series Expansions
NeuSymEA: Neuro-symbolic Entity Alignment via Variational Inference
TopER: Topological Embeddings in Graph Representation Learning
SMARTraj²: A Stable Multi-City Adaptive Method for Multi-View Spatio-Temporal Trajectory Representation Learning
Environment Inference for Learning Generalizable Dynamical System
xLSTM-Mixer: Multivariate Time Series Forecasting by Mixing via Scalar Memories
Quantum Visual Fields with Neural Amplitude Encoding
Reparameterized LLM Training via Orthogonal Equivalence Transformation
Self-Adapting Language Models
Agents Robust to Distribution Shifts Learn Causal World Models Even Under Mediation
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing
Scaling Speculative Decoding with Lookahead Reasoning
Adaptive Algorithms with Sharp Convergence Rates for Stochastic Hierarchical Optimization
Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning
LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale
Beyond Average Value Function in Precision Medicine: Maximum Probability-Driven Reinforcement Learning for Survival Analysis
Efficient Last-Iterate Convergence in Solving Extensive-Form Games
Instance-Optimality for Private KL Distribution Estimation
Agnostic Learning under Targeted Poisoning: Optimal Rates and the Role of Randomness
From Likelihood to Fitness: Improving Variant Effect Prediction in Protein and Genome Language Models
Understanding and Rectifying Safety Perception Distortion in VLMs
Reinforcement learning for one-shot DAG scheduling with comparability identification and dense reward
FastJAM: a Fast Joint Alignment Model for Images
What Makes Math Problems Hard for Reinforcement Learning: A Case Study
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models
Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection
RoomEditor: High-Fidelity Furniture Synthesis with Parameter-Sharing U-Net
A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values
Fourier Clouds: Fast Bias Correction for Imbalanced Semi-Supervised Learning
Hybrid Boundary Physics-Informed Neural Networks for Solving Navier-Stokes Equations with Complex Boundary
MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
Preference Learning with Response Time: Robust Losses and Guarantees
MobileUse: A Hierarchical Reflection-Driven GUI Agent for Autonomous Mobile Operation
Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator
Fixed-Point RNNs: Interpolating from Diagonal to Dense
SEGA: Shaping Semantic Geometry for Robust Hashing under Noisy Supervision
Leveraging Depth and Language for Open-Vocabulary Domain-Generalized Semantic Segmentation
Ground-Compose-Reinforce: Grounding Language in Agentic Behaviours using Limited Data
Explaining and Mitigating Crosslingual Tokenizer Inequities
Self-Improving Embodied Foundation Models
Multilevel neural simulation-based inference
OmniSegmentor: A Flexible Multi-Modal Learning Framework for Semantic Segmentation
Tackling Continual Offline RL through Selective Weights Activation on Aligned Spaces
Matryoshka Pilot: Learning to Drive Black-Box LLMs with LLMs
Register and [CLS] tokens induce a decoupling of local and global features in large ViTs
Tree of Preferences for Diversified Recommendation
Predicting partially observable dynamical systems via diffusion models with a multiscale inference scheme
Asymmetric Dual-Lens Video Deblurring
Large Language Diffusion Models
Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models
Amortized Variational Transdimensional Inference
Zero-Shot Blind-Spot Image Denoising via Cross-Scale Non-Local Pixel Refilling
Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Observation Delays
miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward
Generalized Linear Mode Connectivity for Transformers
QuanDA: Quantile-Based Discriminant Analysis for High-Dimensional Imbalanced Classification
Continuous Domain Generalization
Riemannian Proximal Sampler for High-accuracy Sampling on Manifolds
Advancing Interpretability of CLIP Representations with Concept Surrogate Model
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
Multi-Agent Collaboration via Evolving Orchestration
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
NEED: Cross-Subject and Cross-Task Generalization for Video and Image Reconstruction from EEG Signals
Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding
Inverse Methods for Missing Data Imputation
Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding
GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving
KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills
LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization
NormFit: A Lightweight Solution for Few-Shot Federated Learning with Non-IID Data
REMI: Reconstructing Episodic Memory During Internally Driven Path Planning
Physics-informed Value Learner for Offline Goal-Conditioned Reinforcement Learning
Flexible Realignment of Language Models
Towards Robust Uncertainty Calibration for Composed Image Retrieval
Meta Guidance: Incorporating Inductive Biases into Deep Time Series Imputers
DGH: Dynamic Gaussian Hair
UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset
OSKAR: Omnimodal Self-supervised Knowledge Abstraction and Representation
HQA-VLAttack: Towards High Quality Adversarial Attack on Vision-Language Pre-Trained Models
Planning with Quantized Opponent Models
Topology-Aware Conformal Prediction for Stream Networks
BiggerGait: Unlocking Gait Recognition with Layer-wise Representations from Large Vision Models
EAG3R: Event-Augmented 3D Geometry Estimation for Dynamic and Extreme-Lighting Scenes
DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization
Hybrid Latent Reasoning via Reinforcement Learning
Learning non-equilibrium diffusions with Schrödinger bridges: from exactly solvable to simulation-free
DEGauss: Defending Against Malicious 3D Editing for Gaussian Splatting
Learning Latent Variable Models via Jarzynski-adjusted Langevin Algorithm
G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning
RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo
Mechanism Design via the Interim Relaxation
Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning
Rethinking Neural Combinatorial Optimization for Vehicle Routing Problems with Different Constraint Tightness Degrees
A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning
PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs
Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation
A Theory for Worst-Case vs. Average-Case Guarantees for LLMs
Toward Relative Positional Encoding in Spiking Transformers
Generative Caching for Structurally Similar Prompts and Responses
Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs
INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning
Teaching Transformers to Solve Combinatorial Problems through Efficient Trial & Error
The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks
Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling
Ambient Diffusion Omni: Training Good Models with Bad Data
Uncertainty-Based Smooth Policy Regularisation for Reinforcement Learning with Few Demonstrations
LVLM-Driven Attribute-Aware Modeling for Visible-Infrared Person Re-Identification
No Experts, No Problem: Avoidance Learning from Bad Demonstrations
LinEAS: End-to-end Learning of Activation Steering with a Distributional Loss
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Regret Bounds for Adversarial Contextual Bandits with General Function Approximation and Delayed Feedback
Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling
Towards Generalizable Detector for Generated Image
Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy
Sharp Matrix Empirical Bernstein Inequalities
Privacy amplification by random allocation
Bi-Level Decision-Focused Causal Learning for Large-Scale Marketing Optimization: Bridging Observational and Experimental Data
ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data
A Bayesian Fast-Slow Framework to Mitigate Interference in Non-Stationary Reinforcement Learning
Protein Design with Dynamic Protein Vocabulary
Variational Uncertainty Decomposition for In-Context Learning
Identifiability of Deep Polynomial Neural Networks
ZeroS: Zero‑Sum Linear Attention for Efficient Transformers
PointTruss: K-Truss for Point Cloud Registration
Deep Taxonomic Networks for Unsupervised Hierarchical Prototype Discovery
Centralized Reward Agent for Knowledge Sharing and Transfer in Multi-Task Reinforcement Learning
Attribution-Driven Adaptive Token Pruning for Transformers
GraphTOP: Graph Topology-Oriented Prompting for Graph Neural Networks
SDTagNet: Leveraging Text-Annotated Navigation Maps for Online HD Map Construction
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
Learning Neural Exposure Fields for View Synthesis
MS-BART: Unified Modeling of Mass Spectra and Molecules for Structure Elucidation
Learning Individual Behavior in Agent-Based Models with Graph Diffusion Networks
UniViT: Unifying Image and Video Understanding in One Vision Encoder
Permutation Equivariant Neural Controlled Differential Equations for Dynamic Graph Representation Learning
FreeInv: Free Lunch for Improving DDIM Inversion
Consensus-Robust Transfer Attacks via Parameter and Representation Perturbations
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Bootstrap Your Uncertainty: Adaptive Robust Classification Driven by Optimal-Transport
Lifelong Test-Time Adaptation via Online Learning in Tracked Low-Dimensional Subspace
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression
One Subgoal at a Time: Zero-Shot Generalization to Arbitrary Linear Temporal Logic Requirements in Multi-Task Reinforcement Learning
V-CECE: Visual Counterfactual Explanations via Conceptual Edits
InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction
Johnson-Lindenstrauss Lemma Beyond Euclidean Geometry
Learning from Demonstrations via Capability-Aware Goal Sampling
HyperET: Efficient Training in Hyperbolic Space for Multi-modal Large Language Models
Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation
GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data
ROVER: Recursive Reasoning Over Videos with Vision-Language Models for Embodied Tasks
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model
The Computational Complexity of Counting Linear Regions in ReLU Neural Networks
KAIROS: Scalable Model-Agnostic Data Valuation
Attention-based clustering
Neural Fractional Attention Differential Equations
The Good, the Bad and the Ugly: Meta-Analysis of Watermarks, Transferable Attacks and Adversarial Defenses
URLs Help, Topics Guide: Understanding Metadata Utility in LLM Training
PDEfuncta: Spectrally-Aware Neural Representation for PDE Solution Modeling
SmokeViz: A Large-Scale Satellite Dataset for Wildfire Smoke Detection and Segmentation
Random Forest Autoencoders for Guided Representation Learning
The Cost of Compression: Tight Quadratic Black-Box Attacks on Sketches for $\ell_2$ Norm Estimation
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Elucidated Rolling Diffusion Models for Probabilistic Forecasting of Complex Dynamics
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Optimizing the Unknown: Black Box Bayesian Optimization with Energy-Based Model and Reinforcement Learning
Influence Functions for Edge Edits in Non-Convex Graph Neural Networks
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset
MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation
EconGym: A Scalable AI Testbed with Diverse Economic Tasks
Conformal Online Learning of Deep Koopman Linear Embeddings
WildCAT3D: Appearance-Aware Multi-View Diffusion in the Wild
Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism
Partition to Evolve: Niching-enhanced Evolution with LLMs for Automated Algorithm Discovery
IOSTOM: Offline Imitation Learning from Observations via State Transition Occupancy Matching
Fair Continuous Resource Allocation with Equality of Impact
Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
HubGT: Fast Graph Transformer with Decoupled Hierarchy Labeling
Enhancing Diffusion-based Unrestricted Adversarial Attacks via Adversary Preferences Alignment
On the necessity of adaptive regularisation: Optimal anytime online learning on $\boldsymbol{\ell_p}$-balls
HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs
Time Reversal Symmetry for Efficient Robotic Manipulations in Deep Reinforcement Learning
CAML: Collaborative Auxiliary Modality Learning for Multi-Agent Systems
What are you sinking? A geometric approach on attention sink
OMiSO: Adaptive optimization of state-dependent brain stimulation to shape neural population states
BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces
Controllable Human-centric Keyframe Interpolation with Generative Prior
Co-Regularization Enhances Knowledge Transfer in High Dimensions
AccuQuant: Simulating Multiple Denoising Steps for Quantizing Diffusion Models
The Curse of Depth in Large Language Models
Towards Self-Refinement of Vision-Language Models with Triangular Consistency
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
LeVo: High-Quality Song Generation with Multi-Preference Alignment
SymMaP: Improving Computational Efficiency in Linear Solvers through Symbolic Preconditioning
A$^3$E: Towards Compositional Model Editing
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation
Offline Goal-conditioned Reinforcement Learning with Quasimetric Representations
HAODiff: Human-Aware One-Step Diffusion via Dual-Prompt Guidance
Hierarchical Information Aggregation for Incomplete Multimodal Alzheimer's Disease Diagnosis
Learning Dynamics of RNNs in Closed-Loop Environments
StruDiCO: Structured Denoising Diffusion with Gradient-free Inference-stage Boosting for Memory and Time Efficient Combinatorial Optimization
Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity
Image as a World: Generating Interactive World from Single Image via Panoramic Video Generation
Shape it Up! Restoring LLM Safety during Finetuning
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation
Uncertainty-Informed Meta Pseudo Labeling for Surrogate Modeling with Limited Labeled Data
SAMPO: Scale-wise Autoregression with Motion Prompt for Generative World Models
Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models
ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training
Neural Thermodynamics: Entropic Forces in Deep and Universal Representation Learning
Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning
Prioritizing Perception-Guided Self-Supervision: A New Paradigm for Causal Modeling in End-to-End Autonomous Driving
MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes
Scaffolding Dexterous Manipulation with Vision-Language Models
Hadamard Test is Sufficient for Efficient Quantum Gradient Estimation with Lie Algebraic Symmetries
The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control
Discovering Latent Graphs with GFlowNets for Diverse Conditional Image Generation
Semi-Supervised Regression with Heteroscedastic Pseudo-Labels
NOBLE - Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models
UniMotion: A Unified Motion Framework for Simulation, Prediction and Planning
Boundary-Value PDEs Meet Higher-Order Differential Topology-aware GNNs
Neural Stochastic Flows: Solver-Free Modelling and Inference for SDE Solutions
Robustness in Both Domains: CLIP Needs a Robust Text Encoder
Dynamic Focused Masking for Autoregressive Embodied Occupancy Prediction
KOALA++: Efficient Kalman-Based Optimization with Gradient-Covariance Products
TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster
Understanding and Mitigating Numerical Sources of Nondeterminism in LLM Inference
Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion
Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency
Set Smoothness Unlocks Clarke Hyper-stationarity in Bilevel Optimization
Mamba Modulation: On the Length Generalization of Mamba Models
Object Concepts Emerge from Motion
Reasoning Planning for Language Models
LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation
PyraMotion: Attentional Pyramid-Structured Motion Integration for Co-Speech 3D Gesture Synthesis
Practical Bayes-Optimal Membership Inference Attacks
Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning
Parallelizing MCMC Across the Sequence Length
Personalized Federated Conformal Prediction with Localization
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks
Safely Learning Controlled Stochastic Dynamics
Continual Knowledge Adaptation for Reinforcement Learning
Dual Prototype-Enhanced Contrastive Framework for Class-Imbalanced Graph Domain Adaptation
Learning to Steer: Input-dependent Steering for Multimodal LLMs
Optical Coherence Tomography Harmonization with Anatomy-Guided Latent Metric Schrödinger Bridges
Learning Preferences without Interaction for Cooperative AI: A Hybrid Offline-Online Approach
Native-Resolution Image Synthesis
Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
Regression-adjusted Monte Carlo Estimators for Shapley Values and Probabilistic Values
Deep Edge Filter: Return of the Human-Crafted Layer in Deep Learning
X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability
Dynamic Gaussian Splatting from Defocused and Motion-blurred Monocular Videos
MoEMeta: Mixture-of-Experts Meta Learning for Few-Shot Relational Learning
Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
REP: Resource-Efficient Prompting for Rehearsal-Free Continual Learning
Chain of Execution Supervision Promotes General Reasoning in Large Language Models
ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model
Beyond Scores: Proximal Diffusion Models
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
List-Level Distribution Coupling with Applications to Speculative Decoding and Lossy Compression
Seeing the Wind from a Falling Leaf
Brain-Like Processing Pathways Form in Models With Heterogeneous Experts
CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward
FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA
PairEdit: Learning Semantic Variations for Exemplar-based Image Editing
T-norm Selection for Object Detection in Autonomous Driving with Logical Constraints
Visual Sync: Multi‑Camera Synchronization via Cross‑View Object Motion
Minimax-Optimal Univariate Function Selection in Sparse Additive Models: Rates, Adaptation, and the Estimation-Selection Gap
Fine-grained List-wise Alignment for Generative Medication Recommendation
Stochastic Process Learning via Operator Flow Matching
DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data
ShiQ: Bringing back Bellman to LLMs
MemEIC: A Step Toward Continual and Compositional Knowledge Editing
A Difference-of-Convex Functions Approach to Energy-Based Iterative Reasoning
Transferring Linear Features Across Language Models With Model Stitching
Towards Reliable and Holistic Visual In-Context Learning Prompt Selection
RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Knowledge Distillation Detection for Open-weights Models
Registration is a Powerful Rotation-Invariance Learner for 3D Anomaly Detection
Distances for Markov chains from sample streams
Multimodal Disease Progression Modeling via Spatiotemporal Disentanglement and Multiscale Alignment
FedLPA: Local Prior Alignment for Heterogeneous Federated Generalized Category Discovery
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning
DynaAct: Large Language Model Reasoning with Dynamic Action Spaces
Less is More: Improving LLM Alignment via Preference Data Selection
Boosting the Uniqueness of Neural Networks Fingerprints with Informative Triggers
Role-aware Multi-agent Reinforcement Learning for Coordinated Emergency Traffic Control
DualMPNN: Harnessing Structural Alignments for High-Recovery Inverse Protein Folding
A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
Beyond Expectations: Quantile-Guided Alignment for Risk-Calibrated Language Models
Learning-Augmented Streaming Algorithms for Correlation Clustering
VisualLens: Personalization through Task-Agnostic Visual History
Spectral Conditioning of Attention Improves Transformer Performance
Sample-efficient Learning of Concepts with Theoretical Guarantees: from Data to Concepts without Interventions
TimePerceiver: An Encoder-Decoder Framework for Generalized Time-Series Forecasting
AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation
When majority rules, minority loses: bias amplification of gradient descent
Personalized Decision Modeling: Utility Optimization or Textualized-Symbolic Reasoning
Variational Polya Tree
GUIDED: Granular Understanding via Identification, Detection, and Discrimination for Fine-Grained Open-Vocabulary Object Detection
On Linear Mode Connectivity of Mixture-of-Experts Architectures
Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control
In Search of Adam’s Secret Sauce
Monotone and Separable Set Functions: Characterizations and Neural Models
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training
Holistic Order Prediction in Natural Scenes
GRIT: Teaching MLLMs to Think with Images
Stochastic Forward-Forward Learning through Representational Dimensionality Compression
Diffusion-Driven Two-Stage Active Learning for Low-Budget Semantic Segmentation
Shaping Sequence Attractor Schema in Recurrent Neural Networks
Multi-Agent Imitation by Learning and Sampling from Factorized Soft Q-Function
Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models
Private Set Union with Multiple Contributions
GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining
Learning Interestingness in Automated Mathematical Theory Formation
Uncertainty-quantified Rollout Policy Adaptation for Unlabelled Cross-domain Video Temporal Grounding
Reliable Decision‑Making via Calibration‑Oriented Retrieval‑Augmented Generation
WeatherPrompt: Multi-modality Representation Learning for All-Weather Drone Visual Geo-Localization
Hybrid Re-matching for Continual Learning with Parameter-Efficient Tuning
QuARI: Query Adaptive Retrieval Improvement
Understanding while Exploring: Semantics-driven Active Mapping
Real-World Reinforcement Learning of Active Perception Behaviors
Kernel-based Equalized Odds: A Quantification of Accuracy-Fairness Trade-off in Fair Representation Learning
MoBA: Mixture of Block Attention for Long-Context LLMs
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
GeoRemover: Removing Objects and Their Causal Visual Artifacts
Codifying Character Logic in Role-Playing
Practical and Effective Code Watermarking for Large Language Models
Projection-Manifold Regularized Latent Diffusion for Robust General Image Fusion
Analogy-based Multi-Turn Jailbreak against Large Language Models
Time-o1: Time-Series Forecasting Needs Transformed Label Alignment
Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies
Omnidirectional 3D Scene Reconstruction from Single Image
Valid Selection among Conformal Sets
Minimum Width for Deep, Narrow MLP: A Diffeomorphism Approach
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
MoCha: Towards Movie-Grade Talking Character Generation
Riemannian Consistency Model
Curl Descent: Non-Gradient Learning Dynamics with Sign-Diverse Plasticity
EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models
STEAD: Robust Provably Secure Linguistic Steganography with Diffusion Language Model
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Fully Dynamic Algorithms for Chamfer Distance
Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning
LLM Meeting Decision Trees on Tabular Data
A Few Moments Please: Scalable Graphon Learning via Moment Matching
Multidimensional Bayesian Utility Maximization: Tight Approximations to Welfare
Aligning Transformers with Continuous Feedback via Energy Rank Alignment
MOSDT: Self-Distillation-Based Decision Transformer for Multi-Agent Offline Safe Reinforcement Learning
DyMoDreamer: World Modeling with Dynamic Modulation
Efficient Rectified Flow for Image Fusion
Structural Information-based Hierarchical Diffusion for Offline Reinforcement Learning
C-NAV: Towards Self-Evolving Continual Object Navigation in Open World
SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning
Knowledge Graph Enhanced Generative Multi-modal Models for Class-Incremental Learning
Open-World Drone Active Tracking with Goal-Centered Rewards
Corporate Needs You to Find the Difference: Revisiting Submodular and Supermodular Ratio Optimization Problems
MultiScale Contextual Bandits for Long Term Objectives
Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling
CAGE: Continuity-Aware edGE Network Unlocks Robust Floorplan Reconstruction
State Space Prompting via Gathering and Spreading Spatio-Temporal Information for Video Understanding
NopeRoomGS: Indoor 3D Gaussian Splatting Optimization without Camera Pose Input
Two‑Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion
Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis
DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
Spiral: Semantic-Aware Progressive LiDAR Scene Generation and Understanding
ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion
Fuz-RL: A Fuzzy-Guided Robust Framework for Safe Reinforcement Learning under Uncertainty
The Fluorescent Veil: A Stealthy and Effective Physical Adversarial Patch Against Traffic Sign Recognition
UFM: A Simple Path towards Unified Dense Correspondence with Flow
UMA: A Family of Universal Models for Atoms
Enhancing Bioactivity Prediction via Spatial Emptiness Representation of Protein-ligand Complex and Union of Multiple Pockets
ReSim: Reliable World Simulation for Autonomous Driving
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
RAPID Hand: Robust, Affordable, Perception-Integrated, Dexterous Manipulation Platform for Embodied Intelligence
Multiscale guidance of protein structure prediction with heterogeneous cryo-EM data
SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures
Activation-Informed Merging of Large Language Models
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Dual-Path Temporal Decoder for End-to-End Multi-Object Tracking
Learning to Zoom with Anatomical Relations for Medical Structure Detection
Partition-Then-Adapt: Combating Prediction Bias for Reliable Multi-Modal Test-Time Adaptation
A Unified Framework for Fair Graph Generation: Theoretical Guarantees and Empirical Advances
Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs
MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization
Novel Class Discovery for Point Cloud Segmentation via Joint Learning of Causal Representation and Reasoning
Replicable Online Learning
From Pixels to Views: Learning Angular-Aware and Physics-Consistent Representations for Light Field Microscopy
Design-Based Bandits Under Network Interference: Trade-Off Between Regret and Statistical Inference
Universal Causal Inference in a Topos
zip2zip: Inference-Time Adaptive Tokenization via Online Compression
Achieving $\tilde{\mathcal{O}}(1/N)$ Optimality Gap in Restless Bandits through Gaussian Approximation
Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations
How Patterns Dictate Learnability in Sequential Data
Preference-Driven Multi-Objective Combinatorial Optimization with Conditional Computation
Computational Efficiency under Covariate Shift in Kernel Ridge Regression
AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation
In-context Learning of Linear Dynamical Systems with Transformers: Approximation Bounds and Depth-separation
Continual Release Moment Estimation with Differential Privacy
HYPERION: Fine-Grained Hypersphere Alignment for Robust Federated Graph Learning
WMCopier: Forging Invisible Watermarks on Arbitrary Images
TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup
Differentially Private Federated Low Rank Adaptation Beyond Fixed-Matrix
Preserving LLM Capabilities through Calibration Data Curation: From Analysis to Optimization
Tackling Biased Evaluators in Dueling Bandits
Association-Focused Path Aggregation for Graph Fraud Detection
Regularized least squares learning with heavy-tailed noise is minimax optimal
Hyperbolic Fine-Tuning for Large Language Models
FedWMSAM: Fast and Flat Federated Learning via Weighted Momentum and Sharpness-Aware Minimization
On the Hardness of Conditional Independence Testing In Practice
DKDR: Dynamic Knowledge Distillation for Reliability in Federated Learning
Aligning Text to Image in Diffusion Models is Easier Than You Think
Enhancing Tactile-based Reinforcement Learning for Robotic Control
SensorLM: Learning the Language of Wearable Sensors
Training-Free Constrained Generation With Stable Diffusion Models
Transformer brain encoders explain human high-level visual responses
Self-Calibrating BCIs: Ranking and Recovery of Mental Targets Without Labels
Coupled Data and Measurement Space Dynamics for Enhanced Diffusion Posterior Sampling
MoPFormer: Motion-Primitive Transformer for Wearable-Sensor Activity Recognition
Towards Building Model/Prompt-Transferable Attackers against Large Vision-Language Models
Monitoring Risks in Test-Time Adaptation
Order-Level Attention Similarity Across Language Models: A Latent Commonality
Split Gibbs Discrete Diffusion Posterior Sampling
An Optimized Franz-Parisi Criterion and its Equivalence with SQ Lower Bounds
Neural-Driven Image Editing
Distributional Training Data Attribution: What do Influence Functions Sample?
Long-Tailed Recognition via Information-Preservable Two-Stage Learning
Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code
SPOT-Trip: Dual-Preference Driven Out-of-Town Trip Recommendation
FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation
A Temporal Difference Method for Stochastic Continuous Dynamics
The Complexity of Correlated Equilibria in Generalized Games
Joint‑Embedding vs Reconstruction: Provable Benefits of Latent Space Prediction for Self‑Supervised Learning
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Emergence of Linear Truth Encodings in Language Models
Distillation Robustifies Unlearning
Causally Reliable Concept Bottleneck Models
Neural Combinatorial Optimization for Time Dependent Traveling Salesman Problem
MetaGS: A Meta-Learned Gaussian-Phong Model for Out-of-Distribution 3D Scene Relighting
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
OPMapper: Enhancing Open-Vocabulary Semantic Segmentation with Multi-Guidance Information
Contrastive Representations for Temporal Reasoning
HiPoNet: A Multi-View Simplicial Complex Network for High Dimensional Point-Cloud and Single-Cell data
Computational Budget Should Be Considered in Data Selection
Many Minds, One Goal: Time Series Forecasting via Sub-task Specialization and Inter-agent Cooperation
Non-Asymptotic Analysis Of Data Augmentation For Precision Matrix Estimation
You Only Spectralize Once: Taking a Spectral Detour to Accelerate Graph Neural Network
RGNMR: A Gauss-Newton method for robust matrix completion with theoretical guarantees
Soft Task-Aware Routing of Experts for Equivariant Representation Learning
Breaking the Gradient Barrier: Unveiling Large Language Models for Strategic Classification
MEIcoder: Decoding Visual Stimuli from Neural Activity by Leveraging Most Exciting Inputs
On Universality Classes of Equivariant Networks
Risk-aware Direct Preference Optimization under Nested Risk Measure
HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis
Native Segmentation Vision Transformers
On the Universal Near Optimality of Hedge in Combinatorial Settings
IF-Guide: Influence Function-Guided Detoxification of LLMs
Exploiting Task Relationships in Continual Learning via Transferability-Aware Task Embeddings
The quest for the GRAph Level autoEncoder (GRALE)
Enhancing Text-to-Image Diffusion Transformer via Split-Text Conditioning
Reward Reasoning Models
Blameless Users in a Clean Room: Defining Copyright Protection for Generative Models
LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models
Generative Trajectory Stitching through Diffusion Composition
Non-Adaptive Adversarial Face Generation
When Does Curriculum Learning Help? A Theoretical Perspective
CORAL: Disentangling Latent Representations in Long-Tailed Diffusion
Mixture-of-Experts Meets In-Context Reinforcement Learning
Unlabeled Data Improves Fine-Grained Image Zero-shot Classification with Multimodal LLMs
Improved Confidence Regions and Optimal Algorithms for Online and Offline Linear MNL Bandits
Blackbox Model Provenance via Palimpsestic Membership Inference
ROGR: Relightable 3D Objects using Generative Relighting
Closed-Form Training Dynamics Reveal Learned Features and Linear Structure in Word2Vec-like Models
BitMark: Watermarking Bitwise Autoregressive Image Generative Models
Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf’s Law
On the Emergence of Linear Analogies in Word Embeddings
Policy Gradient Methods Converge Globally in Imperfect-Information Extensive-Form Games
R$^2$ec: Towards Large Recommender Models with Reasoning
ConfTuner: Training Large Language Models to Express Their Confidence Verbally
RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills
Scalable Neural Network Geometric Robustness Validation via Hölder Optimisation
Alignment of Large Language Models with Constrained Learning
Scaling Diffusion Transformers Efficiently via $\mu$P
Learned Prefix Caching for Efficient LLM Inference
Tabula: A Tabular Self-Supervised Foundation Model for Single-Cell Transcriptomics
Distilling LLM Agent into Small Models with Retrieval and Code Tools
LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers
Second-Order Convergence in Private Stochastic Non-Convex Optimization
Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Reinforcement Learning
Language Modeling by Language Models
Absorb and Converge: Provable Convergence Guarantee for Absorbing Discrete Diffusion Models
Greedy Algorithms for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure
Small Resamples, Sharp Guarantees: Convergence Rates for Resampled Studentized Quantile Estimators
CREA: A Collaborative Multi-Agent Framework for Creative Image Editing and Generation
Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding
HyPINO: Multi-Physics Neural Operators via HyperPINNs and the Method of Manufactured Solutions
Language Models can Self-Improve at State-Value Estimation for Better Search
Conditional Distribution Compression via the Kernel Conditional Mean Embedding
Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation
Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs
Strategic Costs of Perceived Bias in Fair Selection
Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking
Optimal Graph Clustering without Edge Density Signals
A multiscale analysis of mean-field transformers in the moderate interaction regime
A Circular Argument: Does RoPE need to be Equivariant for Vision?
NestedFP: High-Performance, Memory-Efficient Dual-Precision Floating Point Support for LLMs
Multi-Token Prediction Needs Registers
Path Gradients after Flow Matching
TOMCAT: Test-time Comprehensive Knowledge Accumulation for Compositional Zero-Shot Learning
DiffE2E: Rethinking End-to-End Driving with a Hybrid Diffusion-Regression-Classification Policy
FuXi-Ocean: A Global Ocean Forecasting System with Sub-Daily Resolution
MAP Estimation with Denoisers: Convergence Rates and Guarantees
Composition and Alignment of Diffusion Models using Constrained Learning
Multi-agent KTO: Enhancing Strategic Interactions of Large Language Model in Language Game
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
Head Pursuit: Probing Attention Specialization in Multimodal Transformers
TimeXL: Explainable Multi-modal Time Series Prediction with LLM-in-the-Loop
Bridging Theory and Practice in Link Representation with Graph Neural Networks
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper
CamSAM2: Segment Anything Accurately in Camouflaged Videos
TokenSqueeze: Performance-Preserving Compression for Reasoning LLMs
Thoughts Are All Over the Place: On the Underthinking of Long Reasoning Models
TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer
ObCLIP: Oblivious CLoud-Device Hybrid Image Generation with Privacy Preservation
Analog In-memory Training on General Non-ideal Resistive Elements: The Impact of Response Functions
Improving Monte Carlo Tree Search for Symbolic Regression
Hankel Singular Value Regularization for Highly Compressible State Space Models
FORLA: Federated Object-centric Representation Learning with Slot Attention
RadarQA: Multi-modal Quality Analysis of Weather Radar Forecasts
Unleashing Foundation Vision Models: Adaptive Transfer for Diverse Data-Limited Scientific Domains
E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models
Is the acquisition worth the cost? Surrogate losses for Consistent Two-stage Classifiers
Partial Information Decomposition via Normalizing Flows in Latent Gaussian Distributions
DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos
Increasing the Utility of Synthetic Images through Chamfer Guidance
Learning Relative Gene Expression Trends from Pathology Images in Spatial Transcriptomics
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
TRAP: Targeted Redirecting of Agentic Preferences
Domain Adaptive Hashing Retrieval via VLM Assisted Pseudo-Labeling and Dual Space Adaptation
AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D
Imagine Beyond! Distributionally Robust Autoencoding for State Space Coverage in Online Reinforcement Learning
Hamiltonian Neural PDE Solvers through Functional Approximation
Sketched Adaptive Distributed Deep Learning: A Sharp Convergence Analysis
CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic
Pay Attention to Small Weights
Compact Memory for Continual Logistic Regression
Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models
Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness
DynaGuide: Steering Diffusion Policies with Active Dynamic Guidance
SAFE: Multitask Failure Detection for Vision-Language-Action Models
TransferTraj: A Vehicle Trajectory Learning Model for Region and Task Transferability
LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
Neurons as Detectors of Coherent Sets in Sensory Dynamics
Learning 3D Persistent Embodied World Models
Reinforcement Learning with Action Chunking
FFN Fusion: Rethinking Sequential Computation in Large Language Models
GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning
Semi-infinite Nonconvex Constrained Min-Max Optimization
Unlabeled Data Can Provably Enhance In-Context Learning of Transformers
Two Heads are Better than One: Simulating Large Transformers with Small Ones
3DPE-Gaze: Unlocking the Potential of 3D Facial Priors for Generalized Gaze Estimation
Neural MJD: Neural Non-Stationary Merton Jump Diffusion for Time Series Prediction
Training Robust Graph Neural Networks by Modeling Noise Dependencies
EvoBrain: Dynamic Multi-Channel EEG Graph Modeling for Time-Evolving Brain Networks
Rethinking Joint Maximum Mean Discrepancy for Visual Domain Adaptation
Learning to Generate Human-Human-Object Interactions from Textual Descriptions
When Kernels Multiply, Clusters Unify: Fusing Embeddings with the Kronecker Product
Smooth and Flexible Camera Movement Synthesis via Temporal Masked Generative Modeling
Fourier Analysis Network
LLM-PySC2: Starcraft II learning environment for Large Language Models
Coloring Learning for Heterophilic Graph Representation
The Structure of Relation Decoding Linear Operators in Large Language Models
Once Upon an Input: Reasoning via Per-Instance Program Synthesis
Rethinking the Role of Verbatim Memorization in LLM Privacy
Sampling by averaging: A multiscale approach to score estimation
Linear Transformers Implicitly Discover Unified Numerical Algorithms
Exploring Landscapes for Better Minima along Valleys
CovMatch: Cross-Covariance Guided Multimodal Dataset Distillation with Trainable Text Encoder
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
STRAP: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization
Double Descent Meets Out-of-Distribution Detection: Theoretical Insights and Empirical Analysis on the Role of Model Complexity
RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases
Strategic Classification with Non-Linear Classifiers
MiNT: Multi-Network Transfer Benchmark for Temporal Graph Learning
One SPACE to Rule Them All: Jointly Mitigating Factuality and Faithfulness Hallucinations in LLMs
$\texttt{AVROBUSTBENCH}$: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time
On the Empirical Power of Goodness-of-Fit Tests in Watermark Detection
Learning Differential Pyramid Representation for Tone Mapping
GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation
Opinion Maximization in Social Networks by Modifying Internal Opinions
Enhancing Interpretability in Deep Reinforcement Learning through Semantic Clustering
PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-forward Planar Splatting
Orientation Matters: Making 3D Generative Models Orientation-Aligned
SNEAKDOOR: Stealthy Backdoor Attacks against Distribution Matching-based Dataset Condensation
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
Vision-centric Token Compression in Large Language Model
Metric Automata Theory: A Unifying Theory of RNNs
To Think or Not To Think: A Study of Thinking in Rule-Based Visual Reinforcement Fine-Tuning
MixSignGraph: A Sign Sequence is Worth Mixed Graphs of Nodes
Adaptive Gradient Masking for Balancing ID and MLLM-based Representations in Recommendation
Dataset Distillation of 3D Point Clouds via Distribution Matching
Aligning Text-to-Image Diffusion Models to Human Preference by Classification
REINFORCE Converges to Optimal Policies with Any Learning Rate
Learning to Control Free-Form Soft Swimmers
Is Your Diffusion Model Actually Denoising?
Joint Relational Database Generation via Graph-Conditional Diffusion Models
On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding
GOATex: Geometry & Occlusion-Aware Texturing
Modeling Cell Dynamics and Interactions with Unbalanced Mean Field Schrödinger Bridge
DiffBreak: Is Diffusion-Based Purification Robust?
Optimal Neural Compressors for the Rate-Distortion-Perception Tradeoff
ShortListing Model: A Streamlined Simplex Diffusion for Discrete Variable Generation
Improved Representation Steering for Language Models
Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling
SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions
PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning
Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation
Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy
Towards Understanding Transformers in Learning Random Walks
Avoiding exp(R) scaling in RLHF through Preference-based Exploration
Learning with Calibration: Exploring Test-Time Computing of Spatio-Temporal Forecasting
Dense Associative Memory with Epanechnikov Energy
ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism
Make Information Diffusion Explainable: LLM-based Causal Framework for Diffusion Prediction
REVE: A Foundation Model for EEG - Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects
Online Mixture of Experts: No-Regret Learning for Optimal Collective Decision-Making
Gate to the Vessel: Residual Experts Restore What SAM Overlooks
Sequential Monte Carlo for Policy Optimization in Continuous POMDPs
LBMKGC: Large Model-Driven Balanced Multimodal Knowledge Graph Completion
Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency
Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization
Robust learning of halfspaces under log-concave marginals
Q3R: Quadratic Reweighted Rank Regularizer for Effective Low-Rank Training
Conformal Mixed-Integer Constraint Learning with Feasibility Guarantees
QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training
DreamPRM: Domain-reweighted Process Reward Model for Multimodal Reasoning
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Wasserstein Convergence of Critically Damped Langevin Diffusions
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
Fast Training of Large Kernel Models with Delayed Projections
Adv-SSL: Adversarial Self-Supervised Representation Learning with Theoretical Guarantees
Density Ratio-Free Doubly Robust Proxy Causal Learning
Stability and Oracle Inequalities for Optimal Transport Maps between General Distributions
KScope: A Framework for Characterizing the Knowledge Status of Language Models
Differentiable extensions with rounding guarantees for combinatorial optimization over permutations
Spectral Graph Coarsening Using Inner Product Preservation and the Grassmann Manifold
Gaussian Approximation and Concentration of Constant Learning-Rate Stochastic Gradient Descent
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization
Improved Robust Estimation for Erdős-Rényi Graphs: The Sparse Regime and Optimal Breakdown Point
Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT
Extrapolation by Association: Length Generalization Transfer In Transformers
ControlFusion: A Controllable Image Fusion Network with Language-Vision Degradation Prompts
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents
RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving
From Indicators to Insights: Diversity-Optimized for Medical Series-Text Decoding via LLMs
Class-wise Balancing Data Replay for Federated Class-Incremental Learning
Balanced Active Inference
Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models
Learning Multi-Source and Robust Representations for Continual Learning
AI Debate Aids Assessment of Controversial Claims
CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding
Learning Expandable and Adaptable Representations for Continual Learning
TimeWak: Temporal Chained-Hashing Watermark for Time Series Data
Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs
NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge
Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and the Human Brain
Spend Wisely: Maximizing Post-Training Gains in Iterative Synthetic Data Bootstrapping
Transfer Learning for Benign Overfitting in High-Dimensional Linear Regression
Balancing Multimodal Training Through Game-Theoretic Regularization
AlignedGen: Aligning Style Across Generated Images
LLM Safety Alignment is Divergence Estimation in Disguise
PaZO: Preconditioned Accelerated Zeroth-Order Optimization for Fine-Tuning LLMs
Dyn-O: Building Structured World Models with Object-Centric Representations
Generalization Bounds for Kolmogorov-Arnold Networks (KANs) and Enhanced KANs with Lower Lipschitz Complexity
Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization
Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
Private Training Large-scale Models with Efficient DP-SGD
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions
Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks
Vision Transformers Don't Need Trained Registers
Computational Algebra with Attention: Transformer Oracles for Border Basis Algorithms
Learning in Stackelberg Mean Field Games: A Non-Asymptotic Analysis
Each Complexity Deserves a Pruning Policy
TRACE: Contrastive learning for multi-trial time series data in neuroscience
Learning Theory for Kernel Bilevel Optimization
Rectified Point Flow: Generic Point Cloud Pose Estimation
Fast Projection-Free Approach (without Optimization Oracle) for Optimization over Compact Convex Set
Towards Understanding the Mechanisms of Classifier-Free Guidance
VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs
Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts
Bézier Splatting for Fast and Differentiable Vector Graphics Rendering
Exponential Dynamic Energy Network for High Capacity Sequence Memory
MATCH: Multi-faceted Adaptive Topo-Consistency for Semi-Supervised Histopathology Segmentation
ReCon-GS: Continuum-Preserved Gaussian Streaming for Fast and Compact Reconstruction of Dynamic Scenes
Quantifying Cross-Modality Memorization in Vision-Language Models
Graph Diffusion that can Insert and Delete
DEXTER: Diffusion-Guided EXplanations with TExtual Reasoning for Vision Models
Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go
LOPT: Learning Optimal Pigovian Tax in Sequential Social Dilemmas
QiMeng-MuPa: Mutual-Supervised Learning for Sequential-to-Parallel Code Translation
Scaling Up Active Testing to Large Language Models
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling
Advancing Wasserstein Convergence Analysis of Score-Based Models: Insights from Discretization and Second-Order Acceleration
Fairness-aware Bayes Optimal Functional Classification
ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition
GSAlign: Geometric and Semantic Alignment Network for Aerial-Ground Person Re-Identification
3BASiL: An Algorithmic Framework for Sparse plus Low-Rank Compression of LLMs
Thresholds for sensitive optimality and Blackwell optimality in stochastic games
Deep Value Benchmark: Measuring Whether Models Generalize Deep Values or Shallow Preferences
SHGR: A Generalized Maximal Correlation Coefficient
WorldMem: Long-term Consistent World Simulation with Memory
DISCO: Disentangled Communication Steering for Large Language Models
Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance Boost
Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-Index Models
Restricted Spectral Gap Decomposition for Simulated Tempering Targeting Mixture Distributions
Restoring Pruned Large Language Models via Lost Component Compensation
FlexAC: Towards Flexible Control of Associative Reasoning in Multimodal Large Language Models
How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?
Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling
Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models
Flexible MOF Generation with Torsion-Aware Flow Matching
Axial Neural Networks for Dimension-Free Foundation Models
Geometry-Aware Edge Pooling for Graph Neural Networks
From Euler to AI: Unifying Formulas for Mathematical Constants
IntrinsiX: High-Quality PBR Generation using Image Priors
HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location
Event-Driven Dynamic Scene Depth Completion
Combinatorial Ski Rental Problem: Robust and Learning-Augmented Algorithms
Understanding Differential Transformer Unchains Pretrained Self-Attentions
Is Limited Participant Diversity Impeding EEG-based Machine Learning?
Imagine360: Immersive 360 Video Generation from Perspective Anchor
OmniFC: Rethinking Federated Clustering via Lossless and Secure Distance Reconstruction
Preventing Shortcuts in Adapter Training via Providing the Shortcuts
Towards A Translative Model of Sperm Whale Vocalization
Quantifying Task-relevant Similarities in Representations Using Decision Variable Correlations
An Adaptive Quantum Circuit of Dempster's Rule of Combination for Uncertain Pattern Classification
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
Finite-Time Analysis of Stochastic Nonconvex Nonsmooth Optimization on the Riemannian Manifolds
Optimal Regret Bounds via Low-Rank Structured Variation in Non-Stationary Reinforcement Learning
Physics-Constrained Flow Matching: Sampling Generative Models with Hard Constraints
Towards Reliable Identification of Diffusion-based Image Manipulations
ESCA: Contextualizing Embodied Agents via Scene-Graph Generation
$\text{G}^2\text{M}$: A Generalized Gaussian Mirror Method to Boost Feature Selection Power
DeepASA: An Object-Oriented Multi-Purpose Network for Auditory Scene Analysis
What Moves the Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
Projective Equivariant Networks via Second-order Fundamental Differential Invariants
Homogeneous Keys, Heterogeneous Values: Exploiting Local KV Cache Asymmetry for Long-Context LLMs
Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning
SparseMVC: Probing Cross-view Sparsity Variations for Multi-view Clustering
DAPO: Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage-Based Policy Optimization
Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
FedRW: Efficient Privacy-Preserving Data Reweighting for Enhancing Federated Learning of Language Models
F-Adapter: Frequency-Adaptive Parameter-Efficient Fine-Tuning in Scientific Machine Learning
GoRA: Gradient-driven Adaptive Low Rank Adaptation
Learning to Factorize Spatio-Temporal Foundation Models
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models
A Unified Approach to Submodular Maximization Under Noise
Seemingly Redundant Modules Enhance Robust Odor Learning in Fruit Flies
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models
SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents
Towards Unsupervised Domain Bridging via Image Degradation in Semantic Segmentation
Online Experimental Design With Estimation-Regret Trade-off Under Network Interference
A Unified Framework for the Transportability of Population-Level Causal Measures
Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation
Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models
Latency NMS Attacks: Is It Real Life or Is It Just Fantasy?
Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm
LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs
Unveiling Concept Attribution in Diffusion Models
Less is More: Local Intrinsic Dimensions of Contextual Language Models
RF-Agent: Automated Reward Function Design via Language Agent Tree Search
Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis
Silencer: From Discovery to Mitigation of Self-Bias in LLM-as-Benchmark-Generator
Joint Hierarchical Representation Learning of Samples and Features via Informed Tree-Wasserstein Distance
EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions
Causality Meets Locality: Provably Generalizable and Scalable Policy Learning for Networked Systems
Efficient RAW Image Deblurring with Adaptive Frequency Modulation
FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies
Convergence of the Gradient Flow for Shallow ReLU Networks on Weakly Interacting Data
Human-assisted Robotic Policy Refinement via Action Preference Optimization
Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage
Efficient Adaptive Federated Optimization
Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules
Enhanced Self-Distillation Framework for Efficient Spiking Neural Network Training
Fading to Grow: Growing Preference Ratios via Preference Fading Discrete Diffusion for Recommendation
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation
Robust Sampling for Active Statistical Inference
Omni-Mol: Multitask Molecular Model for Any-to-any Modalities
Kuramoto Orientation Diffusion Models
Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning
Free-Lunch Color-Texture Disentanglement for Stylized Image Generation
RiboFlow: Conditional De Novo RNA Co-Design via Synergistic Flow Matching
Sequence Modeling with Spectral Mean Flows
Restage4D: Reanimating Deformable 3D Reconstruction from a Single Video
Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning
NaDRO: Leveraging Dual-Reward Strategies for LLMs Training on Noisy Data
Online Optimization for Offline Safe Reinforcement Learning
DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection
GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing
DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches
Place Cells as Multi-Scale Position Embeddings: Random Walk Transition Kernels for Path Planning
Pause Tokens Strictly Increase the Expressivity of Constant-Depth Transformers
An Information-theoretical Framework for Understanding Out-of-distribution Detection with Pretrained Vision-Language Models
SOMBRL: Scalable and Optimistic Model-Based RL
Variance-Aware Feel-Good Thompson Sampling for Contextual Bandits
ZigzagPointMamba: Spatial-Semantic Mamba for Point Cloud Understanding
OmniTalker: One-shot Real-time Text-Driven Talking Audio-Video Generation With Multimodal Style Mimicking
RayFusion: Ray Fusion Enhanced Collaborative Visual Perception
FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression
Plasticity as the Mirror of Empowerment
Negative Feedback Really Matters: Signed Dual-Channel Graph Contrastive Learning Framework for Recommendation
Causal-R: A Causal-Reasoning Geometry Problem Solver for Optimized Solution Exploration
Data-Free Model Extraction for Black-box Recommender Systems via Graph Convolutions
Contrastive Self-Supervised Learning As Neural Manifold Packing
Feasibility-Aware Decision-Focused Learning for Predicting Parameters in the Constraints
Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models
Force Prompting: Video Generation Models Can Learn And Generalize Physics-based Control Signals
OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation
Efficient Allocation of Working Memory Resource for Utility Maximization in Humans and Recurrent Neural Networks
Generating Physically Sound Designs from Text and a Set of Physical Constraints
QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction
MaxSup: Overcoming Representation Collapse in Label Smoothing
DPA: A one-stop metric to measure bias amplification in classification datasets
Collaborative and Confidential Junction Trees for Hybrid Bayesian Networks
Universal Few-shot Spatial Control for Diffusion Models
When Can Model-Free Reinforcement Learning be Enough for Thinking?
Diversity Is All You Need for Contrastive Learning: Spectral Bounds on Gradient Magnitudes
Graph Alignment via Birkhoff Relaxation
Flatten Graphs as Sequences: Transformers are Scalable Graph Generators
ARIA: Training Language Agents with Intention-driven Reward Aggregation
3D Gaussian Splatting based Scene-independent Relocalization with Unidirectional and Bidirectional Feature Fusion
Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Preference Distillation via Value based Reinforcement Learning
Low-degree evidence for computational transition of recovery rate in stochastic block model
FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving
STaRFormer: Semi-Supervised Task-Informed Representation Learning via Dynamic Attention-Based Regional Masking for Sequential Data
Think Only When You Need with Large Hybrid-Reasoning Models
Theoretical Benefit and Limitation of Diffusion Language Model
Stochastic Optimization in Semi-Discrete Optimal Transport: Convergence Analysis and Minimax Rate
Understanding protein function with a multimodal retrieval-augmented foundation model
Continuous Simplicial Neural Networks
Irrational Complex Rotations Empower Low-bit Optimizers
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Think before Recommendation: Autonomous Reasoning-enhanced Recommender
Less is More: an Attention-free Sequence Prediction Modeling for Offline Embodied Learning
STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds
A Dataset for Distilling Knowledge Priors from Literature for Therapeutic Design
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text
HMARL-CBF – Hierarchical Multi-Agent Reinforcement Learning with Control Barrier Functions for Safety-Critical Autonomous Systems
Automatic Auxiliary Task Selection and Adaptive Weighting Boost Molecular Property Prediction
Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs
Evaluating and Learning Optimal Dynamic Treatment Regimes under Truncation by Death
msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
RODS: Robust Optimization Inspired Diffusion Sampling for Detecting and Reducing Hallucination in Generative Models
Learning and Planning Multi-Agent Tasks via an MoE-based World Model
BrainFlow: A Holistic Pathway of Dynamic Neural System on Manifold
Learning the Plasticity: Plasticity-Driven Learning Framework in Spiking Neural Networks
SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes
Multi-head Temporal Latent Attention
Revolutionizing Graph Aggregation: From Suppression to Amplification via BoostGCN
Structured Initialization for Vision Transformers
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Test-Time Scaling of Diffusion Models via Noise Trajectory Search
Interactive Cross-modal Learning for Text-3D Scene Retrieval
Distributed mediation analysis with communication efficiency
ASGO: Adaptive Structured Gradient Optimization
A High-Dimensional Statistical Method for Optimizing Transfer Quantities in Multi-Source Transfer Learning
IneqSearch: Hybrid Reasoning for Olympiad Inequality Proofs
Robustly Learning Monotone Single-Index Models
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras
Adaptive Cannistraci-Hebb Network Automata Modelling of Complex Networks for Path-based Link Prediction
Do different prompting methods yield a common task representation in language models?
Enhanced Expert Merging for Mixture-of-Experts in Graph Foundation Models
Tracking and Understanding Object Transformations
Cloud4D: Estimating Cloud Properties at a High Spatial and Temporal Resolution
Transfer Faster, Price Smarter: Minimax Dynamic Pricing under Cross-Market Preference Shift
MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation
REOrdering Patches Improves Vision Models
Synthesize Privacy-Preserving High-Resolution Images via Private Textual Intermediaries
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
YOLOv12: Attention-Centric Real-Time Object Detectors
Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval
MeCeFO: Enhancing LLM Training Robustness via Fault-Tolerant Optimization
Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
Does Stochastic Gradient really succeed for bandits?
Statistical Inference for Gradient Boosting Regression
ProtoPairNet: Interpretable Regression through Prototypical Pair Reasoning
KLASS: KL-Guided Fast Inference in Masked Diffusion Models
Hessian-guided Perturbed Wasserstein Gradient Flows for Escaping Saddle Points
Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative Search
DAA: Amplifying Unknown Discrepancy for Test-Time Discovery
Non-Clairvoyant Scheduling with Progress Bars
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Infinite-Width Limit of a Single Attention Layer: Analysis via Tensor Programs
Sample-Adaptivity Tradeoff in On-Demand Sampling
On the Robustness of Transformers against Context Hijacking for Linear Classification
Private Online Learning against an Adaptive Adversary: Realizable and Agnostic Settings
BioOSS: A Bio-Inspired Oscillatory State System with Spatio-Temporal Dynamics
Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning
DGSolver: Diffusion Generalist Solver with Universal Posterior Sampling for Image Restoration
DINO-Foresight: Looking into the Future with DINO
Enforcing Hard Linear Constraints in Deep Learning Models with Decision Rules
Fairness-aware Anomaly Detection via Fair Projection
Adaptive Preference Arithmetic: A Personalized Agent with Adaptive Preference Arithmetic for Dynamic Preference Modeling
From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics
Understanding Fairness and Prediction Error through Subspace Decomposition and Influence Analysis
Training-free Online Video Step Grounding
WKV-sharing embraced random shuffle RWKV high-order modeling for pan-sharpening
Jury-and-Judge Chain-of-Thought for Uncovering Toxic Data in 3D Visual Grounding
Towards Accurate Time Series Forecasting via Implicit Decoding
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Curvature Tuning: Provable Training-free Model Steering From a Single Parameter
Online Feedback Efficient Active Target Discovery in Partially Observable Environments
Improved Regret Bounds for Gaussian Process Upper Confidence Bound in Bayesian Optimization
Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos
Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning
Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models
DePass: Unified Feature Attributing by Simple Decomposed Forward Pass
Counterfactual Identifiability via Dynamic Optimal Transport
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
OmniTry: Virtual Try-On Anything without Masks
Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization
Real-Time Scene-Adaptive Tone Mapping for High-Dynamic Range Object Detection
BAM-ICL: Causal Hijacking In-Context Learning with Budgeted Adversarial Manipulation
Consistency of the $k_n$-nearest neighbor rule under adaptive sampling
Kernel Density Steering: Inference-Time Scaling via Mode Seeking for Image Restoration
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
Self-Perturbed Anomaly-Aware Graph Dynamics for Multivariate Time-Series Anomaly Detection
Kernel conditional tests from learning-theoretic bounds
Revisiting Frank-Wolfe for Structured Nonconvex Optimization
Formal Models of Active Learning from Contrastive Examples
PhysX-3D: Physical-Grounded 3D Asset Generation
Nonlinear Laplacians: Tunable principal component analysis under directional prior information
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
Controlling the Flow: Stability and Convergence for Stochastic Gradient Descent with Decaying Regularization
Vicinal Label Supervision for Reliable Aleatoric and Epistemic Uncertainty Estimation
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
Inference-Time Reward Hacking in Large Language Models
Cross-Modal Representational Knowledge Distillation for Enhanced Spike-informed LFP Modeling
Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs
On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling
U-CAN: Unsupervised Point Cloud Denoising with Consistency-Aware Noise2Noise Matching
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning
Radial Attention: $\mathcal O(n \log n)$ Sparse Attention for Long Video Generation
Reliable Lifelong Multimodal Editing: Conflict-Aware Retrieval Meets Multi-Level Guidance
RULE: Reinforcement UnLEarning Achieves Forget-retain Pareto Optimality
Scalable Evaluation and Neural Models for Compositional Generalization
Generalizing while preserving monotonicity in comparison-based preference learning models
Taming generative video models for zero-shot optical flow extraction
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Horizon Reduction Makes RL Scalable
Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity
Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation
Optimal Minimum Width for the Universal Approximation of Continuously Differentiable Functions by Deep Narrow MLPs
PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space Refinement
Less Greedy Equivalence Search
TokMan: Tokenize Manhattan Mask Optimization for Inverse Lithography
Optimal Nuisance Function Tuning for Estimating a Doubly Robust Functional under Proportional Asymptotics
Optimistic Online-to-Batch Conversions for Accelerated Convergence and Universality
SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios
Gaussian-Augmented Physics Simulation and System Identification with Complex Colliders
FIGRDock: Fast Interaction-Guided Regression for Flexible Docking
Memory-Integrated Reconfigurable Adapters: A Unified Framework for Settings with Multiple Tasks
Reliably detecting model failures in deployment without labels
PIPE: Physics-Informed Position Encoding for Alignment of Satellite Images and Time Series in Typhoon Forecasting
D2SA: Dual-Stage Distribution and Slice Adaptation for Efficient Test-Time Adaptation in MRI Reconstruction
Uni-LoRA: One Vector is All You Need
Diffusion Models Meet Contextual Bandits
Know What You Don't Know: Uncertainty Calibration of Process Reward Models
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Language Models Can Predict Their Own Behavior
Fairness under Competition
Atomic Thinking of LLMs: Decoupling and Exploring Mathematical Reasoning Abilities
Exploiting the Asymmetric Uncertainty Structure of Pre-trained VLMs on the Unit Hypersphere
BundleFlow: Deep Menus for Combinatorial Auctions by Diffusion-Based Optimization
TRIM: Scalable 3D Gaussian Diffusion Inference with Temporal and Spatial Trimming
High-order Equivariant Flow Matching for Density Functional Theory Hamiltonian Prediction
FLOWING: Implicit Neural Flows for Structure-Preserving Morphing
VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image
Generating and Checking DNN Verification Proofs
Hadamax Encoding: Elevating Performance in Model-Free Atari
Modeling the Economic Impacts of AI Openness Regulation
A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation
ShapeEmbed: a self-supervised learning framework for 2D contour quantification
Tight Bounds for Maximum Weight Matroid Independent Set and Matching in the Zero Communication Model
Causal LLM Routing: End-to-End Regret Minimization from Observational Data
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
High-Order Flow Matching: Unified Framework and Sharp Statistical Rates
Vid-SME: Membership Inference Attacks against Large Video Understanding Models
VeriLoC: Line-of-Code Level Prediction of Hardware Design Quality from Verilog Code
Product Distribution Learning with Imperfect Advice
Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments
CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching
GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images
AdaTS: Learning Adaptive Time Series Representations via Dynamic Soft Contrasts
DecompNet: Enhancing Time Series Forecasting Models with Implicit Decomposition
Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization
SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents
Effective Policy Learning for Multi-Agent Online Coordination Beyond Submodular Objectives
Towards Realistic Earth-Observation Constellation Scheduling: Benchmark and Methodology
Training-Free Bayesianization for Low-Rank Adapters of Large Language Models
Sign-In to the Lottery: Reparameterizing Sparse Training
Advancing Expert Specialization for Better MoE
First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training
Luminance-Aware Statistical Quantization: Unsupervised Hierarchical Learning for Illumination Enhancement
PALQO: Physics-informed model for Accelerating Large-scale Quantum Optimization
Learning to Add, Multiply, and Execute Algorithmic Instructions Exactly with Neural Networks
TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence
Layer-Wise Modality Decomposition for Interpretable Multimodal Sensor Fusion
Less but More: Linear Adaptive Graph Learning Empowering Spatiotemporal Forecasting
CHiQPM: Calibrated Hierarchical Interpretable Image Classification
Long-tailed Recognition with Model Rebalancing
Scalable, Explainable and Provably Robust Anomaly Detection with One-Step Flow Matching
PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting
Efficient Spectral Control of Partially Observed Linear Dynamical Systems
Streaming Federated Learning with Markovian Data
Degradation-Aware Dynamic Schrödinger Bridge for Unpaired Image Restoration
Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting
On the Role of Hidden States of Modern Hopfield Network in Transformer
SAGE: A Unified Framework for Generalizable Object State Recognition with State-Action Graph Embedding
LaRes: Evolutionary Reinforcement Learning with LLM-based Adaptive Reward Search
Better Tokens for Better 3D: Advancing Vision-Language Modeling in 3D Medical Imaging
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
SparseDiT: Token Sparsification for Efficient Diffusion Transformer
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Generalized Top-k Mallows Model for Ranked Choices
Unextractable Protocol Models: Collaborative Training and Inference without Weight Materialization
Non-Uniform Multiclass Learning with Bandit Feedback
Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
Improved Scaling Laws in Linear Regression via Data Reuse
Simultaneous Modeling of Protein Conformation and Dynamics via Autoregression
D-VST: Diffusion Transformer for Pathology-Correct Tone-Controllable Cross-Dye Virtual Staining of Whole Slide Images
SING: SDE Inference via Natural Gradients
MixPrompt: Efficient Mixed Prompting for Multimodal Semantic Segmentation
Reverse-Annealed Sequential Monte Carlo for Efficient Bayesian Optimal Experiment Design
T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning
Differentiable Structure Learning and Causal Discovery for General Binary Data
Deep Compositional Phase Diffusion for Long Motion Sequence Generation
KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment
Bayesian Optimization with Preference Exploration using a Monotonic Neural Network Ensemble
Integration Matters for Learning PDEs with Backwards SDEs
Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Emergence and scaling laws in SGD learning of shallow neural networks
COME: Adding Scene-Centric Forecasting Control to Occupancy World Model
AegisGuard: RL-Guided Adapter Tuning for TEE-Based Efficient & Secure On-Device Inference
Model–Behavior Alignment under Flexible Evaluation: When the Best-Fitting Model Isn’t the Right One
ConTextTab: A Semantics-Aware Tabular In-Context Learner
Future Link Prediction Without Memory or Aggregation
AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise
Cameras as Relative Positional Encoding
Plug-and-Play Context Feature Reuse for Efficient Masked Generation
Foundation Cures Personalization: Improving Personalized Models’ Prompt Consistency via Hidden Foundation Knowledge
Evaluating LLM-contaminated Crowdsourcing Data Without Ground Truth
R1-ShareVL: Incentivizing Reasoning Capabilities of Multimodal Large Language Models via Share-GRPO
DualEqui: A Dual-Space Hierarchical Equivariant Network for Large Biomolecules
Tractable Multinomial Logit Contextual Bandits with Non-Linear Utilities
Not All Data are Good Labels: On the Self-supervised Labeling for Time Series Forecasting
Teaching Language Models to Reason with Tools
Stochastic Momentum Methods for Non-smooth Non-Convex Finite-Sum Coupled Compositional Optimization
Learning Across the Gap: Hybrid Multi-armed Bandits with Heterogeneous Offline and Online Data
Large Language Models Think Too Fast To Explore Effectively
On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity
Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM
PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling
3D Equivariant Visuomotor Policy Learning via Spherical Projection
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
MPCache: MPC-Friendly KV Cache Eviction for Efficient Private LLM Inference
Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner
PASS: Path-selective State Space Model for Event-based Recognition
Generating Informative Samples for Risk-Averse Fine-Tuning of Downstream Tasks
Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models
Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers
Subspace Networks: Scaling Decentralized Training with Communication-Efficient Model Parallelism
Local-Global Coupling Spiking Graph Transformer for Brain Disorders Diagnosis from Two Perspectives
Value-Guided KV Compression for LLMs via Approximated CUR Decomposition
Stochastically Dominant Peer Prediction
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Fit the Distribution: Cross-Image/Prompt Adversarial Attacks on Multimodal Large Language Models
Mechanistic Interpretability of RNNs emulating Hidden Markov Models
Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
GSPN-2: Efficient Parallel Sequence Modeling
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization
Object-centric 3D Motion Field for Robot Learning from Human Videos
un$^2$CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP
Balanced Conic Rectified Flow
Hierarchical Retrieval: The Geometry and a Pretrain-Finetune Recipe
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning
Where Does It Exist from the Low-Altitude: Spatial Aerial Video Grounding
FP4 All the Way: Fully Quantized Training of Large Language Models
RPG360: Robust 360 Depth Estimation with Perspective Foundation Models and Graph Optimization
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Discrete Spatial Diffusion: Intensity-Preserving Diffusion Modeling
A Near-optimal, Scalable and Parallelizable Framework for Stochastic Bandits Robust to Adversarial Corruptions and Beyond
Adaptive Time Encoding for Irregular Multivariate Time-Series Classification
Beyond Scalars: Concept-Based Alignment Analysis in Vision Transformers
On Evaluating Policies for Robust POMDPs
AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees
Zebra-Llama: Towards Extremely Efficient Hybrid Models
Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment
Hybrid Autoencoders for Tabular Data: Leveraging Model-Based Augmentation in Low-Label Settings
On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study
A Statistical Theory of Contrastive Learning via Approximate Sufficient Statistics
Prior-Guided Diffusion Planning for Offline Reinforcement Learning
iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning
ModHiFi: Identifying High Fidelity predictive components for Model Modification
Optimal Rates in Continual Linear Regression via Increasing Regularization
Regret Lower Bounds for Decentralized Multi-Agent Stochastic Shortest Path Problems
On Inductive Biases That Enable Generalization in Diffusion Transformers
Harnessing Feature Resonance under Arbitrary Target Alignment for Out-of-Distribution Node Detection
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Robust Hyperbolic Learning with Curvature-Aware Optimization
Approximating Shapley Explanations in Reinforcement Learning
SmartCache: Context-aware Semantic Cache for Efficient Multi-turn LLM Inference
Estimating cognitive biases with attention-aware inverse planning
SplitFlow: Flow Decomposition for Inversion-Free Text-to-Image Editing
From Human Attention to Diagnosis: Semantic Patch-Level Integration of Vision-Language Models in Medical Imaging
LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities
Gated Integration of Low-Rank Adaptation for Continual Learning of Large Language Models
Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking
Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning
Conformal Prediction in The Loop: A Feedback-Based Uncertainty Model for Trajectory Optimization
Securing the Language of Life: Inheritable Watermarks from DNA Language Models to Proteins
Scaling Data-Driven Probabilistic Robustness Analysis for Semantic Segmentation Neural Networks
ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts
Shift Before You Learn: Enabling Low-Rank Representations in Reinforcement Learning
Towards a General Attention Framework on Gyrovector Spaces for Matrix Manifolds
On the creation of narrow AI: hierarchy and nonlocality of neural network skills
Prompt-guided Disentangled Representation for Action Recognition
VTON-VLLM: Aligning Virtual Try-On Models with Human Preferences
Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling
OPTFM: A Scalable Multi-View Graph Transformer for Hierarchical Pre-Training in Combinatorial Optimization
SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens
Approximate Domain Unlearning for Vision-Language Models
Pruning Spurious Subgraphs for Graph Out-of-Distribution Generalization
Efficient Large Language Model Inference with Neural Block Linearization
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework
Predictive Preference Learning from Human Interventions
Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning
Glocal Information Bottleneck for Time Series Imputation
Gatekeeper: Improving Model Cascades Through Confidence Tuning
Caption This, Reason That: VLMs Caught in the Middle
Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain
Tracing the Roots: Leveraging Temporal Dynamics in Diffusion Trajectories for Origin Attribution
From Average-Iterate to Last-Iterate Convergence in Games: A Reduction and Its Applications
FEEDBACK FRICTION: LLMs Struggle to Fully Incorporate External Feedback
Thousand Voices of Trauma: A Large-Scale Synthetic Dataset for Modeling Prolonged Exposure Therapy Conversations
Magical: Medical Lay Language Generation via Semantic Invariance and Layperson-tailored Adaptation
Exploring Polyglot Harmony: On Multilingual Data Allocation for Large Language Models Pretraining
BIPNN: Learning to Solve Binary Integer Programming via Hypergraph Neural Networks
AGI-Elo: How Far Are We From Mastering A Task?
The Complexity of Finding Local Optima in Contrastive Learning
The Omni-Expert: A Computationally Efficient Approach to Achieve a Mixture of Experts in a Single Expert Model
From stability of Langevin diffusion to convergence of proximal MCMC for non-log-concave sampling
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties
RoMa: A Robust Model Watermarking Scheme for Protecting IP in Diffusion Models
Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning
DoDo-Code: an Efficient Levenshtein Distance Embedding-based Code for 4-ary IDS Channel
Low-Rank Head Avatar Personalization with Registers
Gradient-Variation Online Adaptivity for Accelerated Optimization with Hölder Smoothness
ThinkSound: Chain-of-Thought Reasoning in Multimodal LLMs for Audio Generation and Editing
AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement
A Unified Analysis of Stochastic Gradient Descent with Arbitrary Data Permutations and Beyond
From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring
CTSketch: Compositional Tensor Sketching for Scalable Neurosymbolic Learning
Parameter Efficient Fine-tuning via Explained Variance Adaptation
Preference Learning with Lie Detectors can Induce Honesty or Evasion
Vertical Federated Feature Screening
ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding
Behavior Injection: Preparing Language Models for Reinforcement Learning
The Narrow Gate: Localized Image-Text Communication in Native Multimodal Models
From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers
The Price of Opportunity Fairness in Matroid Allocation Problems
Enhancing Graph Classification Robustness with Singular Pooling
Orient Anything V2: Unifying Orientation and Rotation Understanding
CSGO: Content-Style Composition in Text-to-Image Generation
Principled Data Augmentation for Learning to Solve Quadratic Programming Problems
Targeted Maximum Likelihood Learning: An Optimization Perspective
Reconciling Geospatial Prediction and Retrieval via Sparse Representations
GenColor: Generative and Expressive Color Enhancement with Pixel-Perfect Texture Preservation
Active Measurement: Efficient Estimation at Scale
Bipolar Self-attention for Spiking Transformers
FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts
NFIG: Multi-Scale Autoregressive Image Generation via Frequency Ordering
Object-X: Learning to Reconstruct Multi-Modal 3D Object Representations
Variational Learning Finds Flatter Solutions at the Edge of Stability
Walking the Schrödinger Bridge: A Direct Trajectory for Text-to-3D Generation
Minimax Adaptive Online Nonparametric Regression over Besov spaces
Dynamics of Spontaneous Topic Changes in Next Token Prediction with Self-Attention
C-SafeGen: Certified Safe LLM Generation with Claim-Based Streaming Guardrails
OSTAR: Optimized Statistical Text-classifier with Adversarial Resistance
Compositional Reasoning with Transformers, RNNs, and Chain of Thought
Hamiltonian Descent Algorithms for Optimization: Accelerated Rates via Randomized Integration Time
Diff-ICMH: Harmonizing Machine and Human Vision in Image Compression with Generative Prior
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees
SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding
Fantastic Bugs and Where to Find Them in AI Benchmarks
Representational Difference Explanations
STNet: Spectral Transformation Network for Solving Operator Eigenvalue Problem
A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking
Frame In-N-Out: Unbounded Controllable Image-to-Video Generation
Adaptive Latent-Space Constraints in Personalized Federated Learning
Factor Decorrelation Enhanced Data Removal from Deep Predictive Models
FLUX: Efficient Descriptor-Driven Clustered Federated Learning under Arbitrary Distribution Shifts
Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models
Dynamic Configuration for Cutting Plane Separators via Reinforcement Learning on Incremental Graph
STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking
Don’t Forget the Enjoin: FocalLoRA for Instruction Hierarchical Alignment in Large Language Models
How Benchmark Prediction from Fewer Data Misses the Mark
Federated Multi-armed Bandits with Efficient Bit-Level Communications
One for All: Universal Topological Primitive Transfer for Graph Structure Learning
Instance-Dependent Regret Bounds for Nonstochastic Linear Partial Monitoring
Spark Transformer: Reactivating Sparsity in Transformer FFN and Attention
Bisecle: Binding and Separation in Continual Learning for Video Language Understanding
Unifying Proportional Fairness in Centroid and Non-Centroid Clustering
When Worse is Better: Navigating the Compression Generation Trade-off In Visual Tokenization
SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation
Guided Diffusion Sampling on Function Spaces with Applications to PDEs
3D Human Pose Estimation with Muscles
Enhancing 3D Reconstruction for Dynamic Scenes
Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models
Leaving No OOD Instance Behind: Instance-Level OOD Fine-Tuning for Anomaly Segmentation
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
Dynamic Test-Time Compute Scaling in Control Policy: Difficulty-Aware Stochastic Interpolant Policy
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Uncertainty-aware Preference Alignment for Diffusion Policies
Adaptive Surrogate Gradients for Sequential Reinforcement Learning in Spiking Neural Networks
Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
Data Selection Matters: Towards Robust Instruction Tuning of Large Multimodal Models
Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks
Large Language Models as Model Organisms for Human Associative Learning
DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
A Computationally Viable Numerical Gradient-based Technique for Optimal Covering Problems
Memory-Efficient Training with In-Place FFT Implementation
PoE-World: Compositional World Modeling with Products of Programmatic Experts
Higher-Order Learning with Graph Neural Networks via Hypergraph Encodings
AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling
Efficient Verified Unlearning For Distillation
FlashMD: long-stride, universal prediction of molecular dynamics
Compositional Monte Carlo Tree Diffusion for Extendable Planning
Tree-Based Premise Selection for Lean4
Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions
SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL
SSTAG: Structure-Aware Self-Supervised Learning Method for Text-Attributed Graphs
SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models
A Single-Swap Local Search Algorithm for k-Means of Lines
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
Contact Map Transfer with Conditional Diffusion Model for Generalizable Dexterous Grasp Generation
Spiking Neural Networks Need High-Frequency Information
Scalable Cross-View Sample Alignment for Multi-View Clustering with View Structure Similarity
Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting
Empower Words: DualGround for Structured Phrase and Sentence-Level Temporal Grounding
Bridging Sign and Spoken Languages: Pseudo Gloss Generation for Sign Language Translation
Neural B-frame Video Compression with Bi-directional Reference Harmonization
Towards Physics-informed Spatial Intelligence with Human Priors: An Autonomous Driving Pilot Study
A Unifying View of Linear Function Approximation in Off-Policy RL Through Matrix Splitting and Preconditioning
A Fair Federated Learning Method for Handling Client Participation Probability Inconsistencies in Heterogeneous Environments
Conditioning Matters: Training Diffusion Policies is Faster Than You Think
Quantifying Statistical Significance of Deep Nearest Neighbor Anomaly Detection via Selective Inference
C$^2$Prompt: Class-aware Client Knowledge Interaction for Federated Continual Learning
Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery
TrajMamba: An Efficient and Semantic-rich Vehicle Trajectory Pre-training Model
LaX: Boosting Low-Rank Training of Foundation Models via Latent Crossing
Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking
Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition
Zero-Shot Performance Prediction for Probabilistic Scaling Laws
SnapMoGen: Human Motion Generation from Expressive Texts
Learning to Route: Per-Sample Adaptive Routing for Multimodal Multitask Prediction
EDELINE: Enhancing Memory in Diffusion-based World Models via Linear-Time Sequence Modeling
Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs
Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model
Streaming Stochastic Submodular Maximization with On-Demand User Requests
Neural Atlas Graphs for Dynamic Scene Decomposition and Editing
SQS: Enhancing Sparse Perception Models via Query-based Splatting in Autonomous Driving
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
Can Knowledge-Graph-based Retrieval Augmented Generation Really Retrieve What You Need?
DP-LLM: Runtime Model Adaptation with Dynamic Layer-wise Precision Assignment
Align Your Flow: Scaling Continuous-Time Flow Map Distillation
Interpretable Next-token Prediction via the Generalized Induction Head
LoRATv2: Enabling Low-Cost Temporal Modeling in One-Stream Trackers
ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources
Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction
Learning with Restricted Boltzmann Machines: Asymptotics of AMP and GD in High Dimensions
Vocabulary-Guided Gait Recognition
3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion
Scaling Epidemic Inference on Contact Networks: Theory and Algorithms
Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation
Approximation and Generalization Abilities of Score-based Neural Network Generative Models for Sub-Gaussian Distributions
Accelerated Vertical Federated Adversarial Learning through Decoupling Layer-Wise Dependencies
WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
Dual Alignment Framework for Few-shot Learning with Inter-Set and Intra-Set Shifts
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval
A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search
Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control
One Head to Rule Them All: Amplifying LVLM Safety through a Single Critical Attention Head
AnaCP: Toward Upper-Bound Continual Learning via Analytic Contrastive Projection
Learning single index models via harmonic decomposition
SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches
AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?
3D-Prover: Diversity Driven Theorem Proving With Determinantal Point Processes
REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders
Boosting Knowledge Utilization in Multimodal Large Language Models via Adaptive Logits Fusion and Attention Reallocation
Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval
Photography Perspective Composition: Towards Aesthetic Perspective Recommendation
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
What Can RL Bring to VLA Generalization? An Empirical Study
FAST: Foreground‑aware Diffusion with Accelerated Sampling Trajectory for Segmentation‑oriented Anomaly Synthesis
Online Segment Any 3D Thing as Instance Tracking
Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior
Demystifying Spectral Feature Learning for Instrumental Variable Regression
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling
Balancing Gradient and Hessian Queries in Non-Convex Optimization
Attention Sinks: A 'Catch, Tag, Release' Mechanism for Embeddings
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
ArchCAD-400K: A Large-Scale CAD drawings Dataset and New Baseline for Panoptic Symbol Spotting
Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization
BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization
CrossSpectra: Exploiting Cross-Layer Smoothness for Parameter-Efficient Fine-Tuning
Credal Prediction based on Relative Likelihood
From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction
Adjoint Schrödinger Bridge Sampler
Learning Reconfigurable Representations for Multimodal Federated Learning with Missing Data
Auditing Meta-Cognitive Hallucinations in Reasoning Large Language Models
Eve3D: Elevating Vision Models for Enhanced 3D Surface Reconstruction via Gaussian Splatting
Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Comparator-Adaptive $\Phi$-Regret: Improved Bounds, Simpler Algorithms, and Applications to Games
Online Learning of Pure States is as Hard as Mixed States
On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization
Preconditioned Langevin Dynamics with Score-based Generative Models for Infinite-Dimensional Linear Bayesian Inverse Problems
On the Edge of Memorization in Diffusion Models
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods
A Near-Optimal Algorithm for Decentralized Convex-Concave Finite-Sum Minimax Optimization
HyperGraphRAG: Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation
Reasoning Models Better Express Their Confidence
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models
Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs
Channel Simulation and Distributed Compression with Ensemble Rejection Sampling
SAM2Flow: Interactive Optical Flow Estimation with Dual Memory for in vivo Microcirculation Analysis
MaNGO — Adaptable Graph Network Simulators via Meta-Learning
Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods
Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging
Improving Target Sound Extraction via Disentangled Codec Representations with Privileged Knowledge Distillation
ESCORT: Efficient Stein-variational and Sliced Consistency-Optimized Temporal Belief Representation for POMDPs
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation
MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
State Entropy Regularization for Robust Reinforcement Learning
Language Ranker: A Lightweight Ranking framework for LLM Decoding
Heavy-Ball Momentum Method in Continuous Time and Discretization Error Analysis
Neural Correlates of Serial Dependence: Synaptic Short-term Plasticity Orchestrates Repulsion and Attraction
Solving Discrete (Semi) Unbalanced Optimal Transport with Equivalent Transformation Mechanism and KKT-Multiplier Regularization
Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts
Greed is Good: A Unifying Perspective on Guided Generation
Automatic Synthetic Data and Fine-grained Adaptive Feature Alignment for Composed Person Retrieval
FANS: A Flatness-Aware Network Structure for Generalization in Offline Reinforcement Learning
An Analysis of Concept Bottleneck Models: Measuring, Understanding, and Mitigating the Impact of Noisy Annotations
L$^2$M: Mutual Information Scaling Law for Long-Context Language Modeling
The Primacy of Magnitude in Low-Rank Adaptation
Bridging Expressivity and Scalability with Adaptive Unitary SSMs
Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation
SRSR: Enhancing Semantic Accuracy in Real-World Image Super-Resolution with Spatially Re-Focused Text-Conditioning
Disentangled Cross-Modal Representation Learning with Enhanced Mutual Supervision
AiDE-Q: Synthetic Labeled Datasets Can Enhance Learning Models for Quantum Property Estimation
Conformal Prediction for Ensembles: Improving Efficiency via Score-Based Aggregation
Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Regularized Exploration in Reinforcement Learning
On the rankability of visual embeddings
Buffer layers for Test-Time Adaptation
Interactive and Hybrid Imitation Learning: Provably Beating Behavior Cloning
Value Improved Actor Critic Algorithms
MotionBind: Multi-Modal Human Motion Alignment for Retrieval, Recognition, and Generation
Breaking Latent Prior Bias in Detectors for Generalizable AIGC Image Detection
Accelerating data-driven algorithm selection for combinatorial partitioning problems
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Adaptive and Multi-scale Affinity Alignment for Hierarchical Contrastive Learning
Self-Supervised Selective-Guided Diffusion Model for Old-Photo Face Restoration
MoFo: Empowering Long-term Time Series Forecasting with Periodic Pattern Modeling
Curious Causality-Seeking Agents Learn Meta Causal World
Learning Memory-Enhanced Improvement Heuristics for Flexible Job Shop Scheduling
Boosting Adversarial Transferability with Spatial Adversarial Alignment
On the Complexity of Finding Stationary Points in Nonconvex Simple Bilevel Optimization
Spectral Analysis of Representational Similarity with Limited Neurons
Follow-the-Perturbed-Leader Nearly Achieves Best-of-Both-Worlds for the m-Set Semi-Bandit Problems
C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models
Point4Bit: Post Training 4-bit Quantization for Point Cloud 3D Detection
Deep Gaussian from Motion: Exploring 3D Geometric Foundation Models for Gaussian Splatting
$\Psi$-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models
Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler
Problem-Parameter-Free Decentralized Bilevel Optimization
Robust Regression of General ReLUs with Queries
MobileODE: An Extra Lightweight Network
Block Coordinate Descent for Neural Networks Provably Finds Global Minima
K-DeCore: Facilitating Knowledge Transfer in Continual Structured Knowledge Reasoning via Knowledge Decoupling
Diffusion Transformers as Open-World Spatiotemporal Foundation Models
Structured Spectral Reasoning for Frequency-Adaptive Multimodal Recommendation
FairNet: Dynamic Fairness Correction without Performance Loss via Contrastive Conditional LoRA
Neural Green’s Functions
Breaking the Order Barrier: Off-Policy Evaluation for Confounded POMDPs
From Self-Check to Consensus: Bayesian Strategic Decoding in Large Language Models
Dimension-adapted Momentum Outscales SGD
SingRef6D: Monocular Novel Object Pose Estimation with a Single RGB Reference
Layer-wise Update Aggregation with Recycling for Communication-Efficient Federated Learning
Geometric Mixture Models for Electrolyte Conductivity Prediction
Enhancing CLIP Robustness via Cross-Modality Alignment
MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
Motion Matters: Compact Gaussian Streaming for Free-Viewpoint Video Reconstruction
The Structural Complexity of Matrix-Vector Multiplication
Hyperbolic Dataset Distillation
Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization
Can MLLMs Absorb Math Reasoning Abilities from LLMs as Free Lunch?
Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models
PseuZO: Pseudo-Zeroth-Order Algorithm for Training Deep Neural Networks
Kernel Regression in Structured Non-IID Settings: Theory and Implications for Denoising Score Learning
Data Mixing Can Induce Phase Transitions in Knowledge Acquisition
Estimation of Stochastic Optimal Transport Maps
ContextAgent: Context-Aware Proactive LLM Agents with Open-world Sensory Perceptions
LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits
SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG
Deep Learning with Plausible Deniability
AgentNet: Decentralized Evolutionary Coordination for LLM-based Multi-Agent Systems
Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning
Unveiling the Spatial-temporal Effective Receptive Fields of Spiking Neural Networks
Distributionally Robust Feature Selection
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
Generative Graph Pattern Machine
Linearization Explains Fine-Tuning in Large Language Models
SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation
Time-Evolving Dynamical System for Learning Latent Representations of Mouse Visual Neural Activity
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
RAG4GFM: Bridging Knowledge Gaps in Graph Foundation Models through Graph Retrieval Augmented Generation
Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization
Sequentially Auditing Differential Privacy
Representation-Level Counterfactual Calibration for Debiased Zero-Shot Recognition
Breaking AR’s Sampling Bottleneck: Provable Acceleration via Diffusion Language Models
Rethinking Scale-Aware Temporal Encoding for Event-based Object Detection
Thought Communication in Multiagent Collaboration
$\varepsilon$-Optimally Solving Two-Player Zero-Sum POSGs
UniTransfer: Video Concept Transfer via Progressive Spatio-Temporal Decomposition
Safety Depth in Large Language Models: A Markov Chain Perspective
Dynamic and Chemical Constraints to Enhance the Molecular Masked Graph Autoencoders
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
Scaling Laws For Scalable Oversight
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Convolution Goes Higher-Order: A Biologically Inspired Mechanism Empowers Image Classification
Stochastic Shortest Path with Sparse Adversarial Costs
Doodle to Detect: A Goofy but Powerful Approach to Skeleton-based Hand Gesture Recognition
Temporal Smoothness-Aware Rate-Distortion Optimized 4D Gaussian Splatting
Nyström-Accelerated Primal LS-SVMs: Breaking the $O(an^3)$ Complexity Bottleneck for Scalable ODEs Learning
Conformal Risk Training: End-to-End Optimization of Conformal Risk Control
Let the LLM Stick to Its Strengths: Learning to Route Economical LLM
Social World Model-Augmented Mechanism Design Policy Learning
Re-coding for Uncertainties: Edge-awareness Semantic Concordance for Resilient Event-RGB Segmentation
MetaFind: Scene-Aware 3D Asset Retrieval for Coherent Metaverse Scene Generation
3D Visual Illusion Depth Estimation
Graph Few-Shot Learning via Adaptive Spectrum Experts and Cross-Set Distribution Calibration
Towards Graph Foundation Models: Training on Knowledge Graphs Enables Transferability to General Graphs
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank
Fast Inference for Augmented Large Language Models
Learning Repetition-Invariant Representations for Polymer Informatics
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
SPARTAN: A Sparse Transformer World Model Attending to What Matters
Preference Optimization on Pareto Sets: On a Theory of Multi-Objective Optimization
Auto-Connect: Connectivity-Preserving RigFormer with Direct Preference Optimization
MuSLR: Multimodal Symbolic Logical Reasoning
Mean Flows for One-step Generative Modeling
FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models
Regional Explanations: Bridging Local and Global Variable Importance
Learning Provably Improves the Convergence of Gradient Descent
Benign Overfitting in Single-Head Attention
Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training
Fast MRI for All: Bridging Access Gaps by Training without Raw Data
Towards Prospective Medical Image Reconstruction via Knowledge-Informed Dynamic Optimal Transport
Influence Guided Context Selection for Effective Retrieval-Augmented Generation
Better Estimation of the Kullback–Leibler Divergence Between Language Models
FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning
UFT: Unifying Supervised and Reinforcement Fine-Tuning
Feedback-Aware MCTS for Goal-Oriented Information Seeking
Extracting task-relevant preserved dynamics from contrastive aligned neural recordings
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings
A Plug-and-Play Query Synthesis Active Learning Framework for Neural PDE Solvers
On the Relation between Rectified Flows and Optimal Transport
SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation
CG-SSL: Concept-Guided Self-Supervised Learning
LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation
Flex-Judge: Text-Only Reasoning Unleashes Zero-Shot Multimodal Evaluators
Last-Iterate Convergence of Smooth Regret Matching$^+$ Variants in Learning Nash Equilibria
Learning to Clean: Reinforcement Learning for Noisy Label Correction
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
Distributional Adversarial Attacks and Training in Deep Hedging
Calibrating Translation Decoding with Quality Estimation on LLMs
Intermediate Domain Alignment and Morphology Analogy for Patent-Product Image Retrieval
egoEMOTION: Egocentric Vision and Physiological Signals for Emotion and Personality Recognition in Real-world Tasks
Tight High-Probability Bounds for Nonconvex Heavy-Tailed Scenario under Weaker Assumptions
Boundary-to-Region Supervision for Offline Safe Reinforcement Learning
SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment
A Single-Loop Gradient Algorithm for Pessimistic Bilevel Optimization via Smooth Approximation
Optimism Without Regularization: Constant Regret in Zero-Sum Games
Anomaly Detection by an Ensemble of Random Pairs of Hyperspheres
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers
How Does Sequence Modeling Architecture Influence Base Capabilities of Pre-trained Language Models? Exploring Key Architecture Design Principles to Avoid Base Capabilities Degradation
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
Backdoor Mitigation via Invertible Pruning Masks
Subgraph Federated Learning via Spectral Methods
MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
Let Brain Rhythm Shape Machine Intelligence for Connecting Dots on Graphs
QiMeng-SALV: Signal-Aware Learning for Verilog Code Generation
Towards Robust Zero-Shot Reinforcement Learning
Statistical Inference under Performativity
Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities
Investigating Hallucinations of Time Series Foundation Models through Signal Subspace Analysis
OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions
PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors
EditInfinity: Image Editing with Binary-Quantized Generative Models
DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning
IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
NeuroPath: Neurobiology-Inspired Path Tracking and Reflection for Semantically Coherent Retrieval
EverybodyDance: Bipartite Graph–Based Identity Correspondence for Multi-Character Animation
EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction
Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality
Self-Training with Dynamic Weighting for Robust Gradual Domain Adaptation
DUO: No Compromise to Accuracy Degradation
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
Differentially Private High-dimensional Variable Selection via Integer Programming
STRIDER: Navigation via Instruction-Aligned Structural Decision Space Optimization
Understanding Bias Terms in Neural Representations
Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression
Latent Harmony: Synergistic Unified UHD Image Restoration via Latent Space Regularization and Controllable Refinement
Principled Model Routing for Unknown Mixtures of Source Domains
Solving Inverse Problems with FLAIR
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
Regression Trees Know Calculus
UnCLe: Towards Scalable Dynamic Causal Discovery in Non-linear Temporal Systems
Salient Concept-Aware Generative Data Augmentation
Conflict-Aware Knowledge Editing in the Wild: Semantic-Augmented Graph Representation for Unstructured Text
TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling
Active Seriation: Efficient Ordering Recovery with Statistical Guarantees
FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed
Active Target Discovery under Uninformative Priors: The Power of Permanent and Transient Memory
COOPERA: Continual Open-Ended Human-Robot Assistance
CURE: Co-Evolving Coders and Unit Testers via Reinforcement Learning
RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains
ProSpero: Active Learning for Robust Protein Design Beyond Wild-Type Neighborhoods
Graph Persistence goes Spectral
MonoLift: Learning 3D Manipulation Policies from Monocular RGB via Distillation
Straight-Line Diffusion Model for Efficient 3D Molecular Generation
RoboScape: Physics-informed Embodied World Model
TITAN: A Trajectory-Informed Technique for Adaptive Parameter Freezing in Large-Scale VQE
Performative Risk Control: Calibrating Models for Reliable Deployment under Performativity
Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation
Sequential Attention-based Sampling for Histopathological Analysis
TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning
Beyond Prediction: Managing the Repercussions of Machine Learning Applications
Learning to Focus: Causal Attention Distillation via Gradient‐Guided Token Pruning
Sparse Image Synthesis via Joint Latent and RoI Flow
Risk-Averse Constrained Reinforcement Learning with Optimized Certainty Equivalents
Learning Juntas under Markov Random Fields
Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models
LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders
Attention (as Discrete-Time Markov) Chains
VividFace: A Robust and High-Fidelity Video Face Swapping Framework
Geometric Algebra-Enhanced Bayesian Flow Network for RNA Inverse Design
Exploring the limits of strong membership inference attacks on large language models
Parameter-Free Hypergraph Neural Network for Few-Shot Node Classification
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts
Distribution Learning Meets Graph Structure Sampling
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
Token Bottleneck: One Token to Remember Dynamics
VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models
DeepHalo: A Neural Choice Model with Controllable Context Effects
Diffusion on Demand: Selective Caching and Modulation for Efficient Generation
Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Conformal Information Pursuit for Interactively Guiding Large Language Models
AdaptDel: Adaptable Deletion Rate Randomized Smoothing for Certified Robustness
MODEM: A Morton-Order Degradation Estimation Mechanism for Adverse Weather Image Recovery
MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
One Prompt Fits All: Universal Graph Adaptation for Pretrained Models
No-Regret Online Autobidding Algorithms in First-price Auctions
Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts
A Pre-training Framework for Relational Data with Information-theoretic Principles
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios
Rethinking Approximate Gaussian Inference in Classification
AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
Spectral Compressive Imaging via Chromaticity-Intensity Decomposition
Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
Error Forcing in Recurrent Neural Networks
Breaking the Discretization Barrier of Continuous Physics Simulation Learning
Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs
Large Language Models for Lossless Image Compression: Next-Pixel Prediction in Language Space is All You Need
Searching Efficient Semantic Segmentation Architectures via Dynamic Path Selection
Revisiting Bi-Linear State Transitions in Recurrent Neural Networks
Smooth Regularization for Efficient Video Recognition
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
Fairshare Data Pricing via Data Valuation for Large Language Models
Evolutionary Multi-View Classification via Eliminating Individual Fitness Bias
LLM-DAMVC: A Large Language Model Assisted Dynamic Agent for Multi-View Clustering
Modeling Neural Activity with Conditionally Linear Dynamical Systems
Efficiently Maintaining the Multilingual Capacity of MCLIP in Downstream Cross-Modal Retrieval Tasks
PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling
Context-Aware Regularization with Markovian Integration for Attention-Based Nucleotide Analysis
Robust Cross-modal Alignment Learning for Cross-Scene Spatial Reasoning and Grounding
Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks
Policy Compatible Skill Incremental Learning via Lazy Learning Interface
Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
Continuous Concepts Removal in Text-to-image Diffusion Models
Equi-mRNA: Protein Translation Equivariant Encoding for mRNA Language Models
Exploring Neural Granger Causality with xLSTMs: Unveiling Temporal Dependencies in Complex Data
Unlocking SLM Potential for Data Analysis Code Generation via Non-Parametric Knowledge Distillation
A Novel General Framework for Sharp Lower Bounds in Succinct Stochastic Bandits
Accelerating Diffusion LLMs via Adaptive Parallel Decoding
Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation
Differentially Private Quantiles with Smaller Error
Optimal Best Arm Identification under Differential Privacy
Asymptotic theory of SGD with a general learning-rate
Attention with Trained Embeddings Provably Selects Important Tokens
Multi-step Visual Reasoning with Visual Tokens Scaling and Verification
DSAS: A Universal Plug-and-Play Framework for Attention Optimization in Multi-Document Question Answering
ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints
OVS Meets Continual Learning: Towards Sustainable Open-Vocabulary Segmentation
Differentiable Hierarchical Visual Tokenization
PID-controlled Langevin Dynamics for Faster Sampling on Generative Models
Provably Efficient Multi-Task Meta Bandit Learning via Shared Representations
Coarse-to-Fine 3D Part Assembly via Semantic Super-Parts and Symmetry-Aware Pose Estimation
Enhancing Infrared Vision: Progressive Prompt Fusion Network and Benchmark
Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
Distil-E2D: Distilling Image-to-Depth Priors for Event-Based Monocular Depth Estimation
4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
Fisher meets Feynman: score-based variational inference with a product of experts
Rebalancing Return Coverage for Conditional Sequence Modeling in Offline Reinforcement Learning
Alligat0R: Pre-Training through Covisibility Segmentation for Relative Camera Pose Regression
Prompt Tuning Decision Transformers with Structured and Scalable Bandits
Mint: A Simple Test-Time Adaptation of Vision-Language Models against Common Corruptions
DIFFSSR: Stereo Image Super-resolution Using Differential Transformer
Semantic and Visual Crop-Guided Diffusion Models for Heterogeneous Tissue Synthesis in Histopathology
The Future Unmarked: Watermark Removal in AI-Generated Images via Next-Frame Prediction
RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning
FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic
LocDiff: Identifying Locations on Earth by Diffusing in the Hilbert Space
Decomposing Interventional Causality into Synergistic, Redundant, and Unique Components
Evaluating the Inductive Abilities of Large Language Models: Why Chain-of-Thought Reasoning Sometimes Hurts More Than Helps
SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications
Holistic Large-Scale Scene Reconstruction via Mixed Gaussian Splatting
Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
Segment then Splat: Unified 3D Open-Vocabulary Segmentation via Gaussian Splatting
Probing Neural Combinatorial Optimization Models
Towards a Golden Classifier-Free Guidance Path via Foresight Fixed Point Iterations
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Confusion-Driven Self-Supervised Progressively Weighted Ensemble Learning for Non-Exemplar Class Incremental Learning
IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation
Prior-Guided Flow Matching for Target-Aware Molecule Design with Learnable Atom Number
Bridging Equivariant GNNs and Spherical CNNs for Structured Physical Domains
CLAWS: Creativity detection for LLM-generated solutions using Attention Window of Sections
PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding
Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
Interpretable and Parameter Efficient Graph Neural Additive Models with Random Fourier Features
DCA: Graph-Guided Deep Embedding Clustering for Brain Atlases
MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs
KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning
Vulnerable Data-Aware Adversarial Training
Model Inversion with Layer-Specific Modeling and Alignment for Data-Free Continual Learning
Transformers Learn Faster with Semantic Focus
Precise Information Control in Long-Form Text Generation
Logical Expressiveness of Graph Neural Networks with Hierarchical Node Individualization
VETA-DiT: Variance-Equalized and Temporally Adaptive Quantization for Efficient 4-bit Diffusion Transformers
GeRaF: Neural Geometry Reconstruction from Radio Frequency Signals
LogicTree: Improving Complex Reasoning of LLMs via Instantiated Multi-step Synthetic Logical Data
Composing Linear Layers from Irreducibles
Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games
Faithful Group Shapley Value
A compressive-expressive communication framework for compositional representations
Spatially-aware Weights Tokenization for NeRF-Language Models
Accelerated Evolving Set Processes for Local PageRank Computation
ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-Thinking Reasoning
PhySense: Sensor Placement Optimization for Accurate Physics Sensing
Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
SALS: Sparse Attention in Latent Space for KV Cache Compression
Discovering Opinion Intervals from Conflicts in Signed Graphs
Learnable Burst-Encodable Time-of-Flight Imaging for High-Fidelity Long-Distance Depth Sensing
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders
Bootstrap Off-policy with World Model
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
Delving into Cascaded Instability: A Lipschitz Continuity View on Image Restoration and Object Detection Synergy
Dynamic Masking and Auxiliary Hash Learning for Enhanced Cross-Modal Retrieval
OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields
Depth-Supervised Fusion Network for Seamless-Free Image Stitching
On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks
Mozart: Modularized and Efficient MoE Training on 3.5D Wafer-Scale Chiplet Architectures
Zeroth-Order Optimization Finds Flat Minima
Imagined Autocurricula
RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks
UltraLED: Learning to See Everything in Ultra-High Dynamic Range Scenes
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS
RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models
Structure-Aware Cooperative Ensemble Evolutionary Optimization on Combinatorial Problems with Multimodal Large Language Models
Co-PatcheR: Collaborative Software Patching with Component-specific Small Reasoning Models
LoRA-EnVar: Parameter-Efficient Hybrid Ensemble Variational Assimilation for Weather Forecasting
Optimal Adjustment Sets for Nonparametric Estimation of Weighted Controlled Direct Effect
Scalable In-context Ranking with Generative Models
Private Hyperparameter Tuning with Ex-Post Guarantee
Activity Pruning for Efficient Spiking Neural Networks
Towards Syn-to-Real IQA: A Novel Perspective on Reshaping Synthetic Data Distributions
Robust Integrated Learning and Pauli Noise Mitigation for Parametrized Quantum Circuits
CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models
Accelerating Feature Conformal Prediction via Taylor Approximation
Scalable and Cost-Efficient de Novo Template-Based Molecular Generation
RAST: Reasoning Activation in LLMs via Small-model Transfer
Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation
Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations
SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing
Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion
MultiNet: Adaptive Multi-Viewed Subgraph Convolutional Networks for Graph Classification
Private Zeroth-Order Optimization with Public Data
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Logic.py: Bridging the Gap between LLMs and Constraint Solvers
Neural Entropy
Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets
ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback
RigAnyFace: Scaling Neural Facial Mesh Auto-Rigging with Unlabeled Data
Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings
HetSyn: Versatile Timescale Integration in Spiking Neural Networks via Heterogeneous Synapses
Let a Neural Network be Your Invariant
Conformal Arbitrage: Risk-Controlled Balancing of Competing Objectives in Language Models
Continuous Diffusion Model for Language Modeling
Efficient Multi-bit Quantization Network Training via Weight Bias Correction and Bit-wise Coreset Sampling
Feature-Based Instance Neighbor Discovery: Advanced Stable Test-Time Adaptation in Dynamic World
Infrequent Exploration in Linear Bandits
Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2
Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis
Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting
DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents
FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency
Flow Equivariant Recurrent Neural Networks
CaliGCL: Calibrated Graph Contrastive Learning via Partitioned Similarity and Consistency Discrimination
StarTrail: Concentric Ring Sequence Parallelism for Efficient Near-Infinite-Context Transformer Model Training
OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects
Boosting Resilience of Large Language Models through Causality-Driven Robust Optimization
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment
TS-MOF: Two-Stage Multi-Objective Fine-tuning for Long-Tailed Recognition
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
PhysDiff: A Physically-Guided Diffusion Model for Multivariate Time Series Anomaly Detection
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
Dependency Matters: Enhancing LLM Reasoning with Explicit Knowledge Grounding
Natural Gradient VI: Guarantees for Non-Conjugate Models
Computation and Memory-Efficient Model Compression with Gradient Reweighting
PubSub-VFL: Towards Efficient Two-Party Split Learning in Heterogeneous Environments via Publisher/Subscriber Architecture
STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation
Dual-Comb Ghost Imaging with Transformer-Based Reconstruction for Optical Fiber Endomicroscopy
PARCO: Parallel AutoRegressive Models for Multi-Agent Combinatorial Optimization
Smoothed Differentiation Efficiently Mitigates Shattered Gradients in Explanations
PanoWan: Lifting Diffusion Video Generation Models to 360$^\circ$ with Latitude/Longitude-aware Mechanisms
Covariances for Free: Exploiting Mean Distributions for Training-free Federated Learning
DictPFL: Efficient and Private Federated Learning on Encrypted Gradients
PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions
FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model Judges
Just One Layer Norm Guarantees Stable Extrapolation
Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies
Path-specific effects for pulse-oximetry guided decisions in critical care
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
A Black-Box Debiasing Framework for Conditional Sampling
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Angles Don’t Lie: Unlocking Training‑Efficient RL Through the Model’s Own Signals
SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency
AutoHood3D: A Multi‑Modal Benchmark for Automotive Hood Design and Fluid–Structure Interaction
AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
X-Field: A Physically Informed Representation for 3D X-ray Reconstruction
4DGCPro: Efficient Hierarchical 4D Gaussian Compression for Progressive Volumetric Video Streaming
Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving
MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
EyeBench: Predictive Modeling from Eye Movements in Reading
One Stone with Two Birds: A Null-Text-Null Frequency-Aware Diffusion Models for Text-Guided Image Inpainting
From Synapses to Dynamics: Obtaining Function from Structure in a Connectome Constrained Model of the Head Direction Circuit
Learnable Sampler Distillation for Discrete Diffusion Models
Fair Cooperation in Mixed-Motive Games via Conflict-Aware Gradient Adjustment
The Leaderboard Illusion
A Differential and Pointwise Control Approach to Reinforcement Learning
Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought
Replicable Distribution Testing
Counteractive RL: Rethinking Core Principles for Efficient and Scalable Deep Reinforcement Learning
Data-Driven Performance Guarantees for Classical and Learned Optimizers
Path-Enhanced Contrastive Learning for Recommendation
Disentangled Representation Learning via Modular Compositional Bias
DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm
Distance Adaptive Beam Search for Provably Accurate Graph-Based Nearest Neighbor Search
Venus-MAXWELL: Efficient Learning of Protein-Mutation Stability Landscapes using Protein Language Models
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs
Lyapunov-Stable Adaptive Control for Multimodal Concept Drift
SpaceServe: Spatial Multiplexing of Complementary Encoders and Decoders for Multimodal LLMs
Learning long range dependencies through time reversal symmetry breaking
Document Summarization with Conformal Importance Guarantees
Prot2Text-V2: Protein Function Prediction with Multimodal Contrastive Alignment
Impartial Selection with Predictions
Faster Fixed-Point Methods for Multichain MDPs
A learnability analysis on neuro-symbolic learning
MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning
Estimating Hitting Times Locally at Scale
Wavelet Canonical Coherence for Nonstationary Signals
Variational Supervised Contrastive Learning
The $\varphi$ Curve: The Shape of Generalization through the Lens of Norm-based Capacity Control
LongMagpie: A Self-synthesis Method for Generating Large-scale Long-context Instructions
EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution
Scalable and adaptive prediction bands with kernel sum-of-squares
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
Latent Mixture of Symmetries for Sample-Efficient Dynamic Learning
ShapeCraft: LLM Agents for Structured, Textured and Interactive 3D Modeling
Sherlock: Self-Correcting Reasoning in Vision-Language Models
CVGL: Causal Learning and Geometric Topology
ConStellaration: A dataset of QI-like stellarator plasma boundaries and optimization benchmarks
Dynamic Siamese Expansion Framework for Improving Robustness in Online Continual Learning
Hierarchical Koopman Diffusion: Fast Generation with Interpretable Diffusion Trajectory
Individual Fairness In Strategic Classification
FraPPE: Fast and Efficient Preference-Based Pure Exploration
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation
WarpGAN: Warping-Guided 3D GAN Inversion with Style-Based Novel View Inpainting
Sparse Optimistic Information Directed Sampling
Revitalizing SVD for Global Covariance Pooling: Halley’s Method to Overcome Over-Flattening
Refining Norms: A Post-hoc Framework for OOD Detection in Graph Neural Networks
FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models
STACI: Spatio-Temporal Aleatoric Conformal Inference
VIKING: Deep variational inference with stochastic projections
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
PlanU: Large Language Model Reasoning through Planning under Uncertainty
On Feasible Rewards in Multi-Agent Inverse Reinforcement Learning
MURKA: Multi-Reward Reinforcement Learning with Knowledge Alignment for Optimization Tasks
PREAMBLE: Private and Efficient Aggregation via Block Sparse Vectors
Which Algorithms Have Tight Generalization Bounds?
Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment
Does Representation Guarantee Welfare?
Taming Hyperparameter Sensitivity in Data Attribution: Practical Selection Without Costly Retraining
Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations
Automated Model Discovery via Multi-modal & Multi-step Pipeline
FLAME: Fast Long-context Adaptive Memory for Event-based Vision
Object-Centric Representation Learning for Enhanced 3D Semantic Scene Graph Prediction
CLIPTTA: Robust Contrastive Vision-Language Test-Time Adaptation
Exploring Structural Degradation in Dense Representations for Self-supervised Learning
LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery
Text-Aware Real-World Image Super-Resolution via Diffusion Model with Joint Segmentation Decoders
Rethinking Hebbian Principle: Low-Dimensional Structural Projection for Unsupervised Learning
Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models via Monocular Knowledge Distillation
Probabilistic Stability Guarantees for Feature Attributions
Explicitly Modeling Subcortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness
CoIDO: Efficient Data Selection for Visual Instruction Tuning via Coupled Importance-Diversity Optimization
EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
Prior Forgetting and In-Context Overfitting
FSNet: Feasibility-Seeking Neural Network for Constrained Optimization with Guarantees
Mitigating Occlusions in Virtual Try-On via A Simple-Yet-Effective Mask-Free Framework
MOBO-OSD: Batch Multi-Objective Bayesian Optimization via Orthogonal Search Directions
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring
A Scalable, Causal, and Energy Efficient Framework for Neural Decoding with Spiking Neural Networks
Concept-Guided Interpretability via Neural Chunking
InstructFlow: Adaptive Symbolic Constraint-Guided Code Generation for Long-Horizon Planning
On Hierarchies of Fairness Notions in Cake Cutting: From Proportionality to Super Envy-Freeness
Truth over Tricks: Measuring and Mitigating Shortcut Learning in Misinformation Detection
Dendritic Resonate-and-Fire Neuron for Effective and Efficient Long Sequence Modeling
Quantifying Uncertainty in Error Consistency: Towards Reliable Behavioral Comparison of Classifiers
Rotary Masked Autoencoders are Versatile Learners
Evolutionary Reasoning Does Not Arise in Standard Usage of Protein Language Models
Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
Generalization Bounds for Rank-sparse Neural Networks
Ridge Boosting is Both Robust and Efficient
DeCaFlow: A deconfounding causal generative model
Topology-Aware Learning of Tubular Manifolds via SE(3)-Equivariant Network on Ball B-Spline Curve
Masked Diffusion Models as Energy Minimization
Don't be lazy: CompleteP enables compute-efficient deep transformers
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
Fast Computation and Optimization for Opinion-Based Quantities of Friedkin-Johnsen Model
DAIL: Beyond Task Ambiguity for Language-Conditioned Reinforcement Learning
Weak-shot Keypoint Estimation via Keyness and Correspondence Transfer
Enhancing Visual Prompting through Expanded Transformation Space and Overfitting Mitigation
Track, Inpaint, Resplat: Subject-driven 3D and 4D Generation with Progressive Texture Infilling
Efficient Parametric SVD of Koopman Operator for Stochastic Dynamical Systems
Heterogeneous Graph Transformers for Simultaneous Mobile Multi-Robot Task Allocation and Scheduling under Temporal Constraints
Uncertainty-Calibrated Prediction of Randomly-Timed Biomarker Trajectories with Conformal Bands
DisMo: Disentangled Motion Representations for Open-World Motion Transfer
Exploring and Leveraging Class Vectors for Classifier Editing
Extragradient Method for $(L_0, L_1)$-Lipschitz Root-finding Problems
Greedy Sampling Is Provably Efficient For RLHF
Knee-Deep in C-RASP: A Transformer Depth Hierarchy
ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation
Personalized Subgraph Federated Learning with Differentiable Auxiliary Projections
Memorization in Graph Neural Networks
KL-Regularized RLHF with Multiple Reference Models: Exact Solutions and Sample Complexity
Exploiting Dynamic Sparsity in Einsum
Repurposing Marigold for Zero-Shot Metric Depth Estimation via Defocus Blur Cues
Accelerating Model-Free Optimization via Averaging of Cost Samples
Grasp2Grasp: Vision-Based Dexterous Grasp Translation via Schrödinger Bridges
EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
Robustifying Learning-Augmented Caching Efficiently without Compromising 1-Consistency
Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation
Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models
Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning
Skill-Driven Neurosymbolic State Abstractions
Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning
FIPER: Factorized Features for Robust Image Super-Resolution and Compression
FlowFeat: Pixel-Dense Embedding of Motion Profiles
TokenSwap: A Lightweight Method to Disrupt Memorized Sequences in LLMs
Coupling Generative Modeling and an Autoencoder with the Causal Bridge
One Token Embedding Is Enough to Deadlock Your Large Reasoning Model
Dimensionality Mismatch Between Brains and Artificial Neural Networks
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
Availability-aware Sensor Fusion via Unified Canonical Space
Training-Free Safe Text Embedding Guidance for Text-to-Image Diffusion Models
Repo2Run: Automated Building Executable Environment for Code Repository at Scale
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space
OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts
ToF-IP: Time-of-Flight Enhanced Sparse Inertial Poser for Real-time Human Motion Capture
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
From Kolmogorov to Cauchy: Shallow XNet Surpasses KANs
Non-Line-of-Sight 3D Reconstruction with Radar
Adaptive LoRA Experts Allocation and Selection for Federated Fine-Tuning
Stable Port-Hamiltonian Neural Networks
Measure-Theoretic Anti-Causal Representation Learning
A Generalized Label Shift Perspective for Cross-Domain Gaze Estimation
Don’t Trade Off Safety: Diffusion Regularization for Constrained Offline RL
ZeCO: Zero-Communication Overhead Sequence Parallelism for Linear Attention
Eluder dimension: localise it!
C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning
SilentStriker: Toward Stealthy Bit-Flip Attacks on Large Language Models
Fast Non-Log-Concave Sampling under Nonconvex Equality and Inequality Constraints with Landing
Disentangling misreporting from genuine adaptation in strategic settings: a causal approach
Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics
Affine-Invariant Global Non-Asymptotic Convergence Analysis of BFGS under Self-Concordance
Class conditional conformal prediction for multiple inputs by p-value aggregation
A CLT for Polynomial GNNs on Community-Based Graphs
Image Super-Resolution with Guarantees via Conformalized Generative Models
Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections
Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks
RETRO SYNFLOW: Discrete Flow-Matching for Accurate and Diverse Single-Step Retrosynthesis
ActiveVOO: Value of Observation Guided Active Knowledge Acquisition for Open-World Embodied Lifted Regression Planning
Template-Guided 3D Molecular Pose Generation via Flow Matching and Differentiable Optimization
Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search
One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding
Exponential Convergence Guarantees for Iterative Markovian Fitting
Incentivizing Time-Aware Fairness in Data Sharing
Matchings Under Biased and Correlated Evaluations
HOI-Dyn: Learning Interaction Dynamics for Human-Object Motion Diffusion
NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache
Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models
Go With the Flow: Fast Diffusion for Gaussian Mixture Models
Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
Towards Provable Emergence of In-Context Reinforcement Learning
Perturbation Bounds for Low-Rank Inverse Approximations under Noise
AC-LoRA: (Almost) Training-Free Access Control Aware Multi-Modal LLMs
Bridging Human and LLM Judgments: Understanding and Narrowing the Gap
Sharp Gaussian approximations for Decentralized Federated Learning
Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints
Bernstein–von Mises for Adaptively Collected Data
Marginal-Nonuniform PAC Learnability
Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos
True Impact of Cascade Length in Contextual Cascading Bandits
Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models
Bridging Critical Gaps in Convergent Learning: How Representational Alignment Evolves Across Layers, Training, and Distribution Shifts
Uncertainty Quantification for Deep Regression using Contextualised Normalizing Flows
The Emergence of Abstract Thought in Large Language Models Beyond Any Language
Generative Model Inversion Through the Lens of the Manifold Hypothesis
Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
Structure-Aware Spectral Sparsification via Uniform Edge Sampling
PROFIT: A Specialized Optimizer for Deep Fine Tuning
Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning
PINNs with Learnable Quadrature
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Two-Steps Diffusion Policy for Robotic Manipulation via Genetic Denoising
Disentangling Hyperedges through the Lens of Category Theory
Program Synthesis via Test-Time Transduction
Multi-Objective Reinforcement Learning with Max-Min Criterion: A Game-Theoretic Approach
Meta-D2AG: Causal Graph Learning with Interventional Dynamic Data
SPINT: Spatial Permutation-Invariant Neural Transformer for Consistent Intracortical Motor Decoding
Bandit Guided Submodular Curriculum for Adaptive Subset Selection
MACS: Multi-Agent Reinforcement Learning for Optimization of Crystal Structures
Overcoming Long Context Limitations of State Space Models via Context Dependent Sparse Attention
Track3R: Joint Point Map and Trajectory Prior for Spatiotemporal 3D Understanding
MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention
ReservoirTTA: Prolonged Test-time Adaptation for Evolving and Recurring Domains
Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits
Agnostic Active Learning Is Always Better Than Passive Learning
Who Reasons in the Large Language Models?
SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning
MIX: A Multi-view Time-Frequency Interactive Explanation Framework for Time Series Classification
VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
SimpleStrat: Diversifying Language Model Generation with Stratification
Learning to Rank for In-Context Example Retrieval
Remarkable Robustness of LLMs: Stages of Inference?
Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation
Compositional Neural Network Verification via Assume-Guarantee Reasoning
Efficient Randomized Experiments Using Foundation Models
Risk Bounds For Distributional Regression
All You Need is One: Capsule Prompt Tuning with a Single Vector
Skrull: Towards Efficient Long Context Fine-tuning through Dynamic Data Scheduling
TAPIP3D: Tracking Any Point in Persistent 3D Geometry
An Efficient Orlicz-Sobolev Approach for Transporting Unbalanced Measures on a Graph
Shallow Diffuse: Robust and Invisible Watermarking through Low-Dim Subspaces in Diffusion Models
Pixel Reasoner: Incentivizing Pixel Space Reasoning via Curiosity-Driven Reinforcement Learning
Adversary Aware Optimization for Robust Defense
OCTDiff: Bridged Diffusion Model for Portable OCT Super-Resolution and Enhancement
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training
On topological descriptors for graph products
Spike4DGS: Towards High-Speed Dynamic Scene Rendering with 4D Gaussian Splatting via a Spike Camera Array
Kernel von Mises Formula of the Influence Function
Learning Orthogonal Multi-Index Models: A Fine-Grained Information Exponent Analysis
FlexWorld: Progressively Expanding 3D Scenes for Flexible-View Exploration
High-Dimensional Calibration from Swap Regret
Improving Regret Approximation for Unsupervised Dynamic Environment Generation
Progressive Data Dropout: An Embarrassingly Simple Approach to Train Faster
Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time
Praxis-VLM: Vision-Grounded Decision Making via Text-Driven Reinforcement Learning
Score-informed Neural Operator for Enhancing Ordering-based Causal Discovery
ExGra-Med: Extended Context Graph Alignment for Medical Vision-Language Models
Is Grokking a Computational Glass Relaxation?
LLM Interpretability with Identifiable Temporal-Instantaneous Representation
Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions
Generating Creative Chess Puzzles
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
Latent Principle Discovery for Language Model Self-Improvement
SpikingVTG: A Spiking Detection Transformer for Video Temporal Grounding
Temporal In‑Context Fine‑Tuning for Versatile Control of Video Diffusion Models
On Optimal Steering to Achieve Exact Fairness
Evaluating Robustness of Monocular Depth Estimation with Procedural Scene Perturbations
PLMTrajRec: A Scalable and Generalizable Trajectory Recovery Method with Pre-trained Language Models
Bayes optimal learning of attention-indexed models
Informed Initialization for Bayesian Optimization and Active Learning
ELECTRA: A Cartesian Network for 3D Charge Density Prediction with Floating Orbitals
Checklists Are Better Than Reward Models For Aligning Language Models
EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization
Brain-Inspired fMRI-to-Text Decoding via Incremental and Wrap-Up Language Modeling
Max Entropy Moment Kalman Filter for Polynomial Systems with Arbitrary Noise
Measuring and Guiding Monosemanticity
OmniDraft: A cross-vocabulary, online adaptive drafter for on-device speculative decoding
Flow Matching Neural Processes
HeavyWater and SimplexWater: Distortion-free LLM Watermarks for Low-Entropy Distributions
Trajectory Graph Learning: Aligning with Long Trajectories in Reinforcement Learning Without Reward Design
Generalizing Experience for Language Agents with Hierarchical MetaFlows
Joint Design of Protein Surface and Backbone Using a Diffusion Bridge Model
Synthesizing Performance Constraints for Evaluating and Improving Code Efficiency
Deferring Concept Bottleneck Models: Learning to Defer Interventions to Inaccurate Experts
Provable Meta-Learning with Low-Rank Adaptations
Training-Free Safe Denoisers for Safe Use of Diffusion Models
ROOT: Rethinking Offline Optimization as Distributional Translation via Probabilistic Bridge
Learning Linear Attention in Polynomial Time
When and how can inexact generative models still sample from the data manifold?
L2DGCN: Learnable Enhancement and Label Selection Dynamic Graph Convolutional Networks for Mitigating Degree Bias
Anchor-based Maximum Discrepancy for Relative Similarity Testing
DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs
Performative Validity of Recourse Explanations
From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD
Ditch the Denoiser: Emergence of Noise Robustness in Self-Supervised Learning from Data Curriculum
Seeing Sound, Hearing Sight: Uncovering Modality Bias and Conflict of AI models in Sound Localization
Learning Stochastic Multiscale Models
ARECHO: Autoregressive Evaluation via Chain-Based Hypothesis Optimization for Speech Multi-Metric Estimation
Scale-invariant attention
Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected
On Evaluating LLM Alignment by Evaluating LLMs as Judges
Exact Expressive Power of Transformers with Padding
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
Blockwise Flow Matching: Improving Flow Matching Models For Efficient High-Quality Generation
Time-uniform and Asymptotic Confidence Sequence of Quantile under Local Differential Privacy
GMM-based VAE model with Normalising Flow for effective stochastic segmentation
Training a Scientific Reasoning Model for Chemistry
HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning
GeoClip: Geometry-Aware Clipping for Differentially Private SGD
Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL
Efficient Low Rank Attention for Long-Context Inference in Large Language Models
Uncertain Knowledge Graph Completion via Semi-Supervised Confidence Distribution Learning
PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching
RiverMamba: A State Space Model for Global River Discharge and Flood Forecasting
Differentially Private Relational Learning with Entity-level Privacy Guarantees
Towards Straggler-Resilient Split Federated Learning: An Unbalanced Update Approach
Imitation Learning with Temporal Logic Constraints
OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
Vector Quantization in the Brain: Grid-like Codes in World Models
Cycle-Sync: Robust Global Camera Pose Estimation through Enhanced Cycle-Consistent Synchronization
Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective
Adaptively Coordinating with Novel Partners via Learned Latent Strategies
Parameter-free Algorithms for the Stochastically Extended Adversarial Model
EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Network
TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding
S'MoRE: Structural Mixture of Residual Experts for Parameter-Efficient LLM Fine-tuning
Don’t Let It Fade: Preserving Edits in Diffusion Language Models via Token Timestep Allocation
BaRISTA: Brain Scale Informed Spatiotemporal Representation of Human Intracranial Neural Activity
Rao-Blackwellised Reparameterisation Gradients
Partial Correlation Network Estimation by Semismooth Newton Methods
Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning
Panoptic Captioning: An Equivalence Bridge for Image and Text
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
Convex Potential Mirror Langevin Algorithm for Efficient Sampling of Energy-Based Models
Understanding Adam Requires Better Rotation Dependent Assumptions
Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation
The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets
Diffusion Generative Modeling on Lie Group Representations
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Graph-based Symbolic Regression with Invariance and Constraint Encoding
Constrained Best Arm Identification
Stab-SGD: Noise-Adaptivity in Smooth Optimization with Stability Ratios
What Really is a Member? Discrediting Membership Inference via Poisoning
Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings
Vision Function Layer in Multimodal LLMs
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model
SimSort: A Data-Driven Framework for Spike Sorting by Large-Scale Electrophysiology Simulation
Efficient semantic uncertainty quantification in language models via diversity-steered sampling
Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models
LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders
AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Adaptive 3D Reconstruction via Diffusion Priors and Forward Curvature-Matching Likelihood Updates
Reverse Diffusion Sequential Monte Carlo Samplers
FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design
Siegel Neural Networks
Online Prediction with Limited Selectivity
Obliviator Reveals the Cost of Nonlinear Guardedness in Concept Erasure
AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking
CGS-GAN: 3D Consistent Gaussian Splatting GANs for High Resolution Human Head Synthesis
OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning
Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs
To Distill or Decide? Understanding the Algorithmic Trade-off in Partially Observable RL
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization
Training Language Models to Reason Efficiently
Simulating Society Requires Simulating Thought
URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model
EventMG: Efficient Multilevel Mamba-Graph Learning for Spatiotemporal Event Representation
Whole-Body Conditioned Egocentric Video Prediction
OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles
LaViDa: A Large Diffusion Model for Vision-Language Understanding
Generative Data Augmentation via Diffusion Distillation, Adversarial Alignment, and Importance Reweighting
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay
DEFT: Decompositional Efficient Fine-Tuning for Text-to-Image Models
EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning
VERA: Variational Inference Framework for Jailbreaking Large Language Models
EPFL-Smart-Kitchen: An Ego-Exo Multi-Modal Dataset for Challenging Action and Motion Understanding in Video-Language Models
MoRIC: A Modular Region-based Implicit Codec for Image Compression
SHAP Meets Tensor Networks: Provably Tractable Explanations with Parallelism
Additive Models Explained: A Computational Complexity Approach
The Dual Nature of Plasticity Loss in Deep Continual Learning: Dissection and Mitigation
Information-Driven Design of Imaging Systems
MOTION: Multi-Sculpt Evolutionary Coarsening for Federated Continual Graph Learning
RNNs perform task computations by dynamically warping neural representations
When Thinking Drifts: Evidential Grounding for Robust Video Reasoning
BlockDecoder: Boosting ASR Decoders with Context and Merger Modules
Smooth Quadratic Prediction Markets
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks
Fairness-Regularized Online Optimization with Switching Costs
GeneFlow: Translation of Single-cell Gene Expression to Histopathological Images via Rectified Flow
Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning
KaRF: Weakly-Supervised Kolmogorov-Arnold Networks-based Radiance Fields for Local Color Editing
Dynamic Semantic-Aware Correlation Modeling for UAV Tracking
On the Convergence of Single-Timescale Actor-Critic
From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards
A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities
REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
Physics-informed machine learning with domain decomposition and global dynamics for three-dimensional intersecting flows
Thinking vs. Doing: Improving Agent Reasoning by Scaling Test-Time Interaction
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
Recursive Inference Scaling: A Winning Path to Scalable Inference in Language and Multimodal Systems
A Reinforcement Learning-based Bidding Strategy for Data Consumers in Auction-based Federated Learning
Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs
Differential Privacy for Euclidean Jordan Algebra with Applications to Private Symmetric Cone Programming
From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning
Revisiting Semi-Supervised Learning in the Era of Foundation Models
E-MoFlow: Learning Egomotion and Optical Flow from Event Data via Implicit Regularization
Learning to Flow from Generative Pretext Tasks for Neural Architecture Encoding
Entropy-Calibrated Label Distribution Learning
On scalable and efficient training of diffusion samplers
Noise Matters: Optimizing Matching Noise for Diffusion Classifiers
Refinement Methods for Distributed Distribution Estimation under $\ell^p$-Losses
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
FACE: Faithful Automatic Concept Extraction
Online robust locally differentially private learning for nonparametric regression
Differential Privacy on Fully Dynamic Streams
FSI-Edit: Frequency and Stochasticity Injection for Flexible Diffusion-Based Image Editing
DeltaPhi: Physical States Residual Learning for Neural Operators in Data-Limited PDE Solving
Reinforcement Learning with Backtracking Feedback
Imbalances in Neurosymbolic Learning: Characterization and Mitigating Strategies
MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction
No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes
PhySwin: An Efficient and Physically-Informed Foundation Model for Multispectral Earth Observation
Wasserstein Transfer Learning
TAMI: Taming Heterogeneity in Temporal Interactions for Temporal Graph Link Prediction
Controlled Visual Hallucination via Thalamus-Driven Decoupling Network for Domain Adaptation of Black-Box Predictors
Private Evolution Converges
Data-Dependent Regret Bounds for Constrained MABs
Metropolis Adjusted Microcanonical Hamiltonian Monte Carlo
Solving and Learning Partial Differential Equations with Variational Q-Exponential Processes
Inference with correlated priors using sisters cells
CamEdit: Continuous Camera Parameter Control for Photorealistic Image Editing
Neural Hamiltonian Diffusions for Modeling Structured Geometric Dynamics
Shortcut Features as Top Eigenfunctions of NTK: A Linear Neural Network Case and More
Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention
Approximate Gradient Coding for Distributed Learning with Heterogeneous Stragglers
What Matters in Data for DPO?
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training
Distance-informed Neural Processes
Improved Best-of-Both-Worlds Regret for Bandits with Delayed Feedback
Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Can LLMs Reason Over Non-Text Modalities in a Training-Free Manner? A Case Study with In-Context Representation Learning
Optimizing Retrieval for RAG via Reinforced Contrastive Learning
Execution Guided Line-by-Line Code Generation
RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers
Exploring Tradeoffs through Mode Connectivity for Multi-Task Learning
Optimize the Unseen - Fast NeRF Cleanup with Free Space Prior
Tightening Regret Lower and Upper Bounds in Restless Rising Bandits
Deeper with Riemannian Geometry: Overcoming Oversmoothing and Oversquashing for Graph Foundation Models
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms
Uncertainty-Guided Exploration for Efficient AlphaZero Training
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
AlphaBeta is not as good as you think: a simple random games model for a better analysis of deterministic game-solving algorithms
IDOL: Meeting Diverse Distribution Shifts with Prior Physics for Tropical Cyclone Multi-Task Estimation
OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model
Generating Full-field Evolution of Physical Dynamics from Irregular Sparse Observations
Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation
Continuous Thought Machines
Sampled Estimators For Softmax Must Be Biased
TP-MDDN: Task-Preferenced Multi-Demand-Driven Navigation with Autonomous Decision-Making
Metritocracy: Representative Metrics for Lite Benchmarks
DualCnst: Enhancing Zero-Shot Out-of-Distribution Detection via Text-Image Consistency in Vision-Language Models
Refusal Direction is Universal Across Safety-Aligned Languages
LoRA vs Full Fine-tuning: An Illusion of Equivalence
ARMesh: Autoregressive Mesh Generation via Next-Level-of-Detail Prediction
Multi-dataset Joint Pre-training of Emotional EEG Enables Generalizable Affective Computing
On Group Sufficiency Under Label Bias
CIDD: Collaborative Intelligence for Structure-Based Drug Design Empowered by LLMs
Nonlinearly Preconditioned Gradient Methods: Momentum and Stochastic Analysis
Synthetic Series-Symbol Data Generation for Time Series Foundation Models
SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought
Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning
Continual Optimization with Symmetry Teleportation for Multi-Task Learning
Listwise Preference Diffusion Optimization for User Behavior Trajectories Prediction
Revisiting 1-peer exponential graph for enhancing decentralized learning efficiency
AI-Researcher: Autonomous Scientific Innovation
Variational Inference with Mixtures of Isotropic Gaussians
Revisiting Generative Infrared and Visible Image Fusion Based on Human Cognitive Laws
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
Learning normalized image densities via dual score matching
Many LLMs Are More Utilitarian Than One
Agnostic Continuous-Time Online Learning
LayerNavigator: Finding Promising Intervention Layers for Efficient Activation Steering in Large Language Models
MMaDA: Multimodal Large Diffusion Language Models
DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging
Universally Invariant Learning in Equivariant GNNs
Why Playing Against Diverse and Challenging Opponents Speeds Up Coevolution: A Theoretical Analysis on Combinatorial Games
Pinpointing Attention-Causal Communication in Language Models
Functional Matching of Logic Subgraphs: Beyond Structural Isomorphism
Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering
MESS+: Dynamically Learned Inference-Time LLM Routing in Model Zoos with Service Level Guarantees
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
Model-Informed Flows for Bayesian Inference
Aggregation Hides Out-of-Distribution Generalization Failures from Spurious Correlations
Diversity-oriented Deep Multi-modal Clustering
Beyond Higher Rank: Token-wise Input-Output Projections for Efficient Low-Rank Adaptation
Multivariate Time Series Anomaly Detection with Idempotent Reconstruction
Federated Continual Learning via Orchestrating Multi-Scale Expertise
Adversarial Graph Fusion for Incomplete Multi-view Semi-supervised Learning with Tensorial Imputation
Connectome-Based Modelling Reveals Orientation Maps in the Drosophila Optic Lobe
BlurGuard: A Simple Approach for Robustifying Image Protection Against AI-Powered Editing
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Improving Model-Based Reinforcement Learning by Converging to Flatter Minima
InstructRestore: Region-Customized Image Restoration with Human Instructions
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
GoalLadder: Incremental Goal Discovery with Vision-Language Models
OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates
Synthetic-powered predictive inference
Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification
Synergy Between the Strong and the Weak: Spiking Neural Networks are Inherently Self-Distillers
Adapting to Stochastic and Adversarial Losses in Episodic MDPs with Aggregate Bandit Feedback
ComRank: Ranking Loss for Multi-Label Complementary Label Learning
Towards Implicit Aggregation: Robust Image Representation for Place Recognition in the Transformer Era
Generalized Category Discovery under Domain Shift: A Frequency Domain Perspective
Leveraging semantic similarity for experimentation with AI-generated treatments
SplashNet: Split-and-Share Encoders for Accurate and Efficient Typing with Surface Electromyography
A Hierarchy of Graphical Models for Counterfactual Inferences
ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding
DynaNav: Dynamic Feature and Layer Selection for Efficient Visual Navigation
The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
Squared families are useful conjugate priors
Assignments for Congestion-Averse Agents: Seeking Competitive and Envy-Free Solutions
Adaptive Data-Borrowing for Improving Treatment Effect Estimation using External Controls
Revisiting Consensus Error: A Fine-grained Analysis of Local SGD under Second-order Data Heterogeneity
$\Delta \mathrm{Energy}$: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization
One Filters All: A Generalist Filter For State Estimation
Purest Quantum State Identification
FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation
Value Diffusion Reinforcement Learning
Hippocampal-like Sequential Editing for Continual Knowledge Updates in Large Language Models
Seeds of Structure: Patch PCA Reveals Universal Compositional Cues in Diffusion Models
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
What Makes a Reward Model a Good Teacher? An Optimization Perspective
Robust Satisficing Gaussian Process Bandits Under Adversarial Attacks
SAEMark: Steering Personalized Multilingual LLM Watermarks with Sparse Autoencoders
Constrained Discrete Diffusion
Far from the Shallow: Brain-Predictive Reasoning Embedding through Residual Disentanglement
Learning 3D Anisotropic Noise Distributions Improves Molecular Force Fields
Statistical Guarantees for High-Dimensional Stochastic Gradient Descent
Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization
Mitra: Mixed Synthetic Priors for Enhancing Tabular Foundation Models
BayeSQP: Bayesian Optimization through Sequential Quadratic Programming
Preserving Task-Relevant Information Under Linear Concept Removal
Learning Crossmodal Interaction Patterns via Attributed Bipartite Graphs for Single-Cell Omics
LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding
Corrector Sampling in Language Models
DSCS: Fast CPDAG-Based Verification of Collapsible Submodels in High-Dimensional Bayesian Networks
Large Language Models as End-to-end Combinatorial Optimization Solvers
TimeEmb: A Lightweight Static-Dynamic Disentanglement Framework for Time Series Forecasting
Learning to Learn with Contrastive Meta-Objective
Miss-ReID: Delivering Robust Multi-Modality Object Re-Identification Despite Missing Modalities
FRAM: Frobenius-Regularized Assignment Matching with Mixed-Precision Computing
SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment
On Reasoning Strength Planning in Large Reasoning Models
DAAC: Discrepancy-Aware Adaptive Contrastive Learning for Medical Time series
Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits
InstanceAssemble: Layout-Aware Image Generation via Instance Assembling Attention
Hypergraph-Enhanced Contrastive Learning for Multi-View Clustering with Hyper-Laplacian Regularization
Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks
Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness
EVODiff: Entropy-aware Variance Optimized Diffusion Inference
Bridging Arbitrary and Tree Metrics via Differentiable Gromov Hyperbolicity
A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models
Embeddings as Probabilistic Equivalence in Logic Programs
PRESTO: Preimage-Informed Instruction Optimization for Prompting Black-Box LLMs
Stochastic Gradients under Nuisances
Missing Data Imputation by Reducing Mutual Information with Rectified Flows
Remasking Discrete Diffusion Models with Inference-Time Scaling
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing
On the Sample Complexity of Differentially Private Policy Optimization
Color Conditional Generation with Sliced Wasserstein Guidance
Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing
Provable Watermarking for Data Poisoning Attacks
Efficient Part-level 3D Object Generation via Dual Volume Packing
Model Provenance Testing for Large Language Models
Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models
Noise Consistency Training: A Native Approach for One-step Generator in Learning Additional Controls
Algorithm- and Data-Dependent Generalization Bounds for Diffusion Models
Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
SSIMBaD: Sigma Scaling with SSIM-Guided Balanced Diffusion for AnimeFace Colorization
VideoMAR: Autoregressive Video Generation with Continuous Tokens
Towards Multiscale Graph-based Protein Learning with Geometric Secondary Structural Motifs
Joint Modeling of fMRI and EEG Imaging Using Ordinary Differential Equation-Based Hypergraph Neural Networks
BridgePure: Limited Protection Leakage Can Break Black-Box Data Protection
PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer
Ascent Fails to Forget
Towards Predicting Any Human Trajectory In Context
Brain Harmony: A Multimodal Foundation Model Unifying Morphology and Function into 1D Tokens
A Partition Cover Approach to Tokenization
Machine Unlearning via Task Simplex Arithmetic
VL-SAM-V2: Open-World Object Detection with General and Specific Query Fusion
Local-Global Associative Frames for Symmetry-Preserving Crystal Structure Modeling
Are Large Language Models Sensitive to the Motives Behind Communication?
ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio–Language Models
Time Series Generation Under Data Scarcity: A Unified Generative Modeling Approach
Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection
AI-Generated Video Detection via Perceptual Straightening
Controlling The Spread of Epidemics on Networks with Differential Privacy
MutualVPR: A Mutual Learning Framework for Resolving Supervision Inconsistencies via Adaptive Clustering
GeoLink: Empowering Remote Sensing Foundation Model with OpenStreetMap Data
Multi-Expert Distributionally Robust Optimization for Out-of-Distribution Generalization
Rethinking Tokenized Graph Transformers for Node Classification
A Diffusion Model for Regular Time Series Generation from Irregular Data with Completion and Masking
SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning
A Minimalistic Unified Framework for Incremental Learning across Image Restoration Tasks
CLiFT: Compressive Light-Field Tokens for Compute Efficient and Adaptive Neural Rendering
Generalizing Single-Frame Supervision to Event-Level Understanding for Video Anomaly Detection
Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools
Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection
Conformal Linguistic Calibration: Trading-off between Factuality and Specificity
Continuous Q-Score Matching: Diffusion Guided Reinforcement Learning for Continuous-Time Control
TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
GraSS: Scalable Data Attribution with Gradient Sparsification and Sparse Projection
Diverse Influence Component Analysis: A Geometric Approach to Nonlinear Mixture Identifiability
Inferring stochastic dynamics with growth from cross-sectional data
Consistent Supervised-Unsupervised Alignment for Generalized Category Discovery
Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models
NoPo-Avatar: Generalizable and Animatable Avatars from Sparse Inputs without Human Poses
Uncertainty Estimation by Flexible Evidential Deep Learning
Nested Learning: The Illusion of Deep Learning Architectures
Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression
Unveiling Environmental Sensitivity of Individual Gains in Influence Maximization
AutoEdit: Automatic Hyperparameter Tuning for Image Editing
Learning Temporal 3D Semantic Scene Completion via Optical Flow Guidance
Attention Mechanism, Max-Affine Partition, and Universal Approximation
Keep It on a Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning
Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents
PRSformer: Disease Prediction from Million-Scale Individual Genotypes
RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion
Uncoupled and Convergent Learning in Monotone Games under Bandit Feedback
The Boundaries of Fair AI in Medical Image Prognosis: A Causal Perspective
Preference-Guided Diffusion for Multi-Objective Offline Optimization
Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback
Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from a Single Image
Actor-Free Continuous Control via Structurally Maximizable Q-Functions
Curriculum Model Merging: Harmonizing Chemical LLMs for Enhanced Cross-Task Generalization
Towards Comprehensive Scene Understanding: Integrating First and Third-Person Views for LVLMs
Rectifying Shortcut Behaviors in Preference-based Reward Learning
Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws
Graphs Help Graphs: Multi-Agent Graph Socialized Learning
Do Neural Networks Need Gradient Descent to Generalize? A Theoretical Study
Provable Scaling Laws for the Test-Time Compute of Large Language Models
Smooth Sailing: Lipschitz-Driven Uncertainty Quantification for Spatial Associations
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations
Optimization Inspired Few-Shot Adaptation for Large Language Models
Mixture of Inputs: Text Generation Beyond Discrete Token Sampling
Individually Fair Diversity Maximization
Information-theoretic Generalization Analysis for VQ-VAEs: A Role of Latent Variables
ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation
Modeling Dynamic Neural Activity by combining Naturalistic Video Stimuli and Stimulus-independent Latent Factors
QuadEnhancer: Leveraging Quadratic Transformations to Enhance Deep Neural Networks
Near-Optimal Sample Complexity for Online Constrained MDPs
LabelAny3D: Label Any Object 3D in the Wild
Equivariance by Contrast: Identifiable Equivariant Embeddings from Unlabeled Finite Group Actions
Generating Computational Cognitive models using Large Language Models
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting
The Burden of Interactive Alignment with Inconsistent Preferences
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
The Parameterized Complexity of Computing the VC-Dimension
Robust Equilibria in Continuous Games: From Strategic to Dynamic Robustness
Individual Regret in Cooperative Stochastic Multi-Armed Bandits
Practical Kernel Selection for Kernel-based Conditional Independence Test
Continual Multimodal Contrastive Learning
Accelerating Parallel Diffusion Model Serving with Residual Compression
RLVR-World: Training World Models with Reinforcement Learning
AdvEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents
Generalizable Insights for Graph Transformers in Theory and Practice
Learning Urban Climate Dynamics via Physics-Guided Urban Surface–Atmosphere Interactions
How Does Topology Bias Distort Message Passing in Graph Recommender? A Dirichlet Energy Perspective
One-Step Offline Distillation of Diffusion-based Models via Koopman Modeling
Stepsize anything: A unified learning rate schedule for budgeted-iteration training
Fréchet Geodesic Boosting
Functional Complexity-adaptive Temporal Tensor Decomposition
Entropy Rectifying Guidance for Diffusion and Flow Models
MALinZero: Efficient Low-Dimensional Search for Mastering Complex Multi-Agent Planning
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
Discovering Symbolic Partial Differential Equation by Abductive Learning
An Iterative Algorithm for Differentially Private $k$-PCA with Adaptive Noise
UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation
Incentivizing Desirable Effort Profiles in Strategic Classification: The Role of Causality and Uncertainty
Exact and Linear Convergence for Federated Learning under Arbitrary Client Participation is Attainable
Meta-Learning Objectives for Preference Optimization
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Efficient Algorithms for Robust and Partial Semi-Discrete Optimal Transport
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
Generative diffusion for perceptron problems: statistical physics analysis and efficient algorithms
Sparse Diffusion Autoencoder for Test-time Adapting Prediction of Complex Systems
Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models?
GLVD: Guided Learned Vertex Descent
Learning Spatial-Aware Manipulation Ordering
Continuous Subspace Optimization for Continual Learning
Probing Hidden Knowledge Holes in Unlearned LLMs
Single GPU Task Adaptation of Pathology Foundation Models for Whole Slide Image Analysis
Progress Reward Model for Reinforcement Learning via Large Language Models
VCM: Vision Concept Modeling with Adaptive Vision Token Compression via Instruction Fine-Tuning
Investigating and Mitigating Catastrophic Forgetting in Medical Knowledge Injection through Internal Knowledge Augmentation Learning
A Reliable Cryptographic Framework for Empirical Machine Unlearning Evaluation
Prediction-Powered Semi-Supervised Learning with Online Power Tuning
SALoM: Structure Aware Temporal Graph Networks with Long-Short Memory Updater
GeGS-PCR: Fast and Robust Color 3D Point Cloud Registration with Two-Stage Geometric-3DGS Fusion
Tight analyses of first-order methods with error feedback
Toward Human Deictic Gesture Target Estimation
Channel Matters: Estimating Channel Influence for Multivariate Time Series
AnimateQR: Bridging Aesthetics and Functionality in Dynamic QR Code Generation
PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement
Attention on the Sphere
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
A Cautionary Tale on Integrating Studies with Disparate Outcome Measures for Causal Inference
SpiderSolver: A Geometry-Aware Transformer for Solving PDEs on Complex Geometries
Watermarking Autoregressive Image Generation
Energy-based generator matching: A neural sampler for general state space
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
Adversarial generalization of unfolding (model-based) networks
BlockScan: Detecting Anomalies in Blockchain Transactions
Real-Time Execution of Action Chunking Flow Policies
PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models
ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking
Uncertainty Quantification with the Empirical Neural Tangent Kernel
Scaling Law with Learning Rate Annealing
Robust Minimax Boosting with Performance Guarantees
Open-Vocabulary Part Segmentation via Progressive and Boundary-Aware Strategy
The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense
Cross-modal Associations in Vision and Language Models: Revisiting the Bouba-Kiki Effect
Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings
Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning
Elastic Robust Unlearning of Specific Knowledge in Large Language Models
VORTA: Efficient Video Diffusion via Routing Sparse Attention
Sketch-Augmented Features Improve Learning Long-Range Dependencies in Graph Neural Networks
Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol
DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving
Robust Hallucination Detection in LLMs via Adaptive Token Selection
Analyzing the Power of Chain of Thought through Memorization Capabilities
Image Token Matters: Mitigating Hallucination in Discrete Tokenizer-based Large Vision-Language Models via Latent Editing
StateSpaceDiffuser: Bringing Long Context to Diffusion World Models
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Separating the 'what' and 'how' of compositional computation to enable reuse and continual learning
From Pose to Muscle: Multimodal Learning for Piano Hand Muscle Electromyography
Pairwise Optimal Transports for Training All-to-All Flow-Based Condition Transfer Model
Learning-Augmented Algorithms for $k$-median via Online Learning
Whose Instructions Count? Resolving Preference Bias in Instruction Fine-Tuning
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models
A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications
Beyond Benign Overfitting in Nadaraya-Watson Interpolators
VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree
Synergy over Discrepancy: A Partition-Based Approach to Multi-Domain LLM Fine-Tuning
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment
End-to-End Low-Light Enhancement for Object Detection with Learned Metadata from RAWs
Discrete Diffusion Models: Novel Analysis and New Sampler Guarantees
Pre-Trained Policy Discriminators are General Reward Models
MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
Learning Robust Spectral Dynamics for Temporal Domain Generalization
MS-GS: Multi-Appearance Sparse-View 3D Gaussian Splatting in the Wild
On the $O(\frac{\sqrt{d}}{K^{1/4}})$ Convergence Rate of AdamW Measured by $\ell_1$ Norm
Mitigating the Privacy–Utility Trade-off in Decentralized Federated Learning via f-Differential Privacy
Rethinking Out-of-Distribution Detection and Generalization with Collective Behavior Dynamics
Optimize Any Topology: A Foundation Model for Shape- and Resolution-Free Structural Topology Optimization
Robust Distortion-Free Watermark for Autoregressive Audio Generation Models
Scaling Language-centric Omnimodal Representation Learning
Understanding and Improving Fast Adversarial Training against $l_0$ Bounded Perturbations
Enhanced Cyclic Coordinate Descent Methods for Elastic Net Penalized Linear Models
Quasi-Self-Concordant Optimization with $\ell_{\infty}$ Lewis Weights
SAP: Exact Sorting in Splatting via Screen-Aligned Primitives
Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context
LoRO: Real-Time on-Device Secure Inference for LLMs via TEE-Based Low Rank Obfuscation
Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers
Selftok-Zero: Reinforcement Learning for Visual Generation via Discrete and Autoregressive Visual Tokens
Mask Image Watermarking
Equivariant Eikonal Neural Networks: Grid-Free, Scalable Travel-Time Prediction on Homogeneous Spaces
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Learning in Compact Spaces with Approximately Normalized Transformer
ShoeFit: A New Dataset and Dual-image-stream DiT Framework for Virtual Footwear Try-On
Angular Steering: Behavior Control via Rotation in Activation Space
Parallel Scaling Law for Language Models
Beyond Modality Collapse: Representation Blending for Multimodal Dataset Distillation
A Gradient Guidance Perspective on Stepwise Preference Optimization for Diffusion Models
Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving
TreeGen: A Bayesian Generative Model for Hierarchies
Energy Landscape-Aware Vision Transformers: Layerwise Dynamics and Adaptive Task-Specific Training via Hopfield States
Generalized and Invariant Single-Neuron In-Vivo Activity Representation Learning
Representation Consistency for Accurate and Coherent LLM Answer Aggregation
Crucible: Quantifying the Potential of Control Algorithms through LLM Agents
Implicit-ARAP: Efficient Handle-Guided Neural Field Deformation via Local Patch Meshing
GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity
Retrieval is Not Enough: Enhancing RAG through Test-Time Critique and Optimization
Cost-Sensitive Freeze-thaw Bayesian Optimization for Efficient Hyperparameter Tuning
Can Large Language Models Master Complex Card Games?
Online Functional Tensor Decomposition via Continual Learning for Streaming Data Completion
Relieving the Over-Aggregating Effect in Graph Transformers
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
Ambient Proteins - Training Diffusion Models on Noisy Structures
Consistency of Physics-Informed Neural Networks for Second-Order Elliptic Equations
Don't Just Chase “Highlighted Tokens” in MLLMs: Revisiting Visual Holistic Context Retention
MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding
More of the Same: Persistent Representational Harms Under Increased Representation
Federated Dialogue-Semantic Diffusion for Emotion Recognition under Incomplete Modalities
Collapsing Taylor Mode Automatic Differentiation
Spectral Analysis of Diffusion Models with Application to Schedule Design
Deep Nonlinear Sufficient Dimension Reduction
GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization
Structure Matters: Dynamic Policy Gradient
Statistical Inference for Decentralized Federated Learning
A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules
Unified Algorithms for RL with Decision-Estimation Coefficients: PAC, Reward-Free, Preference-Based Learning, and Beyond
A duality framework for analyzing random feature and two-layer neural networks
Testing Stationarity and Change Point Detection in Reinforcement Learning
From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization
Policy learning “without” overlap: Pessimism and generalized empirical Bernstein’s inequality
Asymptotic Theory of Geometric and Adaptive $k$-Means Clustering
Online Statistical Inference in Decision Making with Matrix Context
Estimation and Inference in Distributional Reinforcement Learning
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
Robust Transfer Learning with Unreliable Source Data
Improved Learning Theory for Kernel Distribution Regression with Two-Stage Sampling
A Geometrical Analysis of Kernel Ridge Regression and its Applications
Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift
Neural Networks Generalize on Low Complexity Data
Distributionally Robust Learning for Multi-source Unsupervised Domain Adaptation
Versatile differentially private learning for general loss functions
Fine-grained Analysis and Faster Algorithms for Iteratively Solving Linear Systems
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
Offline Actor-Critic for Average Reward MDPs
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
Geometric Learning with Positively Decomposable Kernels
Stochastic-Constrained Stochastic Optimization with Markovian Data
Dropout Regularization Versus l2-Penalization in the Linear Model
Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces II: non-compact symmetric spaces
Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces I: the compact case
Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control
We Should Chart an Atlas of All the World's Models
Position: Machine Learning Conferences Should Establish a "Refutations and Critiques" Track
Stop the Nonconsensual Use of Nude Images in Research
Large Language Models Miss the Multi-agent Mark
Position: Biology is the Challenge Physics-Informed ML Needs to Evolve
AI Testing Should Account for Sophisticated Strategic Behaviour
Position: If Innovation in AI systematically Violates Fundamental Rights, Is It Innovation at All?
Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition
Embracing Contradiction: Theoretical Inconsistency Will Not Impede the Road of Building Responsible AI Systems
Collective Bargaining in the Information Economy Can Address AI-Driven Power Concentration
Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
Rigor in AI: Doing Rigorous AI Work Requires a Broader, Responsible AI-Informed Conception of Rigor
SMRS: advocating a unified reporting standard for surrogate models in the artificial intelligence era
Embracing Trustworthy Brain-Agent Collaboration as Paradigm Extension for Intelligent Assistive Technologies
FACT: Mitigating Inconsistent Hallucinations in LLMs via Fact-Driven Alternating Code-Text Training
NeurIPS should lead scientific consensus on AI policy
World Models Should Prioritize the Unification of Physical and Social Dynamics
Comparison requires valid measurement: Rethinking attack success rate comparisons in AI red teaming
Fostering the Ecosystem of AI for Social Impact Requires Expanding and Strengthening Evaluation Standards
Emerging Risks from Embodied AI Require Urgent Policy Action
Military AI Needs Technically-Informed Regulation to Safeguard AI Research and its Applications
Statistically Valid Post-Deployment Monitoring Should Be Standard for AI-Based Digital Health
SGCD: Stain-Guided CycleDiffusion for Unsupervised Domain Adaptation of Histopathology Image Classification
Shortcutting Pre-trained Flow Matching Diffusion Models is Almost Free Lunch
The Rich and the Simple: On the Implicit Bias of Adam and SGD
A Sustainable AI Economy Needs Data Deals That Work for Generators
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research
LLM Generated Persona is a Promise with a Catch
More effort is needed to protect pedestrian privacy in the era of AI
Setting $\varepsilon$ is not the Issue in Differential Privacy
Prohibiting Generative AI in any Form of Weapon Control
Don’t call it privacy-preserving or human-centric pose estimation if you don’t measure privacy
Position: Benchmarking is Broken - Don't Let AI be Its Own Judge
Position: Require Frontier AI Labs To Release Small "Analog" Models
Position: AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift
Prompting as Scientific Inquiry
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting
PanTS: The Pancreatic Tumor Segmentation Dataset
RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation
PolypSense3D: A Multi-Source Benchmark Dataset for Depth-Aware Polyp Size Measurement in Endoscopy
PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries
DataSIR: A Benchmark Dataset for Sensitive Information Recognition
Meta-World+: An Improved, Standardized, RL Benchmark
Scaling Physical Reasoning with the PHYSICS Dataset
EuroSpeech: A Multilingual Speech Corpus
GUARD: Constructing Realistic Two-Player Matrix and Security Games for Benchmarking Game-Theoretic Algorithms
ORBIT - Open Recommendation Benchmark for Reproducible Research with Hidden Tests
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance
Whose View of Safety? A Deep DIVE Dataset for Pluralistic Alignment of Text-to-Image Models
Intend to Move: A Multimodal Dataset for Intention-Aware Human Motion Understanding
Bubbleformer: Forecasting Boiling with Transformers
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans
Mitigating Semantic Collapse in Partially Relevant Video Retrieval
VeriThoughts: Enabling Automated Verilog Code Generation using Reasoning and Formal Verification
Towards Principled Unsupervised Multi-Agent Reinforcement Learning
Breaking the Frozen Subspace: Importance Sampling for Low-Rank Optimization in LLM Pretraining
ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code
MLLM-ISU: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models based Intrusion Scene Understanding
Multi-Objective One-Shot Pruning for Large Language Models
MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization
Satellites Reveal Mobility: A Commuting Origin-destination Flow Generator for Global Cities
InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback
MARS-VFL: A Unified Benchmark for Vertical Federated Learning with Realistic Evaluation
Bridging the Gap Between Cross-Domain Theory and Practical Application: A Case Study on Molecular Dissolution
DAVE: Diagnostic benchmark for Audio Visual Evaluation
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation
Video-R1: Reinforcing Video Reasoning in MLLMs
OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models
Safe and Stable Control via Lyapunov-Guided Diffusion Models
Sheetpedia: A 300K-Spreadsheet Corpus for Spreadsheet Intelligence and LLM Fine-Tuning
ConnectomeBench: Can LLMs proofread the connectome?
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
Struct-Bench: A Benchmark for Differentially Private Structured Text Generation
PSMBench: A Benchmark and Dataset for Evaluating LLMs Extraction of Protocol State Machines from RFC Specifications
HouseLayout3D: A Benchmark and Training-free Baseline for 3D Layout Estimation in the Wild
A Semantic Parsing Framework for End-to-End Time Normalization
CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs
LawShift: Benchmarking Legal Judgment Prediction Under Statute Shifts
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
PHANTOM: A Benchmark for Hallucination Detection in Financial Long-Context QA
Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness
Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring
SWE-smith: Scaling Data for Software Engineering Agents
One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution
Optimal Spectral Transitions in High-Dimensional Multi-Index Models
Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies
Bidirectional Motion Transformer for Safety-Critical Traffic Scenario Generation
Bringing SAM to new heights: leveraging elevation data for tree crown segmentation from drone imagery
Factorio Learning Environment
Learning-Augmented Online Bipartite Fractional Matching
MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models
Merlin L48 Spectrogram Dataset
The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning
Simulation-Based Inference for Adaptive Experiments
MolVision: Molecular Property Prediction with Vision Language Models
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving of Inequalities
VideoCAD: A Dataset and Model for Learning Long‑Horizon 3D CAD UI Interactions from Video
mmWalk: Towards Multi-modal Multi-view Walking Assistance
Introducing FOReCAst: The Future Outcome Reasoning and Confidence Assessment Benchmark
Learning from positive and unlabeled examples - Finite size sample bounds
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
MedSG-Bench: A Benchmark for Medical Image Sequences Grounding
CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming
CheMixHub: Datasets and Benchmarks for Chemical Mixture Property Prediction
Counterfactual Evolution of Multimodal Datasets via Visual Programming
LooGLE v2: Are LLMs Ready for Real World Long Dependency Challenges?
CoreaSpeech: Korean Speech Corpus via JAMO-based Coreset Selection for Efficient and Robust Korean Speech Generation
PSBench: a large-scale benchmark for estimating the accuracy of protein complex structural models
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
scGeneScope: A Treatment-Matched Single Cell Imaging and Transcriptomics Dataset and Benchmark for Treatment Response Modeling
TAPAS: Datasets for Learning the Learning with Errors Problem
VaporTok: RL-Driven Adaptive Video Tokenizer with Prior & Task Awareness
Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs
FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes
Diffusion Feature Field for Text-based 3D Editing with Gaussian Splatting
Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability
GTPBD: A Fine-Grained Global Terraced Parcel and Boundary Dataset
Revisiting Glorot Initialization for Long-Range Linear Recurrences
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding
Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models
A Multi-Task Benchmark for Abusive Language Detection in Low-Resource Settings
LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents
SeePhys: Does Seeing Help Thinking? – Benchmarking Vision-Based Physics Reasoning
Controllable 3D Molecular Generation for Structure-Based Drug Design Through Bayesian Flow Networks and Gradient Integration
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads
DQVis Dataset: Natural Language to Biomedical Visualization
TreeFinder: A US-Scale Benchmark Dataset for Individual Tree Mortality Monitoring Using High-Resolution Aerial Imagery
Aeolus: A Multi-structural Flight Delay Dataset
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks
A Technical Report on “Erasing the Invisible”: The 2024 NeurIPS Competition on Stress Testing Image Watermarks
Anatomically inspired digital twins capture hierarchical object representations in visual cortex
Bridging Crypto with ML-based Solvers: the SAT Formulation and Benchmarks
SaFiRe: Saccade-Fixation Reiteration with Mamba for Referring Image Segmentation
Do You Really Need Public Data? Surrogate Public Data for Differential Privacy on Tabular Data
GPO: Learning from Critical Steps to Improve LLM Reasoning
Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity
Collaborating Vision, Depth, and Thermal Signals for Multi-Modal Tracking: Dataset and Algorithm
REFED: A Subject Real-time Dynamic Labeled EEG-fNIRS Synchronized Recorded Emotion Dataset
Benchmarking Large Language Models with Integer Sequence Generation Tasks
NFL-BA: Near-Field Light Bundle Adjustment for SLAM in Dynamic Lighting
Unified 2D-3D Discrete Priors for Noise-Robust and Calibration-Free Multiview 3D Human Pose Estimation
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
GMV: A Unified and Efficient Graph Multi-View Learning Framework
LithoSim: A Large, Holistic Lithography Simulation Benchmark for AI-Driven Semiconductor Manufacturing
RGB-to-Polarization Estimation: A New Task and Benchmark Study
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Meta-learning how to Share Credit among Macro-Actions
Reframing Gaussian Splatting Densification with Complexity-Density Consistency of Primitives
Transformers are almost optimal metalearners for linear classification
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
Establishing Best Practices in Building Rigorous Agentic Benchmarks
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly
The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models
MMPB: It’s Time for Multi-Modal Personalization
OligoGym: Curated Datasets and Benchmarks for Oligonucleotide Drug Discovery
OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps
MDReID: Modality-Decoupled Learning for Any-to-Any Multi-Modal Object Re-Identification
Lattice Boltzmann Model for Learning Real-World Pixel Dynamicity
CAT: Content-Adaptive Image Tokenization
Solver-Free Decision-Focused Learning for Linear Optimization Problems
ST$^2$360D: Spatial-to-Temporal Consistency for Training-free 360 Monocular Depth Estimation
NeSyPr: Neurosymbolic Proceduralization For Efficient Embodied Reasoning
Robust and Scalable Autonomous Reinforcement Learning in Irreversible Environments
Personalized Bayesian Federated Learning with Wasserstein Barycenter Aggregation
Constrained Linear Thompson Sampling
Sparse Polyak: an adaptive step size rule for high-dimensional M-estimation
Unveiling the Power of Multiple Gossip Steps: A Stability-Based Generalization Analysis in Decentralized Training
Dense Metric Depth Estimation via Event-based Differential Focus Volume Prompting
DPAIL: Training Diffusion Policy for Adversarial Imitation Learning without Policy Optimization
Graph–Smoothed Bayesian Black-Box Shift Estimator and Its Information Geometry
Robust Contextual Pricing
Fast exact recovery of noisy matrix from few entries: the infinity norm approach
NeuroH-TGL: Neuro-Heterogeneity Guided Temporal Graph Learning Strategy for Brain Disease Diagnosis
Semantic Representation Attack against Aligned Large Language Models
RrED: Black-box Unsupervised Domain Adaptation via Rectifying-reasoning Errors of Diffusion
Taxonomy of reduction matrices for Graph Coarsening
UGM2N: An Unsupervised and Generalizable Mesh Movement Network via M-Uniform Loss
RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models
Token Embeddings Violate the Manifold Hypothesis
Theoretical Guarantees for the Retention of Strict Nash Equilibria by Coevolutionary Algorithms
Cypher-RI: Reinforcement Learning for Integrating Schema Selection into Cypher Generation
Joint Velocity-Growth Flow Matching for Single-Cell Dynamics Modeling
Epistemic Uncertainty Estimation in Regression Ensemble Models with Pairwise Epistemic Estimators
Bio-Inspired Image Restoration
Geometric Logit Decoupling for Energy-Based Graph Out-of-distribution Detection
Improved Algorithms for Fair Matroid Submodular Maximization
Adaptable Safe Policy Learning from Multi-task Data with Constraint Prioritized Decision Transformer
A Closer Look to Positive-Unlabeled Learning from Fine-grained Perspectives: An Empirical Study
Learning Diffusion Models with Flexible Representation Guidance
Discrete Neural Flow Samplers with Locally Equivariant Transformer
GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments
Neural Attention Search
TRiCo: Triadic Game-Theoretic Co-Training for Robust Semi-Supervised Learning
PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning
Activation-Guided Consensus Merging for Large Language Models
ScatterAD: Temporal-Topological Scattering Mechanism for Time Series Anomaly Detection
Staggered Environment Resets Improve Massively Parallel On-Policy Reinforcement Learning
CroPe: Cross-Modal Semantic Compensation Adaptation for All Adverse Scene Understanding
NeuralSurv: Deep Survival Analysis with Bayesian Uncertainty Quantification
GVPO: Group Variance Policy Optimization for Large Language Model Post-Training
Private Statistical Estimation via Truncation
Hybrid Latent Representations for PDE Emulation
Enhancing Deep Batch Active Learning for Regression with Imperfect Data Guided Selection
Effective Neural Approximations for Geometric Optimization Problems
High Resolution UDF Meshing via Iterative Networks
Unlocking Dataset Distillation with Diffusion Models
Statistics Caching Test-Time Adaptation for Vision-Language Models
Inference-Time Personalized Alignment with a Few User Preference Queries
Dynamic Shadow Unveils Invisible Semantics for Video Outpainting
Toward Interpretable Evaluation Measures for Time Series Segmentation
CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors
Unlocker: Disentangle the Deadlock of Learning between Label-noisy and Long-tailed Data
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
The Effect of Optimal Self-Distillation in Noisy Gaussian Mixture Model
Explainable Reinforcement Learning from Human Feedback to Improve Alignment
A geometric framework for momentum-based optimizers for low-rank training
Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs
Adaptive Re-calibration Learning for Balanced Multimodal Intention Recognition
Collective Counterfactual Explanations: Balancing Individual Goals and Collective Dynamics
BeyondMix: Leveraging Structural Priors and Long-Range Dependencies for Domain-Invariant LiDAR Segmentation
Quadratic Coreset Selection: Certifying and Reconciling Sequence and Token Mining for Efficient Instruction Tuning
On the Stability and Generalization of Meta-Learning: the Impact of Inner-Levels
Image Stitching in Adverse Condition: A Bidirectional-Consistency Learning Framework and Benchmark
Ask a Strong LLM Judge when Your Reward Model is Uncertain
Strategic Cost Selection in Participatory Budgeting
EddyFormer: Accelerated Neural Simulations of Three-Dimensional Turbulence at Scale
EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
HYPRL: Reinforcement Learning of Control Policies for Hyperproperties
Timely Clinical Diagnosis through Active Test Selection
Prompt-Guided Alignment with Information Bottleneck Makes Image Compression Also a Restorer
ICLScan: Detecting Backdoors in Black-Box Large Language Models via Targeted In-context Illumination
Reading Recognition in the Wild
DiCoFlex: Model-Agnostic Diverse Counterfactuals with Flexible Control
Balancing Performance and Costs in Best Arm Identification
HoloScene: Simulation‑Ready Interactive 3D Worlds from a Single Video
INC: An Indirect Neural Corrector for Auto-Regressive Hybrid PDE Solvers
What One Cannot, Two Can: Two-Layer Transformers Provably Represent Induction Heads on Any-Order Markov Chains
FedQS: Optimizing Gradient and Model Aggregation for Semi-Asynchronous Federated Learning
Single-pass Adaptive Image Tokenization for Minimum Program Search
Contrastive Consolidation of Top-Down Modulations Achieves Sparsely Supervised Continual Learning
Improved Bounds for Swap Multicalibration and Swap Omniprediction
Finding and Reactivating Post-Trained LLMs' Hidden Safety Mechanisms
Kinetics: Rethinking Test-Time Scaling Law
LinPrim: Linear Primitives for Differentiable Volumetric Rendering
Non-Markovian Discrete Diffusion with Causal Language Models
Learning Skill-Attributes for Transferable Assessment in Video
Hephaestus: Mixture Generative Modeling with Energy Guidance for Large-scale QoS Degradation
Self-Supervised Learning of Graph Representations for Network Intrusion Detection
Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation
WebDancer: Towards Autonomous Information Seeking Agency
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
DuetGraph: Coarse-to-Fine Knowledge Graph Reasoning with Dual-Pathway Global-Local Fusion
Pseudo-Riemannian Graph Transformer
A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers
BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning
In-Context Learning Strategies Emerge Rationally
Generalization Bounds for Model-based Algorithm Configuration
KeeA*: Epistemic Exploratory A* Search via Knowledge Calibration
Causal Explanation-Guided Learning for Organ Allocation
A data and task-constrained mechanistic model of the mouse outer retina shows robustness to contrast variations
Non-equilibrium Annealed Adjoint Sampler
Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments
On the Global Optimality of Policy Gradient Methods in General Utility Reinforcement Learning
Spotlight Attention: Towards Efficient LLM Generation via Non-linear Hashing-based KV Cache Retrieval
ChemOrch: Empowering LLMs with Chemical Intelligence via Groundbreaking Synthetic Instructions
UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification
Concentration and excess risk bounds for imbalanced classification with synthetic oversampling
Multi-modal contrastive learning adapts to intrinsic dimensions of shared latent variables
Computable universal online learning
Online Two-Stage Submodular Maximization
Sample-Conditional Coverage in Split-Conformal Prediction
Distributed Multi-Agent Bandits Over Erdős-Rényi Random Networks
Revisiting Agnostic Boosting
MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM
Quantitative convergence of trained neural networks to Gaussian processes
STree: Speculative Tree Decoding for Hybrid State Space Models
Validating LLM-as-a-Judge Systems under Rating Indeterminacy
Let's Revise Step-by-Step: A Unified Local Search Framework for Code Generation with LLMs
Large Language Bayes
Efficient Safe Meta-Reinforcement Learning: Provable Near-Optimality and Anytime Safety
New Parallel and Streaming Algorithms for Directed Densest Subgraph
VA-GS: Enhancing the Geometric Representation of Gaussian Splatting via View Alignment
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
Semantic-guided Diverse Decoding for Large Language Model
Diffusion-Driven Progressive Target Manipulation for Source-Free Domain Adaptation
Topology-aware Graph Diffusion Model with Persistent Homology
Wukong's 72 Transformations: High-fidelity Textured 3D Morphing via Flow Models
SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning
How to Learn a Star: Binary Classification with Starshaped Polyhedral Sets
Flexible inference for animal learning rules using neural networks
PaTH Attention: Position Encoding via Accumulating Householder Transformations
scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration
Robust and Computation-Aware Gaussian Processes
KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
Causal Climate Emulation with Bayesian Filtering
Small Singular Values Matter: A Random Matrix Analysis of Transformer Models
Interpreting Arithmetic Reasoning in Large Language Models using Game-Theoretic Interactions
Generalized Contrastive Learning for Universal Multimodal Retrieval
Noise-Robustness Through Noise: A Framework combining Asymmetric LoRA with Poisoning MoE
Multiresolution Analysis and Statistical Thresholding on Dynamic Networks
Learning-Augmented Facility Location Mechanisms for the Envy Ratio Objective
Towards General Continuous Memory for Vision-Language Models
Generalization Error Analysis for Selective State-Space Models Through the Lens of Attention
What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents
Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
Can Diffusion Models Disentangle? A Theoretical Perspective
A Geometry-Aware Metric for Mode Collapse in Time Series Generative Models
Streaming Audio Generation from Discrete Tokens via Streaming Flow Matching
IPFormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals
T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks
Oryx: a Scalable Sequence Model for Many-Agent Coordination in Offline MARL
Non-rectangular Robust MDPs with Normed Uncertainty Sets
Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination
DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces
POCO: Scalable Neural Forecasting through Population Conditioning
RvLLM: LLM Runtime Verification with Domain Knowledge
Scaling Offline RL via Efficient and Expressive Shortcut Models
Diffusion-Classifier Synergy: Reward-Aligned Learning via Mutual Boosting Loop for FSCIL
CDFlow: Building Invertible Layers with Circulant and Diagonal Matrices
Self-Verification Provably Prevents Model Collapse in Recursive Synthetic Training
Partner Modelling Emerges in Recurrent Agents (But Only When It Matters)
S$^2$M-Former: Spiking Symmetric Mixing Branchformer for Brain Auditory Attention Detection
Incentive-Aware Dynamic Resource Allocation under Long-Term Cost Constraints
ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling
Unbiased Prototype Consistency Learning for Multi-Modal and Multi-Task Object Re-Identification
TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning
MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants
Unfolding the Black Box of Recurrent Neural Networks for Path Integration
LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Diffusion Beats Autoregressive in Data-Constrained Settings
Tackling Feature-Classifier Mismatch in Federated Learning via Prompt-Driven Feature Transformation
scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery
FSEO: Few-Shot Evolutionary Optimization via Meta-Learning for Expensive Multi-Objective Optimization
Removing Concepts from Text-to-Image Models with Only Negative Samples
A Generalized Binary Tree Mechanism for Private Approximation of All-Pair Shortest Distances
Dual-Res Tandem Mamba-3D: Bilateral Breast Lesion Detection and Classification on Non-contrast Chest CT
A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning
GaRA-SAM: Robustifying Segment Anything Model with Gated-Rank Adaptation
Towards Generalizable Multi-Policy Optimization with Self-Evolution for Job Scheduling
VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning
The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?
Moment- and Power-Spectrum-Based Gaussianity Regularization for Text-to-Image Models
Interpretable Global Minima of Deep ReLU Neural Networks on Sequentially Separable Data
Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
Minimal Semantic Sufficiency Meets Unsupervised Domain Generalization
Defining and Discovering Hyper-meta-paths for Heterogeneous Hypergraphs
Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems
Test3R: Learning to Reconstruct 3D at Test Time
The Adaptive Complexity of Minimizing Relative Fisher Information
Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers
When Are Concepts Erased From Diffusion Models?
Emergent Risk Awareness in Rational Agents under Resource Constraints
Online Learning of Neural Networks
PolyPose: Deformable 2D/3D Registration via Polyrigid Transformations
AutoData: A Multi-Agent System for Open Web Data Collection
HPSERec: A Hierarchical Partitioning and Stepwise Enhancement Framework for Long-tailed Sequential Recommendation
SGN: Shifted Window-Based Hierarchical Variable Grouping for Multivariate Time Series Classification
Exploiting Vocabulary Frequency Imbalance in Language Model Pre-training
Plenodium: Underwater 3D Scene Reconstruction with Plenoptic Medium Representation
Accurate KV Cache Eviction via Anchor Direction Projection for Efficient LLM Inference
From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers
Online Bilateral Trade With Minimal Feedback: Don’t Waste Seller’s Time
Dependency Parsing is More Parameter-Efficient with Normalization
MaintainCoder: Maintainable Code Generation Under Dynamic Requirements
DreamLight: Towards Harmonious and Consistent Image Relighting
DeltaFlow: An Efficient Multi-frame Scene Flow Estimation Method
Improving the Euclidean Diffusion Generation of Manifold Data by Mitigating Score Function Singularity
Efficient Quadratic Corrections for Frank-Wolfe Algorithms
FNOPE: Simulation-based inference on function spaces with Fourier Neural Operators
StyleGuard: Preventing Text-to-Image-Model-based Style Mimicry Attacks by Style Perturbations
Optimal community detection in dense bipartite graphs
Improved Balanced Classification with Theoretically Grounded Loss Functions
Latent Retrieval Augmented Generation of Cross-Domain Protein Binders
CoFFT: Chain of Foresight-Focus Thought for Visual Language Models
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference
Learning to Integrate Diffusion ODEs by Averaging the Derivatives
FuncGenFoil: Airfoil Generation and Editing Model in Function Space
Enforcing convex constraints in Graph Neural Networks
Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning
One-Step Diffusion-Based Image Compression with Semantic Distillation
A Latent Multilayer Graphical Model For Complex, Interdependent Systems
A Minimalist Example of Edge-of-Stability and Progressive Sharpening
Seeing the Arrow of Time in Large Multimodal Models
Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows
From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review
ShapeX: Shapelet-Driven Post Hoc Explanations for Time Series Classification Models
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Continual Model Merging without Data: Dual Projections for Balancing Stability and Plasticity
Alias-Free ViT: Fractional Shift Invariance via Linear Attention
Online Portfolio Selection with ML Predictions
Towards Generalizable Retina Vessel Segmentation with Deformable Graph Priors
Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality
EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting
Noisy Multi-Label Learning through Co-Occurrence-Aware Diffusion
Dual-Flow: Transferable Multi-Target, Instance-Agnostic Attacks via $\textit{In-the-wild}$ Cascading Flow Optimization
Improving planning and MBRL with temporally-extended actions
A Unified Reasoning Framework for Holistic Zero-Shot Video Anomaly Analysis
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
Constrained Optimization From a Control Perspective via Feedback Linearization
Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning
Causality Meets the Table: Debiasing LLMs for Faithful TableQA via Front-Door Intervention
ChatbotID: Identifying Chatbots with Granger Causality Test
DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation
Plug-and-play Feature Causality Decomposition for Multimodal Representation Learning
Predicting the Performance of Black-box Language Models with Follow-up Queries
Segment Anything Model Meets Semi-supervised Medical Image Segmentation: A Novel Perspective
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
How Classifier Features Transfer to Downstream: An Asymptotic Analysis in a Two-Layer Model
EPA: Boosting Event-based Video Frame Interpolation with Perceptually Aligned Learning
Scaling can lead to compositional generalization
Adaptive Riemannian ADMM for Nonsmooth Optimization: Optimal Complexity without Smoothing
Theoretical Investigation of Adafactor for Non-Convex Smooth Optimization
What Data Enables Optimal Decisions? An Exact Characterization for Linear Optimization
Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation
APML: Adaptive Probabilistic Matching Loss for Robust 3D Point Cloud Reconstruction
Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy
Decreasing Entropic Regularization Averaged Gradient for Semi-Discrete Optimal Transport
Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures
Online Learning in the Repeated Mediated Newsvendor Problem
SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
Volume Transmission Implements Context Factorization to Target Online Credit Assignment and Enable Compositional Generalization
3D-GSRD: 3D Molecular Graph Auto-Encoder with Selective Re-mask Decoding
Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies
VITRIX-UniViTAR: Unified Vision Transformer with Native Resolution
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning
Improving Task-Specific Multimodal Sentiment Analysis with General MLLMs via Prompting
Rethinking Residual Distribution in Locate-then-Edit Model Editing
Image Editing As Programs with Diffusion Models
Jacobian-Based Interpretation of Nonlinear Neural Encoding Model
Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks
On Logic-based Self-Explainable Graph Neural Networks
Covariate-moderated Empirical Bayes Matrix Factorization
CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision
Multimodal Negative Learning
Grids Often Outperform Implicit Neural Representation at Compressing Dense Signals
Elastic ViTs from Pretrained Models without Retraining
A Learning-Augmented Dynamic Programming Approach for Orienteering Problem with Time Windows
Training-Free Test-Time Adaptation via Shape and Style Guidance for Vision-Language Models
Rectified CFG++ for Flow Based Models
JAFAR: Jack up Any Feature at Any Resolution
How Far Are We from Optimal Reasoning Efficiency?
Embodied Crowd Counting
Generative Perception of Shape and Material from Differential Motion
Riemannian Flow Matching for Brain Connectivity Matrices via Pullback Geometry
Space Group Equivariant Crystal Diffusion
Model Editing for Vision Transformers
MI-TRQR: Mutual Information-Based Temporal Redundancy Quantification and Reduction for Energy-Efficient Spiking Neural Networks
Learning to Instruct for Visual Instruction Tuning
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Can Agent Fix Agent Issues?
FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens
Asymptotically exact variational flows via involutive MCMC kernels
Transformers for Mixed-type Event Sequences
Simultaneous Statistical Inference for Off-Policy Evaluation in Reinforcement Learning
LuxDiT: Lighting Estimation with Video Diffusion Transformer
SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models
Diffusion-Based Hierarchical Graph Neural Networks for Simulating Nonlinear Solid Mechanics
Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts
Causal Discovery and Inference through Next-Token Prediction
Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining
MIBP-Cert: Certified Training against Data Perturbations with Mixed-Integer Bilinear Programs
Scalable Exploration via Ensemble++
Learning Without Augmenting: Unsupervised Time Series Representation Learning via Frame Projections
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
On Efficiency-Effectiveness Trade-off of Diffusion-based Recommenders
How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?
Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2
Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards
Reasoning Models Sometimes Output Illegible Chains of Thought
A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone
Retrospective In-Context Learning for Temporal Credit Assignment with Large Language Models
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Do Language Models Use Their Depth Efficiently?
Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework
Learning Efficient Fuse-and-Refine for Feed-Forward 3D Gaussian Splatting
Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency
Angular Constraint Embedding via SpherePair Loss for Constrained Clustering
MagCache: Fast Video Generation with Magnitude-Aware Cache
HollowFlow: Efficient Sample Likelihood Evaluation using Hollow Message Passing
Flat Channels to Infinity in Neural Loss Landscapes
TRoVe: Discovering Error-Inducing Static Feature Biases in Temporal Vision-Language Models
Proxy-SPEX: Sample-Efficient Interpretability via Sparse Feature Interactions in LLMs
A Geometric Analysis of PCA
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference
Bridging Brains and Concepts: Interpretable Visual Decoding from fMRI with Semantic Bottlenecks
Contimask: Explaining Irregular Time Series via Perturbations in Continuous Time
Multi-Class Support Vector Machine with Differential Privacy
You Can Trust Your Clustering Model: A Parameter-free Self-Boosting Plug-in for Deep Clustering
ReMindRAG: Low-Cost LLM-Guided Knowledge Graph Traversal for Efficient RAG
Geometry of Decision Making in Language Models
Brain-Informed Fine-Tuning for Improved Multilingual Understanding in Language Models
Covering Multiple Objectives with a Small Set of Solutions Using Bayesian Optimization
Transstratal Adversarial Attack: Compromising Multi-Layered Defenses in Text-to-Image Models
Off-policy Reinforcement Learning with Model-based Exploration Augmentation
Rethinking Gradient Step Denoiser: Towards Truly Pseudo-Contractive Operator
Preference Optimization by Estimating the Ratio of the Data Distribution
GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer
The Gaussian Mixing Mechanism: Renyi Differential Privacy via Gaussian Sketches
On Transferring Transferability: Towards a Theory for Size Generalization
Variational Regularized Unbalanced Optimal Transport: Single Network, Least Action
Distribution-Aligned Decoding for Efficient LLM Task Adaptation
When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions
Beyond Oracle: Verifier-Supervision for Instruction Hierarchy in Reasoning and Instruction-Tuned LLMs
Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback
MisoDICE: Multi-Agent Imitation from Mixed-Quality Demonstrations
Towards Effective Federated Graph Foundation Model via Mitigating Knowledge Entanglement
Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack
Making Classic GNNs Strong Baselines Across Varying Homophily: A Smoothness–Generalization Perspective
Demystifying Language Model Forgetting with Low-rank Example Associations
Differentially Private Bilevel Optimization: Efficient Algorithms with Near-Optimal Rates
MaterialRefGS: Reflective Gaussian Splatting with Multi-view Consistent Material Inference
Reward-oriented Causal Representation Learning
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Instant4D: 4D Gaussian Splatting in Minutes
Adaptive Data Analysis for Growing Data
Interpreting Emergent Features in Deep Learning-based Side-channel Analysis
SCoT: Unifying Consistency Models and Rectified Flows via Straight-Consistent Trajectories
FEAT: Free energy Estimators with Adaptive Transport
AtlasGS: Atlanta-world Guided Surface Reconstruction with Implicit Structured Gaussians
Emergent Temporal Correspondences from Video Diffusion Transformers
Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation
CCL: Causal-aware In-context Learning for Out-of-Distribution Generalization
EUGens: Efficient, Unified and General Dense Layers
Fully Spiking Neural Networks for Unified Frame-Event Object Tracking
Flux4D: Flow-based Unsupervised 4D Reconstruction
Future-Aware End-to-End Driving: Bidirectional Modeling of Trajectory Planning and Scene Evolution
Simple and Efficient Heterogeneous Temporal Graph Neural Network
Thompson Sampling in Function Spaces via Neural Operators
Sampling from multi-modal distributions with polynomial query complexity in fixed dimension via reverse diffusion
From Programs to Poses: Factored Real-World Scene Generation via Learned Program Libraries
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction
StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning
Robust Estimation Under Heterogeneous Corruption Rates
Orthogonal Survival Learners for Estimating Heterogeneous Treatment Effects from Time-to-Event Data
PLEIADES: Building Temporal Kernels with Orthogonal Polynomials
MetaDefense: Defending Fine-tuning based Jailbreak Attack Before and During Generation
Rao-Blackwell Gradient Estimators for Equivariant Denoising Diffusion
Improving Diffusion-based Inverse Algorithms under Few-Step Constraint via Linear Extrapolation
Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning
From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior
The Fragile Truth of Saliency: Improving LLM Input Attribution via Attention Bias Optimization
Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards
Beyond $\tilde{O}(\sqrt{T})$ Constraint Violation for Online Convex Optimization with Adversarial Constraints
SPACE: SPike-Aware Consistency Enhancement for Test-Time Adaptation in Spiking Neural Networks
Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models
Novel Exploration via Orthogonality
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation
Are Language Models Efficient Reasoners? A Perspective from Logic Programming
MVSMamba: Multi-View Stereo with State Space Model
SegMASt3R: Geometry Grounded Segment Matching
NTKMTL: Mitigating Task Imbalance in Multi-Task Learning from Neural Tangent Kernel Perspective
Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles
ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation
UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation
A Unified Framework for Variable Selection in Model-Based Clustering with Missing Not at Random
Pragmatic Heterogeneous Collaborative Perception via Generative Communication Mechanism
On the Entropy Calibration of Language Models
GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation
Advancing Machine-Generated Text Detection from an Easy to Hard Supervision Perspective
Geometric Imbalance in Semi-Supervised Node Classification
Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction
Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames
Partial Physics Informed Diffusion Model for Ocean Chlorophyll Concentration Reconstruction
See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
SALMONN-omni: A Standalone Speech LLM without Codec Injection for Full-duplex Conversation
Linear Attention for Efficient Bidirectional Sequence Modeling
RANK++LETR: Learn to Rank and Optimize Candidates for Line Segment Detection
PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments
Conditional Gradient Methods with Standard LMO for Stochastic Simple Bilevel Optimization
Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL
PRESCRIBE: Predicting Single-Cell Responses with Bayesian Estimation
SPRO: Improving Image Generation via Self-Play
Tensor Decomposition Networks for Accelerating Machine Learning Force Field Computations
StreamForest: Efficient Online Video Understanding with Persistent Event Memory
Unveiling the Uncertainty in Embodied and Operational Carbon of Large AI Models through a Probabilistic Carbon Accounting Model
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning
Efficient and Generalizable Mixed-Precision Quantization via Topological Entropy
Traversal Verification for Speculative Tree Decoding
Better NTK Conditioning: A Free Lunch from (ReLU) Nonlinear Activation in Wide Neural Networks
AdaMSS: Adaptive Multi-Subspace Approach for Parameter-Efficient Fine-Tuning
Bounds on the computational complexity of neurons due to dendritic morphology
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning
FRN: Fractal-Based Recursive Spectral Reconstruction Network
Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling
InstructSAM: A Training-free Framework for Instruction-Oriented Remote Sensing Object Recognition
Bandit and Delayed Feedback in Online Structured Prediction
BeliefMapNav: 3D Voxel-Based Belief Map for Zero-Shot Object Navigation
VESSA: Video-based objEct-centric Self-Supervised Adaptation for Visual Foundation Models
DataRater: Meta-Learned Dataset Curation
Delving into Large Language Models for Effective Time-Series Anomaly Detection
The Underappreciated Power of Vision Models for Graph Structural Understanding
AutoSciDACT: Automated Scientific Discovery through Contrastive Embedding and Hypothesis Testing
Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few
Efficient Bayesian Experiment Design with Equivariant Networks
On the SAC-BL Algorithm for Anomaly Detection
REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints
SAS: Simulated Attention Score
Normalize Filters! Classical Wisdom for Deep Vision
DBLoss: Decomposition-based Loss Function for Time Series Forecasting
Multi-Kernel Correlation-Attention Vision Transformer for Enhanced Contextual Understanding and Multi-Scale Integration
Preference-driven Knowledge Distillation for Few-shot Node Classification
DoseSurv: Predicting Personalized Survival Outcomes under Continuous-Valued Treatments
IPSI: Enhancing Structural Inference with Automatically Learned Structural Priors
Towards Visualization-of-Thought Jailbreak Attack against Large Visual Language Models
Analog Foundation Models
How Particle System Theory Enhances Hypergraph Message Passing
Value-Guided Decision Transformer: A Unified Reinforcement Learning Framework for Online and Offline Settings
Anti-Aliased 2D Gaussian Splatting
Looking Into the Water by Unsupervised Learning of the Surface Shape
Dynamic Regret Reduces to Kernelized Static Regret
Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation
ReDi: Rectified Discrete Flow
Reinforced Active Learning for Large-Scale Virtual Screening with Learnable Policy Model
L2RSI: Cross-view LiDAR-based Place Recognition for Large-scale Urban Scenes via Remote Sensing Imagery
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
Steering Information Utility in Key-Value Memory for Language Model Post-Training
Bayesian Ego-graph inference for Networked Multi-Agent Reinforcement Learning
Rationalized All-Atom Protein Design with Unified Multi-Modal Bayesian Flow
Variance-Reduced Long-Term Rehearsal Learning with Quadratic Programming Reformulation
TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE
Solving Partial Differential Equations via Radon Neural Operator
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
TranSUN: A Preemptive Paradigm to Eradicate Retransformation Bias Intrinsically from Regression Models in Recommender Systems
A Signed Graph Approach to Understanding and Mitigating Oversmoothing
Erasing Conceptual Knowledge from Language Models
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Resounding Acoustic Fields with Reciprocity
It’s Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation
AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
RCCDA: Adaptive Model Updates in the Presence of Concept Drift under a Constrained Resource Budget
ProtInvTree: Deliberate Protein Inverse Folding with Reward-guided Tree Search
Causal Mixture Models: Characterization and Discovery
Precise Diffusion Inversion: Towards Novel Samples and Few-Step Models
OPHR: Mastering Volatility Trading with Multi-Agent Deep Reinforcement Learning
Denoising Trajectory Biases for Zero-Shot AI-Generated Image Detection
Direct Fisher Score Estimation for Likelihood Maximization
UMoE: Unifying Attention and FFN with Shared Experts
See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction
Beyond Least Squares: Uniform Approximation and the Hidden Cost of Misspecification
AgentAuditor: Human-level Safety and Security Evaluation for LLM Agents
Graph Data Selection for Domain Adaptation: A Model-Free Approach
Sequential Multi-Agent Dynamic Algorithm Configuration
ACCO: Accumulate While You Communicate for Communication-Overlapped Sharded LLM Training
Block-Diagonal LoRA for Eliminating Communication Overhead in Tensor Parallel LoRA Serving
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Neural Rule Lists: Learning Discretizations, Rules, and Order in One Go
Resolution of Simpson's paradox via the common cause principle
Personalized Visual Content Generation in Conversational Systems
How do Transformers Learn Implicit Reasoning?
Decentralized Dynamic Cooperation of Personalized Models for Federated Continual Learning
Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning
Unleashing the Power of One-Step Diffusion based Image Super-Resolution via a Large-Scale Diffusion Discriminator
SurfelSplat: Learning Efficient and Generalizable Gaussian Surfel Representations for Sparse-View Surface Reconstruction
Gaze-VLM: Bridging Gaze and VLMs through Attention Regularization for Egocentric Understanding
TensorRL-QAS: Reinforcement learning with tensor networks for improved quantum architecture search
ComPO: Preference Alignment via Comparison Oracles
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers
Conditional Forecasts and Proper Scoring Rules for Reliable and Accurate Performative Predictions
Theory-Driven Label-Specific Representation for Incomplete Multi-View Multi-Label Learning
Efficient Federated Learning against Byzantine Attacks and Data Heterogeneity via Aggregating Normalized Gradients
Who You Are Matters: Bridging Interests and Social Roles via LLM-Enhanced Logic Recommendation
Uncertainty-Sensitive Privileged Learning
Dynamic Bundling with Large Language Models for Zero-Shot Inference on Text-Attributed Graphs
Mixture of Noise for Pre-Trained Model-Based Class-Incremental Learning
Knowledge Starts with Practice: Knowledge-Aware Exercise Generative Recommendation with Adaptive Multi-Agent Cooperation
Dimensional Collapse in VQVAEs: Evidence and Remedies
SpEx: A Spectral Approach to Explainable Clustering
Towards Irreversible Attack: Fooling Scene Text Recognition via Multi-Population Coevolution Search
Mellow: a small audio language model for reasoning
Object-centric binding in Contrastive Language-Image Pretraining
Non-Stationary Lipschitz Bandits
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
Tradeoffs between Mistakes and ERM Oracle Calls in Online and Transductive Online Learning
Sampling 3D Molecular Conformers with Diffusion Transformers
GD$^2$: Robust Graph Learning under Label Noise via Dual-View Prediction Discrepancy
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
Unmasking Puppeteers: Leveraging Biometric Leakage to Expose Impersonation in AI-Based Videoconferencing
DuSA: Fast and Accurate Dual-Stage Sparse Attention Mechanism Accelerating Both Training and Inference
Truthful Aggregation of LLMs with an Application to Online Advertising
Scaling Image Geo-Localization to Continent Level
STAR-Bets: Sequential TArget-Recalculating Bets for Tighter Confidence Intervals
Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation
SpecEM: Training-Free LLM Ensembling via Iterative Drafting, Verification, and Online Feedback
In-Context Compositional Learning via Sparse Coding Transformer
On Local Limits of Sparse Random Graphs: Color Convergence and the Refined Configuration Model
Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy
Wavy Transformer
Robust SuperAlignment: Weak-to-Strong Robustness Generalization for Vision-Language Models
Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization
Transforming Generic Coder LLMs to Effective Binary Code Embedding Models for Similarity Detection
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
D$^2$GS: Dense Depth Regularization for LiDAR-free Urban Scene Reconstruction
What Does It Take to Build a Performant Selective Classifier?
CausalVTG: Towards Robust Video Temporal Grounding via Causal Inference
LMFusion: Adapting Pretrained Language Models for Multimodal Generation
DINGO: Constrained Inference for Diffusion LLMs
Flattening Hierarchies with Policy Bootstrapping
DAMamba: Vision State Space Model with Dynamic Adaptive Scan
An Effective Levelling Paradigm for Unlabeled Scenarios
Learning to Watermark: A Selective Watermarking Framework for Large Language Models via Multi-Objective Optimization
MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades
G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems
Understanding Data Influence in Reinforcement Finetuning
Retrosynthesis Planning via Worst-path Policy Optimisation in Tree-structured MDPs
Towards Doctor-Like Reasoning: Medical RAG Fusing Knowledge with Patient Analogy through Textual Gradients
E2Former: An Efficient and Equivariant Transformer with Linear-Scaling Tensor Products
Learning Source-Free Domain Adaptation for Visible-Infrared Person Re-Identification
Bi-Directional Communication-Efficient Stochastic FL via Remote Source Generation
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
Data Fusion for Partial Identification of Causal Effects
Table as a Modality for Large Language Models
Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
Intervene-All-Paths: Unified Mitigation of LVLM Hallucinations across Alignment Formats
Ultra-high Resolution Watermarking Framework Resistant to Extreme Cropping and Scaling
Self-Evolving Pseudo-Rehearsal for Catastrophic Forgetting with Task Similarity in LLMs
Learning with Statistical Equality Constraints
Amortized Sampling with Transferable Normalizing Flows
Correlation Dimension of Autoregressive Large Language Models
Temporal Logic-Based Multi-Vehicle Backdoor Attacks against Offline RL Agents in End-to-end Autonomous Driving
Unsupervised Learning for Optimal Transport plan prediction between unbalanced graphs
Private Geometric Median in Nearly-Linear Time
DISC: Dynamic Decomposition Improves LLM Inference Scaling
PC-Net: Weakly Supervised Compositional Moment Retrieval via Proposal-Centric Network
Towards Unsupervised Open-Set Graph Domain Adaptation via Dual Reprogramming
Dynamical Properties of Tokens in Self-Attention and Effects of Positional Encoding
GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining
Visual Instruction Bottleneck Tuning
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
Vicinity-Guided Discriminative Latent Diffusion for Privacy-Preserving Domain Adaptation
Interaction-Centric Knowledge Infusion and Transfer for Open Vocabulary Scene Graph Generation
Glance2Gaze: Efficient Vision-Language Models from Glance Fusion to Gaze Compression
Spiking Meets Attention: Efficient Remote Sensing Image Super-Resolution with Attention Spiking Neural Networks
Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging
Nearly-Linear Time and Massively Parallel Algorithms for $k$-anonymity
On the Stability of Graph Convolutional Neural Networks: A Probabilistic Perspective
Tapered Off-Policy REINFORCE - Stable and efficient reinforcement learning for large language models
Intrinsic Goals for Autonomous Agents: Model-Based Exploration in Virtual Zebrafish Predicts Ethological Behavior and Whole-Brain Dynamics
CyIN: Cyclic Informative Latent Space for Bridging Complete and Incomplete Multimodal Learning
Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions
Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning
RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis
RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation
X-Mahalanobis: Transformer Feature Mixing for Reliable OOD Detection
Search and Refine During Think: Facilitating Knowledge Refinement for Improved Retrieval-Augmented Reasoning
Steering Generative Models with Experimental Data for Protein Fitness Optimization
$\textit{Hyper-GoalNet}$: Goal-Conditioned Manipulation Policy Learning with HyperNetworks
Synergistic Tensor and Pipeline Parallelism
Collaborative Reasoner: Self-Improving Social Agents with Synthetic Conversations
LLM Query Scheduling with Prefix Reuse and Latency Constraints
Weaver: Shrinking the Generation-Verification Gap by Scaling Compute for Verification
On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective
Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access
Architectural and Inferential Inductive Biases for Exchangeable Sequence Modeling
The third pillar of causal analysis? A measurement perspective on causal representations
Do LVLMs Truly Understand Video Anomalies? Revealing Hallucination via Co-Occurrence Patterns
Fractional Langevin Dynamics for Combinatorial Optimization via Polynomial-Time Escape
Last Iterate Convergence in Monotone Mean Field Games
Surface-Aware Feed-Forward Quadratic Gaussian for Frame Interpolation with Large Motion
Towards Generalizable 3D Human Pose Estimation via Ensembles on Flat Loss Landscapes
Rethinking PCA Through Duality
YEAST: Yet Another Sequential Test
Fast Zeroth-Order Convex Optimization with Quantum Gradient Methods
Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent Variables
ChromFound: Towards A Universal Foundation Model for Single-Cell Chromatin Accessibility Data
Interactive Anomaly Detection for Articulated Objects via Motion Anticipation
Universal Visuo-Tactile Video Understanding for Embodied Interaction
Predicting Empirical AI Research Outcomes with Language Models
DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation
Scalable inference of functional neural connectivity at submillisecond timescales
MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization
Improving Time Series Forecasting via Instance-aware Post-hoc Revision
From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots
FreeControl: Efficient, Training-Free Structural Control via One-Step Attention Extraction
Lie Detector: Unified Backdoor Detection via Cross-Examination Framework
Learning Gradient Boosted Decision Trees with Algorithmic Recourse
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning
Unveiling Extraneous Sampling Bias with Data Missing-Not-At-Random
Omni-DNA: A Genomic Model Supporting Sequence Understanding, Long-context, and Textual Annotation
Transition Matching: Scalable and Flexible Generative Modeling
Incomplete Multi-view Deep Clustering with Data Imputation and Alignment
CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model
SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification
TTRL: Test-Time Reinforcement Learning
PandaPose: 3D Human Pose Lifting from a Single Image via Propagating 2D Pose Prior to 3D Anchor Space
Self Iterative Label Refinement via Robust Unlabeled Learning
PDPO: Parametric Density Path Optimization
DynaPhArM: Adaptive and Physics-Constrained Modeling for Target-Drug Complexes with Drug-Specific Adaptations
AudSemThinker: Enhancing Audio-Language Models Through Reasoning over Semantics of Sound
Physics-informed Neural Operator for Pansharpening
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Parameter Dynamics of Online Machine Learning and Test-time Adaptation
Beyond Verifiable Rewards: Scaling Reinforcement Learning in Language Models to Unverifiable Data
Tight Asymptotics of Extreme Order Statistics
Training-free Detection of AI-generated images via Cropping Robustness
ReDit: Reward Dithering for Improved LLM Policy Optimization
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
On the Robustness of Verbal Confidence of LLMs in Adversarial Attacks
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness
VideoLucy: Deep Memory Backtracking for Long Video Understanding
Neuro-Spectral Architectures for Causal Physics-Informed Networks
Counterfactual reasoning: an analysis of in-context emergence
Injecting Frame-Event Complementary Fusion into Diffusion for Optical Flow in Challenging Scenes
End-to-End Vision Tokenizer Tuning
Mitigating Spurious Features in Contrastive Learning with Spectral Regularization
Understanding and Enhancing Mask-Based Pretraining towards Universal Representations
Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules
Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos
BrainOmni: A Brain Foundation Model for Unified EEG and MEG Signals
BrainEC-LLM: Brain Effective Connectivity Estimation by Multiscale Mixing LLM
VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption
Activated LoRA: Fine-tuned LLMs for Intrinsics
4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos
Self-Refining Language Model Anonymizers via Adversarial Distillation
Fast attention mechanisms: a tale of parallelism
MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
Uncovering a Universal Abstract Algorithm for Modular Addition in Neural Networks
OmniGaze: Reward-inspired Generalizable Gaze Estimation in the Wild
VIPAMIN: Visual Prompt Initialization via Embedding Selection and Subspace Expansion
Generative Pre-trained Autoregressive Diffusion Transformer
Efficiently Scaling LLM Reasoning Programs with Certaindex
Stratify or Die: Rethinking Data Splits in Image Segmentation
Uniform Wrappers: Bridging Concave to Quadratizable Functions in Online Optimization
Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm
On the Sample Complexity Bounds of Bilevel Reinforcement Learning
Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation
Compress & Cache: Vision token compression for efficient generation and retrieval
Valid Inference with Imperfect Synthetic Data
Group-in-Group Policy Optimization for LLM Agent Training
Understanding Contrastive Learning via Gaussian Mixture Models
Building 3D Representations and Generating Motions From a Single Image via Video-Generation
Alleviating Hallucinations in Large Language Models through Multi-Model Contrastive Decoding and Dynamic Hallucination Detection
MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search
Handling Missing Responses under Cluster Dependence with Applications to Language Model Evaluation
Integrating Drug Substructures and Longitudinal Electronic Health Records for Personalized Drug Recommendation
Pan-LUT: Efficient Pan-sharpening via Learnable Look-Up Tables
Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing
Hyper-Modality Enhancement for Multimodal Sentiment Analysis with Missing Modalities
ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs
Revising and Falsifying Sparse Autoencoder Feature Explanations
Where and How to Perturb: On the Design of Perturbation Guidance in Diffusion and Flow Models
CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs
QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation
BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning
Direct Alignment with Heterogeneous Preferences
FACE: A General Framework for Mapping Collaborative Filtering Embeddings into LLM Tokens
On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning
Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
GOOD: Training-Free Guided Diffusion Sampling for Out-of-Distribution Detection
Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning
Fortifying Time Series: DTW-Certified Robust Anomaly Detection
Bilevel Optimization for Adversarial Learning Problems: Sharpness, Generation, and Beyond
SRHand: Super-Resolving Hand Images and 3D Shapes via View/Pose-aware Neural Image Representations and Explicit Meshes
MoonCast: High-Quality Zero-Shot Podcast Generation
Convergence Rates for Gradient Descent on the Edge of Stability for Overparametrised Least Squares
Rethinking Nighttime Image Deraining via Learnable Color Space Transformation
Superposition Yields Robust Neural Scaling
CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs
Selective Learning for Deep Time Series Forecasting
On Traceability in $\ell_p$ Stochastic Convex Optimization
Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability
A Regularized Newton Method for Nonconvex Optimization with Global and Local Complexity Guarantees
WaveAR: Wavelet-Aware Continuous Autoregressive Diffusion for Accurate Human Motion Prediction
Next Semantic Scale Prediction via Hierarchical Diffusion Language Models
Towards Multi-Table Learning: A Novel Paradigm for Complementarity Quantification and Integration
Neural Networks for Learnable and Scalable Influence Estimation of Instruction Fine-Tuning Data
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
ToolRL: Reward is All Tool Learning Needs
Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models
Generalization vs Specialization under Concept Shift
Non-Asymptotic Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes
Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold
Localizing Knowledge in Diffusion Transformers
Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models
Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning
RepGuard: Adaptive Feature Decoupling for Robust Backdoor Defense in Large Language Models
Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models
Improving Bilinear RNN with Closed-loop Control
Reinforced Context Order Recovery for Adaptive Reasoning and Planning
Backdoor Cleaning without External Guidance in MLLM Fine-tuning
MoleBridge: Synthetic Space Projecting with Discrete Markov Bridges
Multimodal 3D Genome Pre-training
Dynamical modeling of nonlinear latent factors in multiscale neural activity with real-time inference
FedGPS: Statistical Rectification Against Data Heterogeneity in Federated Learning
CADMorph: Geometry‑Driven Parametric CAD Editing via a Plan–Generate–Verify Loop
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search
Contribution of task-irrelevant stimuli to drift of neural representations
Contextual Online Pricing with (Biased) Offline Data
Stitch and Tell: A Structured Data Augmentation Method for Spatial Understanding
Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project
On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
AF-UMC: An Alignment-Free Fusion Framework for Unaligned Multi-View Clustering
Robust and Diverse Multi-Agent Learning via Rational Policy Gradient
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets
Unbalanced Optimal Total Variation Transport: A Theoretical Approach to Spatial Resource Allocation Problems
Reinforcement Learning Teachers of Test Time Scaling
Structured Temporal Causality for Interpretable Multivariate Time Series Anomaly Detection
Conditional Representation Learning for Customized Tasks
Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime
Enhancing Consistency of Flow-Based Image Editing through Kalman Control
Entropic Time Schedulers for Generative Diffusion Models
VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models
Structure-Aware Fusion with Progressive Injection for Multimodal Molecular Representation Learning
Active Test-time Vision-Language Navigation
ZeroPatcher: Training-free Sampler for Video Inpainting and Editing
Seg4Diff: Unveiling Open-Vocabulary Semantic Segmentation in Text-to-Image Diffusion Transformers
STAR: Efficient Preference-based Reinforcement Learning via Dual Regularization
Gaussian Regression-Driven Tensorized Incomplete Multi-View Clustering with Dual Manifold Regularization
Predicting Functional Brain Connectivity with Context-Aware Deep Neural Networks
OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain
Efficient Representativeness-Aware Coreset Selection
Median Selection with Noisy and Structural Information
Causality-Induced Positional Encoding for Transformer-Based Representation Learning of Non-Sequential Features
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
MIRA: Medical Time Series Foundation Model for Real-World Health Data
Combining Cost Constrained Runtime Monitors for AI Safety
Discovering Compositional Hallucinations in LVLMs
MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis
Statistical Analysis of an Adversarial Bayesian Weak Supervision Method
Differentiable Generalized Sliced Wasserstein Plans
S$^2$NN: Sub-bit Spiking Neural Networks
CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension
TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets
Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models
Lifelong Safety Alignment for Language Models
CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
Detecting High-Stakes Interactions with Activation Probes
Bidirectional Representations Augmented Autoregressive Biological Sequence Generation: Application in De Novo Peptide Sequencing
Multi-View Oriented GPLVM: Expressiveness and Efficiency
Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering
The World Is Bigger: A Computationally-Embedded Perspective on the Big World Hypothesis
Real-World Adverse Weather Image Restoration via Dual-Level Reinforcement Learning with High-Quality Cold Start
Learning Generalizable Shape Completion with SIM(3) Equivariance
Neural Emulator Superiority: When Machine Learning for PDEs Surpasses its Training Data
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling
PANGEA: Projection-Based Augmentation with Non-Relevant General Data for Enhanced Domain Adaptation in LLMs
Predictability Enables Parallelization of Nonlinear State Space Models
Bridging the gap to real-world language-grounded visual concept learning
Martingale Posterior Neural Networks for Fast Sequential Decision Making
High Dynamic Range Imaging with Time-Encoding Spike Camera
World-aware Planning Narratives Enhance Large Vision-Language Model Planner
Hybrid-Collaborative Augmentation and Contrastive Sample Adaptive-Differential Awareness for Robust Attributed Graph Clustering
Bilevel Network Learning via Hierarchically Structured Sparsity
Weak-to-Strong Generalization under Distribution Shifts
Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM
Multi-Task Vehicle Routing Solver via Mixture of Specialized Experts under State-Decomposable MDP
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
UniZyme: A Unified Protein Cleavage Site Predictor Enhanced with Enzyme Active-Site Knowledge
A solvable model of learning generative diffusion: theory and insights
CQ-DINO: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection
DON’T NEED RETRAINING: A Mixture of DETR and Vision Foundation Models for Cross-Domain Few-Shot Object Detection
VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model
GnnXemplar: Exemplars to Explanations - Natural Language Rules for Global GNN Interpretability
ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-time Emotional Speech Synthesis
Towards a Pairwise Ranking Model with Orderliness and Monotonicity for Label Enhancement
Theoretical Insights into In-context Learning with Unlabeled Data
PANTHER: Generative Pretraining Beyond Language for Sequential User Behavior Modeling
Understanding and Enhancing Message Passing on Heterophilic Graphs via Compatibility Matrix
$\text{S}^2$Q-VDiT: Accurate Quantized Video Diffusion Transformer with Salient Data and Sparse Token Distillation
ForceFM: Enhancing Protein-Ligand Predictions through Force-Guided Flow Matching
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Normal-Abnormal Guided Generalist Anomaly Detection
High-Performance Arithmetic Circuit Optimization via Differentiable Architecture Search
Efficient Training of Minimal and Maximal Low-Rank Recurrent Neural Networks
Adaptive Fission: Post-training Encoding for Low-latency Spike Neural Networks
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
Neurosymbolic Diffusion Models
PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation
Asymptotically Stable Quaternion-valued Hopfield-structured Neural Network with Periodic Projection-based Supervised Learning Rules
Prediction-Powered Causal Inferences
Low-Rank Graphon Learning for Networks
Maximizing the Value of Predictions in Control: Accuracy Is Not Enough
HCRMP: An LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving
Hybrid-Balance GFlowNet for Solving Vehicle Routing Problems
OmniGen-AR: AutoRegressive Any-to-Image Generation
Machine Unlearning in 3D Generation: A Perspective-Coherent Acceleration Framework
Online Time Series Forecasting with Theoretical Guarantees
DUET: Dual-Perspective Pseudo Labeling and Uncertainty-aware Exploration & Exploitation Training for Source-Free Domain Adaptation
FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling
Distilling LLM Prior to Flow Model for Generalizable Agent’s Imagination in Object Goal Navigation
Non-exchangeable Conformal Prediction with Optimal Transport: Tackling Distribution Shift with Unlabeled Data
TabDPT: Scaling Tabular Foundation Models on Real Data
TAI3: Testing Agent Integrity in Interpreting User Intent
Towards Identifiability of Hierarchical Temporal Causal Representation Learning
A Practical Guide for Incorporating Symmetry in Diffusion Policy
DeltaFormer: Unlock the state space of Transformer
SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference
State-Covering Trajectory Stitching for Diffusion Planners
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
State Size Independent Statistical Error Bound for Discrete Diffusion Models
Linear Mixture Distributionally Robust Markov Decision Processes
Private Continual Counting of Unbounded Streams
FedFree: Breaking Knowledge-sharing Barriers through Layer-wise Alignment in Heterogeneous Federated Learning
Better Language Model Inversion by Compactly Representing Next-Token Distributions
Beyond the Average: Distributional Causal Inference under Imperfect Compliance
Efficient and Near-Optimal Algorithm for Contextual Dueling Bandits with Offline Regression Oracles
Sinusoidal Initialization, Time for a New Start
Backward Conformal Prediction
TGA: True-to-Geometry Avatar Dynamic Reconstruction
Continuous Soft Actor-Critic: An Off-Policy Learning Method Robust to Time Discretization
PathVQ: Reforming Computational Pathology Foundation Model for Whole Slide Image Analysis via Vector Quantization
Non-Singularity of the Gradient Descent Map for Neural Networks with Piecewise Analytic Activations
HumanCrafter: Synergizing Generalizable Human Reconstruction and Semantic 3D Segmentation
Normalizing Flows are Capable Models for Continuous Control
Learning Cocoercive Conservative Denoisers via Helmholtz Decomposition for Poisson Imaging Inverse Problems
NeedleInATable: Exploring Long-Context Capability of Large Language Models towards Long-Structured Tables
Scaling Unlocks Broader Generation and Deeper Functional Understanding of Proteins
Learning Human-Object Interaction as Groups
DyMU: Dynamic Merging and Virtual Unmerging for Efficient Variable-Length VLMs
Policy Optimized Text-to-Image Pipeline Design
Sound Logical Explanations for Mean Aggregation Graph Neural Networks
High-dimensional neuronal activity from low-dimensional latent dynamics: a solvable model
Bilevel ZOFO: Efficient LLM Fine-Tuning and Meta-Training
Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers
Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models
Counterfactual Image Editing with Disentangled Causal Latent Space
On the Integration of Spatial-Temporal Knowledge: A Lightweight Approach to Atmospheric Time Series Forecasting
UtilGen: Utility-Centric Generative Data Augmentation with Dual-Level Task Adaptation
PMLF: A Physics-Guided Multiscale Loss Framework for Structurally Heterogeneous Time Series
OmniSVG: A Unified Scalable Vector Graphics Generation Model
Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach
Can Dependencies Induced by LLM-Agent Workflows Be Trusted?
DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
When Models Don’t Collapse: On the Consistency of Iterative MLE
FreqExit: Enabling Early-Exit Inference for Visual Autoregressive Models via Frequency-Aware Guidance
Balancing Positive and Negative Classification Error Rates in Positive-Unlabeled Learning
Lookahead Routing for Large Language Models
LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision
ZEUS: Zero-shot Embeddings for Unsupervised Separation of Tabular Data
Accelerating Block Coordinate Descent for LLM Finetuning via Landscape Expansion
AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners
Exploiting LLMs for Automatic Hypothesis Assessment via a Logit-Based Calibrated Prior
The Indra Representation Hypothesis
ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning
Fantastic Features and Where to Find Them: A Probing Method to combine Features from Multiple Foundation Models
Data-Adaptive Exposure Thresholds under Network Interference
Harnessing the Universal Geometry of Embeddings
Simple and Effective Specialized Representations for Fair Classifiers
Assessing the quality of denoising diffusion models in Wasserstein distance: noisy score and optimal bounds
Lua-LLM: Learning Unstructured-Sparsity Allocation for Large Language Models
Momentum Multi-Marginal Schrödinger Bridge Matching
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
Adjusted Count Quantification Learning on Graphs
Adversarial Diffusion for Robust Reinforcement Learning
EchoShot: Multi-Shot Portrait Video Generation
MOF-BFN: Metal-Organic Frameworks Structure Prediction via Bayesian Flow Networks
Gaussian Processes for Shuffled Regression
NeuralPLexer3: Accurate Biomolecular Complex Structure Prediction with Flow Models
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding
Event-Guided Consistent Video Enhancement with Modality-Adaptive Diffusion Pipeline
Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models
EnzyControl: Adding Functional and Substrate-Specific Control for Enzyme Backbone Generation
Metropolis-Hastings Sampling for 3D Gaussian Reconstruction
Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex
ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents
Point-MaDi: Masked Autoencoding with Diffusion for Point Cloud Pre-training
Revisiting Orbital Minimization Method for Neural Operator Decomposition
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models
ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation
Sum Estimation under Personalized Local Differential Privacy
Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation
Sketched Gaussian Mechanism for Private Federated Learning
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
GLID$^2$E: A Gradient-Free Lightweight Fine-tune Approach for Discrete Biological Sequence Design
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment
Competitive Advantage Attacks to Decentralized Federated Learning
Variational Task Vector Composition
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
Smoothed Agnostic Learning of Halfspaces over the Hypercube
Towards Pre-trained Graph Condensation via Optimal Transport
Pancakes: Consistent Multi-Protocol Image Segmentation Across Biomedical Domains
Robust Ego-Exo Correspondence with Long-Term Memory
DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration
From Faults to Features: Pretraining to Learn Robust Representations against Sensor Failures
EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
Fuse2Match: Training-Free Fusion of Flow, Diffusion, and Contrastive Models for Zero-Shot Semantic Matching
OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers
When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration
Improving Reward Models with Proximal Policy Exploration for Preference-Based Reinforcement Learning
Generation as Search Operator for Test-Time Scaling of Diffusion-based Combinatorial Optimization
Compute-Optimal Scaling for Value-Based Deep RL
Asymmetric Duos: Sidekicks Improve Uncertainty
For Better or for Worse, Transformers Seek Patterns for Memorization
TF-MAS: Training-free Mamba2 Architecture Search
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Neural Collapse is Globally Optimal in Deep Regularized ResNets and Transformers
HEIR: Learning Graph-Based Motion Hierarchies
Probabilistic Token Alignment for Large Language Model Fusion
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
Titans: Learning to Memorize at Test Time
Feature-aware Modulation for Learning from Temporal Tabular Data
VideoTitans: Scalable Video Prediction with Integrated Short- and Long-term Memory
AlphaFold Database Debiasing for Robust Inverse Folding
KINDLE: Knowledge-Guided Distillation for Prior-Free Gene Regulatory Network Inference
Escaping saddle points without Lipschitz smoothness: the power of nonlinear preconditioning
MR. Video: MapReduce as an Effective Principle for Long Video Understanding
EA3D: Online Open-World 3D Object Extraction from Streaming Videos
MigGPT: Harnessing Large Language Models for Automated Migration of Out-of-Tree Linux Kernel Patches Across Versions
The Rise of Parameter Specialization for Knowledge Storage in Large Language Models
Show-o2: Improved Native Unified Multimodal Models
LOMIA: Label-Only Membership Inference Attacks against Pre-trained Large Vision-Language Models
Learning Robust Vision-Language Models from Natural Latent Spaces
Self-supervised Blending Structural Context of Visual Molecules for Robust Drug Interaction Prediction
LLM Layers Immediately Correct Each Other
Rooms from Motion: Un-posed Indoor 3D Object Detection as Localization and Mapping
Instance-Level Composed Image Retrieval
Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation
Pattern-Guided Adaptive Prior for Structure Learning
Oracle-Efficient Combinatorial Semi-Bandits
Enhancing Compositional Reasoning in CLIP via Reconstruction and Alignment of Text Descriptions
Accelerating Optimization via Differentiable Stopping Time
BMW: Bidirectionally Memory bank reWriting for Unsupervised Person Re-Identification
Bi-Level Knowledge Transfer for Multi-Task Multi-Agent Reinforcement Learning
Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology
Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates
Efficient Fairness-Performance Pareto Front Computation
PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
PocketSR: The Super-Resolution Expert in Your Pocket Mobiles
ToxicTextCLIP: Text-Based Poisoning and Backdoor Attacks on CLIP Pre-training
DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models
Block-Biased Mamba for Long-Range Sequence Processing
Memory by accident: a theory of learning as a byproduct of network stabilization
Masked Gated Linear Unit
SynCL: A Synergistic Training Strategy with Instance-Aware Contrastive Learning for End-to-End Multi-Camera 3D Tracking
In-Context Fully Decentralized Cooperative Multi-Agent Reinforcement Learning
How to Auto-optimize Prompts for Domain Tasks? Adaptive Prompting and Reasoning through Evolutionary Domain Knowledge Adaptation
Rethinking Fair Federated Learning from Parameter and Client View
On Fairness of Unified Multimodal Large Language Model for Image Generation
Succeed or Learn Slowly: Sample Efficient Off-Policy Reinforcement Learning for Mobile App Control
Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection
LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
Scaling and context steer LLMs along the same computational path as the human brain
Evolving and Regularizing Meta-Environment Learner for Fine-Grained Few-Shot Class-Incremental Learning
Finding separatrices of dynamical flows with Deep Koopman Eigenfunctions
Embedding Principle of Homogeneous Neural Network for Classification Problem
ESCA: Enabling Seamless Codec Avatar Execution through Algorithm and Hardware Co-Optimization for Virtual Reality
Effects of Dropout on Performance in Long-range Graph Learning Tasks
NavBench: Probing Multimodal Large Language Models for Embodied Navigation
STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning
FastLongSpeech: Enhancing Large Speech-Language Models for Efficient Long-Speech Processing
Dimension-free Score Matching and Time Bootstrapping for Diffusion Models
MARS: A Malignity-Aware Backdoor Defense in Federated Learning
Thompson Sampling for Multi-Objective Linear Contextual Bandit
Systematic Reward Gap Optimization for Mitigating VLM Hallucinations
OrdShap: Feature Position Importance for Sequential Black-Box Models
AdaptGrad: Adaptive Sampling to Reduce Noise
Mechanism Design for LLM Fine-tuning with Multiple Reward Models
A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity
Explore In-Context Message Passing Operator for Graph Neural Networks in A Mean Field Game
Uncover Governing Law of Pathology Propagation Mechanism Through A Mean-Field Game
Inductive Domain Transfer In Misspecified Simulation-Based Inference
Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent
DAWP: A framework for global observation forecasting via Data Assimilation and Weather Prediction in satellite observation space
From Black-box to Causal-box: Towards Building More Interpretable Models
How Memory in Optimization Algorithms Implicitly Modifies the Loss
Improve Temporal Reasoning in Multimodal Large Language Models via Video Contrastive Decoding
A Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation
4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time
FP64 is All You Need: Rethinking Failure Modes in Physics-Informed Neural Networks
Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model
Understanding outer learning rates in Local SGD
Cancer Survival Analysis via Zero-shot Tumor Microenvironment Segmentation on Low-resolution Whole Slide Pathology Images
Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs
Learning from Interval Targets
MTRec: Learning to Align with User Preferences via Mental Reward Models
MAPLE: Multi-scale Attribute-enhanced Prompt Learning for Few-shot Whole Slide Image Classification
Localist Topographic Expert Routing: A Barrel Cortex-Inspired Modular Network for Sensorimotor Processing
GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents
PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding
Empowering Decision Trees via Shape Function Branching
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
Are Pixel-Wise Metrics Reliable for Computerized Tomography Reconstruction?
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
GeoCAD: Local Geometry-Controllable CAD Generation with Large Language Models
TEMPO: Temporal Multi-scale Autoregressive Generation of Protein Conformational Ensembles
Learning CAD Modeling Sequences via Projection and Part Awareness
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing
Improving Energy Natural Gradient Descent through Woodbury, Momentum, and Randomization
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning
Conditional Diffusion Anomaly Modeling on Graphs
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Cross-Domain Graph Data Scaling: A Showcase with Diffusion Models
A Stable Whitening Optimizer for Efficient Neural Network Training
Learning to Insert for Constructive Neural Vehicle Routing Solver
Improving Generalization of Neural Combinatorial Optimization for Vehicle Routing Problems via Test-Time Projection Learning
Enhancing Contrastive Learning with Variable Similarity
Stackelberg Learning with Outcome-based Payment
COS3D: Collaborative Open-Vocabulary 3D Segmentation
Quantifying Uncertainty in the Presence of Distribution Shifts
Spik-NeRF: Spiking Neural Networks for Neural Radiance Fields
Functional data analysis for multivariate distributions through Wasserstein slicing
Unifying Reconstruction and Density Estimation via Invertible Contraction Mapping in One-Class Classification
Learning to Reason under Off-Policy Guidance
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
MINGLE: Mixture of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging
Enhancing Zero-Shot Black-Box Optimization via Pretrained Models with Efficient Population Modeling, Interaction, and Stable Gradient Approximation
Scaling Laws for Optimal Data Mixtures
DP²O-SR: Direct Perceptual Preference Optimization for Real-World Image Super-Resolution
Visual Anagrams Reveal Hidden Differences in Holistic Shape Processing Across Vision Models
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
Point Cloud Synthesis Using Inner Product Transforms
PUATE: Efficient ATE Estimation from Treated (Positive) and Unlabeled Units
Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks
Near-Exponential Savings for Population Mean Estimation with Active Learning
Length Generalization via Auxiliary Tasks
Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers
Fourier Token Merging: Understanding and Capitalizing Frequency Domain for Efficient Image Generation
Spectral Convolutional Conditional Neural Process
Reasoning as an Adaptive Defense for Safety
Causal Spatio-Temporal Prediction: An Effective and Efficient Multi-Modal Approach
CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback
Scalable Best-of-N Selection for Large Language Models via Self-Certainty
Scalable Valuation of Human Feedback through Provably Robust Model Alignment
In-Context Learning of Stochastic Differential Equations with Foundation Inference Models
MokA: Multimodal Low-Rank Adaptation for MLLMs
Multi-Agent Debate for LLM Judges with Adaptive Stability Detection
$\texttt{BetaConform}$: Efficient MAP Estimation of LLM Ensemble Judgment Performance with Prior Transfer
Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector
PRIMT: Preference-based Reinforcement Learning with Multimodal Feedback and Trajectory Synthesis from Foundation Models
When Lower-Order Terms Dominate: Adaptive Expert Algorithms for Heavy-Tailed Losses
MMCSBench: A Fine-Grained Benchmark for Large Vision-Language Models in Camouflage Scenes
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting
Situat3DChange: Situated 3D Change Understanding Dataset for Multimodal Large Language Model
MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
GlobalTomo: A global dataset for physics-ML seismic wavefield modeling and FWI
MMOT: The First Challenging Benchmark for Drone-based Multispectral Multi-Object Tracking
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
DrivAerStar: An Industrial-Grade CFD Dataset for Vehicle Aerodynamic Optimization
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
A2Seek: Towards Reasoning-Centric Benchmark for Aerial Anomaly Understanding
RAM-W600: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering
MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval
RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events
Martian World Model: Controllable Video Synthesis with Physically Accurate 3D Reconstructions
EDBench: Large-Scale Electron Density Data for Molecular Modeling
SolidGeo: Measuring Multimodal Spatial Math Reasoning in Solid Geometry
ML4CO-Bench-101: Benchmark Machine Learning for Classic Combinatorial Problems on Graphs
Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment
LIFEBENCH: Evaluating Length Instruction Following in Large Language Models
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
SafeVid: Toward Safety Aligned Video Large Multimodal Models
InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts
EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code
Leader360V: A Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment
CellVerse: Do Large Language Models Really Understand Cell Biology?
Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges
ForensicHub: A Unified Benchmark & Codebase for All-Domain Fake Image Detection and Localization
Listening to the Brain: Multi-Band sEEG Auditory Reconstruction via Dynamic Spatio-Temporal Hypergraphs
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking
DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?
Knot So Simple: A Minimalistic Environment for Spatial Reasoning
IR-OptSet: An Optimization-Sensitive Dataset for Advancing LLM-Based IR Optimizer
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms
Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis
EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs
QUT-DV25: A Dataset for Dynamic Analysis of Next-Gen Software Supply Chain Attacks
Disentanglement Beyond Static vs. Dynamic: A Benchmark and Evaluation Framework for Multi-Factor Sequential Representations
WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
URB - Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles
MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models
Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series
NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
Through the Lens: Benchmarking Deepfake Detectors Against Moiré-Induced Distortions
SwitchLingua: The First Large-Scale Multilingual and Multi-Ethnic Code-Switching Dataset
nvBench 2.0: Resolving Ambiguity in Text-to-Visualization through Stepwise Reasoning
Towards Understanding Camera Motions in Any Video
DCAD-2000: A Multilingual Dataset across 2000+ Languages with Data Cleaning as Anomaly Detection
GenSpace: Benchmarking Spatially-Aware Image Generation
SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Videos Generation
V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception
Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants
MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
Rethinking Evaluation of Infrared Small Target Detection
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems
Diffusion Classifiers Understand Compositionality, but Conditions Apply
CHASM: Unveiling Covert Advertisements on Chinese Social Media
SECODEPLT: A Unified Benchmark for Evaluating the Security Risks and Capabilities of Code GenAI
MyoChallenge 2024: A New Benchmark for Physiological Dexterity and Agility in Bionic Humans
ClinBench: A Standardized Multi-Domain Framework for Evaluating Large Language Models in Clinical Information Extraction
RoFt-Mol: Benchmarking Robust Fine-tuning with Molecular Graph Foundation Models
AHa-Bench: Benchmarking Audio Hallucinations in Large Audio-Language Models
BackdoorDM: A Comprehensive Benchmark for Backdoor Learning on Diffusion Model
FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models
STSBench: A Large-Scale Dataset for Modeling Neuronal Activity in the Dorsal Stream of Primate Visual Cortex
Web-Scale Collection of Video Data for 4D Animal Reconstruction
OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata
CPSea: Large-scale cyclic peptide-protein complex dataset for machine learning in cyclic peptide design
Towards Evaluating Proactive Risk Awareness of Multimodal Language Models
KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
Quantifying Generalisation in Imitation Learning
Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals
AneuG-Flow: A Large-Scale Synthetic Dataset of Diverse Intracranial Aneurysm Geometries and Hemodynamics
GreenHyperSpectra: A multi-source hyperspectral dataset for global vegetation trait prediction
CarbonGlobe: A Global-Scale, Multi-Decade Dataset and Benchmark for Carbon Forecasting in Forest Ecosystems
Generalizing Verifiable Instruction Following
WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents
CLEVER: A Curated Benchmark for Formally Verified Code Generation
CoRe: Benchmarking LLMs’ Code Reasoning Capabilities through Static Analysis Tasks
All that structure matches does not glitter
MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations
HawkBench: Investigating Resilience of RAG Methods on Stratified Information-Seeking Tasks
ModuLM: Enabling Modular and Multimodal Molecular Relational Learning with Large Language Models
Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition
AnomalyCoT: A Multi-Scenario Chain-of-Thought Dataset for Multimodal Large Language Models
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
C3Po: Cross-View Cross-Modality Correspondence by Pointmap Prediction
RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering
MathArena: Evaluating LLMs on Uncontaminated Math Competitions
MedChain: Bridging the Gap Between LLM Agents and Clinical Practice with Interactive Sequence
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
NoBOOM: Chemical Process Datasets for Industrial Anomaly Detection
Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models
MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives
Position: Towards Bidirectional Human-AI Alignment
Analyzing Similarity Metrics for Data Selection for Language Model Pretraining
Purity Law for Neural Routing Problem Solvers with Enhanced Generalizability
Probing Equivariance and Symmetry Breaking in Convolutional Networks
Lorentz Local Canonicalization: How to make any Network Lorentz-Equivariant
SeerAttention: Self-distilled Attention Gating for Efficient Long-context Prefilling
SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
Versatile Transferable Unlearnable Example Generator
Pre-trained Large Language Models Learn to Predict Hidden Markov Models In-context
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Value-Guided Search for Efficient Chain-of-Thought Reasoning
OOD Detection with Relative Angles
Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel
Comparing Uniform Price and Discriminatory Multi-Unit Auctions through Regret Minimization
Offline imitation learning in $Q^\pi$-realizable MDPs without expert realizability
Gaussian Process Upper Confidence Bound Achieves Nearly-Optimal Regret in Noise-Free Gaussian Process Bandits
Uncertainty Estimation on Graphs with Structure Informed Stochastic Partial Differential Equations
Universal Sequence Preconditioning
Efficient Preference-Based Reinforcement Learning: Randomized Exploration meets Experimental Design
Geometry Meets Incentives: Sample-Efficient Incentivized Exploration with Linear Contexts
VarFlow: Proper Scoring-Rule Diffusion Distillation via Energy Matching
Normalization in Attention Dynamics
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL
Scaling Embedding Layers in Language Models
Low Rank Gradients and Where to Find Them
An Ellipsoid Algorithm for Online Convex Optimization
Finite Sample Analyses for Continuous-time Linear Systems: System Identification and Online Control
Dynamic View Synthesis as an Inverse Problem
From Style to Facts: Mapping the Boundaries of Knowledge Injection with Finetuning
Sparse Gaussian Processes: Structured Approximations and Power-EP Revisited
Understanding Softmax Attention Layers: Exact Mean-Field Analysis on a Toy Problem
Dataset Distillation for Pre-Trained Self-Supervised Vision Models
Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
Test-Time Adaptation by Causal Trimming
Smart Surrogate Losses for Contextual Stochastic Linear Optimization with Robust Constraints
HOComp: Interaction-Aware Human-Object Composition
Bag of Tricks for Inference-time Computation of LLM Reasoning
U-REPA: Aligning Diffusion U-Nets to ViTs
Beyond Random: Automatic Inner-loop Optimization in Dataset Distillation
Training the Untrainable: Introducing Inductive Bias via Representational Alignment
Solving Continuous Mean Field Games: Deep Reinforcement Learning for Non-Stationary Dynamics
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
What We Miss Matters: Learning from the Overlooked in Point Cloud Transformers
Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens
Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
PCA++: How Uniformity Induces Robustness to Background Noise in Contrastive Learning
Adaptive Sigmoid Clipping for Balancing the Direction–Magnitude Mismatch Trade-off in Differentially Private Learning
Online Multi-Class Selection with Group Fairness Guarantee
PlanarGS: High-Fidelity Indoor 3D Gaussian Splatting Guided by Vision-Language Planar Priors
Information Theoretic Learning for Diffusion Models with Warm Start
MUVR: A Multi-Modal Untrimmed Video Retrieval Benchmark with Multi-Level Visual Correspondence
MoniTor: Exploiting Large Language Models with Instruction for Online Video Anomaly Detection
SubTrack++: Gradient Subspace Tracking for Scalable LLM Training
How Different from the Past? Spatio-Temporal Time Series Forecasting with Self-Supervised Deviation Learning
HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning
Abstract Counterfactuals for Language Model Agents
Tight Bounds on the Distortion of Randomized and Deterministic Distributed Voting
Reconstructing Heterogeneous Biomolecules via Hierarchical Gaussian Mixtures and Part Discovery
PLD: A Choice-Theoretic List-Wise Knowledge Distillation
Don’t Give Up on Democratizing AI for the Wrong Reasons
SHAP values via sparse Fourier representation
Fair Matroid Selection
Learning to Plan Like the Human Brain via Visuospatial Perception and Semantic-Episodic Synergistic Decision-Making
Sample-Efficient Tabular Self-Play for Offline Robust Reinforcement Learning
World Models as Reference Trajectories for Rapid Motor Adaptation
Tighter CMI-Based Generalization Bounds via Stochastic Projection and Quantization
A Closer Look at Graph Transformers: Cross-Aggregation and Beyond
Rectifying Soft-Label Entangled Bias in Long-Tailed Dataset Distillation
Efficient Knowledge Transfer in Federated Recommendation for Joint Venture Ecosystem
PointMAC: Meta-Learned Adaptation for Robust Test-Time Point Cloud Completion
Aligning Evaluation with Clinical Priorities: Calibration, Label Shift, and Error Costs
The Hawthorne Effect in Reasoning Models: Evaluating and Steering Test Awareness
MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling
Understanding Prompt Tuning and In-Context Learning via Meta-Learning
Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation
Dynamic Diffusion Schrödinger Bridge in Astrophysical Observational Inversions
User-Instructed Disparity-aware Defocus Control
$\texttt{STRCMP}$: Integrating Graph Structural Priors with Language Models for Combinatorial Optimization
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems
Can Class-Priors Help Single-Positive Multi-Label Learning?
Resource-Constrained Federated Continual Learning: What Does Matter?
Cognitive Predictive Processing: A Human-inspired Framework for Adaptive Exploration in Open-World Reinforcement Learning
OpenGU: A Comprehensive Benchmark for Graph Unlearning
Reward-Aware Proto-Representations in Reinforcement Learning
AngleRoCL: Angle-Robust Concept Learning for Physically View-Invariant Adversarial Patches
FrameShield: Adversarially Robust Video Anomaly Detection
Rope to Nope and Back Again: A New Hybrid Attention Strategy
The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation
Detecting Data Deviations in Electronic Health Records
Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm
Attack by Yourself: Effective and Unnoticeable Multi-Category Graph Backdoor Attacks with Subgraph Triggers Pool
Unsupervised Federated Graph Learning
Tight Lower Bounds and Improved Convergence in Performative Prediction
SPARKE: Scalable Prompt-Aware Diversity and Novelty Guidance in Diffusion Models via RKE Score
Tru-POMDP: Task Planning Under Uncertainty via Tree of Hypotheses and Open-Ended POMDPs
Reduction-based Pseudo-label Generation for Instance-dependent Partial Label Learning
DERD-Net: Learning Depth from Event-based Ray Densities
Care-PD: A Multi-Site Anonymized Clinical Dataset for Parkinson’s Disease Gait Assessment
Strassen Attention, Split VC Dimension and Compositionality in Transformers
Hawaii: Hierarchical Visual Knowledge Transfer for Efficient Vision-Language Models
Global Prompt Refinement with Non-Interfering Attention Masking for One-Shot Federated Learning
ChemX: A Collection of Chemistry Datasets for Benchmarking Automated Information Extraction
Stop DDoS Attacking the Research Community with AI-Generated Survey Papers
Prompt Tuning Transformers for Data Memorization
Mysteries of the Deep: Role of Intermediate Representations in Out of Distribution Detection
Second-order Optimization under Heavy-Tailed Noise: Hessian Clipping and Sample Complexity Limits
Semi-supervised Graph Anomaly Detection via Robust Homophily Learning
SEMPO: Lightweight Foundation Models for Time Series Forecasting
Orthogonal Contrastive Learning for Multi-Representation fMRI Analysis
On the VC dimension of deep group convolutional neural networks
Aligning What Matters: Masked Latent Adaptation for Text-to-Audio-Video Generation
Adaptive Batch-Wise Sample Scheduling for Direct Preference Optimization
Storyboard-guided Alignment for Fine-grained Video Action Recognition
Leveraging robust optimization for LLM alignment under distribution shifts
VPO: Reasoning Preferences Optimization Based on $\mathcal{V}$-Usable Information
Non-convex entropic mean-field optimization via Best Response flow
Reasoning is Periodicity? Improving Large Language Models Through Effective Periodicity Modeling
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning
Visual Structures Help Visual Reasoning: Addressing the Binding Problem in LVLMs
3DID: Direct 3D Inverse Design for Aerodynamics with Physics-Aware Optimization
CymbaDiff: Structured Spatial Diffusion for Sketch-based 3D Semantic Urban Scene Generation
Explainably Safe Reinforcement Learning
Audio Super-Resolution with Latent Bridge Models
Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning
Nearly-Linear Time Private Hypothesis Selection with the Optimal Approximation Factor
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
Implicit Modeling for Transferability Estimation of Vision Foundation Models
UniGTE: Unified Graph–Text Encoding for Zero-Shot Generalization across Graph Tasks and Domains
Local Curvature Descent: Squeezing More Curvature out of Standard and Polyak Gradient Descent
AugGen: Synthetic Augmentation using Diffusion Models Can Improve Recognition
THD-BAR: Topology Hierarchical Derived Brain Autoregressive Modeling for EEG Generic Representations
S-Crescendo: A Nested Transformer Weaving Framework for Scalable Nonlinear System in S-Domain Representation
Mixtures of Subspaces for Bandwidth Efficient Context Parallel Training
Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning
RankMatch: A Novel Approach to Semi-Supervised Label Distribution Learning Leveraging Rank Correlation between Labels
Addressing Mark Imbalance in Integration-free Marked Temporal Point Processes
Mesh Interpolation Graph Network for Dynamic and Spatially Irregular Global Weather Forecasting
The Quest for Universal Master Key Filters in DS-CNNs
No Object Is an Island: Enhancing 3D Semantic Segmentation Generalization with Diffusion Models
Exploring Semantic-constrained Adversarial Example with Instruction Uncertainty Reduction
Light-Weight Diffusion Multiplier and Uncertainty Quantification for Fourier Neural Operators
Dr. RAW: Towards General High-Level Vision from RAW with Efficient Task Conditioning
Soft-consensual Federated Learning for Data Heterogeneity via Multiple Paths
Rising from Ashes: Generalized Federated Learning via Dynamic Parameter Reset
Compressed and Smooth Latent Space for Text Diffusion Modeling
Pin the Tail on the Model: Blindfolded Repair of User-Flagged Failures in Text-to-Image Services
FedIGL: Federated Invariant Graph Learning for Non-IID Graphs
3D Gaussian Flats: Hybrid 2D/3D Photometric Scene Reconstruction
GLNCD: Graph-Level Novel Category Discovery
AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models
More Than Generation: Unifying Generation and Depth Estimation via Text-to-Image Diffusion Models
AutoPartGen: Autoregressive 3D Part Generation and Discovery
Predictable Scale (Part II) --- Farseer: A Refined Scaling Law in LLMs
Robust Federated Finetuning of LLMs via Alternating Optimization of LoRA
Gradient Variance Reveals Failure Modes in Flow-Based Generative Models
UniDomain: Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning
IMPACT: Irregular Multi-Patch Adversarial Composition Based on Two-Phase Optimization
Enhancing Sample Selection Against Label Noise by Cutting Mislabeled Easy Examples
PhysDrive: A Multimodal Remote Physiological Measurement Dataset for In-vehicle Driver Monitoring
A machine learning approach that beats Rubik's cubes
Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders
Unified Scaling Laws for Compressed Representations
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly
Flow-Based Policy for Online Reinforcement Learning
Multi-Modal Interactive Agent Layer for Few-Shot Universal Cross-Domain Retrieval and Beyond
COALA: Numerically Stable and Efficient Framework for Context-Aware Low-Rank Approximation
InstructHOI: Context-Aware Instruction for Multi-Modal Reasoning in Human-Object Interaction Detection
Near-Optimal Experiment Design in Linear non-Gaussian Cyclic Models
Spurious-Aware Prototype Refinement for Reliable Out-of-Distribution Detection
Computational Hardness of Reinforcement Learning with Partial $q^{\pi}$-Realizability
Toward a Unified Geometry Understanding: Riemannian Diffusion Framework for Graph Generation and Prediction
Efficient Utility-Preserving Machine Unlearning with Implicit Gradient Surgery
Memory-Augmented Potential Field Theory: A Framework for Adaptive Control in Non-Convex Domains
Relaxing partition admissibility in Cluster-DAGs: a causal calculus with arbitrary variable clustering
Pessimistic Data Integration for Policy Evaluation
Statistical inference for Linear Stochastic Approximation with Markovian Noise
Uncertainty-Aware Multi-Objective Reinforcement Learning-Guided Diffusion Models for 3D De Novo Molecular Design
One Sample is Enough to Make Conformal Prediction Robust
Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs
Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework
Majority of the Bests: Improving Best-of-N via Bootstrapping
The Promise of RL for Autoregressive Image Editing
Manipulating Feature Visualizations with Gradient Slingshots
Capturing Individual Human Preferences with Reward Features
Physics-informed Reduced Order Modeling of Time-dependent PDEs via Differentiable Solvers
Transcending Cost-Quality Tradeoff in Agent Serving via Session-Awareness
Navigating the MIL Trade-Off: Flexible Pooling for Whole Slide Image Classification
Constant Bit-size Transformers Are Turing Complete
Towards foundational LiDAR world models with efficient latent flow matching
An Evidence-Based Post-Hoc Adjustment Framework for Anomaly Detection Under Data Contamination
Diffusion-Guided Graph Data Augmentation
BNMusic: Blending Environmental Noises into Personalized Music
Doubly Robust Alignment for Large Language Models
V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation
Fast Rate Bounds for Multi-Task and Meta-Learning with Different Sample Sizes
Beyond Pairwise Connections: Extracting High-Order Functional Brain Network Structures under Global Constraints
Multi-Objective Hyperparameter Selection via Hypothesis Testing on Reliability Graphs
ECO: Evolving Core Knowledge for Efficient Transfer
$\epsilon$-Seg: Sparsely Supervised Semantic Segmentation of Microscopy Data
BADiff: Bandwidth Adaptive Diffusion Model
Spatial Understanding from Videos: Structured Prompts Meet Simulation Data
Cyclic Counterfactuals under Shift–Scale Interventions
GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning
Enhancing GUI Agent with Uncertainty-Aware Self-Trained Evaluator
Bridging Time and Linguistics: LLMs as Time Series Analyzer through Symbolization and Segmentation
A Learning-Augmented Approach to Online Allocation Problems
TreeSplat: Mergeable Tree for Deformable Gaussian Splatting
Limitations of Normalization in Attention
DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving
Reasoning Is Not a Race: When Stopping Early Beats Going Deeper
Breakthrough Sensor-Limited Single View: Towards Implicit Temporal Dynamics for Time Series Domain Adaptation
Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression
The Quotient Bayesian Learning Rule
NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding
Test-Time Adaptive Object Detection with Foundation Model
Shapley-Based Data Valuation for Weighted $k$-Nearest Neighbors
Integral Imprecise Probability Metrics
Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
HypoBootstrap: A Bootstrapping Framework for Inductive Reasoning
A Dynamic Learning Strategy for Dempster-Shafer Theory with Applications in Classification and Enhancement
Improving the Straight-Through Estimator with Zeroth-Order Information
Learning to Solve Complex Problems via Dataset Decomposition
Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning
NegoCollab: A Common Representation Negotiation Approach for Heterogeneous Collaborative Perception
Price of Parsimony: Complexity of Fourier Sparsity Testing
When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners
A unified framework for establishing the universal approximation of transformer-type architectures
Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks
Statistical Parity with Exponential Weights
RankSEG-RMA: An Efficient Segmentation Algorithm via Reciprocal Moment Approximation
More Than Just Functional: LLM-as-a-Critique for Efficient Code Generation
Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment
GraphKeeper: Graph Domain-Incremental Learning via Knowledge Disentanglement and Preservation
LLM at Network Edge: A Layer-wise Efficient Federated Fine-tuning Approach
Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression
CHPO: Constrained Hybrid-action Policy Optimization for Reinforcement Learning
Token Perturbation Guidance for Diffusion Models
AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining
Enhancing Privacy in Multimodal Federated Learning with Information Theory
TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses
Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO
Turning the Tables: Enabling Backward Transfer via Causal-Aware LoRA in Continual Learning
Localized Data Shapley: Accelerating Valuation for Nearest Neighbor Algorithms
Towards Unified Multimodal Interleaved Generation via Group Relative Policy Optimization
Learning Simple Interpolants for Linear Integer Arithmetic
EAReranker: Efficient Embedding Adequacy Assessment for Retrieval Augmented Generation
Identifying Macro Causal Effects in C-DMGs over DMGs
Revealing Multimodal Causality with Large Language Models
SPICED: A Synaptic Homeostasis-Inspired Framework for Unsupervised Continual EEG Decoding
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Robust Graph Condensation via Classification Complexity Mitigation
scSplit: Bringing Severity Cognizance to Image Decomposition in Fluorescence Microscopy
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
SSRB: Direct Natural Language Querying to Massive Heterogeneous Semi-Structured Data
Linguini: A benchmark for language-agnostic linguistic reasoning
StelLA: Subspace Learning in Low-rank Adaptation using Stiefel Manifold
AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs