Track: Exhibitor Spot Talks

Tue 2 Dec. 12:00 - 12:12 PST

Exhibitor Talk - Simular

Tue 2 Dec. 12:15 - 12:27 PST

Recycling the World Computer: Fault-Tolerant LLM Training on Idle GPU Capacity

Jason Mancuso

Modal runs a global, multi-cloud fleet of worker instances to provide our users sub-second access to autoscaled, GPU-backed compute. Fulfilling this commitment to our users requires running some amount of buffer capacity to account for bursty workloads. We present early results leveraging our buffer capacity for distributed training of large-language models with a fault-tolerant implementation of DiLoCo.

Tue 2 Dec. 12:30 - 12:42 PST

Half Is Heroic: Rewarding Non-Answers for Responsible AI Decision-Making

Sergio Bruccoleri

This talk explores a new evaluation paradigm that moves beyond binary correctness to tricategorical reasoning, rewarding AI systems for responsible restraint and ethical uncertainty. We will share insights and early metrics showing how human-in-the-loop evaluation can strengthen content safety and model reliability in real-world applications.

Tue 2 Dec. 12:45 - 12:57 PST

Auto-SWE-Bench: Scalable, Real-World Benchmarks for LLM Coding Evaluation

Lilin Wang

In this talk, Turing presents Auto-SWE-Bench, a framework for automatically generating large-scale, high-fidelity SWE-Bench datasets and environments from open-source GitHub repositories. Unlike static, manually curated datasets, Auto-SWE-Bench continuously sources real-world GitHub issues and pull requests across multiple programming languages, producing diverse tasks that reflect the true challenges of software engineering. Our framework ensures reproducibility, granular test feedback, and rigorous quality alignment, enabling dynamic and multilingual benchmarks at scale. For AI researchers working on code generation and reasoning, Auto-SWE-Bench provides a powerful platform to evaluate today’s systems and accelerate progress toward more capable coding models.

Tue 2 Dec. 13:00 - 13:12 PST

Exhibitor Talk - Optiver

Scott McKenzie

Tue 2 Dec. 13:00 - 13:12 PST

Tabular and Causal Foundation Modelling: TabDPT and CausalPFN

Anthony L Caterini

Tabular and Causal Foundation Modelling: TabDPT and CausalPFN

Tabular data powers decision-making across industries, yet its diversity has long limited the reach of deep learning. TabDPT breaks this barrier: it’s a tabular foundation model (TFM), trained using real-world data, that generalizes rapidly to new tasks and domains. TabDPT is the first TFM to demonstrate scaling laws akin to those seen in large language models, predictably improving by increasing the size of the model and training data. TabDPT’s scalability, robust representations, and in-context learning make it the backbone for next-generation applications, including CausalPFN. Built on TabDPT's scalable architecture, CausalPFN automates causal effect estimation from observational data, eliminating manual model selection and tuning. By amortizing learning over a vast array of data generating processes, CausalPFN delivers out-of-the-box causal effect estimation on unseen observational datasets without any fine-tuning or parameter selection, and surpasses the state-of-the-art for models trained on individual datasets. Together, TabDPT and CausalPFN set a new standard for tabular AI, combining broad generalization with trustworthy causal reasoning for real-world impact.

Tue 2 Dec. 13:15 - 13:27 PST

Exhibitor Talk

Tue 2 Dec. 13:15 - 13:27 PST

Rights Management Capabilities and Enforcement for AI Music

Sound Patrol is developing GenAI-native technologies—authenticity detection, infringement analysis, and attribution frameworks—that enable rights management and enforcement for artists and media companies, providing the technical foundation for tomorrow's contracts and compliance systems.

Tue 2 Dec. 13:30 - 13:42 PST

Exploring Pathways into Quantitative Research

Jamie Watson · Oliva Bateman

Curious about how scientific thinking drives progress in quantitative research? This session offers a look at how researchers from maths, physics, computer science, and related fields apply their skills to complex, data-driven problems. We’ll also discuss the pathways and opportunities available for those interested in contributing to this kind of work, and the collaborative environment that helps new researchers thrive.

Tue 2 Dec. 13:30 - 13:42 PST

Semantic Parsing at Bloomberg

Sachith Sri Ram Kothur

Code generation — and semantic parsing in particular — enables the creation of natural language interfaces for interacting with the vast trove of financial data provided by Bloomberg. This data is available both in structured repositories and through dedicated APIs. In this sponsored talk, we discuss the potential of these technologies to democratize access and empower users to perform complex financial analyses and analytics. We will also highlight two of Bloomberg's recent publications at EMNLP 2025: one on calibrating text-to-SQL outputs from large language models without retraining, and another that introduced STARQA, a dataset for testing complex analytical reasoning and real-world query understanding.

Tue 2 Dec. 13:45 - 13:57 PST

Entropy by Design: Synthetic Data at Scale

Marah Abdin

At scale, under-constructed generation amplifies uniformity, causing synthetic data pipelines to plateau into self-similarity. We approach the problem through entropy-aware design as a system of variability levers - both structural and cognitive - preserving quality and diversity at pretraining scale.

Tue 2 Dec. 13:45 - 13:57 PST

ATLAS: AdapTive-LeArning Speculator System for Real-Time LLM Inference Acceleration with Together AI

Junxiong Wang · Ben Athiwaratkun

We present ATLAS (AdapTive-LeArning Speculator System), a speculative decoding framework that achieves up to 4x faster LLM inference by learning from live traffic in real-time. ATLAS combines a static speculator for robust baseline performance with a lightweight adaptive speculator that rapidly specializes to emerging workload patterns. A confidence-aware controller dynamically selects between speculators and adjusts lookahead length to optimize acceptance rates. On DeepSeek-V3.1, ATLAS achieves up to 500 TPS, 3.18x faster than standard decoding. In RL training scenarios, acceptance rates improve from 10% to 80%, reducing training time by over 60%. Unlike static speculators that degrade as workloads evolve, ATLAS continuously adapts to maintain peak performance.

Tue 2 Dec. 14:00 - 14:12 PST

Exhibitor Talk - Cohere

Tue 2 Dec. 14:15 - 14:27 PST

Drug Discovery with Large-Scale, Physics-Based Synthetic Data at D. E. Shaw Research

Peter Skopp

Recent breakthroughs in machine learning—such as diffusion models for biomolecular cofolding and multimodal large language models (LLMs)—have the potential to reshape the drug discovery process by enabling reasoning across complex biological and chemical systems. Despite advancements in machine learning hardware, algorithms, and model architectures, data scarcity remains a fundamental bottleneck to progress in the field. High-quality experimental data on biological structures, their interactions with drug-like molecules, and important physical properties of those molecules are limited, slow to generate, and capital intensive. At D. E. Shaw Research, one way we are addressing this challenge is through large-scale, physics-based synthetic data generation. Many of our models are trained using vast amounts of molecular dynamics simulation data produced by our special-purpose supercomputer, Anton. The synthetic data from Anton helps us to generate diverse and physically meaningful datasets that extend well beyond data available from experiment. This talk will highlight some of the types of data we generate as well as some of the methods used to incorporate molecular dynamics data into multimodal LLMs.

Tue 2 Dec. 14:15 - 14:27 PST

Agentic AI: Exploring Evolution and Evaluation

Lu Lu

Agentic AI models are reshaping how we approach problem solving. This talk introduces a conceptual framework for evaluating their capabilities, highlighting key challenges, assessment dimensions, and future directions for real-world readiness.

Tue 2 Dec. 14:30 - 14:42 PST

Measuring Enterprise Agents - Production Readiness Index and Evaluation Environment

Tue 2 Dec. 14:30 - 14:42 PST

Measuring Emergent Behavior in AI Agents - Weights & Biases

As language models transition into agents, they exhibit behaviors that were not explicitly trained—emergent dynamics that are powerful yet poorly understood. Measuring these behaviors requires dedicated tooling that treats evaluation as a central research problem rather than a peripheral task. This talk introduces frameworks for self-improving agents that generate candidate variants, run structured experiments, and incorporate evaluation feedback into iterative refinement. Such loops operationalize the scientific method in software, enabling agents to improve through cycles of hypothesis, measurement, and revision. Tooling for evaluation plays a critical role in this process, transforming measurement from a diagnostic exercise into an engine for discovery. Early experiments reveal both hidden failure modes and novel capabilities, underscoring the need to build for emergence as an active research objective. The talk concludes by outlining a research agenda in which evaluation frameworks provide the substrate for cultivating reliable, trustworthy agent systems.

Tue 2 Dec. 14:45 - 14:57 PST

New Frontier of AI: Eval, RL, and What's Next

Bing Liu

Tue 2 Dec. 14:45 - 14:57 PST

Latent Thought Models with Variational Bayes Inference-Time Computation

Jianwen Xie

This talk introduces Latent Thought Models (LTMs), a novel class of language models that incorporate explicit latent thought vectors following a prior model in latent space. These vectors, inferred from observed ground tokens via posterior inference within the classical variational Bayes framework, are refined through inference-time computation. This process enables explicit abstraction and reasoning in a compact latent space, distinct from standard LLM’s unstructured embeddings. Experiments show that this new paradigm achieves superior sample and parameter efficiency compared to autoregressive models and introduces inference-time computation as a new scaling dimension beyond traditional LLMs.

Tue 2 Dec. 15:00 - 15:12 PST

Exhibitor Talk - MathWorks: Would you trust your AI model with your life?

Lucas Garcia

Generative AI has made headlines with fluent text and clever code, setting new benchmarks in creativity and performance. But in safety-critical domains like aerospace, automotive, and healthcare, the challenge is different: ensuring that AI systems behave reliably and safely under all possible operating conditions.

This talk explores the known gap between academic breakthroughs and industrial trust, and shows how formal verification, explainability, and runtime assurance can turn black-box models into certifiable systems. Drawing on MathWorks’ experience working with engineers and scientists in developing AI-enabled safety-critical systems, we’ll demonstrate how to verify robustness, detect out-of-distribution inputs, and meet emerging safety standards—across models in PyTorch, ONNX, and MATLAB.

If your AI is heading into the real world, it’s time to ask yourself: Would you trust it with your life?

Tue 2 Dec. 15:15 - 15:27 PST

I for Code: What We’ve Learned (and Whats Next)

Tue 2 Dec. 15:30 - 15:42 PST

From Agent Soup to Proper Software Design: Putting the Developer Back in Control of Generative AI with Mellea

David Cox

Mellea is a generative AI library for writing robust application software. LLM outputs are intrinsically unpredictable and often wrong. Agentic frameworks and prompt optimization libraries address this unpredictability by putting the LLM in control leading to systems that are often difficult to debug, maintain, evolve, and port. Mellea puts developers back in control by defining a programming model that encourages task decomposition, information hiding, and compositional contracts between isolated modules. Mellea’s programming model results in systems with manageable fault models, better portability, and lower inference-time costs.

Tue 2 Dec. 15:45 - 15:57 PST

Realizing Personal and Enterprise AI Twins - Lenovo

Oguz Elibol

In this talk, we outline our progress toward realizing Personal and Enterprise AI Twins using a Hybrid Compute paradigm. We will discuss specific advancements in our on-device and cloud agents, focusing on improved tool calling, information retrieval, and data-driven optimization. Furthermore, we will present results on enhancing speculative decoding methodologies to accelerate inference, along with model routing strategies designed to balance cost, latency, and performance.

Tue 2 Dec. 16:00 - 16:12 PST

From RLHF to RL Environments - Welcoming Agents to the Real World

John Cutter

LLMs have reached a glass ceiling. Models continue to improve, but static data and common benchmarks cannot teach systems to operate in chaotic and multi-objective settings.

Progress now depends less on scale and more on the environments in which models learn. At Invisible, we believe reinforcement learning environments and evolving evaluations are the core ingredients of this shift. By transitioning from RLHF to verifiable reward systems and reinforcement learning with AI feedback, agents train inside conditions that resemble the real world. They learn to make tradeoffs, use tools, reason through uncertainty, and adjust based on consequences rather than simple instructions.

This session outlines why static training has stalled, how reward-driven RL environments support deeper reasoning, and why evaluations must become multi-objective, verifiable, and informed by human expertise to capture real world skills such as judgment, tone, and strategy.

Join us to examine how teaching AI to reason within realistic environments opens new research territory and why the next breakthroughs will come from smarter environments.

Tue 2 Dec. 16:15 - 16:27 PST

Exhibitor Talk - WRITER

Daniel M. Bikel

In this session, Dan Bikel will explore what the next generation of enterprise LLMs will require as organizations move from experimentation to true autonomy. The first wave of large models proved their value in retrieval and workflow execution, but the next frontier demands systems that can plan, reason, backtrack, and specialize. Dan will outline where the research community is heading and how the future points toward single deployed models that can shift into domain-specific modes without sacrificing efficiency or control. He will also share a preview of WRITER’s “Enterprise Brain” research, which represents an enterprise-grade approach to continuous learning – automatically extracting and organizing knowledge across workflows while meeting governance, privacy, and security. Attendees will walk away with a clear view of how LLMs are evolving from static models into self-improving, context-rich systems that drive real autonomy in the enterprise.

Tue 2 Dec. 16:30 - 16:42 PST

Exhibitor Talk - Rennaissance Philanthropy

Tue 2 Dec. 16:45 - 16:57 PST

Algorithmic Trading with Large-Scale Deep Learning

Tue 2 Dec. 17:00 - 17:12 PST

Automated Curation of Foundation-Scale Pretraining Datasets

Matthew Leavitt

Large-scale models are what they eat: the quality and composition of their training data fundamentally determine their performance and behavior. Data curation is a frontier research and engineering challenge in deep learning, but cutting-edge techniques remain confined to large organizations with extensive in-house data teams. We developed a deployable, productionized data curation pipeline that integrates a suite of modular algorithms and scales efficiently to trillions of tokens. We share the results of applying our curation pipeline to generate state-of-the-art text and image-text datasets, demonstrating that scalable, high-quality data curation is accessible beyond the largest AI labs.

Tue 2 Dec. 17:15 - 17:27 PST

Databricks Presents a New IDP Benchmark

Most business documents still exist for humans first and machines second. One of our goals at Databricks is to make this human-centered data "legible" to AI and Agents, so that we gain insights and even take actions based upon those insights. But AI can still struggle to understand the full range of messy unstructured documents we produce for each other. We've created and will present a benchmark, OfficeQA, that probes the limits of current AI systems in analyzing a large 89,000 page public dataset.

Tue 2 Dec. 17:30 - 17:42 PST

Decentralized Diffusion Models

Bidhan Roy

Training diffusion models across isolated clusters produces experts that never communicate during training yet develop clear specializations. This challenges fundamental assumptions about what gradient synchronization actually provides in distributed learning.

Tue 2 Dec. 17:45 - 18:00 PST

The State of Open Source AI and a Look Into the Future

Nader Khalil

The rapid advancement of artificial intelligence (AI) has been profoundly shaped by the open-source community, fostering unprecedented collaboration, innovation, and democratization of AI technologies. We examine the pivotal role of open-source ecosystems in driving accessibility, reproducibility, and scalability of AI solutions, with a focus platforms like PyTorch, Triton Inference Server, and Megatron-LM.

Main Navigation

Exhibitor Spot Talks

Exhibitor Spot Talks - Session 1

Exhibit Hall A,B

Exhibitor Talk - Simular

Recycling the World Computer: Fault-Tolerant LLM Training on Idle GPU Capacity

Half Is Heroic: Rewarding Non-Answers for Responsible AI Decision-Making

Auto-SWE-Bench: Scalable, Real-World Benchmarks for LLM Coding Evaluation

Exhibitor Talk - Optiver

Tabular and Causal Foundation Modelling: TabDPT and CausalPFN

Exhibitor Talk

Rights Management Capabilities and Enforcement for AI Music

Exploring Pathways into Quantitative Research

Semantic Parsing at Bloomberg

Entropy by Design: Synthetic Data at Scale

ATLAS: AdapTive-LeArning Speculator System for Real-Time LLM Inference Acceleration with Together AI

Exhibitor Talk - Cohere

Drug Discovery with Large-Scale, Physics-Based Synthetic Data at D. E. Shaw Research

Agentic AI: Exploring Evolution and Evaluation

Measuring Enterprise Agents - Production Readiness Index and Evaluation Environment

Measuring Emergent Behavior in AI Agents - Weights & Biases

New Frontier of AI: Eval, RL, and What's Next

Latent Thought Models with Variational Bayes Inference-Time Computation

Exhibitor Talk - MathWorks: Would you trust your AI model with your life?

I for Code: What We’ve Learned (and Whats Next)

From Agent Soup to Proper Software Design: Putting the Developer Back in Control of Generative AI with Mellea

Realizing Personal and Enterprise AI Twins - Lenovo

From RLHF to RL Environments - Welcoming Agents to the Real World

Exhibitor Talk - WRITER

Exhibitor Talk - Rennaissance Philanthropy

Algorithmic Trading with Large-Scale Deep Learning

Automated Curation of Foundation-Scale Pretraining Datasets

Databricks Presents a New IDP Benchmark

Decentralized Diffusion Models

The State of Open Source AI and a Look Into the Future