Exhibitor Spot Talks
Exhibitor Spot Talks - Session 3
Exhibit Hall A,B
We are entering an inference age, driven by the growing role of agentic AI, test-time compute, and post-training, making compute efficiency increasingly critical. Furiosa’s Tensor Contraction Processor (TCP) architecture and co-designed hardware-software stack maximize utilization across layers, from transformer-based models to multimodal inference, enabling large-scale LLMs and generative AI workloads to run with greater efficiency and performance.
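For readers unfamiliar with the term, the tensor contraction that gives the TCP its name generalizes matrix multiplication by summing over shared indices. Below is a minimal NumPy sketch of the attention-style contraction that dominates transformer inference (illustrative shapes only, not Furiosa's programming model):

```python
import numpy as np

# Attention-score contraction: sum over the shared head dimension d.
# Plain NumPy with toy shapes; this illustrates the operation, not the TCP stack.
Q = np.random.rand(8, 128, 64)             # (heads, query positions, d)
K = np.random.rand(8, 128, 64)             # (heads, key positions, d)
scores = np.einsum("hqd,hkd->hqk", Q, K)   # contract d; matmul is the special case
print(scores.shape)                        # (8, 128, 128)
```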
EMM-1: Expanding embedding space to be multimodal and multilingual
Jennifer Ding
Many multimodal systems are really bi-modal, or at best tri-modal. The shift to multimodal AI has created an urgent need for large-scale, high-quality training data spanning more modalities. EMM-1 is the largest and highest-quality dataset spanning five modalities. The dataset has three parts:
i) a large (>100M samples), automatically generated set of quintuples of matching caption, image, video, audio, and point-cloud data;
ii) a human-rated subset comprising ~1M ratings of pairs among the five modalities;
iii) a novel, first-of-its-kind, consensus-based evaluation set (3.5K data points) for evaluating zero-shot capabilities between audio and point clouds.
With this release, we hope to accelerate the development of truly multimodal applications. To demonstrate the dataset's usefulness, we publish a simple yet powerful baseline model with strong cross-modal retrieval performance. While powerful, the model leaves substantial headroom for further optimization: attention over full token sequences, quality-weighted objectives, and expanded fine-tuning, to name but a few. By expanding captions to multiple languages, we are unlocking this dataset for teams building multimodal AI worldwide.
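To make the three-part structure concrete, here is a minimal sketch of what one record from each of the first two splits might look like; the field names are illustrative, not the released schema:

```python
from dataclasses import dataclass

@dataclass
class Quintuple:
    """One sample from the >100M automatically generated split (illustrative fields)."""
    caption: str
    image_path: str
    video_path: str
    audio_path: str
    pointcloud_path: str

@dataclass
class PairRating:
    """One of the ~1M human ratings of cross-modal pairs."""
    sample_id: str
    modality_a: str   # e.g. "audio"
    modality_b: str   # e.g. "pointcloud"
    rating: float     # human judgment of how well the pair matches
```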
This talk provides an overview of our papers featured at this conference, focusing on two complementary approaches to enhancing large language model (LLM) capabilities: internal reasoning enhancement and external tool integration. We investigate strategies to boost the intrinsic reasoning abilities of LLMs through advanced search algorithms and problem decomposition, specifically applied to automated code generation. Simultaneously, we explore how LLMs can enhance their problem-solving capacities by interacting with specialized external tools, including satisfiability solvers, optimization solvers, computer vision tools, and time series analysis tools. By integrating these internal and external enhancements, we aim to unlock new frontiers in artificial intelligence, enabling more sophisticated decision-making and problem-solving capabilities.
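A minimal sketch of the external-tool pattern described above: the model emits a structured tool call, a harness executes a specialist solver, and the result returns to the model's context. The brute-force checker below is only a stand-in for a real SAT solver, and all names are hypothetical.

```python
from itertools import product

def tiny_sat(clauses, n_vars):
    """Brute-force stand-in for an external SAT solver (a real harness would
    call a CDCL solver). Literals are 1-based; negative means negated."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
            return bits
    return None

TOOLS = {"sat": tiny_sat}   # hypothetical registry: task type -> specialist tool

def run_tool_call(tool: str, args):
    """Execute the tool call the LLM emitted and return the result for
    further reasoning in the model's context."""
    return TOOLS[tool](*args)

print(run_tool_call("sat", ([[1, -2], [2, 3], [-1]], 3)))  # -> (False, False, True)
```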
Towards reference models for Engineering
Johannes Brandstetter
In the era of LLMs, one is constantly confronted with the question of where we stand on the applicability of large-scale deep learning models in scientific and engineering domains. The discussion starts by revisiting recent triumphs in weather and climate modeling, making connections to computer vision, physics-informed learning, and neural operators. Second, we discuss breakthroughs in multi-physics modeling, computational fluid dynamics, and related fields, with an emphasis on what it takes to build reference models for whole industry verticals. We relate these breakthroughs to advancements in engineering and much faster process cycles.
Modular Agents and Orchestrated Coordination using Open Agentic Protocols: Foundations for the Next Wave of Agentic AI
Matt White
We argue that the future of open agentic AI hinges on systems composed of small, domain-specialist models fine-tuned for agentic tasks, coordinated by asynchronous multi-agent orchestration and stabilized via deterministic rule scaffolding. In this paradigm, each agent focuses on a slice of reasoning, planning, or execution; orchestration layers manage long-horizon workflows and inter-agent dependencies; and open protocols and frameworks ensure interoperability, extensibility, and community-driven evolution.

This talk outlines a blueprint for this architecture, touching on real challenges and emerging research directions: semantic routing of subtasks, fault recovery in agent networks, hybrid symbolic-neural constraint enforcement, and the design of open agentic communication standards (e.g. MCP, A2A). We show how modular, protocol-driven agent ecosystems can scale, adapt, and coordinate over complex tasks, and we offer a roadmap for pushing these ideas from prototypes to robust agentic infrastructure.
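To illustrate semantic routing concretely, the hypothetical sketch below sends each subtask to the specialist agent whose advertised capability is nearest in embedding space; the toy embedder merely stands in for a real sentence encoder.

```python
import numpy as np

AGENTS = {   # hypothetical registry: agent name -> advertised capability
    "planner":  "decompose a goal into ordered subtasks",
    "coder":    "write, run, and debug code",
    "verifier": "check outputs against deterministic rules",
}

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in for a real sentence embedder; not semantically meaningful."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(dim)

def route(subtask: str) -> str:
    """Semantic routing: pick the agent whose capability embedding is closest
    (by cosine similarity) to the subtask embedding."""
    q = toy_embed(subtask)
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(AGENTS, key=lambda name: cos(q, toy_embed(AGENTS[name])))

print(route("fix the failing unit test"))   # whichever agent scores highest
```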
Web agents need a new training paradigm built on trajectories that capture how perception, planning, and action co-evolve. Together, we'll explore how we can scale reasoning through experience using a self-improving engine that expands data by structure. This marks a shift toward scaling intelligence through coherence and grounded reasoning, rather than rote imitation. In the second half, I’ll show how simple synthetic reasoning gyms with verifiable rewards can serve as catalysts for reasoning to surface latent intelligence and drive generalization to open-ended web tasks. We’ll compare SFT and RL, and discuss how balanced training mixtures strengthen robustness and cross-domain transfer.
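A minimal sketch of a synthetic reasoning gym with a verifiable reward: tasks are generated programmatically, so the reward is an exact check rather than a learned judge (the task family and names are illustrative).

```python
import random

def make_task(rng: random.Random):
    """Generate one synthetic task together with its checkable ground truth."""
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    return f"Compute {a} * {b}.", a * b

def reward(ground_truth: int, model_output: str) -> float:
    """Verifiable reward: 1.0 iff the model's final token parses to the answer."""
    try:
        return 1.0 if int(model_output.split()[-1]) == ground_truth else 0.0
    except ValueError:
        return 0.0

rng = random.Random(0)
question, answer = make_task(rng)
print(question, reward(answer, f"The answer is {answer}"))   # -> 1.0
```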
Takes two to Tangle: showcasing an experimentation platform with a focus on collaboration and ease of editing
Tangle is an open-source experimentation platform from Shopify that lets teams design and run ML/data-processing graphs. Tangle's focus is on Reuse and Remix: simplifying the path to a first working model, then to refining and improving it, by yourself or with colleagues. Tangle offers a cloud-agnostic orchestrator, a visual graph editor, and a run database that automatically captures all your experiments. In this talk, we will showcase the ease of creating, editing, running, remixing, and sharing Tangle pipelines.
LFM2: Capable and efficient on-device multimodal foundation models
Jimmy Smith
While recent multimodal foundation model families emphasize efficiency at scale, there remains a gap for edge-first models that simultaneously lead in quality, speed, and memory efficiency on phones, tablets, and laptops, while remaining practical to pre-train and post-train. We present LFM2, the second generation of Liquid Foundation Models, optimized end-to-end for on-device deployment. We will explore LFM2's edge-first design, which co-designs architecture, pre-training, and post-training to optimize quality subject to on-device latency and peak-memory constraints.
tsuzumi: Advanced and Sovereign Japanese LLM
Mio Nagai · Kyosuke Nishida
We have developed a series of large language models called tsuzumi, Japanese LLMs built entirely from scratch. The latest model, tsuzumi 2, includes 28.6 billion parameters and is trained on more than 10T tokens of a carefully curated multilingual corpus with a strong emphasis on high-quality Japanese data. It demonstrates robust performance in instruction following, controllable reasoning that can be enabled or disabled, and domain-specific tasks. The tokenizer is designed to reflect the structure of Japanese grammar and vocabulary, which enables significantly improved compression efficiency for Japanese while also supporting strong performance in English and other languages. The model is well suited for enterprises and public organizations because it can be deployed on premise to securely handle highly sensitive user data.
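A common way to quantify the tokenizer claim is characters per token: the more characters each token covers, the fewer tokens, and hence less compute, a Japanese passage needs. A minimal sketch, assuming only that a tokenizer exposes an encode() method:

```python
def chars_per_token(text: str, tokenizer) -> float:
    """Compression efficiency: average number of characters one token covers."""
    return len(text) / len(tokenizer.encode(text))

# Comparing a Japanese-aware tokenizer against a generic multilingual one on
# the same Japanese passage makes the claimed efficiency gain measurable.
```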
Ling 2.0 and Ling-1T: Scaling Knowledge-Enhanced Large Language Models for Reasoning and Efficiency
Wei Fu
The Ling 2.0 series represents a new generation of large language models designed around knowledge enhancement, reasoning efficiency, and scalable architecture innovation. Built upon trillion-scale sparse MoE foundations, Ling-1T achieves ~50B active parameters per token with FP8 mixed-precision pipelines and 1F1B interleaved scheduling, realizing over 40% training-throughput gains with negligible accuracy loss (<0.1%). This talk presents the technical journey behind Ling-mini, Ling-flash, and Ling-1T, focusing on (1) efficient large-scale training systems for trillion-parameter models; (2) the Ling Scaling Law and its implications for cross-domain reasoning; (3) hybrid attention and RL-based alignment strategies that enable both concise reasoning and long-context understanding; and (4) how these architectural and algorithmic advances empower industrial applications such as financial risk modeling and knowledge-grounded agents. We will conclude with open-sourced implementations (inclusionAI on Hugging Face and ModelScope) and future research directions toward trustworthy, efficient, and domain-enhanced LLMs.
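To ground the ~50B-active-parameters figure: in a sparse MoE layer the router selects only k of n experts per token, so most weights stay idle on any given token. A toy NumPy sketch of top-k routing (sizes illustrative, not Ling-1T's):

```python
import numpy as np

d, n_experts, k, tokens = 64, 8, 2, 4     # toy sizes; Ling-1T activates ~5% of ~1T params
rng = np.random.default_rng(0)
x = rng.standard_normal((tokens, d))
gate_w = rng.standard_normal((d, n_experts))
experts = rng.standard_normal((n_experts, d, d))      # one weight matrix per expert

logits = x @ gate_w                                   # router scores per token
topk = np.argpartition(logits, -k, axis=-1)[:, -k:]   # k selected experts per token
w = np.take_along_axis(logits, topk, axis=-1)
w = np.exp(w) / np.exp(w).sum(-1, keepdims=True)      # softmax over selected experts

y = np.zeros_like(x)
for t in range(tokens):
    for j in range(k):
        e = topk[t, j]
        y[t] += w[t, j] * (x[t] @ experts[e])         # only k of n matrices touched
```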
In this research talk, Mohamed Ahmed, Staff Machine Learning Researcher at RBC Borealis, introduces ATOM: RBC's proprietary AI foundation model designed specifically for financial services. This session explores the research behind ATOM's design, including its use of large-scale asynchronous transactional data to model complex client behaviours. Combining domain-specific insights with foundation model techniques, ATOM delivers predictive capabilities across a variety of banking products, channels, and tasks. By harnessing RBC's extensive data ecosystem and commitment to responsible AI, ATOM represents a significant step toward generalizable, trustworthy, and scalable machine learning systems for the financial industry.
In this talk, we will describe our ongoing efforts at Datadog AI Research. We are tackling ambitious research areas grounded in real-world challenges in cloud observability. Some of our ongoing directions are: (1) Observability Foundation Models for forecasting, anomaly detection, and multi-modal telemetry analysis (logs, metrics, traces, etc.); (2) Site Reliability Engineering (SRE) Agents to detect, diagnose, and resolve production incidents; and (3) Production Code Repair Agents that leverage code, logs, and runtime data to identify and fix performance issues.
Exhibitor Talk - Boson.AI
Talk to me - engineering conversational intelligence
In this talk, I give a brief overview of Higgs Audio for both understanding and generation. The talk covers data collection, cleaning, and preprocessing; model architecture and tokenization; and training, including the application of RL/alignment to design models that are engaging. This has allowed us to train models that sound realistic and are effective communicators.
Unify Vector and Relational Search for Smarter, Faster Applications
Weiwei Gong
Today’s data-driven applications demand both semantic understanding and structured intelligence. With native vector search integrated directly into the core SQL engine, organizations can build retrieval-augmented solutions that unify relational and vector data—without adding complexity. Learn how this unified approach delivers high performance, scalability, and simplicity for modern, AI-powered workloads.
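A sketch of what a unified query can look like, written in a hypothetical SQL dialect with a native vector-distance function (actual syntax varies by engine) and issued through a standard Python DB-API cursor:

```python
# Hypothetical dialect: relational filtering and vector ranking in one statement.
query = """
SELECT p.name, p.price
FROM   products p
WHERE  p.category = :cat                          -- relational predicate
ORDER  BY vector_distance(p.embedding, :qvec)     -- semantic ranking
FETCH  FIRST 5 ROWS ONLY
"""
# cursor.execute(query, {"cat": "laptops", "qvec": query_embedding})
# One round trip, one engine: no separate vector store to keep in sync.
```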
Semantic latent variables inference for improved generative modeling
Aaron Courville · Samuel Lavoie
We argue that diffusion models' success in modeling complex distributions is, for the most part, attributable to their input conditioning. This paper investigates the representations used to condition diffusion models, from the perspective that ideal representations should improve sample fidelity, be easy to generate, and be compositional so as to allow generation beyond the training data. We introduce the Discrete Latent Code (DLC), an image representation derived from Simplicial Embeddings trained with a self-supervised learning objective. DLCs are sequences of discrete tokens, as opposed to standard continuous image embeddings. They are easy to generate, and their compositionality enables sampling of novel images beyond the training distribution. Diffusion models trained with DLCs achieve improved generation fidelity, establishing a new state of the art for unconditional image generation on ImageNet. Additionally, we show that composing DLCs allows the image generator to produce out-of-distribution samples that coherently combine the semantics of images in diverse ways. Finally, we showcase how DLCs can enable text-to-image generation by leveraging large-scale pretrained language models: we efficiently fine-tune a text diffusion language model to generate DLCs that produce novel samples outside of the image generator's training distribution.
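To illustrate the compositionality claim, here is a toy sketch (assumed code length and vocabulary, not the paper's actual codebook or composition operator): because a DLC is just a sequence of discrete tokens, two codes can be mixed position-wise and handed to the image generator as a novel condition.

```python
import numpy as np

L, V = 32, 512                             # assumed code length and vocabulary size
rng = np.random.default_rng(0)
dlc_a = rng.integers(0, V, size=L)         # code inferred from image A
dlc_b = rng.integers(0, V, size=L)         # code inferred from image B

mask = rng.random(L) < 0.5                 # one simple composition: per-position mix
composed = np.where(mask, dlc_a, dlc_b)    # a new condition the generator never saw
print(composed[:8])
```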
Empowering Financial Services with Trustworthy Frontier AI at Wells Fargo
Freddy Lecue · Swarup Pogalur
Discover how Wells Fargo is transforming financial services with trustworthy Frontier AI. This session explores our commitment to responsible innovation, highlighting real-world applications that enhance customer experience, streamline operations, and strengthen risk management. Learn how Wells Fargo leverages Frontier AI while upholding the highest standards of ethics, transparency, and trust.
The AI agent ecosystem is at a crossroads. Unsurprisingly, major players are chasing revenue, but in doing so they seem to have forgotten the lessons of the early internet. Instead of openness, we're seeing a rush toward lock-in and walled gardens.

At Pydantic, we believe the future can be different.

We built Pydantic AI as an open, extensible foundation for building applications powered by LLMs, focused on good engineering, clarity, and reliability rather than the next AI hype cycle. Lots of people seemed to like that idea, and Pydantic AI has grown very fast.

But can it last? We think so.

In this talk, I'll share why we believe that openness in the agent ecosystem matters more than ever.
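For context, a minimal sketch of the kind of agent definition Pydantic AI aims to keep simple (the model string is just an example, the result attribute may differ across versions, and an API key is required at runtime):

```python
from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-4o",                     # any supported provider:model string
    system_prompt="Answer in one sentence.",
)

result = agent.run_sync("Why does openness matter for agent ecosystems?")
print(result.output)                     # attribute name may vary across versions
```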
How to make your AI models faster, smaller, cheaper, greener?
Bertrand Charpentier
As AI models become more complex, the cost of inference, in both computation and energy, continues to rise. In this talk, we will explore how combining compression techniques such as quantization, pruning, caching, and distillation can significantly optimize model performance during inference. Applied together, these methods make it possible to reduce model size and computational load while maintaining quality, making AI more accessible and environmentally sustainable.
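As a concrete starting point, the sketch below combines two of the techniques named above on a toy PyTorch model: magnitude pruning followed by dynamic INT8 quantization. A real pipeline would tune the pruning amount per layer and validate quality after each step.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Pruning: zero out the 50% smallest-magnitude weights in each Linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")    # bake the sparsity into the weight tensor

# Dynamic quantization: store Linear weights in INT8, dequantize on the fly.
qmodel = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(qmodel(x).shape)                    # same interface, smaller and cheaper model
```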