Toward Industrial Artificial Intelligence
Technological change typically occurs in three phases: basic research, scale-up, and industrial application, each with a different degree of methodological diversity—high, low, and medium, respectively. Historically, breakthroughs such as the steam engine and the Haber-Bosch process exemplify these phases and have had a profound impact on society. A similar pattern can be observed in the development of modern artificial intelligence (AI). In AI's scale-up phase, large language models (LLMs) have emerged as the most prominent example. While LLMs can be seen as highly sophisticated knowledge representation techniques, they have not fundamentally advanced AI itself. This scale-up phase was dominated by the transformer architecture; more recently, other architectures, such as state-space models and recurrent neural networks, have also been scaled up. For example, Long Short-Term Memory (LSTM) networks have been scaled up to xLSTM, which in many cases outperforms transformers. We are now transitioning into the third phase: industrial AI. In this phase, we are adapting AI methods to real-world applications in robotics, the life and earth sciences, engineering, and large-scale simulations that can be dramatically accelerated by AI. As we continue to develop these industrial AI methods, we expect methodological diversity to increase, allowing us to overcome what has been called the "bitter lesson" of scaling up.
Queer in AI
Creative AI Session 3
This session addresses the common bottlenecks AI developers face in training and deploying workloads and shows how flexible GPU cloud solutions can help mitigate these issues, fostering faster innovation and reducing time to market for AI products.
Industrial Applications of Distributional Preference Alignment of LLMs via Optimal Transport
In this talk, we will present an industry-accessible version of a NeurIPS 2024 main track paper "Distributional Preference Alignment of LLMs via Optimal Transport." In addition to presenting the theory and method, we will explain how the algorithm has been merged into the Hugging Face TRL library and used in industrial LLM alignment workflows.
Current LLM alignment techniques use pairwise human preferences at a sample level, and as such, they do not imply alignment at the distributional level. In this paper, we propose Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples first-order stochastically dominant over the distribution of negative samples. We introduce a convex relaxation of this first-order stochastic dominance and cast it as an optimal transport problem with a smooth and convex cost. Thanks to the one-dimensional nature of the resulting optimal transport problem and the convexity of the cost, it has a closed-form solution obtained by sorting the empirical measures. We fine-tune LLMs with this AOT objective, which enables alignment by penalizing violations of the stochastic dominance of the positive samples' reward distribution over the negative samples' reward distribution. We analyze the sample complexity of AOT by considering the dual of the OT problem and show that it converges at the parametric rate. Empirically, we show on a diverse set of alignment datasets and LLMs that AOT leads to state-of-the-art models in the 7B family when evaluated with the Open LLM Benchmarks and AlpacaEval.
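For intuition, the sketch below implements the core sorting idea on empirical reward samples: because the resulting optimal transport problem is one-dimensional with a convex cost, the monotone (sorted) coupling is optimal, and dominance violations can be penalized quantile by quantile. This is an illustrative reconstruction rather than the authors' exact objective; the squared-hinge cost, the truncation to a common size, and the function names are assumptions.

```python
import torch

def aot_dominance_penalty(pos_rewards: torch.Tensor,
                          neg_rewards: torch.Tensor) -> torch.Tensor:
    """Illustrative penalty for violations of first-order stochastic dominance.

    Sorting both empirical reward samples aligns their quantiles, which is the
    optimal one-dimensional transport plan for a convex cost. Quantiles where
    the positive rewards fall below the negative ones are then penalized, here
    with a squared hinge standing in for the paper's smooth convex cost.
    """
    # Unpaired data: truncate to a common size so the sorted quantiles line up
    # (a simplification; the paper handles unpaired sets more carefully).
    n = min(pos_rewards.numel(), neg_rewards.numel())
    pos_sorted, _ = torch.sort(pos_rewards.flatten()[:n])
    neg_sorted, _ = torch.sort(neg_rewards.flatten()[:n])
    violation = torch.clamp(neg_sorted - pos_sorted, min=0.0)
    return (violation ** 2).mean()
```

In a DPO-style setup, the rewards would be the policy-to-reference log-probability ratios of the sampled responses, and this penalty would be minimized during fine-tuning.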
Streamlining Computer Vision Data Annotation with Segment Anything 2.1
This workshop aims to explore the use of Meta’s Segment Anything Model 2.1 (SAM 2.1) for efficient and precise data annotation in specialized computer vision domains. The primary objective is to enhance segmentation and object tracking in video datasets by leveraging domain-specific adaptations of SAM 2.1, reducing the manual effort traditionally required for such tasks.
With the increasing demand for domain-adapted computer vision models—whether in medical imaging, environmental monitoring, or other niche areas—the need to optimize annotation workflows has become crucial. SAM 2.1 presents a flexible base model for segmentation and tracking tasks, and fine-tuning it for specific, nuanced domains can improve segmentation accuracy, particularly when applied to highly specialized or hard-to-segment objects in videos.
In this workshop, we will showcase: (1) methods to fine-tune SAM 2.1 using specific domain datasets, (2) techniques for evaluating fine-tuned model performance, and (3) mechanisms for integrating fine-tuned SAM models into real-world annotation pipelines. We aim to empower researchers and practitioners to scale their computer vision research, from object detection to activity recognition, by significantly reducing the time and resources required for manual data labeling. Additionally, we will discuss the benefits of combining automated and human-in-the-loop approaches for enhanced labeling performance in dynamic video datasets.
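As a concrete starting point for item (3), the sketch below shows how a SAM 2-style video predictor can be prompted with a single click and propagated through a clip to produce draft masks for human review. Module paths, config and checkpoint names, and the example frame directory are assumptions based on the public facebookresearch/sam2 repository; the exact signatures should be verified against the SAM 2.1 release.

```python
import numpy as np
import torch
# Module paths follow the facebookresearch/sam2 repository and may differ
# between SAM 2 / 2.1 releases; treat them as assumptions to verify.
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",   # assumed config name
    "checkpoints/sam2.1_hiera_large.pt",    # assumed (or domain fine-tuned) checkpoint
)

with torch.inference_mode():
    # A directory of JPEG frames extracted from the video to annotate.
    state = predictor.init_state(video_path="data/clip_0001_frames")
    # One positive click on the object of interest in frame 0.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[480, 320]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )
    # Propagate the mask through the clip; the results become draft
    # annotations that a human annotator only needs to correct.
    draft_masks = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        draft_masks[frame_idx] = (mask_logits[0] > 0).cpu().numpy()
```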
Ultimately, this workshop seeks to bridge the gap between general-purpose segmentation models and specialized computer vision applications, providing practical solutions for researchers dealing with complex video data. We are committed to fostering an inclusive environment with broad representation across research areas, regions, and industries, encouraging collaboration and knowledge sharing to push the boundaries of computer vision annotation technology.
This presentation provides an overview of Open-Sora v1.2 for video generation and an easy way to fine-tune this model on the HPC-AI.COM platform. Open-Sora is an innovative open-source project that redefines the fine-tuning process for large video-generation models, empowering creators across various fields to achieve unprecedented levels of customization and efficiency. It enables the adaptation of video-generation models to meet any need, including diverse styles and specialized subdomains. By offering a comprehensive solution that includes data preprocessing, advanced training workflows, and detailed model checkpoints, Open-Sora streamlines the fine-tuning process, making it both efficient and accessible. This groundbreaking platform allows users to tailor state-of-the-art text-to-video generation models for a wide range of applications, setting a new standard for flexibility and innovation in video-generation technology.
Hands-On AI for Everyone: Using Drones, Jumping Jacks, Gestures, and Skincare Product Labels
Artificial intelligence (AI) is all around us, yet it often seems enigmatic and challenging to master. To bridge this gap, we developed a series of Hands-On AI workshops that make AI creation, testing, and deployment accessible to everyone through interactive and engaging sessions.
BRING YOUR LAPTOP.
In our featured workshop, "Farm-to-Plate AI," we delve into the agricultural domain. In a simulation environment, participants fly a drone equipped with LiDAR (Light Detection and Ranging) to survey mango orchards while avoiding collisions. They use object detection algorithms to count fruit and train regression models to evaluate the ripeness of fruit from hyperspectral images, gaining hands-on insight into the farm-to-table journey and AI's impact on food production.
Workshops in the series include “Pocket AI and IoT: Turn Your Phone into a Smart Fitness Tracker,” “Beyond the Label: AI Techniques for Healthier Personal Care Choices,” “Do You See What I See?” “Catching Fire: Autonomous Drones to Track and Detect Wildfires,” and “Mars Rover.”
Designed for a diverse audience ranging from middle schoolers to retirees from technical and non-technical fields, the workshops are ideal for community outreach. Participants only need a laptop with a web browser, a stable Wi-Fi connection, a free MathWorks account, and optionally a smartphone. These workshops have been successfully conducted across continents, and are available in multiple languages, ensuring broad and inclusive reach.
We will share insights and reflections on designing sessions that initiate important conversations about AI, such as addressing bias, understanding model cards, and verifying AI models, ensuring that participants not only learn technical skills in fun ways but also engage with the ethical and societal implications of AI.
As we continue to expand this series, we underscore the transformative power of experiential education in making cutting-edge technology accessible to all.
Post-training is a critical step for making LLMs follow instructions, align with human values, reduce hallucinations, and more. Besides standard Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF) are the most commonly used post-training methods, and inference-time (test-time) scaling has become increasingly popular. In this active training, we present a comprehensive introduction to various post-training methods, how they are implemented in practice, and a high-level overview of inference-time scaling methods. Attendees will gain a basic understanding of (i) the necessity and formal problem formulation of post-training; (ii) commonly used post-training methods and their theory; (iii) a live demo that shows how to use existing training infrastructure to build post-training pipelines; and (iv) a live demo that shows how to use Monte Carlo Tree Search (MCTS) to boost inference-time performance.
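To make the contrast with SFT concrete, here is a minimal sketch of the widely used DPO objective; the log-probabilities are summed per sequence, and the batching and extraction details are omitted (variable names are illustrative).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization on per-sequence log-probabilities.

    The policy is trained to increase its log-ratio against a frozen reference
    model on preferred responses relative to rejected ones; beta controls how
    far the policy may drift from the reference.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```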
Deep Tabular Data
Deep learning has made remarkable progress in recent years. These advances have mostly been in problems involving unstructured data, such as natural language processing and computer vision. In contrast, most machine learning problems involve highly structured tabular data. This is true in many industries, including the financial industry. For tabular data, tree-based machine learning methods such as XGBoost are widely perceived as the state of the art. Can recent advances in deep learning surpass the capabilities of tree-based methods? We will review recent research on this topic, share learnings from the various ideas that have been tried, and point out what deep learning methods can offer that tree-based methods do not.
As an empirical field, AI and ML research relies on a foundation of evaluation – it is critical that observers can assess and compare the results of different approaches and come to reliable conclusions about their performance and effectiveness. Indeed, evaluation has never been more important than it is today, given the rapid rise and acceleration of progress in generative models, LLMs, and related methods. However, the problem of evaluation in this domain is far from trivial. There are high-level issues around definitions of ground truth and assessing correctness and context, logistical issues around cost and reliability, theoretical issues around defining an appropriate evaluation distribution of tasks, organizational issues of which entities can be trusted to perform evaluations without undue influence, and practical issues as researchers and developers struggle to reconcile a myriad of reported benchmarks and metrics. On top of this, we recall Feynman’s famous dictum that the most important thing in any science is “not to fool yourself – and you are the easiest person to fool.” It is all too easy to encounter issues of contamination and leakage that can invalidate results. In this talk, we take a tour through current approaches to addressing these many complexities and offer thoughts on ways forward for the field. We also share experience from Kaggle on how broad community efforts such as competitions can help in this domain. In particular, we describe methods that have been developed to make competitions resistant to cheating from bad actors, and how they are also of significant value in ensuring that benchmarks and evaluations are set up to help researchers avoid fooling themselves.
EUREKA: Evaluating and Understanding Large Foundation Models
Rigorous and reproducible evaluation of large foundation models is critical for assessing the state of the art, informing next steps in model improvement, and guiding scientific advances in Artificial Intelligence. In practice, however, the evaluation process has become challenging for several reasons that require immediate attention from the community, including benchmark saturation, lack of transparency in the methods deployed for measurement, challenges in extracting the right measurements for generative tasks, and, more generally, the extensive number of capabilities that need to be considered to show a well-rounded comparison across models.
This session will provide an introduction to Eureka as an evaluation framework, along with accompanying insights. First, we will present Eureka, a reusable and open evaluation framework for standardizing evaluations of large foundation models beyond single-score reporting and rankings. Next, we will introduce Eureka-Bench, an extensible collection of benchmarks testing capabilities that (i) are still challenging for state-of-the-art foundation models and (ii) represent fundamental but overlooked capabilities for completing tasks in both language and vision modalities. Finally, we will present insights from an analysis of 12 state-of-the-art models. Such insights uncover granular weaknesses of models for a given capability and can be further leveraged to identify which areas are most promising for improvement. Eureka is available as open source to foster transparent and reproducible evaluation practices.
Blog: https://aka.ms/eureka-ml-insights-blog
Github repository: https://github.com/microsoft/eureka-ml-insights
Website: https://microsoft.github.io/eureka-ml-insights
Innovations in number systems, such as logarithmic math, and their co-designed hardware can accelerate AI adoption. We explore practical hardware implementations and provide quantitative examples at both the operation and system levels.
Impact of Inference System Design on AI Adoption: How system design affects trust in AI outputs, cost, and user experience.
Trends in Low-Precision Data Types: Why low-precision data types are so effective in improving AI compute cost, and a review of common types in AI models.
Logarithmic Math as an Alternative: Presenting our research on logarithmic number systems, which replace multiplications with additions, reducing chip area and power by ~4x. We address challenges in mapping from logarithmic to linear space in multiply-accumulate operations and compare approaches like LUTs, Taylor Series, and the Mitchell Approximation regarding accuracy, feasibility, and efficiency.
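For readers unfamiliar with the classic technique, the sketch below shows the basic Mitchell Approximation in plain Python (the improved approximation described next is not reproduced here): a multiplication becomes an addition of approximate base-2 logarithms, followed by an approximate antilogarithm.

```python
import math

def mitchell_log2(x: float) -> float:
    """Mitchell's approximation: log2(2**e * (1 + m)) ~= e + m for m in [0, 1).

    Assumes positive inputs, as in an unsigned magnitude datapath.
    """
    m, e = math.frexp(x)          # x = m * 2**e with m in [0.5, 1)
    mantissa = 2.0 * m - 1.0      # rewrite as 2**(e - 1) * (1 + mantissa)
    return (e - 1) + mantissa

def mitchell_antilog2(y: float) -> float:
    """Inverse approximation: 2**(k + f) ~= 2**k * (1 + f) for f in [0, 1)."""
    k = math.floor(y)
    return (1.0 + (y - k)) * (2.0 ** k)

def log_multiply(a: float, b: float) -> float:
    """Multiply by adding approximate logarithms, then converting back."""
    return mitchell_antilog2(mitchell_log2(a) + mitchell_log2(b))

# Example: 3.1 * 2.4 = 7.44 exactly; the log-domain estimate is 7.0
# (Mitchell's method slightly underestimates the true product).
print(log_multiply(3.1, 2.4))
```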
Co-Designing Logarithmic Math and AI Hardware: Introducing an improvement to the Mitchell Approximation that renders logarithmic math Pareto-optimal in power vs. precision, ideal for large multimodal models. We demonstrate enhanced trust and UX over traditional linear math, showing quantitative results for large models with sub-0.1% accuracy losses compared to baseline IEEE 32/16-bit models, while maintaining low costs and power consumption comparable to 4-bit precision on floating-point hardware.
Finally, reduced chip area and power offer secondary benefits in silicon design, such as more flexibility in datapath design and more generic compute for better utilization, leading to even lower AI inference costs. We also share our experience in co-designing software and chips, emphasizing the importance of integrating the two disciplines.
With seven years of foundational innovations and proven hardware systems in logarithmic math, we are pioneers in this field and eager to share our insights.
Creative AI Session 4
MegaBeam-Mistral-7B: Advancing Long-Context Processing for Real-World AI Applications
This presentation discusses MegaBeam-Mistral-7B (MegaBeam), an open-source long-context LLM released by AWS. MegaBeam demonstrates the importance of long-context processing in LLMs for various downstream applications, including retrieval-augmented generation, extended conversation tracking and recommendations, multi-document analysis, and multi-modal understanding.
Global AI Research and Open Source Contributions:
- Overview of MegaBeam-Mistral-7B-512k, supporting half a million tokens
- More than 57,000 downloads on Hugging Face
- Comparison with long-context benchmarks, including Nvidia's RULER leaderboard and Princeton/Intel's application-focused benchmarks
- Commitment to open-source AI, releasing MegaBeam under Apache 2.0 license
Practical Challenges in AI Deployment:
- Insights into continual pre-training and supervised fine-tuning of MegaBeam using Amazon SageMaker
- Overcoming computational and data challenges in developing long-context LLMs
- Efficient inference and deployment of long-context LLM models
Real-World Implementation and Industry Use:
- MegaBeam's application in comprehending entire Git repositories for coding tasks (recall and debugging)
- Long-context processing enabling effective multi-document analysis and multi-modality understanding
- Integrating long-context LLMs into existing AI pipelines
Technical Insights for Practitioners:
- Advanced techniques like RingAttention and FlashAttention in both PyTorch and JAX
- Position encoding, length generalization, and ring attention implementation (see the rotary-embedding sketch after this list)
- Data engineering (data distribution and synthesis) tailored for long-context training
- Evaluation methodologies for long-context LLMs
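As an illustration of the position-encoding piece referenced above (not MegaBeam's exact recipe, which is documented in its model card), here is a minimal rotary position embedding sketch with an adjustable base; raising the base is one common way long-context models keep far-apart positions distinguishable.

```python
import torch

def rope_tables(seq_len: int, head_dim: int, base: float = 10_000.0):
    """Precompute rotary-embedding cos/sin tables.

    Long-context recipes commonly raise `base` (e.g. to 1e6 or higher) so the
    low-frequency dimensions rotate slowly enough to separate positions across
    hundreds of thousands of tokens.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    angles = torch.outer(positions, inv_freq)       # (seq_len, head_dim // 2)
    return angles.cos(), angles.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate query/key features; x has shape (..., seq_len, head_dim)."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```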
Industry Perspective and Thought Leadership:
- Lessons from our open-source long-context LLMs series
- Future directions in long-context processing and industry impact
In this breakout session, we share best practices for harnessing the power of distributed training with PyTorch to accelerate model development and fully utilize GPU clusters.
The session distills lessons learned from a diverse range of distributed training runs and ultimately shows how to train a 405B-parameter model using pure PyTorch.
Highlighted best practices include:
- Scaling your training code from single GPU to multi-node
- Diagnostic techniques for quickly identifying cluster issues and freezes during training
- Sharding large models with PyTorch FSDP
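To ground the last point, here is a minimal FSDP sketch in pure PyTorch; a toy MLP stands in for a real network, and the additional techniques needed at 405B scale (activation checkpointing, parallelism beyond parameter sharding, checkpointing strategies) are intentionally left out.

```python
import functools
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

def main():
    # Launch with: torchrun --nproc_per_node=<num_gpus> train.py
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Toy model standing in for a large transformer.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
    ).to(local_rank)

    # Shard parameters, gradients, and optimizer state across all ranks.
    model = FSDP(
        model,
        auto_wrap_policy=functools.partial(
            size_based_auto_wrap_policy, min_num_params=1_000_000
        ),
        device_id=local_rank,
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        batch = torch.randn(8, 1024, device=local_rank)
        loss = model(batch).pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```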
This presentation delves into the potential of generative AI to revolutionize content moderation. We'll explore how AI can analyze text, images, and videos to flag problematic content with greater speed and accuracy than human moderators. But can GenAI truly understand the nuances of human language and context? And what are the ethical implications of entrusting machines with such power?
Learnings From Teams Training Large-Scale Models: Challenges and Solutions for Monitoring at Hyperscale
This talk delves into insights from teams monitoring large-scale model training, focusing on reproducibility, transparency, and efficiency, aligning with NeurIPS' emphasis on practical challenges and actionable insights.
Key Points:
Managing and Visualizing Data
- Challenges: Handling vast data during large-scale training.
- Solutions: Robust data management and visualization tools to monitor training progress and performance metrics.
Efficient Resource Utilization
- Challenges: High computational resources for training large models.
- Solutions: Real-time resource monitoring, minimizing job failures, efficiently restarting failed jobs, terminating unpromising experiments early, and forking promising ones.
Reproducibility and Transparency
- Challenges: Ensuring reproducibility to validate results and build trust.
- Solutions: Version control for datasets, code, and model configurations (a minimal run-manifest sketch follows below).
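A minimal, tool-agnostic sketch of such a snapshot is shown below; it assumes the training code lives in a git repository and the dataset is a local directory, and dedicated experiment trackers provide richer versions of the same idea.

```python
import hashlib
import json
import pathlib
import subprocess

def dataset_fingerprint(path: str) -> str:
    """Hash every file under the dataset directory so the exact data version
    used by a run can be recorded and later verified."""
    digest = hashlib.sha256()
    for file in sorted(pathlib.Path(path).rglob("*")):
        if file.is_file():
            digest.update(file.name.encode())
            digest.update(file.read_bytes())
    return digest.hexdigest()

def write_run_manifest(run_dir: str, config: dict, dataset_path: str) -> None:
    """Snapshot code version, configuration, and data fingerprint in one file."""
    manifest = {
        "git_commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip(),
        "config": config,
        "dataset_sha256": dataset_fingerprint(dataset_path),
    }
    out = pathlib.Path(run_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "manifest.json").write_text(json.dumps(manifest, indent=2))
```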
Best Practices
- Documentation: Detailed records for each experiment.
- Automation: Streamlining experiment tracking with tools like Jenkins or GitHub Actions.
Case Studies
- Industry Applications: Insights from customers, users, and the AI research community, showcasing successful large-scale experiment tracking.
Interactive Elements: Live demonstrations of tracking tools and techniques.
Audience Takeaways: Attendees will learn innovative techniques for managing large-scale model training, best practices for reproducibility and transparency, and strategies for efficient resource utilization, applicable to their AI/ML projects.
Q&A Session: An interactive session to address audience questions and discuss practical implementations.
How to Break Into an Industry Research Lab and Know Your Market Value
"When academics transition into industry they encounter a huge information asymmetry that makes a transition significantly more difficult and stressful than it needs to be.
These are unknown unknown for new grads; it’s ultimately a problem of access to the unspoken rules and experience of industry, not intelligence or effort.
Rora is hosting a panel of researchers in industry who will educate the audience on the best practices for preparing for interviews and succeeding in the first year."
Nonprofits Bridging Tech and Social Impact: A NeurIPS Social with Wikimedia and Common Crawl
Explore the intersections between nonprofits and tech. This session will feature the Wikimedia and Common Crawl Foundations, offering an opportunity to connect with nonprofits committed to using technology for social missions. The event will begin with presentations from both organizations, highlighting their goals, projects, research, and challenges facing the open community. Following the presentations, the session will transition into roundtables focused on current initiatives and Q&A.
Breaking Silos: Open Community for AI × Science
AI infused into scientific discovery is revolutionizing how research is conducted across disciplines such as materials science, chemistry, biology, and physics, leading to early groundbreaking discoveries. The complexity and interdisciplinary nature of impactful AI for Science research requires open collaboration across disciplines, institutions, and borders to be fully realized. In this social, we aim to connect researchers passionate about AI for scientific discovery.
Anime & AI
Come to our space to discuss and learn about all the advancements in AI that relate to and enhance our enjoyment of the art of anime! We will have some prizes, refreshments, and mini-games for all to enjoy!