Workshop
ML For Systems
Benoit Steiner · Jonathan Raiman · Martin Maas · Azade Nova · Mimee Xu · Anna Goldie

Mon Dec 13 08:55 AM -- 06:00 PM (PST)

ML for Systems is an emerging research area that has shown promising results in the past few years. Recent work has shown that ML can be used to replace heuristics, solve complex optimization problems, and improve modeling and forecasting when applied in the context of computer systems.

As an emerging area, ML for Systems is still defining its common problems, frameworks, and approaches, which requires venues that bring together researchers and practitioners from both the systems and machine learning communities. Past iterations of the workshop focused on providing such a venue and broke new ground on a broad range of emerging directions in ML for Systems. We want to carry this momentum forward by encouraging the community to explore areas that have previously received less attention. Specifically, the workshop commits to highlighting works that optimize for security and privacy, rather than only for metrics like speed and memory, and works that use ML to optimize for energy usage and carbon impact. Additionally, this year we will encourage the development of shared methodology, tools, and frameworks.

For the first time since the inception of the workshop, we will organize a competition. This competition will showcase important systems problems and challenge the ML community to test their methods and algorithms on these problems. Our competition tasks are designed to have a low barrier to entry that attracts newcomers as well as systems veterans.

This setup will allow attendees to meet with top researchers and domain experts, old and new, bridging cutting-edge ML research with practical systems design. We hope that providing a prestigious venue for researchers from both fields to meet and interact will result in both fundamental ML research and real-world impact on computer systems design and implementation.

Mon 8:55 a.m. - 6:00 p.m.
Poster Session & Hallway Track (gather.town)

Mon 9:00 a.m. - 9:25 a.m.
Opening Remarks (Introduction)
Jonathan Raiman · Anna Goldie · Benoit Steiner · Azade Nova · Martin Maas · Mimee Xu

Mon 9:30 a.m. - 10:05 a.m.
Towards instance-optimized data systems (Invited Talk)
Recently, there has been a lot of excitement around ML-enhanced (or learned) algorithms and data structures. For example, there has been work on applying machine learning to improve query optimization, indexing, storage layouts, scheduling, log-structured merge trees, sorting, compression, and sketches, among many other data management tasks. Arguably, the ideas behind these techniques are similar: machine learning is used to model the data and/or workload in order to derive a more efficient algorithm or data structure. Ultimately, what these techniques will allow us to build are “instance-optimized” systems: systems that self-adjust to a given workload and data distribution to provide unprecedented performance and avoid the need for tuning by an administrator. In this talk, I will first provide an overview of the opportunities and limitations of current ML-enhanced algorithms and data structures, present initial results of SageDB, a first instance-optimized system we are building as part of DSAIL@CSAIL at MIT, and finally outline remaining challenges and future directions.
Tim Kraska

Mon 10:10 a.m. - 10:40 a.m.
Accelerating Systems and ML for Science (Invited Talk)
TBD
Anima Anandkumar

Mon 10:45 a.m. - 11:20 a.m.
Learning Neurosymbolic Performance Models (Invited Talk)
Computer systems have become increasingly complicated through increased system specialization and heterogeneity designed to meet an increasingly diverse set of system requirements across scale, performance, energy efficiency, reliability, and quality of results.
With automated system optimization opportunities being driven by predictive models of system behavior, traditional strategies for manually developing predictive behavioral models have become increasingly complicated and less precise with growing system complexity. In this talk, I'll present DiffTune, a technique for learning neurosymbolic performance models of modern computer processors. Processor performance models are critical for many computer systems engineering tasks; however, due to the limits on our ability to introspect modern processors, these models must be inferred from behavioral measurements. Our system leverages deep learning to perform differentiable surrogate optimization of a CPU simulator to yield models that predict the performance of programs executed on modern Intel CPUs better than state-of-the-art, handcrafted techniques from LLVM. Our approach demonstrates that behavioral models can be effectively learned from data and can be constructed to provide an interpretation of their predictions through behavioral traces grounded in the execution of a simulator.
Michael Carbin

Mon 11:30 a.m. - 1:00 p.m.
Lunch Break & Poster Session (gather.town)

Mon 1:00 p.m. - 1:11 p.m.
Towards Intelligent Load Balancing in Data Centers (Spotlight)
Network load balancers (LBs) are important components in data centers (DCs) to provide scalable services. Workload distribution algorithms are based on heuristics (ECMP, WCMP) or naive machine learning (ML) algorithms (ridge regression). Advanced ML-based approaches help achieve performance gain in different networking and system problems. However, it is challenging to apply ML algorithms to networking problems in real-life systems. It requires domain knowledge to collect features from low-latency, high-throughput, and scalable networking systems, which are dynamic and heterogeneous.
This paper proposes Aquarius to bridge the gap between ML and networking systems and demonstrates its usage in the context of network LBs. This paper demonstrates its ability to conduct both offline data analysis and online model deployment in realistic systems. The results show that the ML model trained and deployed using Aquarius improves load balancing performance, yet they also reveal more challenges to be resolved to apply ML for networking systems.
Zhiyuan Yao · Thomas Heide Clausen

Mon 1:11 p.m. - 1:20 p.m.
Learning to Combine Instructions in LLVM Compiler (Spotlight)
Instruction combiner (IC) is a critical compiler optimization pass, which replaces a sequence of instructions with an equivalent and optimized instruction sequence at the basic block level. There can be thousands of instruction-combining patterns which need to be frequently updated as new coding styles/idioms/applications and novel hardware evolve over time. This makes the IC optimization pass error prone, incurring high maintenance cost. Prior work has shown that the IC pass is the buggiest pass in the LLVM (Low Level Virtual Machine) compiler and the third most buggy pass in GCC (GNU Compiler Collection). To mitigate these challenges associated with the traditional IC, we design and implement a Neural Instruction Combiner (NIC) and demonstrate its feasibility by integrating it into the standard LLVM compiler optimization pipeline. NIC leverages neural Seq2Seq model techniques for generating an optimized encoded IR sequence from the unoptimized encoded IR sequence. We show that NIC achieves an exact-match rate of 72% for optimized sequences as compared to the traditional IC, demonstrating its feasibility in a production compiler pipeline.
Sandya Mannarswamy · Dibyendu Das

Mon 1:20 p.m. - 1:30 p.m.
Generative Optimization Networks for Memory Efficient Data Generation (Spotlight)
In standard generative deep learning models, such as autoencoders or GANs, the size of the parameter set is proportional to the complexity of the generated data distribution. A significant challenge is to deploy resource-hungry deep learning models in devices with limited memory to prevent system upgrade costs. To combat this, we propose a novel framework called generative optimization networks (GON) that is similar to GANs, but does not use a generator, significantly reducing its memory footprint. GONs use a single discriminator network and run optimization in the input space to generate new data samples, achieving an effective compromise between training time and memory consumption. GONs are most suited for data generation problems in limited memory settings. Here we illustrate their use for the problem of anomaly detection in memory-constrained edge devices arising from attacks or intrusion events. Specifically, we use a GON to calculate a reconstruction-based anomaly score for input time-series windows. Experiments on a Raspberry-Pi testbed with two existing and a new suite of datasets show that our framework gives up to 32% higher detection F1 scores and 58% lower memory consumption, with only 5% higher training overheads compared to the state-of-the-art.
Shreshth Tuli · Shikhar Tuli · Giuliano Casale · Nicholas Jennings

Mon 1:30 p.m. - 1:38 p.m.
DeepRNG: Towards Deep Reinforcement Learning-Assisted Generative Testing of Software (Spotlight)
Although machine learning (ML) has been successful in automating various software engineering needs, software testing still remains a highly challenging topic. In this paper, we aim to improve the generative testing of software by directly augmenting the random number generator (RNG) with a deep reinforcement learning (RL) agent using an efficient, automatically extractable state representation of the software under test.
Using the Cosmos SDK as the testbed, we show that the proposed DeepRNG framework provides a statistically significant improvement to the testing of the highly complex software library with over 350,000 lines of code. The source code of the DeepRNG framework is publicly available online.
Chuan-Yung Tsai · Graham Taylor

Mon 1:38 p.m. - 1:49 p.m.
Interpretability of Machine Learning in Computer Systems: Analyzing a Caching Model (Spotlight)
Machine Learning has been successfully applied in systems applications such as memory prefetching and caching, where learned models have been shown to outperform heuristics. However, the lack of understanding of the inner workings of these models -- interpretability -- remains a major obstacle for adoption in real-world deployments. Understanding a model's behavior can help system administrators and developers gain confidence in the model, understand risks, and debug unexpected behavior in production. Interpretability for models used in computer systems poses a particular challenge: unlike ML models trained on images or text, the input domain (e.g., memory access patterns, program counters) is not immediately interpretable. A major challenge is therefore to explain the model in terms of concepts that are approachable to a human practitioner. By analyzing a state-of-the-art caching model, we provide evidence that the model has learned concepts beyond simple statistics that can be leveraged for explanations. Our work provides a first step towards understanding ML models in systems and highlights both promises and challenges of this emerging research area.
Leon Sixt · Evan Liu · Marie Pellat · James Wexler · Milad Hashemi · Been Kim · Martin Maas

Mon 1:49 p.m. - 2:00 p.m.
Automap: Towards Ergonomic Automated Parallelism for ML Models (Spotlight)
The rapid rise in demand for training large neural network architectures has brought into focus the need for partitioning strategies, for example by using data, model, or pipeline parallelism. Implementing these methods is increasingly supported through program primitives, but identifying efficient partitioning strategies requires expensive experimentation and expertise. We present the prototype of an automated partitioner that seamlessly integrates into existing compilers and existing user workflows. Our partitioner enables SPMD-style parallelism that encompasses data parallelism and parameter/activation sharding. Through a combination of inductive tactics and search in a platform-independent partitioning IR, automap can recover expert partitioning strategies such as Megatron sharding for transformer layers.
Michael Schaarschmidt · Adam Paszke

Mon 2:00 p.m. - 2:10 p.m.
Resource Allocation in Disaggregated Data Centre Systems with Reinforcement Learning (Spotlight)
Resource-disaggregated data centres (RDDC) propose a resource-centric and high-utilisation architecture for data centres (DC), avoiding resource fragmentation and enabling arbitrarily sized resource pools to be allocated to tasks, rather than server-sized ones. RDDCs typically impose greater demand on the network, requiring more infrastructure and increasing cost and power, so new resource allocation algorithms that co-manage both server and network resources are essential to ensure that allocation is not bottlenecked by the network, and that requests can be served successfully with minimal networking resources.
We apply reinforcement learning (RL) to this problem for the first time and show that an RL policy based on graph neural networks can learn resource allocation policies end-to-end that outperform previous hand-engineered heuristics by up to 22.0%, 42.6% and 22.6% for acceptance ratio, CPU and memory utilisation respectively, maintain performance when scaled up to RDDC topologies with $10^2\times$ more nodes than those seen during training, and can achieve comparable performance to the best baselines while using $5.3\times$ fewer network resources.
Zacharaya Shabka · Georgios Zervas

Mon 2:10 p.m. - 2:22 p.m.
Reinforced Workload Distribution Fairness (Spotlight)
Network load balancers (LBs) are one of the key components in data centers (DCs). They distribute workloads across multiple servers and help offer scalable services. However, operating in dynamic network environments with limited observations, modern LBs rely on heuristic algorithms and require manual configurations for fairness optimization. As reinforcement learning (RL) helps achieve performance gains in dynamic systems, this paper proposes a distributed asynchronous RL mechanism to improve LBs’ workload distribution fairness with limited observations. The performance of the proposed mechanism is evaluated and compared with state-of-the-art LB algorithms in a simulator, under configurations with progressively increasing difficulties. Preliminary results show promise in RL-based LB algorithms, and cast light on more challenges for future research, including reward function design and model scalability.
Zhiyuan Yao · Zihan Ding · Thomas Heide Clausen

Mon 2:22 p.m. - 2:32 p.m.
Community Infrastructure for Applying Reinforcement Learning to Compiler Optimizations (Spotlight)
Interest in applying Reinforcement Learning (RL) techniques to compiler optimizations is increasing rapidly, but compiler research has a high entry barrier.
Unlike in other domains, compiler and RL researchers do not have access to the infrastructure and datasets that enable fast iteration and development of ideas, and getting started requires a significant engineering investment. We present CompilerGym, a community infrastructure for exposing compiler optimizations as RL environments, and initial results in applying RL to these environments. Our findings suggest that two key challenges in RL for compilers are representation learning and transfer learning between program domains.
Chris Cummins · Bram Wasti · Brandon Cui · Olivier Teytaud · Benoit Steiner · Yuandong Tian · Hugh Leather

Mon 2:32 p.m. - 2:42 p.m.
Neuroevolution-Enhanced Multi-Objective Optimization for Mixed-Precision Quantization (Spotlight)
Mixed-precision quantization is a powerful tool to enable memory and compute savings of neural network workloads by deploying different sets of bit-width precisions on separate compute operations. Recent research has shown significant progress in applying mixed-precision quantization techniques to reduce the memory footprint of various workloads, while also preserving task performance. Prior work, however, has often ignored additional objectives, such as bit-operations, that are important for deployment of workloads on hardware. Here we present a flexible and scalable framework for automated mixed-precision quantization that optimizes multiple objectives. Our framework relies on Neuroevolution-Enhanced Multi-Objective Optimization (NEMO), a novel search method, to find Pareto optimal mixed-precision configurations for memory and bit-operations objectives. Within NEMO, a population is divided into structurally distinct sub-populations (species) which jointly form the Pareto frontier of solutions for the multi-objective problem. At each generation, species are re-sized in proportion to the goodness of their contribution to the Pareto frontier.
This allows NEMO to leverage established search techniques and neuroevolution methods to continually improve the goodness of the Pareto frontier. In our experiments we apply a graph-based representation to describe the underlying workload, enabling us to deploy graph neural networks trained by NEMO to find Pareto optimal configurations for various workloads trained on ImageNet. Compared to the state-of-the-art, we achieve competitive results on memory compression and superior results for compute compression for MobileNet-V2, ResNet50 and ResNeXt-101-32x8d, one of the largest ImageNet models amounting to a search space of ~$10^{146}$. A deeper analysis of the results obtained by NEMO also shows that both the graph representation and the species-based approach are critical in finding effective configurations for all workloads.
Santiago Miret · Vui Seng Chua · Mattias Marder · Mariano Phielipp · Nilesh Jain · Somdeb Majumdar

Mon 2:42 p.m. - 2:51 p.m.
Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update (Spotlight)
Representing DNNs with low-precision numbers is a promising approach that enables the efficient acceleration of large-scale deep neural networks (DNNs). However, previous methods typically keep a copy of weights in high precision for weight updates during training. Directly training over low-precision weights still remains an unsolved problem because of the complex interactions between low-precision number systems and the underlying learning algorithms. To address this problem, we develop a low-precision training framework, termed LNS-Madam, in which we jointly design a logarithmic number system (LNS) and a multiplicative weight update training method (Madam). LNS-Madam yields low quantization error during weight update, leading to a stable convergence even if the precision is limited.
By replacing SGD or Adam with the Madam optimizer, training under LNS requires less weight precision during the updates while preserving the state-of-the-art prediction accuracy.
Jiawei Zhao · Steve Dai · Rangha Venkatesan · Brian Zimmer · Mustafa Ali · Ming-Yu Liu · Brucek Khailany · · Anima Anandkumar

Mon 2:51 p.m. - 3:00 p.m.
Data-Driven Offline Optimization for Architecting Hardware Accelerators (Spotlight)
Aviral Kumar · Amir Yazdanbakhsh · Milad Hashemi · Kevin Swersky · Sergey Levine

Mon 3:00 p.m. - 3:07 p.m.
Achieving Low Complexity Neural Decoders via Iterative Pruning (Spotlight)
The advancement of deep learning has led to the development of neural decoders for low latency communications. However, neural decoders can be very complex, which can lead to increased computation and latency. We consider iterative pruning approaches (such as the lottery ticket hypothesis algorithm) to prune weights in neural decoders. Decoders with fewer weights can have lower latency and lower complexity while retaining the accuracy of the original model. This will make neural decoders more suitable for mobile and other edge devices with limited computational power. We also propose semi-soft decision decoding for neural decoders, which can be used to improve the bit error rate performance of the pruned network.
Vikrant Malik · Rohan Ghosh · Mehul Motani

Mon 3:10 p.m. - 3:50 p.m.
Gather.town Q&A with Speakers of Contributed Talks (gather.town)

Mon 3:50 p.m. - 4:25 p.m.
Learned Compiler Optimizations (Invited Talk)
TBD
Luis Ceze

Mon 4:30 p.m. - 5:00 p.m.
ML for Autotuning Production ML Compilers (Invited Talk)
Search-based techniques have been shown to be effective in solving complex optimization problems that arise in domain-specific compilers for machine learning (ML). Unfortunately, deploying such techniques in production compilers is impeded by several limitations.
In this talk, I will present an autotuner for production ML compilers that can tune both graph-level and subgraph-level optimizations at multiple compilation stages. The autotuner applies a flexible search methodology that defines a search formulation for joint optimizations by accurately modeling the interactions between different compiler passes. The autotuner tunes tensor layouts, operator fusion decisions, tile sizes, and code generation parameters in XLA, a production ML compiler, using various search strategies. We demonstrate how to incorporate machine learning techniques such as a learned cost model and various learning-based search strategies to reduce autotuning time. Our learned cost model has high accuracy and outperforms a heavily-optimized analytical performance model. In an evaluation across 150 ML training and inference models on Tensor Processing Units (TPUs), the autotuner offers up to 2.4x and an average 5% runtime speedup over the heavily-optimized XLA compiler. The autotuner has been deployed to automatically tune the most heavily-used production models in Google’s fleet every day.
Phitchaya Phothilimtha

Mon 5:10 p.m. - 5:40 p.m.
ML-guided iterative refinement for system optimization (Invited Talk)
Leveraging machine learning for system optimization can relieve researchers of designing manual heuristics, a time-consuming procedure. In this talk, we mainly discuss data-driven iterative refinement that models optimization as a sequential decision process: an initial solution to the optimization problem is iteratively improved until convergence. Each refinement step is controlled by an ML model learned from previous optimization trials, or from data collected so far in this trial. We then introduce two examples in ML systems, Coda and N-Bref, that de-compile assembly code back to its source code. In both cases, first a coarse source program is proposed, and then refined by learned models to match the assembly.
These approaches show strong performance compared to existing de-compilation tools that rely upon human heuristics and domain knowledge.
Yuandong Tian

Mon 5:40 p.m. - 5:55 p.m.
Closing Remarks (Outro)
Jonathan Raiman · Mimee Xu · Martin Maas · Anna Goldie · Azade Nova · Benoit Steiner