Recent years have seen rapid progress in meta-learning methods, which transfer knowledge across tasks and domains to efficiently learn new tasks, optimize the learning process itself, and even generate new learning methods from scratch. Meta-learning can be seen as the logical conclusion of the arc that machine learning has undergone in the last decade, from learning classifiers, to learning representations, and finally to learning algorithms that themselves acquire representations, classifiers, and policies for acting in environments. In practice, meta-learning has been shown to yield new state-of-the-art automated machine learning methods, novel deep learning architectures, and substantially improved one-shot learning systems. Moreover, improving one’s own learning capabilities through experience can also be viewed as a hallmark of intelligent beings, and neuroscience shows a strong connection between human and reward learning and the growing subfield of meta-reinforcement learning.
Some of the fundamental questions that this workshop aims to address are:
 What are the meta-learning processes in nature (e.g., in humans), and how can we take inspiration from them?
 What is the relationship between meta-learning, continual learning, and transfer learning?
 What interactions exist between meta-learning and large pretrained / foundation models?
 What principles can we learn from meta-learning to help us design the next generation of learning systems?
 What kind of theoretical principles can we develop for meta-learning?
 How can we exploit our domain knowledge to effectively guide the meta-learning process and make it more efficient?
 How can we design better benchmarks for different meta-learning scenarios?
As prospective participants, we primarily target machine learning researchers interested in the questions and foci outlined above. Specific target communities within machine learning include, but are not limited to: meta-learning, AutoML, reinforcement learning, deep learning, optimization, evolutionary computation, and Bayesian optimization. We also invite submissions from researchers who study human learning and neuroscience, to provide a broad and interdisciplinary perspective to the attendees.
Fri 7:00 a.m. - 7:10 a.m.

Opening remarks
Fri 7:10 a.m. - 7:40 a.m.

Invited talk: Mengye Ren
Fri 7:40 a.m. - 8:10 a.m.

Invited talk: Lucas Beyer
Fri 8:10 a.m. - 8:25 a.m.

Contributed talk 1: FiT: Parameter Efficient Few-shot Transfer Learning
Fri 8:25 a.m. - 8:40 a.m.

Break
Fri 8:40 a.m. - 9:40 a.m.

Poster session 1
Fri 9:40 a.m. - 9:55 a.m.

Contributed talk 2: Optimistic Meta-Gradients
Fri 9:55 a.m. - 10:25 a.m.

Invited talk: Elena Gribovskaya
Fri 10:25 a.m. - 12:00 p.m.

Lunch break
Fri 12:00 p.m. - 12:30 p.m.

Invited talk: Chelsea Finn
Fri 12:30 p.m. - 1:00 p.m.

Invited talk: Greg Yang
Fri 1:00 p.m. - 1:15 p.m.

Contributed talk 3: The Curse of Low Task Diversity: On the Failure of Transfer Learning to Outperform MAML and Their Empirical Equivalence
Fri 1:15 p.m. - 2:15 p.m.

Poster session 2
Fri 2:15 p.m. - 2:30 p.m.

Contributed talk 4: HyperSound: Generating Implicit Neural Representations of Audio Signals with Hypernetworks
Fri 2:30 p.m. - 3:00 p.m.

Invited talk: Percy Liang
Fri 3:00 p.m. - 3:50 p.m.

Discussion panel
Fri 3:50 p.m. - 4:00 p.m.

Closing remarks


LOTUS: Learning to learn with Optimal Transport in Unsupervised Scenarios
(Poster)
Automated machine learning has been widely researched and adopted for supervised tasks such as classification and regression. Unsupervised scenarios, lacking a ground truth to optimize on, are much harder to automate. We propose a novel zero-shot meta-learning approach that recommends which algorithms and hyperparameters to use on new unsupervised tasks by learning from prior supervised proxy datasets. Our premise is that the selection of optimal unsupervised algorithms depends on the inherent properties of the data distribution. We first build a large meta-dataset evaluating many algorithms and hyperparameter settings on prior datasets, leverage optimal transport to find the prior datasets with the most similar underlying distribution, and then recommend the (tuned) algorithm that proved to work best for that data distribution. We evaluate the robustness of our approach on one particular task, i.e., outlier detection, and find that it outperforms state-of-the-art methods in unsupervised outlier detection.
Prabhant Singh · Joaquin Vanschoren


Test-time adaptation with slot-centric models
(Poster)
We consider the problem of segmenting scenes into constituent objects and their parts. Current supervised visual detectors, though impressive within their training distribution, often fail to segment out-of-distribution scenes into their constituent entities. Recent test-time adaptation methods use auxiliary self-supervised losses to adapt the network parameters to each test example independently and have shown promising results towards generalization outside the training distribution for the task of image classification. In our work, we find evidence that these losses can be insufficient for instance segmentation tasks, without also considering architectural inductive biases. For image segmentation, recent slot-centric generative models break such dependence on supervision by attempting to segment scenes into entities in a self-supervised manner by reconstructing pixels. Drawing upon these two lines of work, we propose Generating Fast and Slow Networks (GFS-Nets), a semi-supervised instance segmentation model equipped with a slot-centric image rendering component that is adapted per scene at test time through gradient descent on reconstruction or novel view synthesis objectives. We show that test-time adaptation greatly improves segmentation in out-of-distribution scenes. We evaluate GFS-Nets in scene segmentation benchmarks and show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors and self-supervised domain adaptation models.
Mihir Prabhudesai · Sujoy Paul · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Anirudh Goyal · Deepak Pathak · Katerina Fragkiadaki · Gaurav Aggarwal · Thomas Kipf


Meta-Learning Makes a Better Multimodal Few-shot Learner
(Poster)
Multimodal few-shot learning is challenging due to the large domain gap between vision and language modalities. As an effort to bridge this gap, we introduce a meta-learning approach for multimodal few-shot learning, to leverage its strong ability to accrue knowledge across tasks. The full model is based on frozen foundation vision and language models to use their already learned capacity. To translate the visual features into the latent space of the language model, we introduce a lightweight meta-mapper, acting as a meta-learner. By updating only the parameters of the meta-mapper, our model learns to quickly adapt to unseen samples with only a few gradient updates. Unlike prior multimodal few-shot learners, which need a hand-engineered task induction, our model is able to induce the task in a completely data-driven manner. The experiments on recent multimodal few-shot benchmarks demonstrate that our meta-learning approach yields better multimodal few-shot learners while being computationally more efficient compared to its counterparts.
Ivona Najdenkoska · Xiantong Zhen · Marcel Worring


Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks
(Poster)
Learning curve extrapolation aims to predict model performance in later epochs of a machine learning training run, based on the performance in the first k epochs. In this work, we argue that, while the varying difficulty of extrapolating learning curves warrants a Bayesian approach, existing methods are (i) overly restrictive, and/or (ii) computationally expensive. We describe the first application of prior-data fitted neural networks (PFNs) in this context. PFNs use a transformer, pretrained on data generated from a prior, to perform approximate Bayesian inference in a single forward pass. We present preliminary results, demonstrating that PFNs can more accurately approximate the posterior predictive distribution multiple orders of magnitude faster than MCMC, as well as obtain a lower average error when predicting the final accuracy of real learning curves from LCBench.
Steven Adriaensen · Herilalaina Rakotoarison · Samuel Müller · Frank Hutter


Adversarial Cheap Talk
(Poster)
Adversarial attacks in reinforcement learning (RL) often assume highly privileged access to the victim’s parameters, environment, or data. Instead, this paper proposes a novel adversarial setting called a Cheap Talk MDP in which an Adversary can merely append deterministic messages to the Victim’s observation, resulting in a minimal range of influence. The Adversary cannot occlude ground truth, influence underlying environment dynamics or reward signals, introduce nonstationarity, add stochasticity, see the Victim’s actions, or access their parameters. Additionally, we present a simple meta-learning algorithm called Adversarial Cheap Talk (ACT) to train Adversaries in this setting. We demonstrate that an Adversary trained with ACT can still significantly influence the Victim’s training and testing performance, despite the highly constrained setting. Affecting train-time performance reveals a new attack vector and provides insight into the success and failure modes of existing RL algorithms. More specifically, we show that an ACT Adversary is capable of harming performance by interfering with the learner’s function approximation, or instead helping the Victim’s performance by outputting useful features. Finally, we show that an ACT Adversary can manipulate messages during train-time to directly and arbitrarily control the Victim at test-time.
Chris Lu · Timon Willi · Alistair Letcher · Jakob Foerster


Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
(Poster)
In contrast to the natural capabilities of humans to learn new tasks in a sequential fashion, neural networks are known to suffer from catastrophic forgetting, where the model's performance drops dramatically after being optimized for a new task. In response, the continual learning community has proposed several solutions aiming to equip the neural network with the ability to learn the current task (plasticity) while still achieving high accuracy on the old tasks (stability). Despite remarkable improvements, the plasticity-stability trade-off is still far from being solved, and its underlying mechanism is poorly understood. In this work, we propose Auxiliary Network Continual Learning (ANCL), a new method that combines the continually learned model with an additional auxiliary network that is solely optimized on the new task. More concretely, the proposed framework materializes in a regularizer that naturally interpolates between plasticity and stability, surpassing strong baselines on CIFAR-100. By analyzing the solutions of several continual learning methods based on the so-called mode connectivity assumption, we propose a new hyperparameter search technique that dynamically adjusts the regularization parameter to achieve a better stability-plasticity trade-off.
Sanghwan Kim · Lorenzo Noci · Antonio Orvieto · Thomas Hofmann


Optimistic Meta-Gradients
(Poster)
We study the connection between gradient-based meta-learning and convex optimisation. We observe that gradient descent with momentum is a special case of meta-gradients, and building on recent results in optimisation, we prove convergence rates for meta-learning in the single-task setting. While a meta-learned update rule can yield faster convergence up to a constant factor, it is not sufficient for acceleration. Instead, some form of optimism is required. We show that optimism in meta-learning can be captured through the recently proposed Bootstrapped Meta-Gradient method, providing deeper insight into its underlying mechanics.
Sebastian Flennerhag · Tom Zahavy · Brendan O'Donoghue · Hado van Hasselt · András György · Satinder Singh


Transfer NAS with Meta-learned Bayesian Surrogates
(Poster)
While neural architecture search (NAS) is an intensely researched area, approaches typically still suffer from either (i) high computational costs or (ii) lack of robustness across datasets and experiments. Furthermore, most methods start searching for an optimal architecture from scratch, ignoring prior knowledge. This is in contrast to the manual design process of researchers and engineers, who leverage previous deep learning experience by, e.g., transferring architectures from previously solved, related problems. We propose to adopt this human design strategy and introduce a novel surrogate for NAS that is meta-learned across prior architecture evaluations on different datasets. We utilize Bayesian Optimization (BO) with deep-kernel Gaussian Processes, graph neural networks for the architecture embeddings, and a transformer-based set encoder of datasets. As a result, our method consistently achieves state-of-the-art results on six computer vision datasets, while being as fast as one-shot NAS methods.
Gresa Shala · Thomas Elsken · Frank Hutter · Josif Grabocka


Gray-Box Gaussian Processes for Automated Reinforcement Learning
(Poster)
Despite having achieved spectacular milestones in an array of important real-world applications, most Reinforcement Learning (RL) methods are very brittle concerning their hyperparameters. Notwithstanding the crucial importance of setting the hyperparameters in training state-of-the-art agents, the task of hyperparameter optimization (HPO) in RL is understudied. In this paper, we propose a novel gray-box Bayesian Optimization technique for HPO in RL that enriches Gaussian Processes with reward curve estimations based on generalized logistic functions. We thus about the performance of learning algorithms, transferring information across configurations and about epochs of the learning algorithm. In a very large-scale experimental protocol, comprising 5 popular RL methods (DDPG, A2C, PPO, SAC, TD3), 22 environments (OpenAI Gym: MuJoCo, Atari, Classic Control), and 7 HPO baselines, we demonstrate that our method significantly outperforms current HPO practices in RL.
Gresa Shala · André Biedenkapp · Frank Hutter · Josif Grabocka
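
For readers unfamiliar with generalized logistic functions, the following is a minimal Python sketch of one common parameterization (the Richards curve) applied to a reward curve. The abstract does not state which parameterization the authors use, so the function name, its parameters, and the example values are assumptions for illustration only.

```python
import math

def generalized_logistic(t, lower, upper, growth, q=1.0, nu=1.0):
    """Richards curve (assumed parameterization, not taken from the paper):
    lower + (upper - lower) / (1 + q * exp(-growth * t)) ** (1 / nu).
    `lower` and `upper` are the asymptotes, `growth` the growth rate."""
    return lower + (upper - lower) / (1.0 + q * math.exp(-growth * t)) ** (1.0 / nu)

# Hypothetical reward curve rising from 0 toward an asymptote of 200.
curve = [generalized_logistic(epoch, 0.0, 200.0, 0.5) for epoch in range(0, 30, 5)]
```

With `q = 1` and `nu = 1` this reduces to the ordinary logistic curve, which sits halfway between the two asymptotes at `t = 0` and approaches `upper` as `t` grows.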


AutoRL-Bench 1.0
(Poster)
It is well established that Reinforcement Learning (RL) is very brittle and sensitive to the choice of hyperparameters. This prevents RL methods from being usable out of the box. The field of automated RL (AutoRL) aims at automatically configuring the RL pipeline, both to make RL usable by a broader audience and to reveal its full potential. Still, there has been little progress towards this goal, as new AutoRL methods are often evaluated with incompatible experimental protocols. Furthermore, the typically high cost of experimentation prevents a thorough and meaningful comparison of different AutoRL methods or established hyperparameter optimization (HPO) methods from the automated Machine Learning (AutoML) community. To alleviate these issues, we propose the first tabular AutoRL Benchmark for studying the hyperparameters of RL algorithms. We consider the hyperparameter search spaces of five well-established RL methods (PPO, DDPG, A2C, SAC, TD3) across 22 environments for which we compute and provide the reward curves. This enables HPO methods to simply query our benchmark as a lookup table, instead of actually training agents. Thus, our benchmark offers a testbed for very fast, fair, and reproducible experimental protocols for comparing future black-box, gray-box, and online HPO methods for RL.
Gresa Shala · Sebastian Pineda Arango · André Biedenkapp · Frank Hutter · Josif Grabocka
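
The lookup-table idea can be sketched in a few lines of Python. Everything below is illustrative: the table contents, `query_reward_curve`, and `random_search` are hypothetical names with made-up numbers, not the benchmark's actual API or data.

```python
import random

# Hypothetical precomputed table:
# (algorithm, environment, learning_rate) -> reward curve.
# The numbers are invented for illustration only.
TABLE = {
    ("PPO", "CartPole-v1", 1e-3): [20.0, 80.0, 150.0],
    ("PPO", "CartPole-v1", 1e-4): [15.0, 40.0, 90.0],
    ("A2C", "CartPole-v1", 1e-3): [18.0, 60.0, 120.0],
}

def query_reward_curve(algorithm, environment, learning_rate):
    """Return the stored reward curve instead of training an agent."""
    return TABLE[(algorithm, environment, learning_rate)]

def random_search(algorithm, environment, budget, seed=0):
    """Random-search HPO whose evaluations are plain table lookups."""
    rng = random.Random(seed)
    candidates = [lr for (a, e, lr) in TABLE if a == algorithm and e == environment]
    best_lr, best_reward = None, float("-inf")
    for _ in range(budget):
        lr = rng.choice(candidates)
        final_reward = query_reward_curve(algorithm, environment, lr)[-1]
        if final_reward > best_reward:
            best_lr, best_reward = lr, final_reward
    return best_lr, best_reward
```

Because each evaluation is a dictionary lookup, an HPO method can be rerun with many seeds and budgets at negligible cost, which is what makes comparisons on a tabular benchmark fast and reproducible.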


PersA-FL: Personalized Asynchronous Federated Learning
(Poster)
We study the personalized federated learning problem under asynchronous updates. In this problem, each client seeks to obtain a personalized model that simultaneously outperforms local and global models. We consider two optimization-based frameworks for personalization: (i) Model-Agnostic Meta-Learning (MAML) and (ii) Moreau Envelope (ME). MAML involves learning a joint model adapted for each client through fine-tuning, whereas ME requires a bilevel optimization problem with implicit gradients to enforce personalization via regularized losses. We focus on improving the scalability of personalized federated learning by removing the synchronous communication assumption. Moreover, we extend the studied function class by removing boundedness assumptions on the gradient norm. Our main technical contribution is a unified proof for asynchronous federated learning with bounded staleness that we apply to MAML and ME personalization frameworks. For the class of smooth and nonconvex functions, we show the convergence of our method to a first-order stationary point. We illustrate the performance of our method and its tolerance to staleness through experiments for classification tasks over heterogeneous datasets.
M. Taha Toghani · Soomin Lee · Cesar Uribe


Bayesian Optimization with a Neural Network Meta-learned on Synthetic Data Only
(Poster)
Bayesian Optimization (BO) is an effective approach to optimize black-box functions, relying on a probabilistic surrogate to model the response surface. In this work, we propose to use a Prior-data Fitted Network (PFN) as a cheap and flexible surrogate. PFNs are neural networks that approximate the Posterior Predictive Distribution (PPD) in a single forward pass. Most importantly, they can approximate the PPD for any prior distribution that we can sample from efficiently. Additionally, we show what is required for PFNs to be used in a standard BO setting with common acquisition functions. We evaluated the performance of a PFN surrogate for hyperparameter optimization (HPO), a major application of BO. While the method can still fail for some search spaces, it performs comparably to or better than the state of the art on the HPO-B and PD1 benchmarks.
Samuel Müller · Sebastian Pineda Arango · Matthias Feurer · Josif Grabocka · Frank Hutter


Recommendation for New Drugs with Limited Prescription Data
(Poster)
Drug recommendation assists doctors in prescribing personalized medications to patients based on their health conditions. However, newly approved drugs do not have much historical prescription data and cannot leverage existing drug recommendation methods. To address this, we propose EDGE, which maintains a drug-dependent multi-phenotype few-shot learner to bridge the gap between existing and new drugs. Experimental results show that EDGE can adapt to the recommendation for a new drug with limited prescription data from a few patients.
Zhenbang Wu · Huaxiu Yao · Zhe Su · David Liebovitz · Lucas Glass · James Zou · Chelsea Finn · Jimeng Sun


Towards Automated Design of Bayesian Optimization via Exploratory Landscape Analysis
(Poster)
Bayesian optimization (BO) algorithms form a class of surrogate-based heuristics, aimed at efficiently computing high-quality solutions for numerical black-box optimization problems. The BO pipeline is highly modular, with different design choices for the initial sampling strategy, the surrogate model, the acquisition function (AF), the solver used to optimize the AF, etc. We demonstrate in this work that a dynamic selection of the AF can benefit the BO design. More precisely, we show that even a naive random forest regression model, built on top of exploratory landscape analysis features that are computed from the initial design points, suffices to recommend AFs that outperform any static choice, when considering performance over the classic BBOB benchmark suite for derivative-free numerical optimization methods on the COCO platform. Our work hence paves the way towards AutoML-assisted, on-the-fly BO designs that adjust their behavior on a run-by-run basis.
Carolin Benjamins · Anja Jankovic · Elena Raponi · Koen van der Blom · Marius Lindauer · Carola Doerr


One-Shot Optimal Design for Gaussian Process Analysis of Randomized Experiments
(Poster)
Bayesian Optimization provides a sample-efficient approach to optimize Internet systems that are evaluated with randomized experiments. Such evaluations are often resource- and time-consuming because they must measure noisy and long-term outcomes. Thus, the initial randomized design, i.e., determining the number of test groups and sample sizes, plays a critical role in building an accurate Gaussian Process model for efficient optimization and in decreasing experimentation cost. We develop a simulation-based method with meta-learned priors to decide the optimal design for the initial batch of GP-modeled randomized experiments. The meta-learning is performed on a large corpus of randomized experiments conducted at Meta and obtains sensible GP priors for simulating across different designs. The one-shot optimal design policy is derived by training a machine learning model with simulation data to map experiment characteristics to an optimal design. Our evaluations show that our proposed optimal design significantly improves resource efficiency while achieving a target GP model accuracy.
Jelena Markovic · Qing Feng · Eytan Bakshy


Learning to Prioritize Planning Updates in Model-based Reinforcement Learning
(Poster)
Prioritizing the states and actions from which policy improvement is performed can improve the sample efficiency of model-based reinforcement learning systems. Although much is already known about prioritizing planning updates, more needs to be understood to operationalize these ideas in complex settings that involve nonstationary and stochastic transition dynamics, large numbers of states, and scalable function approximation architectures. Our paper presents an online meta-learning algorithm to address these needs. The algorithm finds distributions that encode priority in their probability mass. The paper evaluates the algorithm in a domain with a changing goal and with a fixed, generative transition model. Results show that prioritizing planning updates from samples of the meta-learned distribution significantly improves sample efficiency over fixed baseline distributions. Additionally, they point to a number of interesting opportunities for future research.
Brad Burega · John Martin · Michael Bowling


GraViT-E: Gradient-based Vision Transformer Search with Entangled Weights
(Poster)
Differentiable one-shot neural architecture search methods have recently become popular since they can exploit weight-sharing to efficiently search in large architectural search spaces. These methods traditionally perform a continuous relaxation of the discrete search space to search for an optimal architecture. However, they suffer from large memory requirements, making their application to parameter-heavy architectures like transformers difficult. Recently, single-path one-shot methods have been introduced which often use weight entanglement to alleviate this issue by sampling the weights of the sub-networks from the largest model, which is itself the supernet. In this work, we propose a continuous relaxation of the weight-entanglement-based architectural representation. Our Gradient-based Vision Transformer Search with Entangled Weights (GraViT-E) combines the best properties of both differentiable one-shot NAS and weight entanglement. We observe that our method imparts much better regularization properties and memory efficiency to the trained supernet. We study three one-shot optimizers on the Vision Transformer search space and observe that our method outperforms existing baselines on multiple datasets while being up to 35% more parameter-efficient on ImageNet-1k.
Rhea Sukthanker · Arjun Krishnakumar · Sharat Patil · Frank Hutter


Expanding the Deployment Envelope of Behavior Prediction via Adaptive Meta-Learning
(Poster)
Learning-based behavior prediction methods are increasingly being deployed in real-world autonomous systems, e.g., in fleets of self-driving vehicles, which are beginning to commercially operate in major cities across the world. Despite their advancements, however, the vast majority of prediction systems are specialized to a set of well-explored geographic regions or operational design domains, complicating deployment to additional cities, countries, or continents. To this end, we present a novel method for efficiently adapting behavior prediction models to new environments. Our approach leverages recent advances in meta-learning, specifically Bayesian regression, to augment existing behavior prediction models with an adaptive layer that enables efficient domain transfer via offline fine-tuning, online adaptation, or both. Experiments across multiple real-world datasets demonstrate that our method can efficiently adapt to a variety of unseen environments.
Boris Ivanovic · James Harrison · Marco Pavone


PriorBand: HyperBand + Human Expert Knowledge
(Poster)
Hyperparameters of Deep Learning (DL) pipelines are crucial for their performance. While a large number of methods for hyperparameter optimization (HPO) have been developed, they are misaligned with the desiderata of a modern DL researcher. Since often only a few trials are possible in the development of new DL methods, manual experimentation is still the most prevalent approach to set hyperparameters, relying on the researcher’s intuition and cheap preliminary explorations. To resolve this shortcoming of HPO for DL, we propose PriorBand, an HPO algorithm tailored to DL, able to utilize both expert beliefs and cheap proxy tasks. Empirically, we demonstrate the efficiency of PriorBand across a range of DL models and tasks using as little as the cost of 10 training runs and show its robustness against poor expert beliefs and misleading proxy tasks.
Neeratyoy Mallik · Carl Hvarfner · Danny Stoll · Maciej Janowski · Edward Bergman · Marius Lindauer · Luigi Nardi · Frank Hutter


The Curse of Low Task Diversity: On the Failure of Transfer Learning to Outperform MAML and Their Empirical Equivalence
(Poster)
Recently, it has been observed that a transfer learning solution might be all we need to solve many few-shot learning benchmarks, thus raising important questions about when and how meta-learning algorithms should be deployed. In this paper, we seek to clarify these questions by (1) proposing a novel metric, the diversity coefficient, to measure the diversity of tasks in a few-shot learning benchmark and (2) comparing Model-Agnostic Meta-Learning (MAML) and transfer learning under fair conditions (same architecture, same optimizer, and all models trained to convergence). Using the diversity coefficient, we show that the popular MiniImageNet and CIFAR-FS few-shot learning benchmarks have low diversity. This novel insight contextualizes claims that transfer learning solutions are better than meta-learned solutions in the regime of low diversity under a fair comparison. Specifically, we empirically find that a low diversity coefficient correlates with a high similarity between transfer learning and MAML-learned solutions in terms of accuracy at meta-test time and classification layer similarity (using feature-based distance metrics like SVCCA, PWCCA, CKA, and OPD). To further support our claim, we find this meta-test accuracy holds even as the model size changes. Therefore, we conclude that in the low diversity regime, MAML and transfer learning have equivalent meta-test performance when both are compared fairly. We also hope our work inspires more thoughtful constructions and quantitative evaluations of meta-learning benchmarks in the future.
Brando Miranda · Patrick Yu · YuXiong Wang · Sanmi Koyejo


Towards Discovering Neural Architectures from Scratch
(Poster)
The discovery of neural architectures from scratch is the long-standing goal of Neural Architecture Search (NAS). Searching over a wide spectrum of neural architectures can facilitate the discovery of previously unconsidered but well-performing architectures. In this work, we take a large step towards discovering neural architectures from scratch by expressing architectures algebraically. This algebraic view leads to a more general method for designing search spaces, which allows us to compactly represent search spaces that are 100s of orders of magnitude larger than common spaces from the literature. Further, we propose a Bayesian Optimization strategy to efficiently search over such huge spaces, and demonstrate empirically that both our search space design and our search strategy can be superior to existing baselines. We open-source our algebraic NAS approach and provide APIs for PyTorch and TensorFlow.
Simon Schrodi · Danny Stoll · Robin Ru · Rhea Sukthanker · Thomas Brox · Frank Hutter


HyperSound: Generating Implicit Neural Representations of Audio Signals with Hypernetworks
(Poster)
Implicit neural representations (INRs) are a rapidly growing research field, which provides alternative ways to represent multimedia signals. Recent applications of INRs include image super-resolution, compression of high-dimensional signals, or 3D rendering. However, these solutions usually focus on visual data, and adapting them to the audio domain is not trivial. Moreover, it requires a separately trained model for every data sample. To address this limitation, we propose HyperSound, a meta-learning method leveraging hypernetworks to produce INRs for audio signals unseen at training time. We show that our approach can reconstruct sound waves with quality comparable to other state-of-the-art models.
Filip Szatkowski · Karol J. Piczak · Przemysław Spurek · Jacek Tabor · Tomasz Trzcinski


On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition
(Poster)
Face recognition systems are used widely but are known to exhibit bias across a range of sociodemographic dimensions, such as gender and race. An array of works proposing pre-processing, training, and post-processing methods have failed to close these gaps. Here, we take a very different approach to this problem, identifying that both architectures and hyperparameters of neural networks are instrumental in reducing bias. We first run a large-scale analysis of the impact of architectures and training hyperparameters on several common fairness metrics and show that the implicit convention of choosing high-accuracy architectures may be suboptimal for fairness. Motivated by our findings, we run the first neural architecture search for fairness, jointly with a search for hyperparameters. We output a suite of models which Pareto-dominate all other competitive architectures in terms of accuracy and fairness. Furthermore, we show that these models transfer well to other face recognition datasets with similar and distinct protected attributes. We release our code and raw result files so that researchers and practitioners can replace our fairness metrics with a bias measure of their choice.
Samuel Dooley · Rhea Sukthanker · John Dickerson · Colin White · Frank Hutter · Micah Goldblum


Few-Shot Calibration of Set Predictors via Meta-Learned Cross-Validation-Based Conformal Prediction
(Poster)
Conventional frequentist learning is known to yield poorly calibrated models that fail to reliably quantify the uncertainty of their decisions. Bayesian learning can improve calibration, but formal guarantees apply only under restrictive assumptions about correct model specification. Conformal prediction (CP) offers a general framework for the design of set predictors with calibration guarantees that hold regardless of the underlying data generation mechanism. However, when training data are limited, CP tends to produce large, and hence uninformative, predicted sets. This paper introduces a novel meta-learning solution that aims at reducing the set prediction size. Unlike prior work, the proposed meta-learning scheme, referred to as meta-XB, (i) builds on cross-validation-based CP, rather than the less efficient validation-based CP; and (ii) preserves formal per-task calibration guarantees, rather than less stringent task-marginal guarantees.
Sangwoo Park · Kfir M. Cohen · Osvaldo Simeone
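
For readers unfamiliar with the baseline that the abstract above contrasts against, the following is a minimal sketch of validation-based (split) conformal prediction. It is not the meta-XB method and is not taken from the paper. The function names, the nonconformity scores, and the labels are all hypothetical, chosen only to show how a held-out calibration set turns into a threshold and then into a predicted set.

```python
import math

def conformal_threshold(calibration_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label must be kept.
        return math.inf
    return sorted(calibration_scores)[rank - 1]

def prediction_set(label_scores, threshold):
    """Keep every candidate label whose nonconformity score is within the threshold."""
    return {label for label, score in label_scores.items() if score <= threshold}

# Hypothetical nonconformity scores on held-out calibration examples
# (for instance, 1 minus the predicted probability of the true label).
calibration = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
threshold = conformal_threshold(calibration, alpha=0.25)

# Hypothetical scores of each candidate label for one test input.
test_scores = {"cat": 0.15, "dog": 0.65, "bird": 0.45}
print(threshold)
print(sorted(prediction_set(test_scores, threshold)))
```

With very few calibration points the threshold becomes infinite and the set contains every label, which is the "large, and hence uninformative" behaviour the abstract refers to in the limited-data regime.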


Multi-objective Tree-structured Parzen Estimator Meets Meta-learning
(Poster)
Hyperparameter optimization (HPO) is essential for achieving good performance in deep learning, and practitioners often need to consider the trade-off between multiple metrics, such as error rate, latency, memory requirements, robustness, and algorithmic fairness. Due to this demand and the heavy computation of deep learning, the acceleration of multi-objective (MO) optimization becomes ever more important. Although meta-learning has been extensively studied to speed up HPO, existing methods are not applicable to the MO tree-structured Parzen estimator (MO-TPE), a simple yet powerful MO HPO algorithm. In this paper, we extend TPE’s acquisition function to the meta-learning setting, using a task similarity defined by the overlap in promising regions of each task. In a comprehensive set of experiments, we demonstrate that our method accelerates MO-TPE on tabular HPO benchmarks and yields state-of-the-art performance. Our method was also validated externally by winning the AutoML 2022 competition on "Multiobjective Hyperparameter Optimization for Transformers".
Shuhei Watanabe · Noor Awad · Masaki Onishi · Frank Hutter


Unsupervised Meta-learning via Few-shot Pseudo-supervised Contrastive Learning
(Poster)
Unsupervised meta-learning aims to learn generalizable knowledge across a distribution of tasks constructed from unlabeled data. Here, the main challenge is how to construct diverse tasks for meta-learning without label information; recent works have proposed, e.g., pseudo-labeling via pretrained representations or creating synthetic samples via generative models. However, such a task construction strategy is fundamentally limited due to heavy reliance on the immutable pseudo-labels during meta-learning and the quality of the representations or the generated samples. To overcome these limitations, we propose a simple yet effective unsupervised meta-learning framework, coined Pseudo-supervised Contrast (PsCo), for few-shot classification. We are inspired by the recent self-supervised learning literature; PsCo utilizes a momentum network and a queue of previous batches to improve pseudo-labeling and construct diverse tasks in a progressive manner. Our extensive experiments demonstrate that PsCo outperforms existing unsupervised meta-learning methods under various in-domain and cross-domain few-shot classification benchmarks. We also validate that PsCo is easily scalable to a large-scale benchmark, while recent prior-art meta-schemes are not.
Huiwon Jang · Hankook Lee · Jinwoo Shin


Uncertainty-Aware Meta-Learning for Multimodal Task Distributions
(Poster)
Meta-learning is a popular approach for learning new tasks with limited data (i.e., few-shot learning) by leveraging the commonalities among different tasks. However, meta-learned models can perform poorly when context data is limited, or when data is drawn from an out-of-distribution (OoD) task. Especially in safety-critical settings, this necessitates an uncertainty-aware approach to meta-learning. In this work, we present UNLIMITD (uncertainty-aware meta-learning for multimodal task distributions), a novel method for meta-learning that (1) makes probabilistic predictions on in-distribution tasks efficiently, (2) is capable of detecting OoD context data at test time, and (3) performs on heterogeneous, multimodal task distributions. To achieve this goal, we take a probabilistic perspective and train a parametric, tuneable distribution over tasks on the meta-dataset. We construct this distribution by performing Bayesian inference on a linearized neural network, leveraging Gaussian process theory. We demonstrate that UNLIMITD's predictions compare favorably to, and outperform in most cases, the standard baselines, especially in the low-data regime. Furthermore, we show that UNLIMITD is effective in detecting data from OoD tasks. Finally, we confirm that both of these findings continue to hold in the multimodal task-distribution setting.
Cesar Almecija · Apoorva Sharma · YoungJin Park · Navid Azizan


Lightweight Prompt Learning with General Representation for Rehearsal-free Continual Learning
(Poster)
Recently, prompt-based continual learning has become a new state of the art by using small prompts to steer a large pretrained model toward each target task. However, we find that these methods still suffer from a memory problem, as the number of prompts must increase when the model learns many tasks. To address this limitation, inspired by the human hippocampus, we propose Lightweight Prompt Learning with General Representation (LPG), a novel rehearsal-free continual learning method. We experimentally demonstrate LPG's promising performance and provide corresponding analyses. We expect our proposal to spotlight a novel continual learning paradigm that utilizes a single prompt to mitigate memory problems while sustaining precise performance.
Hyunhee Chung · Kyung Ho Park


Meta-RL for Multi-Agent RL: Learning to Adapt to Evolving Agents
(Poster)
In Multi-Agent RL, agents learn and evolve together, and each agent has to interact with a changing set of other agents. While generally viewed as a problem of non-stationarity, we propose that this can be viewed as a Meta-RL problem. We demonstrate an approach for learning Stackelberg equilibria, a type of equilibrium that features a bi-level optimization problem, where the inner level is a "best-response" of one or more follower agents to an evolving leader agent. Various approaches have been proposed in the literature to implement this best-response, most often treating each leader policy and the learning problem it induces for the follower(s) as a separate instance. We propose that the problem can be viewed as a meta (reinforcement) learning problem: learning to learn to best-respond to different leader behaviors, by leveraging commonality in the induced follower learning problems. We demonstrate an approach using contextual policies and show that it matches performance of existing approaches using significantly fewer environment samples in experiments. We discuss how more advanced meta-RL techniques could allow this to scale to richer domains.
Matthias Gerstgrasser · David Parkes


Neural Architecture for Online Ensemble Continual Learning
(Poster)
Continual learning with an increasing number of classes is a challenging task. The difficulty rises when each example is presented exactly once, which requires the model to learn online. Recent methods with classic parameter optimization procedures have been shown to struggle in such setups or have limitations like non-differentiable components or memory buffers. For this reason, we present a fully differentiable ensemble method that allows us to efficiently train an ensemble of neural networks in the end-to-end regime. The proposed technique achieves SOTA results without a memory buffer and clearly outperforms the reference methods. The conducted experiments have also shown a significant increase in the performance for small ensembles, which demonstrates the capability of obtaining relatively high classification accuracy with a reduced number of classifiers.
Mateusz Wójcik · Witold Kościukiewicz · Adam Gonczarek · Tomasz Kajdanowicz


Meta-Learning via Classifier(-free) Guidance
(Poster)
We aim to develop meta-learning techniques that achieve higher zero-shot performance than the state of the art on unseen tasks. To do so, we take inspiration from recent advances in generative modeling and language-conditioned image synthesis to propose meta-learning techniques that use natural language guidance for zero-shot task adaptation. We first train an unconditional generative hypernetwork model to produce neural network weights; then we train a second "guidance" model that, given a natural language task description, traverses the hypernetwork latent space to find high-performance task-adapted weights in a zero-shot manner. We explore two alternative approaches for latent space guidance: "HyperCLIP"-based classifier guidance and a conditional Hypernetwork Latent Diffusion Model ("HyperLDM"), which we show to benefit from the classifier-free guidance technique common in image generation. Finally, we demonstrate that our approaches outperform existing meta-learning methods with zero-shot learning experiments on our Meta-VQA dataset.
Elvis Nava · Seijin Kobayashi · Yifei Yin · Robert Katzschmann · Benjamin F. Grewe


MARS: Meta-learning as score matching in the function space
(Poster)
We approach meta-learning through the lens of functional Bayesian neural network inference, which views the prior as a stochastic process and performs inference in the function space. Specifically, we view the meta-training tasks as samples from the data-generating process and formalize meta-learning as empirically estimating the law of this stochastic process. Our approach can seamlessly acquire and represent complex prior knowledge by meta-learning the score function of the data-generating process marginals. In a comprehensive benchmark, we demonstrate that our method achieves state-of-the-art performance in terms of predictive accuracy and substantial improvements in the quality of uncertainty estimates.
Kruno Lehman · Jonas Rothfuss · Andreas Krause


Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function
(Poster)
Meta-gradient Reinforcement Learning (RL) allows agents to self-tune their hyperparameters in an online fashion during training. In this paper, we identify a bias in the meta-gradient of current meta-gradient RL approaches. This bias comes from using the critic that is trained using the meta-learned discount factor for the advantage estimation in the outer objective, which requires a different discount factor. Because the meta-learned discount factor is typically lower than the one used in the outer objective, the resulting bias may cause the meta-gradient to favor myopic policies. We propose a simple solution to this issue: we alleviate this bias by using an alternative, outer value function in the estimation of the outer loss. To obtain this outer value function we add a second head to the critic network and train it alongside the classic critic, using the outer loss discount factor. On an illustrative toy problem, we show that the bias can cause catastrophic failure of current meta-gradient RL approaches, and show that our proposed solution fixes it. We then apply our method to more complex environments and demonstrate that fixing the meta-gradient bias significantly improves performance.
Clément Bonnet · Laurence Midgley · Alexandre Laterre
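
The link between a lower discount factor and myopic behaviour, mentioned in the abstract above, can be seen in a small numeric sketch. This only compares discounted returns under two discount factors; it does not implement meta-gradients, the two-headed critic, or anything else from the paper. The reward sequences and both discount values are assumptions made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Two hypothetical reward sequences: a small immediate reward
# versus a larger reward that arrives three steps later.
myopic_rewards = [1.0, 0.0, 0.0, 0.0]
patient_rewards = [0.0, 0.0, 0.0, 2.0]

inner_gamma = 0.5   # assumed low discount factor (stand-in for a meta-learned one)
outer_gamma = 0.99  # assumed discount factor of the outer objective

for name, gamma in [("inner", inner_gamma), ("outer", outer_gamma)]:
    myopic = discounted_return(myopic_rewards, gamma)
    patient = discounted_return(patient_rewards, gamma)
    preferred = "myopic" if myopic > patient else "patient"
    print(f"{name} gamma={gamma}: myopic={myopic:.3f}, patient={patient:.3f}, prefers {preferred}")
```

Values estimated with the lower discount factor rank the immediate reward higher, while the outer discount factor ranks the delayed reward higher, so plugging the former into an objective defined by the latter can pull the comparison in the wrong direction.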


GramML: Exploring Context-Free Grammars with Model-Free Reinforcement Learning
(Poster)
One concern of AutoML systems is how to discover the best pipeline configuration to solve a particular task in the shortest amount of time. Recent approaches tackle the problem using techniques based on learning a model that helps relate the configuration space and the objective being optimized. However, relying on such a model poses some difficulties. First, both pipelines and datasets have to be represented with meta-features. Second, there exists a strong dependence on the chosen model and its hyperparameters. In this paper, we present a simple yet effective model-free reinforcement learning approach based on an adaptation of the Monte Carlo tree search (MCTS) algorithm for trees and context-free grammars. We run experiments on the OpenML-CC18 benchmark suite and show superior performance compared to the state of the art.
Hernan C. Vazquez · Jorge Sanchez · Rafael Carrascosa


Efficient Queries Transformer Neural Processes
(Poster)
Neural Processes (NPs) are popular methods in meta-learning that can estimate predictive uncertainty on target datapoints by conditioning on a context dataset. The previous state-of-the-art method, Transformer Neural Processes (TNPs), achieves strong performance but requires quadratic computation with respect to the number of context datapoints per query, limiting its applications. Conversely, existing sub-quadratic NP variants perform significantly worse than TNPs. Tackling this issue, we propose Efficient Queries Transformer Neural Processes (EQTNPs), a more computationally efficient NP variant. The model encodes the context dataset into a set of vectors that is linear in the number of context datapoints. When making predictions, the model retrieves higher-order information from the context dataset via multiple cross-attention mechanisms on the context vectors. We empirically show that EQTNPs achieve results competitive with the state of the art.
Leo Feng · Hossein Hajimirsadeghi · Yoshua Bengio · Mohamed Osama Ahmed


Meta-learning of Black-box Solvers Using Deep Reinforcement Learning
(Poster)
Black-box optimization does not require any specification on the function we are looking to optimize. As such, it represents one of the most general problems in optimization, and is central in many scientific areas. However, in many practical cases, one must solve a sequence of black-box problems from functions originating from a specific class and hence sharing similar patterns. Classical algorithms such as evolutionary or random methods would treat each problem independently and would be oblivious of the general underlying structure. In this paper, we introduce MELBA, an algorithm that exploits the similarities among a given class of functions to learn a task-specific solver that is tailored to efficiently optimize every function from this task. More precisely, given a class of functions, the proposed algorithm learns a Transformer-based Reinforcement Learning (RL) black-box solver. First, the Transformer embeds a previously gathered set of evaluation points and their image through the function into a latent state that characterizes the current stage of the optimization process. Then, the next evaluation point is sampled according to the latent state. The black-box solver is trained using PPO and the global regret on a training set. We show experimentally the effectiveness of our solvers on various synthetic and real-life tasks including the hyperparameter optimization of ML models (SVM, XGBoost) and demonstrate that our approach is competitive with existing methods.
Cedric Malherbe · Aladin Virmaux · Ludovic Dos Santos · Sofian Chaybouti


Contextual Squeeze-and-Excitation
(Poster)
Several applications require effective knowledge transfer across tasks in the low-data regime. For instance, in personalization, a pretrained system is adapted by learning on small amounts of labeled data belonging to a specific user (context). This setting requires high accuracy under low computational complexity, meaning low memory footprint in terms of parameter storage and adaptation cost. Meta-learning methods based on Feature-wise Linear Modulation generators (FiLM) satisfy these constraints as they can adapt a backbone without expensive fine-tuning. However, there has been limited research on viable alternatives to FiLM generators. In this paper we focus on this area of research and propose a new adaptive block called Contextual Squeeze-and-Excitation (CaSE). CaSE is more efficient than FiLM generators for a variety of reasons: it does not require a separate set encoder, has fewer learnable parameters, and only uses a scale vector (no shift) to modulate activations. We empirically show that CaSE is able to outperform FiLM generators in terms of parameter efficiency (a 75% reduction in the number of adaptation parameters) and classification accuracy (a 1.5% average improvement on the 26 datasets of the VTAB+MD benchmark).
Massimiliano Patacchiola · John Bronskill · Aliaksandra Shysheya · Katja Hofmann · Sebastian Nowozin · Richard Turner
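
The difference between scale-and-shift modulation and scale-only modulation, which the abstract above names as one reason for CaSE's efficiency, can be sketched in a few lines. This is not the CaSE block and does not show how the scale vector is produced from the context. The function names and all numeric values are hypothetical.

```python
def film_modulate(activations, scale, shift):
    """FiLM-style modulation: one scale and one shift value per channel."""
    return [g * a + b for a, g, b in zip(activations, scale, shift)]

def scale_only_modulate(activations, scale):
    """Scale-only modulation: one scale value per channel, no shift term."""
    return [g * a for a, g in zip(activations, scale)]

# Hypothetical per-channel activations and modulation vectors.
activations = [1.0, -2.0, 0.5]
scale = [2.0, 0.5, 1.0]
shift = [0.5, 0.0, -1.0]

print(film_modulate(activations, scale, shift))
print(scale_only_modulate(activations, scale))
```

In this sketch the scale-only variant needs one modulation value per channel where the FiLM-style variant needs two.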


Conditional Neural Processes for Molecules
(Poster)
Neural processes (NPs) are models for transfer learning with properties reminiscent of Gaussian Processes (GPs). They are adept at modelling data consisting of few observations of many related functions on the same input space and are trained by minimizing a variational objective, which is computationally much less expensive than the Bayesian updating required by GPs. So far, most studies of NPs have focused on low-dimensional datasets which are not representative of realistic transfer learning tasks. Drug discovery is one application area that is characterized by datasets consisting of many chemical properties or functions which are sparsely observed, yet depend on shared features or representations of the molecular inputs. This paper applies the conditional neural process (CNP) to DOCKSTRING, a dataset of docking scores for benchmarking ML models. CNPs show competitive performance in few-shot learning tasks relative to supervised learning baselines common in QSAR modelling, as well as an alternative model for transfer learning based on pretraining and refining neural network regressors. We present a Bayesian optimization experiment which showcases the probabilistic nature of CNPs and discuss shortcomings of the model in uncertainty quantification.
Miguel Garcia-Ortegon · Andreas Bender · Sergio Bacallado


Meta-Learning General-Purpose Learning Algorithms with Transformers
(Poster)
Modern machine learning requires system designers to specify aspects of the learning pipeline, such as losses, architectures, and optimizers. Meta-learning, or learning-to-learn, instead aims to learn those aspects, and promises to unlock greater capabilities with less manual effort. One particularly ambitious goal of meta-learning is to train general purpose learning algorithms from scratch, using only black-box models with minimal inductive bias. A general purpose learning algorithm is one which takes in training data, and produces test-set predictions across a wide range of problems, without any explicit definition of an inference model, training loss, or optimization algorithm. In this paper we show that Transformers and other black-box models can be meta-trained to act as general purpose learning algorithms, and can generalize to learn on different datasets than used during meta-training. We characterize phase transitions between algorithms that generalize, algorithms that memorize, and algorithms that fail to meta-train at all, induced by changes in model size, number of tasks used during meta-training, and meta-optimization hyperparameters. We further show that the capabilities of meta-trained algorithms are bottlenecked by the accessible state size (memory) determining the next prediction, unlike standard models which are thought to be bottlenecked by parameter count.
Louis Kirsch · Luke Metz · James Harrison · Jascha Sohl-Dickstein


Betty: An Automatic Differentiation Library for Multilevel Optimization
(Poster)
Gradient-based multilevel optimization (MLO) has gained attention as a framework for studying numerous problems, ranging from hyperparameter optimization and meta-learning to neural architecture search and reinforcement learning. However, gradients in MLO, which are obtained by composing best-response Jacobians via the chain rule, are notoriously difficult to implement and memory/compute intensive. We take an initial step towards closing this gap by introducing Betty, a software library for large-scale MLO. At its core, we devise a novel dataflow graph for MLO, which allows us to (1) develop efficient automatic differentiation for MLO that reduces the computational complexity from $\mathcal{O}(d^3)$ to $\mathcal{O}(d^2)$, (2) incorporate systems support such as mixed-precision and data-parallel training for scalability, and (3) facilitate implementation of MLO programs of arbitrary complexity while allowing a modular interface for diverse algorithmic and systems design choices. We empirically demonstrate that Betty can be used to implement an array of MLO programs, while also observing up to 11% increase in test accuracy, 14% decrease in GPU memory usage, and 20% decrease in training wall time over existing implementations on multiple benchmarks. We also showcase that Betty enables scaling MLO to models with hundreds of millions of parameters.
Sang Keun Choe · Willie Neiswanger · Pengtao Xie · Eric Xing


FiT: Parameter Efficient Few-shot Transfer Learning
(Poster)
Model parameter efficiency is key for enabling few-shot learning, inexpensive model updates for personalization, and communication efficient federated learning. In this work, we develop FiLM Transfer (FiT) which combines ideas from transfer learning (fixed pretrained backbones and fine-tuned FiLM adapter layers) and meta-learning (automatically configured Naive Bayes classifiers and episodic training) to yield parameter efficient models with superior classification accuracy at low-shot. We experiment with FiT on a range of downstream datasets and show that it achieves better classification accuracy than the leading Big Transfer (BiT) algorithm at low-shot and achieves state-of-the-art accuracy on the challenging VTAB-1k benchmark, with fewer than 1% of the updateable parameters.
Aliaksandra Shysheya · John Bronskill · Massimiliano Patacchiola · Sebastian Nowozin · Richard Turner


Topological Continual Learning with Wasserstein Distance and Barycenter
(Poster)
Continual learning in neural networks suffers from a phenomenon called catastrophic forgetting, in which a network quickly forgets what was learned in a previous task. The human brain, however, is able to continually learn new tasks and accumulate knowledge throughout life. Neuroscience findings suggest that continual learning success in the human brain is potentially associated with its modular structure and memory consolidation mechanisms. In this paper we propose a novel topological regularization that penalizes cycle structure in a neural network during training using principled theory from persistent homology and optimal transport. The penalty encourages the network to learn modular structure during training. The penalization is based on the closed-form expressions of the Wasserstein distance and barycenter for the topological features of a 1-skeleton representation for the network. Our topological continual learning method combines the proposed regularization with a tiny episodic memory to mitigate forgetting. We demonstrate that our method is effective in both shallow and deep network architectures for multiple image classification datasets.
Tananun Songdechakraiwut · Xiaoshuang Yin · Barry Van Veen


Multiple Modes for Continual Learning
(Poster)
Adapting model parameters to incoming streams of data is crucial to deep learning scalability. Interestingly, prior continual learning strategies in online settings inadvertently anchor their updated parameters to a local parameter subspace to remember old tasks, or else drift away from the subspace and forget. From this observation, we formulate a trade-off between constructing multiple parameter modes and allocating tasks per mode. Mode-Optimized Task Allocation (MOTA), our contributed adaptation strategy, trains multiple modes in parallel, then optimizes task allocation per mode. We empirically demonstrate improvements over baseline continual learning strategies and across varying distribution shifts, namely sub-population, domain, and task shift.
Siddhartha Datta · Nigel Shadbolt


Interpolating Compressed Parameter Subspaces
(Poster)
Though distribution shifts have caused growing concern for machine learning scalability, solutions tend to specialize towards a specific type of distribution shift. We find that constructing a Compressed Parameter Subspace (CPS), a geometric structure representing distance-regularized parameters mapped to a set of train-time distributions, can maximize average accuracy over a broad range of distribution shifts concurrently. We show that sampling parameters within a CPS can mitigate backdoor, adversarial, permutation, stylization and rotation perturbations. Regularizing a hypernetwork with CPS can also reduce task forgetting.
Siddhartha Datta · Nigel Shadbolt


HARRIS: Hybrid Ranking and Regression Forests for Algorithm Selection
(Poster)
It is well known that different algorithms perform differently well on an instance of an algorithmic problem, motivating algorithm selection (AS): Given an instance of an algorithmic problem, which is the most suitable algorithm to solve it? As such, the AS problem has received considerable attention, resulting in various approaches, many of which either solve a regression or ranking problem under the hood. Although both of these formulations yield very natural ways to tackle AS, they have considerable weaknesses. On the one hand, correctly predicting the performance of an algorithm on an instance is a sufficient, but not a necessary condition to produce a correct ranking over algorithms and in particular ranking the best algorithm first. On the other hand, classical ranking approaches often do not account for concrete performance values available in the training data, but only leverage rankings composed from such data. We propose HARRIS (Hybrid rAnking and RegRessIon foreSts), a new algorithm selector leveraging special forests, combining the strengths of both approaches while alleviating their weaknesses. HARRIS' decisions are based on a forest model, whose trees are created based on splits optimized on a hybrid ranking and regression loss function. As our preliminary experimental study on ASLib shows, HARRIS improves over standard algorithm selection approaches on some scenarios, showing that combining ranking and regression in trees is indeed promising for AS.
Lukas Fehring · Jonas Hanselle · Alexander Tornede
Author Information
Huaxiu Yao (Stanford University)
Eleni Triantafillou (Google Brain)
Fabio Ferreira (Universität Freiburg)
Joaquin Vanschoren (Eindhoven University of Technology)
Qi Lei (New York University)
More from the Same Authors

2020 : Learning Flexible Classifiers with Shot-CONditional Episodic (SCONE) Training »
Eleni Triantafillou 
2022 : Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time »
Caroline Choi · Huaxiu Yao · Yoonho Lee · Pang Wei Koh · Chelsea Finn 
2022 : Multi-Domain Long-Tailed Learning by Augmenting Disentangled Representations »
Huaxiu Yao · Xinyu Yang · Allan Zhou · Chelsea Finn 
2022 : Surgical Fine-Tuning Improves Adaptation to Distribution Shifts »
Yoonho Lee · Annie Chen · Fahim Tajwar · Ananya Kumar · Huaxiu Yao · Percy Liang · Chelsea Finn 
2022 : Relational Out-of-Distribution Generalization »
Xinyu Yang · Xinyi Pan · Shengchao Liu · Huaxiu Yao 
2022 : LOTUS: Learning to learn with Optimal Transport in Unsupervised Scenarios »
Prabhant Singh · Joaquin Vanschoren 
2022 : Recommendation for New Drugs with Limited Prescription Data »
Zhenbang Wu · Huaxiu Yao · Zhe Su · David Liebovitz · Lucas Glass · James Zou · Chelsea Finn · Jimeng Sun 
2023 Competition: NeurIPS 2023 Machine Unlearning Competition »
Eleni Triantafillou · Fabian Pedregosa · Meghdad Kurmanji · Kairan ZHAO · Gintare Karolina Dziugaite · Peter Triantafillou · Ioannis Mitliagkas · Vincent Dumoulin · Lisheng Sun · Peter Kairouz · Julio C Jacques Junior · Jun Wan · Sergio Escalera · Isabelle Guyon 
2022 : Q & A »
Eleni Triantafillou 
2022 Tutorial: The Role of Meta-learning for Few-shot Learning »
Eleni Triantafillou 
2022 : Tutorial »
Eleni Triantafillou 
2022 Poster: Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification »
Ihsan Ullah · Dustin Carrión-Ojeda · Sergio Escalera · Isabelle Guyon · Mike Huisman · Felix Mohr · Jan N. van Rijn · Haozhe Sun · Joaquin Vanschoren · Phan Anh Vu 
2022 Poster: GRASP: Navigating Retrosynthetic Planning with Goal-driven Policy »
Yemin Yu · Ying Wei · Kun Kuang · Zhengxing Huang · Huaxiu Yao · Fei Wu 
2022 Poster: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time »
Huaxiu Yao · Caroline Choi · Bochuan Cao · Yoonho Lee · Pang Wei Koh · Chelsea Finn 
2022 Poster: C-Mixup: Improving Generalization in Regression »
Huaxiu Yao · Yiping Wang · Linjun Zhang · James Zou · Chelsea Finn 
2021 : Eleni Triantafillou Q&A »
Eleni Triantafillou 
2021 : Eleni Triantafillou »
Eleni Triantafillou 
2021 Workshop: 5th Workshop on Meta-Learning »
Erin Grant · Fábio Ferreira · Frank Hutter · Jonathan Richard Schwarz · Joaquin Vanschoren · Huaxiu Yao 
2021 Poster: Functionally Regionalized Knowledge Transfer for Low-resource Drug Discovery »
Huaxiu Yao · Ying Wei · LongKai Huang · Ding Xue · Junzhou Huang · Zhenhui (Jessie) Li 
2021 Panel: The Role of Benchmarks in the Scientific Progress of Machine Learning »
Lora Aroyo · Samuel Bowman · Isabelle Guyon · Joaquin Vanschoren 
2021 : MetaDL: Few Shot Learning Competition with Novel Datasets from Practical Domains + Q&A »
Adrian El Baz · Isabelle Guyon · Zhengying Liu · Jan N. Van Rijn · Haozhe Sun · Sébastien Treguer · WeiWei Tu · Ihsan Ullah · Joaquin Vanschoren · Phan Ahn Vu 
2021 Poster: Meta-learning with an Adaptive Task Scheduler »
Huaxiu Yao · Yu Wang · Ying Wei · Peilin Zhao · Mehrdad Mahdavi · Defu Lian · Chelsea Finn 
2020 Poster: Fast Convergence of Langevin Dynamics on Manifold: Geodesics meet Log-Sobolev »
Xiao Wang · Qi Lei · Ioannis Panageas 
2018 : Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples »
Eleni Triantafillou 
2017 Poster: Few-Shot Learning Through an Information Retrieval Lens »
Eleni Triantafillou · Richard Zemel · Raquel Urtasun