During the past 15 years or so, I have worked on a series of seemingly distinct but ultimately related problems, including machine learning algorithms, generative modeling with neural networks, machine translation, language modeling, medical imaging, a bit of healthcare, protein modeling and a bit of drug discovery. I chose some of these problems intentionally; others I came to by pure serendipity. It was only in hindsight that these seemingly different problems turned out to be closely related from technical, social and personal perspectives. In this talk, I plan to offer a retrospective on my own choices, be they intentional or not, and to share my thoughts on what our discipline, which is sometimes called computer science, data science, machine learning or artificial intelligence, is.
The brain processes information through many layers of neurons. This deep architecture is representationally powerful, but it complicates learning by making it hard to identify the responsible neurons when a mistake is made. In machine learning, the backpropagation algorithm assigns blame to a neuron by computing exactly how it contributed to an error. To do this, it multiplies error signals by matrices consisting of all the synaptic weights on the neuron's axon and farther downstream. This operation requires a precisely choreographed transport of synaptic weight information, which is thought to be impossible in the brain. Here we present a surprisingly simple algorithm for deep learning, which assigns blame by multiplying error signals by random synaptic weights. We show that a network can learn to extract useful information from signals sent through these random feedback connections. In essence, the network learns to learn. We demonstrate that this new mechanism performs as quickly and accurately as backpropagation on a variety of problems and describe the principles which underlie its function. Our demonstration provides a plausible basis for how a neuron can be adapted using error signals generated at distal locations in the brain, and thus dispels long-held assumptions about the algorithmic constraints on learning in neural circuits.
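Since this abstract hinges on the contrast between backpropagation's weight transport and learning through fixed random feedback connections, a small numerical sketch may help make the mechanism concrete. This is not the authors' code: the two-layer architecture, tanh nonlinearity, layer sizes, learning rate, and toy regression data below are illustrative assumptions; the only essential point is that the error is sent back through a fixed random matrix B rather than through the transpose of the forward weights.

```python
# Minimal sketch of learning with random feedback weights ("feedback alignment"),
# contrasted with backpropagation, on a toy regression problem. All sizes and
# hyperparameters are illustrative assumptions, not taken from the talk.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 30, 20, 10

# A random linear "teacher" generates the targets the student network must fit.
T = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
X = rng.standard_normal((n_in, 512))
Y = T @ X

# Student weights, plus the fixed random feedback matrix B used in place of W2.T.
W1 = rng.standard_normal((n_hid, n_in)) * 0.1
W2 = rng.standard_normal((n_out, n_hid)) * 0.1
B = rng.standard_normal((n_hid, n_out)) * 0.1   # random, never updated
lr = 0.02

for step in range(2000):
    A1 = W1 @ X                   # hidden pre-activations
    H = np.tanh(A1)               # hidden activations
    Y_hat = W2 @ H                # linear output layer
    E = Y_hat - Y                 # error signal at the output

    # Backpropagation would compute W2.T @ E here; feedback alignment instead
    # routes the error back through the fixed random matrix B.
    delta_H = (B @ E) * (1.0 - H**2)

    W2 -= lr * (E @ H.T) / X.shape[1]
    W1 -= lr * (delta_H @ X.T) / X.shape[1]

print("final mean squared error:", np.mean(E**2))
```

In this sketch, replacing `B @ E` with `W2.T @ E` recovers ordinary backpropagation; running both variants and comparing the error curves is one simple way to see the "network learns to learn" behavior the abstract describes, with the forward weights drifting into rough alignment with the random feedback pathway.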
Deep neural networks have revolutionized artificial intelligence, yet their inner workings remain poorly understood. This talk presents mathematical analyses of the nonlinear dynamics of learning in several solvable deep network models, offering theoretical insights into the role of depth. These models reveal how learning algorithms, data structure, initialization schemes, and architectural choices interact to produce hidden representations that afford complex generalization behaviors. A recurring theme across these analyses is a neural race: competing pathways within a deep network vie to explain the data, with an implicit bias toward shared representations. These shared representations in turn shape the network’s capacity for systematic generalization, multitasking, and transfer learning. I will show how such principles manifest across diverse architectures—including feedforward, recurrent, and linear attention networks. Together, these results provide analytic foundations for understanding how environmental statistics, network architecture, and learning dynamics jointly structure the emergence of neural representations and behavior.
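To make the "neural race" picture concrete, here is a hedged numerical sketch, not code from the talk, of the kind of solvable model it refers to: a two-layer deep linear network trained by gradient descent on a target map with a few well-separated singular modes. The layer sizes, learning rate, small random initialization, and whitened-input loss are all illustrative assumptions; the qualitative point is that stronger modes are learned first, in distinct stages.

```python
# Sketch of learning dynamics in a two-layer deep *linear* network: under a
# small random initialization, the singular modes of the target map are learned
# sequentially, with the strongest mode winning the "race". All settings here
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 8, 8, 8

# Target linear map with three well-separated singular values.
U, _ = np.linalg.qr(rng.standard_normal((n_out, n_out)))
V, _ = np.linalg.qr(rng.standard_normal((n_in, n_in)))
s_true = np.array([4.0, 2.0, 1.0] + [0.0] * (n_in - 3))
Sigma = U @ np.diag(s_true) @ V.T            # input-output correlation matrix

W1 = rng.standard_normal((n_hid, n_in)) * 1e-3   # small init => stage-like learning
W2 = rng.standard_normal((n_out, n_hid)) * 1e-3
lr = 0.01

for step in range(2001):
    # Gradient descent on the whitened-input loss 0.5 * ||Sigma - W2 W1||_F^2.
    E = Sigma - W2 @ W1
    W1 += lr * (W2.T @ E)
    W2 += lr * (E @ W1.T)
    if step % 250 == 0:
        sv = np.linalg.svd(W2 @ W1, compute_uv=False)[:3]
        print(f"step {step:4d}  top singular values of W2 W1: {np.round(sv, 3)}")
```

Printed singular values of the product W2 W1 rise toward the target values 4, 2, 1 one after another rather than all at once, which is the one-dimensional shadow of the competition among pathways and the resulting bias toward shared, low-rank representations described in the abstract.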