
Machine Learning for Intelligent Transportation Systems
Li Erran Li · Trevor Darrell

Thu Dec 08 11:00 PM -- 09:30 AM (PST) @ Room 124 + 125
Event URL: https://sites.google.com/site/nips2016intelligenttrans/home

Our transportation systems are poised for a transformation as we make progress on autonomous vehicles, vehicle-to-vehicle (V2V) and vehicle-to-everything (V2X) communication infrastructures, and smart road infrastructures such as smart traffic lights. There are many challenges in transforming our current transportation systems into this future vision. For example, how do we achieve near-zero fatalities? How do we optimize efficiency through intelligent traffic management and control of fleets? How do we optimize for traffic capacity during rush hours? To meet these requirements in safety, efficiency, control, and capacity, the systems must be automated with intelligent decision making.

Machine learning will be essential to enable intelligent transportation systems. Machine learning has made rapid progress in self-driving, e.g., real-time perception and prediction of traffic scenes, and has started to be applied to ride-sharing platforms such as Uber (e.g., demand forecasting) and crowd-sourced video scene analysis companies such as Nexar (understanding and avoiding accidents). To address the challenges arising in our future transportation systems, such as traffic management and safety, we need to consider the transportation system as a whole rather than solving problems in isolation. New machine learning solutions are needed, because transportation imposes specific requirements such as an extremely low tolerance for uncertainty and the need to intelligently coordinate self-driving cars through V2V and V2X.

The goal of this workshop is to bring together researchers and practitioners from all areas of intelligent transportation systems to address core challenges with machine learning. These challenges include, but are not limited to: predictive modeling of risk and accidents through telematics; modeling, simulation, and forecasting of demand and mobility patterns in large-scale urban transportation systems; machine learning approaches for control and coordination of traffic leveraging V2V and V2X infrastructures; efficient pedestrian detection; pedestrian intent detection; intelligent decision-making for self-driving cars; scene classification; real-time perception and prediction of traffic scenes; deep reinforcement learning from human drivers; uncertainty propagation in deep neural networks; and efficient inference with deep neural networks.

The workshop will include invited speakers, panels, presentations of accepted papers and posters. We invite papers in the form of short, long and position papers to address the core challenges mentioned above. We encourage researchers and practitioners on self-driving cars, transportation systems and ride-sharing platforms to participate.

Thu 11:30 p.m. - 11:45 p.m.
Opening Remarks (Talk)
Thu 11:45 p.m. - 12:15 a.m.

Abstract: Recent advances in deep reinforcement learning have enabled a wide range of capabilities, including learning to play Atari games, learning (simulated) locomotion, and learning (real robot) visuomotor skills. A key issue in the application to real robotics, however, is safety during learning. In this talk I will discuss approaches to make learning safer through incorporation of classical model-predictive control into learning and through safe adaptive transfer of skills from simulated to real environments.
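The safety-during-learning idea sketched above can be made concrete with a toy safety filter: a learned policy proposes an action, and a model-predictive check against a nominal model rejects it if the predicted rollout leaves a safe set. Everything below (the dynamics, the safe set, the candidate fallback actions) is an illustrative assumption, not code from the talk.

```python
def rollout_safe(x, u, dynamics, steps=5, x_max=1.0):
    """Simulate the nominal model forward with a constant action;
    return True iff every predicted state stays inside |x| <= x_max."""
    for _ in range(steps):
        x = dynamics(x, u)
        if abs(x) > x_max:
            return False
    return True

def safety_filter(x, u_learned, dynamics, candidates=(0.0, -0.5, 0.5)):
    """Pass the learned action through if its predicted rollout is safe;
    otherwise fall back to the first safe candidate action."""
    if rollout_safe(x, u_learned, dynamics):
        return u_learned
    for u in candidates:
        if rollout_safe(x, u, dynamics):
            return u
    return 0.0  # last-resort null action

# Toy stable linear dynamics: x' = 0.9*x + 0.2*u
dyn = lambda x, u: 0.9 * x + 0.2 * u
```

Near the safe-set boundary the filter overrides an aggressive learned action with a conservative one, which is the essence of wrapping classical model-predictive control around a learning agent.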

Bio: Pieter Abbeel (Associate Professor, UC Berkeley EECS) works in machine learning and robotics, in particular his research is on making robots learn from people (apprenticeship learning) and how to make robots learn through their own trial and error (reinforcement learning). His robots have learned: advanced helicopter aerobatics, knot-tying, basic assembly, and organizing laundry. He has won various awards, including best paper awards at ICML and ICRA, the Sloan Fellowship, the Air Force Office of Scientific Research Young Investigator Program (AFOSR-YIP) award, the Office of Naval Research Young Investigator Program (ONR-YIP) award, the DARPA Young Faculty Award (DARPA-YFA), the National Science Foundation Faculty Early Career Development Program Award (NSF-CAREER), the Presidential Early Career Award for Scientists and Engineers (PECASE), the CRA-E Undergraduate Research Faculty Mentoring Award, the MIT TR35, the IEEE Robotics and Automation Society (RAS) Early Career Award, and the Dick Volz Best U.S. Ph.D. Thesis in Robotics and Automation Award.

Pieter Abbeel
Fri 12:15 a.m. - 12:45 a.m.

Abstract: An important property of embedded learning systems is the ever-changing environment they create for all algorithms operating in the system. Optimizing the performance of those algorithms becomes a perpetual on-line activity rather than a one-off task. I will review some of these challenges in autonomous vehicles. I will discuss active optimization methods and their application in robotics and scientific applications, focusing on scaling up the dimensionality and managing multi-fidelity evaluations. I will finish with lessons learned and thoughts on future directions as these methods move into embedded systems.

Bio: Dr. Jeff Schneider is the engineering lead for machine learning at Uber's Advanced Technologies Center. He is currently on leave from Carnegie Mellon University, where he is a research professor in the School of Computer Science. He has 20 years of experience developing, publishing, and applying machine learning algorithms in government, science, and industry. He has over 100 publications and regularly gives talks and tutorials on the subject.

Previously, Jeff was the co-founder and CEO of Schenley Park Research, a company dedicated to bringing machine learning to industry. Later, he developed a machine learning based CNS drug discovery system and commercialized it during two years as Psychogenics' Chief Informatics Officer. Through his research, commercial, and consulting efforts, he has worked with dozens of companies and government agencies around the world.

Fri 12:45 a.m. - 1:30 a.m.
  1. Speeding up Semantic Segmentation for Autonomous Driving (Michael Treml, José Arjona-Medina, Thomas Unterthiner, Rupesh Durgesh, Felix Friedmann, Peter Schuberth, Andreas Mayr, Martin Heusel, Markus Hofmarcher, Michael Widrich, Bernhard Nessler, Sepp Hochreiter)

  2. Multi-Path Feedback Recurrent Neural Network for Scene Parsing (Xiaojie Jin, Yunpeng Chen, Zequn Jie, Jiashi Feng, Shuicheng Yan)

  3. Increasing the Stability of CNNs using a Denoising Layer Regularized by Local Lipschitz Constant (Hamed H. Aghdam, Elnaz J. Heravi, Domenec Puig)

Fri 1:30 a.m. - 2:00 a.m.
Posters and Break
Fri 2:00 a.m. - 2:30 a.m.

Abstract: Today, there are two major paradigms for vision-based autonomous driving systems: mediated perception approaches that parse an entire scene to make a driving decision, and behavior reflex approaches that directly map an input image to a driving action by a regressor. In this paper, we propose a third paradigm: a direct perception based approach to estimate the affordance for driving. We propose to map an input image to a small number of key perception indicators that directly relate to the affordance of a road/traffic state for driving. Our representation provides a set of compact yet complete descriptions of the scene to enable a simple controller to drive autonomously. Falling in between the two extremes of mediated perception and behavior reflex, we argue that our direct perception representation provides the right level of abstraction. We evaluate our approach in a virtual racing game as well as real world driving and show that our model can work well to drive a car in a very diverse set of virtual and realistic environments.
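The direct-perception pipeline described above ends in "a simple controller" driven by a handful of affordance indicators. A minimal sketch of that last stage, assuming two hypothetical indicators (heading angle relative to the road and lateral offset from the lane center) and hand-picked gains:

```python
def steer_from_affordances(angle_to_road, dist_to_center,
                           k_angle=1.0, k_center=0.5):
    """Proportional controller on two affordance indicators:
    heading angle relative to the road (radians) and lateral offset
    from the lane center (positive = right of center).
    Indicator names and gains are illustrative assumptions."""
    steering = -k_angle * angle_to_road - k_center * dist_to_center
    # Clamp to the physical steering range [-1, 1].
    return max(-1.0, min(1.0, steering))
```

The perception network (not shown) would regress these indicators from the image; the point of the paradigm is that the controller itself can stay this simple because the indicators are a compact yet complete scene description.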

Bio: Jianxiong Xiao (a.k.a., Professor X) is the Founder and CEO of AutoX, Inc., a high-tech startup currently in stealth mode. Previously, he was an Assistant Professor in the Department of Computer Science at Princeton University and the founding director of the Princeton Computer Vision and Robotics Labs from 2013 to 2016. He received his Ph.D. from the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT) in 2013. Before that, he received a BEng. and MPhil. in Computer Science from the Hong Kong University of Science and Technology in 2009. His research focuses on bridging the gap between computer vision and robotics by building extremely robust and dependable computer vision systems for robot perception. In particular, he is a pioneer in the fields of 3D Deep Learning, Autonomous Driving, RGB-D Recognition and Mapping, Big Data, Large-scale Crowdsourcing, and Deep Learning for Robotics. His work has received the Best Student Paper Award at the European Conference on Computer Vision (ECCV) in 2012 and the Google Research Best Papers Award for 2012, and has appeared in the popular press. Jianxiong was awarded the Google U.S./Canada Fellowship in Computer Vision in 2012, the MIT CSW Best Research Award in 2011, and two Google Faculty Awards in 2014 and in 2015 respectively. He co-led the MIT+Princeton joint team in the Amazon Picking Challenge in 2016, placing 3rd and 4th worldwide. More information can be found at: http://www.jianxiongxiao.com.

Fri 2:30 a.m. - 3:00 a.m.


Abstract: End-to-End Learning has been demonstrated for controlling steering on a drive-by-wire car. The key software component in this system is a Convolutional Neural Network (CNN) that takes as input the stream from a video camera mounted behind the vehicle's windshield and outputs steering commands to the vehicle. The CNN runs on an NVIDIA Drive PX board. The system has successfully driven on divided highways, narrow two-lane roads, and roads without lane markings. The CNN was trained using data gathered by capturing on-board video from vehicles driven by humans while simultaneously recording those vehicles' steering commands.
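The training signal described above is simply (camera frame, human steering command) pairs. As a minimal stand-in sketch of that setup, the following replaces the CNN with a linear least-squares regressor on synthetic "frames"; the data, dimensions, and model are all illustrative assumptions, not the described system.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 64))   # stand-in for flattened camera frames
true_w = rng.normal(size=64)
steering = frames @ true_w            # stand-in for recorded human steering

# Fit steering = frames @ w by least squares -- the linear analogue of
# training the CNN to minimise mean-squared steering error.
w, *_ = np.linalg.lstsq(frames, steering, rcond=None)

pred = frames @ w
mse = float(np.mean((pred - steering) ** 2))
```

The real system swaps the linear map for a deep CNN and the synthetic frames for on-board video, but the supervised-regression structure of the learning problem is the same.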


Bio: Larry Jackel is President of North-C Technologies, where he does professional consulting. From 2003 to 2007 he was a DARPA Program Manager in the IPTO and TTO offices. He conceived and managed programs in Universal Network-Based Document Storage and in Autonomous Ground Robot Navigation and Locomotion. For most of his scientific career, Jackel was a manager and researcher at Bell Labs and then AT&T Labs. He has created and managed research groups in Microscience and Microfabrication, in Machine Learning and Pattern Recognition, and in Carrier-Scale Telecom Services. Jackel holds a PhD in Experimental Physics from Cornell University, with a thesis on superconducting electronics. He is a Fellow of the American Physical Society and the IEEE.

Fri 3:00 a.m. - 3:30 a.m.

Abstract: The revolution of self-driving cars will happen in the near future. Most solutions rely on expensive 3D sensors such as LIDAR as well as hand-annotated maps. Unfortunately, this is neither cost effective nor scalable, as one needs to have a very detailed up-to-date map of the world. In this talk, I’ll review our current efforts in the domain of autonomous driving. In particular, I'll present our work on stereo, optical flow, appearance-less localization, 3D object detection as well as creating HD maps from visual information alone. This results in a much more scalable and cost-effective solution to self-driving cars.

Bio: Raquel Urtasun is an Associate Professor in the Department of Computer Science at the University of Toronto and a Canada Research Chair in Machine Learning and Computer Vision. Prior to this, she was an Assistant Professor at the Toyota Technological Institute at Chicago (TTIC), an academic computer science institute affiliated with the University of Chicago. She received her Ph.D. degree from the Computer Science department at École Polytechnique Fédérale de Lausanne (EPFL) in 2006 and did her postdoc at MIT and UC Berkeley. Her research interests include machine learning, computer vision and robotics. Her recent work involves perception algorithms for self-driving cars, deep structured models and exploring problems at the intersection of vision and language. She is a recipient of an NVIDIA Pioneers of AI Award, a Ministry of Education and Innovation Early Researcher Award, two Google Faculty Research Awards, a Connaught New Researcher Award and a Best Paper Runner-up Prize awarded at the Conference on Computer Vision and Pattern Recognition (CVPR). She is also Program Chair of CVPR 2018, an Editor of the International Journal of Computer Vision (IJCV) and has served as Area Chair of multiple machine learning and vision conferences (e.g., NIPS, UAI, ICML, ICLR, CVPR, ECCV, ICCV).

Raquel Urtasun
Fri 3:30 a.m. - 4:40 a.m.
Lunch (Break)
Fri 4:40 a.m. - 5:10 a.m.

Abstract: Future robots, intelligent vehicles, and smart spaces will require a deep and accurate understanding of the environment under diverse and challenging conditions, advanced reasoning capabilities and robust planning under uncertainty. In particular, understanding behaviors of people, such as drivers, road occupants or users interacting with smart machines will be critical for safety, contextualized assistance and natural interfaces. In this talk, I will give an overview of our recent work in computer vision for visual understanding of human actions and behaviors. Our approach extends recent advances in deep learning to enable learning with weak supervision, detecting human actions in videos at scale and anticipating events before they occur.

Bio: Juan Carlos Niebles received an Engineering degree in Electronics from Universidad del Norte (Colombia) in 2002, an M.Sc. degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2007, and a Ph.D. degree in Electrical Engineering from Princeton University in 2011. He has been a Senior Research Scientist at the Stanford AI Lab and Associate Director of Research at the Stanford-Toyota Center for AI Research since 2015, and an Assistant Professor of Electrical and Electronic Engineering at Universidad del Norte (Colombia) since 2011. His research interests are in computer vision and machine learning, with a focus on visual recognition and understanding of human actions and activities, objects, scenes, and events. He is a recipient of a Google Faculty Research award (2015), the Microsoft Research Faculty Fellowship (2012), a Google Research award (2011) and a Fulbright Fellowship (2005).

Juan Carlos Niebles
Fri 5:10 a.m. - 5:40 a.m.

Abstract: Cars tend to treat people like obstacles whose motion needs to be anticipated, so that the car can best stay out of their way. This results in ultra-defensive cars that cannot coordinate with people, because they miss on a key aspect of coordination: it's not just the car interpreting and responding to the actions of people, people also interpret and respond to the car's actions. We introduce a mathematical formulation of interaction that accounts for this, and show how learning and optimal control can be leveraged to generate car behavior that results in natural coordination strategies, like the car negotiating a merge or inching forward at an intersection to test whether it can go.
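The key move in the formulation above is that the car optimizes its action while predicting the human's best response to that action, rather than treating the human as a fixed obstacle. A toy nested-optimization version with made-up discrete actions and reward tables (purely illustrative, not from the talk):

```python
CAR_ACTIONS = ["inch_forward", "wait"]
HUMAN_ACTIONS = ["go", "yield"]

def human_reward(car_a, human_a):
    # Hypothetical payoffs: the human prefers to go, but yields
    # rather than conflict with a car that is inching forward.
    table = {("inch_forward", "go"): -1.0, ("inch_forward", "yield"): 0.5,
             ("wait", "go"): 1.0, ("wait", "yield"): 0.0}
    return table[(car_a, human_a)]

def car_reward(car_a, human_a):
    # Hypothetical payoffs: the car wants progress without a conflict.
    table = {("inch_forward", "go"): -2.0, ("inch_forward", "yield"): 1.0,
             ("wait", "go"): 0.0, ("wait", "yield"): -0.5}
    return table[(car_a, human_a)]

def best_car_action():
    """Nested optimisation: for each car action, predict the human's
    best response to it, then score the car action under that response."""
    def human_response(car_a):
        return max(HUMAN_ACTIONS, key=lambda h: human_reward(car_a, h))
    return max(CAR_ACTIONS, key=lambda c: car_reward(c, human_response(c)))
```

With these payoffs the car chooses to inch forward, anticipating that the human will yield: exactly the kind of coordination strategy (rather than ultra-defensive waiting) that the abstract describes.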

Bio: Anca Dragan is an Assistant Professor in the EECS Department at UC Berkeley. Her goal is to enable robots to work with, around, and in support of people. She runs the InterACT Lab, which focuses on algorithms for human-robot interaction -- algorithms that move beyond the robot's function in isolation and generate robot behavior that also accounts for interaction and coordination with end-users. She works across different applications, from assistive robots to manufacturing to autonomous cars, and draws from optimal control, planning, estimation, learning, and cognitive science. She also helped found and serves on the steering committee for the Berkeley AI Research (BAIR) Lab, and is a co-PI of the Center for Human-Compatible AI.

Anca Dragan
Fri 5:40 a.m. - 6:00 a.m.
  1. Similarity Mapping with Enhanced Siamese Network for Multi-Object Tracking (Minyoung Kim, Stefano Alletto, Luca Rigazio)

  2. End-to-End Deep Reinforcement Learning for Lane Keeping Assist (Ahmad El Sallab, Mohammed Abdou, Etienne Perot, Senthil Yogamani)

  3. Efficient Decomposition Method for the Stochastic Optimization of Public Transport Schedules (Sofia Zaourar-Michel)

  4. Mapping Occupancy of Dynamic Environments using Big Data Gaussian Process Classification (Ransalu Senanayake, Simon O’Callaghan, Fabio Ramos)

  5. Nonnegative Matrix Factorisation of Bike Sharing System Temporal Network (Ronan Hamon, Pierre Borgnat, Cédric Févotte, Patrick Flandrin)

  6. Safe and Optimal Path Planning in Uncertain Skies (Ashish Kapoor)

Fri 6:00 a.m. - 6:30 a.m.
Posters and Coffee (Posters and Break)
Fri 6:30 a.m. - 7:00 a.m.

Abstract: For about 80 years, people have been dreaming of cars that are able to drive by themselves. These days, this vision is starting to become reality. For the first time, cars found their way over a long distance in the DARPA Grand Challenge in 2005. Two years later, the famous DARPA Urban Challenge took place. In both events, all finalists based their systems on active sensors, and Google also started their impressive work with a high-end laser scanner accompanied by radars.

In 2013, we let a new S-Class vehicle (a.k.a. Bertha) drive itself from Mannheim to Pforzheim, following the route that Bertha Benz took 125 years ago. Bertha's environment perception was based on close-to-production radars and (stereo) cameras. For visual object recognition, classical box-based classifiers built on HOG features and SVMs, or shallow neural nets, were used. The experiment showed that, although the stereo system used already enables fully autonomous emergency braking in today's Mercedes-Benz production cars, the state of the art in computer vision around 2013 was not sufficient to deliver the deep scene understanding needed for cars to drive themselves safely in complex urban traffic. The advent of deep neural networks, and the fact that GPUs make it possible to run powerful nets such as GoogLeNet in real time, totally changed the situation. In our current vision system, about 80% of all tasks are solved by DNNs or use information delivered by them. The talk sketches the most important building blocks of this system.

Since we do not believe in a purely box-based recognition system, we use a Fully Convolutional Network as the core of our vision system. For training and benchmarking we have introduced the Cityscapes Dataset and benchmark suite, publicly available since early 2016. In September, we registered the 1000th download. Within only one year, pixel-level semantic segmentation performance rose from 65% IoU to more than 77% (October 2016). The results of the semantic labeling stage are subsequently fused with the stereo-based Stixel World, a super-pixel representation of the depth image using small rectangular regions. The result is a very compact representation of the traffic scene including geometry, motion, and semantics. In addition, safety demands watching out for unexpected small objects (down to a height of 5 cm) on the street. We fuse the results of a specially trained FCN with a boosted stereo analysis to detect more than 80% of all targets at distances up to 100 m, at a false-positive rate of only one per minute. If depth is not available from stereo or Lidar, it has to be derived from monocular images. We solve the depth-from-mono problem jointly with scene labeling and instance segmentation. It turns out that these sub-tasks support each other well, resulting in close-to-ground-truth results. All schemes run in real time on a standard GPU. Given that many suppliers have efficient hardware components for CNNs on their roadmap, this raises hope that we can use these powerful techniques in our cars in the near future, both for driver assistance and autonomous driving.
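As a rough illustration of the fusion idea above (compressing per-pixel semantics and depth into a compact column-wise scene representation), the following sketch keeps, for each image column, only the dominant semantic class and the median depth. The real Stixel World is considerably more sophisticated; this is the general idea, not the Daimler pipeline.

```python
import numpy as np

def columns_to_stixels(labels, depth):
    """labels: (H, W) array of integer class ids; depth: (H, W) in metres.
    Returns, per column, the dominant class id and the median depth --
    a toy 'stixel-like' compression of a dense labeled depth image."""
    H, W = labels.shape
    dom = np.empty(W, dtype=labels.dtype)
    for c in range(W):
        ids, counts = np.unique(labels[:, c], return_counts=True)
        dom[c] = ids[np.argmax(counts)]       # most frequent class in column
    med_depth = np.median(depth, axis=0)      # robust per-column depth
    return dom, med_depth
```

The payoff of such a representation is size: a W-element summary replaces an H x W label map, which is what makes downstream planning on the fused scene tractable.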

Bio: Uwe Franke received the Ph.D. degree in electrical engineering from the Technical University of Aachen, Germany, in 1988 for his work on content based image coding.

Since 1989 he has been with Daimler Research and Development and has been constantly working on the development of vision based driver assistance systems. He developed Daimler’s lane departure warning system introduced in 2000. Since 2000 he has been head of Daimler’s Image Understanding Group. The stereo technology developed by his group is the basis for the Mercedes Benz stereo camera system introduced in 2013. Recent work is on image understanding for autonomous driving, in particular Deep Neural Networks.

He was nominated for the “Deutscher Zukunftspreis”, Germany’s most prestigious award for Technology and Innovation given by the German President and awarded the Karl-Heinz Beckurts Prize 2012.

Fri 7:00 a.m. - 7:30 a.m.

Abstract: Convolutional neural networks have achieved impressive success in many tasks in computer vision such as image classification, object detection / recognition or semantic segmentation. While these networks have proven effective in all these applications, they come at a high memory and computational cost, thus not feasible for embedded platforms where power and computational resources are limited. In addition, the process to train the network reduces productivity as it not only requires large computer servers but also takes a significant amount of time (several weeks) with the additional cost of engineering the architecture. Recent works have shown there is significant redundancy in the parameters of deep architectures and therefore, could be replaced by more compact architectures. In this talk, I first introduce our efficient architecture based on filter-compositions and then, a novel approach to automatically determining the optimal number of neurons per layer in the architecture during the training process. As a result, we are able to deliver competitive accuracy and achieve up to 230fps in an embedded platform (Jetson TX-1). Moreover, these networks enable rapid prototyping as their entire training process only requires a few days.
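One common proxy for deciding how many neurons a layer actually needs is to drop output neurons whose weight norm is negligible; the talk's actual criterion may differ, so treat the sketch below as a generic illustration of layer-size pruning, not the speaker's method.

```python
import numpy as np

def prune_layer(W, threshold=1e-2):
    """W: (n_out, n_in) weight matrix of a fully connected layer.
    Keep only output neurons whose outgoing-weight row norm exceeds
    the threshold; returns the pruned matrix and the surviving count."""
    norms = np.linalg.norm(W, axis=1)
    keep = norms > threshold
    return W[keep], int(keep.sum())
```

Applied during or after training, this shrinks the layer to the neurons that carry signal, which is one route to the memory and compute savings the abstract targets for embedded platforms.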

Bio: Dr. Jose M. Alvarez is a computer vision researcher at Data61 at CSIRO (formerly NICTA) working on efficient methods for large-scale dynamic scene understanding and deep learning. Dr. Alvarez graduated with his Ph.D. from Autonomous University of Barcelona (UAB) in October 2010. During his Ph.D., his research was focused on developing robust road detection algorithms for everyday driving tasks under real-world conditions. Dr. Alvarez visited the ISLA group at the University of Amsterdam (in 2008 and 2009), and the Group Research Electronics at Volkswagen (in 2010). Dr. Alvarez was awarded the best Ph.D. Thesis award in 2010 from the Autonomous University of Barcelona. Subsequently, Dr. Alvarez worked as a postdoctoral researcher at the Courant Institute of Mathematical Science, New York University. In 2012, Dr. Alvarez moved to the computer vision group at NICTA, Australia. Since 2014, Dr. Alvarez serves as associate editor for IEEE Transactions on Intelligent Transportation Systems.

Fri 7:30 a.m. - 8:00 a.m.

Abstract: Domain adaptation is a branch of machine learning that transfers knowledge from offline training domains to new test domains. Traditional supervised learning suffers from poor generalization when the test data distribution differs from training. This problem arises in many practical applications, including perception for autonomous vehicles. For example, if the perception model is trained on a dataset collected in specific weather conditions and/or geographical locations, its performance is likely to drop significantly in novel test conditions and locations. This is true even for deep neural models that are trained on large scale datasets. I will discuss our recent work focusing on domain adaptation in unsupervised scenarios, where the target domain is assumed to have no annotated labels. Specifically, I will describe a generalized framework based on end-to-end unsupervised domain alignment using domain-adaptive losses, such as the adversarial, maximum mean discrepancy, and correlation alignment losses. This work is in collaboration with the vision group at UC Berkeley.
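Of the domain-adaptive losses named above, correlation alignment (CORAL) is the easiest to sketch: it penalizes the squared Frobenius distance between the source and target feature covariances, conventionally scaled by 1/(4d^2). A minimal NumPy version (the deep variant backpropagates this loss through the feature extractor):

```python
import numpy as np

def coral_loss(source, target):
    """source, target: (n, d) feature matrices from the two domains.
    Returns the correlation alignment loss: the squared Frobenius
    distance between the two feature covariances, scaled by 1/(4 d^2)."""
    d = source.shape[1]
    cs = np.cov(source, rowvar=False)   # (d, d) source covariance
    ct = np.cov(target, rowvar=False)   # (d, d) target covariance
    return float(np.sum((cs - ct) ** 2) / (4 * d * d))
```

The loss is zero exactly when the two domains' second-order feature statistics match, so minimizing it jointly with the task loss pulls unlabeled target features toward the labeled source distribution.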

Bio: Prof. Kate Saenko is an Assistant Professor at the Computer Science Department at Boston University, and the director of the Computer Vision and Learning Group and member of the IVC group. Previously, she was an Assistant Professor at the UMass Lowell CS department, Postdoctoral Researcher at the International Computer Science Institute, a Visiting Scholar at UC Berkeley EECS and a Visiting Postdoctoral Fellow in the School of Engineering and Applied Science at Harvard University. Her research interests are in developing machine learning for image and language understanding, multimodal perception for autonomous systems, and adaptive intelligent human-computer interfaces.

Kate Saenko
Fri 8:00 a.m. - 8:30 a.m.

Abstract: Robust perception models should be learned from training data with diverse visual appearances and realistic behaviors. Existing datasets are limited in geographic extent and can be biased toward a source domain. We will overview two recent projects that make use of a large-scale dashcam video dataset. First, we'll present a novel domain-adaptive dilation FCN, which adapts to and improves performance on unlabeled data. Our model leverages both adversarial domain adaptation losses and MIL-based bootstrapping. We show results adapting from synthetic to real domains, and from classic driving datasets to in-the-wild dashcam data. Second, we'll show a model for end-to-end learning of driving policies from dashcam videos. Current approaches to deep visuomotor policy learning have generally been limited to in-situ models learned from a single vehicle or a simulation environment. We advocate learning a generic vehicle motion model from large-scale crowd-sourced video data, and develop an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state. Our model incorporates a novel FCN-LSTM architecture, which can be learned from large-scale crowd-sourced vehicle action data, and leverages available scene segmentation side tasks to improve performance under a privileged learning paradigm. We provide a novel large-scale dataset of crowd-sourced driving behavior suitable for training our model, and report results predicting the driver action on held-out sequences across diverse conditions.

Bio: Prof. Darrell is on the faculty of the CS Division of the EECS Department at UC Berkeley and is also appointed at the UC-affiliated International Computer Science Institute (ICSI).
Darrell’s group develops algorithms for large-scale perceptual learning, including object and activity recognition and detection, for a variety of applications including multimodal interaction with robots and mobile devices. His interests include computer vision, machine learning, computer graphics, and perception-based human computer interfaces. Prof. Darrell was previously on the faculty of the MIT EECS department from 1999 to 2008, where he directed the Vision Interface Group. He was a member of the research staff at Interval Research Corporation from 1996 to 1999, and received the S.M. and Ph.D. degrees from MIT in 1992 and 1996, respectively. He obtained the B.S.E. degree from the University of Pennsylvania in 1988, having started his career in computer vision as an undergraduate researcher in Ruzena Bajcsy's GRASP lab.

Trevor Darrell
Fri 8:30 a.m. - 9:00 a.m.
Fri 9:00 a.m. - 9:10 a.m.
Closing Remarks (Talk)

Author Information

Li Erran Li (Pony.ai)

Li Erran Li is the head of machine learning at Scale and an adjunct professor at Columbia University. Previously, he was chief scientist at Pony.ai. Before that, he was with the perception team at Uber ATG and the machine learning platform team at Uber, where he worked on deep learning for autonomous driving, led the machine learning platform team technically, and drove strategy for company-wide artificial intelligence initiatives. He started his career at Bell Labs. Li’s current research interests are machine learning, computer vision, learning-based robotics, and their application to autonomous driving. He has a PhD from the computer science department at Cornell University. He is an ACM Fellow and an IEEE Fellow.

Trevor Darrell (UC Berkeley)
