6th Workshop on Automated Knowledge Base Construction (AKBC)
Jay Pujara · Dor Arad · Bhavana Dalvi Mishra · Tim Rocktäschel

Fri Dec 08 08:00 AM -- 06:30 PM (PST) @ 102 C
Event URL: http://www.akbc.ws/

Extracting knowledge from text, images, audio, and video and translating these extractions into a coherent, structured knowledge base (KB) is a task that spans the areas of machine learning, natural language processing, computer vision, databases, search, data mining and artificial intelligence. Over the past two decades, machine learning techniques used for information extraction, graph construction, and automated knowledge base construction have evolved from simple rule learning to end-to-end neural architectures with papers on the topic consistently appearing at NIPS. Hence, we believe this workshop will appeal to NIPS attendees and be a valuable contribution.

Furthermore, there has been significant interest and investment in knowledge base construction in both academia and industry in recent years. Most major internet companies and many startups have developed knowledge bases that power digital assistants (e.g. Siri, Alexa, Google Now) or provide the foundations for search and discovery applications. A similarly abundant set of knowledge systems have been developed at top universities such as Stanford (DeepDive), Carnegie Mellon (NELL), the University of Washington (OpenIE), the University of Mannheim (DBpedia), and the Max Planck Institut Informatik (YAGO, WebChild), among others. Our workshop serves as a forum for researchers working on knowledge base construction in both academia and industry.

With this year’s workshop we would like to continue the successful tradition of the previous five AKBC workshops. AKBC fills a unique need in the field, bringing together industry leaders and academic researchers. Our workshop is focused on stellar invited talks from high-profile speakers who identify the pressing research areas where current methods fall short and propose visionary approaches that will lead to the next generation of knowledge bases. Our workshop prioritizes a participatory environment where attendees help identify the most promising research, contribute to surveys on controversial questions, and suggest debate topics for speaker panels. In addition, for the first time, the workshop will address a longstanding issue in AKBC, that of equitable comparison and evaluation across methods, by including a shared evaluation platform, Stanford’s KBP Online (https://kbpo.stanford.edu/), which will allow crowdsourced labels for KBs without strong assumptions about the data or methods used. Together, this slate of high-profile research talks, outstanding contributed papers, an interactive research environment, and a novel evaluation service will ensure AKBC is a popular addition to the NIPS program.

Fri 9:00 a.m. - 9:30 a.m.

Knowledge graphs have been used to support a wide range of applications and enhance search results for multiple major search engines, such as Google and Bing. At Amazon we are building a Product Graph, an authoritative knowledge graph for all products in the world. The thousands of product verticals we need to model, the vast number of data sources we need to extract knowledge from, the huge volume of new products we need to handle every day, and the various applications in Search, Discovery, Personalization, and Voice that we wish to support all present big challenges in constructing such a graph.

In this talk we describe four scientific directions we are investigating in building and using such a graph, namely, harvesting product knowledge from the web, hands-off-the-wheel knowledge integration and cleaning, human-in-the-loop knowledge learning, and graph mining and graph-enhanced search. This talk will present our progress to achieve near-term goals in each direction, and show the many research opportunities towards our moon-shot goals.

Luna Dong
Fri 9:30 a.m. - 10:00 a.m.

Deep learning with large supervised training sets has had significant impact on many research challenges, from speech recognition to machine translation. However, applying these ideas to problems in computational semantics has been difficult, at least in part due to modest dataset sizes and relatively complex structured prediction tasks.

In this talk, I will present two recent results on end-to-end deep learning for classic challenge problems in computational semantics: semantic role labeling and coreference resolution. In both cases, we will introduce relatively simple deep neural network approaches that use no preprocessing (e.g. no POS tagger or syntactic parser) and achieve significant performance gains, including over 20% relative error reductions when compared to non-neural methods. I will also discuss our first steps towards scaling the amount of data such methods can be trained on by many orders of magnitude, including semi-supervised learning via contextual word embeddings and supervised learning through crowdsourcing. Our hope is that these advances, when combined, will enable very high quality semantic analysis in any domain from easily gathered supervision.

Luke Zettlemoyer
Fri 10:00 a.m. - 10:30 a.m.

Graph Convolutional Networks (GCNs) are an effective tool for modeling graph-structured data. We investigate their applicability in the context of both extracting semantic relations from text (specifically, semantic role labeling) and modeling relational data (link prediction). For semantic role labeling, we introduce a version of GCNs suited to modeling syntactic dependency graphs and use them as sentence encoders. Relying on these linguistically-informed encoders, we achieve the best reported scores on standard benchmarks for Chinese and English. For link prediction, we propose Relational GCNs (RGCNs), GCNs developed specifically to deal with highly multi-relational data, characteristic of realistic knowledge bases. By explicitly modeling neighbourhoods of entities, RGCNs accumulate evidence over multiple inference steps in relational graphs and yield competitive results on standard link prediction benchmarks.

Joint work with Diego Marcheggiani, Michael Schlichtkrull, Thomas Kipf, Max Welling, Rianna van den Berg and Peter Bloem.
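
As a rough illustration of the relational graph convolution described in this abstract, the sketch below applies one propagation step in plain Python: each node sums the messages from its incoming neighbours, transformed by a relation-specific weight matrix and averaged per relation, and adds a self-connection term. It is a minimal sketch and not the speakers' implementation. The function name rgcn_layer, the edge-list format, the ReLU activation, and the toy weights are all assumptions made for illustration, and refinements such as weight sharing across relations are left out.

```python
def rgcn_layer(features, edges, rel_weights, self_weight):
    """Apply one relational graph-convolution step (illustrative only).

    features:    dict mapping node -> feature vector (list of floats)
    edges:       list of (source, relation, target) triples
    rel_weights: dict mapping relation -> weight matrix (list of rows)
    self_weight: weight matrix for each node's self-connection
    """
    def matvec(matrix, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

    # Number of incoming edges per (target, relation), used to average messages.
    counts = {}
    for _, rel, dst in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1

    # Start from the self-connection term for every node.
    out = {node: matvec(self_weight, h) for node, h in features.items()}

    # Add the normalised, relation-specific message from each neighbour.
    for src, rel, dst in edges:
        message = matvec(rel_weights[rel], features[src])
        norm = counts[(dst, rel)]
        out[dst] = [o + m / norm for o, m in zip(out[dst], message)]

    # ReLU non-linearity (an assumed choice of activation).
    return {node: [max(0.0, x) for x in h] for node, h in out.items()}


# Toy graph with two relation types (hypothetical data).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = [("a", "r1", "c"), ("b", "r1", "c"), ("a", "r2", "b")]
identity = [[1.0, 0.0], [0.0, 1.0]]
rel_weights = {"r1": identity, "r2": [[2.0, 0.0], [0.0, 2.0]]}

updated = rgcn_layer(features, edges, rel_weights, identity)
```

Stacking several such steps lets information travel along multi-edge paths, which is one way to read the abstract's remark about accumulating evidence over multiple inference steps.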

Ivan Titov
Fri 10:30 a.m. - 11:30 a.m.
Poster Session - Session 1 (Poster Session)
Fri 11:30 a.m. - 12:00 p.m.

Representation learning has become an invaluable approach for making statistical inferences from relational data. However, while complex relational datasets often exhibit a latent hierarchical structure, state-of-the-art embedding methods typically do not account for this property. In this talk, I will introduce a novel approach to learning such hierarchical representations of symbolic data by embedding them into hyperbolic space -- or more precisely into an n-dimensional Poincaré ball. I will discuss how the underlying hyperbolic geometry allows us to learn parsimonious representations which simultaneously capture hierarchy and similarity. Furthermore, I will show that Poincaré embeddings can outperform Euclidean embeddings significantly on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability.
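
To make the geometry concrete, the sketch below computes the standard distance between two points of the Poincaré ball, d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))). It is a minimal sketch of the distance function only, not of the training procedure discussed in the talk. The function name poincare_distance and the sample points are assumptions made for illustration.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)


# Hypothetical sample points in a 2-dimensional ball.
origin = [0.0, 0.0]
halfway = [0.5, 0.0]
near_boundary_a = [0.9, 0.0]
near_boundary_b = [0.0, 0.9]

d_centre = poincare_distance(origin, halfway)
d_boundary = poincare_distance(near_boundary_a, near_boundary_b)
```

Distances grow rapidly towards the boundary: the two points at radius 0.9 are far further apart in the ball than their Euclidean separation suggests, while the distance from the origin to (0.5, 0) is exactly ln 3. This growth is what leaves room for tree-like structure, with general items near the centre and specific ones near the boundary.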

Maximilian Nickel
Fri 12:00 p.m. - 12:30 p.m.

We are getting better at teaching end-to-end neural models how to answer questions about content in natural language text. However, progress has been mostly restricted to extracting answers that are directly stated in text. In this talk, I will present our work towards teaching machines not only to read, but also to reason with what was read and to do this in an interpretable and controlled fashion. Our main hypothesis is that this can be achieved by the development of neural abstract machines that follow the blueprint of program interpreters for real-world programming languages. We test this idea using two languages: an imperative (Forth) and a declarative (Prolog/Datalog) one. In both cases we implement differentiable interpreters that can be used for learning reasoning patterns. Crucially, because they are based on interpretable host languages, the interpreters also allow users to easily inject prior knowledge and inspect the learnt patterns. Moreover, on tasks such as math word problems and relational reasoning our approach compares favourably to state-of-the-art methods.

Sebastian Riedel
Fri 2:00 p.m. - 2:30 p.m.

Existing pipelines for constructing KBs primarily support a restricted set of data types, such as focusing on the text of the documents when extracting information, ignoring the various modalities of evidence that we regularly encounter, such as images, semi-structured tables, video, and audio. Similarly, approaches that reason over incomplete and uncertain KBs are limited to basic entity-relation graphs, ignoring the diversity of data types that are useful for relational reasoning, such as text, images, and numerical attributes. In this work, we present a novel AKBC pipeline that takes the first steps in combining textual and relational evidence with other sources like numerical, image, and tabular data. We focus on two tasks: single entity attribute extraction from documents and relational knowledge graph completion. For each, we introduce new datasets that contain multimodal information, propose benchmark evaluations, and develop models that build upon advances in deep neural encoders for different data types.

Sameer Singh
Fri 2:30 p.m. - 2:45 p.m.
Go for a Walk and Arrive at the Answer: Reasoning Over Knowledge Bases with Reinforcement Learning (Contributed Talk)
Fri 2:45 p.m. - 3:00 p.m.
Multi-graph Affinity Embeddings for Multilingual Knowledge Graphs (Contributed Talk)
Fri 3:00 p.m. - 3:15 p.m.
A Study of Automatically Acquiring Explanatory Inference Patterns from Corpora of Explanations: Lessons from Elementary Science Exams (Contributed Talk)
Fri 3:15 p.m. - 3:45 p.m.

The Never Ending Language Learner (NELL) research project has produced a computer program that has been running continuously since January 2010, learning to build a large knowledge base by extracting structured beliefs (e.g., PersonFoundedCompany(Gates,Microsoft), BeverageServedWithBakedGood(tea,crumpets)) from unstructured text on the web. This talk will provide an update on new NELL research results, reflect on the lessons learned from this effort, and discuss specific challenges for future systems that attempt to build large knowledge bases automatically.

Tom Mitchell
Fri 3:45 p.m. - 4:45 p.m.
Poster Session - Session 2 (Poster Session)
Ambrish Rawat, Armand Joulin, Peter A Jansen, Jay Yoon Lee, Muhao Chen, Frank F. Xu, Pat Verga, Brendan Juba, Anca Dumitrache, Sharmistha Jat, Robert Logan, Dhanya Sridhar, Fan Yang, Rajarshi Das, Pouya Pezeshkpour, Nicholas Monath
Fri 4:45 p.m. - 5:45 p.m.
Discussion Panel and Debate (Discussion Panel)
Fri 5:45 p.m. - 6:00 p.m.
Closing Remarks

Author Information

Jay Pujara (University of Southern California)
Dor Arad (Stanford University)
Bhavana Dalvi Mishra (Allen Institute for Artificial Intelligence)
Tim Rocktäschel (University of Oxford)

Tim Rocktäschel is a Research Scientist at Facebook AI Research (FAIR) London and a Lecturer in the Department of Computer Science at University College London (UCL). At UCL, he is a member of the UCL Centre for Artificial Intelligence and the UCL Natural Language Processing group. Prior to that, he was a Postdoctoral Researcher in the Whiteson Research Lab, a Stipendiary Lecturer in Computer Science at Hertford College, and a Junior Research Fellow in Computer Science at Jesus College, at the University of Oxford. Tim obtained his Ph.D. in the Machine Reading group at University College London under the supervision of Sebastian Riedel. He received a Google Ph.D. Fellowship in Natural Language Processing in 2017 and a Microsoft Research Ph.D. Scholarship in 2013. In Summer 2015, he worked as a Research Intern at Google DeepMind. In 2012, he obtained his Diploma (equivalent to M.Sc) in Computer Science from the Humboldt-Universität zu Berlin. Between 2010 and 2012, he worked as Student Assistant and in 2013 as Research Assistant in the Knowledge Management in Bioinformatics group at Humboldt-Universität zu Berlin. Tim's research focuses on sample-efficient and interpretable machine learning models that learn from world, domain, and commonsense knowledge in symbolic and textual form. His work is at the intersection of deep learning, reinforcement learning, natural language processing, program synthesis, and formal logic.
