Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.
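The analyses described above operate on two kinds of objects inside BERT: the per-token context embeddings at each layer and the per-head attention matrices. The paper does not tie its measurements to any particular toolkit; the minimal sketch below, which assumes the open-source HuggingFace transformers package and the bert-base-uncased checkpoint (neither is specified by the abstract), shows one way to extract both so that similar geometric probes could be run.

# Minimal sketch (assumed tooling: HuggingFace `transformers`, `bert-base-uncased`):
# extract per-layer hidden states and per-head attention matrices for one sentence.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained(
    "bert-base-uncased",
    output_hidden_states=True,   # keep the embeddings produced by every layer
    output_attentions=True,      # keep the attention weights of every head
)
model.eval()

sentence = "The trophy would not fit in the suitcase because it was too big."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states: tuple of 13 tensors (input embeddings + 12 transformer layers),
# each of shape [batch, seq_len, 768] -- the word representations whose geometry is studied.
hidden_states = outputs.hidden_states

# attentions: tuple of 12 tensors, each of shape [batch, num_heads, seq_len, seq_len] --
# the attention matrices in which syntactic relations can be probed.
attentions = outputs.attentions

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
print(len(tokens), hidden_states[-1].shape, attentions[-1].shape)

From these tensors one could, for example, compare distances between context embeddings of an ambiguous word across sentences (word-sense geometry) or inspect which heads assign high weight to syntactically related token pairs; the specific measurements are those described in the paper, not in this sketch.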
Author Information
Emily Reif (Google)
Ann Yuan (Google)
Martin Wattenberg (Google)
Fernanda Viégas and Martin Wattenberg co-lead Google’s PAIR (People+AI Research) initiative, part of Google Brain. Their work in machine learning focuses on transparency and interpretability, as part of a broad agenda to improve human/AI interaction. They are well known for their contributions to social and collaborative visualization, and the systems they’ve created are used daily by millions of people. Their visualization-based artwork has been exhibited worldwide and is part of the permanent collection of the Museum of Modern Art in New York.
Fernanda Viegas (Google)
Andy Coenen (Google)
Adam Pearce (Google)
Been Kim (Google)
More from the Same Authors
- 2021 : SynthBio: A Case Study in Faster Curation of Text Datasets
  Ann Yuan · Daphne Ippolito · Vitaly Nikolaev · Chris Callison-Burch · Andy Coenen · Sebastian Gehrmann
- 2021 : Interpretability of Machine Learning in Computer Systems: Analyzing a Caching Model
  Leon Sixt · Evan Liu · Marie Pellat · James Wexler · Milad Hashemi · Been Kim · Martin Maas
- 2020 Poster: Debugging Tests for Model Explanations
  Julius Adebayo · Michael Muelly · Ilaria Liccardi · Been Kim
- 2020 : Q&A for invited speaker, Fernanda Viégas
  Fernanda Viegas
- 2020 : Communicating imperfection
  Fernanda Viegas
- 2020 Poster: Evaluating Attribution for Graph Neural Networks
  Benjamin Sanchez-Lengeling · Jennifer Wei · Brian Lee · Emily Reif · Peter Wang · Wesley Qian · Kevin McCloskey · Lucy Colwell · Alexander Wiltschko
- 2020 Poster: On Completeness-aware Concept-Based Explanations in Deep Neural Networks
  Chih-Kuan Yeh · Been Kim · Sercan Arik · Chun-Liang Li · Tomas Pfister · Pradeep Ravikumar
- 2019 Poster: Towards Automatic Concept-based Explanations
  Amirata Ghorbani · James Wexler · James Zou · Been Kim
- 2019 Poster: A Benchmark for Interpretability Methods in Deep Neural Networks
  Sara Hooker · Dumitru Erhan · Pieter-Jan Kindermans · Been Kim
- 2018 : Interpretability for when NOT to use machine learning by Been Kim
  Been Kim
- 2018 Poster: Human-in-the-Loop Interpretability Prior
  Isaac Lage · Andrew Ross · Samuel J Gershman · Been Kim · Finale Doshi-Velez
- 2018 Spotlight: Human-in-the-Loop Interpretability Prior
  Isaac Lage · Andrew Ross · Samuel J Gershman · Been Kim · Finale Doshi-Velez
- 2018 Poster: Sanity Checks for Saliency Maps
  Julius Adebayo · Justin Gilmer · Michael Muelly · Ian Goodfellow · Moritz Hardt · Been Kim
- 2018 Spotlight: Sanity Checks for Saliency Maps
  Julius Adebayo · Justin Gilmer · Michael Muelly · Ian Goodfellow · Moritz Hardt · Been Kim
- 2018 Poster: To Trust Or Not To Trust A Classifier
  Heinrich Jiang · Been Kim · Melody Guan · Maya Gupta
- 2018 Tutorial: Visualization for Machine Learning
  Fernanda Viégas · Martin Wattenberg