Unsupervised translation generally refers to the challenging task of translating between two languages without parallel translations, i.e., from two separate monolingual corpora. In this work, we propose an information-theoretic framework of unsupervised translation that is well suited even to the case where the source language is that of highly intelligent animals, such as whales, and the target language is a human language, such as English. We identify two conditions that together allow for unsupervised translation: (1) there is access to a prior distribution over the target language that estimates the likelihood that a sentence was translated from the source language; and (2) most alterations of translations are deemed implausible (i.e., unlikely) by the prior. We then give an (inefficient) algorithm which, given access to the prior and enough unlabeled source examples as input, outputs a provably accurate translation function. Surprisingly, our analysis suggests that the amount of source data required (information-theoretically) for unsupervised translation is not significantly greater than that of supervised translation, i.e., the standard case where one has parallel translated data for training. To support the viability of our theory, we propose a simplified probabilistic model of language: the random sub-tree language model, in which sentences correspond to paths in a randomly labeled tree. We prove that random sub-tree languages satisfy conditions (1) and (2) with high probability, and are therefore translatable by our algorithm.
Our theory is motivated by a recent initiative to translate whale communication using modern machine translation techniques. The recordings of whale communications that are being collected have no parallel human-language data. We are further motivated by recent empirical work, reported in the machine learning literature, demonstrating that unsupervised translation is possible in certain settings.
Author Information
Shafi Goldwasser (University of California, Berkeley)
David Gruber (Project CETI)
Adam Kalai (Microsoft Research New England)
Orr Paradise (University of California, Berkeley)
More from the Same Authors
-
2021 : Programming Puzzles »
Tal Schuster · Ashwin Kalyan · Alex Polozov · Adam Kalai -
2021 Spotlight: Towards optimally abstaining from prediction with OOD test examples »
Adam Kalai · Varun Kanade -
2022 : A Theory of Unsupervised Translation for Understanding Animal Communication »
Shafi Goldwasser · David Gruber · Adam Tauman Kalai · Orr Paradise -
2022 : Language Models Can Teach Themselves to Program Better »
Patrick Haluptzok · Matthew Bowers · Adam Kalai -
2022 : Adversarial poisoning attacks on reinforcement learning-driven energy pricing »
Sam Gunn · Doseok Jang · Orr Paradise · Lucas Spangher · Costas J Spanos -
2023 Poster: A Theory of Unsupervised Translation Motivated by Understanding Animal Communication »
Shafi Goldwasser · David Gruber · Adam Tauman Kalai · Orr Paradise -
2022 Spotlight: Are GANs overkill for NLP? »
David Alvarez-Melis · Vikas Garg · Adam Kalai -
2022 Poster: Are GANs overkill for NLP? »
David Alvarez-Melis · Vikas Garg · Adam Kalai -
2022 Poster: Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms »
Surbhi Goel · Sham Kakade · Adam Kalai · Cyril Zhang -
2022 Poster: Uni[MASK]: Unified Inference in Sequential Decision Problems »
Micah Carroll · Orr Paradise · Jessy Lin · Raluca Georgescu · Mingfei Sun · David Bignell · Stephanie Milani · Katja Hofmann · Matthew Hausknecht · Anca Dragan · Sam Devlin -
2021 Poster: Towards optimally abstaining from prediction with OOD test examples »
Adam Kalai · Varun Kanade -
2018 Poster: Supervising Unsupervised Learning »
Vikas Garg · Adam Kalai -
2018 Spotlight: Supervising Unsupervised Learning »
Vikas Garg · Adam Kalai -
2011 Poster: Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression »
Sham M Kakade · Adam Kalai · Varun Kanade · Ohad Shamir -
2009 Poster: Potential-Based Agnostic Boosting »
Adam Kalai · Varun Kanade