Ndapa Nakashole: Generalizing Representations of Language for Document Analysis across Different Domains
Workshop: Document Intelligence
Abstract
Labeled data for tasks such as information extraction, question answering, text classification, and other types of document analysis are often drawn from a limited set of document types and genres because of availability and cost. At test time, we would like to apply the trained models to different document types and genres. However, a model trained on one dataset often fails to generalize to data drawn from distributions other than that of the training data. In this talk, I will present our work on generalizing representations of language and discuss some of the document types we are studying.

Biography

Ndapa Nakashole is an Assistant Professor at the University of California, San Diego, where she teaches and carries out research on Statistical Natural Language Processing. Before that, she was a postdoctoral scholar at Carnegie Mellon University. She obtained her PhD from Saarland University and the Max Planck Institute for Informatics. She completed her undergraduate studies in Computer Science at the University of Cape Town, South Africa.