Workshop
Document Intelligence
Nigel Duffy · Rama Akkiraju · Tania Bedrax Weiss · Paul Bennett · Hamid Reza Motahari-Nezhad

Sat Dec 14 08:00 AM -- 06:00 PM (PST) @ West 208 + 209
Event URL: https://sites.google.com/view/di2019

Business documents are central to the operation of business. Such documents include sales agreements, vendor contracts, mortgage terms, loan applications, purchase orders, invoices, financial statements, employment agreements, and many more. The information in these documents is presented in natural language and can be organized in a variety of ways, from plain text and multi-column formats to a wide variety of tables. Understanding these documents is challenging due to inconsistent formats, poor-quality scans and OCR, internal cross-references, and complex document structure. Furthermore, these documents often reflect complex legal agreements and reference, explicitly or implicitly, regulations, legislation, case law, and standard business practices.
The ability to read, understand and interpret business documents, collectively referred to here as “Document Intelligence”, is a critical and challenging application of artificial intelligence (AI) in business. While a variety of research has advanced the fundamentals of document understanding, most of it has focused on documents found on the web, which fail to capture the complexity of analysis and the types of understanding needed across business documents. Realizing the vision of document intelligence remains a research challenge that requires a multi-disciplinary perspective spanning not only natural language processing and understanding, but also computer vision, knowledge representation and reasoning, information retrieval, and more -- all of which have been profoundly impacted and advanced by neural network-based approaches and deep learning in the last few years.
This workshop brings together AI researchers, academics, and industry practitioners to discuss the opportunities and challenges of document intelligence.

Sat 8:00 a.m. - 8:10 a.m. [iCal]
Opening Remarks (Discussion)
Sat 8:10 a.m. - 9:05 a.m. [iCal]

Abstract: In December 2006, a change to the US Federal Rules of Civil Procedure made “electronically stored information” – effectively every bit of storage in an enterprise – fair game for discovery requests in civil litigation. The result was a multi-billion-dollar electronic discovery industry, a remarkable embrace by lawyers and judges of the artifacts of experimental machine learning (learning curves, effectiveness estimates, active learning, ...), and a torrent of technical challenges for machine learning, natural language processing, information retrieval, and statistics. I will discuss the state of e-discovery science and technology, and its spread to new applications such as internal investigation and breach response.

Biography: David D. Lewis, Ph.D., is Chief Data Scientist at Brainspace, a Cyxtera business, where he leads their research efforts as well as the machine learning software development team. Prior to joining Brainspace, he was variously a freelance consultant, corporate researcher (Bell Labs, AT&T Labs), research professor, and software company co-founder. Dave has published more than 40 peer-reviewed scientific publications and 9 patents. He was elected a Fellow of the American Association for the Advancement of Science in 2006 for foundational work in text categorization, and won a Test of Time Award from ACM SIGIR in 2017 for his paper with Gale introducing uncertainty sampling.

Dave Lewis
Sat 9:05 a.m. - 10:00 a.m. [iCal]

Abstract: Labeled data for tasks such as information extraction, question answering, text classification, and other types of document analysis is often drawn from a limited set of document types and genres because of availability and cost. At test time, we would like to apply the trained models to different document types and genres. However, a model trained on one dataset often fails to generalize to data drawn from distributions other than that of the training data. In this talk, I will present our work on generalizing representations of language and discuss some of the document types we are studying.

Biography: Ndapa Nakashole is an Assistant Professor at the University of California, San Diego, where she teaches and carries out research on statistical natural language processing. Before that she was a postdoctoral scholar at Carnegie Mellon University. She obtained her PhD from Saarland University and the Max Planck Institute for Informatics. She completed undergraduate studies in Computer Science at the University of Cape Town, South Africa.

Ndapa Nakashole
Sat 10:00 a.m. - 10:30 a.m. [iCal]
Coffee Break (Break)
Sat 10:30 a.m. - 12:05 p.m. [iCal]

Papers presented are as follows:

“Repurposing Decoder-Transformer Language Models for Abstractive Summarization”, by Luke de Oliveira and Alfredo Láinez Rodrigo.
“From Stroke to Finite Automata: An Offline Recognition Approach”, by Kehinde Aruleba.
“Post-OCR parsing: building simple and robust parser via BIO tagging”, by Wonseok Hwang, Seonghyeon Kim, Minjoon Seo, Jinyeong Yim, Seunghyun Park, Sungrae Park, Junyeop Lee, Bado Lee, Hwalsuk Lee.
“CrossLang: the system of cross-lingual plagiarism detection”, by Oleg Bakhteev, Alexandr Ogaltsov, Andrey Khazov, Kamil Safin, Rita Kuznetsova.
“BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding”, by Timo I. Denk, Christian Reisswig.
“Chargrid-OCR: End-to-end trainable Optical Character Recognition through Semantic Segmentation and Object Detection”, by Christian Reisswig, Anoop R Katti, Marco Spinaci, Johannes Höhne.
“SVDocNet: Spatially Variant U-Net for Blind Document Deblurring”, by Bharat Mamidibathula, Prabir Kumar Biswas.
“Semantic Structure Extraction for Spreadsheet Tables with a Multi-task Learning Architecture”, by Haoyu Dong, Shijie Liu, Zhouyu Fu, Shi Han, Dongmei Zhang.
“Document Enhancement System Using Auto-encoders”, by Mehrdad J. Gangeh, Sunil R. Tiyyagura, Sridhar V. Dasaratha, Hamid Motahari, Nigel P. Duffy.
“CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”, by Seunghyun Park, Seung Shin, Bado Lee, Junyeop Lee, Jaeheung Surh, Minjoon Seo, Hwalsuk Lee.
“On recognition of Cyrillic Text”, by Kostiantyn Liepieshov, Oles Dobosevych.
“Representation Learning in Geology and GilBERT”, by Zikri Bayraktar, Hedi Driss, Marie Lefranc.
“Neural Contract Element Extraction Revisited”, by Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos.
“Doc2Dial: a Framework for Dialogue Composition Grounded in Business Documents”, by Song Feng, Kshitij Fadnis, Q. Vera Liao, Luis A. Lastras.
“On Domain Transfer When Predicting Intent in Text”, by Petar Stojanov, Ahmed Hassan Awadallah, Paul Bennett, Saghar Hosseini.
“BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding”, by Emad Elwany, Dave Moore, Gaurav Oberoi.
“Information Extraction from Text Regions with Complex Structure”, by Kaixuan Zhang, Zejiang Shen, Jie Zhou, Melissa Dell.
“DeepErase: Unsupervised Ink Artifact Removal in Document Text Images”, by Yike Qi, W. Ronny Huang, Qianqian Li, Jonathan L. DeGange.
“Towards Neural Similarity Evaluator”, by Hassan Kané, Yusuf Kocyigit, Pelkins Ajanoh, Ali Abdalla, Mohamed Coulibali.

Sat 12:05 p.m. - 1:30 p.m. [iCal]

Poster Session (papers as listed in the preceding session).

Timo I. Denk, Ion Androutsopoulos, Oleg Bakhteev, Hassan Kane, Petar Stojanov, Seunghyun Park, Bharat Mamidibathula, Kostiantyn Liepieshov, Johannes Höhne, Song Feng, Zikri Bayraktar, Kehinde Aruleba, Aleksandr Ogaltsov, Rita Kuznetsova, Paul Bennett, Kshitij Fadnis, Luis Lastras, Mehrdad Jabbarzadeh Gangeh, Christian Reisswig, Emad Elwany, Ilias Chalkidis, Jonathan DeGange, Kaixuan Zhang, Luke de Oliveira, Muhammed Koçyiğit, Haoyu Dong, Vera Liao, Wonseok Hwang
Sat 1:30 p.m. - 2:30 p.m. [iCal]

Abstract: Enterprise applications and business processes rely heavily on experts and knowledge workers reading, searching, and analyzing business documents to perform their daily tasks. For instance, legal professionals read contracts to identify non-standard clauses, risks, and exposures. Loan officers analyze borrower business documents to understand income, expenses, and contractual commitments before making lending decisions. Document intelligence is the ability of a system to read, understand, and interpret business documents through the application of AI-based technologies. It has the potential to significantly improve an employee's productivity and an organization's effectiveness by augmenting the expert in their daily tasks. Several challenges arise in this context, such as variability in document authoring, the need to contextually understand textual and tabular content, and organization- or role-specific variations in semantic interpretation. Furthermore, as experts come to rely on document intelligence, they expect the system to exhibit key properties such as explainability, consistent model evolution, and the ability to extend the system's knowledge from a few examples. In this talk, using real-world enterprise application examples, I first describe how document intelligence can play a key role in augmenting enterprise AI applications. I then outline key challenges that arise in business document understanding and the desiderata that enterprise AI applications and users expect. I conclude with a set of open research challenges that span language understanding, knowledge representation and reasoning, deep learning, and systems research.

Biography: Rajasekar Krishnamurthy is a Principal Research Staff Member and Senior Manager leading the Watson Discovery team in the Watson AI organization. Prior to this role, he was a Principal Research Staff Member at IBM Research - Almaden, leading the NLP, Entity Resolution and Discovery department. Rajasekar's technical interests focus on helping enterprises derive business insights from a variety of unstructured content sources, ranging from public and third-party data sources to governing business documents within an enterprise. He has expertise in building scalable and usable analytics tools for individual stages of unstructured document analysis, such as text analytics, document structure analysis, and entity resolution. He is a member of the IBM Academy of Technology. He received a B.Tech in Computer Science and Engineering from the Indian Institute of Technology Madras, and a Ph.D. in Computer Science from the University of Wisconsin-Madison.

Rajasekar Krishnamurthy
Sat 2:30 p.m. - 3:30 p.m. [iCal]

Abstract: Automatic text generation enables computers to summarize text, describe pictures to the visually impaired, write stories or articles about an event, hold conversations in customer-service and chit-chat settings, and customize content based on the characteristics and goals of the human interlocutor. Neural text generation (NLG) – using neural network models to generate coherent text – has seen a paradigm shift in recent years, driven by advances in deep contextual language modeling (e.g., LSTMs, GPT, GPT-2) and transfer learning (e.g., ELMo, BERT). While these tools have dramatically improved the state of NLG, particularly for low-resource tasks, state-of-the-art NLG models still face many challenges: a lack of diversity in generated text, commonsense violations in depicted situations, difficulties in making use of factual information, and difficulties in designing reliable evaluation metrics. In this talk I will discuss existing work on text-only transformers that generate long-form text with better discourse structure and narrative flow, generate multi-document summaries, and automatically build knowledge graphs using commonsense transformers as text generators. I will conclude the talk with a discussion of current challenges and shortcomings of neural text generation, pointing to avenues for future research.

Biography: Asli Celikyilmaz is a Principal Researcher at Microsoft Research in Redmond, Washington. She is also an Affiliate Professor at the University of Washington. Her research interests are mainly in deep learning and natural language, specifically language generation with long-term coherence, language understanding, language grounding with vision, and building intelligent agents for human-computer interaction. She has received several “best of” awards, including at NAFIPS 2007, Semantic Computing 2009, and CVPR 2019.

Asli Celikyilmaz
Sat 3:30 p.m. - 4:00 p.m. [iCal]
Coffee Break (Break)
Sat 4:00 p.m. - 5:00 p.m. [iCal]
Discussion: Document Intelligence Research Challenges & Directions (Discussion)
Sat 5:00 p.m. - 5:30 p.m. [iCal]
Best Paper Talk: BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding (Talk)
Sat 5:30 p.m. - 5:45 p.m. [iCal]
Summary of Workshop and Closing Remarks (Discussion)

Author Information

Nigel Duffy (EY)
Rama Akkiraju (IBM Research - Almaden)
Tania Bedrax Weiss (Google)
Paul Bennett (Microsoft Research)
Hamid Reza Motahari-Nezhad (EY AI Lab, USA)