Question Answering (QA) in its various flavors has made notable strides in recent years, thanks in part to the availability of public datasets and leaderboards. However, large datasets are not representative of many real-world scenarios of interest; this is especially true for industry data and specialized field data. Small datasets cannot be used to train QA systems from scratch: domain adaptation techniques are required. In this proposal, we use the term domain adaptation broadly, to cover techniques that leverage out-of-domain data, or in-domain data that does not match the task at hand.
The workshop is intended to highlight innovative approaches that have the potential to yield significant
improvement in QA scenarios where limited labeled data is available and to promote the development and use of real-world
datasets for domain adaptation.
Topics of interest include established and emerging approaches that have notable potential to substantially impact domain adaptation for QA. Notable examples are: adversarial training, automatic augmentation of a training set, unsupervised transfer learning, joint learning of QA and question generation, multi-task learning, domain-specific knowledge graphs, and using large models with few-shot learning.
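As a concrete illustration of the last item, few-shot learning with a large model often amounts to concatenating a handful of labeled in-domain examples ahead of the new question. A minimal sketch, where the prompt format and the example Q/A pairs are invented for illustration:

```python
# Sketch: assembling a few-shot QA prompt for a large language model.
# The Q/A pairs and the "Q:/A:" format are illustrative placeholders,
# not taken from any particular dataset or system.

def build_few_shot_prompt(examples, question):
    """Concatenate labeled (question, answer) pairs, then the new question."""
    parts = []
    for q, a in examples:
        parts.append(f"Q: {q}\nA: {a}")
    # The trailing "A:" cues the model to generate the answer.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

examples = [
    ("What command lists files on Linux?", "ls"),
    ("What command prints the working directory?", "pwd"),
]
prompt = build_few_shot_prompt(examples, "What command changes directories?")
```

The model's completion after the final "A:" is taken as the answer; with only a few labeled examples, no gradient updates are needed.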
The invited talks represent both the industry and the academic perspective: industry has pressing needs for techniques that work with the small amounts of labeled data one can expect from customers; academia is leading the path toward innovative breakthroughs that can quickly advance the field. In addition to the invited talks, we will present a case study on a publicly available, IBM-created QA dataset.
Sun 3:00 p.m. - 3:10 p.m. | Introduction
A brief introduction, about 1-2 minutes. We will run a Zoom session for questions about the workshop during this slot.
Vittorio Castelli
Sun 3:05 p.m. - 3:45 p.m. | Deploying Conversational Question Answering systems in new domains (Presentation)
Conversational Question Answering (ConvQA) systems trained on a dataset about popular Wikipedia characters (QuAC) have attained impressive performance. Still, widespread adoption of such systems requires cost-effective domain and language adaptation. In this talk I will review our experience deploying such systems in new domains. First, I will show that fine-tuning a pre-trained ConvQA system on a single FAQ domain yields high-quality systems in other FAQ domains. Second, I will show that a small dataset in Basque suffices to obtain comparable performance. Third, I will present strong results on COVID-related scientific literature. Finally, I will present a technique that improves performance in new domains after deployment, using only user feedback and no supervised in-domain training. All in all, our research indicates that ConvQA is ready for cost-effective deployment in new domains.
Eneko Agirre
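Fine-tuning a ConvQA model of the kind described above typically requires serializing the dialogue history together with the current question and the evidence passage into one model input. A minimal sketch of one common QuAC-style flattening; the separator string and field order here are assumptions for illustration, not the speaker's exact setup:

```python
# Sketch: flattening conversational QA context for a reader model.
# "[SEP]" is a placeholder; real systems use the tokenizer's own
# special tokens (e.g., [SEP] for BERT-style models, </s> for RoBERTa).

def build_convqa_input(history, question, passage, sep=" [SEP] "):
    """Join prior (question, answer) turns, the current question,
    and the evidence passage into a single input string."""
    turns = []
    for prev_q, prev_a in history:
        turns.append(prev_q)
        turns.append(prev_a)
    turns.append(question)
    return sep.join(turns) + sep + passage

history = [("Who wrote it?", "Cervantes")]
text = build_convqa_input(history, "When?",
                          "Don Quixote was published in 1605.")
```

The reader then predicts an answer span in the passage portion, exactly as in single-turn extractive QA, so the same fine-tuning recipe carries over to conversational data.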
Sun 3:45 p.m. - 3:50 p.m. | Q/A: Deploying Conversational Question Answering in new domains (Q/A live session)
Sun 3:50 p.m. - 4:30 p.m. | Question Answering, an IBM Perspective (Presentation)
In this talk we present our work on reading comprehension using various RC datasets: SQuAD, Natural Questions, Multilingual QA, and TyDi QA (10 typologically diverse languages). We also discuss domain adaptation for question answering systems and introduce TechQA, a new leaderboard for domain adaptation.
Salim Roukos
Sun 4:30 p.m. - 4:35 p.m. | Q/A: Question Answering, an IBM Perspective (Q/A live session)
Sun 4:35 p.m. - 5:15 p.m. | An introduction to transfer learning in NLP and HuggingFace (Presentation)
In this talk I'll start by introducing the recent breakthroughs in NLP that resulted from combining transfer learning schemes with Transformer architectures. The second part of the talk will be dedicated to an introduction to the open-source tools released by HuggingFace, in particular our Transformers, Tokenizers, and Datasets libraries and our models.
Thomas Wolf is co-founder and Chief Science Officer of HuggingFace. His team is on a mission to advance and democratize NLP for everyone. Prior to HuggingFace, Thomas gained a Ph.D. in quantum physics and later a law degree. He worked as a European Patent Attorney for 5 years. https://thomwolf.io/
About HuggingFace: HuggingFace does open research and open-source development in NLP, creating popular open-source platforms for NLP developers and researchers to use, build, and study state-of-the-art natural language processing technologies, including text classification, information extraction, summarization, text generation, and conversational artificial intelligence. https://huggingface.co/
Thomas Wolf
Sun 5:15 p.m. - 5:20 p.m. | Q/A: An introduction to transfer learning in NLP and HuggingFace (Q/A live session)
Sun 5:20 p.m. - 6:00 p.m. | Multi-Stage Transfer Learning for Technical Support Domain Question Answering (Presentation)
This talk presents our research on three transfer learning approaches: domain-specific LM pre-training with vocabulary extension, QA model pre-training with synthetic examples, and QA model pre-training with labeled data augmentation. These approaches are applied incrementally to the RoBERTa-large LM to adapt it to the TechQA task in the technical support domain. Leaderboard test results show substantial improvements in QA performance.
Rong Zhang
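The vocabulary-extension step mentioned in the abstract can be illustrated conceptually: domain-specific terms are appended to the subword vocabulary so the tokenizer no longer fragments them, and the embedding matrix is grown to match. A toy sketch in plain Python, with an invented vocabulary and invented technical-support terms; in HuggingFace Transformers the corresponding calls are `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`:

```python
# Toy sketch of vocabulary extension for domain adaptation.
# A real tokenizer stores subwords; here a plain dict stands in for
# the vocabulary, mapping token string -> integer id.

def extend_vocab(vocab, new_terms):
    """Assign fresh ids to unseen domain terms; existing entries are kept."""
    added = 0
    for term in new_terms:
        if term not in vocab:
            vocab[term] = len(vocab)
            added += 1
    return added

base_vocab = {"the": 0, "server": 1, "error": 2}
# Hypothetical support-domain terms a general-purpose vocab would fragment.
n_added = extend_vocab(base_vocab, ["WebSphere", "heapdump", "error"])
# The model's embedding matrix would then be resized to len(base_vocab)
# rows, with the new rows initialized and trained during LM pre-training.
```

New embedding rows start essentially random, which is why the abstract pairs vocabulary extension with continued domain-specific LM pre-training before any QA fine-tuning.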
Sun 6:00 p.m. - 6:05 p.m. | Q/A: Multi-Stage Transfer Learning for Technical Support Domain Question Answering (Q/A live session)
Sun 6:05 p.m. - 6:10 p.m. | Concluding Remarks (Presentation)
Short concluding remarks; we will remain in the live Zoom session to answer questions until the end of the workshop.
Vittorio Castelli