Relative positional embeddings (RPEs) have received considerable attention because they effectively model the relative distance between tokens and enable length extrapolation. We propose KERPLE, a framework that generalizes relative positional embeddings for extrapolation by kernelizing positional differences. We achieve this using conditionally positive definite (CPD) kernels, a class of functions known for generalizing distance metrics. To maintain the inner product interpretation of self-attention, we show that a CPD kernel can be transformed into a positive definite (PD) kernel by adding a constant offset. This offset is implicitly absorbed by the Softmax normalization during self-attention. The diversity of CPD kernels allows us to derive various RPEs that enable length extrapolation in a principled way. Experiments demonstrate that the logarithmic variant achieves excellent extrapolation performance on three large language modeling datasets. Our implementation and pretrained checkpoints are released at https://github.com/chijames/KERPLE.git.
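The claim that the constant offset is absorbed by the Softmax normalization can be checked numerically. The sketch below is illustrative only and is not taken from the released implementation: it assumes the logarithmic variant has the form -r1 * log(1 + r2 * |m - n|) with r1, r2 > 0, and the function names (`log_bias`, `attention_row`), the example scores, and the parameter values are all hypothetical.

```python
import math


def log_bias(m, n, r1=1.0, r2=1.0):
    """Assumed logarithmic relative-position bias: -r1 * log(1 + r2 * |m - n|)."""
    return -r1 * math.log(1.0 + r2 * abs(m - n))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_row(scores, query_pos, offset=0.0, r1=1.0, r2=1.0):
    """Causal attention weights for one query over keys 0..query_pos.

    `offset` stands in for the constant added to turn a CPD kernel into a PD one.
    """
    logits = [
        scores[n] + log_bias(query_pos, n, r1, r2) + offset
        for n in range(query_pos + 1)
    ]
    return softmax(logits)


# Hypothetical query-key dot products for a query at position 3.
scores = [0.3, -1.2, 0.8, 0.1]

base = attention_row(scores, query_pos=3)
shifted = attention_row(scores, query_pos=3, offset=5.0)

# The same constant on every logit cancels in the normalization.
max_diff = max(abs(a - b) for a, b in zip(base, shifted))
print(max_diff < 1e-9)
```

Because the offset is the same for every key, it factors out of the numerator and denominator of the Softmax, so the attention weights are unchanged and the offset never needs to be computed explicitly.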
Author Information
Ta-Chung Chi (Carnegie Mellon University)
I am a 4th-year PhD student at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University, advised by Professor Alexander I. Rudnicky. My research interests lie in dialogue systems and related NLP topics.
Ting-Han Fan (Princeton University)
Peter J. Ramadge (Princeton)
Alexander Rudnicky (Carnegie Mellon University)
Alexander I. Rudnicky is Professor Emeritus in the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. Dr. Rudnicky's research has spanned many aspects of spoken language, including language modeling, spoken language system architectures, multi-modal interaction, and the analysis of conversational structure. Dr. Rudnicky and his students developed the PocketSphinx recognition system and the Ravenclaw dialog manager. More recently, Dr. Rudnicky has been active in research on open-domain conversational systems. Dr. Rudnicky's interests in learning include induction of concepts and task structure from conversation, and the design of intelligent systems that proactively seek to acquire knowledge from people.
More from the Same Authors
- 2020: Invited Talk 9 Presentation - Alexander Rudnicky - Creating socialbots with human-like conversational abilities
  Alexander Rudnicky
- 2021 Spotlight: Safe Reinforcement Learning with Natural Language Constraints
  Tsung-Yen Yang · Michael Y Hu · Yinlam Chow · Peter J. Ramadge · Karthik Narasimhan
- 2021: ProBF: Probabilistic Safety Certificates with Barrier Functions
  Sulin Liu · Athindran Ramesh Kumar · Jaime Fisac · Ryan Adams · Peter J. Ramadge
- 2022 Poster: Learning Physics Constrained Dynamics Using Autoencoders
  Tsung-Yen Yang · Justinian Rosca · Karthik Narasimhan · Peter J. Ramadge
- 2021 Poster: Safe Reinforcement Learning with Natural Language Constraints
  Tsung-Yen Yang · Michael Y Hu · Yinlam Chow · Peter J. Ramadge · Karthik Narasimhan
- 2020: Panel
  Maxine Eskenazi · Ankur Parikh · Govindarajan Thattai · Alexander Rudnicky · Jason E Weston
- 2020: Invited Talk 9 Q/A - Alexander Rudnicky
  Alexander Rudnicky
- 2020 Poster: Task-Agnostic Amortized Inference of Gaussian Process Hyperparameters
  Sulin Liu · Xingyuan Sun · Peter J. Ramadge · Ryan Adams
- 2018: Conversational moves and blended conversation
  Alexander Rudnicky
- 2017: Competition III: The Conversational Intelligence Challenge
  Mikhail Burtsev · Ryan Lowe · Iulian Vlad Serban · Yoshua Bengio · Alexander Rudnicky · Alan W Black · Shrimai Prabhumoye · Artem Rodichev · Nikita Smetanin · Denis Fedorenko · CheongAn Lee · EUNMI HONG · Hwaran Lee · Geonmin Kim · Nicolas Gontier · Atsushi Saito · Andrey Gershfeld · Artem Burachenok
- 2015 Poster: A Reduced-Dimension fMRI Shared Response Model
  Cameron Po-Hsuan Chen · Janice Chen · Yaara Yeshurun · Uri Hasson · James Haxby · Peter J. Ramadge
- 2015 Oral: A Reduced-Dimension fMRI Shared Response Model
  Cameron Po-Hsuan Chen · Janice Chen · Yaara Yeshurun · Uri Hasson · James Haxby · Peter J. Ramadge
- 2012 Poster: Kernel Hyperalignment
  Alexander Lorbert · Peter J. Ramadge
- 2012 Spotlight: Kernel Hyperalignment
  Alexander Lorbert · Peter J. Ramadge
- 2011 Poster: Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
  Zhen James Xiang · Hao Xu · Peter J. Ramadge
- 2011 Oral: Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
  Zhen James Xiang · Hao Xu · Peter J. Ramadge
- 2009 Poster: Boosting with Spatial Regularization
  Zhen James Xiang · Yongxin Xi · Uri Hasson · Peter J. Ramadge
- 2009 Spotlight: Boosting with Spatial Regularization
  Zhen James Xiang · Yongxin Xi · Uri Hasson · Peter J. Ramadge
- 2009 Poster: fMRI-Based Inter-Subject Cortical Alignment Using Functional Connectivity
  Bryan Conroy · Ben Singer · James Haxby · Peter J. Ramadge