

Poster

A Polar Coordinate System Explicitly Represents Syntax in Language Models

Pablo J. Diego Simon · Stéphane d'Ascoli · Emmanuel Chemla · Yair Lakretz · Jean-Remi King

East Exhibit Hall A-C #3811
Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

First formalized with symbolic representations, syntactic trees have recently been shown to be partially represented in the activations of language models: by optimizing a “Structural Probe”, one can find a linear read-out of the activations such that syntactically related words are close together. However, the distance between contextualized word embeddings provided by such a Structural Probe can only represent the presence, but not the type, of syntactic relations. Here, we hypothesize that the type of syntactic relation between a pair of words can be read from the relative orientation of their contextualized embeddings. To test this hypothesis, we introduce a “Polar Probe” and optimize it with a contrastive objective that encourages all pairs of words sharing the same syntactic relation to point in a consistent and specific direction. We demonstrate three main results. First, the Polar Probe successfully recovers the type of syntactic relations, and outperforms the Structural Probe nearly two-fold. Second, we confirm that this polar coordinate system exists in a variety of trained language models such as BERT, Mistral and Llama-2, and is best read out from their intermediate layers. Third, we demonstrate with a new linguistic benchmark that similar syntactic relations occurring at different levels of the syntactic tree have similar orientations. Overall, this work shows that there exists a polar coordinate system which represents syntax as a labeled directed acyclic graph in language models, where the presence and type of syntactic relations are represented by distances and angles, respectively. This syntactic code bridges the historical gap between the symbolic representations of linguistics and the vectorial representations of neural networks.
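
The distance-and-angle reading described in the abstract can be illustrated with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the probe matrix `B` is random rather than trained, the embeddings are toy data, and the function names (`probe_edge`, `edge_distance`, `edge_cosine`) are hypothetical. It only shows how, once a linear read-out is given, a word pair yields a distance (which could signal the presence of a relation) and an orientation that can be compared across pairs (which could signal its type). The contrastive training objective is not reproduced here.

```python
import numpy as np


def probe_edge(B, h_head, h_dep):
    """Project two contextual embeddings with a linear probe B
    and return the vector pointing from head to dependent."""
    return B @ h_dep - B @ h_head


def edge_distance(edge):
    """Euclidean length of an edge in probe space (presence of a relation)."""
    return float(np.linalg.norm(edge))


def edge_cosine(edge_a, edge_b):
    """Cosine of the angle between two edges (similarity of relation type)."""
    denom = np.linalg.norm(edge_a) * np.linalg.norm(edge_b)
    return float(edge_a @ edge_b / denom)


rng = np.random.default_rng(0)
B = rng.normal(size=(4, 16))  # hypothetical, untrained probe (assumption)
h = rng.normal(size=(4, 16))  # toy embeddings for four words (assumption)

e1 = probe_edge(B, h[0], h[1])  # edge for word pair (0, 1)
e2 = probe_edge(B, h[2], h[3])  # edge for word pair (2, 3)

print("distance of pair (0, 1):", edge_distance(e1))
print("cosine between the two edges:", edge_cosine(e1, e2))
```

With a trained probe, one would expect pairs sharing a syntactic relation to give a cosine near 1; with the random `B` used here the printed values carry no linguistic meaning.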
