

Poster

Synthetic Programming Elicitation and Repair for Text-to-Code in Very Low-Resource Programming Languages

Federico Mora · Justin Wong · Haley Lepe · Sahil Bhatia · Karim Elmaaroufi · George Varghese · Joseph Gonzalez · Elizabeth Polgreen · Sanjit Seshia


Abstract:

Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code-related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose even syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource programming languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages internal to tools and toolchains, and legacy languages. Inspired by program elicitation, we propose establishing a hallucinated library within a high-resource language that can be automatically compiled to the VLPL. This library enables the LLM to generate code and self-repair within the syntax of a familiar language. Specifically, we introduce synthetic programming elicitation and compilation (SPEAC), an approach that enables LLMs to generate syntactically valid code even for VLPLs. We empirically evaluate the performance of SPEAC in a case study and find that, compared to existing retrieval and fine-tuning baselines, SPEAC produces syntactically correct programs more frequently without sacrificing semantic correctness.
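To make the idea of a "hallucinated library" that compiles to a VLPL more concrete, here is a minimal Python sketch under loudly stated assumptions: the `Module` base class, the `locals`/`init`/`next` method convention, the type and operator mappings, and the toy target syntax are all hypothetical illustrations, not SPEAC's actual library or target language. The sketch never executes the LLM's Python. It only parses it with the standard `ast` module and translates a small supported subset, raising `SyntaxError` for anything outside that subset.

```python
import ast

# Hypothetical Python an LLM might write against a made-up "Module" library.
# The library never needs to exist at runtime: we only parse the syntax tree.
LLM_OUTPUT = """
class Counter(Module):
    def locals(self):
        self.x = int
    def init(self):
        self.x = 0
    def next(self):
        self.x = self.x + 1
"""

# Hypothetical mappings from Python names/operators to a toy target DSL.
TYPE_MAP = {"int": "integer", "bool": "boolean"}
BIN_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def compile_expr(node: ast.expr) -> str:
    """Translate a supported Python expression into toy-DSL syntax."""
    # Check bool before int, because bool is a subclass of int in Python.
    if isinstance(node, ast.Constant) and isinstance(node.value, bool):
        return "true" if node.value else "false"
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return str(node.value)
    if (isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name)
            and node.value.id == "self"):
        return node.attr  # self.x refers to the state variable x
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left, right = compile_expr(node.left), compile_expr(node.right)
        return f"({left} {BIN_OPS[type(node.op)]} {right})"
    raise SyntaxError(f"unsupported expression: {ast.dump(node)}")


def assigned_name(stmt: ast.stmt) -> str:
    """Return x for a statement of the form `self.x = ...`."""
    if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Attribute)):
        return stmt.targets[0].attr
    raise SyntaxError(f"unsupported statement: {ast.dump(stmt)}")


def compile_module(src: str) -> str:
    """Compile a restricted Python class into a toy-DSL module."""
    lines = []
    for cls in ast.parse(src).body:
        if not isinstance(cls, ast.ClassDef):
            raise SyntaxError("expected only class definitions")
        lines.append(f"module {cls.name} {{")
        for fn in cls.body:
            if not isinstance(fn, ast.FunctionDef):
                raise SyntaxError("expected only method definitions")
            if fn.name == "locals":
                for stmt in fn.body:
                    name = assigned_name(stmt)
                    ty = stmt.value  # `self.x = int` declares x with a type
                    if not (isinstance(ty, ast.Name) and ty.id in TYPE_MAP):
                        raise SyntaxError(f"unknown type for {name}")
                    lines.append(f"  var {name} : {TYPE_MAP[ty.id]};")
            elif fn.name in ("init", "next"):
                # In `next`, assignments target the primed (next-state) variable.
                prime = "'" if fn.name == "next" else ""
                lines.append(f"  {fn.name} {{")
                for stmt in fn.body:
                    name = assigned_name(stmt)
                    lines.append(f"    {name}{prime} = {compile_expr(stmt.value)};")
                lines.append("  }")
            else:
                raise SyntaxError(f"unsupported method: {fn.name}")
        lines.append("}")
    return "\n".join(lines)


print(compile_module(LLM_OUTPUT))
```

Calling `compile_module(LLM_OUTPUT)` yields a toy module containing `var x : integer;`, an `init` block with `x = 0;`, and a `next` block with `x' = (x + 1);`. In a pipeline of this general shape, the `SyntaxError` messages are the kind of feedback that could be returned to the LLM so it repairs its program within Python syntax; this sketch does not reproduce how SPEAC itself performs repair.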
