Un-Distillable LLMs via Entropy-Perturbed Logits
Abstract
Large Language Models (LLMs) are vulnerable to distillation attacks, in which an adversary replicates a proprietary model's knowledge in a smaller student model, enabling intellectual-property theft and weakening security guarantees. We address this challenge by introducing \emph{provably un-distillable LLMs} built on entropy-based obfuscation of output logits. We derive information-theoretic lower bounds on the error floor of any student model trained on the obfuscated outputs, showing that the student's distillation loss grows at least quadratically with the obfuscation strength. Experiments confirm the theory: empirical student losses remain above the derived lower bounds, validating the feasibility of secure, un-distillable architectures. This work establishes the first provable foundations for resisting unauthorized distillation of LLMs.
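As a hedged illustration of the entropy-based logit obfuscation named above (the abstract does not specify the mechanism, so the mixing-with-uniform scheme, the function name \texttt{perturb\_logits}, and the strength parameter \texttt{epsilon} below are assumptions, not the paper's method), a minimal sketch might raise the entropy of the output distribution while preserving the top-ranked token:

\begin{verbatim}
import numpy as np

def perturb_logits(logits: np.ndarray, epsilon: float) -> np.ndarray:
    """Illustrative entropy-raising perturbation (hypothetical sketch).

    Interpolates the softmax distribution toward uniform by a factor
    `epsilon`, which raises output entropy while keeping the argmax
    (and hence greedy decoding) unchanged for epsilon < 1.
    """
    # Numerically stable softmax over the vocabulary dimension.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Mix with the uniform distribution; epsilon is the obfuscation strength.
    uniform = np.full_like(probs, 1.0 / probs.size)
    mixed = (1.0 - epsilon) * probs + epsilon * uniform
    # Return (unnormalized) log-probabilities as the obfuscated logits.
    return np.log(mixed)

# Usage: the perturbation preserves the top token while flattening the tail.
logits = np.array([4.0, 2.0, 1.0, 0.5])
obfuscated = perturb_logits(logits, epsilon=0.3)
assert obfuscated.argmax() == logits.argmax()
\end{verbatim}

In this sketch, \texttt{epsilon} plays the role of the obfuscation strength with which, per the abstract's claim, any student's distillation loss would scale at least quadratically.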