A Constrained Optimization Perspective of Unrolled Transformers
Javier Porras-Valenzuela · Samar Hadou · Alejandro Ribeiro
Abstract
This work introduces a constrained perspective on training transformers that behave like optimization descent algorithms. To this end, we impose layerwise descent constraints on the objective function and train with a primal-dual algorithm instead of empirical risk minimization (ERM). This method produces models that monotonically descend in expectation along the layers. We apply our method to both existing transformer-based unrollings and conventional pretrained transformers on video denoising and language classification tasks. The experimental evidence indicates that our method yields models that are more robust to perturbations.
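To make the training scheme concrete, below is a minimal sketch of one primal-dual update with layerwise descent constraints, assuming an unrolled model that exposes its per-layer iterates and a differentiable objective `f`. All names, signatures, and hyperparameters here (`objective`, `primal_dual_step`, `eta_dual`, and so on) are hypothetical illustrations, not the authors' implementation.

```python
import torch

# Hypothetical setup: model(batch) returns the per-layer iterates
# [x_0, x_1, ..., x_L] of an unrolled transformer, and objective(x)
# is the task loss f we want to descend layer by layer.

def primal_dual_step(model, objective, batch, lambdas,
                     opt_theta, eta_dual=1e-2):
    """One primal-dual update enforcing the layerwise descent
    constraints E[f(x_l)] <= E[f(x_{l-1})] for every layer l."""
    iterates = model(batch)                     # [x_0, x_1, ..., x_L]
    f_vals = [objective(x).mean() for x in iterates]

    # Constraint slacks: positive whenever a layer fails to descend.
    slacks = torch.stack([f_vals[l] - f_vals[l - 1]
                          for l in range(1, len(f_vals))])

    # Primal step: minimize the Lagrangian over the model parameters.
    lagrangian = f_vals[-1] + (lambdas * slacks).sum()
    opt_theta.zero_grad()
    lagrangian.backward()
    opt_theta.step()

    # Dual step: projected gradient ascent on the multipliers,
    # clamped to keep them dual-feasible (nonnegative).
    with torch.no_grad():
        lambdas += eta_dual * slacks.detach()
        lambdas.clamp_(min=0.0)
    return lagrangian.item()
```

In use, `lambdas` would be initialized as `torch.zeros(L)` for an `L`-layer model and `opt_theta` as any standard optimizer over `model.parameters()`; a multiplier grows whenever its layer violates descent, penalizing that violation in subsequent primal steps, which is the usual primal-dual mechanism the abstract contrasts with plain ERM.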