Hypercube-Constrained Graph Learning for Protein Fitness with Dynamic Laplacian Regularization
Muhammad Daud · Xavier Cadet · Philippe Charton · Cédric Damour · Jingbo Wang · Frederic CADET
Abstract
Predicting protein fitness from sparse, noisy assays is a constrained optimization problem: a model must fit the measured data while respecting the structure of sequence space. EHCube4P addresses this by representing the binary mutational landscape of $2^K$ sequences as a Hamming graph $H(K,2)$ and treating fitness as a graph signal. Experimental noise is suppressed via Daubechies-3 wavelet denoising, enforcing locality-preserving sparsity, while a two-layer graph convolutional network (GCN) and a multi-layer perceptron (MLP) perform semi-supervised regression. A dynamic Laplacian-based smoothness regularization term complements the regression loss, so that predictive accuracy and topological consistency are optimized jointly. This balances the bias–variance trade-off and constrains predictions to conform to mutational adjacency. Applied to the Tobacco 5-Epi-Aristolochene Synthase (TEAS) landscape ($2^9 = 512$ mutants, 419 measured), EHCube4P reconstructs rugged fitness surfaces with high $R^2$ scores. Ablation studies confirm each constraint’s role in improving generalization. The framework integrates graph signal processing, noise-aware feature extraction, and constrained message passing, offering a scalable, principled approach for protein engineering under strict optimization constraints.
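To make the graph construction and the smoothness term concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds the adjacency of the Hamming graph $H(K,2)$ (two genotypes are adjacent when they differ at exactly one site), forms the combinatorial Laplacian $L = D - A$, and evaluates the quadratic form $f^\top L f = \sum_{(i,j) \in E} (f_i - f_j)^2$ alongside a masked regression loss. The function names, the fixed weight `lam`, and the normalization by the number of nodes are illustrative assumptions; the abstract does not specify how the "dynamic" weighting is scheduled, so it is not modeled here.

```python
import numpy as np


def hypercube_laplacian(k: int) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of the Hamming graph H(k, 2).

    Node i encodes a binary genotype as the bits of the integer i;
    two nodes are adjacent when their codes differ in exactly one bit.
    """
    n = 1 << k
    idx = np.arange(n)
    adj = np.zeros((n, n))
    for bit in range(k):
        # Flipping one bit yields the neighbour along that mutational axis.
        adj[idx, idx ^ (1 << bit)] = 1.0
    degree = np.diag(adj.sum(axis=1))
    return degree - adj


def smoothness(f: np.ndarray, lap: np.ndarray) -> float:
    """Quadratic form f^T L f, i.e. the sum of squared differences over edges."""
    return float(f @ lap @ f)


def combined_loss(pred, target, mask, lap, lam=0.1):
    """Masked MSE on measured nodes plus a Laplacian smoothness penalty.

    `lam` is a fixed, hypothetical weight; the paper's dynamic schedule
    is not described in the abstract and is not reproduced here.
    """
    mse = np.mean((pred[mask] - target[mask]) ** 2)
    return float(mse + lam * smoothness(pred, lap) / len(pred))


# Example on a landscape of the same size as TEAS: K = 9 gives 512 nodes.
lap9 = hypercube_laplacian(9)
```

In this sketch every node of the $K = 9$ graph has degree 9, a constant signal incurs zero penalty, and the penalty grows as predictions for single-mutation neighbours diverge, which is the sense in which the regularizer ties predictions to mutational adjacency.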