Decoder-as-Policy: Head-Only PPO Fine-Tuning of a Spike-Transformer for Low-Error Kinematic Decoding
Fengge Liang · Cong Wang · Shiqian Shen
Abstract
Spike-token transformers such as POYO achieve strong across-session decoding, yet purely supervised training can overweight variance alignment (explained variance) relative to the pointwise accuracy needed for closed-loop BCI control. We treat the decoder’s velocity head as a Gaussian policy and fine-tune it head-only: a behavior-cloning (BC) warm start followed by on-policy PPO on a control-aligned reward (negative MSE plus a small entropy bonus and an optional variance-calibration term), while keeping the POYO encoder frozen. On \textit{NLB'21 mc\_maze\_medium}, extended BC (1--2k steps) followed by PPO reveals a broad Pareto window with very low error and high explained variance; the best $R^2=0.9975$ and MSE$=0.0023$ occur at the same validation checkpoint (step~1900), with a near-unit predictive scale ($\sigma\approx0.993$). On a separate Perich\_Miller dataset trained for only 400 steps, POYO+ reaches $R^2\approx0.87$ (MSE$\approx0.34$) after PPO fine-tuning. We provide leakage safeguards, ablations, and reproducible configs.
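To make the control-aligned reward concrete, the sketch below shows one plausible PyTorch implementation of the three terms named in the abstract (negative MSE, a small entropy bonus, and an optional variance-calibration term) for a diagonal-Gaussian velocity head. This is not the authors' released code; the function name, signature, and coefficient values (\texttt{ent\_coef}, \texttt{calib\_coef}) are illustrative assumptions.

\begin{verbatim}
import torch


def control_aligned_reward(
    v_pred: torch.Tensor,      # sampled velocity, shape (batch, 2)
    v_true: torch.Tensor,      # ground-truth velocity, shape (batch, 2)
    sigma: torch.Tensor,       # per-dimension std of the Gaussian head
    ent_coef: float = 0.01,    # assumed weight of the entropy bonus
    calib_coef: float = 0.0,   # assumed weight of the optional calibration term
) -> torch.Tensor:
    # Pointwise accuracy: negative mean-squared error per sample.
    neg_mse = -((v_pred - v_true) ** 2).mean(dim=-1)

    # Entropy of a diagonal Gaussian: 0.5 * sum(log(2*pi*e*sigma^2)).
    entropy = 0.5 * torch.log(2 * torch.pi * torch.e * sigma ** 2).sum(dim=-1)

    # Optional calibration: penalize log(sigma) drifting away from 0 (i.e. sigma = 1),
    # consistent with the near-unit predictive scale reported in the abstract.
    calib_penalty = (torch.log(sigma) ** 2).sum(dim=-1)

    return neg_mse + ent_coef * entropy - calib_coef * calib_penalty
\end{verbatim}

Under this formulation, only the Gaussian head's parameters would receive PPO gradients, while the frozen POYO encoder supplies the state representation.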