2602.02425


Repurposing Protein Language Models for Latent Flow-Based Fitness Optimization

Authors: Amaru Caceres Arroyo, Lea Bogensperger, Ahmed Allam, Michael Krauthammer, Konrad Schindler, Dominik Narnhofer

Protein fitness optimization is challenged by a vast combinatorial landscape where high-fitness variants are extremely sparse. Many current methods either underperform or require computationally expensive gradient-based sampling. We present CHASE, a framework that repurposes the evolutionary knowledge of pretrained protein language models by compressing their embeddings into a compact latent space. By training a conditional flow-matching model with classifier-free guidance, we enable the direct generation of high-fitness variants without predictor-based guidance during the ODE sampling steps. CHASE achieves state-of-the-art performance on AAV and GFP protein design benchmarks. Finally, we show that bootstrapping with synthetic data can further enhance performance in data-constrained settings.
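The sampling idea in the abstract (integrating an ODE whose velocity is steered by classifier-free guidance rather than by a separate fitness predictor) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the guidance convention `v_uncond + w * (v_cond - v_uncond)`, the fixed-step Euler integrator, and the closed-form `toy_velocity` that stands in for a trained network are all assumptions made for illustration. Decoding the latent back into a protein sequence is omitted.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical signature: velocity(z, t, fitness), where fitness=None means "unconditional".
VelocityFn = Callable[[Vector, float, Optional[float]], Vector]


def toy_velocity(z: Vector, t: float, fitness: Optional[float]) -> Vector:
    """Stand-in for a trained flow-matching network (assumption).

    For a straight-line path toward a single target point, the exact
    velocity is (target - z) / (1 - t). The conditional target is the
    requested fitness value; the unconditional target is 0.
    """
    target = 0.0 if fitness is None else fitness
    return [(target - zi) / (1.0 - t) for zi in z]


def guided_velocity(v: VelocityFn, z: Vector, t: float,
                    fitness: float, w: float) -> Vector:
    """Classifier-free guidance: extrapolate from the unconditional
    velocity toward the conditional one with guidance weight w."""
    v_uncond = v(z, t, None)
    v_cond = v(z, t, fitness)
    return [u + w * (c - u) for u, c in zip(v_uncond, v_cond)]


def sample_latent(v: VelocityFn, dim: int, fitness: float,
                  w: float = 2.0, steps: int = 50, seed: int = 0) -> Vector:
    """Integrate dz/dt = guided velocity from t=0 (Gaussian noise) to t=1
    with fixed-step Euler. No fitness predictor is queried in the loop."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # largest t used is 1 - dt, so (1 - t) never reaches zero
        dz = guided_velocity(v, z, t, fitness, w)
        z = [zi + dt * di for zi, di in zip(z, dz)]
    return z


if __name__ == "__main__":
    print(sample_latent(toy_velocity, dim=4, fitness=1.0, w=1.0))
    print(sample_latent(toy_velocity, dim=4, fitness=1.0, w=2.0))
```

In this toy setting, `w = 1.0` recovers plain conditional sampling, while larger `w` pushes samples further in the direction that separates the conditional from the unconditional velocity. A real model would replace `toy_velocity` with a learned network evaluated once with and once without the fitness condition.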

Subjects: Machine Learning, Quantitative Methods

Published: 2026-02-02 18:25:33 UTC