Two for the Price of One: Integrating Large Language Models to Learn Biophysical Interactions

Authors: Joseph D. Clark, Tanner J. Dean, Diwakar Shukla

Deep learning models have become fundamental tools in drug design. In particular, large language models trained on biochemical sequences learn feature vectors that guide drug discovery through virtual screening. However, such models do not capture the molecular interactions important for binding affinity and specificity. Therefore, there is a need to 'compose' representations from distinct biological modalities to effectively represent molecular complexes. We present an overview of methods for combining molecular representations and propose that future work should balance computational efficiency and expressiveness. Specifically, we argue that improvements in both speed and accuracy are possible by learning to merge the representations from internal layers of domain-specific biological language models. We demonstrate that 'composing' biochemical language models performs similarly to or better than standard methods for representing molecular interactions, despite using significantly fewer features. Finally, we discuss recent methods for interpreting and democratizing large language models that could aid the development of interaction-aware foundation models for biology, as well as the shortcomings of these methods.
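The abstract does not specify how the internal-layer representations are merged, so the following is only a minimal sketch of the general idea of 'composing' two sequence models, not the authors' method. It assumes one internal layer is taken from a protein language model and one from a ligand (e.g. SMILES) language model, that per-token hidden states are mean-pooled, and that each pooled vector is projected into a shared space of size `k` before being combined with an elementwise-product interaction term. The function names, the widths 1280 and 768, `k = 64`, and the product term are all hypothetical choices; the random matrices stand in for weights that would be learned in practice.

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Average per-token hidden states of shape (tokens, dim) into one (dim,) vector."""
    return hidden_states.mean(axis=0)

def compose(protein_h: np.ndarray, ligand_h: np.ndarray,
            w_p: np.ndarray, w_l: np.ndarray) -> np.ndarray:
    """Hypothetical composition of two modalities.

    Each pooled embedding is projected into a shared k-dimensional space;
    the result concatenates both projections with their elementwise
    product, a simple stand-in for a protein-ligand interaction term.
    """
    p = mean_pool(protein_h) @ w_p          # (k,)
    l = mean_pool(ligand_h) @ w_l           # (k,)
    return np.concatenate([p, l, p * l])    # (3k,)

rng = np.random.default_rng(0)

# Random stand-ins for one internal layer of each model (hypothetical sizes).
protein_h = rng.normal(size=(300, 1280))   # 300 residues, 1280-dim hidden states
ligand_h = rng.normal(size=(40, 768))      # 40 ligand tokens, 768-dim hidden states

k = 64
# Random projections standing in for learned weights.
w_p = rng.normal(size=(1280, k)) / np.sqrt(1280)
w_l = rng.normal(size=(768, k)) / np.sqrt(768)

z = compose(protein_h, ligand_h, w_p, w_l)
naive = np.concatenate([mean_pool(protein_h), mean_pool(ligand_h)])

print(z.shape)      # (192,)
print(naive.shape)  # (2048,)
```

The comparison at the end only illustrates how a learned projection can shrink the feature vector relative to naively concatenating the two pooled embeddings (192 versus 2048 features under these assumed sizes); it makes no claim about predictive accuracy.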

Subjects: Biomolecules, Quantitative Methods

Published: 2025-03-26 22:05:53 UTC

arXiv: 2503.21017