2024.05.10.592927

Total: 1

#1 LucaOne: Generalized Biological Foundation Model with Unified Nucleic Acid and Protein Language [PDF] [Copy] [Kimi] [REL]

Authors: Yong He, Pan Fang, Yongtao Shan, Yuanfei Pan, Yanhong Wei, Yichang Chen, Yihao Chen, Yi Liu, Zhenyu Zeng, Zhan Zhou, Feng Zhu, Edward C. Holmes, Jieping Ye, Jun Li, Yuelong Shu, Mang Shi, Zhaorong Li

In recent years, significant advancements have been observed in the domain of Natural Language Processing(NLP) with the introduction of pre-trained foundational models, paving the way for utilizing similar AI technologies to interpret the language of biology. In this research, we introduce “LucaOne”, a novel pre-trained foundational model designed to integratively learn from the genetic and proteomic languages, encapsulating data from 169,861 species en-compassing DNA, RNA, and proteins. This work illuminates the potential for creating a biological language model aimed at universal bioinformatics appli-cation. Remarkably, through few-shot learning, this model efficiently learns the central dogma of molecular biology and demonstrably outperforms com-peting models. Furthermore, in tasks requiring inputs of DNA, RNA, proteins, or a combination thereof, LucaOne exceeds the state-of-the-art performance using a streamlined downstream architecture, thereby providing empirical ev-idence and innovative perspectives on the potential of foundational models to comprehend complex biological systems.

Subject: Bioinformatics

Publish: 2024-05-14