Quantitative Biology

2025-02-14 | Total: 15

#1 Small Molecule Drug Discovery Through Deep Learning: Progress, Challenges, and Opportunities

Authors: Kun Li, Yida Xiong, Hongzhi Zhang, Xiantao Cai, Bo Du, Wenbin Hu

Due to their excellent drug-like and pharmacokinetic properties, small molecule drugs are widely used to treat various diseases, making them a critical component of drug discovery. In recent years, with the rapid development of deep learning (DL) techniques, DL-based small molecule drug discovery methods have outperformed traditional machine learning approaches in prediction accuracy, speed, and the modeling of complex molecular relationships. These advances improve the efficiency of drug screening and optimization, and they provide more precise and effective solutions for various drug discovery tasks. To support the development of this field, this paper systematically summarizes the key tasks and representative techniques of DL-based small molecule drug discovery from recent years. Specifically, we provide an overview of the major tasks in small molecule drug discovery and their interrelationships. Next, we analyze the six core tasks, summarizing the related methods, commonly used datasets, and technological development trends. Finally, we discuss key challenges, such as interpretability and out-of-distribution generalization, and offer our insights into future research directions for DL-assisted small molecule drug discovery.

Subjects: Machine Learning, Biomolecules

Publish: 2025-02-13 05:24:52 UTC


#2 DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra

Authors: Montgomery Bohde, Mrunali Manjrekar, Runzhong Wang, Shuiwang Ji, Connor W. Coley

Mass spectrometry plays a fundamental role in elucidating the structures of unknown molecules and subsequent scientific discoveries. One formulation of the structure elucidation task is the conditional de novo generation of molecular structure given a mass spectrum. Toward a more accurate and efficient scientific discovery pipeline for small molecules, we present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. The encoder utilizes a transformer architecture and models mass spectra domain knowledge such as peak formulae and neutral losses, and the decoder is a discrete graph diffusion model restricted by the heavy-atom composition of a known chemical formula. To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs, which are available in virtually infinite quantities, compared to structure-spectrum pairs that number in the tens of thousands. Extensive experiments on established benchmarks show that DiffMS outperforms existing models on de novo molecule generation. We provide several ablations to demonstrate the effectiveness of our diffusion and pretraining approaches and show consistent performance scaling with increasing pretraining dataset size. DiffMS code is publicly available at https://github.com/coleygroup/DiffMS.
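To make the two formula-level notions above concrete (the heavy-atom composition that restricts the decoder, and neutral losses between a precursor and a fragment), here is a small illustrative sketch. It is not DiffMS code: the helper names parse_formula, heavy_atoms and neutral_loss are hypothetical, and the parser only handles flat formulas (no parentheses, charges or isotopes).

```python
import re
from collections import Counter

# Element symbol (capital letter + optional lowercase letter) followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def parse_formula(formula: str) -> Counter:
    """Parse a flat formula such as 'C6H12O6' into element counts (hypothetical helper)."""
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:  # reject characters that are not element tokens
            raise ValueError(f"unexpected text in formula: {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if pos != len(formula):
        raise ValueError(f"unexpected text in formula: {formula!r}")
    return counts

def heavy_atoms(formula: str) -> Counter:
    """Heavy-atom composition: every element except hydrogen."""
    counts = parse_formula(formula)
    counts.pop("H", None)
    return counts

def neutral_loss(precursor: str, fragment: str) -> Counter:
    """Formula difference precursor - fragment; the fragment must be a subformula."""
    diff = parse_formula(precursor)
    diff.subtract(parse_formula(fragment))
    if any(v < 0 for v in diff.values()):
        raise ValueError("fragment is not contained in the precursor formula")
    return Counter({el: n for el, n in diff.items() if n > 0})
```

For example, heavy_atoms("C6H12O6") keeps six carbons and six oxygens, and neutral_loss("C6H12O6", "C6H10O5") yields H: 2, O: 1, i.e. a water loss.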

Subjects: Machine Learning, Quantitative Methods

Publish: 2025-02-13 18:29:48 UTC


#3 Interpreting and Steering Protein Language Models through Sparse Autoencoders

Authors: Edith Natalia Villegas Garcia, Alessio Ansuini

The rapid advancements in transformer-based language models have revolutionized natural language processing, yet understanding the internal mechanisms of these models remains a significant challenge. This paper explores the application of sparse autoencoders (SAE) to interpret the internal representations of protein language models, focusing specifically on the 8M-parameter ESM-2 model. By statistically analyzing each latent component's relevance to distinct protein annotations, we identify potential interpretations linked to various protein characteristics, including transmembrane regions, binding sites, and specialized motifs. We then leverage these insights to guide sequence generation, shortlisting the latent components that can steer the model towards desired targets such as zinc finger domains. This work contributes to the emerging field of mechanistic interpretability in biological sequence models, offering new perspectives on model steering for sequence design.
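For readers new to the technique, a sparse autoencoder of the kind commonly used for interpretability fits in a few lines. This is a generic sketch, not the authors' implementation: the ReLU encoder, the L1 sparsity penalty, the tiny hand-picked weights in the usage, and the hypothetical steer helper (adding a multiple of one latent's decoder direction to a hidden state) are all assumptions; real SAEs are trained on ESM-2 hidden states and are much wider.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def encode(x: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Latent codes z = ReLU(W_enc (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    return [max(0.0, p + b) for p, b in zip(matvec(w_enc, centered), b_enc)]

def decode(z: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruction x_hat = W_dec z + b_dec (W_dec has one column per latent)."""
    return [r + b for r, b in zip(matvec(w_dec, z), b_dec)]

def sae_loss(x, w_enc, b_enc, w_dec, b_dec, l1_coeff=1e-3) -> float:
    """Squared reconstruction error plus an L1 penalty that encourages sparse latents."""
    z = encode(x, w_enc, b_enc, b_dec)
    x_hat = decode(z, w_dec, b_dec)
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return mse + l1_coeff * sum(abs(zi) for zi in z)

def steer(x: Vector, w_dec: Matrix, latent: int, alpha: float) -> Vector:
    """Hypothetical steering step: push x along the decoder direction of one latent."""
    direction = [row[latent] for row in w_dec]
    return [xi + alpha * di for xi, di in zip(x, direction)]
```

With identity weights and zero biases, the input [1.0, -2.0] activates only the first latent (the ReLU zeroes the second), which shows how the sparsity of z arises.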

Subjects: Machine Learning, Biomolecules

Publish: 2025-02-13 10:11:36 UTC


#4 Global Stabilization of Chemostats with Nonzero Mortality and Substrate Dynamics

Authors: Iasson Karafyllis, Epiphane Loko, Miroslav Krstic, Antoine Chaillet

In "chemostat"-type population models that incorporate substrate (nutrient) dynamics, the dependence of the birth (or growth) rate on the substrate concentration introduces a nonlinear coupling that makes global stabilization, i.e., stabilization for all positive biomass and nutrient concentrations, challenging. When the biomass has no natural mortality, this challenge has been overcome in the literature with relatively simple feedback. Under natural mortality, however, avoiding biomass extinction from nutrient-depleted initial conditions requires fortified, more complex feedback that lies outside the existing nonlinear control design toolbox. This paper provides such fortified feedback, the associated control Lyapunov function design, and a Lyapunov analysis of global stability. We achieve global stabilization for two chemostat models: (i) a lumped model with two state variables, and (ii) a three-state model derived from an age-structured infinite-dimensional model. The proposed feedback stabilizers are explicit, apply to both the lumped and the age-structured models, and reduce to simple feedback laws proposed in the literature when the mortality rate is zero. The stabilization is global and respects constraints: all positive biomass and nutrient concentrations lie within the region of attraction of the desired equilibrium, and the dilution input is guaranteed to remain positive. For the lumped case with Haldane kinetics, we show that the reproduction rate strictly dominating the mortality rate (the borderline case in which the two are in balance is excluded) is not only sufficient but also necessary for global stabilization. The results are illustrated with simple examples.
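The structure of a lumped two-state model with mortality can be made concrete with a short simulation. The sketch below assumes a standard chemostat with Monod kinetics (not the Haldane kinetics treated in the paper), a yield coefficient Y and a natural mortality rate b, and runs it under a constant dilution rate D with forward Euler. It is an open-loop illustration of the model class, not the fortified feedback law derived in the paper, and every parameter value is hypothetical.

```python
def monod(s: float, mu_max: float, k: float) -> float:
    """Monod specific growth rate mu(s) = mu_max * s / (k + s)."""
    return mu_max * s / (k + s)

def simulate_chemostat(s0, x0, dilution, t_end=200.0, dt=0.01,
                       s_in=2.0, mu_max=1.0, k=0.5, yield_coeff=0.5, mortality=0.1):
    """Forward-Euler integration of the open-loop model (no feedback):
        ds/dt = D (s_in - s) - mu(s) x / Y
        dx/dt = (mu(s) - D - b) x
    """
    s, x = s0, x0
    for _ in range(int(round(t_end / dt))):
        mu = monod(s, mu_max, k)
        ds = dilution * (s_in - s) - mu * x / yield_coeff
        dx = (mu - dilution - mortality) * x
        s, x = s + dt * ds, x + dt * dx
    return s, x

def equilibrium(dilution, s_in=2.0, mu_max=1.0, k=0.5, yield_coeff=0.5, mortality=0.1):
    """Positive equilibrium: mu(s*) = D + b and D (s_in - s*) = (D + b) x* / Y."""
    rate = dilution + mortality
    if rate >= mu_max:
        raise ValueError("maximal growth does not exceed dilution plus mortality: washout")
    s_star = k * rate / (mu_max - rate)
    if s_star >= s_in:
        raise ValueError("equilibrium substrate exceeds the inflow concentration: washout")
    x_star = yield_coeff * dilution * (s_in - s_star) / rate
    return s_star, x_star
```

With D = 0.3 and the default parameters, the positive equilibrium is s* = 1/3 and x* = 0.625, and the simulation started from (1.0, 0.1) approaches it. The check mu_max > D + b mirrors, in this simplified setting, the requirement that reproduction dominate dilution plus mortality.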

Subjects: Optimization and Control, Systems and Control, Populations and Evolution

Publish: 2025-02-13 13:25:06 UTC


#5 Drivers of cooperation in social dilemmas on higher-order networks

Authors: Onkar Sadekar, Andrea Civilini, Vito Latora, Federico Battiston

Understanding cooperation in social dilemmas requires models that capture the complexity of real-world interactions. While network frameworks have provided valuable insights to model the evolution of cooperation, they are unable to encode group interactions properly. Here, we introduce a general higher-order network framework for multi-player games on structured populations. Our model considers multi-dimensional strategies, based on the observation that social behaviours are affected by the size of the group interaction. We investigate dynamical and structural coupling between different orders of interactions, revealing the crucial role of nested multilevel interactions, and showing how such features can enhance cooperation beyond the limit of traditional models with uni-dimensional strategies. Our work identifies the key drivers promoting cooperative behaviour commonly observed in real-world group social dilemmas.
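A common building block for multi-player games on hypergraphs is a public goods game played within each hyperedge. The sketch below is a generic illustration under stated assumptions, not the authors' model: each player holds one binary strategy per group size (a minimal stand-in for the multi-dimensional strategies described above), contributions cost 1, are multiplied by a size-dependent factor r and shared equally, and the function names group_payoffs and total_payoffs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def group_payoffs(group: Tuple[int, ...], strategy: Dict[int, Dict[int, int]],
                  r: float, cost: float = 1.0) -> Dict[int, float]:
    """Public goods game in one hyperedge.

    strategy[player][size] is 1 (cooperate) or 0 (defect) in groups of that size,
    so a player may cooperate in pairs but defect in larger groups.
    """
    size = len(group)
    contributions = {p: cost * strategy[p][size] for p in group}
    share = r * sum(contributions.values()) / size
    return {p: share - contributions[p] for p in group}

def total_payoffs(hyperedges: List[Tuple[int, ...]], strategy,
                  r_by_size: Dict[int, float]) -> Dict[int, float]:
    """Accumulate each player's payoff over all hyperedges it belongs to."""
    totals = defaultdict(float)
    for group in hyperedges:
        for player, payoff in group_payoffs(group, strategy, r_by_size[len(group)]).items():
            totals[player] += payoff
    return dict(totals)
```

For instance, with a pair (0, 1) and a triangle (0, 1, 2), a player who cooperates only in pairs earns more than one who cooperates at every order, which is the kind of order-dependent incentive that multi-dimensional strategies can represent.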

Subjects: Physics and Society, Computer Science and Game Theory, Social and Information Networks, Populations and Evolution

Publish: 2025-02-13 16:15:45 UTC


#6 Extinction and Metastability of Pheromone-Roads in Stochastic Models for Foraging Walks of Ants

Authors: Saori Morimoto, Makoto Katori, Hiraku Nishimori

Macroscopic changes in the group behavior of eusocial insects are studied from the viewpoint of non-equilibrium phase transitions. A recent combined experimental and mathematical modeling study by the group led by the third author suggests that a species of garden ant switches its individual foraging walk from pheromone-mediated to visual-cue-mediated depending on the situation. If an initial pheromone-road between the nest and food sources is a detour, ants using visual cues can pioneer shorter paths. These shorter paths are reinforced by pheromone secreted by following ants, and the detour then ceases to exist. Once the old pheromone-road becomes extinct, there is almost no chance of reconstructing it. Hence the extinction of a pheromone-road is expected to behave as a phase transition to an absorbing state. We propose a discrete-time model on a square lattice consisting of switching random walks interacting through a time-dependent pheromone field. Numerical study shows that the critical phenomena of these extinction transitions of pheromone-roads do not seem to belong to the directed percolation universality class associated with the usual absorbing-state transition. The new aspects are caused by coexistence and competition with newly created pheromone-roads. In a regime within the extinction phase, the annihilating road shows metastability and takes a long time to be replaced by a new road.
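The basic ingredients of such a lattice model, a pheromone field that evaporates, is reinforced by walkers, and biases their steps, can be sketched as follows. The evaporation rate, the deposit amount, the baseline weight and the rule "step probability proportional to baseline plus neighbouring pheromone" are illustrative assumptions rather than the authors' update rules, and the switch to visual-cue-mediated walking studied in the paper is not modeled.

```python
import random
from typing import Dict, List, Tuple

Site = Tuple[int, int]
NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def evaporate(field: Dict[Site, float], rate: float) -> Dict[Site, float]:
    """Each time step, pheromone decays by a factor (1 - rate); negligible values are dropped."""
    return {site: c * (1.0 - rate) for site, c in field.items() if c * (1.0 - rate) > 1e-12}

def deposit(field: Dict[Site, float], site: Site, amount: float) -> None:
    """A walker secretes pheromone on the site it occupies."""
    field[site] = field.get(site, 0.0) + amount

def step_probabilities(field: Dict[Site, float], site: Site,
                       baseline: float) -> Tuple[List[Site], List[float]]:
    """Probability of moving to each nearest neighbour, proportional to baseline + pheromone."""
    targets = [(site[0] + dx, site[1] + dy) for dx, dy in NEIGHBOURS]
    weights = [baseline + field.get(t, 0.0) for t in targets]
    total = sum(weights)
    return targets, [w / total for w in weights]

def pheromone_step(field: Dict[Site, float], site: Site,
                   rng: random.Random, baseline: float = 0.1) -> Site:
    """One pheromone-mediated step: sample a neighbour from the biased probabilities."""
    targets, probs = step_probabilities(field, site, baseline)
    return rng.choices(targets, weights=probs, k=1)[0]
```

A full simulation would loop over walkers and time steps, calling deposit after each move and evaporate once per step; the baseline keeps unmarked sites reachable, so a walker can still leave an existing road.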

Subjects: Statistical Mechanics, Adaptation and Self-Organizing Systems, Populations and Evolution

Publish: 2025-02-13 16:39:07 UTC


#7 Spatial Transcriptomics Iterative Hierarchical Clustering (stIHC): A Novel Method for Identifying Spatial Gene Co-Expression Modules

Authors: Catherine Higgins, Jingyi Jessica Li, Michelle Carey

Recent advancements in spatial transcriptomics technologies allow researchers to simultaneously measure RNA expression levels for hundreds to thousands of genes while preserving spatial information within tissues, providing critical insights into spatial gene expression patterns, tissue organization, and gene functionality. However, existing methods for clustering spatially variable genes (SVGs) into co-expression modules often fail to detect rare or unique spatial expression patterns. To address this, we present spatial transcriptomics iterative hierarchical clustering (stIHC), a novel method for clustering SVGs into co-expression modules, representing groups of genes with shared spatial expression patterns. Through three simulations and applications to spatial transcriptomics datasets from technologies such as 10x Visium, 10x Xenium, and Spatial Transcriptomics, stIHC outperforms clustering approaches used by popular SVG detection methods, including SPARK, SPARK-X, MERINGUE, and SpatialDE. Gene Ontology enrichment analysis confirms that genes within each module share consistent biological functions, supporting the functional relevance of spatial co-expression. Robust across technologies with varying gene numbers and spatial resolution, stIHC provides a powerful tool for decoding the spatial organization of gene expression and the functional structure of complex tissues.
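As background, grouping genes into co-expression modules usually starts from a distance between the genes' spatial expression profiles. The sketch below performs a single, non-iterative average-linkage hierarchical clustering with correlation distance (1 − Pearson r) and a distance cut-off. It is a generic stand-in, not the stIHC procedure, whose iterative scheme is not reproduced here; the function names and the cut-off value are assumptions.

```python
from itertools import combinations
from typing import Dict, List

def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two expression profiles measured at the same spots."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("constant expression profile: correlation undefined")
    return cov / (va * vb) ** 0.5

def average_linkage_modules(profiles: Dict[str, List[float]], cutoff: float) -> List[List[str]]:
    """Repeatedly merge the two clusters with the smallest average correlation distance
    until the closest pair is farther apart than `cutoff`."""
    genes = list(profiles)
    dist = {}
    for g, h in combinations(genes, 2):
        d = 1.0 - pearson(profiles[g], profiles[h])
        dist[(g, h)] = dist[(h, g)] = d
    clusters = [[g] for g in genes]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [dist[(g, h)] for g in clusters[i] for h in clusters[j]]
            d = sum(pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations() guarantees j > i
    return [sorted(c) for c in clusters]
```

In practice the profiles would be normalized expression values of the detected SVGs across spots; a single global cut-off like this one is exactly where rare patterns can be absorbed into larger modules, which is the limitation the abstract attributes to existing approaches.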

Subjects: Methodology, Genomics

Publish: 2025-02-13 18:31:24 UTC


#8 Neuronal Correlates of Semantic Event Classes during Presentation of Complex Naturalistic Stimuli: Anatomical Patterns, Context-Sensitivity, and Potential Impact on shared Human-Robot Ontologies

Authors: Florian Ahrens, Mihai Pomarlan, Daniel Beßler, Michael Beetz, Manfred Herrmann

The present study is part of a research project that aims to develop cognition-enabled robotic agents with environmental interaction capabilities close to human proficiency. The approach combines human-derived neuronal data with a shared ontology so that robots can learn from human experiences. To gain further insight into the relation between human neuronal activity patterns and ontological classes, we applied General Linear Model (GLM) analyses to fMRI data from participants who were presented with complex naturalistic video stimuli comparable to the robot tasks. We modeled four event classes (pick, place, fetch and deliver) embedded in different environmental and object-related contexts, and applied Representational Similarity Analysis (RSA) to the associated brain activity patterns as the starting point for an automatic hierarchical clustering. With the default values for the Hemodynamic Response Function (HRF), the activity patterns were reliably grouped according to their parent classes of object interaction and navigation. Although fetch and deliver events were also distinguished by their neuronal patterns, pick and place events were more ambiguous with respect to their activation patterns. Introducing a shorter HRF time-to-peak led to a more reliable grouping of all four semantic classes, despite contextual factors. These data may give novel insights into the neuronal representation of complex stimuli and may enable further research on ontology validation in cognition-enabled robotics.
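RSA compares conditions through a representational dissimilarity matrix (RDM), which can then be passed to a hierarchical clustering. A minimal sketch, assuming each event class is summarised by one activity pattern (for example GLM beta estimates over the voxels of a region) and using 1 − Pearson correlation as the dissimilarity; this is a common choice but not necessarily the one used in the study, and the toy patterns in the check below are made up.

```python
from typing import Dict, List, Tuple

def pearson(a: List[float], b: List[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def rdm(patterns: Dict[str, List[float]]) -> Tuple[List[str], List[List[float]]]:
    """Representational dissimilarity matrix with entries 1 - r between condition patterns."""
    labels = list(patterns)
    matrix = [[0.0 if a == b else 1.0 - pearson(patterns[a], patterns[b])
               for b in labels] for a in labels]
    return labels, matrix
```

Conditions whose patterns are similar get small entries, so a clustering of the RDM groups them first; in the study's terms, ambiguity between pick and place would appear as a small pick/place dissimilarity.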

Subject: Neurons and Cognition

Publish: 2025-02-12 18:36:30 UTC


#9 Orthology and Near-Cographs in the Context of Phylogenetic Networks

Authors: Anna Lindeberg, Guillaume E. Scholz, Nicolas Wieseke, Marc Hellmuth

Orthologous genes, which arise through speciation, play a key role in comparative genomics and functional inference. In particular, graph-based methods allow for the inference of orthology estimates without prior knowledge of the underlying gene or species trees. This results in orthology graphs, where each vertex represents a gene, and an edge exists between two vertices if the corresponding genes are estimated to be orthologs. Orthology graphs inferred under a tree-like evolutionary model must be cographs. However, real-world data often deviate from this property, either due to noise in the data, errors in inference methods or, simply, because evolution follows a network-like rather than a tree-like process. The latter, in particular, raises the question of whether and how orthology graphs can be derived from or, equivalently, are explained by phylogenetic networks. Here, we study the constraints imposed on orthology graphs when the underlying evolutionary history follows a phylogenetic network instead of a tree. We show that any orthology graph can be represented by a sufficiently complex level-k network. However, such networks lack biologically meaningful constraints. In contrast, level-1 networks provide a simpler explanation, and we establish characterizations for level-1 explainable orthology graphs, i.e., those derived from level-1 evolutionary histories. To this end, we employ modular decomposition, a classical technique for studying graph structures. Specifically, an arbitrary graph is level-1 explainable if and only if each primitive subgraph is a near-cograph (a graph in which the removal of a single vertex results in a cograph). Additionally, we present a linear-time algorithm to recognize level-1 explainable orthology graphs and to construct a level-1 network that explains them, if such a network exists.
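The two graph classes in this characterization can be checked directly on small graphs. The brute-force sketch below uses the classical fact that a graph is a cograph if and only if it has no induced path on four vertices (P4), and tests the near-cograph property by trying every single-vertex deletion. It takes O(n^4) time per cograph test and O(n^5) overall, so it only makes the definitions concrete; it is not the paper's linear-time algorithm and does not perform the modular decomposition needed to examine each primitive subgraph.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets, no self-loops

def from_edges(n: int, edges: Iterable[Tuple[int, int]]) -> Graph:
    g = {v: set() for v in range(n)}
    for u, v in edges:
        g[u].add(v)
        g[v].add(u)
    return g

def induced_subgraph(graph: Graph, keep: Iterable[int]) -> Graph:
    keep = set(keep)
    return {v: graph[v] & keep for v in keep}

def has_induced_p4(graph: Graph) -> bool:
    """Among graphs on 4 vertices, only the path P4 has sorted degrees [1, 1, 2, 2]."""
    for quad in combinations(graph, 4):
        members = set(quad)
        degrees = sorted(len(graph[v] & members) for v in quad)
        if degrees == [1, 1, 2, 2]:
            return True
    return False

def is_cograph(graph: Graph) -> bool:
    return not has_induced_p4(graph)

def is_near_cograph(graph: Graph) -> bool:
    """True if deleting some single vertex leaves a cograph."""
    return any(is_cograph(induced_subgraph(graph, set(graph) - {v})) for v in graph)
```

For example, the path on four vertices is not a cograph but is a near-cograph (delete an end vertex), while the 5-cycle is not even a near-cograph, because every single-vertex deletion leaves an induced P4.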

Subjects: Populations and Evolution, Discrete Mathematics, Combinatorics

Publish: 2025-02-12 19:36:38 UTC


#10 Persistent Sheaf Laplacian Analysis of Protein Flexibility

Authors: Nicole Hayes, Xiaoqi Wei, Hongsong Feng, Ekaterina Merkurjev, Guo-Wei Wei

Protein flexibility, measured by the B-factor or Debye-Waller factor, is essential for protein functions such as structural support, enzyme activity, cellular communication, and molecular transport. Theoretical analysis and prediction of protein flexibility are crucial for protein design, engineering, and drug discovery. In this work, we introduce the persistent sheaf Laplacian (PSL), an effective tool in topological data analysis, to model and analyze protein flexibility. By representing the local topology and geometry of protein atoms through the multiscale harmonic and non-harmonic spectra of PSLs, the proposed model effectively captures protein flexibility and provides accurate, robust predictions of protein B-factors. Our PSL model demonstrates an increase in accuracy of 32% compared to the classical Gaussian network model (GNM) in predicting B-factors for a dataset of 364 proteins. Additionally, we construct a blind machine learning prediction method utilizing global and local protein features. Extensive computations and comparisons validate the effectiveness of the proposed PSL model for B-factor predictions.
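For reference, the classical GNM baseline mentioned above predicts B-factors from the diagonal of the pseudo-inverse of a Kirchhoff (graph Laplacian) matrix built from residue contacts within a cut-off distance. The pure-Python sketch below assumes a connected contact graph and uses the identity Γ⁺ = (Γ + J/n)⁻¹ − J/n, with J the all-ones matrix, to avoid an eigendecomposition; it returns values only up to the proportionality constant that is normally fitted to experimental B-factors. The 7 Å default cut-off is an illustrative assumption, and this is the GNM baseline, not the PSL model.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def kirchhoff(coords: Sequence[Point], cutoff: float = 7.0) -> List[List[float]]:
    """Gamma_ij = -1 for contacts within the cut-off; diagonal = number of contacts."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma

def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (is the contact graph connected?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def gnm_fluctuations(coords: Sequence[Point], cutoff: float = 7.0) -> List[float]:
    """Relative mean-square fluctuations diag(Gamma^+), proportional to GNM B-factors."""
    gamma = kirchhoff(coords, cutoff)
    n = len(gamma)
    shifted = [[gamma[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    return [inv[i][i] - 1.0 / n for i in range(n)]
```

For three residues 3.8 Å apart on a line, only neighbouring residues are in contact and the fluctuations are 5/9, 2/9 and 5/9, so the chain ends come out as more flexible than the middle, as expected.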

Subjects: Biomolecules, Quantitative Methods

Publish: 2025-02-12 20:26:24 UTC


#11 ¹⁸F-FDG brain PET hypometabolism in post-SARS-CoV-2 infection: substrate for persistent/delayed disorders?

Authors: Eric Guedj, Matthieu Million, Pierre Dudouet, Hervé Tissot-Dupont, Fabienne Bregeon, Serge Cammilleri, Didier Raoult

Purpose: Several brain complications of SARS-CoV-2 infection have been reported. Moreover, it has been speculated that this neurotropism could cause a delayed outbreak of neuropsychiatric and neurodegenerative diseases of neuroinflammatory origin. A propagation mechanism has been proposed across the cribriform plate of the ethmoid bone, from the nose to the olfactory epithelium, and possibly afterward to other limbic structures and deeper parts of the brain, including the brainstem. Methods: Review of the clinical examination and whole-brain voxel-based analysis of ¹⁸F-FDG PET metabolism in comparison with healthy subjects (p-voxel < 0.001, p-cluster < 0.05, uncorrected) in two patients with a confirmed diagnosis of SARS-CoV-2 infection, explored at the post-viral stage of the disease. Results: Hypometabolism of the olfactory/rectus gyrus was found in both patients, especially in one with 4-week prolonged anosmia. Additional hypometabolism was found within the amygdala, hippocampus, parahippocampus, cingulate cortex, pre-/post-central gyrus, thalamus/hypothalamus, cerebellum, pons, and medulla in the other patient, who complained of a delayed-onset painful syndrome. Conclusion: These preliminary findings reinforce the hypotheses of SARS-CoV-2 neurotropism through the olfactory bulb and of a possible extension of this impairment to other brain structures. ¹⁸F-FDG PET hypometabolism could constitute a quantitative cerebral biomarker of this involvement. Post-viral cohort studies are required to specify the exact relationship between such hypometabolism and possible persistent disorders, especially cognitive or emotional disturbances, residual respiratory symptoms, or painful complaints.

Subject: Neurons and Cognition

Publish: 2025-02-13 08:50:03 UTC


#12 Data Sharing in the PRIMED Consortium: Design, implementation, and recommendations for future policymaking

Authors: Johanna L. Smith, Quenna Wong, Whitney Hornsby, Matthew P. Conomos, Benjamin D. Heavner, Iftikhar J. Kullo, Bruce M. Psaty, Stephen S. Rich, Bamidele Tayo, Pradeep Natarajan, Sara C. Nelson, Polygenic Risk Methods in Diverse Populations Consortium Data Sharing Working Group, Polygenic Risk Methods in Diverse Populations Consortium

Sharing diverse genomic and other biomedical datasets is critical to advance scientific discoveries and their equitable translation to improve human health. However, data sharing remains challenging in the context of legacy datasets, evolving policies, multi-institutional consortium science, and international stakeholders. The NIH-funded Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium was established to improve the performance of polygenic risk estimates for a broad range of health and disease outcomes with global impacts. Improving polygenic risk score performance across genetically diverse populations requires access to large, diverse cohorts. We report on the design and implementation of data sharing policies and procedures developed in PRIMED to aggregate and analyze data from multiple, heterogeneous sources while adhering to existing data sharing policies for each integrated dataset. We describe two primary data sharing mechanisms, coordinated dbGaP applications and a Consortium Data Sharing Agreement, and provide alternatives for when individual-level data cannot be shared within the Consortium (e.g., federated analyses). We also describe the technical implementation of Consortium data sharing in the NHGRI Analysis Visualization and Informatics Lab-space (AnVIL) cloud platform, used to share derived individual-level data, genomic summary results, and methods workflows with appropriate permissions. As a Consortium making secondary use of pre-existing data sources, we also discuss challenges and propose solutions for releasing individual- and summary-level data products to the broader scientific community. We make recommendations for ongoing and future policymaking with the goal of informing future consortia and other research activities.

Subject: Other Quantitative Biology

Publish: 2025-02-12 17:31:39 UTC


#13 Trajectory Inference for Single Cell Omics

Authors: Alexandre Hutton, Jesse G. Meyer

Trajectory inference is used to order single-cell omics data along a path that reflects a continuous transition between cells. This approach is useful for studying processes like cell differentiation, where a stem cell matures into a specialized cell type, or investigating state changes in pathological conditions. In the current article, we provide a general introduction to trajectory inference, explaining the concepts and assumptions underlying the different methods. We then briefly discuss the strengths and weaknesses of different trajectory inference methods. We also describe best practices for using trajectory inference, such as how to validate the results and how to interpret them in the context of biological knowledge. Finally, we discuss applications of trajectory inference in single-cell omics research, including the study of cell differentiation, development, and disease, and provide examples of how trajectory inference has yielded new insights into these processes.
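One common family of trajectory inference methods orders cells along a minimum spanning tree (MST) built on a low-dimensional embedding and measures pseudotime as the tree-path distance from a chosen root cell. The sketch below shows that idea with Prim's algorithm on Euclidean distances. It assumes the cells are already embedded (for example after PCA), that the root cell is known, and that the trajectory is tree-shaped; it is one illustrative approach, not a specific published tool.

```python
import heapq
from math import dist
from typing import Dict, List, Sequence, Tuple

def mst_edges(points: Sequence[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on the complete graph with Euclidean edge weights."""
    n = len(points)
    in_tree = [False] * n
    edges = []
    heap = [(0.0, 0, -1)]  # (edge weight, cell, parent cell)
    while heap:
        w, v, parent = heapq.heappop(heap)
        if in_tree[v]:
            continue
        in_tree[v] = True
        if parent >= 0:
            edges.append((parent, v, w))
        for u in range(n):
            if not in_tree[u]:
                heapq.heappush(heap, (dist(points[v], points[u]), u, v))
    return edges

def pseudotime(points: Sequence[Sequence[float]], root: int) -> List[float]:
    """Distance from the root cell along the MST, used as pseudotime."""
    adjacency: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(len(points))}
    for a, b, w in mst_edges(points):
        adjacency[a].append((b, w))
        adjacency[b].append((a, w))
    times = [float("inf")] * len(points)
    times[root] = 0.0
    stack = [root]
    while stack:  # tree traversal: each cell is reached exactly once
        v = stack.pop()
        for u, w in adjacency[v]:
            if times[u] == float("inf"):
                times[u] = times[v] + w
                stack.append(u)
    return times
```

Validation, as the abstract stresses, still matters: the ordering depends on the embedding, the distance, and the choice of root, so pseudotime should be checked against known marker genes.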

Subjects: Quantitative Methods, Genomics, Molecular Networks

Publish: 2025-02-13 14:19:33 UTC


#14 Inverse problems with experiment-guided AlphaFold

Authors: Advaith Maddipatla, Nadav Bojan Sellam, Meital Bojan, Sanketh Vedula, Paul Schanda, Ailie Marx, Alex M. Bronstein

Proteins exist as a dynamic ensemble of multiple conformations, and these motions are often crucial for their functions. However, current structure prediction methods predominantly yield a single conformation, overlooking the conformational heterogeneity revealed by diverse experimental modalities. Here, we present a framework for building experiment-grounded protein structure generative models that infer conformational ensembles consistent with measured experimental data. The key idea is to treat state-of-the-art protein structure predictors (e.g., AlphaFold3) as sequence-conditioned structural priors, and to cast ensemble modeling as posterior inference of protein structures given experimental measurements. Through extensive real-data experiments, we demonstrate that our method generalizes to a variety of experimental measurements. In particular, our framework uncovers previously unmodeled conformational heterogeneity from crystallographic densities, and generates high-accuracy NMR ensembles orders of magnitude faster than the status quo. Notably, our ensembles outperform AlphaFold3 and sometimes fit experimental data better than structures publicly deposited in the Protein Data Bank (PDB). We believe that this approach will enable predictive models that fully embrace experimentally observed conformational diversity.
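The posterior-inference framing (a prior from a structure predictor, a likelihood from experimental data) can be illustrated with a deliberately simple swap-in technique: importance reweighting of candidate structures sampled from the prior. In the sketch below each candidate conformation is reduced to a single hypothetical predicted observable, the measurement error is assumed Gaussian with standard deviation sigma, and the resulting weights form the posterior over the candidates. The paper's own way of guiding AlphaFold3 is not reproduced here.

```python
from math import exp
from typing import List

def posterior_weights(predicted: List[float], measured: float, sigma: float) -> List[float]:
    """Normalised weights w_i proportional to exp(-(y_i - y_obs)^2 / (2 sigma^2)).

    Candidates are assumed to be equally weighted draws from the prior, so each
    posterior weight is proportional to the candidate's Gaussian likelihood.
    """
    log_lik = [-((y - measured) ** 2) / (2.0 * sigma ** 2) for y in predicted]
    top = max(log_lik)  # subtract the maximum for numerical stability
    unnorm = [exp(l - top) for l in log_lik]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def effective_sample_size(weights: List[float]) -> float:
    """Kish effective sample size 1 / sum(w_i^2); small values mean few candidates fit the data."""
    return 1.0 / sum(w * w for w in weights)
```

A low effective sample size signals that the prior rarely proposes structures compatible with the data, which is one motivation for steering the generator with the measurements directly rather than only reweighting its samples afterwards.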

Subject: Biomolecules

Publish: 2025-02-13 14:38:53 UTC


#15 Conditional success of adaptive therapy: the role of treatment-pausing thresholds revealed by mathematical modeling

Authors: Lanfei Sun, Haifeng Zhang, Kai Kang, Xiaoxin Wang, Leyi Zhang, Yanan Cai, Changjing Zhuge, Lei Zhang

Adaptive therapy (AT) improves cancer treatment by controlling the competition between sensitive and resistant cells via treatment holidays. This study highlights the critical role of treatment-pausing thresholds in AT for tumors composed of drug-sensitive and resistant cells. Using a Lotka-Volterra model, the study compares AT with maximum tolerated dose therapy and intermittent therapy, showing that AT's success depends largely on the thresholds at which treatment is paused and resumed, as well as on the competition between sensitive and resistant cells. Three scenarios are identified when comparing AT with the other therapies (uniform-decline, conditional-improve, and uniform-improve), illustrating that optimizing the treatment-pausing threshold is crucial for AT effectiveness. Tumor composition, including the initial tumor burden and the proportion of resistant cells, influences outcomes. Adjusting threshold values enables AT to suppress resistant subclones while preserving sensitive cells, ultimately improving progression-free survival. These findings underscore the importance of personalized strategies for managing cancer and improving long-term therapeutic outcomes.
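The comparison can be reproduced in spirit with a two-population competitive Lotka-Volterra model. The sketch below is a hedged illustration, not the authors' parameterisation: sensitive cells S and resistant cells R share a carrying capacity K, the drug adds a death term to S only, maximum tolerated dose (MTD) treats continuously, and AT pauses when the total burden falls to a fraction pause_fraction of its initial value and resumes once it regrows to the initial value. All rates, competition coefficients and thresholds are assumptions.

```python
from typing import Dict, List, Optional

def simulate(strategy: str, s0=0.7, r0=0.01, rs=0.035, rr=0.027, kill=0.05,
             a_sr=1.0, a_rs=1.0, capacity=1.0, pause_fraction=0.5,
             t_end=1000.0, dt=0.1) -> Dict[str, List[float]]:
    """Forward-Euler integration of
        dS/dt = rs S (1 - (S + a_sr R)/K) - kill u(t) S
        dR/dt = rr R (1 - (R + a_rs S)/K)
    with u(t) = 1 while treating and 0 during a treatment holiday."""
    if strategy not in ("MTD", "AT"):
        raise ValueError("strategy must be 'MTD' or 'AT'")
    s, r = s0, r0
    n0 = s0 + r0
    treating = True
    history: Dict[str, List[float]] = {"S": [], "R": [], "dose": []}
    for _ in range(int(round(t_end / dt))):
        total = s + r
        if strategy == "AT":
            if treating and total <= pause_fraction * n0:
                treating = False  # pause: burden has fallen to the lower threshold
            elif not treating and total >= n0:
                treating = True   # resume: burden is back to its initial level
        u = 1.0 if treating else 0.0
        ds = rs * s * (1 - (s + a_sr * r) / capacity) - kill * u * s
        dr = rr * r * (1 - (r + a_rs * s) / capacity)
        s, r = s + dt * ds, r + dt * dr
        history["S"].append(s)
        history["R"].append(r)
        history["dose"].append(u)
    return history

def time_to_progression(history: Dict[str, List[float]], n0: float,
                        dt: float = 0.1, factor: float = 1.2) -> Optional[float]:
    """First time the total burden exceeds factor * n0 (None if it never does)."""
    for i, (s, r) in enumerate(zip(history["S"], history["R"])):
        if s + r > factor * n0:
            return (i + 1) * dt
    return None
```

Comparing time_to_progression for the two strategies while varying pause_fraction and the competition coefficient a_rs is one way to explore the threshold dependence the abstract describes; whether AT wins in this toy setting depends on those assumed values.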

Subject: Other Quantitative Biology

Publish: 2025-02-13 15:09:28 UTC