Quantitative Biology

Date: Thu, 9 May 2024 | Total: 17

#1 What doesn't kill Gaia makes her stronger

Authors: Rudy Arthur ; Arwen E. Nicholson ; Nathan J. Mayne

Life on Earth has experienced numerous upheavals over its approximately 4 billion year history. In previous work we have discussed how interruptions to stability lead, on average, to increases in habitability over time, a tendency we called Entropic Gaia. Here we continue this exploration, working with the Tangled Nature Model of co-evolution, to understand how the evolutionary history of life is shaped by periods of acute environmental stress. We find that while these periods of stress pose a risk of complete extinction, they also create opportunities for evolutionary exploration which would otherwise be impossible, leading to more populous and stable states among the survivors than in alternative histories without a stress period. We also study how the duration, repetition and number of refugia into which life escapes during the perturbation affect the final outcome. The model results are discussed in relation to both Earth history and the search for alien life.

#2 Impact of phylogeny on the inference of functional sectors from protein sequence data

Authors: Nicola Dietler ; Alia Abbara ; Subham Choudhury ; Anne-Florence Bitbol

Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated with functional properties. Modeling showed that natural selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.

#3 GP-MoLFormer: A Foundation Model For Molecular Generation

Authors: Jerret Ross ; Brian Belgodere ; Samuel C. Hoffman ; Vijil Chenthamarakshan ; Youssef Mroueh ; Payel Das

Transformer-based models trained on large and general purpose datasets consisting of molecular strings have recently emerged as a powerful tool for successfully modeling various structure-property relations. Inspired by this success, we extend the paradigm of training chemical language transformers on large-scale chemical datasets to generative tasks in this work. Specifically, we propose GP-MoLFormer, an autoregressive molecular string generator that is trained on more than 1.1B chemical SMILES. GP-MoLFormer uses a 46.8M parameter transformer decoder model with linear attention and rotary positional encodings as the base architecture. We explore the utility of GP-MoLFormer in generating novel, valid, and unique SMILES. Impressively, we find GP-MoLFormer is able to generate a significant fraction of novel, valid, and unique SMILES even when the number of generated molecules is in the 10 billion range and the reference set is over a billion. We also find strong memorization of training data in GP-MoLFormer generations, which has so far remained unexplored for chemical language models. Our analyses reveal that training data memorization and novelty in generations are impacted by the quality of the training data; duplication bias in training data can enhance memorization at the cost of lowering novelty. We evaluate GP-MoLFormer's utility and compare it with that of existing baselines on three different tasks: de novo generation, scaffold-constrained molecular decoration, and unconstrained property-guided optimization. While the first two are handled with no additional training, we propose a parameter-efficient fine-tuning method for the last task, which uses property-ordered molecular pairs as input. We call this new approach pair-tuning. Our results show GP-MoLFormer performs better than or comparably to baselines across all three tasks, demonstrating its general utility.
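
The abstract does not define how "novel, valid, and unique" are measured. The sketch below uses definitions that are common in molecular generation benchmarks and are assumed here, not taken from the paper: validity is the fraction of generated strings that parse, uniqueness is the fraction of valid strings that are distinct, and novelty is the fraction of distinct valid strings absent from the training set. The function name `generation_metrics` and the `is_valid` predicate are hypothetical; in practice the predicate would wrap a cheminformatics parser, and all strings are assumed to be canonicalized already.

```python
def generation_metrics(generated, training_set, is_valid):
    """Return validity, uniqueness and novelty fractions for generated strings.

    generated: list of (assumed canonical) molecular strings
    training_set: iterable of (assumed canonical) training strings
    is_valid: hypothetical caller-supplied predicate, str -> bool
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)                   # distinct valid strings
    novel = unique - set(training_set)    # distinct valid strings not seen in training
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


# Toy usage with a stand-in validity check (an assumption for illustration only).
toy_metrics = generation_metrics(
    ["CCO", "CCO", "C1CC1", "XX"],
    {"CCO"},
    lambda s: s != "XX",
)
```

Under these definitions, duplicated training data that encourages memorization would show up as a lower novelty fraction.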

#4 Gradient sensing limit of a cell when controlling the elongating direction

Authors: Kento Nakamura ; Tetsuya J. Kobayashi

Eukaryotic cells perform chemotaxis by determining the direction of chemical gradients based on stochastic sensing of concentrations at the cell surface. To examine the efficiency of this process, previous studies have investigated the limit of estimation accuracy for gradients. However, these studies assume that the cell shape and gradient are constant, and do not consider how adaptive regulation of cell shape affects the estimation limit. Cell shape dynamics during gradient sensing are biologically ubiquitous and can influence the estimation by altering the way the concentration is measured, and cells may strategically regulate their shape to improve estimation accuracy. To address this gap, we investigate the estimation limits in dynamic situations where cells change shape adaptively depending on the sensed signal. We approach this problem by analyzing the stationary solution of the Bayesian nonlinear filtering equation. By applying diffusion approximation to the ligand-receptor binding process and the Laplace method for the posterior expectation under a high signal-to-noise ratio regime, we obtain an analytical expression for the estimation limit. This expression indicates that estimation accuracy can be improved by elongating perpendicular to the estimated direction, which is also confirmed by numerical simulations. Our analysis provides a basis for clarifying the interplay between estimation and control in gradient sensing and sheds light on how cells optimize their shape to enhance chemotactic efficiency.

#5 Quantifying Smooth Muscles Regional Organization in the Rat Bladder Using Immunohistochemistry, Multiphoton Microscopy and Machine Learning

Authors: Alireza Asadbeygi ; Yasutaka Tobe ; Naoki Yoshimura ; Sean D. Stocker ; Simon Watkins ; Paul Watton ; Anne M. Robertson

The smooth muscle bundles (SMBs) in the bladder act as contractile elements which enable the bladder to void effectively. In contrast to skeletal muscles, these bundles are not highly aligned; rather, they are oriented more heterogeneously throughout the bladder wall. In this work, for the first time, this regional orientation of the SMBs is quantified across the whole bladder, without the need for optical clearing or cryosectioning. Immunohistochemistry staining was utilized to visualize smooth muscle cell actin in multiphoton microscopy (MPM) images of bladder SMBs. Feature vectors for each pixel were generated using a range of filters, including Gaussian blur, Gaussian gradient magnitude, Laplacian of Gaussian, Hessian eigenvalues, structure tensor eigenvalues, Gabor, and Sobel gradients. A Random Forest classifier was subsequently trained to automate the segmentation of SMBs in the MPM images. Finally, the orientation of SMBs in each bladder region was quantified using the CT-FIRE package. This information is essential for biomechanical models of the bladder that include contractile elements.

#6 The Canadian VirusSeq Data Portal & Duotang: open resources for SARS-CoV-2 viral sequences and genomic epidemiology

Authors: Erin E. Gill ; Baofeng Jia ; Carmen Lia Murall ; Raphaël Poujol ; Muhammad Zohaib Anwar ; Nithu Sara John ; Justin Richardsson ; Ashley Hobb ; Abayomi S. Olabode ; Alexandru Lepsa ; Ana T. Duggan ; Andrea D. Tyler ; Arnaud N'Guessan ; Atul Kachru ; Brandon Chan ; Catherine Yoshida ; Christina K. Yung ; David Bujold ; Dusan Andric ; Edmund Su ; Emma J. Griffiths ; Gary Van Domselaar ; Gordon W. Jolly ; Heather K. E. Ward ; Henrich Feher ; Jared Baker ; Jared T. Simpson ; Jaser Uddin ; Jiannis Ragoussis ; Jon Eubank ; Jörg H. Fritz ; José Héctor Gálvez ; Karen Fang ; Kim Cullion ; Leonardo Rivera ; Linda Xiang ; Matthew A. Croxen ; Mitchell Shiell ; Natalie Prystajecky ; Pierre-Olivier Quirion ; Rosita Bajari ; Samantha Rich ; Samira Mubareka ; Sandrine Moreira ; Scott Cain ; Steven G. Sutcliffe ; Susanne A. Kraemer ; Yann Joly ; Yelizar Alturmessov ; CPHLN consortium ; CanCOGeN consortium ; VirusSeq Data Portal Academic ; Health network ; Marc Fiume ; Terrance P. Snutch ; Cindy Bell ; Catalina Lopez-Correa ; Julie G. Hussin ; Jeffrey B. Joy ; Caroline Colijn ; Paul M. K. Gordon ; William W. L. Hsiao ; Art F. Y. Poon ; Natalie C. Knox ; Mélanie Courtot ; Lincoln Stein ; Sarah P. Otto ; Guillaume Bourque ; B. Jesse Shapiro ; Fiona S. L. Brinkman

The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). The Portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. Here we also highlight Duotang, a web platform that presents genomic epidemiology and modeling analyses on circulating and emerging SARS-CoV-2 variants in Canada. Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the Portal (COVID-MVP, CoVizu), are all open-source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.

#7 Lipid-mediated hydrophobic gating in the BK potassium channel

Authors: Lucia Coronel ; Giovanni Di Muccio ; Brad Rothberg ; Alberto Giacomello ; Vincenzo Carnevale

The large-conductance, calcium-activated potassium (BK) channel lacks the typical intracellular bundle-crossing gate present in most ion channels of the 6TM family. This observation, initially inferred from Ca$^{2+}$-free-pore accessibility experiments and recently corroborated by a CryoEM structure of the non-conductive state, raises a puzzling question: how can gating occur in the absence of steric hindrance? To answer this question, we carried out molecular simulations and accurate free energy calculations to obtain a microscopic picture of the sequence of events that, starting from a Ca$^{2+}$-free state, leads to ion conduction upon Ca$^{2+}$ binding. Our results highlight an unexpected role for annular lipids, which turn out to be an integral part of the gating machinery. Due to the presence of fenestrations, the "closed" Ca$^{2+}$-free pore can be occupied by the methyl groups from the lipid alkyl chains. This dynamic occupancy triggers and stabilizes the nucleation of a vapor bubble in the inner pore cavity, thus hindering ion conduction. By contrast, Ca$^{2+}$ binding results in a displacement of these lipids outside the inner cavity, lowering the hydrophobicity of this region and thus allowing for pore hydration and conduction. This lipid-mediated hydrophobic gating rationalizes several seemingly problematic experimental observations, including the state-dependent pore accessibility of blockers.

#8 Determining cell population size from cell fraction in cell plasticity models

Authors: Yuman Wang ; Shuli Chen ; Jie Hu ; Da Zhou

Quantifying the size of cell populations is crucial for understanding biological processes such as growth, injury repair, and disease progression. Often, experimental data offer information in the form of relative frequencies of distinct cell types, rather than absolute cell counts. This emphasizes the need to devise effective strategies for estimating absolute cell quantities from fraction data. In response to this challenge, we present two computational approaches grounded in stochastic cell population models: the first-order moment method (FOM) and the second-order moment method (SOM). These methods explicitly establish mathematical mappings from cell fraction to cell population size using moment equations of the stochastic models. Notably, our investigation demonstrates that the SOM method obviates the requirement for a priori knowledge of the initial population size, highlighting the utility of incorporating variance details from cell proportions. The robustness of both the FOM and SOM methods was analyzed from different perspectives. Additionally, we extended the application of the FOM and SOM methods to various biological mechanisms within the context of cell plasticity models. Our methodologies not only assist in mitigating the inherent limitations of experimental techniques when only fraction data is available for detecting cell population size, but they also offer new insights into utilizing the stochastic characteristics of cell population dynamics to quantify interactions between different biomasses within the system.

#9 Exploring a Cognitive Architecture for Learning Arithmetic Equations

Author: Cole Gawin

The acquisition and performance of arithmetic skills and basic operations such as addition, subtraction, multiplication, and division are essential for daily functioning, and reflect complex cognitive processes. This paper explores the cognitive mechanisms powering arithmetic learning, presenting a neurobiologically plausible cognitive architecture that simulates the acquisition of these skills. I implement a number vectorization embedding network and an associative memory model to investigate how an intelligent system can learn and recall arithmetic equations in a manner analogous to the human brain. I perform experiments that provide insights into the generalization capabilities of connectionist models, neurological causes of dyscalculia, and the influence of network architecture on cognitive performance. Through this interdisciplinary investigation, I aim to contribute to ongoing research into the neural correlates of mathematical cognition in intelligent systems.

#10 Consciousness Driven Spike Timing Dependent Plasticity

Authors: Sushant Yadav ; Santosh Chaudhary ; Rajesh Kumar

Spiking Neural Networks (SNNs), recognized for their biological plausibility and energy efficiency, employ sparse and asynchronous spikes for communication. However, the training of SNNs encounters difficulties arising from non-differentiable activation functions and the movement of spike-based inter-layer data. Spike-Timing Dependent Plasticity (STDP), inspired by neurobiology, plays a crucial role in SNN learning, but it still lacks the conscious part of the brain used for learning. Considering this issue, this research work proposes a Consciousness Driven STDP (CD-STDP), an improved solution addressing inherent limitations observed in conventional STDP models. CD-STDP, designed to infuse the conscious part as coefficients of long-term potentiation (LTP) and long-term depression (LTD), exhibits a dynamic nature. The model connects LTP and LTD coefficients to the current and past states of synaptic activity, respectively, enhancing consciousness and adaptability. This consciousness empowers the model to effectively learn while understanding the input patterns. The conscious coefficient adjustment in response to current and past synaptic activity extends the model's conscious and other cognitive capabilities, offering a refined and efficient approach for real-world applications. Evaluations on MNIST, FashionMNIST and CALTECH datasets showcase CD-STDP's remarkable accuracy of 98.6%, 85.61% and 99.0%, respectively, in a single hidden layer SNN. In addition, analysis of conscious elements and consciousness of the proposed model on SNN is performed.
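
For orientation, the conventional pair-based STDP rule that the abstract treats as its baseline can be sketched as follows. This is the textbook exponential-window form, not the authors' CD-STDP: the amplitudes `a_plus` and `a_minus`, the time constants `tau_plus` and `tau_minus`, and the function names are all assumed for illustration. The coefficients are fixed constants here, whereas the abstract describes LTP and LTD coefficients that depend on current and past synaptic activity.

```python
import math


def stdp_delta_w(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre (milliseconds).

    All default values are illustrative assumptions.
    """
    if dt_ms > 0:
        # Pre fires before post: potentiation (LTP), decaying with the time gap.
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:
        # Post fires before pre: depression (LTD), decaying with the time gap.
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


def apply_stdp(w, pre_spikes, post_spikes, w_min=0.0, w_max=1.0):
    """Sum the updates over all pre/post spike pairs, then clip once to [w_min, w_max]."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            w += stdp_delta_w(t_post - t_pre)
    return min(max(w, w_min), w_max)


# Toy usage: two presynaptic and two postsynaptic spike times in milliseconds.
w_new = apply_stdp(0.5, pre_spikes=[0.0, 10.0], post_spikes=[5.0, 15.0])
```

An activity-dependent variant would replace the constant `a_plus` and `a_minus` with functions of synaptic state; the abstract does not give their form, so none is assumed here.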

#11 Is artificial consciousness achievable? Lessons from the human brain

Authors: Michele Farisco ; Kathinka Evers ; Jean-Pierre Changeux

We here analyse the question of developing artificial consciousness from an evolutionary perspective, taking the evolution of the human brain and its relation with consciousness as a reference model. This kind of analysis reveals several structural and functional features of the human brain that appear to be key for reaching human-like complex conscious experience and that current research on Artificial Intelligence (AI) should take into account in its attempt to develop systems capable of conscious processing. We argue that, even if AI is limited in its ability to emulate human consciousness for both intrinsic (structural and architectural) and extrinsic (related to the current stage of scientific and technological knowledge) reasons, taking inspiration from those characteristics of the brain that make conscious processing possible and/or modulate it is a potentially promising strategy towards developing conscious AI. Also, it is theoretically possible that AI research can develop partial or potentially alternative forms of consciousness that are qualitatively different from the human, and that may be either more or less sophisticated depending on the perspectives. Therefore, we recommend neuroscience-inspired caution in talking about artificial consciousness: since the use of the same word consciousness for humans and AI becomes ambiguous and potentially misleading, we propose to clearly specify what is common and what differs in AI conscious processing from full human conscious experience.

#12 Motion Capture Analysis of Verb and Adjective Types in Austrian Sign Language

Authors: Julia Krebs ; Evie Malaia ; Ronnie B. Wilbur ; Isabella Fessl ; Hans-Peter Wiesinger ; Hermann Schwameder ; Dietmar Roehm

Across a number of sign languages, temporal and spatial characteristics of dominant hand articulation are used to express semantic and grammatical features. In this study of Austrian Sign Language (Österreichische Gebärdensprache, or ÖGS), motion capture data of four Deaf signers is used to quantitatively characterize the kinematic parameters of sign production in verbs and adjectives. We investigate (1) the difference in production between verbs involving a natural endpoint (telic verbs; e.g. arrive) and verbs lacking an endpoint (atelic verbs; e.g. analyze), and (2) adjective signs in intensified vs. non-intensified (plain) forms. Motion capture data analysis using linear mixed-effects models (LME) indicates that both endpoint marking in verbs and marking of intensification in adjectives are expressed by movement modulation in ÖGS. While the semantic distinction between verb types (telic/atelic) is marked by higher peak velocity and shorter duration for telic signs compared to atelic ones, the grammatical distinction (intensification) in adjectives is expressed by longer duration for intensified compared to non-intensified adjectives. The observed individual differences of signers might be interpreted as personal signing style.

#13 Stochastic spatial Lotka-Volterra predator-prey models

Author: Uwe C. Täuber

Stochastic, spatially extended models for predator-prey interaction display spatio-temporal structures that are not captured by the Lotka-Volterra mean-field rate equations. These spreading activity fronts reflect persistent correlations between predators and prey that can be analyzed through field-theoretic methods. Introducing local restrictions on the prey population induces a predator extinction threshold, with the critical dynamics at this continuous active-to-absorbing state transition governed by the scaling exponents of directed percolation. Novel features in biologically motivated model variants include the stabilizing effect of a periodically varying carrying capacity that describes seasonally oscillating resource availability; enhanced mean species densities and local fluctuations caused by spatially varying reaction rates; and intriguing evolutionary dynamics emerging when variable interaction rates are affixed to individuals combined with trait inheritance to their offspring. The basic susceptible-infected-susceptible and susceptible-infected-recovered models for infectious disease spreading near their epidemic thresholds are respectively captured by the directed and dynamic isotropic percolation universality classes. Systems with three cyclically competing species akin to spatial rock-paper-scissors games may display striking spiral patterns, yet conservation laws can prevent such noise-induced structure formation. In diffusively coupled inhomogeneous settings, one may observe the stabilization of vulnerable ecologies prone to finite-size extinction or fixation due to immigration waves emanating from the interfaces.
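
As a point of reference, the Lotka-Volterra mean-field rate equations mentioned in this abstract can be integrated numerically. The sketch below uses the classical two-species form with a fixed-step fourth-order Runge-Kutta scheme in plain Python. The parameter names (`prey_growth`, `predation`, `conversion`, `predator_death`) and their values are illustrative assumptions, not taken from the paper. These deterministic equations conserve a quantity along every trajectory, so they yield neutral cycles and cannot produce the spreading activity fronts seen in the stochastic spatial models.

```python
import math


def lv_rhs(x, y, prey_growth, predation, conversion, predator_death):
    """Mean-field rates for prey density x and predator density y."""
    dx = prey_growth * x - predation * x * y
    dy = conversion * x * y - predator_death * y
    return dx, dy


def rk4_step(x, y, dt, params):
    """Advance (x, y) by one classical Runge-Kutta step of size dt."""
    k1x, k1y = lv_rhs(x, y, *params)
    k2x, k2y = lv_rhs(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, *params)
    k3x, k3y = lv_rhs(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, *params)
    k4x, k4y = lv_rhs(x + dt * k3x, y + dt * k3y, *params)
    x_new = x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
    y_new = y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
    return x_new, y_new


def conserved_quantity(x, y, prey_growth, predation, conversion, predator_death):
    """First integral of the mean-field equations; constant along exact trajectories."""
    return (conversion * x - predator_death * math.log(x)
            + predation * y - prey_growth * math.log(y))


def simulate(x0, y0, params, dt=0.001, steps=20000):
    """Return the list of (x, y) states visited, including the initial state."""
    trajectory = [(x0, y0)]
    x, y = x0, y0
    for _ in range(steps):
        x, y = rk4_step(x, y, dt, params)
        trajectory.append((x, y))
    return trajectory


# Illustrative parameter values (assumed): growth, predation, conversion, death.
example_params = (1.0, 0.5, 0.2, 0.6)
example_trajectory = simulate(2.0, 1.0, example_params)
```

Comparing `conserved_quantity` at the start and end of a run is a quick check on the integrator: any visible drift is numerical error, not dynamics.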

#14 Predicting the binding of small molecules to proteins through invariant representation of the molecular structure

Authors: R. Beccaria ; A. Lazzeri ; G. Tiana

We present a computational scheme for predicting the ligands that bind to a pocket of known structure. It is based on the generation of a general abstract representation of the molecules, which is invariant to rotations, translations and permutations of atoms, and has some degree of isometry with the space of conformations. We use these representations to train a non-deep machine learning algorithm to classify the binding between pockets and molecule pairs, and show that this approach has a better generalization capability than existing methods.

#15 Clustering and spatial distribution of mitochondria in dendritic trees

Authors: Mario Hidalgo-Soria ; Elena F. Koslover

Neuronal dendrites form densely branched tree architectures through which mitochondria must be distributed to supply the cell's energetic needs. Dendritic mitochondria circulate through the tree, undergoing fusion and fission to form clusters of varying sizes. We present a mathematical model for the distribution of such actively-driven particles in a branched geometry. Our model demonstrates that 'balanced' trees (wherein cross-sectional area is conserved across junctions and thicker branches support more bushy subtrees) enable symmetric yet distally enriched particle distributions and promote dispersion into smaller clusters. These results highlight the importance of tree architecture and radius-dependent fusion in governing the distribution of neuronal mitochondria.

#16 ACEGEN: Reinforcement learning of generative chemical agents for drug discovery

Authors: Albert Bou ; Morgan Thomas ; Sebastian Dittert ; Carles Navarro Ramírez ; Maciej Majewski ; Ye Wang ; Shivam Patel ; Gary Tresadern ; Mazen Ahmad ; Vincent Moens ; Woody Sherman ; Simone Sciabola ; Gianni De Fabritiis

In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capability, flexibility, and reliability remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we introduce ACEGEN, a comprehensive and streamlined toolkit tailored for generative drug design, built using TorchRL, a modern decision-making library that offers efficient and thoroughly tested reusable components. ACEGEN provides a robust, flexible, and efficient platform for molecular design. We validate its effectiveness by benchmarking it across various algorithms and conducting multiple drug discovery case studies. ACEGEN is accessible at https://github.com/acellera/acegen-open.

#17 Metabolism, information, and viability in a simulated physically-plausible protocell

Authors: Kristoffer R. Thomsen ; Artemy Kolchinsky ; Steen Rasmussen

Critical experimental design issues connecting energy transduction and inheritable information within a protocell are explored and elucidated. The protocell design utilizes a photo-driven energy transducer (a ruthenium complex) to turn resource molecules into building blocks, in a manner that is modulated by a combinatorial DNA-based co-factor. This co-factor molecule serves as part of an electron relay for the energy transduction mechanism, where the charge-transport rates depend on the sequence that contains an oxo-guanine. The co-factor also acts as a store of inheritable information due to its ability to replicate non-enzymatically through template-directed ligation. Together, the energy transducer and the co-factor act as a metabolic catalyst that produces co-factor DNA building blocks as well as fatty acids (from picolinium ester and modified DNA oligomers), where the fatty acids self-assemble into vesicles on whose exterior surface both the co-factor (DNA) and the energy transducer are anchored with hydrophobic tails. Here we use simulations to study how the co-factor sequence determines its fitness as reflected by charge transfer and replication rates. To estimate the impact on the protocell, we compare these rates with previously measured metabolic rates from a similar system where the charge transfer is directly between the ruthenium complex and the oxo-guanine (without DNA replication and charge transport). Replication and charge transport turn out to have different and often opposing sequence requirements. Functional information of the co-factor molecules is used to probe the feasibility of randomly picking co-factor sequences from a limited population of co-factor molecules, where a good co-factor can enhance both metabolic biomass production and its own replication rate.