PLOS Computational Biology: New Articles

FKSUDDAPre: A drug–disease association prediction framework based on F-TEST feature selection and AMDKSU resampling with interpretability analysis

2026-02-05T14:00:00Z

by Yun Zuo, Chenyi Zhang, Ge Hua, Qiao Ning, Xiangrong Liu, Xiangxiang Zeng, Zhaohong Deng

In drug discovery and therapeutic research, the prediction of drug–disease associations (DDAs) holds significant scientific and clinical value. Drug molecules exert their effects by recognizing disease-related biological targets, and their action spans the entire pharmacological process, from absorption, distribution, and metabolism to final efficacy. Accurate prediction of drug–disease associations not only facilitates an in-depth understanding of the molecular mechanisms of drug action but also provides a critical theoretical foundation for drug repositioning and personalized medicine. While traditional prediction methods based on in vitro experiments and clinical statistics yield reliable results, they suffer from inherent drawbacks such as long development cycles, substantial resource consumption, and low throughput. In contrast, emerging machine learning techniques offer a promising solution to these bottlenecks, enabling the efficient discovery of potential drug–disease association networks and improving drug development efficiency. However, existing machine learning methods still face significant challenges in practical applications: complex feature construction raises the threshold for data processing, data sparsity limits the depth of information mining, and pervasive sample imbalance undermines the model’s predictive accuracy and generalization performance. In this study, we developed an efficient and accurate framework for drug–disease association prediction named FKSUDDAPre. The model employs a multi-modal feature fusion strategy: on the one hand, it leverages an ensemble of Mol2vec and K-BERT to capture the semantic features of drug molecular fingerprints; on the other hand, it integrates Medical Subject Headings (MeSH) with DeepWalk to reduce the dimensionality of disease features while preserving their relational structure.
To address class imbalance, FKSUDDAPre incorporates a resampling algorithm called AMDKSU, which combines clustering with an improved distance metric to enhance the discriminative power of the sample set. For data processing, the F-test is used to rank feature importance, reducing data dimensionality and improving model generalization. For prediction, FKSUDDAPre uses a novel ensemble of XGBoost, Decision Tree, Random Forest, and HyperFast. Through a dynamic weight allocation strategy, the ensemble harnesses the complementary strengths of these models to achieve enhanced predictive performance. Rigorous validation demonstrated strong performance across multiple evaluation metrics, with an average AUC of 0.9725, an improvement of approximately 3.88% over the best-performing baseline model. In predictions for Alzheimer’s disease and Parkinson’s disease, 80% and 60%, respectively, of the top 10 candidate drugs recommended by FKSUDDAPre have been confirmed in the literature, demonstrating the model’s practical potential. Furthermore, we conducted a LIME-based feature importance analysis of the model’s predictions, visualizing the correlations between features and the target variable to demonstrate the model’s interpretability. A cross-platform, user-friendly visualization tool has also been developed using the PyQt5 framework.
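The F-test ranking step can be illustrated with the one-way ANOVA F statistic, computed per feature against a binary associated/non-associated label: features whose class means differ strongly relative to their within-class spread score highest. This is a minimal sketch assuming FKSUDDAPre uses the standard ANOVA F-test; the function names `anova_f` and `top_k_features` and the toy data are hypothetical.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    grand = mean(values)
    # Between-group sum of squares: spread of class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of samples around their own class mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups.values())
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(X, labels, k):
    """Rank the columns of X (a list of sample rows) by F score and keep the top k."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k], scores


# Toy data (hypothetical): 4 drug-disease pairs, 3 features, label 1 = known association.
X = [[1.0, 5.0, 0.3], [1.2, 4.0, 0.1], [3.0, 5.5, 0.2], [3.1, 4.5, 0.4]]
labels = [0, 0, 1, 1]
selected, scores = top_k_features(X, labels, k=1)
print(selected)  # [0]: feature 0 separates the two classes best
```

With scikit-learn available, `SelectKBest(f_classif, k=...)` from `sklearn.feature_selection` computes the same per-feature ANOVA F statistic on a NumPy array.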

Modeling human visuomotor adaptation with a disturbance observer framework

2026-02-04T14:00:00Z

by Gaurav Sharma, Bernard Marius ’t Hart, Jean-Jacques Orban de Xivry, Denise Y.P. Henriques, Mireille E. Broucke

A fundamental problem of visuomotor adaptation research is to understand how the brain is able to asymptotically remove a predictable exogenous disturbance from a visual error signal, using limited sensor information, by recalibrating hand movements. From a control theory perspective, the most striking aspect of this problem is that it falls squarely within the scope of the internal model principle of control theory. Despite this fact, the relationship between the internal model principle and models of visuomotor adaptation is currently not well developed. This paper aims to close this gap by proposing an abstract discrete-time state space model of visuomotor adaptation based on the internal model principle. The proposed DO Model, named after its most important component, a disturbance observer, addresses key modeling requirements: modular architecture, physically relevant signals, parameters tied to atomic behaviors, and capacity for abstraction. The two main computational modules are a disturbance observer, a recently developed class of internal models, and a feedforward system that learns from the disturbance observer to improve feedforward motor commands.
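The internal model principle can be illustrated with a minimal discrete-time sketch: if the disturbance is a constant visuomotor rotation, an observer whose internal model is a constant (an integrator) can estimate it from the visual error, and a feedforward command that subtracts the estimate drives the error to zero. This is a generic disturbance observer, not the authors' DO Model; the rotation size, the observer gain, and the function name `simulate_adaptation` are hypothetical.

```python
def simulate_adaptation(rotation_deg=30.0, gain=0.2, trials=60):
    """Trial-by-trial adaptation to a constant visuomotor rotation (hypothetical sketch).

    The observer's internal model is a constant disturbance, so its estimate
    d_hat is corrected by the mismatch between measured and predicted visual
    error. The feedforward hand command then cancels the estimated disturbance.
    """
    d_hat = 0.0  # observer's estimate of the rotation
    errors = []
    for _ in range(trials):
        u = -d_hat                        # feedforward hand angle (target at 0 deg)
        cursor_error = u + rotation_deg   # measured visual error: cursor minus target
        errors.append(cursor_error)
        predicted = u + d_hat             # error predicted by the internal model
        d_hat += gain * (cursor_error - predicted)
    return errors


errors = simulate_adaptation()
print(round(errors[0], 1), round(errors[1], 1))  # 30.0 24.0
```

Here the error shrinks geometrically as `rotation_deg * (1 - gain) ** k`, so any gain between 0 and 2 removes the constant disturbance asymptotically, which is the behaviour the internal model principle guarantees for a disturbance that matches the observer's model.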

Phase resetting in human stem cell derived cardiomyocytes explains complex cardiac arrhythmias

2026-02-04T14:00:00Z

by Khady Diagne, Thomas M. Bury, Morgan E. Pettebone, Marc W. Deyell, Zachary Laksman, Alvin Shrier, Leon Glass, Gil Bub, Emilia Entcheva

Phase resetting of cardiac oscillators underlies some complex arrhythmias. Here we use optogenetic stimulation to construct phase response curves (PRCs) for spheroids of human induced pluripotent stem cell derived cardiomyocytes (hiPSC-CMs), and a computational cardiomyocyte model to identify the ionic mechanisms shaping the PRC. The clinical utility of the human PRCs is demonstrated by adding a patient-based conduction delay to the same equations to explain complex multi-day Holter ECG dynamics and cardiac arrhythmias. Periodic stimulation of these patient-based models and of the computational model of human iPSC-CMs reveals similar bifurcation patterns and entrainment zones. Cell therapy in which iPSC-CMs are injected into diseased hearts can induce engraftment arrhythmias driven by ectopic foci. The PRC analysis offers a potential strategy to entrain these foci in a parameter space that avoids such arrhythmias.
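How a PRC predicts entrainment under periodic stimulation can be sketched with the classic phase-resetting map: the phase just before the next stimulus equals the reset phase plus the stimulus period divided by the intrinsic period. This sketch assumes a hypothetical sinusoidal PRC with amplitude `k` (which makes the map the standard sine circle map), not the measured hiPSC-CM curves; the function names are illustrative.

```python
import math


def prc(phase, k=0.5):
    """Hypothetical sinusoidal phase response curve (positive = phase advance)."""
    return -k / (2 * math.pi) * math.sin(2 * math.pi * phase)


def beats_per_stimulus(stim_period, intrinsic_period, k=0.5,
                       transient=500, n=500, phase0=0.1):
    """Iterate the phase-resetting map under periodic stimulation.

    The lifted (unwrapped) phase is used so that whole cycles can be counted:
    Phi_{i+1} = Phi_i + PRC(Phi_i mod 1) + stim_period / intrinsic_period.
    Returns the mean number of intrinsic cycles per stimulus after a transient;
    a value of 1.0 indicates 1:1 entrainment.
    """
    omega = stim_period / intrinsic_period
    phi = phase0
    for _ in range(transient):          # let the orbit settle
        phi += prc(phi % 1.0, k) + omega
    start = phi
    for _ in range(n):
        phi += prc(phi % 1.0, k) + omega
    return (phi - start) / n


print(beats_per_stimulus(0.95, 1.0))  # close to 1.0: 1:1 entrainment
```

Sweeping `stim_period` and `k` and recording where the ratio locks to simple fractions traces out the entrainment zones (Arnold tongues); without resetting (`k=0`) the ratio simply equals `stim_period / intrinsic_period`.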

SHADE: A multilevel Bayesian framework for modeling directional spatial interactions in tissue microenvironments

2026-02-04T14:00:00Z

by Joel Eliason, Michele Peruzzi, Arvind Rao

Motivation: Understanding how different cell types interact spatially within tissue microenvironments is critical for deciphering immune dynamics, tumor progression, and tissue organization. Many current spatial analysis methods assume symmetric associations or compute image-level summaries separately without sharing information across patients and cohorts, limiting biological interpretability and statistical power. Results: We present SHADE (Spatial Hierarchical Asymmetry via Directional Estimation), a multilevel Bayesian framework for modeling asymmetric spatial interactions across scales. SHADE quantifies direction-specific cell-cell associations using smooth spatial interaction curves (SICs) and integrates data across tissue sections, patients, and cohorts. Through simulation studies, SHADE demonstrates improved accuracy, robustness, and interpretability over existing methods. Application to colorectal cancer multiplexed imaging data demonstrates SHADE’s ability to quantify directional spatial patterns while controlling for tissue architecture confounders and capturing substantial patient-level heterogeneity. The framework successfully identifies biologically interpretable spatial organization patterns, revealing that local microenvironmental structure varies considerably across patients within molecular subtypes.

TARPON—A Telomere Analysis and Research Pipeline Optimized for Nanopore

2026-02-04T14:00:00Z

by Nathaniel Deimler, David V. Ho, Norbert Paul, Zoë Gill, Peter Baumann

Long-read sequencing has transformed many areas of biology and holds significant promise for telomere research by enabling nucleotide-resolution analysis of chromosome arm–specific telomere length in both model organisms and humans. However, the adoption of new technologies, particularly in clinical or diagnostic contexts, requires careful validation to identify potential technical and computational limitations. We present TARPON (Telomere Analysis and Research Pipeline Optimized for Nanopore), a best-practices Nextflow pipeline designed for the analysis of telomeres sequenced on the Oxford Nanopore Technologies (ONT) platform. TARPON can be executed via the command line or integrated into ONT’s EPI2ME agent, which provides a user-friendly graphical interface for those without computational training. Nextflow’s container-based architecture eliminates dependency conflicts, thereby streamlining deployment across platforms. TARPON isolates telomeric repeat–containing reads, assigns strand specificity, and identifies enrichment probes that can be used both for demultiplexing and for confirming capture-based library preparation. To ensure that the analysis is restricted to full-length telomeres, reads lacking either a capture probe or non-telomeric sequence at the opposite end are excluded. A sliding-window approach defines the subtelomere-to-telomere boundary, followed by quality filtering to remove low-quality or subtelomeric reads that passed earlier steps. The pipeline generates customizable statistics, text-based summaries, and publication-ready visualizations (HTML, PNG, PDF). While default settings are optimized for diagnostic workflows, all parameters are easily adjustable via the GUI or command line to support diverse applications. These include telomere analyses in variant-rich samples (e.g., ALT-positive tumors) and in organisms with non-canonical telomeric repeats, such as some insects (GTTAG) and certain plants (GGTTTAG).
TARPON is the first complete and experimentally validated pipeline for Nanopore-based telomere analysis requiring no data pre-processing or prior bioinformatics expertise, while offering flexibility for advanced users.
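The sliding-window boundary step can be illustrated by scanning a read for the first window in which telomeric repeats dominate. This is a minimal sketch, assuming exact matches to the canonical human repeat TTAGGG and a read oriented from subtelomere into telomere; the window size, step, threshold, and function names are hypothetical and are not TARPON's defaults or code. Real Nanopore reads contain sequencing errors, so a practical implementation would need tolerant matching.

```python
def repeat_density(window, motif="TTAGGG"):
    """Fraction of window bases covered by exact, non-overlapping motif copies."""
    covered, i = 0, 0
    while i <= len(window) - len(motif):
        if window[i:i + len(motif)] == motif:
            covered += len(motif)
            i += len(motif)
        else:
            i += 1
    return covered / len(window)


def telomere_boundary(read, motif="TTAGGG", window=30, step=6, threshold=0.9):
    """Return the start of the first window whose repeat density reaches threshold.

    Assumes the read runs from subtelomere (5') into telomere (3'). Returns None
    if no window qualifies, i.e. the read would be treated as non-telomeric.
    """
    for start in range(0, len(read) - window + 1, step):
        if repeat_density(read[start:start + window], motif) >= threshold:
            return start
    return None


read = "ACGT" * 9 + "TTAGGG" * 10   # toy read: 36 subtelomeric bases, then telomere
print(telomere_boundary(read))      # 36
```

The step size sets the resolution of the boundary estimate, and the threshold trades sensitivity to degenerate repeats against false boundaries inside subtelomeric repeat-like sequence.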

Modelling chemotaxis of branched cells in complex environments provides insights into immune cell navigation

2026-02-03T14:00:00Z

by Jiayi Liu, Jonathan E. Ron, Giulia Rinaldi, Ivanna Williantarra, Antonios Georgantzoglou, Ingrid de Vries, Michael Sixt, Milka Sarris, Nir S. Gov

Cell migration in vivo is often guided by chemical signaling, i.e., chemotaxis. For immune cells performing chemotaxis in the organism, this process is influenced by the complex geometry of the tissue environment. In this study, we use a theoretical model of branched cell migration on a network to explore the cellular response to chemical gradients. The model predicts the response of a branched cell to a chemical gradient: how the cell reorients its internal polarity and how it navigates through a complex environment up a chemical gradient. We then compare the model’s predictions with experimental observations of neutrophils migrating to the site of a laser-inflicted wound in the fin of a zebrafish larva, and of neutrophils migrating in vitro inside a regular lattice of pillars. We find that the model captures the details of the subcellular response to the chemokine gradient, as well as qualitative characteristics of the large-scale migration, suggesting that neutrophils behave as fast cells, which helps explain how these immune cells function.

BiCLUM: Bilateral contrastive learning for unpaired single-cell multi-omics integration

2026-02-03T14:00:00Z

by Yin Guo, Izaskun Mallona, Mark D. Robinson, Limin Li

The integration of single-cell multi-omics data provides a powerful approach for understanding the complex interplay between different molecular modalities, such as RNA expression, chromatin accessibility and protein abundance, measured through assays like scRNA-seq, scATAC-seq and CITE-seq, at single-cell resolution. However, most existing single-cell technologies focus on individual modalities, limiting a comprehensive understanding of their interconnections. Integrating such diverse and often unpaired datasets remains a challenging task due to unknown cell correspondences across distinct feature spaces and limited insights into cell-type-specific activities in non-scRNA-seq modalities. In this work, we propose BiCLUM, a Bilateral Contrastive Learning approach for Unpaired single-cell Multi-omics integration, which simultaneously enforces cell-level and feature-level alignment across modalities. BiCLUM first transforms one modality, such as scATAC-seq, into the data space of another modality, such as scRNA-seq, using prior genomic knowledge. It then learns cell and gene embeddings simultaneously through a bilateral contrastive learning framework, incorporating both cell-level and feature-level contrastive losses. Across multiple RNA+ATAC and RNA+protein datasets, BiCLUM consistently outperforms or matches existing integration methods in both visualization and quantitative benchmarks. Importantly, BiCLUM embeddings preserve biologically meaningful regulatory relationships between chromatin accessibility and gene expression, as evidenced by significantly higher gene–peak correlations than random controls. Downstream analyses further demonstrate that BiCLUM-derived embeddings facilitate transcription factor activity inference, identification of cell-type-specific marker genes, functional enrichment, and cell–cell interaction mapping. 
Comprehensive hyperparameter sensitivity and ablation analyses further establish BiCLUM as a robust and interpretable framework that not only achieves effective cross-modal alignment but also retains the underlying regulatory and functional landscape across single-cell modalities.
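A cell-level contrastive loss pulls each cell's embedding toward its counterpart in the other modality and away from all other cells; a feature-level loss can apply the same idea to gene embeddings. The sketch below uses a generic InfoNCE loss and, as a hypothetical illustration of "bilateral" alignment, applies it to rows (cells) and to columns (features) of two matrices. It assumes row i of each matrix is a positive pair, which simplifies the unpaired setting; how BiCLUM actually defines positives and combines its losses is not specified here.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(A, B, temperature=0.5):
    """Mean InfoNCE loss where row i of A and row i of B form a positive pair.

    Rows are L2-normalised; every other row of B acts as a negative for A[i].
    """
    A = [_normalize(r) for r in A]
    B = [_normalize(r) for r in B]
    loss = 0.0
    for i, a in enumerate(A):
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in B]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denominator - logits[i]
    return loss / len(A)


def transpose(M):
    return [list(col) for col in zip(*M)]


def bilateral_loss(X, Y, temperature=0.5):
    """Hypothetical cell-level (rows) plus feature-level (columns) contrastive loss."""
    return info_nce(X, Y, temperature) + info_nce(transpose(X), transpose(Y), temperature)


X = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(X, X) < info_nce(X, swapped))  # True: aligned pairs give the lower loss
```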

Multiscale segmentation using hierarchical phase-contrast tomography and deep learning

2026-02-02T14:00:00Z

by Yang Zhou, Shahab Aslani, Yousef Javanmardi, Joseph Brunet, David Stansby, Saskia Carroll, Alexandre Bellier, Maximilian Ackermann, Paul Tafforeau, Peter D. Lee, Claire L. Walsh

Biomedical systems span multiple spatial scales, encompassing tiny functional units to entire organs. Interpreting these systems through image segmentation requires the effective propagation and integration of information across different scales. However, most existing segmentation methods are optimised for single-scale imaging modalities, limiting their ability to capture and analyse small functional units throughout complete human organs. To facilitate multiscale biomedical image segmentation, we utilised Hierarchical Phase-Contrast Tomography (HiP-CT), an advanced imaging modality that can generate 3D multiscale datasets from high-resolution volumes of interest (VOIs) at ca. 1 μm/voxel to whole-organ scans at ca. 20 μm/voxel. Building on these hierarchical multiscale datasets, we developed a deep learning-based segmentation pipeline that is initially trained on manually annotated high-resolution HiP-CT data and then extended to lower-resolution whole-organ scans using pseudo-labels generated from high-resolution predictions and multiscale image registration. As a case study, we focused on glomeruli in human kidneys, benchmarking four 3D deep learning models for biomedical image segmentation on a manually annotated high-resolution dataset extracted from VOIs (2.58 to ca. 5 μm/voxel) of four human kidneys. Among them, nnUNet demonstrated the best performance, achieving an average test Dice score of 0.906, and was subsequently used as the baseline model for multiscale segmentation in the pipeline. Applying this pipeline to two low-resolution full-organ datasets at ca. 25 μm/voxel, the model identified 1,019,890 and 231,179 glomeruli in a 62-year-old donor without kidney disease and a 94-year-old hypertensive donor, respectively, enabling comprehensive morphological analyses, including cortical spatial statistics and glomerular distributions, which aligned well with previous anatomical studies.
Our results highlight the effectiveness of the proposed pipeline for segmenting small functional units in multiscale bioimaging datasets and suggest its broader applicability to other organ systems.
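The Dice score used to benchmark the segmentation models measures voxel overlap between a predicted mask P and a reference mask T as 2|P ∩ T| / (|P| + |T|). A minimal sketch on flattened binary masks follows; returning 1.0 when both masks are empty is a common convention assumed here, and the function name is illustrative.

```python
def dice(pred, truth):
    """Dice coefficient 2|P & T| / (|P| + |T|) for flattened 0/1 voxel masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (assumed): two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

For 3D volumes the masks would first be flattened in the same voxel order; an average test Dice score is then typically the mean of per-volume or per-patch scores.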

Cluster dispersal shapes microbial diversity during community assembly

2026-02-02T14:00:00Z

by Loïc Marrec, Sonja Lehtinen

Identifying the drivers of diversity remains a central challenge in microbial ecology. In microbiota, within-community diversity is often linked to host health, which makes it all the more important to understand. Since many communities assemble de novo, microbial dispersal plays a critical role in shaping community structure during the early stages of assembly. While theoretical models typically assume that microbes disperse individually, this overlooks cases where microbes disperse in clusters, such as during host feeding. Here, we investigate how cluster dispersal impacts species richness, between-community dissimilarity, and species abundance in the initial steps of microbial community assembly. We developed a model in which microbes disperse from a pool into communities as clusters and then replicate locally. Using both analytical and numerical approaches, we show that cluster dispersal promotes community homogenization by increasing within-community richness and reducing dissimilarity across communities, even at low dispersal rates. Moreover, it modulates the influence of local selection on microbial community assembly and, consequently, on species abundance. Our results demonstrate that cluster dispersal has distinct effects from simply increasing the dispersal rate. This work provides new evidence for the role of cluster dispersal in the early dynamics of microbial community assembly.

A 2D Gabor-wavelet baseline model out-performs a 3D surface model in scene-responsive cortex

2026-02-02T14:00:00Z

by Anna Shafer-Skelton, Timothy F. Brady, John T. Serences

Understanding 3D representations of spatial information, particularly in naturalistic scenes, remains a significant challenge in vision science. This is largely because of conceptual difficulties in disentangling higher-level 3D information from co-occurring features and cues (e.g., the 3D shape of a scene image is necessarily defined by “low-level” spatial frequency and orientation information). Recent work has employed newer models and analysis techniques that attempt to mitigate these difficulties within a model-comparison framework. For example, one such study reported 3D-surface features were uniquely present in areas OPA, PPA, and MPA/RSC (areas typically referred to as ‘scene-selective’), above and beyond a Gabor-wavelet baseline model. Here, we tested whether these findings generalized to a new stimulus set that, on average, dissociated static Gabor-wavelet baseline features from 3D scene-surface features. Surprisingly, we found evidence that a Gabor-wavelet baseline model—commonly thought of as a “low-level” or “2D” model—better fit voxel responses in areas OPA, PPA and MPA/RSC compared to a model with 3D-surface information. We highlight that this difference in results could be due to differences in the baseline conditions used across studies. These findings emphasize that much of the information in “scene-selective” regions—potentially even information about 3D surfaces—may be in the form of spatial frequency and orientation information often considered 2D or low-level. Disentangling lower-level and higher-level visual information is a continuing fundamental challenge for model-comparison approaches in visual cognition, and it motivates future work investigating which visual features could cue higher-level properties in our real-world visual experience—both within and beyond current model comparison frameworks.

Information uncertainty influences learning strategy from sequentially delayed rewards

2026-02-02T14:00:00Z

by Sean R. Maulhardt, Alec Solway, Caroline J. Charpentier

When receiving a reward after a sequence of multiple events, how do we determine which event caused the reward? This problem, known as temporal credit assignment, can be difficult for humans to solve given the temporal uncertainty in the environment. Research to date has attempted to isolate dimensions of delay and reward during decision-making, but algorithmic solutions to temporal learning problems and the effect of uncertainty on learning remain underexplored. To further our understanding, we adapted a reward learning task that creates a temporal credit assignment problem by combining sequentially delayed rewards, intervening events, and varying uncertainty via the amount of information presented during feedback. Using computational modeling, we developed two learning strategies: an eligibility trace, whereby previously selected actions are updated as a function of the temporal sequence, and a tabular update, whereby only systematically related past actions (rather than unrelated intervening events) are updated. We hypothesized that reduced information uncertainty would correlate with increased use of the tabular strategy, given the model’s capacity to incorporate additional feedback information. Both models effectively learned the task and predicted choices made by participants (N = 142), as well as specific behavioral signatures of credit assignment. Consistent with our hypothesis, the tabular model outperformed the eligibility model under low information uncertainty, as evidenced by more accurate predictions of participants’ behavior and an increase in tabular weight. These findings provide new insights into the mechanisms implemented by humans to solve temporal credit assignment and adapt their strategy in varying environments.
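The two credit-assignment strategies can be contrasted with one simple form of each update rule. In this sketch the eligibility trace spreads credit over all recent choices with geometrically decaying weight, while the tabular update credits only the option that the feedback identifies as the source of the reward. The learning rate `alpha`, the trace `decay`, the option labels, and the function names are hypothetical; the paper's exact parameterization is not given here.

```python
def eligibility_update(values, history, reward, alpha=0.3, decay=0.5):
    """Eligibility-trace update: credit every recent choice, discounted by lag.

    history lists the options chosen on previous steps, most recent last.
    """
    values = dict(values)
    trace = 1.0
    for option in reversed(history):
        values[option] += alpha * trace * (reward - values[option])
        trace *= decay  # older choices receive geometrically less credit
    return values


def tabular_update(values, causal_option, reward, alpha=0.3):
    """Tabular update: credit only the option the feedback links to the reward."""
    values = dict(values)
    values[causal_option] += alpha * (reward - values[causal_option])
    return values


start = {"A": 0.0, "B": 0.0, "C": 0.0}
# Option A was chosen two steps ago and B on the last step; the delayed reward
# actually belongs to A, which more informative feedback would reveal.
print(eligibility_update(start, ["A", "B"], reward=1.0))  # {'A': 0.15, 'B': 0.3, 'C': 0.0}
print(tabular_update(start, "A", reward=1.0))              # {'A': 0.3, 'B': 0.0, 'C': 0.0}
```

The example shows why richer feedback favours the tabular strategy: the eligibility trace gives most credit to the most recent (here irrelevant) choice, whereas the tabular rule assigns it directly to the causal option.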

Learning genetic perturbation effects with variational causal inference

2026-02-02T14:00:00Z

by Emily Liu, Jiaqi Zhang, Caroline Uhler

Advances in sequencing technologies have enhanced the understanding of gene regulation in cells. In particular, Perturb-seq has enabled high-resolution profiling of the transcriptomic response to genetic perturbations at the single-cell level. This understanding has implications in functional genomics and potentially for identifying therapeutic targets. Various computational models have been developed to predict perturbational effects. While deep learning models excel at interpolating observed perturbational data, they tend to overfit when data are scarce and may not generalize well to unseen perturbations. In contrast, mechanistic models, such as linear causal models based on gene regulatory networks, hold greater potential for extrapolation, as they encapsulate regulatory information that can predict responses to unseen perturbations. However, their application has been limited to small studies due to overly simplistic assumptions, making them less effective in handling noisy, large-scale single-cell data. We propose a hybrid approach that combines a mechanistic causal model with variational deep learning, termed Single Cell Causal Variational Autoencoder (SCCVAE). The mechanistic model employs a learned regulatory network to represent perturbational changes as shift interventions that propagate through the learned network. SCCVAE integrates this mechanistic causal model into a variational autoencoder, generating rich, comprehensive transcriptomic responses. Our results indicate that SCCVAE outperforms current state-of-the-art baselines in extrapolating to unseen perturbational responses. Additionally, for the observed perturbations, the latent space learned by SCCVAE allows for the identification of functional perturbation modules and simulation of single-gene knockdown experiments of varying penetrance, presenting a robust tool for interpreting and interpolating perturbational responses at the single-cell level.
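A shift intervention in a linear causal model adds a constant to the perturbed gene, and the change then propagates to its downstream targets through the regulatory weights. The sketch below evaluates a linear structural causal model in topological order; it illustrates only the mechanistic component, with a hypothetical toy network, and omits the learned network and the variational autoencoder that SCCVAE couples it to.

```python
def propagate(weights, order, noise, shifts=None):
    """Sample a linear structural causal model under additive shift interventions.

    weights[(i, j)] is the weight of the regulatory edge i -> j, order is a
    topological ordering of the genes, noise[j] is gene j's exogenous term, and
    shifts maps each perturbed gene to the shift added to its value.
    """
    shifts = shifts or {}
    x = {}
    for j in order:
        # Parents precede j in the topological order, so x[i] already exists.
        regulated = sum(w * x[i] for (i, target), w in weights.items() if target == j)
        x[j] = regulated + noise[j] + shifts.get(j, 0.0)
    return x


weights = {("g1", "g2"): 2.0, ("g2", "g3"): 0.5}   # hypothetical chain g1 -> g2 -> g3
order = ["g1", "g2", "g3"]
noise = {"g1": 0.0, "g2": 0.0, "g3": 0.0}
print(propagate(weights, order, noise, shifts={"g1": 1.0}))  # {'g1': 1.0, 'g2': 2.0, 'g3': 1.0}
```

Because the effect of a shift is determined by the network rather than memorised per perturbation, the same weights yield predictions for perturbation targets never observed during training, which is the source of the extrapolation ability described above.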

TEvarSim: A genome simulator for transposable element (TE) variants

2026-01-30T14:00:00Z

by Jian Miao, Dawei Li

Transposable element (TE) variants, the presence or absence of TE sequences such as LINE-1, Alu, SVA, and endogenous retroviruses, are a major source of genomic diversity and play critical roles in human health, evolution, and disease. As interest in TE variants grows, developing related methods and tools for detection has become increasingly important. However, rigorous benchmarking of TE variant detection methods remains limited due to the lack of accurate and scalable TE variant simulation platforms and the absence of reliable ground truth data. Here, we developed TEvarSim, a novel TE variant simulator that generates TE-containing genomic data in multiple formats, including genomes, short- and long-read sequencing data, and VCF files. TEvarSim supports both random and real-world TE insertions and deletions, including variants derived from pangenome graphs. It can rapidly simulate hundreds to thousands of synthetic chromosomes or genomes and model natural variation at the haplotype, individual, and population levels, making it well suited for large-scale studies. In addition, TEvarSim can directly compare simulated VCF files with TEs reported by TE detection tools, streamlining the benchmarking of TE genotyping methods. TEvarSim provides an all-in-one toolkit for simulating, evaluating, and improving TE variant detection, advancing our ability to accurately study TEs in health and disease in various species.
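Benchmarking a TE caller against simulated truth typically means matching each called variant to a true one on the same chromosome and TE family within a position tolerance, then reporting precision and recall. This is a minimal sketch with greedy one-to-one matching on already-parsed `(chrom, pos, te_family)` tuples; the tolerance, tuple layout, and function name are hypothetical and do not describe TEvarSim's own comparison logic or VCF handling.

```python
def compare_calls(truth, calls, tolerance=50):
    """Match called TE variants to simulated truth within a position tolerance.

    Each variant is a (chrom, pos, te_family) tuple. Each call is matched to the
    closest unmatched true variant (greedy, one-to-one). Returns (precision, recall).
    """
    unmatched = list(truth)
    true_positives = 0
    for chrom, pos, family in calls:
        best = None
        for t in unmatched:
            if t[0] == chrom and t[2] == family and abs(t[1] - pos) <= tolerance:
                if best is None or abs(t[1] - pos) < abs(best[1] - pos):
                    best = t
        if best is not None:
            unmatched.remove(best)   # each true variant can be matched only once
            true_positives += 1
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


truth = [("chr1", 1000, "Alu"), ("chr1", 5000, "L1"), ("chr2", 300, "SVA")]
calls = [("chr1", 1020, "Alu"), ("chr1", 5200, "L1"), ("chr2", 310, "SVA"), ("chr3", 10, "Alu")]
print(compare_calls(truth, calls))  # 2 of 4 calls match, recovering 2 of 3 true variants
```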

Capturing individual variation in children’s electroencephalograms during nREM sleep

2026-01-30T14:00:00Z

by Verna Heikkinen, Susanne Merz, Riitta Salmelin, Sampsa Vanhatalo, Leena Lauronen, Mia Liljeström, Hanna Renvall

Human brain dynamics differ markedly between individuals: functional neuroimaging studies have recently described functional features that can be used as neural fingerprints. However, the stability of these fingerprints is affected by aging and disease. As such, fingerprint stability may be a useful metric when studying normal and pathological neurodevelopment. Before examining clinically relevant deviations, the individual stability and variation of neuroimaging features across brain maturation in normally developing children need to be addressed with real clinical data. Here we applied Bayesian reduced-rank regression (BRRR) to extract low-dimensional representations of electroencephalography (EEG) power spectra measured during different non-REM sleep stages (N1 and N2) from 782 normally developing children aged 6 weeks to 19 years. The representations learned within specific sleep stages successfully distinguished between subjects and generalized across sleep stages. Fingerprint stability increased with the age of the subjects. The BRRR model outperformed correlation-based fingerprinting methods, especially in fingerprinting across sleep stages, highlighting the usefulness of dimensionality reduction when the noise and the signal of interest are correlated. While further studies are needed to address possible non-linear maturation effects over developmental periods, our results demonstrate the existence of stable within-session neurofunctional fingerprints in pediatric populations.
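The correlation-based fingerprinting baseline mentioned above can be sketched as follows: each subject's spectrum from one recording (or sleep stage) is compared with every subject's spectrum from another, and identification succeeds when the highest Pearson correlation is with the same subject. This illustrates the baseline only, not the BRRR model; the toy vectors and function names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def identification_accuracy(session_a, session_b):
    """Fraction of subjects whose session-A features correlate best with their own session-B features.

    session_a[i] and session_b[i] are feature vectors (e.g. log power spectra) of subject i.
    """
    hits = 0
    for i, a in enumerate(session_a):
        r = [pearson(a, b) for b in session_b]
        if max(range(len(r)), key=r.__getitem__) == i:
            hits += 1
    return hits / len(session_a)


A = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]                    # e.g. stage N1
B = [[1.1, 2.0, 3.2, 3.9], [3.9, 3.1, 2.0, 1.2], [1.0, 2.9, 2.1, 4.2]]  # e.g. stage N2
print(identification_accuracy(A, B))  # 1.0: every subject is matched to itself
```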

SpaConTDS: A multimodal contrastive learning framework for identifying spatial domains by applying tuple disturbing strategy

2026-01-29T14:00:00Z

by Ruiwen Xu, Xiaoqing Cheng, Waiki Ching, Siyao Wu, Yuanben Zhang, Yidan Zhang

The effective use of multimodal spatial transcriptomics (ST) data enables accurate identification of spatial domains, which is essential for investigating cellular structure and function. In this study, we propose SpaConTDS, a novel framework that integrates reinforcement learning with self-supervised multimodal contrastive learning. SpaConTDS generates positive and negative samples through data augmentation and a pseudo-label tuple perturbation strategy, enabling the learning of fused representations that capture global semantics and cross-modal interactions. The model’s hyperparameters are dynamically optimized using reinforcement learning. Extensive experiments across various resolutions and platforms demonstrate that SpaConTDS achieves state-of-the-art accuracy in spatial domain identification and outperforms existing methods in downstream tasks such as denoising, trajectory inference, and UMAP visualization. Moreover, SpaConTDS effectively integrates multiple tissue sections and corrects batch effects without requiring prior alignment. Compared to existing approaches, SpaConTDS offers more robust fused representations of multimodal data, providing researchers with a flexible and powerful tool for a wide range of spatial transcriptomics analyses.

Persistence diagrams as morphological signatures of cells: A method to measure and compare cells within a population

2026-01-28T14:00:00Z

by Yossi Bokor Bleile, Pooja Yadav, Patrice Koehl, Florian Rehfeldt

Quantifying cell morphology is central to understanding cellular regulation, fate, and heterogeneity, yet conventional image-based analyses often struggle with diverse or irregular shapes. We present a computational framework that uses topological data analysis to characterise and compare single-cell morphologies from fluorescence microscopy. Each cell is represented by its contour together with the position of its nucleus, from which we construct a filtration based on a radial distance function and derive a persistence diagram encoding the shape’s topological evolution. The similarity between two cells is quantified using the 2-Wasserstein distance between their diagrams, yielding a shape distance we call the PH distance. We apply this method to two representative experimental systems—primary human mesenchymal stem cells (hMSCs) and HeLa cells—and show that PH distances enable the detection of outliers in those systems, the identification of sub-populations, and the quantification of shape heterogeneity. We benchmark PH against three established contour-based distances (aspect ratio, Fourier descriptors, and elastic shape analysis) and show that PH offers better separation between cell types and greater robustness when clustering heterogeneous populations. Together, these results demonstrate that persistent-homology-based signatures provide a principled and sensitive approach for analysing cell morphology in settings where traditional geometric or image-based descriptors are insufficient.
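One plausible reading of the construction above is a sublevel-set filtration of the radial distance from the nucleus, sampled along the closed contour, whose 0-dimensional persistence records when local minima of the radial profile appear and merge. The sketch computes that diagram with union-find and the elder rule. It is an assumption-laden illustration: the paper's exact filtration and homological dimensions may differ (negating the values would track protrusions instead), and the 2-Wasserstein comparison between diagrams is left to dedicated libraries such as GUDHI.

```python
import math


def radial_distances(contour, nucleus):
    """Distance from the nucleus position to each (x, y) point of a closed contour."""
    nx, ny = nucleus
    return [math.hypot(x - nx, y - ny) for x, y in contour]


def persistence_0d_cycle(values):
    """0-dimensional sublevel-set persistence of a function sampled on a closed contour.

    Consecutive contour points (and the last and first) are adjacent, so the
    contour is a cycle graph. Points are added in order of increasing value;
    when two components meet, the younger one (later birth) dies (elder rule).
    Returns the finite (birth, death) pairs with positive persistence, followed
    by the single essential pair (global minimum, inf).
    """
    n = len(values)
    parent = list(range(n))
    birth = {}                 # birth value, stored at each component's root
    added = [False] * n
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for v in sorted(range(n), key=lambda i: values[i]):
        added[v] = True
        birth[v] = values[v]
        for u in ((v - 1) % n, (v + 1) % n):
            if not added[u]:
                continue
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
            if birth[young] < values[v]:
                pairs.append((birth[young], values[v]))
            parent[young] = old
    pairs.append((min(values), math.inf))
    return pairs


contour = [(1, 0), (0, 1), (-1, 0), (0, -1)]   # toy contour centred on the nucleus
print(persistence_0d_cycle(radial_distances(contour, (0, 0))))  # [(1.0, inf)]
```

A perfectly round cell yields only the essential pair, while each additional dip in the radial profile contributes a finite pair whose lifetime reflects how pronounced that feature is.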

Spatial variation in socio-economic vulnerability to Influenza-like Infection for the US population

2026-01-28T14:00:00Z

by Shrabani S. Tripathy, Joseph V. Puthussery, Taveen S. Kapoor, John R. Cirrito, Rajan K. Chakrabarty

This study aims to quantify environmental health impacts and assess risk by understanding the disproportionate burden of infectious diseases, specifically Influenza-like Illness (ILI), across regions with varying socio-economic characteristics. We introduce a novel vulnerability-based approach to better understand the complex relationship between socio-economic factors and ILI burden. We developed a machine-learning-driven framework to assess and map state-level socio-economic vulnerability to ILI in the United States. A vulnerability index was created by integrating 39 diverse socio-economic and health indicators from the latest US Census. A Random Forest Regression model then weighted these indicators to quantify each state’s vulnerability with respect to 2022 ILI values. To assess multicollinearity, the Variance Inflation Factor (VIF) was calculated, and indicators were filtered to reduce it. Key determinants of vulnerability include migration patterns, insurance coverage, and the proportions of female and elderly populations. The resulting state-level vulnerability map reveals significant regional disparities. The District of Columbia was identified as the most vulnerable, followed by Massachusetts, Hawaii, New Mexico, and Rhode Island, all with normalized vulnerability indices exceeding 0.35. Our findings highlight significant regional variations in ILI vulnerability, emphasizing the need for targeted public health interventions tailored to state-specific socio-economic conditions. This scalable and adaptable methodology extends beyond influenza, offering a valuable approach for assessing vulnerability to a wide range of infectious diseases, strengthening epidemic preparedness and response.
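The VIF of indicator j is 1 / (1 - R_j²), where R_j² comes from regressing indicator j on all the others; equivalently, it is the j-th diagonal element of the inverse correlation matrix. The sketch computes VIFs that way and then iteratively drops the most collinear indicator. The cutoff of 10 is a common rule of thumb assumed here (the study does not state its threshold), and the toy columns and function names are hypothetical.

```python
from statistics import mean


def correlation_matrix(columns):
    """Pearson correlation matrix of indicator columns (each a list of values)."""
    def corr(x, y):
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return cov / (sxx * syy) ** 0.5
    return [[corr(x, y) for y in columns] for x in columns]


def invert(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear indicators")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def drop_collinear(columns, names, threshold=10.0):
    """Repeatedly drop the indicator with the largest VIF until all fall below threshold."""
    columns, names = list(columns), list(names)
    while len(columns) > 1:
        scores = vif(columns)
        worst = max(range(len(scores)), key=scores.__getitem__)
        if scores[worst] < threshold:
            break
        del columns[worst], names[worst]
    return names


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
x3 = [1.1, 2.0, 2.9, 4.1, 5.0]  # nearly a copy of x1
print(drop_collinear([x1, x2, x3], ["x1", "x2", "x3"]))  # one of x1/x3 is removed
```

Dropping one indicator at a time and recomputing matters because removing a single redundant variable often brings the VIFs of its collinear partners back below the cutoff.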

Learning cardiac activation and repolarization times with operator learning

2026-01-27T14:00:00Z

by Giovanni Ziarelli, Edoardo Centofanti, Nicola Parolini, Simone Scacchi, Marco Verani, Luca F. Pavarino

Solving partial or ordinary differential equation models in cardiac electrophysiology is a computationally demanding task, particularly when high-resolution meshes are required to capture the complex dynamics of the heart. Moreover, in clinical applications, it is essential to employ computational tools that provide only relevant information, ensuring clarity and ease of interpretation. In this work, we exploit two recently proposed operator learning approaches, namely Fourier Neural Operators (FNO) and Kernel Operator Learning (KOL), to learn the operator mapping the applied stimulus in the physical domain into the activation and repolarization time distributions. These data-driven methods are evaluated on synthetic 2D and 3D domains, as well as on a physiologically realistic left ventricle geometry. Notably, while the learned map between the applied current and activation time has its modeling counterpart in the Eikonal model, no equivalent partial differential equation (PDE) model is known for the map between the applied current and repolarization time. Our results demonstrate that both FNO and KOL approaches are robust to hyperparameter choices and computationally efficient compared to traditional PDE-based Monodomain models. These findings highlight the potential use of these surrogate operators to accelerate cardiac simulations and facilitate their clinical integration.
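The core of a Fourier Neural Operator is the spectral convolution. The input is transformed to Fourier space, only the lowest modes are kept, each kept mode is multiplied by a learnable complex weight, and the result is transformed back and added to a pointwise linear term. The single-channel 1D NumPy sketch below illustrates that layer only. The fixed `weights` and `w_local` arrays stand in for trained parameters, no training is shown, and this is not the authors' architecture (which works on 2D/3D and ventricular geometries).

```python
import numpy as np


def spectral_conv_1d(u, weights, n_modes):
    """FNO-style spectral convolution of a real 1D signal u.

    Transform to Fourier space, keep only the lowest n_modes frequencies,
    multiply each kept mode by a complex weight, and transform back.
    """
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = weights[:n_modes] * u_hat[:n_modes]
    return np.fft.irfft(out_hat, n=len(u))


def fourier_layer(u, weights, w_local, n_modes):
    """Spectral path plus a pointwise linear path, followed by a ReLU."""
    return np.maximum(0.0, spectral_conv_1d(u, weights, n_modes) + w_local * u)
```

Truncating to a few modes acts as a learned low-pass filter: with unit weights and three modes, a signal sin(2πx) + sin(20πx) sampled at 64 points is mapped to sin(2πx). Because the weights live in Fourier space, the same layer can be evaluated on a finer grid with the same number of modes.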

Deep learning models to map osteocyte networks from confocal microscopy can successfully distinguish between young and aged bone

2026-01-27T14:00:00Z

by Simon D. Vetter, Charles A. Schurman, Tamara Alliston, Gregory Slabaugh, Stefaan W. Verbruggen

Osteocytes, the most abundant and mechanosensitive cells in bone tissue, play a pivotal role in bone homeostasis and mechano-responsiveness, orchestrating the delicate balance between bone formation and resorption under daily activity. Studying osteocyte connectivity and understanding their intricate arrangement within the lacunar-canalicular network is essential for unravelling bone physiology, which is significantly disrupted during ageing. Much work has been carried out to investigate this relationship, often involving high-resolution microscopy of discrete fragments of this network, alongside advanced computational modelling of individual cells. However, traditional methods of segmenting and measuring osteocyte connectomics are time-consuming and labour-intensive, often hindered by human subjectivity and limited throughput. In this study, we explored the application of deep learning and computer vision techniques to automate the segmentation and measurement of osteocyte connectomics, enabling more efficient and accurate analysis. For this specific application, once the model was trained, the analysis was completed within 10 seconds, compared with 130 hours for manual segmentation. We compared a number of state-of-the-art computer vision models (U-Nets and Vision Transformers) for segmenting the osteocyte network, finding that an Attention U-Net model can accurately segment and measure 81.8% of osteocytes and 42.1% of dendritic processes when compared with manual labelling. While further development is required, we demonstrated that this degree of accuracy is already sufficient to distinguish between the bones of young (2-month-old) and aged (36-month-old) mice, and to partially capture the degeneration induced by genetic modification of osteocytes.
Comparison of the model predictions with manual measurements showed no significant difference, indicating that, with additional training, such deep learning algorithms could reach human-level accuracy when measuring the osteocyte network. Further developments that harness these technologies will likely shed light on the complexities of osteocyte networks with ever-increasing efficiency.

Forecasting drug resistant HIV protease evolution

2026-01-27T14:00:00Z

by Manu Aggarwal, Vipul Periwal

Protease inhibitors (PIs) target the protease (PR) enzyme to suppress viral replication. Their efficacy in human immunodeficiency virus treatment is compromised by the emergence of drug-resistant strains. Therefore, forecasting drug resistance during viral evolution would help in the design of effective treatment strategies. To this end, we develop a framework that bridges two distinct data sets. First, we train probabilistic models to learn coevolutionary information in observed PR genotypes under different PI treatment regimens. We use these models to infer transition probabilities of point mutations conditioned on the genotype and the treatment regimen. Second, we train another set of models to infer the drug resistance of PR genotypes to different PIs using clinically measured drug-resistance data. We use these models together to simulate evolutionary trajectories and predict drug resistance. Importantly, we use these simulations to forecast the emergence of persistent drug-resistant genotypes. Our analysis shows that the dual therapy of Atazanavir (ATV) and Ritonavir (RTV) is the multi-PI treatment regimen least likely to induce drug resistance. We also conduct an exhaustive ablation study of all possible mutations and predict seven point mutations as critical for drug resistance. Interestingly, our results highlight the importance of the L63P amino-acid polymorphism, which we predict to be critical for developing resistance to Nelfinavir (NFV). These results show that our framework effectively extracts and combines biological information from the distinct data sets of observed genotypes and drug resistance, while also tackling the sparsity of available sequence data relative to the large combinatorial complexity of protein evolution and changing functionality in dynamic environments.
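The coupling of the two model families can be pictured as a simple simulation loop. One model proposes the next point mutation with genotype-conditioned probabilities, and the other scores resistance after each step. In the minimal sketch below, both models are passed in as hypothetical callables (`mutation_probs`, `resistance`); the trained coevolutionary and resistance models from the study are not reproduced. The "persistent resistance" rule (score above a threshold for the last few steps) is likewise an illustrative assumption.

```python
import random


def simulate_trajectory(genotype, mutation_probs, resistance, n_steps, rng):
    """Simulate a chain of point mutations and score resistance at each step.

    mutation_probs(genotype) -> list of ((position, new_residue), weight)
    resistance(genotype)     -> float
    Both callables are hypothetical stand-ins for trained models.
    """
    genotype = list(genotype)
    trajectory = [(tuple(genotype), resistance(genotype))]
    for _ in range(n_steps):
        moves = mutation_probs(genotype)
        if not moves:
            break
        options, weights = zip(*moves)
        position, residue = rng.choices(options, weights=weights, k=1)[0]
        genotype[position] = residue
        trajectory.append((tuple(genotype), resistance(genotype)))
    return trajectory


def persistently_resistant(trajectory, threshold, window):
    """True if resistance stays at or above threshold for the last `window` steps."""
    tail = trajectory[-window:]
    return len(tail) == window and all(score >= threshold for _, score in tail)
```

As a toy usage, a two-letter alphabet `"AV"` with uniform mutation weights and a resistance score equal to the fraction of `"V"` residues yields trajectories in which consecutive genotypes differ at exactly one position. Repeating many seeded runs and counting persistently resistant endpoints is one way to turn such simulations into a forecast.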

PCR bias impacts microbiome ecological analyses

2026-01-27T14:00:00Z

by Dharmik R. Rathod, Justin D. Silverman

Polymerase Chain Reaction (PCR) is a critical step in amplicon-based microbial community profiling, allowing the selective amplification of marker genes such as 16S rRNA from environmental or host-associated samples. Despite its widespread use, PCR is known to introduce amplification bias, where some DNA sequences are preferentially amplified over others due to factors such as primer-template mismatches, sequence GC content, and secondary structures. Although these biases are known to affect transcript abundance, their implications for ecological metrics remain poorly understood. In this study, we conduct a comprehensive evaluation of how PCR bias influences both within-sample (α-diversity) and between-sample (β-diversity) analyses. We show that perturbation-invariant diversity measures remain unaffected by PCR bias, but widely used metrics such as Shannon diversity and weighted UniFrac are sensitive to it. To address this, we provide theoretical and empirical insight into how PCR-induced bias varies across ecological analyses and community structures, and we offer practical guidance on when bias-correction methods should be applied. Our findings highlight the importance of selecting appropriate diversity metrics for PCR-based microbial ecology workflows and offer guidance for improving the reliability of diversity analyses.
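Why some metrics are immune to PCR bias can be seen with a compositional model. Assume, as an illustration, that bias multiplies each taxon's true abundance by a taxon-specific efficiency, identical across samples, and the result is renormalised (a "perturbation"). Then the centred log-ratio (CLR) transform shifts every sample by the same vector. Distances built on it, such as the Aitchison distance (one example of a perturbation-invariant measure, not necessarily the set the authors studied), do not change, whereas Shannon diversity does. The efficiencies below are made-up numbers, and the CLR requires strictly positive abundances.

```python
import math


def closure(x):
    """Rescale a vector of non-negative abundances to proportions."""
    s = sum(x)
    return [v / s for v in x]


def pcr_perturb(x, efficiency):
    """Multiplicative (perturbation) model of PCR bias, then renormalise."""
    return closure([xi * ei for xi, ei in zip(x, efficiency)])


def shannon(p):
    """Shannon diversity (natural log) of a vector of proportions."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def clr(x):
    """Centred log-ratio transform; requires strictly positive entries."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def aitchison(x, y):
    """Aitchison distance: Euclidean distance between CLR-transformed samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))
```

With x = [0.5, 0.3, 0.2], y = [0.1, 0.6, 0.3] and efficiencies [4.0, 1.0, 0.5], the Aitchison distance between the two samples is unchanged by the bias, while the Shannon diversity of x drops from about 1.03 to about 0.54.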

Hierarchical analysis of RNA secondary structures with pseudoknots based on sections

2026-01-27T14:00:00Z

by Ryota Masuki, Donn Liew, Ee Hou Yong

Predicting RNA structures containing pseudoknots remains computationally challenging due to high processing costs and complexity. While standard methods for pseudoknot prediction have O(N⁶) time complexity, we present a hierarchical approach that significantly reduces computational cost while maintaining prediction accuracy. Our method analyzes RNA structures by dividing them into contiguous regions of unpaired bases (“sections”) derived from known secondary structures. We examine pseudoknot interactions between sections using a nearest-neighbor energy model with dynamic programming. Our algorithm scales as O(n²ℓ⁴), offering substantial computational advantages over existing global prediction methods. Analysis of 726 transfer messenger RNA and 454 Ribonuclease P RNA sequences reveals that biologically relevant pseudoknots are highly concentrated among section pairs with large minimum free energy (MFE) gain. Over 90% of connected section pairs appear within just the top 3% of section pairs ranked by MFE gain. For 2-clusters, our method achieves high prediction accuracy, with sensitivity exceeding 0.9 and positive predictive value above 0.8. For 3-clusters, we discovered asymmetric behavior: “former” section pairs (formed early in the sequence) are predicted accurately, while “latter” section pairs do not follow local energy predictions. This asymmetry suggests that complex pseudoknot formation follows sequential co-transcriptional folding rather than global energy minimization, providing insights into RNA folding dynamics.
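The first step of the method, splitting a known secondary structure into sections, is easy to state precisely. The minimal sketch below assumes the structure is given in dot-bracket notation, where "." marks an unpaired base. It returns each maximal run of unpaired bases as a half-open (start, end) interval and enumerates every pair of sections as a candidate for pseudoknot scoring. The function names are hypothetical, and the nearest-neighbor energy model and dynamic programming that score each pair are not shown.

```python
from itertools import combinations


def find_sections(dot_bracket):
    """Maximal runs of unpaired bases ('.') as half-open (start, end) intervals."""
    sections, start = [], None
    for i, ch in enumerate(dot_bracket):
        if ch == ".":
            if start is None:
                start = i
        elif start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(dot_bracket)))
    return sections


def candidate_section_pairs(sections):
    """All unordered pairs of sections, to be scored for pseudoknot interactions."""
    return list(combinations(sections, 2))
```

For example, `find_sections("((..((...))..))")` gives `[(2, 4), (6, 9), (11, 13)]`, i.e. three sections and three candidate pairs. Ranking such pairs by MFE gain and keeping only the top few percent is what the abstract reports as capturing most biologically relevant pseudoknots.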

Abundant positively-charged proteins underlie JCVI-Syn3A’s expanded nucleoid and ribosome distribution

2026-01-27T14:00:00Z

by Gesse Roure, Vishal S. Sivasankar, Roseanna N. Zia

Nucleoid compaction in bacteria is commonly attributed to cytoplasmic crowding, DNA supercoiling, and nucleoid-associated proteins (NAPs). In most bacterial species, including E. coli, these effects condense the chromosome into a subcellular region and largely confine ribosomes to the surrounding cytoplasm. In contrast, many Mycoplasma species, including the Mycoplasma-derived synthetic cell JCVI-Syn3A, exhibit a cell-spanning nucleoid with ribosomes distributed throughout. Because Mycoplasma are evolutionarily distant from model bacteria like E. coli and have undergone extensive genome reduction, Syn3A is a natural testbed for studying the genotype-to-‘physiotype’-to-phenotype relationship, in which genome-encoded composition reshapes cell-scale organization. Here we show that this organization can arise from Syn3A’s unusually high abundance of positively charged proteins. We develop a coarse-grained model that explicitly and physically represents a sequence-accurate chromosome together with ribosomes and cytoplasmic proteins at physiological size, charge, and abundance. With DNA and ribosomes alone, the cell-spanning nucleoid relaxes toward a compacted state that sterically excludes ribosomes, indicating missing physics beyond polymer mechanics and excluded volume. When we include electrostatic interactions by assigning effective charges to each biomolecule, positively charged proteins dynamically enrich around ribosomes and DNA, partially screening ribosome–DNA repulsion. This charge shielding enables ribosomes to penetrate the nucleoid mesh and stabilizes a cell-spanning nucleoid consistent with experiment. This behavior is robust across parameter sweeps: DNA stiffness, heterogeneous mesh size, and crowding favor compaction, whereas electrostatics and size polydispersity promote expansion, with consequences for migration pathways within the nucleoid and thus for transcription–translation dynamics.
The framework is parameterized directly from genomic and proteomic composition and is transferable to other bacteria.

A model for the human fetal ventricular myocyte electrophysiology

2026-01-27T14:00:00Z

by Adelisa Avezzú, Stefano Longobardi, Anita Alvarez-Laviada, Francisca Schultz, Julia Gorelik, Catherine Williamson, Steven A. Niederer

Fetal cardiac arrhythmias can lead to stillbirth, but direct studies on the human fetal heart are challenging. To address this, we developed a computational model of human fetal ventricular myocyte (hfVM) electrophysiology, focusing on early gestation (10 weeks). This model incorporates major ionic currents, including fetal-specific T-type calcium and funny currents, and is calibrated using mRNA expression data and experimental measurements. The hfVM model replicates key electrophysiological features, such as a shorter action potential duration and a more positive resting membrane potential compared to adult cells. Global sensitivity analysis reveals that the resting membrane potential is primarily influenced by the funny current and I_K1, while action potential repolarisation depends mainly on I_Kr. Additionally, the sarcoplasmic reticulum contributes to calcium release, but less so than in adults; instead, the T-type calcium current and the sodium-calcium exchanger are more prominent in initiating calcium transients. This is the first human fetal ventricular myocyte model available for studying fetal cardiac physiology, pathology, and potential pharmacological interventions. It provides novel insights into the dominant ion channels governing fetal electrophysiology and calcium dynamics, offering a foundation for understanding arrhythmias and guiding therapeutic strategies.

PlasticEnz: An integrated database and screening tool combining homology and machine learning to identify plastic-degrading enzymes in meta-omics datasets

2026-01-26T14:00:00Z

by Anna Krzynowek, Jasper Snoeks, Karoline Faust

PlasticEnz is a new open-source tool for detecting plastic-degrading enzymes (plastizymes) in metagenomic data by combining sequence homology-based search with machine learning techniques. It integrates custom Hidden Markov Models, DIAMOND alignments, and polymer-specific classifiers trained on ProtBERT embeddings to identify candidate depolymerases from user-provided contigs, genomes, or protein sequences. PlasticEnz supports 11 plastic polymers, with ML classifiers available for PET and PHB that achieve F1 > 0.7 on an independent test set. Applied to plastic-exposed microcosms and field metagenomes, the tool recovered known PETases and PHBases, distinguished plastic-contaminated from pristine environments, and clustered predictions with validated reference enzymes. PlasticEnz is fast, scalable, and user-friendly, providing a robust framework for exploring microbial plastic degradation potential in complex communities.

AugGCL: Multimodal graph learning for spatial transcriptomics analysis with enhanced gene and morphological data

2026-01-23T14:00:00Z

by Tengfei Ji, Bo Yang, Meng Wang, Hong Ji, Huazhe Yang, Yizhuo Liu

Spatial transcriptomics enables the measurement of gene expression in intact tissues. Despite this, reconstructing anatomically accurate spatial domains remains challenging, primarily due to expression sparsity, complex tissue architecture characterized by sharp boundaries and long-range continuity, and weak spatial signals. Traditional pipelines typically rely on expression-driven clustering and spatial smoothing, which underperform at boundaries and in sparse regions while neglecting morphological information. To address these challenges, we propose AugGCL, an augmented graph-convolutional learning framework that enhances spatial structure decoding and gene expression reconstruction through targeted augmentation of both gene and image data. A key component of AugGCL is a neighborhood information aggregation mechanism, which integrates expression similarity and spatial proximity to construct a weighted graph and an enhanced expression matrix, addressing sparsity without sacrificing boundary clarity. Additionally, a two-stream weighted graph convolutional network jointly models refined gene features and image-derived morphological information, with image-aware auxiliary reconstructions enhancing weak spatial signals and sharpening boundaries. On datasets from the human dorsolateral prefrontal cortex, breast cancer, and mouse embryo, AugGCL outperforms baseline methods across multiple metrics, showing robustness and generalization across a range of datasets. Downstream analysis validated the reliability of the method, confirming its effectiveness in cell annotation, functional enrichment, and mechanistic studies. AugGCL generates clearer spatial domains and significantly advances the application of spatial transcriptomics in tissue structure and disease research.
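The neighborhood aggregation idea, weighting each spot's neighbors by both expression similarity and spatial proximity and then smoothing expression with those weights, can be sketched in a few lines. The abstract does not give the exact formulas, so every specific choice here is an assumption: the k nearest spatial neighbors, a weight equal to the (non-negative) cosine similarity times a Gaussian distance kernel, and a blend factor `alpha`. The function name and the parameters `k`, `sigma`, and `alpha` are hypothetical.

```python
import numpy as np


def aggregate_neighbors(expr, coords, k=4, sigma=1.0, alpha=0.5, eps=1e-12):
    """Build a weighted spot graph and an enhanced expression matrix.

    expr:   (n_spots, n_genes) expression matrix
    coords: (n_spots, 2) spatial coordinates
    Weight w_ij = max(cos_sim(x_i, x_j), 0) * exp(-d_ij^2 / (2 sigma^2)) for the
    k nearest spatial neighbours of spot i, and 0 otherwise.
    """
    expr = np.asarray(expr, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = expr.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    unit = expr / (np.linalg.norm(expr, axis=1, keepdims=True) + eps)
    cos = np.clip(unit @ unit.T, 0.0, None)

    weights = np.zeros((n, n))
    for i in range(n):
        neighbours = [j for j in np.argsort(dist2[i]) if j != i][:k]
        for j in neighbours:
            weights[i, j] = cos[i, j] * np.exp(-dist2[i, j] / (2 * sigma ** 2))

    # Blend each spot with the weighted mean of its neighbours.
    enhanced = expr.copy()
    for i in range(n):
        total = weights[i].sum()
        if total > eps:
            enhanced[i] = (1 - alpha) * expr[i] + alpha * (weights[i] @ expr) / total
    return weights, enhanced
```

Because dissimilar neighbors receive near-zero weight, a spot on one side of a boundary borrows little signal from the other side. This is the intuition behind addressing sparsity "without sacrificing boundary clarity", although the published mechanism may differ.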

How do tumor-associated neutrophils regulate the microenvironmental landscape of brain tumors: Delivery of nano-particles through BBB

2026-01-23T14:00:00Z

by Haneol Cho, Junho Lee, Sean Lawler, Yangjin Kim

Glioblastoma multiforme (GBM) is the most aggressive form of brain cancer, with very poor survival and a high recurrence rate. Tumor-associated neutrophils (TANs) play a pivotal role in regulating the tumor microenvironment. In this study, we developed a new mathematical model of the critical GBM-TAN interaction in heterogeneous brain tissue. The model reveals that the dual and complex role of TANs (anti-tumorigenic N1 or pro-tumorigenic N2 type) regulates the phenotypic trajectory of tumor growth and the invasive patterns in white and gray matter via mediators such as IFN-β and TGF-β. We investigated the effect of normalizing the immune environment on glioma growth by applying a therapeutic antibody and developed several strategies for eradicating tumor cells by neutrophil-mediated transport of nanoparticles. We also developed a combination therapy strategy (surgery + Trojan neutrophils) for effectively controlling the infiltration of glioma cells in one hemisphere before they cross the corpus callosum (CC), in order to prevent recurrence in the other hemisphere. Compared with extended resection of the glioma including the CC, or of butterfly GBM, this alternative approach may provide greater anti-tumor efficacy and reduce side effects such as cognitive impairment.

Composing egocentric and allocentric maps for flexible navigation

2026-01-23T14:00:00Z

by Daniel Shani, Peter Dayan

Egocentric representations of the environment have historically been relegated to simple forms of spatial behaviour such as stimulus-response learning. However, in the many cases in which critical aspects of policies are best defined relative to the self, egocentric representations can be advantageous. Furthermore, there is evidence that forms of egocentric representation might exist in the wider hippocampal formation. Nevertheless, egocentric representations have yet to be fully incorporated as a component of modern navigational methods. Here we investigate egocentric successor representations (SRs) and their combination with allocentric representations. We build a reinforcement learning agent that combines an egocentric SR with a conventional allocentric SR to navigate complex 2D environments. We demonstrate that the agent learns generalisable egocentric and allocentric value functions which, even when only additively composed, allow it to learn policies efficiently and to adapt to new environments quickly. Our work shows the benefit of capturing egocentric as well as allocentric relational structure. We offer a new perspective on how cognitive maps could usefully be composed from multiple simple maps representing associations between state features defined in different reference frames.
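For readers new to successor representations, a tabular sketch may help. The SR matrix M holds expected discounted future occupancies, M = (I − γT)⁻¹ for a fixed policy with transition matrix T. It can be learned by temporal-difference updates toward e_s + γ·M[s′], and a value function is V = M·r. The additive composition mentioned in the abstract then amounts to summing an allocentric and an egocentric value. The minimal NumPy sketch below uses a deterministic ring of states, and the mapping from a position to an egocentric state index is left to the caller. Both choices, and all function names, are illustrative assumptions rather than the authors' agent.

```python
import numpy as np


def learn_sr(transitions, n_states, gamma=0.9, lr=0.5, sweeps=2000):
    """Tabular TD learning of the successor representation M.

    transitions: list of (state, next_state) pairs observed under a fixed policy.
    Update: M[s] <- M[s] + lr * (onehot(s) + gamma * M[s_next] - M[s]).
    """
    M = np.zeros((n_states, n_states))
    eye = np.eye(n_states)
    for _ in range(sweeps):
        for s, s_next in transitions:
            target = eye[s] + gamma * M[s_next]
            M[s] += lr * (target - M[s])
    return M


def composed_value(M_allo, r_allo, M_ego, r_ego, s_allo, s_ego):
    """Additive composition: V = (M_allo r_allo)[s_allo] + (M_ego r_ego)[s_ego].

    s_ego is a hypothetical egocentric state index for the same position.
    """
    return (M_allo @ r_allo)[s_allo] + (M_ego @ r_ego)[s_ego]
```

On a five-state ring with γ = 0.9, the learned M matches the closed form (I − γT)⁻¹ to within 1e-6. Changing the reward vector r changes V = M·r immediately, without relearning M. This reuse is what makes SR-based value functions quick to adapt when rewards move.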

A predicted structural interactome reveals binding interference from intrinsically disordered regions

2026-01-22T14:00:00Z

by Junhui Peng, Li Zhao

Proteins function through dynamic interactions with other proteins in cells, forming complex networks fundamental to cellular processes. While high-resolution and high-throughput methods have significantly advanced our understanding of how proteins interact with each other, the molecular details of many important protein-protein interactions are still poorly characterized, especially in non-mammalian species, including Drosophila. Recent advances in deep learning have enabled the prediction of molecular details in various cellular pathways at the network level. In this study, we used AlphaFold2 Multimer to examine and predict protein-protein interactions from both physical and functional datasets in Drosophila. We found that functional associations contribute significantly to high-confidence predictions. Through detailed structural analysis, we also identified an important role for intrinsically disordered regions in the predicted high-confidence interactions. Our study highlights the contribution of disordered regions to protein-protein interactions and demonstrates the value of incorporating functional interactions when predicting physical interactions between proteins. We further built an interactive web interface to present these predictions, facilitating functional exploration, comparative analysis, and the generation of mechanistic hypotheses for future studies.
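The abstract does not say which confidence score or cutoff defines a "high-confidence" prediction. One widely used option for AlphaFold-Multimer models is its ranking confidence, 0.8·ipTM + 0.2·pTM, which weights the interface score more heavily than the whole-complex score. The minimal sketch below filters and ranks predicted pairs on that score. The cutoff of 0.6, the `PairPrediction` record, and the protein names are hypothetical and not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class PairPrediction:
    protein_a: str
    protein_b: str
    iptm: float  # interface predicted TM-score
    ptm: float   # predicted TM-score of the whole complex


def model_confidence(pred):
    """AlphaFold-Multimer ranking confidence: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * pred.iptm + 0.2 * pred.ptm


def high_confidence(preds, cutoff=0.6):
    """Predictions whose confidence meets a (hypothetical) cutoff, best first."""
    kept = [p for p in preds if model_confidence(p) >= cutoff]
    return sorted(kept, key=model_confidence, reverse=True)
```

A pair with ipTM 0.9 and pTM 0.8 scores 0.88 and is kept, while one with ipTM 0.3 and pTM 0.5 scores 0.34 and is discarded. Because ipTM dominates the score, a disordered region that fails to pack against its partner would tend to lower the ranking, which is one way disorder can interfere with confident binding predictions.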

Bayesian data driven modelling of kinetochore dynamics: Space-time organisation of the human metaphase plate

2026-01-22T14:00:00Z

by Constandina Koki, Alessio V. Inchingolo, Abdullahi Daniyan, Enyu Li, Andrew D. McAinsh, Nigel J. Burroughs

Mitosis is a complex self-organising process that achieves high-fidelity separation of duplicated chromosomes into two daughter cells through capture and alignment of chromosomes to the spindle mid-plane. Chromosome movements are driven by kinetochores (KTs), multi-protein machines that attach chromosomes to microtubules (MTs) and, through those attachments, both control and generate directional forces. Using lattice light sheet microscopy imaging and automated near-complete tracking of kinetochores at fine spatio-temporal resolution, we produce a detailed atlas of kinetochore metaphase-anaphase dynamics in untransformed human cells (RPE1). Such data allow dynamic models to be reverse engineered and biological hypotheses to be addressed. We determined the support from this dataset for 17 models of metaphase dynamics using Bayesian inference, demonstrating (1) substantial sister asymmetry that generates transverse organisation of the metaphase plate (MPP), (2) substantial spatial organisation of KT dynamic properties within the MPP, and (3) significant time dependence of the K-fiber mechanical parameters, whereby K-fiber forces tune over the last 5 min of metaphase towards a set point, referred to as the anaphase ready state. These spatio-temporal trends are robust to perturbation of the spindle assembly pathway (nocodazole washout treatment), suggesting that the underlying processes generating kinetochore heterogeneity are intrinsic to mitosis and possibly play a role in ensuring high-fidelity segregation.
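Determining "the support from this dataset" for each of several competing models is, in a Bayesian setting, usually a comparison of marginal likelihoods (evidence). The abstract does not describe how the evidence was estimated, so the sketch below starts from already-computed log-evidence values (placeholders, not the study's numbers). It assumes a uniform prior over models and converts those values into posterior model probabilities and log Bayes factors. The max-shift (log-sum-exp) step avoids underflow when log-evidences are large negative numbers.

```python
import math


def posterior_model_probs(log_evidence, log_prior=None):
    """Posterior model probabilities from log marginal likelihoods.

    p(M_k | D) is proportional to p(D | M_k) p(M_k); shifting by the maximum
    (log-sum-exp) keeps the exponentials from underflowing.
    """
    k = len(log_evidence)
    if log_prior is None:
        log_prior = [-math.log(k)] * k  # uniform prior over models (assumption)
    logs = [le + lp for le, lp in zip(log_evidence, log_prior)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def log_bayes_factor(log_evidence, i, j):
    """log B_ij = log p(D | M_i) - log p(D | M_j)."""
    return log_evidence[i] - log_evidence[j]
```

Two models whose log-evidences are −1000 and −1000 + ln 3 receive posterior probabilities 0.25 and 0.75. Computing them naively with `math.exp(-1000)` would underflow to zero for both.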