
The dawn of biophysical representations in computational immunology

Published online by Cambridge University Press: 28 May 2025

Eric Wilson
Affiliation:
Department of Immunology and Immunotherapy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Akshansh Kaushik
Affiliation:
School of Molecular Sciences, Arizona State University, Tempe, AZ, USA
Soumya Dutta
Affiliation:
Biodesign Institute, Center for Applied Structural Discovery
Abhishek Singharoy*
Affiliation:
Biodesign Institute, Center for Applied Structural Discovery
*Corresponding author: Abhishek Singharoy; Email: asinghar@asu.edu

Abstract

Computational immunology has been the breeding ground of some of the best bioinformatics work of the day. By melding diverse data types, these approaches have successfully associated genotypes with phenotypes. However, the representations (or spaces) in which these associations are mapped have been constructed primarily from omics-oriented sequence data, typically derived from high-throughput experiments. In this perspective, we highlight the importance of biophysical representations for genotype–phenotype mapping. We contend that using biophysical representations reduces the dimensionality of the search problem, dramatically expedites the algorithms, and, more importantly, offers physical interpretability to the classes of clustered sequences across different layers of complexity: molecular, cellular, or macro-level. Such biophysical interpretations offer a firm basis for the future of bioengineering and cell-based therapies.

Information

Type
Perspective
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© Arizona State University, 2025. Published by Cambridge University Press

Introduction

The core responsibility of our immune system is to protect the body from pathogens and cancers. The recent pandemic and the rise of anticancer therapies that rely on immunological mechanisms have underscored the need to target and activate the immune system reproducibly, generating focused enthusiasm for a detailed description of the immune system. However, the human immune system is remarkably complex and is often regarded as one of the most challenging topics in biology. The sheer sequence and population diversity of immune-associated proteins is a formidable obstacle to mapping their network of interactions within a tractable space. For instance, T-cell recognition of antigens depends on the human leukocyte antigen (HLA) genes, which encode the major histocompatibility complex (MHC) molecules; these are among the most polymorphic germline genes in the human genome, with tens of thousands of variants across populations (Barker et al., 2023). Moreover, V(D)J recombination (and, in B cells, somatic hypermutation) makes T-cell and B-cell receptors the most diverse human proteins known, with theoretical estimates of T-cell receptor (TCR) diversity exceeding 10⁶¹ potential sequences. A more conservative estimate places TCR diversity in the range of 10⁷ receptors (Mora and Walczak, 2018), which is still an enormous range of human variation. The desire to account for this diversity and to predict its associated non-linear relationships motivated the field of computational immunology, which develops methods to analyze these data and predict immune outcomes (Figure 1). Computational immunology has transformed our understanding of the immune system by enabling the integration of massive amounts of biochemical and biological data.
Simple mathematical models of disease transmission can be traced to the early 20th century (Ross, 1911; Brauer, 2017). By leveraging population data, Ross's model clarified the relationship between the size of mosquito populations and malaria incidence, which led to improved malaria control. The power of computational immunology expanded significantly in the information age with the advent of high-throughput sequencing and proteomics and the growing availability of experimental and clinical data, further empowered by advances in computing technology. These advances have enabled computational techniques to tackle more complex immunological questions. Consequently, computational immunology has now been applied to a broad spectrum of problems, including vaccine design (He and Zhu, 2015), prediction of population-level mortality rates (Wilson et al., 2021), and forecasting the outcomes of immune checkpoint blockade therapies (Chowell et al., 2018).
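Ross's mosquito–malaria reasoning is usually written today in the later Ross–Macdonald form. The sketch below is a minimal version of that form (written here without the extrinsic incubation term, so it is not the 1911 model verbatim). It shows why mosquito density matters: the basic reproduction number R0 = m·a²·b·c / (r·g) grows linearly with the mosquito-to-human ratio m, and infection persists only when R0 > 1. All parameter names and values are illustrative assumptions, not values from the cited works.

```python
from dataclasses import dataclass


@dataclass
class RossMacdonald:
    """Minimal Ross-Macdonald-style model (illustrative; no incubation delay)."""
    m: float  # mosquitoes per human
    a: float  # bites per mosquito per day
    b: float  # probability a bite infects a human (mosquito -> human)
    c: float  # probability a bite infects a mosquito (human -> mosquito)
    r: float  # human recovery rate per day
    g: float  # mosquito death rate per day

    def r0(self) -> float:
        """Basic reproduction number; linear in mosquito density m."""
        return self.m * self.a ** 2 * self.b * self.c / (self.r * self.g)

    def equilibrium_prevalence(self) -> float:
        """Analytic endemic fraction of infected humans (0 when R0 <= 1)."""
        R0 = self.r0()
        if R0 <= 1.0:
            return 0.0
        return (R0 - 1.0) / (R0 + self.a * self.c / self.g)

    def simulate(self, x0: float = 0.01, y0: float = 0.0, days: float = 2000.0, dt: float = 0.1):
        """Forward-Euler integration; returns final (human, mosquito) infected fractions."""
        x, y = x0, y0
        for _ in range(int(days / dt)):
            dx = self.m * self.a * self.b * y * (1.0 - x) - self.r * x
            dy = self.a * self.c * x * (1.0 - y) - self.g * y
            x += dt * dx
            y += dt * dy
        return x, y
```

With the hypothetical values a = 0.3, b = c = 0.5, r = 0.05 and g = 0.12, raising m from 0.1 to 2 moves R0 from 0.375 (infection dies out) to 7.5 (endemic prevalence of about 74% in this toy model).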

Figure 1. The scales of computational immunology models from atomistic to macroscales.

Because protein and amino acid sequence information is readily available and easy to collect, most computational immunology approaches rely primarily on sequence data for their predictions (Ansari and Raghava, 2010; Jespersen et al., 2017; Peters et al., 2020). However, recent advances in machine learning and protein modeling have driven a surge in the incorporation of biophysical information and modeling into existing computational immunology approaches (Andersen et al., 2006; Wilson et al., 2024). Such integrations have not only substantially improved model accuracy but also enabled novel insights into previously inscrutable mechanisms, advancing computational immunology models into the next era. In the following perspective, we explore immune-related models ranging from atomistic environments to macro-level systems, demonstrating how biophysics can enhance predictive accuracy and improve our overall understanding of immune responses.

A perspective on biophysical models

Computational immunology has been dominated by bioinformatics, driven largely by advances in genomic and proteomic technologies whose outputs today populate around 31 different databases (Rigden and Fernández, 2023). Historically, bioinformatics has enabled the study of complex protein–protein interactions across a diversity of sequences (Petrovsky and Brusic, 2002). More recently, deep learning approaches have offered rapid access to molecular structures from sequences (Jumper et al., 2021), extending the realm of bioinformatics to structure-guided models of immune interactions (Bradley, 2023). However, the physical formulation of intermolecular interactions is statistical: it entails an ensemble of conformations that remains obscure in bioinformatics approaches. These ensembles capture order–disorder transitions of the molecules, flexibility, and thermal effects, as well as solvation and microenvironmental effects on structure. Attempts to overcome these limitations of traditional computational immunology open the door to biophysical tools that take MHC, TCR, and antibody predictions beyond the sequence-only or sequence–structure paradigm (Raha et al., 2022; Deng et al., 2023; Demerdash and Smith, 2024). Although biophysical simulations are computationally expensive, they generate unique representations and metrics that connect collective molecular properties with phenotypic and even population-level outcomes.
We break down the biophysical advances into atomistic, molecular, whole-cell, and macro-level modeling, and highlight how the biophysical entities of Figure 1 are being, or could be, leveraged as novel representations for learning in computational immunology, complementing traditional sequence or structural methods (Figure 2).

Figure 2. A comprehensive list of immunological problems and their biophysical representations. Illustrations – 1. Antibody (PDB-1IGT), 2. MHC (PDB-1HHK), 3. TCR (from RCSB-PDB), 4. Viral vector ChAdOx1, 5. Whole-cell illustration, and 6. Epitope (PDB-3PP4).

Atomistic description

We start with biophysical descriptors in computational biology arising from detailed interactions of antibodies, MHCs, and TCRs.

Free energy description of antibodies

Since the first antibody structure was deposited in 1976, the number of antibody structures in the Protein Data Bank (PDB) has grown steadily; antibodies now represent approximately 2.1% of all entries (Ferdous and Martin, 2018). Many computational tools are now trained only on antibody data, rather than on general protein data, because doing so improves performance (Ponomarenko and Bourne, 2007; Młokosiewicz et al., 2022). To this end, the Structural Antibody Database (SAbDab) collects, curates, and presents antibody structures from the PDB (Schneider et al., 2022). Such databases allow the affinity of antibody–antigen interfaces to be predicted by combining the biophysics of protein–protein interactions with deep learning (Hummer et al., 2023). In fact, affinity ranking and prediction improve significantly when all-atom free energy methods, such as free energy perturbation (e.g., FEP+), are combined with focused machine learning approaches such as QuanSA (Cleves and Jain, 2018). Using such a combination of biophysics and informatics, the CR3022 antibody was optimized against the spike protein of the SARS-CoV-2 Omicron variant with a high success rate, achieving up to a 17-fold affinity increase (Cai et al., 2024). Going beyond simple geometric 3D-coordinate representations of ligands, a multiple-ligand alignment approach based on so-called pocket fields is used to learn affinities (Cleves and Jain, 2018).
Unlike real geometries, which are high-dimensional, smoother functions such as 3D fields (with a known mapping to SMILES strings or peptide sequences) can be learned across a broad diversity of molecular identities and conformations with less risk of overfitting. In conclusion, free energy-augmented antibody design underscores the growing power of biophysical modeling not only to understand but also to engineer biological systems for specific therapeutic outcomes.
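Two textbook relations sit underneath this free-energy language. Free energy perturbation estimates ΔG from energy differences ΔU = U₁ − U₀ sampled in the reference state via Zwanzig's exponential average, ΔG = −RT ln⟨exp(−ΔU/RT)⟩₀, and an affinity fold change maps to a binding free-energy change through ΔΔG = −RT ln(fold). The sketch below is a generic illustration under stated assumptions (energies in kcal/mol, T = 298.15 K); it is not FEP+ or QuanSA. For example, the 17-fold affinity gain reported for CR3022 corresponds to roughly −1.7 kcal/mol.

```python
import math
from typing import Sequence

R_KCAL = 0.0019872041  # molar gas constant, kcal/(mol*K)


def zwanzig_delta_g(delta_u: Sequence[float], temperature: float = 298.15) -> float:
    """Exponential-averaging (Zwanzig) FEP estimate: Delta G = -RT ln <exp(-dU/RT)>_0.

    delta_u holds U_1 - U_0 (kcal/mol) evaluated on configurations sampled in state 0.
    """
    rt = R_KCAL * temperature
    scaled = [-du / rt for du in delta_u]
    shift = max(scaled)  # log-sum-exp shift keeps exp() from overflowing
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return -rt * (shift + math.log(mean_exp))


def fold_change_to_ddg(fold_improvement: float, temperature: float = 298.15) -> float:
    """Binding free-energy change (kcal/mol) for a fold decrease in Kd (Kd_old / Kd_new)."""
    return -R_KCAL * temperature * math.log(fold_improvement)


print(round(fold_change_to_ddg(17.0), 2))  # 17-fold tighter binding
```

Exponential averaging is dominated by rare low-ΔU samples, which is why production FEP workflows split a transformation into many small alchemical windows rather than applying this estimator in one jump.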

Structural modeling of MHC (Major Histocompatibility Complex)

In 1968, Snell examined the concept of transplantation and discussed histocompatibility polymorphism (Hull, 1970; Garrido, 2024). MHC proteins play a crucial role in immune mechanisms because they are involved in activating T cells and B cells (Janeway et al., 2001; Wieczorek et al., 2017). Structural modeling of these complexes offers insight into the mechanisms of several pathways relevant to immunogenicity (Keller et al., 2022). MHC proteins are among the most polymorphic proteins in humans (Barker et al., 2023), yet despite this high polymorphism, the structure of the MHC binding groove is highly conserved (Wilson et al., 2024). Researchers found that the second and last residues of the peptide are key anchors for binding to the MHC class I groove (Janeway et al., 2001), a discovery made through X-ray diffraction studies (Zhang et al., 1998). Because countless peptides can bind to MHC, many of them generated by frameshift events and lacking the evolutionary context needed for multiple sequence alignments, crystallizing all polymorphic complexes is infeasible. A biophysical approach is therefore needed to model MHC–peptide complexes for further study.
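The anchor rule above (P2 and the C-terminal position, PΩ) is simple enough to express directly. The sketch below checks a peptide against a simplified, HLA-A*02:01-like anchor motif (Leu/Met at P2, Val/Leu/Ile at PΩ). The motif table and function names are hypothetical illustrations; real allele preferences are broader and are normally learned from binding data rather than hard-coded.

```python
from typing import Dict, Set, Tuple

# Simplified, illustrative anchor preferences (assumption): HLA-A*02:01-like motif.
A0201_LIKE_MOTIF: Dict[str, Set[str]] = {
    "P2": {"L", "M"},
    "POmega": {"V", "L", "I"},
}


def anchor_residues(peptide: str) -> Tuple[str, str]:
    """Return the (P2, P-Omega) anchor residues of a class I ligand."""
    if not 8 <= len(peptide) <= 11:
        raise ValueError("MHC-I ligands are typically 8-11 residues long")
    return peptide[1], peptide[-1]  # second residue and last residue


def matches_anchor_motif(peptide: str, motif: Dict[str, Set[str]] = A0201_LIKE_MOTIF) -> bool:
    """True if both anchor positions carry a residue allowed by the motif."""
    p2, p_omega = anchor_residues(peptide.upper())
    return p2 in motif["P2"] and p_omega in motif["POmega"]


print(matches_anchor_motif("NLVPMVATV"), matches_anchor_motif("KRWIILGLNK"))
```

A rule like this captures only the sequence side of binding; it is exactly the kind of template that breaks down for non-canonical peptides, which is why structural and biophysical modeling of the groove is needed.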

Conventionally, there are three ways to model structures: molecular dynamics, molecular docking, and homology modeling (Bertoline et al., 2023). The unifying protocol for modeling an MHC complex is as follows: first, generate a peptide conformation using a PDB template; second, dock the peptide; and finally, optimize the overall structure. Multiple tools are available to model MHC-I complexes, such as DockTope, GradDock, APE-Gen, AlphaFold2, and RoseTTAFold (Rigo et al., 2015; Kyeong et al., 2018; Abella et al., 2019; Bryant et al., 2022). Although these methods are highly accurate, some are computationally heavy, and some apply only to MHC class I molecules because MHC class II molecules have a heterodimeric binding pocket. Recently, a state-of-the-art method, PANDORA, has shown potential to model even MHC class II complexes and also offers some tunability during modeling. Its energy-based definition of loop conformations has been shown to outperform most previously introduced methods in both accuracy and computational efficiency (Parizi et al., 2023). However, there is still a need for a tool that models these complexes by capturing the biophysical attributes of the peptide–MHC complex rather than by exploiting sequence similarity and templates. Large datasets for benchmarking biophysical properties across a range of MHC systems, comparable to MISATO (MD simulations of 20,000 protein–ligand systems) or the 100-protein NMR spectra dataset (for protein dynamics), do not yet exist in this space.
A very promising development is that semi-empirical quantum mechanical representations can now be embedded in such datasets to refine the associated protein structures. Once similar datasets exist for the broad class of MHC proteins, such quantum chemical representations can likely be extended to peptide–MHC predictions, for example, with PANDORA or other tools. Ultimately, improved MHC modeling and the subsequent extraction of generalizable biophysical properties will lead to better predictions of immunogenicity. Highlighting this point, a thorough structural study demonstrated that a non-anchor mutation in an MHC-I-presented peptide from an ovarian cancer tumor modified both the structural and dynamic properties of the bound complex. These changes produced optimal conformations for interaction with, and subsequent activation of, cognate T cells (Devlin et al., 2020). Such an observation would be difficult, if not impossible, to make from sequence alone, and it emphasizes the value of structural considerations when studying immunogenicity.
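Accuracy comparisons among peptide–MHC modeling tools are commonly reported as the root-mean-square deviation (RMSD) between a modeled peptide and a reference crystal structure after optimal superposition. The NumPy sketch below implements the standard Kabsch superposition followed by RMSD. It assumes both inputs list the same atoms in the same order (for example, peptide backbone atoms), and it is not the evaluation code of PANDORA or any other tool named above.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 signals a reflection; correct it
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # proper rotation taking p_c onto q_c
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

The determinant correction matters: without it, the SVD can return a mirror image, which would report an artificially low RMSD for a chirally wrong model.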

Catch bond description of TCRs

Catch bonds are interactions between biomolecules, or between biomolecules and surfaces, whose lifetime increases when tensile force is applied to the bond (Marshall et al., 2003; Hertig and Vogel, 2012). The atomistic details of catch bond formation long remained elusive, but a general explanation was given by either a two-state model or a two-pathway model. In the two-state model, the receptor–ligand complex is theorized to exist in two distinct states, a short-lived and a long-lived state. The applied force loosens the interaction between the binding site and a regulatory site, which drives the whole complex toward the long-lifetime state (Hertig and Vogel, 2012). In the two-pathway model, the receptor–ligand complex unbinds via two distinct pathways with different off-rates (k_off); applying tensile force triggers an allosteric change that diverts unbinding toward the pathway with the higher energy barrier, and hence a longer lifetime (Sokurenko et al., 2008). Such catch bonding has been observed at the TCR–peptide–MHC immune synapse and, more importantly, immunogenicity has been attributed to the strength of catch bond formation (Choi et al., 2023). Hence, catch bonds offer a biophysical descriptor of how MHC alleles present peptides to TCRs. Interestingly, unlike binding affinity, catch bonds uniquely capture the system's out-of-equilibrium properties. They can therefore capture the state of the immune synapse under stress, going beyond the frozen, stationary picture of complexes provided by affinity measures.
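The two-pathway picture can be made quantitative by giving each unbinding pathway a Bell-type force dependence, k_off(F) = k_c·exp(−F·x_c/kBT) + k_s·exp(F·x_s/kBT). Force suppresses the catch pathway and accelerates the slip pathway, so the lifetime 1/k_off first rises and then falls. This is one common phenomenological form, shown here as a sketch; all parameter values are hypothetical and are not fitted to any TCR–pMHC data.

```python
import math

KT_PN_NM = 4.11  # thermal energy near 298 K, in pN*nm


def two_pathway_koff(force, k_catch0, x_catch, k_slip0, x_slip, kt=KT_PN_NM):
    """Off-rate (1/s): catch pathway slowed by force plus slip pathway sped up by force.

    force in pN; x_catch, x_slip are distances to the transition states in nm.
    """
    catch = k_catch0 * math.exp(-force * x_catch / kt)
    slip = k_slip0 * math.exp(force * x_slip / kt)
    return catch + slip


def bond_lifetime(force, **params):
    """Mean bond lifetime (s) is the inverse of the total off-rate."""
    return 1.0 / two_pathway_koff(force, **params)


def force_of_max_lifetime(k_catch0, x_catch, k_slip0, x_slip, kt=KT_PN_NM):
    """Force where d(k_off)/dF = 0, i.e. the lifetime peak; 0 for a pure slip bond."""
    ratio = (k_catch0 * x_catch) / (k_slip0 * x_slip)
    return max(0.0, kt * math.log(ratio) / (x_catch + x_slip))


params = dict(k_catch0=10.0, x_catch=1.0, k_slip0=0.1, x_slip=0.5)  # hypothetical values
f_star = force_of_max_lifetime(**params)
print(f"peak at {f_star:.1f} pN, lifetime {bond_lifetime(f_star, **params):.2f} s")
```

With these hypothetical numbers the lifetime grows roughly tenfold between zero force and the peak (about 14.5 pN) before slip behavior takes over, which is the signature that distinguishes a catch bond from an ordinary slip bond.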
The catch-bond descriptor is computable using steered MD simulations (Schoeler et al., 2014) and, more recently, metadynamics methodologies (Ccoa and Hocky, 2022), offering insight into how sequence changes translate into non-equilibrium interaction changes. However, both experimental and computational biophysical methods for tracking catch bonds are resource-intensive, so high-throughput measurements are still missing, which limits the extensive use of this information in immunology models. Reinforcement learning combined with Jarzynski's equality and the so-called stiff-spring approximation (Park and Schulten, 2004), used to formulate a space of molecular actions from steered MD simulations, is a promising step toward rapidly modeling at least the two-state model of catch bonds as another biophysical descriptor in computational immunology (Choi et al., 2023). A more rigorous treatment of catch bond formation also has practical implications for enhancing T cell-based cancer immunotherapies. A recent study showed that low-affinity TCRs can be optimized to acquire catch-bonding characteristics, allowing potent activation at relatively weak 3D binding affinities (Zhao et al., 2022). This could drive a strong antitumor immune response with a lower risk of potentially life-threatening cross-reactivity.
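In steered MD, a harmonic spring of stiffness k pulls a chosen coordinate; the external work accumulates as W = Σ k(λ − x)Δλ, and Jarzynski's equality recovers an equilibrium free-energy difference from repeated non-equilibrium pulls, ΔF = −kT ln⟨exp(−W/kT)⟩. In the stiff-spring regime the pulled coordinate closely tracks the spring center, so this ΔF approximates the free-energy profile along that coordinate. The sketch below shows these generic estimators plus the second-order cumulant approximation; it assumes consistent units and is not the reinforcement-learning scheme cited above or any MD package's implementation.

```python
import math
from typing import Sequence


def smd_work(spring_k: float, centers: Sequence[float], positions: Sequence[float]) -> float:
    """Work from a moving harmonic restraint, W = sum of k*(lambda - x)*d(lambda).

    centers: spring-centre values lambda_t; positions: pulled coordinate x_t per frame.
    Integrates with the trapezoid rule between successive frames.
    """
    if len(centers) != len(positions):
        raise ValueError("centers and positions must have the same length")
    work = 0.0
    for i in range(1, len(centers)):
        f_prev = spring_k * (centers[i - 1] - positions[i - 1])
        f_curr = spring_k * (centers[i] - positions[i])
        work += 0.5 * (f_prev + f_curr) * (centers[i] - centers[i - 1])
    return work


def jarzynski_delta_f(works: Sequence[float], kt: float) -> float:
    """Jarzynski estimate, Delta F = -kT ln <exp(-W/kT)>, with a log-sum-exp shift."""
    scaled = [-w / kt for w in works]
    shift = max(scaled)
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return -kt * (shift + math.log(mean_exp))


def cumulant_delta_f(works: Sequence[float], kt: float) -> float:
    """Second-order cumulant approximation, Delta F ~ <W> - var(W)/(2 kT), for near-Gaussian work."""
    n = len(works)
    mean = sum(works) / n
    var = sum((w - mean) ** 2 for w in works) / (n - 1)
    return mean - var / (2.0 * kt)
```

Because the exponential average is dominated by rare low-work trajectories, the cumulant form is often preferred when only a handful of pulls are affordable, which is the resource bottleneck the paragraph above describes.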

Molecular description

Moving from atomistic to molecular biophysical representations has become a popular way to let algorithms distinguish self from non-self interactomes. Biophysical representations of the glycans that underpin pathogen entry offer some stark examples. Using tools such as variational autoencoders, the so-called glycan shield of the spike protein was dissected to reveal the roles of specific glycan size, orientation, and chemistry (Casalino et al., 2021). A physical interpretation of the latent spaces was obtained from protein–glycan contacts. Subsequently, we engineered the glycan shield based on its contact representation to reduce the infectivity of the NL63 coronavirus by nearly 50% (Chmielewski et al., 2023). The idea of monitoring contacts was also extended to inter-glycan interactions between the surface of the influenza virus and the glycocalyx of chicken and human cell surfaces (Lucas et al., 2021). Again, by translating fluorescence signals into a contact matrix representation, support vector machines successfully identified the critical glycan densities at which H1N1 viruses grown in mammalian cells bind more strongly than those grown in eggs.
Finally, protein–protein contact matrices also found application in vector design for the AstraZeneca and J&J COVID-19 vaccines, implicating platelet factor 4 (PF4) in the blood-clotting side effects of the vaccine candidates (Baker et al., 2021). Altogether, contact matrices can offer a robust biophysical representation in which molecular interactions can be classified as self or non-self.
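A contact matrix is straightforward to compute from coordinates, which is part of its appeal as a representation. The sketch below assumes one representative atom per residue (for example, Cα) and a distance cutoff of 8 Å, a common but arbitrary choice; the featurization used in the cited studies may differ. The fixed-length feature vector it produces is the kind of input a downstream classifier such as a support vector machine could take; the function names are hypothetical.

```python
import numpy as np


def contact_matrix(coords_a, coords_b, cutoff=8.0):
    """Binary inter-molecular contact matrix of shape (n_a, n_b).

    Entry (i, j) is 1 when residue i of molecule A and residue j of molecule B
    are closer than `cutoff` (Angstrom), using one representative atom per residue.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
    return (dist < cutoff).astype(np.int8)


def contact_features(cmap):
    """Fixed-length features: flattened matrix plus per-residue contact counts for A and B."""
    return np.concatenate([cmap.ravel(), cmap.sum(axis=1), cmap.sum(axis=0)]).astype(float)
```

The representation discards exact geometry but keeps which residues touch, which makes matrices from different conformations or variants directly comparable as long as residue numbering is aligned.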

Cellular description

Whole-cell models, though scarce, have found applications in computational immunology. A mechanistic, multiscale mathematical model of immunogenicity for therapeutic proteins was formulated by recapitulating key biological mechanisms, including antigen presentation; activation, proliferation, and differentiation of immune cells; secretion of antidrug antibodies; and the in vivo disposition of antibodies and therapeutic proteins (Chen et al., 2014). The model is structured at the subcellular, cellular, and whole-body levels. To represent MHC-II physiology, a key input is the number of T-cell epitope–MHC complexes, derived from in silico T-cell epitope prediction and experimental measurements of epitope–MHC binding affinities; this is scaffolded within a two-compartment pharmacokinetic model of the drug. Using adalimumab as an example therapeutic protein, the model simulates immune responses against adalimumab in individual subjects and in a population, and it provides estimates of immunogenicity incidence and drug-exposure reduction that can be validated experimentally (Chen et al., 2014; Handel et al., 2020). Most cell models in immunology are agent-based, using cellular-automaton algorithms governed by specific mechanistic logic or rules. Interestingly, these rules show remarkable similarity to classical thermodynamic and kinetic principles, such as landscapes and equations of motion (Koopmans and Youk, 2021). Such models have been applied to CD4+ T cell responses to influenza infection and to multiscale mechanistic modeling of human dendritic cells, and they have potential applications in dendritic cell-based targeted cell therapies (Wertheim et al., 2021; Aghamiri et al., 2023).
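The pharmacokinetic scaffold mentioned above can be illustrated with a generic two-compartment model: drug in the central compartment (volume V1) is eliminated at rate k10 and exchanges with a peripheral compartment at rates k12 and k21. The sketch below assumes an intravenous bolus for simplicity and uses hypothetical rate constants; it covers only drug disposition, not the immunogenicity or antidrug-antibody layers of the cited model.

```python
from typing import List, Tuple


def two_compartment_iv_bolus(
    dose: float, v1: float, k10: float, k12: float, k21: float, t_end: float, dt: float = 0.01
) -> Tuple[List[float], List[float], List[float]]:
    """Forward-Euler solution after a bolus into the central compartment.

    dA1/dt = -(k10 + k12) * A1 + k21 * A2
    dA2/dt =  k12 * A1 - k21 * A2
    Returns (times, central concentration A1/V1, peripheral amount A2).
    """
    a1, a2 = dose, 0.0
    n_steps = int(round(t_end / dt))
    times, conc, periph = [0.0], [a1 / v1], [a2]
    for i in range(1, n_steps + 1):
        da1 = -(k10 + k12) * a1 + k21 * a2
        da2 = k12 * a1 - k21 * a2
        a1 += dt * da1
        a2 += dt * da2
        times.append(i * dt)
        conc.append(a1 / v1)
        periph.append(a2)
    return times, conc, periph
```

In a full immunogenicity model, antidrug antibodies would typically enter as an additional, time-varying clearance term on the central compartment, which is how drug-exposure reduction emerges from the immune layer.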

Macro description

The integration of molecular immunology concepts into macro-level analyses has already demonstrated significant potential for elucidating disease associations. A notable example is the use of patient-specific MHC genotypes to predict disease risk. For instance, a large-scale analysis of 9,176 cancer patients revealed that MHC-I genotypes were predictive of the tumor mutational landscape (Marty et al., 2017). This study found that oncogenic mutations were more likely to occur in regions not presented by the patient's MHC-I molecules, suggesting that gaps in antigen presentation contribute to tumor evolution. Similarly, patients undergoing immune checkpoint blockade therapy have shown improved responses when their MHC-I genotype allows presentation of a more diverse array of potential peptides (Chowell et al., 2019). More recently, biophysical approaches have been applied to link MHC-I genotypes with disease risk and progression (Wilson et al., 2024). We recently created a diverse protein ensemble of 5,281 MHC-I binding grooves, generating 211,240 structural models, which were subsequently translated into a simplified representation of their electrostatic properties (5,281 averaged electrostatic maps). A subset of these maps, those with known MHC-I binding motifs, was used to train an Inception neural network capable of predicting MHC-I binding motifs from electrostatic maps alone. Beyond enabling high-throughput, proteome-scale binding predictions, the predicted binding motifs were used to construct interaction networks that accurately classified HIV disease progression and immune checkpoint therapy response.
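The numbers above imply an average of 40 structural models per groove (211,240 / 5,281 = 40), each contributing to one averaged electrostatic map. The sketch below illustrates the averaging step with a plain Coulomb potential from point charges, evaluated on a 2D grid plane with a constant dielectric. This is a simplified stand-in for a continuum electrostatics solver; the grid, dielectric, distance clamp, and function names are all assumptions, not the pipeline of the cited study.

```python
import numpy as np

COULOMB_KCAL = 332.0636  # Coulomb constant, kcal*Angstrom/(mol*e^2)


def coulomb_map(positions, charges, grid_x, grid_y, z_plane=0.0, dielectric=4.0, min_dist=1.0):
    """Coulomb potential (kcal/mol/e) on the plane z = z_plane from point charges.

    positions: (N, 3) charge coordinates in Angstrom; charges: (N,) in units of e.
    """
    positions = np.asarray(positions, dtype=float)
    charges = np.asarray(charges, dtype=float)
    gx, gy = np.meshgrid(np.asarray(grid_x, dtype=float), np.asarray(grid_y, dtype=float), indexing="ij")
    grid = np.stack([gx, gy, np.full_like(gx, z_plane)], axis=-1)  # (nx, ny, 3)
    dist = np.linalg.norm(grid[:, :, None, :] - positions[None, None, :, :], axis=-1)  # (nx, ny, N)
    dist = np.maximum(dist, min_dist)  # clamp to avoid singularities at charge centres
    return (COULOMB_KCAL / dielectric) * (charges / dist).sum(axis=-1)


def ensemble_averaged_map(ensemble_positions, charges, grid_x, grid_y, **kwargs):
    """Average per-conformer maps over an ensemble of structural models of one groove."""
    maps = [coulomb_map(p, charges, grid_x, grid_y, **kwargs) for p in ensemble_positions]
    return np.mean(maps, axis=0)
```

Averaging over conformers smooths out thermal fluctuations in individual models, yielding a compact image-like array per groove that a convolutional architecture such as Inception can consume.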
At the population level, MHC-I genotype analysis has revealed further insights. A consensus MHC-I prediction model, ensembleMHC, was used to show that populations enriched for MHC-I alleles capable of strongly binding multiple peptides from SARS-CoV-2 structural proteins exhibited lower mortality rates during the pre-vaccination phase of the COVID-19 pandemic (Wilson et al., 2021). This suggests that population-level MHC-I diversity and peptide-binding capacity may serve as predictors of disease outcomes for emerging viral threats. These findings highlight the promise of MHC genotype-based analysis for both disease risk assessment and therapeutic strategy development. MHC analysis can aid in predicting susceptibility to autoimmune diseases and cancer while also informing vaccine design by optimizing antigen selection for patients.
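One simple way to turn allele-level binding predictions into a population-level quantity is to estimate the fraction of individuals carrying at least one strong-binding allele. Assuming Hardy–Weinberg proportions, a locus whose strong-binding alleles have combined frequency f gives a carrier fraction of 1 − (1 − f)², and, assuming independent loci (no linkage disequilibrium, a strong simplification), coverage across loci is 1 − Π(1 − carrier fraction). This is a generic sketch with hypothetical allele frequencies, not the procedure used with ensembleMHC.

```python
from typing import Dict, Iterable


def carrier_fraction(allele_freqs: Dict[str, float], strong_binders: Iterable[str]) -> float:
    """Fraction of individuals with >= 1 strong-binding allele at one locus (Hardy-Weinberg)."""
    f = sum(allele_freqs.get(a, 0.0) for a in set(strong_binders))
    if not 0.0 <= f <= 1.0 + 1e-9:
        raise ValueError("allele frequencies at a locus must sum to at most 1")
    return 1.0 - (1.0 - min(f, 1.0)) ** 2


def multi_locus_coverage(loci: Dict[str, Dict[str, float]], strong_binders: Iterable[str]) -> float:
    """Carriers of >= 1 strong binder at any locus, assuming independent loci."""
    binders = set(strong_binders)
    p_none = 1.0
    for freqs in loci.values():
        p_none *= 1.0 - carrier_fraction(freqs, binders)
    return 1.0 - p_none


# Hypothetical allele frequencies for two loci, for illustration only.
loci = {
    "HLA-A": {"A*01": 0.2, "A*02": 0.3, "other": 0.5},
    "HLA-B": {"B*07": 0.1, "other": 0.9},
}
print(round(multi_locus_coverage(loci, ["A*02", "B*07"]), 4))
```

Comparing such a coverage score across populations, alongside outcome data, is the kind of association the population-level analysis above draws, although real analyses must also account for linkage disequilibrium and confounders.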

Outlook: Future inspired by the past of functional representations

Most of biophysics, including the powerful integrative models we know, is predicated on the sequence → structure → function → phenotype paradigm. With the maturation of machine learning techniques and the availability of data at various scales, researchers (particularly bioinformaticians) have been trying to bridge the gaps between the different tiers of this process, from age-old genotype–phenotype modeling, to CASP and AlphaFold's sequence-to-structure prediction, up to recent attempts to go from sequence to ensemble. However, physical causality is often missing from traditional bioinformatics models, which has so far confined AI-driven advances to predictions in the forward direction. It is therefore high time to introduce physical ideas to conceive generative models that map phenotypes back to ensembles of structures and sequences. Model representations play a central role in this mapping. Although traditional sequence or 3D-coordinate structural representations require enormous amounts of training data and are prone to overfitting, they nonetheless offer the most extensive models. In contrast, thermodynamic or kinetic representations, built on ideas such as entropy or committor functions, are quite generalizable across application domains but lack physical interpretability (Mehdi et al., 2024). Loosely, they are analogous to the plane-wave basis-set representations that find application in several areas of quantum mechanics (Nagy and Jensen, 2017). However, just as quantum mechanics was represented in molecular systems using Gaussian-like basis sets, we posit that biophysical representations offer a segue for representing deep learning models in molecular space. To this end, we have highlighted a number of representations that are either already in use or hold potential for multiscale applications in computational immunology.
Just as Gaussian orbitals offer a physical interpretation of highly resolved electronic structures (e.g., through molecular orbital theory), biophysical functions offer interpretability. These functions, such as pocket fields, QM/MM charge densities, binding affinities, catch bonds, contact matrices, and molecular electrostatics, are deeply rooted in physical theories. These theories (thermodynamic integration, electronic structure theory, equilibrium and non-equilibrium statistical mechanics, linear response theory, polymer folding, and continuum mechanics) can be projected onto structure and function. Essentially, they provide a physical basis for loss functions and latent spaces, enabling models to learn both the data and its context. We therefore propose a sustained intellectual effort in this direction.

Open peer review

To view the open peer review materials for this article, please visit http://doi.org/10.1017/qrd.2025.7.

Acknowledgments

A.S. acknowledges start-up grants from the Arizona State University School of Molecular Sciences and the Biodesign Institute's Center for Applied Structural Discovery. A.S. also acknowledges funding from the Division of Chemical Sciences, Geosciences, and Biosciences, Office of Basic Energy Sciences, of the U.S. Department of Energy through grant DE-SC0010575, and support from Department of Energy grant DE-SC0022956. This material is based on work supported by the National Defense Education Program (NDEP) for Science, Technology, Engineering, and Mathematics (STEM) Education.

Financial support

A.S. was supported by a CAREER award from the NSF (MCB-1942763) and an R01 grant from the NIH (GM095583).

Competing interest

The authors declare no competing interests exist.

Footnotes

A.K. and S.D. contributed equally to this work.

Figure 1. The scales of computational immunology models from atomistic to macroscales.

Figure 2. A comprehensive list of immunological problems and their biophysical representations. Illustrations – 1. Antibody (PDB-1IGT), 2. MHC (PDB-1HHK), 3. TCR (from RCSB-PDB), 4. Viral vector ChAdOx1, 5. Whole-cell illustration, and 6. Epitope (PDB-3PP4).

Author comment: The dawn of biophysical representations in computational immunology — R0/PR1

Comments

Dear Dr. Finan,

Please find herewith a review perspective on “The dawn of biophysical representations in computational immunology” by Eric Wilson, Akshansh Kaushik, Soumya Dutta, and Abhishek Singharoy for consideration as an invited perspective on “Integrated Biophysics: how to probe biological process with complementary multiscale techniques”.

The article aims to summarize multiscale computational immunology models that leverage recent advances in biophysics, encompassing quantum, classical, and hybrid biophysical models, to solve important outstanding questions in the field. We have compiled a concise table of accomplishments and discoveries that we deem notable in this area and that will be of value to the computational immunology community at large. We have stressed the key computational methods and have highlighted the advantages of integrating biophysics into an area traditionally dominated by sequence-based analysis. We end by offering both a personal and a community perspective on the role of biophysics in computational immunology and its great promise for advancing the field in general.

Warm regards,

Abhishek Singharoy, Ph.D.

Assistant Professor, School of Molecular Sciences

Biodesign Center for Applied Structural Discovery

Arizona State University

Web: https://web.asu.edu/abhi.

Phone: 812-369-3268

Review: The dawn of biophysical representations in computational immunology — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

The manuscript is a well-composed exploration of how biophysical models can enhance traditional computational immunology approaches. The authors emphasize that while sequence-based methods dominate the field, integrating biophysical representations improves efficiency, reduces complexity, and provides deeper insights into immune responses. They discuss advancements in antibody free energy calculations, MHC modeling, and TCR catch bonds, illustrating how these tools address the limitations of sequence-only approaches. Additionally, the manuscript highlights the potential of molecular descriptors, such as contact matrices, for vaccine design and pathogen interaction studies, while connecting biophysical properties to disease progression predictions on a macro scale. The authors advocate for integrating biophysical models to complement existing methods, calling for a shift towards generative models that incorporate physical principles for improved accuracy and relevance.

With minor revisions, the manuscript could be further enhanced:

1) The section on antibody free energy descriptions would benefit from a concluding sentence to connect the findings to the broader theme of biophysical modeling.

2) The sub-sections on MHC and TCR could better summarize the practical implications of these biophysical descriptors.

3) The discussion on MHC-I genotype analysis, while robust, could be strengthened by clarifying how these insights directly influence therapeutic or diagnostic advancements.

Recommendation: The dawn of biophysical representations in computational immunology — R0/PR3

Comments

No accompanying comment.

Decision: The dawn of biophysical representations in computational immunology — R0/PR4

Comments

No accompanying comment.

Author comment: The dawn of biophysical representations in computational immunology — R1/PR5

Comments

Please find the revised PDF, with changes highlighted in color. We have now added some practical applications of the stated biophysical representations, including potential diagnostic outcomes.

Recommendation: The dawn of biophysical representations in computational immunology — R1/PR6

Comments

I have reviewed the authors’ responses for this manuscript (QRBD-2024-0013.R1) and I think that we can accept the revised version without sending the manuscript back to the reviewers.

Can you please accept this manuscript?

Decision: The dawn of biophysical representations in computational immunology — R1/PR7

Comments

No accompanying comment.