Introduction
The Protein Data Bank (PDB) was the first open-access digital archive in biology; it was a vanguard in the open-access data movement (Berman, Reference Berman2008; Protein Data Bank, 1971). Over the past fifty-three-plus years, it has co-evolved with the scientific research it supports and continually embraced new technologies for 3D biostructure data deposition, validation, biocuration, preservation, and dissemination.
In this review, we first trace how the contents of the PDB have grown in terms of the types of structures and the experimental methods used for determination. We show how cyberinfrastructure has evolved in parallel to meet the needs of the ever-growing archive. We then describe how data standards and policies were established in collaboration with a growing cohort of stakeholders, thereby enabling the PDB to be a pioneer in embracing the FAIR (Findability, Accessibility, Interoperability, and Reusability (Wilkinson et al., Reference Wilkinson2016)), FACT (FAIRness, Accuracy, Confidentiality, and Transparency (van der Aalst et al., Reference van der Aalst, Bichler and Heinzl2017)), and TRUST (Transparency, Responsibility, User focus, Sustainability, and Technology (Lin et al., Reference Lin2020)) principles emblematic of responsible data management. The costs of data capture, archiving, and delivery in accord with the FAIR and FACT principles are also discussed.
Thereafter, we describe the immense impact of the PDB on basic and applied research in virtually all areas of biology and medicine. Structure-guided drug discovery and vaccine development enabled by the PDB played central roles in helping to reduce the ravages of HIV and combat the COVID-19 pandemic. The existence of the PDB led to the creation of a new field of science, structural bioinformatics, that in turn yielded transformative advances in protein structure prediction and design. The impact of the PDB on the chemical, computational, mathematical, physical, and social sciences is also described.
Evolution and growth of the PDB
Content of the PDB
The types of structures archived in the PDB have evolved with the progress of structural biology. In the 1970s, atomic-level 3D biostructures were mostly smaller, single-domain globular proteins (Figure 1), such as myoglobin (Kendrew et al., Reference Kendrew1958; Kendrew et al., Reference Kendrew1960), hemoglobin (Bolton and Perutz, Reference Bolton and Perutz1970; Perutz et al., Reference Perutz1960), lysozyme (Blake et al., Reference Blake1965), and ribonuclease (Kartha et al., Reference Kartha, Bello and Harker1967; Wyckoff et al., Reference Wyckoff1967). The first experimental structure of a large nucleic acid, yeast Phe tRNA, was determined in 1973–1974 (Kim et al., Reference Kim1973; Robertus et al., Reference Robertus1974). The 1980s saw the first atomic-resolution structure of a full turn of a B-DNA double helix (Drew et al., Reference Drew1981), icosahedral virus structures (Abad-Zapatero et al., Reference Abad-Zapatero1980; Harrison et al., Reference Harrison1978), and an ever-increasing number of protein structures. In the late 1980s, the structure of the first protein-nucleic acid complex was determined (Anderson et al., Reference Anderson, Ptashne and Harrison1987), and the first nucleosome structure followed in the late 1990s (Luger et al., Reference Luger1997). The 2000s saw the determination of the first ribosome structures (Ban et al., Reference Ban2000; Carter et al., Reference Carter2000; Schluenzen et al., Reference Schluenzen2000). By 1999, the PDB archive had reached 10,000 structures. By 2014, the archive housed 100,000 structures; now, there are more than 230,000 structures (Figure 2).

Figure 1. Early structures in the PDB: (a) oxygen carrying; (b) enzymes; (c) electron transport. Images from Molecule of the Month: PDB Pioneers (Goodsell Reference Goodsell2011).

Figure 2. Overall growth of structures released in the PDB archive (https://www.rcsb.org/stats).
The size and complexity of structures deposited in the PDB reflect advances in technologies available for structure determination. In the early days, atomic-level structures were determined exclusively by macromolecular X-ray crystallography (MX). Although the steps needed to determine a structure remain the same (Figure 3a), methods for carrying out each step have evolved significantly over the years. In the 1950s, proteins were purified at a large scale from natural sources (e.g., sperm whale muscle tissue) and crystallized using batch methods. Data were collected on photographic films using CuK𝛼 X-ray sources. The phases of each structure factor were determined using multiple isomorphous replacement in which additional diffraction data are measured from crystals soaked with heavy-atom labeling reagents. Resulting electron density maps were interpreted by building atomic models manually at 5 cm/1 Å scale with Kendrew’s wire models inside a Richards box (Martz and Francoeur, Reference Martz and Francoeur2004; Richards, Reference Richards1968). Until the early 1970s, the fit of atomic coordinates to electron density maps was not computationally optimized (refined); rubredoxin (a 54 amino acid protein, PDB ID 4rxn (Watenpaugh et al., Reference Watenpaugh, Sieker and Jensen1980)) was the first protein structure to be refined against experimental data (Watenpaugh et al., Reference Watenpaugh1972). New technologies for gene cloning and facile expression of exogenous proteins in Escherichia coli and other hosts enabled rapid production of large quantities of proteins for structural analyses. Moreover, having control of which part of a protein to express enabled studies of individual protein domains when full-length proteins could not be crystallized. Multi-well hanging-drop/vapor diffusion crystallization plates began to be used for manual crystallization trials.
Over time, crystallization reagent kits were designed, and robots did the job of setting up and screening for crystallinity (McPherson, Reference McPherson2017). The advent of bright synchrotron radiation sources made it possible to have more intense X-rays at tunable wavelengths, the latter supporting development of multiple-wavelength anomalous dispersion or MAD phase determination (Hendrickson, Reference Hendrickson1985). X-ray detectors have also improved dramatically in terms of efficiency and speed.

Figure 3. Structure determination pipelines for (a) MX, (b) NMR, and (c) 3DEM. Figure from https://pdb101.rcsb.org/learn/pdb-and-data-archiving-curriculum/about/ (Lawson et al. Reference Lawson2018).
Together, these myriad technical advances helped inspire the launch of the National Institute of General Medical Sciences Protein Structure Initiative (Norvell and Machalek, Reference Norvell and Machalek2000) to determine the structures of all unique protein shapes, which, in addition to increasing the number and quality of PDB structures, resulted in major efficiency improvements in structure determination processes. These advances have made it possible to use extremely small samples and, for smaller proteins, produce structures in a matter of days to weeks rather than years. Today, efforts are being devoted to ever more challenging problems (e.g., integral membrane proteins, large multi-complex protein assemblies). Total archival holdings of MX structures as of January 2025 were ~191,000. Public release of new MX structures by the PDB averaged ~10,000/year for 2019–2024.
Nuclear magnetic resonance or NMR spectroscopy emerged as a structure determination method in the 1980s (Williamson et al., Reference Williamson, Havel and Wuthrich1985) (Figure 3b). Unlike MX, most NMR samples are dilute solutions (typically ~5 mM), which can make sample preparation easier. While relatively small protein structures are amenable to NMR structure determination methods, the technique is particularly well suited to measuring protein dynamics and exploring the behavior of intrinsically disordered proteins. Total archival holdings of NMR structures as of January 2025 were ~14,400, most of which are represented as ensembles of atomic-level structures. Public release of new NMR structures by the PDB has plateaued to a few hundred/year (311 in 2024).
The 1990s saw PDB deposition of the first electron microscopy or 3DEM structure (bacteriorhodopsin (Henderson and Schertler, Reference Henderson and Schertler1990)) (Figure 3c). 3DEM offers three critical advantages versus MX: (i) crystals are not required, (ii) it is suitable for studying larger macromolecular systems, and (iii) it can be used for compositionally and conformationally heterogeneous samples. Over more than thirty years, significant advances in sample preparation and vitrification, electron optics, direct electron detection, motion correction, and cyberinfrastructure have made it possible to determine 3DEM structures at higher and higher resolution, leading to what has been termed the “Resolution Revolution” (Kuhlbrandt, Reference Kuhlbrandt2014). As of late January 2025, 3DEM structure holdings in PDB exceeded those of NMR (24,379 versus 14,440), and public release of new 3DEM structures in 2024 by PDB was ~63% of new MX structures (5793 versus 9241). As of the same date, the highest resolution 3DEM structure archived in the PDB was that of murine apo-ferritin at 1.09 Å resolution (PDB ID 8rqb (Kucukoglu et al., Reference Kucukoglu2024)).
In 2014, a structure of a Nup-84 sub-complex of the Saccharomyces cerevisiae nuclear pore complex was among the very first integrative structures to be determined by combining experimental information from multiple methods (PDB-Dev ID PDBDEV_00000001/PDB/PDB-IHM ID 8zz1 (Shi et al., Reference Shi2014)). Structures determined using information from various biophysical (e.g., MX, 3DEM, NMR, small-angle scattering, cross-linking mass spectrometry, Förster resonance energy transfer (FRET)) and computational (e.g., homology modeling and de novo structure prediction) methods are classified as integrative/hybrid methods (IHM) structures, which typically could not have been determined using a single method. Some of the early IHM structures were archived in a prototype data resource, PDB-Dev (pdb-dev.wwpdb.org (Burley et al., Reference Burley2017; Vallat et al., Reference Vallat2021; Vallat et al., Reference Vallat2018)). In late 2024, the contents of the PDB-Dev prototype resource were unified with PDB holdings and designated as PDB-IHM structures. Each of the original PDB-Dev structures now has both a PDB-Dev ID and a PDB ID. PDB-Dev has been rebranded as PDB-IHM (pdb-ihm.org (Vallat B et al., Reference Vallatin press)).
Policies
Policies for managing PDB data have evolved considerably since 1971. At the outset, deposition was purely voluntary. In the 1980s, it became clear that unless there were deposition guidelines, there was a high likelihood that valuable data would be lost. Fred Richards worked with colleagues on a petition demanding that deposition be a prerequisite for publication (Barinaga, Reference Barinaga1989). The Commission on Biological Macromolecules of the International Union of Crystallography (IUCr) convened a committee of prominent structural biologists to establish data deposition guidelines. In 1989, these guidelines were published (International Union of Crystallography, 1989), and in time, most scientific journals began requiring the deposition of 3D biostructure data to the PDB (as evidenced by the inclusion of a valid PDB ID) as a prerequisite for publication. Many funding organizations, both governmental and philanthropic, now require PDB depositions by their awardees.
In 2003, the Worldwide Protein Data Bank (wwPDB) was established as a global consortium partnership to jointly manage the PDB archive (Berman et al., Reference Berman, Henrick and Nakamura2003). PDB data centers in the US (RCSB Protein Data Bank or RCSB PDB), UK (Protein Data Bank in Europe or PDBe), and Japan (Protein Data Bank Japan or PDBj) were signatories to the first formal wwPDB Charter developed to ensure that all PDB data follow uniform standards and that the information remains freely available. The Charter is reviewed and renewed regularly. Current members include RCSB PDB (Berman et al., Reference Berman2000; Burley et al., Reference Burley2025), PDBe (Armstrong et al., Reference Armstrong2020), PDBj (Kinjo et al., Reference Kinjo2018), Biological Magnetic Resonance Data Bank (BMRB (Hoch et al., Reference Hoch2023; Romero et al., Reference Romero2020; Ulrich et al., Reference Ulrich2008)), and Electron Microscopy Data Bank (EMDB (wwPDB Consortium, 2023)). Protein Data Bank China (PDBc (Xu et al., Reference Xu2023)) recently joined the organization as an Associate Member. Each wwPDB data center is responsible for ingesting structures deposited from within its assigned geographic catchment area (RCSB PDB: Americas and Oceania; PDBe: Europe, Africa, and Israel; PDBj: Asia and the Middle East; PDBc: People’s Republic of China). Leaders of each wwPDB partner organization meet frequently with one another and annually with the wwPDB Advisory Committee (https://www.wwpdb.org/about/advisory).
At present, wwPDB members jointly manage three Core Archives: the Protein Data Bank, the Electron Microscopy Data Bank, and the Biological Magnetic Resonance Data Bank. Each wwPDB Core Archive is safeguarded and maintained by a wwPDB-designated Archive Keeper as follows: Protein Data Bank: RCSB PDB; Electron Microscopy Data Bank: EMDB; Biological Magnetic Resonance Data Bank: BMRB. The PDB Core Archive houses atomic coordinates of all PDB structures and related metadata and experimental data for all MX structures. The EMDB Core Archive houses electric Coulomb potential maps (hereafter 3DEM density maps) for all 3DEM structures stored in PDB and a sizeable number of additional density maps with no corresponding atomic coordinates in PDB (typically derived from lower-resolution 3DEM studies). The BMRB Core Archive houses NMR data for NMR structures stored in PDB and a considerable volume of additional biomolecule NMR data with no corresponding atomic coordinates in PDB. wwPDB members also jointly manage the NextGen PDB archive (files-nextgen.wwpdb.org), an enriched PDB archive whose metadata include annotations from external database resources that go beyond the content available in the PDB main archive (Choudhary et al., Reference Choudhary2024).
Evolving infrastructure for ingesting, managing, and delivering PDB data
Key steps required for operating an open-access repository are data ingestion, validation, biocuration, archiving, query, and distribution (https://pdb101.rcsb.org/learn/pdb-and-data-archiving-curriculum/about/ (Lawson et al., Reference Lawson2018)). In this section, we review how cyberinfrastructure supporting the PDB has evolved during more than 53 years of continuous PDB operations.
In 1971, data were transferred to magnetic tapes and mailed to the PDB, where they were processed on Control Data Corporation CDC 6600/7600 mainframe computers, which at the time were state-of-the-art machines. The CDC 6600 was networked via ARPANET (a precursor to the Internet) to graphics workstations at two locations in the United States, allowing visualization of 3D biostructure data. The CRYSNET project, funded by the United States (US) National Science Foundation in 1973, was innovative in its time and provided some financial support for the PDB (Meyer et al., Reference Meyer1974).
Initially, atomic coordinate data were stored in the Diamond format (Diamond, Reference Diamond1971). In 1976, the (now legacy) PDB file format, based on 80-column Hollerith punch cards, was created and became extremely popular and widely used by the structural biology community (Bernstein et al., Reference Bernstein1977). As the size and complexity of structures grew, it became clear that the 80-column format limited the number of atoms and/or polymer chains that could be contained in a single structure data file. During the 1990s, a Working Group convened by the International Union of Crystallography (IUCr) Commission on Biological Macromolecules developed a machine-readable format that was self-defining and gave explicit relationships (Fitzgerald et al., Reference Fitzgerald, Hall and McMahon2005). The new “macromolecular Crystallographic Information File (mmCIF)” format has no limitations on the number of atoms or residues. As the PDB archive contains structures determined by several methods, the format is now called PDBx/mmCIF (Westbrook et al., Reference Westbrook2022). It became the Master PDB archival format in 2014 (Berman et al., Reference Berman2014).
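To make the fixed-width limitation concrete, the sketch below encodes a few field positions of the legacy ATOM record (atom serial number in columns 7–11, a one-character chain identifier in column 22) and computes the largest values those fields can hold. The helper names are hypothetical, and the formatter is a simplification that writes coordinates only; it is an illustration under those assumptions, not wwPDB software.

```python
# Illustrative sketch only: helper names are hypothetical, not part of any wwPDB tool.
# Column ranges (1-based, inclusive) follow the legacy PDB ATOM record layout.
ATOM_COLUMNS = {
    "serial": (7, 11),     # atom serial number
    "chain_id": (22, 22),  # chain identifier (a single character)
    "res_seq": (23, 26),   # residue sequence number
    "x": (31, 38),
    "y": (39, 46),
    "z": (47, 54),
}


def max_decimal_value(start, end):
    """Largest non-negative integer that fits in columns start..end."""
    return 10 ** (end - start + 1) - 1


def format_atom_line(serial, name, res_name, chain_id, res_seq, x, y, z):
    """Write a simplified ATOM record (no occupancy, B-factor, or element)."""
    return (
        f"ATOM  {serial:>5d} {name:<4s} {res_name:>3s} {chain_id:1s}{res_seq:>4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom_line(line):
    """Read the fields listed in ATOM_COLUMNS back out by column position."""
    def field(key):
        start, end = ATOM_COLUMNS[key]
        return line[start - 1:end]

    return {
        "serial": int(field("serial")),
        "chain_id": field("chain_id"),
        "res_seq": int(field("res_seq")),
        "x": float(field("x")),
        "y": float(field("y")),
        "z": float(field("z")),
    }


def fits_legacy_format(n_atoms):
    """True if every atom can receive a unique serial number in columns 7-11."""
    return n_atoms <= max_decimal_value(*ATOM_COLUMNS["serial"])
```

Under these assumptions, a structure with 100,000 atoms cannot be numbered within the five-column serial field, whereas a self-describing tabular format carries no such positional ceiling.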
At present, PDBx/mmCIF is jointly maintained by the wwPDB PDBx/mmCIF Working Group (Westbrook et al., Reference Westbrook2022) and the wwPDB partners. Data dictionary terms and definitions are continuously formulated, reviewed, and modified to support existing data remediation and inclusion of new and rapidly evolving methodologies. This fully extensible data standard also supports data items and metadata elements for newer experimental methods that could not be accommodated within the restrictive legacy PDB format.
Once the World Wide Web became available in the 1990s, a computer program named AutoDep made it possible to deposit 3D biostructure data to PDB electronically (Lin et al., Reference Lin2000). For more than two decades, Brookhaven National Laboratory (BNL) hosted the only PDB data center that accepted depositions and processed incoming atomic coordinate data. In the late 1990s, a data center at the European Bioinformatics Institute (originally called the Macromolecular Structure Database, later rebranded as PDBe) began a collaboration with the BNL PDB to process data (Boutselakis et al., Reference Boutselakis2003). In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB), formed as a collaboration between Rutgers, The State University of New Jersey, the National Institute of Standards and Technology, and the San Diego Supercomputer Center, successfully competed for US federal agency funding to manage the US PDB data center (Berman et al., Reference Berman2000). Central to the RCSB PDB project was its development of an integrated 3D biostructure data deposition system (AutoDep Input Tool, ADIT), built atop the PDBx/mmCIF data dictionary. PDBj (Kinjo et al., Reference Kinjo2018) became the first Asian PDB data center in 2000. Initially, two different data deposition systems were used by the wwPDB: AutoDep by PDBe and ADIT by RCSB PDB and PDBj. To ensure that data were fully consistent, data were exchanged regularly between sites and reviewed. A major software development project created the global OneDep system (Young et al., Reference Young2017) for comprehensive deposition, rigorous validation (Feng et al., Reference Feng2021; Gore et al., Reference Gore2017; Young et al., Reference Young2017), and expert biocuration (Young et al., Reference Young2018) of MX, 3DEM, NMR, and micro-electron diffraction structures, supporting experimental data, and related metadata.
Biocuration involves checking for self-consistency, enforcing controlled vocabularies that are part of the PDBx/mmCIF dictionary, checking polymer sequences against the sequence databases, standardizing ligand atom naming, and adding value-added annotations (e.g., disease-causing mutations and quaternary structure information).
In the early days of the PDB, validation was focused on the geometry and chemistry of both macromolecules and bound small-molecule ligands. In addition to polypeptide backbone Ramachandran checks, MolProbity evaluation (Williams et al., Reference Williams2018) became part of wwPDB validation. Although it was always possible for MX structure factor data to be deposited into PDB, it was not until 2008 that they became mandatory (wwPDB, 2024). That important policy change made it possible for OneDep to validate atomic coordinates against experimental electron density map data. NMR chemical shift deposition became mandatory in 2010 (wwPDB, 2024). For 3DEM structures, deposition of 3DEM density maps became mandatory in 2016 (wwPDB, 2024).
In 2008, the wwPDB began establishing a series of Task Forces to define validation criteria for each structure determination method supported by the PDB (Henderson et al., Reference Henderson2012; Montelione et al., Reference Montelione2013; Read et al., Reference Read2011; Sali et al., Reference Sali2015; Trewhella et al., Reference Trewhella2013; wwPDB, 2024). Method-specific Task Forces, consisting of subject matter experts, evaluated procedures for rigorous assessment of structures with reference data sets and made recommendations to the wwPDB for adoption within the OneDep software system. Their efforts gave rise to a rich suite of structure validation tools that are today used to generate a wwPDB Validation Report for every incoming structure (Gore et al., Reference Gore2017; Gore et al., Reference Gore, Velankar and Kleywegt2012; Smart et al., Reference Smart2018a, Reference Smartb; Young et al., Reference Young2017). These validation reports are first used by depositors, then journal editors and manuscript reviewers, and finally by PDB data consumers.
Once all the structure data and related metadata are validated and reviewed by wwPDB biocurators and depositors, they are archived as flat PDBx/mmCIF formatted files. Other research communities have followed suit, and now there are multiple working groups setting data formatting, archiving, and validation standards for various biophysical methods (Hanke et al., Reference Hanke2024; Leitner et al., Reference Leitner2020; Trewhella, Reference Trewhella2018).
An important attribute of the PDBx/mmCIF data dictionary/data standard is that all of the information is stored as tables, which lend themselves to creating a relational database that can be searched efficiently. In the 1990s, the Nucleic Acid Database (NDB) became a testbed for the utility of such a database built atop the PDBx/mmCIF data standard (Berman et al., Reference Berman1992). NDB proved to be fit for purpose; it supported many different kinds of queries of the data and presented results in various formats. Building on this experience, RCSB PDB used PDB data stored in PDBx/mmCIF format to build a core database and integrated features from external databases to provide rich contextual reports for every structure in the PDB (Berman et al., Reference Berman2000).
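As a rough illustration of why tabular categories map naturally onto a relational database, the sketch below loads three toy tables into an in-memory SQLite database and joins them. The table names echo PDBx/mmCIF category names (`entry`, `exptl`, `struct`), but the `label` column, the reduced schema, and the function names are assumptions made for this example; it does not reproduce the schema used by NDB or RCSB PDB. The three entries are structures mentioned in this review, with informal descriptions rather than archive titles.

```python
import sqlite3

# Toy rows: (PDB ID, experimental method, informal description).
# The descriptions are informal labels, not the titles stored in the archive.
DEMO_ROWS = [
    ("4rxn", "X-RAY DIFFRACTION", "rubredoxin"),
    ("8rqb", "ELECTRON MICROSCOPY", "murine apo-ferritin"),
    ("1aoi", "X-RAY DIFFRACTION", "nucleosome core particle"),
]


def build_demo_db(rows=DEMO_ROWS):
    """Create an in-memory database with one table per (simplified) category."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE entry (id TEXT PRIMARY KEY)")
    con.execute("CREATE TABLE exptl (entry_id TEXT REFERENCES entry(id), method TEXT)")
    # 'label' is a hypothetical column introduced for this sketch.
    con.execute("CREATE TABLE struct (entry_id TEXT REFERENCES entry(id), label TEXT)")
    for pdb_id, method, label in rows:
        con.execute("INSERT INTO entry VALUES (?)", (pdb_id,))
        con.execute("INSERT INTO exptl VALUES (?, ?)", (pdb_id, method))
        con.execute("INSERT INTO struct VALUES (?, ?)", (pdb_id, label))
    con.commit()
    return con


def entries_by_method(con, method):
    """Return (id, label) pairs for entries determined by the given method."""
    query = (
        "SELECT e.id, s.label FROM entry AS e "
        "JOIN exptl AS x ON x.entry_id = e.id "
        "JOIN struct AS s ON s.entry_id = e.id "
        "WHERE x.method = ? ORDER BY e.id"
    )
    return con.execute(query, (method,)).fetchall()
```

Calling `entries_by_method(build_demo_db(), "X-RAY DIFFRACTION")` returns the two crystallographic entries; because each category is its own table keyed on the entry identifier, further categories can be added and joined without changing existing queries.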
From 1977 to 1992, PDB data were distributed via magnetic tape. Just one tape was sufficient to store a copy of the entire archive. In 1977, a total of 14 tapes, each housing 77 structures, were publicly distributed; in 1992, 262 tapes (957 structures). Thereafter, distribution of PDB data utilized CDs, followed by DVDs first by BNL and then RCSB PDB. In the late 1990s, it became possible to distribute information via the internet (Stampf et al., Reference Stampf, Felder and Sussman1995), and now it is the only way PDB data are distributed. In 2023, more than 3.1 billion structure data files were downloaded from the PDB main archive and web portals operated by RCSB PDB, PDBe, and PDBj combined.
PDB stakeholders
When the PDB was launched, almost all of its users were data depositors – structural biologists. Before deposition became mandatory, motivations for deposition varied, including the assurance that the data would never be lost, the desire to have someone else check the data for serious errors, or the desire to share scientific information for the public good. As archival holdings grew, protein crystallographers increasingly used previously deposited structure data to determine new structures via the molecular replacement approach to diffraction data phasing. PDB structures are also used to interpret lower-resolution 3DEM density maps. Computational biologists began to use the resource to classify and compare structures, thus creating a whole new field of structural bioinformatics. Drug companies began to use the PDB to facilitate structure-guided drug discovery. Educators began to use the PDB to teach biology at all levels. Computer scientists, mathematicians, and statisticians used the large PDB data set for their analyses. Today, structural biologists probably represent <1% of the very large and diverse community of PDB data consumers numbering in the many millions worldwide.
The PDB has received funding from US government agencies since its inception. Current funders of wwPDB members are as follows: RCSB PDB: US National Science Foundation, National Institutes of Health, and US Department of Energy; PDBe: European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Trust, Biotechnology and Biological Sciences Research Council, Medical Research Council, and European Union; PDBj: Japan Science and Technology Agency Department for Information Infrastructure and Japan Agency for Medical Research and Development; EMDB: European Molecular Biology Laboratory-European Bioinformatics Institute, and Wellcome Trust; BMRB: National Institute of General Medical Sciences; PDBc: Shanghai Advanced Research Institute, Chinese Academy of Sciences, and ShanghaiTech University.
Costs and benefits of 3D biostructure data archiving
Preservation and dissemination of research results have never been free. The purchase price of Charles Darwin’s “On the Origin of Species” was nearly US$100 in today’s money when first published in 1859. That volume was but a summary of the vast amount of information that Darwin assembled and analyzed before presenting his ideas on natural selection to the world. Notwithstanding the cost of the book and the enormous body of research Darwin and others undertook to make it possible, there can be no serious debate as to the value proposition of preserving the observations and ideas that went on to stimulate generations of biologists in developing our current understanding of evolution. The second half of this review explores the cost of preserving and disseminating 3D biostructure data and enumerates tangible benefits therefrom.
How much does it cost to capture, archive, and distribute PDB data?
To explain how much it costs to ingest, safeguard, and distribute PDB data, we used key performance indicators and other metrics documented during 2023 RCSB PDB operations.
We first provide a summary of the results, followed by the full details.
1. Average one-time cost to archive each new PDB structure in 2023 was ~US$420 (<1% of the estimated cost of an inexpensive determination of the MX structure of a single domain globular protein from a prokaryote).
2. Average annual cost to preserve a PDB structure in 2023 was ~US$10 (~0.01% of the estimated average replacement cost of a PDB structure).
3. Average annual cost to serve a unique IP address client from the RCSB PDB research-focused web portal RCSB.org in 2023 was ~30 US cents.
During 2023, wwPDB partners based in the US, Europe, and Asia received a record 17,063 new depositions from structural biologists working on every inhabited continent. RCSB PDB processed 6,698 (~40%) of the global depositions. US federal agency grant monies budgeted for data ingestion, rigorous validation, and expert biocuration at RCSB PDB in 2023 totaled ~US$2.8 million. The 2023 one-time, non-recurring cost to ensure that these 3D biostructure data are FAIR was ~US$420/structure received/processed.
In its role as wwPDB-designated PDB Archive Keeper during the same period, RCSB PDB safeguarded the data for a total of 214,070 PDB structures (~1.4 TB of total digital storage), with an estimated replacement cost of ~US$21 billion. US federal agency grant monies budgeted for data preservation by RCSB PDB in 2023 totaled ~US$2.2 million. The annual recurring cost to preserve and safeguard the entire PDB archive was ~US$10.30/structure/year (in 2023).
Under the wwPDB Charter, RCSB PDB disseminates data at no charge and with no limitations on its usage from its RCSB.org research-focused web portal. In 2023, the web portal hosted visits from ~8.2 million unique IP addresses in nearly every country and territory recognized by the United Nations. US federal agency grant monies budgeted for data dissemination at RCSB PDB in 2023 totaled ~US$2.5 million. The average recurring cost of serving each RCSB.org client in 2023 was ~30 US cents per unique IP address. These data dissemination metrics and cost estimates do not include any users the wwPDB reaches indirectly when PDB data are reused and redistributed by nearly 500 external data resources, serving many millions more users. They also do not capture usage statistics when copies of the PDB archive are held inside biopharmaceutical and biotechnology company firewalls or stored locally for the convenience of large bioinformatics research teams in academia.
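The per-unit figures quoted above follow from dividing each 2023 budget line by the corresponding count. The short calculation below reproduces that arithmetic from the rounded numbers given in the text; the function and variable names are illustrative, and the inputs are the approximate values reported here rather than audited accounts.

```python
# Approximate 2023 figures as quoted in the text (rounded budgets, exact counts).
FIGURES_2023 = {
    "global_depositions": 17_063,
    "rcsb_depositions": 6_698,
    "ingest_budget_usd": 2.8e6,
    "structures_preserved": 214_070,
    "preserve_budget_usd": 2.2e6,
    "replacement_cost_usd": 21e9,
    "unique_ip_addresses": 8.2e6,
    "dissemination_budget_usd": 2.5e6,
}


def unit_cost(budget_usd, count):
    """Average cost per item for a given budget and item count."""
    return budget_usd / count


def summarize(f=FIGURES_2023):
    """Derive the per-structure and per-client averages from the raw figures."""
    return {
        # Share of global depositions processed by RCSB PDB.
        "rcsb_share": f["rcsb_depositions"] / f["global_depositions"],
        # One-time cost per structure received and processed.
        "ingest_per_structure": unit_cost(f["ingest_budget_usd"], f["rcsb_depositions"]),
        # Recurring cost per structure per year to preserve the archive.
        "preserve_per_structure": unit_cost(f["preserve_budget_usd"], f["structures_preserved"]),
        # Average replacement cost implied by the archive-wide estimate.
        "replacement_per_structure": unit_cost(f["replacement_cost_usd"], f["structures_preserved"]),
        # Recurring cost per unique IP address served.
        "serve_per_client": unit_cost(f["dissemination_budget_usd"], f["unique_ip_addresses"]),
    }
```

With these inputs, `summarize()` gives roughly US$418 per structure ingested, US$10.28 per structure per year preserved, and about 30 US cents per unique IP address, in line with the rounded values stated above.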
What is the value in capturing, securely archiving, and freely distributing PDB data?
We now review the enormous impact of open-access PDB data on structural biology as a discipline; the natural, chemical, engineering, mathematical, and physical sciences; biomedicine; biotechnology innovation; protein structure prediction; the global biodata ecosystem; and regional, US, and global economies.
Structural biology as a scientific discipline
Open access to PDB data has both accelerated the development of structural biology as a scientific discipline and enabled its reproducibility. For MX, >90% of new structures are today determined by molecular replacement using previously deposited PDB structures or Computed Structure Models (CSMs, see the Protein Structure Prediction section) to overcome the crystallographic phase problem. The development of NMR and 3DEM as mainstream structural biology methods benefited significantly from open access to the PDB, BMRB, and EMDB Core Archives. wwPDB Biocurators and OneDep structure validation tools contribute to the reproducibility of experimental methods currently supported by the PDB. Inarguably, MX is among the most reproducible experimental techniques in the biological sciences (Liebschner et al., Reference Liebschner2013).
Open-access PDB data have also enabled the analyses of ensembles of structures to understand common principles in macromolecular anatomy and biological and biochemical function. Archival contents have been used to classify and understand protein domains, and there now exist knowledge bases providing such classifications and grouping them into superfamilies (Conte et al., Reference Conte2000; Orengo et al., Reference Orengo1997). Protein–protein interactions (Jones and Thornton, Reference Jones and Thornton1996) and protein-nucleic acid interactions have also been analyzed across the archive (Jones et al., Reference Jones2001). More than 1,000 papers describing these types of analyses and resources have been published to date (Basner, Reference Basner2017).
Natural, chemical, computational, engineering, mathematical, physical sciences, and beyond
Knowledge of 3D structures (shapes) of biomolecules helps to explain how they function in nature, accelerating discovery across the biosciences. PDB structures include proteins and nucleic acids coming from every living kingdom (see Figure 7 in (Burley et al., Reference Burley2022)) and increasing numbers of designed biopolymers. Among the latter are MX structures of a designed digoxigenin-binding protein (PDB IDs 4j8t, 4j9a (Tinberg et al., Reference Tinberg2013)), an engineered organophosphate hydrolase (PDB ID 3tig (Khare et al., Reference Khare2012)), and a designed DNA nanomaterial (PDB ID 3gbi (Zheng et al., Reference Zheng2009)).
PDB data impact basic and applied research on health and diseases of humans, animals, and plants; production of food and energy; and other research about global prosperity, resilience, and environmental sustainability. There are many anecdotal accounts in the scientific literature attesting to the importance of the PDB. On the occasion of the 50th birthday of the PDB in 2021, for example, the Journal of Biological Chemistry published two PDB50-themed special issues (Berman and Gierasch, Reference Berman and Gierasch2021; Gierasch and Berman, Reference Gierasch and Berman2021).
Bibliometric analyses provide opportunities for quantitative assessment of the impact of PDB data. The inaugural RCSB PDB publication (Berman et al., Reference Berman2000) had been cited more than 30,000 times as of January 2025, according to the Clarivate Web of Science (Copyright Clarivate 2024. All rights reserved). Taking a broader view, a 2018 study (Burley et al., Reference Burley2018) demonstrated citations of PDB data spanning the biosciences from Agriculture to Zoology. Not surprisingly, nearly 90% of published PDB structures analyzed in 2018 were cited by Biochemistry & Molecular Biology journals. High impact was also documented in other areas of fundamental biology and biomedicine (Cell Biology, Pharmacology and Pharmacy, Microbiology, Genetics & Heredity). Related analyses highlighted PDB structure publications frequently cited in STEM-related journals focused on Materials Science, Physics, Computer Science, Chemistry, Engineering, and Mathematics (Feng et al., Reference Feng2020). PDB data are also being used in the Social Sciences to understand human behavior and incentives in academic research (Hill and Stein, Reference Hill and Stein2019) and even by artists (Voss-Andreae, Reference Voss-Andreae2005).
Additional unpublished bibliometric analyses provide further evidence of PDB data impact. As of January 2023, 168,902 PDB structures (~84% of the archive) were reported in 78,334 unique primary publications, which were cited 5,601,496 times. At that time, individual publications of PDB structures had been cited ~38 times on average, and each of 148,874 (~75% of the archive) PDB structures had garnered at least one citation of their corresponding primary publication. Again, as of January 2023, the “most popular” PDB structure, PDB ID 1aoi (Luger et al., Reference Luger1997), that of the nucleosome core particle (Figure 4), had been cited >4,500 times. Additional highly cited PDB structure statistics are as follows: 85 PDB IDs had each been cited >1,000 times; nearly 600 PDB IDs had each been cited >500 times; and >11,300 PDB IDs had each been cited >100 times. The top 10% of published PDB structures had each been cited at least 79 times. These data provide compelling evidence of the enormous breadth of PDB data impact across the scientific literature. They also document that many PDB structures are reported in “citation classic” publications.

Figure 4. MX structure of the nucleosome core particle, PDB ID 1aoi (Luger et al., Reference Luger1997). Image from the Molecule of the Month (Goodsell, Reference Goodsell2000).
Biomedicine
3D structures of bacterial and viral proteins archived in the PDB are routinely used to discover and develop treatments and cures for infectious diseases. As of November 2024, the archive housed nearly 76,000 structures of bacterial proteins. The two National Institute of Allergy and Infectious Diseases Structural Genomics Centers for Infectious Diseases (Myler et al., Reference Myler2009; Stacy et al., Reference Stacy, Anderson and Myler2015) have together contributed >3,300 human pathogen protein structures to the archive. Collectively, bacterial protein structures in the PDB provide insights into microbial evolution (e.g., (Koonin and Makarova, Reference Koonin and Makarova2019)); metabolic pathways (e.g., (Brunk et al., Reference Brunk2018)); the human microbiome (e.g., (Walker et al., Reference Walker, Simpson and Redinbo2022)); potential targets for antimicrobial drug discovery (e.g., (Shaikh et al., Reference Shaikh2021)); molecular mechanisms underpinning antibiotic resistance (e.g., (Reeve et al., Reference Reeve, Lombardo and Anderson2015)); and structure-guided drug discovery (e.g., (Simmons et al., Reference Simmons, Chopra and Fishwick2010)). Similarly, as of November 2024, the PDB archive housed ~21,400 structures of viral proteins. They provide valuable insights into virus evolution (e.g., (Krupovic and Bamford, Reference Krupovic and Bamford2008)) and interactions with host cell proteins (e.g., (Goodsell and Burley, Reference Goodsell and Burley2020)). They also include information critical to combatting many of the viral pathogens already known to infect humans and others that may do so in the coming decades. For example, the PDB houses more than 2,600 human immunodeficiency virus-1 related structures, including more than 700 structures of the dimeric aspartyl protease (e.g., PDB ID 3hvp (Wlodawer et al., Reference Wlodawer1989)), many of which are co-crystal structures with bound small-molecule inhibitors.
These data played critical roles in structure-guided discovery of ten protease inhibitors approved for treating acquired immunodeficiency syndrome or AIDS, the first of which was saquinavir (PDB ID 1hxb (Krohn et al., Reference Krohn1991)). More recently, PDB data (more than 4,600 experimentally determined SARS-CoV-2 protein structures) played central roles in the fight against COVID-19 (Figure 5, reviewed in (Burley, Reference Burley2025)), contributing to both mRNA vaccine design (Corbett et al., Reference Corbett2020) and discovery and development of nirmatrelvir, the active ingredient of Pfizer’s Paxlovid (Owen et al., Reference Owen2021). Looking ahead to the possibility of a global pandemic caused by influenza A H5N1 virus (Kupferschmidt, Reference Kupferschmidt2023), there are currently >250 H5N1-related PDB structures and nearly 600 PDB structures of other influenza virus proteins (Bittrich et al., Reference Bittrich2025).

Figure 5. SARS-CoV-2 Genome and Proteome Organization. Near-complete 3D knowledge of the SARS-CoV-2 proteome derives from >4,600 SARS-CoV-2 related PDB structures and CSMs based on SARS-CoV-1 related structures archived in the PDB. Figure adapted from (Lubin et al., Reference Lubin2022) and available from PDB-101 (https://pdb101.rcsb.org/learn/flyers-posters-and-calendars/flyer/sars-cov-2-genome-and-proteins). Color coding: shades of blue: non-structural proteins; shades of green: structural proteins and proteins encoded by various open-reading frames; yellow/orange/red: duplex RNA; orange: S-adenosyl methionine; and shades of red: enzyme inhibitors.
Published case studies (e.g., (Hu et al., Reference Hu2018)) and anecdotal accounts presented at scientific meetings leave no doubt as to the important contributions to drug discovery made by structural biologists working within the biopharmaceutical industry. The first quantitative analysis of the impact of structural biologists and PDB structures on drug approvals across all therapeutic areas was published in 2019. PDB holdings were examined to identify 3D biostructures relevant to the discovery and development of 171 new small-molecule drugs across all therapeutic areas approved by the US Food and Drug Administration (FDA) from 2010 to 2018 (Westbrook and Burley, Reference Westbrook and Burley2019). The PDB archive housed 5,364 structures, providing atomic-level, 3D information for ~88% of the targets of these 171 small-molecule drugs. Structure-guided drug discovery approaches were used to generate >70% of these new drugs. In approximately 20% of cases, the number of PDB structures of the drug target exceeded 100.
Two follow-up studies focused on new anti-cancer drugs. One study documented that access to PDB structure information facilitated discovery and development of >90% of the 79 new anti-neoplastic agents approved by the US FDA from 2010 to 2018 (54 small-molecule drugs and 25 biologics) (Westbrook et al., Reference Westbrook2020). The other (Burley et al., Reference Burley2024) went on to review small-molecule anti-cancer drugs approved by the US FDA from 2019 to 2023. During this latter period, open access to PDB structure information facilitated discovery and development of 100% of 34 newly approved anti-neoplastic agents. Approximately 80% of these new drugs were the products of structure-guided drug discovery. Figure 6, for example, illustrates PDB ID 6oim (Canon et al., Reference Canon2019), showing the mechanism of action at the atomic level in 3D of sotorasib covalently targeting the G12C mutant form of KRAS (Lanman et al., Reference Lanman2020). Before discovery and development of sotorasib, RAS oncoproteins were deemed undruggable. Structure-guided discovery, development, and regulatory approval of this first-in-class drug set the stage for targeting other mutant forms of RAS, which collectively occur in ~20% of all human cancers (Prior et al., Reference Prior, Hood and Hartley2020).

Figure 6. Ribbon representation of the co-crystal structure of sotorasib covalently bound to the G12C KRAS (pink)/GDP complex (PDB ID 6oim (Canon et al., Reference Canon2019)). Inset highlights a zoomed-in view of the sotorasib binding site, showing the covalent bond (half green/half yellow) between the drug and Cysteine 12 (yellow atomic ball-and-stick figure). Images generated using the Mol* Viewer (Sehnal et al., Reference Sehnal2021). Image adapted from (Burley et al., Reference Burley2024).
Analyses of PDB holdings, the scientific literature, and related documents for each anti-cancer drug-protein target combination revealed that the impact of public-domain 3D structure data is broad and substantial, ranging from understanding target protein biology to identifying a given target protein as druggable to structure-guided drug discovery. There is every reason to believe that PDB structures of target proteins will continue to facilitate structure-guided discovery and subsequent development of new drugs benefiting patients and their families and society more broadly for decades to come.
Biotechnology innovation
Patent literature reviews conducted in August 2022 documented very broad impact of PDB data on innovation. As of June 2022, searching the US Patent and Trademark Office website (United States Patent and Trademark Office, 2022) identified ~10,000 issued US patents with PDB mentions (vs. ~6,500 issued patents in June 2017 (Burley et al., Reference Burley2018)). Analogous searches of global patent literature using PatSeer (Gridlogics, 2021) documented ~90,000 issued patents and in-process patent applications worldwide that include PDB mentions (vs. ~50,000 in mid-2017 (Burley et al., Reference Burley2018)). It should be noted that patents and patent applications mentioning PDB data do not involve attempts to patent protein structures per se (Committee on Intellectual Property Rights in Genomic And Protein Research and Innovation, Reference Merrill and Mazza2006).
The top ten assignees of worldwide patents mentioning PDB in mid-2017 included four US research universities (Massachusetts Institute of Technology; New York University; University of California Regents; University of Texas), two biopharmaceutical companies (Genentech, Inc.; Amgen, Inc.), two biotechnology companies (Xencor, Inc.; Novozymes, Inc.), and two agribusiness companies (DuPont de Nemours, Inc.; Pioneer). These findings underscore the importance of open access to PDB data for basic and applied research carried out in universities and not-for-profit institutes, and for-profit biopharmaceutical, biotechnology, and agribusiness companies.
Protein structure prediction
PDB data facilitated the development of structural bioinformatics as a vibrant subdiscipline of computational biology (Bourne and Weissig, Reference Bourne and Weissig2003). Without an open-access repository of rigorously validated, expertly biocurated 3D structures of biological macromolecules, there would be no homology modeling, no computational docking of small-molecule ligands, and no de novo protein structure prediction. Inspired by the work of Anfinsen, who showed in the 1970s that the sequence of a polypeptide chain determines its shape or fold (Anfinsen, Reference Anfinsen1973), practitioners of this emerging field strove for decades to predict the 3D structures of proteins accurately. The 2020 Critical Assessment of Structure Prediction exercise (CASP (Alexander et al., Reference Alexander2021)) witnessed a sea change in structural bioinformatics. Google DeepMind’s AlphaFold2 (AF2) Artificial Intelligence/Machine Learning (AI/ML) software emerged as the top performer for de novo protein structure prediction, with accuracies often comparable to those of lower-resolution experimental methods (Jumper et al., Reference Jumper2021; Shao et al., Reference Shao2022; Terwilliger et al., Reference Terwilliger2024). Subsequently, the Rosetta team released RoseTTAFold (Baek et al., Reference Baek2021), which generates CSMs of proteins with accuracies comparable to AF2.
Today, Computed Structure Models for nearly every protein sequence represented in UniProt (UniProt Consortium, 2023) are publicly accessible from the AlphaFold Protein Structure Database (AlphaFold DB (Varadi et al., Reference Varadi2022)). Some of the millions of CSMs generated independently of DeepMind (using RoseTTAFold, AF2 Colab, Meta, etc.) are available from the open-access ModelArchive (Protein Structure Bioinformatics Group, 2024). Both AlphaFold DB and the ModelArchive utilize the ModelCIF data standard (Vallat et al., Reference Vallat2023), which interoperates seamlessly with the PDBx/mmCIF data dictionary described above. ModelCIF is jointly managed by wwPDB partners and the wwPDB ModelCIF Working Group (www.wwpdb.org/task/modelcif). A 2021 New England Journal of Medicine publication described potential uses of CSMs in clinical research and practice (Burley et al., Reference Burley, Arap and Pasqualini2021). Development of AF2 by John Jumper and Demis Hassabis and the pioneering protein design work of David Baker earned them shares of the 2024 Nobel Prize in Chemistry. All three newly minted Nobel Laureates explicitly acknowledged the key role that the Protein Data Bank played in providing highly curated, validated, machine-readable data (Callaway, Reference Callaway2024).
The RCSB PDB provides open access to more than one million CSMs alongside all PDB structures at RCSB.org (Figure 7). Access to both CSMs and PDB data benefits structural biologists who are using CSMs when initiating experimental studies (e.g., for expression construct design) and during MX (Terwilliger et al., Reference Terwilliger2022) and 3DEM (Subramaniam and Kleywegt, Reference Subramaniam and Kleywegt2022) structure determination efforts. Making CSMs available to PDB data consumers working in areas such as plant sciences makes RCSB.org a much more valuable resource. The current experimental structure coverage of the Arabidopsis thaliana proteome in the PDB is ~4%. With combined delivery of PDB structures and CSMs at RCSB.org, plant molecular biologist users enjoy access to 3D structure information spanning the entire A. thaliana proteome.

Figure 7. RCSB.org delivers PDB experimental structures (identified with an Erlenmeyer flask icon in dark blue) and CSMs (computer screen icon in cyan) from AI/ML that can be searched, analyzed, visualized, and explored using custom tools and features. Image from (Burley et al., Reference Burley2023).
Delivery of more than one million CSMs alongside PDB structures also provides full proteome coverage of 3D structural information for human, the major model organisms (mouse, rat, zebrafish, fruit fly, Dictyostelium, Caenorhabditis elegans, A. thaliana, S. cerevisiae, Schizosaccharomyces pombe, C. albicans, E. coli, and Methanocaldococcus jannaschii), 32 human pathogens, three important food crop plants (rice, maize, and soybean), and select organisms important for understanding the impact of climate change. Providing simultaneous access to experimentally determined structures and CSMs allows both types of information to be searched, visualized, and analyzed together. It also informs bioscience researchers and their trainees, and educators and their students as to some of the limitations of CSMs. They are comparable in accuracy to lower-resolution experimental structures and should not be relied on when a gold-standard, experimentally determined PDB structure(s) is available (Moore et al., Reference Moore2022; Shao et al., Reference Shao2022).
Information stored in the PDB is made available under the most permissive Creative Commons CC0 1.0 Universal License ( https://creativecommons.org/publicdomain/zero/1.0/ ), enabling researchers around the world to access and utilize the information at no charge and with no restrictions on its usage. Recognizing its long-standing commitment to high standards of data preservation, management, and open access, the PDB is accredited by CoreTrustSeal, an international organization that certifies data repositories ( https://amt.coretrustseal.org/certificates/ ). More recently, the PDB was recognized by the Global Biodata Coalition ( https://globalbiodata.org ) as a Global Core Biodata Resource of “fundamental importance to the wider biological and life sciences community and the long-term preservation of biological data.” The PDB remains a vanguard in the open-access movement.
Worldwide distribution of PDB data is not limited to wwPDB partner web portals. Review of the Nucleic Acids Research Online Molecular Biology Database Collection (Rigden and Fernandez, Reference Rigden and Fernandez2023), which comprises databases from the journal’s annual Database Issues, identified ~500 external data resources that distribute repackaged PDB data to individuals who may not routinely visit RCSB.org or one of the wwPDB partner web portals (Resources as of 2022 (Rigden and Fernandez, Reference Rigden and Fernandez2022) listed in Table S1 of (Burley et al., Reference Burley2022)). Beyond utilization of PDB data from open-access knowledgebases, etc., there is substantial reuse of public domain 3D biostructures within global biopharmaceutical companies (e.g., Pfizer, Novartis, Eli Lilly and Company), most if not all of which maintain copies of the archive inside company firewalls. Therein, PDB data are used daily alongside proprietary MX, NMR, and 3DEM structures determined by the company to better understand target protein biology, identify target proteins as likely to be druggable, and support structure-guided drug discovery and preclinical development of drug candidates.
Regional, US, and global economies
Although it has not been possible to carry out comprehensive analyses of the economic impact of wwPDB Core Archives and wwPDB partner activities, some data about RCSB PDB operations are available. A 2017 Rutgers University Office of Research Analytics (ORA) study documented the substantial contributions of PDB data and RCSB PDB to public sector economies (Sullivan et al., Reference Sullivan, Brennan-Tonetta and Marxen2017). The corpus of scientific data (>227,000 3D biostructures) has an estimated replacement cost of nearly US$23 billion. The Rutgers ORA analyses of 2017 public sector usage of PDB data delivered via RCSB.org estimated an aggregate economic value of ~US$9.2 billion (>1,500 times ~US$6.1 million federal funding of RCSB PDB at that time). Since 2017, the PDB archive has grown by ~67%, and the number of unique IP clients visiting RCSB.org annually has grown by >80%, suggesting that the public sector economic impacts of PDB data and RCSB PDB operations have increased substantially (as a multiple of ~US$10 million in 2024 federal funding of RCSB PDB Core Operations).
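The multiples quoted above follow from simple division. The short Python sketch below reproduces that arithmetic using only the rounded figures given in the text; the variable names are illustrative, and the per-structure replacement cost is a derived, order-of-magnitude value (the text states "nearly" US$23 billion and ">227,000" structures), not a number reported by the Rutgers ORA study.

```python
# Rounded figures as quoted in the text (approximations, not exact study values).
economic_value_usd = 9.2e9      # ~US$9.2 billion estimated 2017 public sector value
federal_funding_usd = 6.1e6     # ~US$6.1 million federal funding of RCSB PDB at that time
replacement_cost_usd = 23e9     # "nearly US$23 billion" replacement cost of the archive
structure_count = 227_000       # ">227,000" 3D biostructures

# Economic value expressed as a multiple of federal funding.
return_multiple = economic_value_usd / federal_funding_usd

# Derived illustration only: average replacement cost per structure.
cost_per_structure_usd = replacement_cost_usd / structure_count

print(f"Value / funding multiple: ~{return_multiple:,.0f}x")
print(f"Approx. replacement cost per structure: ~US${cost_per_structure_usd:,.0f}")
```

With these inputs the multiple comes out just above 1,500, consistent with the ">1,500 times" statement, and the implied average replacement cost is on the order of US$100,000 per structure.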
The Rutgers ORA analysis did not attempt to estimate quantitatively the economic impact of PDB data accruing from societal benefits generated by pharmaceutical and biotechnology companies. However, some sense of the magnitude of impact on for-profit companies and the global economy can be gleaned from the metrics presented above under Biotechnology Innovation and Biomedicine. We are also unable to quantify the impact of open access to PDB data on education and STEM workforce training. Introductory RCSB PDB training materials and documentation delivered at PDB101.RCSB.org help researchers and their trainees, and educators and their students learn how to connect 3D biostructures to knowledge. PDBe and PDBj also provide training resources to users of their web portals (pdbe.org and pdbj.org, respectively), as do our other two wwPDB partners EMDB (ebi.ac.uk/emdb/) and BMRB (bmrb.io).
Perspectives and future directions
The advent of AF2, RoseTTAFold, etc., caused some scientists to suggest that structural biology as a discipline and those who determine structures using experimental methods would no longer be necessary. They failed to recognize that structural biologists have never shied away from embracing new biophysical and computational methods to achieve their ultimate goals of visualizing and understanding biomolecules in 3D at the atomic level. They also failed to appreciate how useful the results of de novo structure prediction would be for structural scientists, particularly for those research teams relying on 3DEM methods. Individual CSMs can be fitted into 3DEM density maps coming from both single-particle and tomographic 3DEM measurements. At present, there are no computational methods capable of delivering predicted structures of large macromolecular complexes with accuracies comparable to lower-resolution experimental methods. Even for individual proteins, experimentally determined structures are more accurate than CSMs. Moreover, they are often more informative because they provide atomic-level insights into binding of small-molecule ligands (e.g., enzyme co-factors, inhibitors, US FDA-approved drugs).
It is important to note, however, that the promise of new AI/ML tools for structural biology depends critically on open access to ever more experimental structures in the PDB; well-determined, rigorously validated, and expertly biocurated structures are essential for improving the quality of the training sets on which AI methods development depends. Thus, the importance of continued focus on validation by the wwPDB has never been greater. In addition, biochemical analyses performed by PDB structure depositors are essential for us to understand complex relationships between structure and function. The central importance of PDB data to the development of de novo protein structure prediction tools reliant on AI/ML approaches raises important questions regarding the “dark matter” of structural biology – the hundred thousand or more X-ray co-crystal structures of protein-ligand complexes preserved inside biopharmaceutical company firewalls as trade secrets. Successful development of AI/ML tools that support structure-guided drug discovery could well hinge on public access to some of this information, much of which is post-competitive (meaning that its release will in no way diminish company shareholder value). The lesson from AF2, RoseTTAFold, etc., is clear. Open access to tens of thousands of entirely new co-crystal structures “donated” to the PDB by biopharmaceutical companies will accelerate structure-guided drug discovery for the benefit of patients, their families, and all of humanity.
Finally, the ambitious goals of providing structural descriptions of organelles and even whole cells (e.g., the pancreatic beta cell (Singla et al., Reference Singla2018)) can and will be realized if structural biologists continue to develop new structure determination methods such as integrative and hybrid methods, and PDB, EMDB, and BMRB continue to ingest, validate, biocurate, and archive reliable atomic-level 3D structure information for proteins and nucleic acids, and their complexes with one another and small-molecule ligands.
The overarching mission of the wwPDB is to make highly curated and therefore trustworthy atomic-level 3D macromolecular structure information freely available to anyone working and learning anywhere in the world, with no limitations on data usage. The wwPDB was founded by three Core Members representing three continents to ensure the success of the PDB Core Archive; today, it has five Core Members and an Associate Member that jointly manage three Core Archives. Various Task Forces and Working Groups have developed rigorous structure validation criteria implemented by the wwPDB. Active involvement of subject matter experts from the global scientific community ensures that wwPDB data will remain well-curated and reliable. On a weekly basis, each wwPDB Core Archive is updated by its respective Archive Keeper and released to the public. Thereafter, web portals maintained independently by RCSB PDB, PDBe, PDBj, EMDB, and BMRB distribute identical 3D biostructure information, together with unique services and value-added information. With PDB, EMDB, and BMRB serving as singular wwPDB Core Archive data resources, fragmentation and balkanization of the world’s 3D biostructure data have been avoided. The wwPDB is a highly effective consortium, one that is laterally aligned. That is, it allows member organizations to be independent as appropriate while collaborating to achieve common goals (The Stakeholder Alignment Collaborative, 2025). We believe that the model provided by the wwPDB for international collaboration to preserve and disseminate high-quality information can be adopted by other scientific disciplines, thereby enabling exciting new technical and scientific breakthroughs, particularly those using data-reliant AI/ML approaches.
Acknowledgments
The authors thank the tens of thousands of structural biologists working on all inhabited continents who have deposited structures to the PDB since 1971 and the many millions of researchers, educators, and students around the world who consume PDB data. We thank the members of the RCSB PDB and wwPDB Advisory Committees for their valued advice. We also gratefully acknowledge contributions to the success of the PDB archive made by past members of RCSB PDB and our Worldwide Protein Data Bank partners (PDBe, PDBj, PDBc, EMDB, and BMRB) and thank Dr. Brinda Vallat and Christine Zardecki for help with manuscript preparation.
Financial support
RCSB PDB Core Operations are jointly funded by the U.S. National Science Foundation [DBI-2321666, PI: S.K. Burley], the U.S. Department of Energy [DE-SC0019749, PI: S.K. Burley], and the National Cancer Institute, the National Institute of Allergy and Infectious Diseases, and the National Institute of General Medical Sciences of the National Institutes of Health [R01GM157729, PI: S.K. Burley]. The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Competing interest
H.M. Berman and S.K. Burley declare none.