Impact Statement
Cryogenic electron tomography (cryoET) has become a powerful approach to visualize the organization and high-resolution structures of biological complexes in their native environment. Subtomogram averaging (STA) of hundreds of thousands of particles (i.e., subtomograms) is necessary to obtain near-atomic resolution structures for each such complex. While abundant biological complexes often cluster in arrays that manifest as one to three-dimensional lattices, flexibility and imperfection of such lattices pose challenges for efficient and accurate particle picking. To overcome these challenges and to meet the growing demand for efficient data processing and management in the cryoET and STA workflow, we have developed TomoNet, a user-friendly software package with a modern graphical user interface that allows users to execute the entire data processing pipeline seamlessly with the integration of commonly used software packages. TomoNet addresses the particle-picking challenge with two solutions, one based on geometric template matching and the other using artificial intelligence. Applications of TomoNet to three representative datasets demonstrate its capability for high-resolution structure determination of biological complexes on flexible and imperfect lattices.
1. Introduction
Single-particle cryogenic electron microscopy (cryoEM) is used to elucidate atomic-level structures of purified biological complexes. This methodology adheres to a standardized and well-established workflow supported by advanced software packages such as RelionReference Kimanius, Dong, Sharov, Nakane and Scheres 1 and cryoSparc.Reference Punjani, Rubinstein, Fleet and Brubaker 2 In parallel, cryogenic electron tomography (cryoET), coupled with subtomogram averaging (STA), expands the investigative scope to encompass heterogeneous macromolecules in their native context.Reference Wan and Briggs 3 –Reference Kopylov, Bobe, Johnston and Paraan 10 To enhance the resolution of subunits within in situ macromolecules, subtomograms (i.e., particles) are extracted from each tomogram and then subjected to three-dimensional (3D) alignment and averaging, thereby improving signal-to-noise ratio. Notably, STA has achieved resolutions up to sub-3 Å for in situ structures of large cellular complexes such as ribosomes, approaching the capabilities of single-particle cryoEM methodologies.Reference Ni, Frosio, Mendonça, Sheng, Clare, Himes and Zhang 11 –Reference Obr and Schur 14
The workflow for cryoET and STA typically involves five key components across specific software packages. In cryoET preprocessing, dose-fractionated frames are collected from an electron microscope, motion-corrected, organized, and then assembled into individual tilt series. In tomogram reconstruction, 3D reconstructions are generated from those tilt series. In particle picking, particles of interest are identified and extracted from tomograms. Complexity varies based on the diverse and intricate nature of in situ cellular samples and their unique configurations. Many packages include their own particle-picking methods, such as oversampling using a supporting geometry in Dynamo,Reference Castano-Diez, Kudryashev, Arheit and Stahlberg 15 template matching in emClarityReference Himes and Zhang 16 and machine learning in crYOLO.Reference Wagner, Merino, Stabrin, Moriya, Antoni, Apelbaum, Hagel, Sitsel, Raisch, Prumbaum, Quentin, Roderer, Tacke, Siebolds, Schubert, Shaikh, Lill, Gatsogiannis and Raunser 17 In 3D refinement and classification, particles are iteratively classified and refined to obtain a final structure at subnanometer or near-atomic resolution, which has been demonstrated by software packages such as RelionReference Zivanov, Otón, Ke, von Kügelgen, Pyle, Qu, Morado, Castaño-Díez, Zanetti, Bharat, Briggs and Scheres 13 , Reference Bharat and Scheres 18, emClarity,Reference Himes and Zhang 16 EMAN2,Reference Chen, Bell, Shi, Sun, Wang and Ludtke 4 and Warp.Reference Tegunov and Cramer 19 Finally, post-processing activities include map sharpening, Fourier shell correlation (FSC) calculation, visualization by placing averaged maps back into the original tomogram, and so forth. Users often need to navigate between several specialized software packages for optimal results, which demands a certain level of computational proficiency that poses a barrier for many.
The method for particle picking varies on a case-by-case basis, dictated by the characteristics of in situ cellular samples. In the early works of STA, manual particle picking was used, particularly when aiming for resolutions between 20 and 50 Å with a maximum of several hundred particles.Reference Zhang, Wang, Imhof, Zhou, Liao, Atanasov, Hui, Hill and Zhou 20 –Reference Si, Zhang, Shivakoti, Atanasov, Tao, Hui, Zhou, Yu, Li, Luo, Bi and Zhou 22 However, for biological samples exhibiting periodic structures, oversampling on specified geometry was leveraged to significantly reduce the labor associated with acquiring enough particles for improved resolutions. For instance, HIV virus-like particles (VLPs) adopt a hexagonal Gag protein lattice in their sphere-like configuration.Reference Scaramuzza and Castaño-Díez 23 Other examples include the Marburg Virus,Reference Bharat, Riches, Kolesnikova, Welsch, Krähling, Davey, Parsy, Becker and Briggs 24 Herpes simplex virus,Reference Grünewald, Desai, Winkler, Heymann, Belnap, Baumeister and Steven 25 and the Coat protein complex II,Reference Zanetti, Prinz, Daum, Meister, Schekman, Bacia and Briggs 26 all of which contain lattice-like arrangements with repeating subunits that could benefit from particle-picking automation when performing cryoET data processing.
With an increasing demand for automation to enhance efficiency with minimal manual intervention, template matching has emerged as a popular method for automatic particle picking, relying on a user-provided reference map.Reference Himes and Zhang 16 , Reference Böhm, Frangakis, Hegerl, Nickell, Typke and Baumeister 27 Simultaneously, convolutional neural networks have shown promising results for cryoET automatic particle picking given their capacity to analyze 3D feature maps and autonomously identify prominent features within specific samples.Reference de Teresa-Trueba, Goetz, Mattausch, Stojanovska, Zimmerli, Toro-Nahuelpan, Cheng, Tollervey, Pape, Beck, Diz-Muñoz, Kreshuk, Mahamid and Zaugg 28 –Reference Hao, Wan, Yan, Liu, Li, Zhang, Cui and Zhang 31 These machine-learning approaches typically operate template-free and often obviate the need for human annotation.Reference Rice, Wagner, Stabrin, Sitsel, Prumbaum and Raunser 32
The expanding array of specialized software tools designed for specific tasks creates a critical need for seamless software integration within the cryoET workflow. Transitioning between various software packages can be a cumbersome process. Recent initiatives have made notable progress in tackling this integration challenge. For example, TomoBEARReference Balyschew, Yushkevich, Mikirtumov, Sanchez, Sprink and Kudryashev 33 offers an integrated solution, while ScipionTomoReference Jiménez de la Morena, Conesa, Fonseca, de Isidro-Gómez, Herreros, Fernández-Giménez, Strelak, Moebel, Buchholz, Jug, Martinez-Sanchez, Harastani, Jonic, Conesa, Cuervo, Losana, Sánchez, Iceta, del Cano, Gragera, Melero, Sharov, Castaño-Díez, Koster, Piccirillo, Vilas, Otón, Marabini, Sorzano and Carazo 34 and nextPYPReference Liu, Zhou, Huang, Piland, Jin, Mandel, Du, Martin and Bartesaghi 35 provide a comprehensive web-based platform for managing various tasks in the cryoET pipeline. Notably, none of these packages takes specific advantage of the fact that abundant complexes exist in arrays of some sort, albeit with imperfections, variability, or flexibility.Reference Schur, Hagen, Rumlová, Ruml, Müller, Kräusslich and Briggs 36 –Reference Ni, Zhu, Yang, Xu, Chaban, Nesterova, Ning, Böcking, Parker, Monnie, Ahn, Perilla and Zhang 41
In this context, we have developed TomoNet, a software package designed for streamlining the cryoET and STA data processing workflow, with a modern graphical user interface (GUI) (Figures 1 and 2). Our methodology uses a geometric template matching approach rooted in the concept of “Auto Expansion” which serves as a general particle-picking solution for biological complexes organized in flexible, variable, or imperfect arrays. TomoNet is also powered by a deep learning-based solution to automate particle picking, which only needs 1–3 tomograms with known particle locations as ground truth for model training. Importantly, while TomoNet is particularly powerful for locating and averaging particles arranged on flexible or imperfect lattices, it can be applied to a broader range of particle types, offering a more generalizable trained model. These methods significantly diminish the need for manual inputs, and their outcomes can be seamlessly imported into Relion for subsequent high-resolution 3D classifications and refinements. We demonstrate the capabilities of TomoNet by applying it to three datasets with distinct protein lattice types, highlighting its accuracy and efficiency in identifying particles across diverse scenarios.
2. Results
2.1. Overall design of TomoNet
TomoNet is a Python-based software package that integrates commonly used cryoET packages to streamline the cryoET and STA pipeline, with a particular emphasis on automating particle picking of lattice-configured structures and cryoET project management. As shown in the main menu and the entire TomoNet pipeline (Figures 1 and 2), after data collection from electron microscopy, TomoNet can perform motion correction with integration of MotionCorr2;Reference Zheng, Palovcak, Armache, Verba, Cheng and Agard 42 tilt series assembly and tomogram reconstruction with integration of IMOD;Reference Kremer, Mastronarde and McIntosh 43 CTF estimation with integration of CTFFIND4;Reference Rohou and Grigorieff 44 manual particle picking with IMOD; particle picking using built-in geometric template matching-based algorithms with integration of PEET;Reference Heumann, Hoenger and Mastronarde 45 automatic particle picking using built-in deep learning-based algorithms; 3D classification/particle cleaning and subtomograms placing back with built-in algorithms. This design also allows on-the-fly tomogram reconstruction processing during data collection, which facilitates a quick quality check. TomoNet generates particle-picking results in STAR format,Reference Hall 46 which can be incorporated into Relion for high-resolution 3D refinement. It can also read Relion results in STAR format for particle cleaning and subtomograms placing back (Figure 1).
2.2. Particle picking with “Auto Expansion”
The “Auto Expansion” module is based on template matching and uses the cross-correlation coefficient as a selection criterion, with a design to pick particles on flexible lattices with minimal manual inputs; its basic concept is elucidated in Figure 3. These particles exist in array-like configurations and manifest as flexible, partial, and imperfect lattices in one, two, and three dimensions (1D–3D). Examples abound: microtubule doublets, ubiquitous in most cells, consist of 96 nm axonemal 1D translational repeat unitsReference Imhof, Zhang, Wang, Bui, Nguyen, Atanasov, Hui, Yang, Zhou and Hill 21 , Reference Shimogawa, Wijono, Wang, Zhang, Sha, Szombathy, Vadakkan, Pelayo, Jonnalagadda, Wohlschlegel, Zhou and Hill 47 (1D rotational lattice); HIV VLPsReference Schur, Obr, Hagen, Wan, Jakobi, Kirkpatrick, Sachse, Kräusslich and Briggs 40 and surface layer (S-layer) lattice of prokaryotic cellsReference von Kügelgen, Alva and Bharat 48 , Reference Pum, Breitwieser and Sleytr 49 are composed of hexameric subunits (2D lattice); paraflagellar rod of protozoan species is organized into para-crystalline arrays in its distal zoneReference Zhang, Wang, Imhof, Zhou, Liao, Atanasov, Hui, Hill and Zhou 20 (3D lattice). In TomoNet, each of these isolated lattice densities is called a patch, within which all subunits of the complex are connected. For instance, Figure 3 shows two patches of different sizes.
“Auto Expansion” is an iterative process; each iteration expands the particle set by adding more unpicked ones. To initiate “Auto Expansion”, users need to prepare a few “seed” particles that sparsely distribute across all observed patches. Typically, the numbers of such “seed” particles per tomogram range from 20 to 200, which depends on the number and size of patches in the input tomogram. Then, “Auto Expansion” iteratively expands the “seed” particle set to a final particle set that contains all particles on given flexible lattices, following three steps for each iteration (Figure 3). First, potential particles adjacent to each “seed” particle are calculated and selected as “candidate” particles. Second, these “candidate” particles undergo alignments to a user-provided reference and are evaluated based on cross-correlation coefficient, such that “wrong” particles with low cross-correlations are excluded. Third, qualified “candidate” particles are added to the particle set and become “seed” particles for the next iteration. During this process, only unpicked ones can be considered as “candidate” particles, and “Auto Expansion” stops either when no “candidate” particles are detected or when the user-defined maximum iteration number is reached. Doing this allows for an exhaustive exploration of particles on given lattices following their assembly topology with no restriction on geometry and outputs a final particle-picking result (Figure 2).
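The three steps of each iteration can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not TomoNet's implementation: particles are reduced to integer (x, y, z) coordinates without orientations, neighbor positions are proposed by adding fixed shifts, and the hypothetical `score_fn` stands in for the alignment to a user-provided reference that produces the cross-correlation coefficient.

```python
def auto_expand(seeds, transitions, score_fn, cc_threshold, max_iterations):
    """Grow a particle set from seeds by following lattice transitions.

    seeds: iterable of (x, y, z) tuples; transitions: iterable of (sx, sy, sz).
    score_fn is a hypothetical stand-in for alignment to the reference; it
    returns a cross-correlation-like score for a candidate position.
    """
    picked = set(seeds)
    frontier = list(picked)
    for _ in range(max_iterations):
        # Step 1: propose unpicked neighbours of every current seed.
        candidates = set()
        for x, y, z in frontier:
            for sx, sy, sz in transitions:
                candidate = (x + sx, y + sy, z + sz)
                if candidate not in picked:
                    candidates.add(candidate)
        if not candidates:
            break  # nothing left to explore on this lattice
        # Step 2: exclude candidates with low cross-correlation.
        accepted = [c for c in sorted(candidates) if score_fn(c) >= cc_threshold]
        # Step 3: accepted candidates become the seeds of the next iteration.
        picked.update(accepted)
        frontier = accepted
    return picked
```

Because only newly accepted particles seed the next round, the search follows the connectivity of each patch and stops by itself once no unpicked neighbor passes the threshold, or earlier if the iteration limit is reached.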
Compared with conventional template matching methods, “Auto Expansion” incorporates prior knowledge of lattice configuration to iteratively guide the search for “candidate” particles, i.e., unpicked particles following user-defined paths, as detailed in the Method section and TomoNet’s user manual. Thus, “Auto Expansion” significantly reduces computational complexity by searching in the regions of interest only, with restricted angular and translational search ranges defined by users. As a result, it reduces the number of incorrectly picked particles. Notably, “Auto Expansion” potentially works for any flexible, imperfect, or variable lattices in 1D, 2D, and 3D and has no intrinsic size limit of subunits.
2.3. Automatic particle picking by deep learning
The “AI AutoPicking” module is designed for automatic particle picking using supervised machine learning, which uses a U-net convolutional neural network for model training. There are three main steps in “AI AutoPicking”: training data preparation, neural network training, and particle coordinate prediction, as detailed in the Method section (Figure 4). It only requires an input training dataset consisting of 1–3 tomograms paired with their corresponding particle coordinate files. The trained model can then be applied to the entire tomography dataset and output predicted particles for each tomogram.
Essentially, the neural network in “AI AutoPicking” is trained as a voxel-wise binary classifier, which determines whether a voxel in density maps is part of a particle (Figure 4b). To prepare for training, data pairs (ground truth) consist of extracted subtomograms coupled with their associated segmentation maps, within which each particle is labeled by a cube near its center (Figure 4a). The trained neural network model can be applied to other tomograms to perform particle segmentation. Finally, the particle coordinate information can be retrieved from the predicted segmentation maps (Figure 4c).
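The two ends of this procedure, labeling each particle with a cube and recovering coordinates from a segmentation, can be illustrated without the network itself. The sketch below is an assumption-laden toy: a segmentation map is represented as a set of foreground voxel indices, and coordinates are recovered as centroids of face-connected voxel groups. The text does not state how TomoNet retrieves coordinates, so the connected-component step is only one plausible choice, and both function names are hypothetical.

```python
from collections import deque
from itertools import product

def cube_label(centers, half_width):
    """Foreground voxels of a training label: one cube around each particle center."""
    offsets = range(-half_width, half_width + 1)
    return {(x + dx, y + dy, z + dz)
            for (x, y, z) in centers
            for dx, dy, dz in product(offsets, repeat=3)}

def centers_from_mask(voxels):
    """Group face-connected foreground voxels and return each group's centroid."""
    remaining = set(voxels)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    centers = []
    while remaining:
        start = remaining.pop()
        queue = deque([start])
        blob = [start]
        while queue:  # breadth-first flood fill of one connected blob
            x, y, z = queue.popleft()
            for dx, dy, dz in steps:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    queue.append(neighbor)
                    blob.append(neighbor)
        centers.append(tuple(sum(v[i] for v in blob) / len(blob) for i in range(3)))
    return sorted(centers)
```

In this toy, applying `centers_from_mask` to the output of `cube_label` returns the original centers as long as the cubes do not touch. A real predicted map would first need thresholding and probably a minimum blob size to suppress spurious detections.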
2.4. 3D classification using TomoNet
In addition to the above two complementary modules for particle picking, TomoNet allows users to eliminate “bad” particles based on user-defined geometric constraints, which could serve as 3D classification during high-resolution particle refinements. Lattice variation in cryoET data has multiple plausible causes. Biologically, particles may be incomplete near the lattice edge due to a paused biological assembly process.Reference Liu, Zhang, Wang, Tao, Bi and Zhou 50 Experimentally, lattices tend to become flattened near the air-water interface of the sample during imaging. These variabilities pose challenges for 3D classification in the process of high-resolution STA, making it difficult to exclude “bad” particles that exhibit unexpected coordinate and orientation assignments as subunits of lattices (Supplementary Material S1).
Removing these “bad” particles is necessary for achieving better resolutions.Reference Tan, Pak, Morado, Voth and Briggs 51 To accomplish this, TomoNet assesses each particle by counting its neighboring particles and calculating the averaged tilt angle to these neighbors to represent the local surface curvature of a lattice. TomoNet identifies particles with too few neighbors or large tilt angles to their neighbors as “bad” particles since they potentially deviate from the lattice configuration. This step can be integrated into high-resolution refinement in Relion, providing an alternative 3D classification method based on analyzing spatial relationships between particles.
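A geometric filter of this kind might look like the following sketch. Several details are assumptions rather than TomoNet's actual definitions: a neighbor is any particle within a fixed `radius`, each particle carries one unit-free normal vector derived from its orientation, and the tilt to a neighbor is the angle between the two normals. All parameter names are hypothetical.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))

def flag_bad_particles(positions, normals, radius, min_neighbors, max_mean_tilt):
    """Return indices of particles with too few neighbours or too large a mean tilt."""
    bad = []
    for i, (p, n) in enumerate(zip(positions, normals)):
        # Tilt angles to every other particle lying within the search radius.
        tilts = [angle_deg(n, normals[j])
                 for j, q in enumerate(positions)
                 if j != i and math.dist(p, q) <= radius]
        too_few = len(tilts) < min_neighbors
        too_tilted = bool(tilts) and sum(tilts) / len(tilts) > max_mean_tilt
        if too_few or too_tilted:
            bad.append(i)
    return bad
```

The all-pairs loop is quadratic in the number of particles, which is fine for illustration; a spatial index such as a k-d tree would be the usual choice for tomograms containing many thousands of particles.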
2.5. Application to in situ viral protein arrays: The matrix protein lattice in HIV VLPs
To validate TomoNet as an integrated high-resolution cryoET and STA pipeline and an efficient particle-picking tool, four tomograms were processed from the HIV-1 Gag dataset which resolved the Gag hexamer structure at 3.2 Å resolution. Motion-corrected images underwent tilt series assembly, CTF estimation, and tomographic reconstruction using TomoNet. Within these tomograms, the VLP hexagonal lattice and its building blocks were observed, and some of these observed VLPs showed sphere-like geometry (Figure 5a).
As detailed in the Method section, a combination of “Auto Expansion” and “AI AutoPicking” was applied to the above four tomograms. The result shows that particles were readily picked on all the observed lattice patches (Figure 5b,c). These picked particles were then imported to Relion to perform high-resolution particle refinements. The resulting reconstruction of the Gag hexamer structure (Figure 6) looks identical to the published high-resolution structure,Reference Ni, Frosio, Mendonça, Sheng, Clare, Himes and Zhang 11 , Reference Zivanov, Otón, Ke, von Kügelgen, Pyle, Qu, Morado, Castaño-Díez, Zanetti, Bharat, Briggs and Scheres 13 demonstrating the particle-picking accuracy and efficiency of TomoNet, which is capable of obtaining more particles from fewer tomograms.
Using the “3D subtomogram place back” function in TomoNet, 3D visualizations were generated to illustrate the in situ assembly of the VLP lattices (Figures 5d and 7). All VLP lattices with various sizes and shapes were captured even with irregular shapes (Figure 7e and Supplementary Material S2), demonstrating TomoNet’s particle-picking ability on flexible lattices. Lattice defects on each VLP were also identified consistent with previous studies,Reference Guo, Saha, Saffarian and Johnson 52 enhancing the understanding of lattice assembly mechanisms.Reference Talledge, Yang, Shi, Coray, Yu, Arndt, Meng, Baxter, Mendonça, Castaño-Díez, Aihara, Mansky and Zhang 53
2.6. Application to cellular organelle sample: Eukaryotic axoneme
We validated TomoNet’s particle-picking capability for 1D lattices by processing one tomogram of extracted flagellum of Trypanosoma brucei. The axoneme consists of 9 outer doublet microtubules (DMTs) and a pair of central singlet microtubules, where each DMT is a 1D polymer of 96 nm axonemal building blocks (Figure 8a). This typical 1D lattice often exhibits imperfections like bends and twists, posing challenges for precise particle picking (Figure 8a). Using “Auto Expansion”, TomoNet accurately picked the 96 nm-spaced axonemal subunits from all DMTs, effectively adapting to lattice imperfections (Figure 8b).
2.7. Application to focused ion beam (FIB)-milled cellular sample: The S-layer lattice of prokaryotic cell
We validated TomoNet’s particle-picking capability by processing one tomogram of FIB-milled Caulobacter crescentus cells from EMD-23622.Reference Lasker, Boeynaems, Lam, Scholl, Stainton, Briner, Jacquemyn, Daelemans, Deniz, Villa, Holehouse, Gitler and Shapiro 54 The S-layer functions as a component of the cell wall covering the cell body. Thus, its lattice geometry is typically defined by the shape of cells (Figure 9a). The pleomorphic shapes and variable sizes of C. crescentus cells, together with the low contrast of this tomogram, made it difficult to locate subunits and to pick particles efficiently on the S-layer lattice (Figure 9a).
TomoNet overcame the above challenges by utilizing the hexagonal configuration of S-layer lattices. With minimal manual input, “Auto Expansion” picked over a thousand hexamer S-layer subunits. The binned STA result clearly reveals the S-layer inner domain, and docking the previously resolved high-resolution structureReference von Kügelgen, Tang, Hardy, Kureisaite-Ciziene, Brun, Stansfeld, Robinson and Bharat 55 (EMD-10388) into it confirms the correct hexagonal distribution with well-fitted major domains (Figure 9b,c). Visualization of S-layer lattices also shows that the picked particles were arranged in the expected hexagonal pattern, confirming the reliability and applicability of TomoNet as a particle-picking tool (Figure 9d) and its broad application to structure determination of prokaryotic and archaeal cell walls.Reference Pum, Breitwieser and Sleytr 49 , Reference Sleytr, Schuster, Egelseer and Pum 56
2.8. Application to in vitro assembled arrays: Nuclear egress complex (NEC) lattice
We further validated TomoNet as an integrated high-resolution STA pipeline and an efficient particle-picking tool by processing samples containing NEC lattices within budded vesicles. Nuclear egress is a pivotal step in herpes virus replication, driven by NEC and responsible for translocating nascent viral particles from nucleus to cytoplasm. In our reported dataset,Reference Draganova, Wang, Wu, Liao, Vu, Gonzalez-Del Pino, Zhou, Roller and Heldwein 57 NEC heterodimers budded into large vesicles with diameters ranging from 100 nm to 500 nm, forming beehive-like lattices on the inner surface of these vesicles (Figure 10a,b). Because of their large sizes, noticeable compressions were observed during the sample freezing, reshaping the vesicles and NEC lattices from spherical to flattened disk shapes (Figure 10a,b). This conformational change was a consequence of the limitation in ice thickness imposed by cryoET, which restricts the sample thickness to approximately 250 nm, consequently posing challenges for particle picking.
TomoNet successfully picked NEC hexamer subunits following the topology of lattices. The intermediate STA result generated in TomoNet already showed the six heterodimers within one hexamer subunit (Figure 10c). With these picked particles, high-resolution 3D classifications and refinements were carried out to obtain a final reconstruction of NEC hexamer subunit at 5.4 Å resolution, without preferred orientation bias (Figure 10c,d), and all the helices were well resolved (Figure 10e). Visualization of subtomograms placing back shows that the large vesicle was compressed during sample freezing which stretched the NEC lattice, making it appear flat and split at the air-water interface, while the middle part of the lattice appears to be more curved.
2.9. Application to other types of arrays and free-floating particles
The above examples show TomoNet’s ability to locate particle arrays arranged on flexible spheres (HIV), cell surfaces (S-layer), and nuclear membranes (NEC), which can be considered as topologically 2D lattices. In our published work of various cryoET structures, TomoNet has also been used to locate subtomograms arranged on flexible filaments (i.e., 1D arrays) such as the flagella of T. brucei Reference Imhof, Zhang, Wang, Bui, Nguyen, Atanasov, Hui, Yang, Zhou and Hill 21 , Reference Shimogawa, Wijono, Wang, Zhang, Sha, Szombathy, Vadakkan, Pelayo, Jonnalagadda, Wohlschlegel, Zhou and Hill 47 and the amyloid-like sheath protein on β-hoops of the prototypical archaeon, Methanospirillum hungatei. Reference Wang, Zhang, Toso, Liao, Sedighian, Gunsalus and Zhou 58 In the case of 3D lattices, TomoNet has also been used to obtain the paraflagellar rod structure of T. brucei. Reference Zhang, Wang, Imhof, Zhou, Liao, Atanasov, Hui, Hill and Zhou 20 Since TomoNet has integrated packages and is designed for the entire cryoET and STA data processing pipeline, it can also be used as a general-purpose package for STA toward high resolution when particles are free floating and without local order. In the latter case, TomoNet would have the same limitation recognized for all other cryoET software packages, that is, high resolution is currently only achieved for large complexes, such as ribosomes.
3. Discussion
In this paper, we report the implementation and application of TomoNet and demonstrate its efficacy in particle picking across three distinct datasets featuring particles with varying lattice configurations. TomoNet stands out as the first software to exhaustively trace lattices following their inherent topology. This unique approach ensures that the particle-picking results faithfully reflect in situ or in vitro lattice shape, providing valuable insights into how these lattices are formed by their constituent subunits. For HIV VLPs, TomoNet application enabled us to directly visualize the VLP lattices and their defects potentially caused by the absence of pentamer subunits. Similarly, for the NEC dataset, TomoNet facilitated a more direct observation of lattice conformation changes resulting from the sample freezing process. Since vesicles in this dataset were large enough to be compressed from a sphere into a disk-like shape, the lattice regions near the air-water interface became stretched and subsequently divided into smaller fragments. Moreover, TomoNet performed well even on datasets characterized by extremely low contrast. For instance, in the cellular S-layer tomogram of a lamella, S-layer subunits were nearly imperceptible to human observation. Even so, “Auto Expansion” excelled in particle picking without requiring denoising or contrast-enhancement algorithms.
Additionally, “AI AutoPicking”, the deep learning-based module, demonstrated excellent performance on automatic particle picking, showing potential in handling a wide range of particle types even beyond those with lattice-like arrangements. Compared to the template matching-based “Auto Expansion”, “AI AutoPicking” has several advantages in particle picking. First, it applies to particles situated on flexible lattices and those arranged in scattered patterns, such as cellular ribosomes. The neural network learns to pick by discerning 3D features of individual particles, and it does not require prior knowledge about lattice configuration. Second, it utilizes GPUs for fast convolution operations, enabling particle prediction in just several minutes for each tomogram. Third, it does not require the “seed” particles used in “Auto Expansion”, which further reduces human efforts by approximately 5–15 minutes per tomogram. This is especially beneficial for processing extensive tomography datasets with hundreds of tomograms. However, comparing their final output particles, “AI AutoPicking” typically picks fewer particles than “Auto Expansion” because it misses certain particles on the flexible lattices. Thus, these two modules are complementary to each other and can be incorporated to further explore these missing particles.
Regarding the pipeline design, each module within TomoNet is designed to be highly independent, ensuring flexibility for integrating future methods and third-party packages. This adaptable framework positions TomoNet as a platform of choice for other developers to build their own innovations. At present, TomoNet is primarily tailored for integration with the Relion-related pipeline. However, it can accommodate specific demands and can be extended to integrate other pipelines, including emClarity,Reference Himes and Zhang 16 EMAN2,Reference Chen, Bell, Shi, Sun, Wang and Ludtke 4 M,Reference Tegunov, Xue, Dienemann, Cramer and Mahamid 59 and others in the future. In summary, TomoNet significantly simplifies the overall process for users in managing and monitoring every step of the complete cryoET and STA pipeline. Its user-friendly GUI design notably reduces the entry barrier for newcomers to the fast-emerging cryoET field. The particle-picking modules of TomoNet provide a general solution for particles organized in lattice-like arrangements, ensuring both accuracy and efficiency, thereby facilitating the high-resolution STA pipeline.
4. Methods
TomoNet is an open-source software package developed using Python. It follows a highly modularized architecture with each module responsible for specific tasks in a typical cryoET and STA data processing pipeline. Modules in TomoNet mainly cover the upstream portion of the cryoET and STA pipeline, including motion correction, tilt series generation, tomogram reconstruction, CTF estimation, and particle picking, while leaving the high-resolution 3D refinement to established software packages such as Relion (Figure 1). The modern GUI, built on the PyQt5 platform, enhances user-friendliness and helps with tracking the processing progress (Figure 2). With table views, users can obtain a comprehensive overview of the entire dataset, facilitating direct and intuitive management for each tomogram (Figure 2).
4.1. Implementation of modules for motion correction, tomogram reconstruction and CTF estimation
Motion correction, tomogram reconstruction, and CTF estimation related functions are organized into individual modules in TomoNet, with the integration of corresponding external software packages including MotionCorr2,Reference Zheng, Palovcak, Armache, Verba, Cheng and Agard 42 IMODReference Kremer, Mastronarde and McIntosh 43 or AreTomo,Reference Zheng, Wolff, Greenan, Chen, Faas, Bárcena, Koster, Cheng and Agard 60 and CTFFIND4,Reference Rohou and Grigorieff 44 respectively. Since their code is not reimplemented in TomoNet, users must install each of them before using the corresponding modules.
The “Motion Correction” module is used to correct beam-induced sample motion. It requires an input folder path that contains all the dose-fractionated frames; users can then specify their MotionCorr2 parameters in the GUI. After clicking the “RUN” button, TomoNet will perform motion correction for all the input images and save the results in a separate directory. This module also allows on-the-fly motion correction during data collection.
The “3D Reconstruction” module comprises two sub-functions: “TS Generation” and “Reconstruction.” Within “TS Generation,” users can readily assemble tilt series for each tomogram from the previously generated motion-corrected images. It provides advanced options for data cleaning, such as setting a minimum acceptable number of tilt images for a tomogram and removing duplicate images at the same tilt angle by excluding images with older time stamps. The “Reconstruction” tab automatically reads and lists all tomograms in a table view, with essential information, such as tilt image number and alignment errors, and action buttons for restarting, continuing, and deleting individual tomogram reconstruction processes. This simplifies the assessment of reconstruction results and facilitates tomogram reconstruction management.
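The two data-cleaning options described for “TS Generation” amount to a small filtering routine. The sketch below is illustrative only: the tuple layout `(tilt_angle, timestamp, filename)` and the function name are hypothetical, and the way TomoNet actually parses tilt angles and time stamps from file names or metadata is not covered.

```python
def clean_tilt_series(images, min_tilts):
    """Deduplicate tilt images and enforce a minimum tilt count.

    images: list of (tilt_angle, timestamp, filename) tuples (hypothetical layout).
    Returns (tilt_angle, filename) pairs sorted by angle, or None when fewer
    than min_tilts distinct angles remain.
    """
    newest = {}
    for angle, stamp, name in images:
        # Keep only the most recently recorded image for each tilt angle.
        if angle not in newest or stamp > newest[angle][0]:
            newest[angle] = (stamp, name)
    if len(newest) < min_tilts:
        return None  # too few tilt images; skip this tomogram
    return [(angle, newest[angle][1]) for angle in sorted(newest)]
```

Keeping the newest image per angle reflects the common situation in which a tilt is re-acquired after a failed exposure, so the later file supersedes the earlier one.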
The “CTF Estimation” module is used for the tilt series defocus estimation, with support of parallel processing using multiple CPUs. Its outcomes are also listed in a table view with visualization features, such as displaying defocus at 0 degree and plotting the defocus distribution across all tilt angles.
4.2. Implementation of the “Manual Picking” module
The “Manual Picking” module is designed for general management of manual particle picking, especially for the preparation of the “seed” particles required in “Auto Expansion.” The IMOD stalkInit picking convention is implemented: each particle is picked as two points that define its Y-axis, and the particle center is taken as the midpoint between them. In the example of the HIV dataset, 5–10 particles were manually picked as “seed” particles for each VLP lattice, which takes only several minutes per tomogram (Figure 5a).
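The two-point convention can be written out as a short sketch. The function name is hypothetical, and the direction of the axis (from the first picked point to the second) is an assumption made here for illustration.

```python
import math


def particle_from_two_points(p1, p2):
    """Return (center, unit Y-axis) for a particle picked as two points."""
    # Center: midpoint between the two picked points.
    center = tuple((a + b) / 2.0 for a, b in zip(p1, p2))
    # Y-axis: assumed to point from the first point to the second.
    diff = tuple(b - a for a, b in zip(p1, p2))
    length = math.sqrt(sum(d * d for d in diff))
    if length == 0:
        raise ValueError("the two picked points must differ")
    y_axis = tuple(d / length for d in diff)
    return center, y_axis


center, y_axis = particle_from_two_points((10, 20, 30), (10, 20, 40))
print(center, y_axis)
```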
4.3. Design and implementation of the “Auto Expansion” module
“Auto Expansion” consists of three steps, as shown in Figure 2. “Generate tomograms.star” generates a STAR format file that records the tomograms and their associated “seed” particles to be used in “Auto Expansion.” “Generate Picking Parameter” sets up the parameters required for particle set expansion through the described iterative process. These parameters include angular search ranges and steps, translational search ranges and steps, a “transition list” (explained below), the box size used in particle alignment, the distance between neighboring repeating subunits, the reference and mask maps, and the cross-correlation threshold, among others. The “transition list” is customized by users to describe the target lattice configuration, with each transition denoted by [sx, sy, sz], where sx, sy, and sz are the translational shifts from the center of a “seed” particle to one of its neighbors along the X, Y, and Z-axis, respectively. “Auto Expansion” uses this list to guide the search for “candidate” particles. The user-defined parameters are saved into a JSON format file. “Run Particle Expansion” takes the above STAR and JSON format files as inputs to perform the iterative particle set expansion.
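How a “transition list” can guide the candidate search is sketched below. This is an illustration under stated assumptions, not TomoNet's implementation: the JSON key names are hypothetical, and the shifts are assumed to be expressed in the local frame of the “seed” particle, so that they are rotated by the particle's orientation before being added to its center.

```python
import json

# Hypothetical parameter names; TomoNet's actual JSON keys may differ.
params = {
    "transition_list": [[8, 0, 0], [-8, 0, 0], [4, 0, 7], [-4, 0, -7]],
    "cc_threshold": 0.3,
    "box_size": 64,
}
params_json = json.dumps(params, indent=2)


def candidate_centers(seed_center, seed_rotation, transition_list):
    """Place one candidate per transition, expressed in the seed's local frame.

    seed_rotation is a 3x3 matrix (row-major nested lists) mapping the
    particle's local axes to tomogram axes; this convention is an assumption.
    """
    candidates = []
    for shift in transition_list:
        rotated = [sum(row[k] * shift[k] for k in range(3)) for row in seed_rotation]
        candidates.append(tuple(c + r for c, r in zip(seed_center, rotated)))
    return candidates


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
loaded = json.loads(params_json)
print(candidate_centers((100, 50, 60), identity, loaded["transition_list"]))
```

Each candidate position would then be refined by local alignment against the reference and accepted or rejected by the cross-correlation threshold, as described in the text.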
During “Auto Expansion” processing, three directories are generated for each tomogram: “TomoName,” the working directory for the current iteration; “TomoName_cache,” which stores intermediate results from finished iterations; and “TomoName_final,” which stores the final particle-picking results. The iteration number of “Auto Expansion” is typically greater than one, but some special use cases are supported. For example, when users need to modify a particle-picking setting, such as the cross-correlation threshold, they can generate a new picking parameter file and then execute “Run Particle Expansion” with the iteration number set to 0. This prompts the program to skip the “candidate” searching steps and simply gather all intermediate results saved in the “TomoName_cache” directories to generate a new “TomoName_final” result.
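The zero-iteration use case amounts to re-filtering cached results without searching again. The sketch below assumes a hypothetical in-memory layout (one list of (x, y, z, cc) tuples per finished iteration) and an inclusive threshold; the actual cache format used by TomoNet is not described here.

```python
def regather(cached_iterations, cc_threshold):
    """Merge cached per-iteration picks and keep those above the new threshold.

    cached_iterations: list of lists of (x, y, z, cc) tuples, one list per
    finished iteration; this in-memory layout is hypothetical.
    """
    final = {}
    for picks in cached_iterations:
        for x, y, z, cc in picks:
            if cc < cc_threshold:
                continue
            key = (x, y, z)
            # Keep the best score if a position was found in several iterations.
            if key not in final or cc > final[key]:
                final[key] = cc
    return sorted(k + (v,) for k, v in final.items())


cache = [
    [(10, 10, 10, 0.42), (18, 10, 10, 0.25)],
    [(18, 10, 10, 0.31), (26, 10, 10, 0.12)],
]
print(regather(cache, cc_threshold=0.3))
```

Lowering the threshold in this toy example admits more of the cached picks without repeating any alignment.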
4.4. Design and implementation of the “AI AutoPicking” module
The “AI AutoPicking” module comprises three main steps: “Prepare Training Dataset,” “Train Neural Network,” and “Predict Particles coordinates.” It uses supervised machine learning, which requires users to provide ground truth, i.e., tomograms with their associated particle coordinate files, for model training. In this study, the ground truth data were prepared by “Auto Expansion.”
In “Prepare Training Dataset,” extracted subtomograms, rather than whole tomograms, are used as inputs for network training for two reasons. First, a tomogram used for picking is typically around 1000 × 1000 × 1000 voxels, which is too large to be loaded into GPU memory, whereas the extracted subtomograms are under 100 × 100 × 100 voxels. Second, extraction increases the number of training data pairs, which helps to avoid over-fitting during network training. For the model output, the particle coordinate information was embedded into 3D binary segmentation maps, in which the voxels associated with particles were set to 1 and all other voxels were set to 0 (Figure 4a).
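The construction of a training pair can be sketched on a toy volume. The spherical label shape, its radius, and the function names are assumptions for illustration; the text only states that voxels associated with particles are set to 1. Nested lists are used to keep the sketch dependency-free, whereas a practical implementation would use array libraries.

```python
def binary_segmentation(shape, centers, radius=1):
    """Return a nested-list 3D map with 1 inside a sphere around each center."""
    nz, ny, nx = shape
    seg = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    r2 = radius * radius
    for cz, cy, cx in centers:
        for z in range(max(0, cz - radius), min(nz, cz + radius + 1)):
            for y in range(max(0, cy - radius), min(ny, cy + radius + 1)):
                for x in range(max(0, cx - radius), min(nx, cx + radius + 1)):
                    if (z - cz) ** 2 + (y - cy) ** 2 + (x - cx) ** 2 <= r2:
                        seg[z][y][x] = 1
    return seg


def crop(volume, corner, size):
    """Extract a cubic sub-volume; the same crop is applied to tomogram and map."""
    z0, y0, x0 = corner
    return [[row[x0:x0 + size] for row in plane[y0:y0 + size]]
            for plane in volume[z0:z0 + size]]


seg = binary_segmentation((8, 8, 8), centers=[(2, 2, 2), (5, 5, 5)], radius=1)
patch = crop(seg, corner=(1, 1, 1), size=3)
print(sum(v for plane in patch for row in plane for v in row))
```

Cropping the tomogram and its segmentation map at the same corner yields one input and target pair, and many such crops can be drawn from a single tomogram.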
In “Train Neural Network,” the extracted subtomograms, paired with their associated segmentation maps, are used to train a neural network as a binary classifier that predicts whether a voxel is near the center of a particle. The network architecture is derived from the one used in IsoNetReference Liu, Zhang, Wang, Tao, Bi and Zhou 50 because it is well suited to capturing generalized features of 3D objects (Figure 4b). Since the learning task is voxel-wise binary classification, a cross-entropy loss function is used instead of mean squared error (MSE). With one RTX 3080Ti graphics card, training completes within 1–2 hours when the default parameters are used.
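The choice of loss can be illustrated numerically. The sketch below is a plain-Python illustration of the two loss functions on a handful of voxels, not the training code; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean voxel-wise binary cross-entropy for probabilities in (0, 1)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def mean_squared_error(pred, target):
    """Mean squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


target = [1, 0, 0, 1]
confident_wrong = [0.01, 0.99, 0.01, 0.99]  # first two voxels are badly wrong
bce = binary_cross_entropy(confident_wrong, target)
mse = mean_squared_error(confident_wrong, target)
print(round(bce, 4), round(mse, 4))
```

For confidently wrong voxels, the cross-entropy grows without bound while the squared error is capped at 1 per voxel, which is one common reason to prefer cross-entropy for classification tasks.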
In “Predict Particles coordinates,” users can apply the trained model to the entire tomography dataset for particle coordinate prediction (Figure 4c). For each tomogram, TomoNet first generates a predicted segmentation map; the particle coordinates are then retrieved from this map using the hierarchical clustering algorithm from the SciPy package in Python.
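The step from segmentation map to coordinates can be sketched as follows. To stay dependency-free, this sketch does not call SciPy; it uses a union-find stand-in that produces the same flat clusters as single-linkage hierarchical clustering cut at a distance threshold. The linkage method, the threshold `max_dist`, and the use of cluster centroids as particle centers are assumptions, since these details are not stated in the text.

```python
import math


def cluster_centers(voxels, max_dist):
    """Group voxels by single-linkage clustering cut at max_dist; return centroids."""
    parent = list(range(len(voxels)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merging every pair closer than max_dist yields the same flat clusters as
    # cutting a single-linkage dendrogram at that distance.
    for i in range(len(voxels)):
        for j in range(i + 1, len(voxels)):
            if math.dist(voxels[i], voxels[j]) <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i, v in enumerate(voxels):
        groups.setdefault(find(i), []).append(v)
    return sorted(
        tuple(sum(c) / len(members) for c in zip(*members))
        for members in groups.values()
    )


# Foreground voxels (z, y, x) from a toy predicted segmentation map.
voxels = [(2, 2, 2), (2, 2, 3), (2, 3, 2), (9, 9, 9), (9, 9, 10)]
centers = cluster_centers(voxels, max_dist=1.5)
print(centers)
```

The pairwise loop is quadratic in the number of foreground voxels, so it is suitable only for small illustrations; a library implementation would be used on real segmentation maps.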
4.5. Implementation of tools within the “Other Utilities” module
The “Other Utilities” module consists of two sub-functions, “Recenter | Rotate | Assemble to .star file” and “3D Subtomogram Place Back,” which are useful tools for processing after particle picking. The first allows users to assemble and convert the particle-picking results into a STAR format file following the Relion4 convention, reset each particle center to its symmetric center, and align the rotation axis to the Relion Z-axis. The second takes a user-provided STAR format file containing particle information as input and generates a ChimeraXReference Meng, Goddard, Pettersen, Couch, Pearson, Morris and Ferrin 61 session file for placing 3D subtomograms back into the tomogram, together with a cleaned STAR format file from which “bad” particles have been removed. This not only allows users to validate the accuracy of particle picking before importing into Relion but also enables direct observation of the distribution and configuration of subunits after high-resolution 3D refinement, providing an overall view of the in situ lattice (Figure 7).
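The assembly step can be pictured as writing a STAR table of particle coordinates and orientations. The sketch below is a simplified illustration: the label names are standard RELION labels, but the column subset and the function name are assumptions, and a complete Relion4 tomography STAR file contains additional columns not shown here.

```python
def write_particle_star(particles):
    """Format particles as a minimal STAR table with RELION-style labels."""
    labels = ["rlnTomoName", "rlnCoordinateX", "rlnCoordinateY", "rlnCoordinateZ",
              "rlnAngleRot", "rlnAngleTilt", "rlnAnglePsi"]
    lines = ["data_particles", "", "loop_"]
    # STAR loop header: one numbered label per column.
    lines += [f"_{name} #{i}" for i, name in enumerate(labels, start=1)]
    for p in particles:
        lines.append(" ".join(str(p[name]) for name in labels))
    return "\n".join(lines) + "\n"


particles = [
    {"rlnTomoName": "TS_01", "rlnCoordinateX": 100.0, "rlnCoordinateY": 200.0,
     "rlnCoordinateZ": 50.0, "rlnAngleRot": 0.0, "rlnAngleTilt": 90.0,
     "rlnAnglePsi": 0.0},
]
star = write_particle_star(particles)
print(star)
```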
4.6. Processing tomograms of HIV VLP dataset
The HIV VLP dataset was downloaded from the Electron Microscopy Public Image Archive (EMPIAR) with the accession code EMPIAR-10164.Reference Schur, Obr, Hagen, Wan, Jakobi, Kirkpatrick, Sachse, Kräusslich and Briggs 40 Four tilt series, TS_01, TS_43, TS_45, and TS_54, were used in this study. Downloaded micrographs were loaded into the TomoNet pipeline to perform tilt series assembly, CTF estimation, and tomogram reconstruction using the WBP algorithm.
Four-times-binned tomograms with a pixel size of 5.4 Å were used for particle picking. First, tomograms TS_01 and TS_43 were used for “seed” particle preparation on 3 selected VLPs per tomogram, and an initial reference map was generated by averaging these particles in PEET. Second, one run of “Auto Expansion” was applied to these two tomograms to obtain more particles and thereby refine the reference. Third, with the improved reference, a new run of “Auto Expansion” was applied to the 3 selected VLPs in both tomograms (Figure 5b), and the particle-picking result was used for neural network training in “AI AutoPicking.” Fourth, after particle prediction on all four tomograms with the trained model, “AI AutoPicking” produced 4,860, 3,704, 4,550, and 2,101 particles for tomograms TS_01, TS_43, TS_45, and TS_54, respectively, as shown in Figure 5c. Lastly, the predicted particles were used as “seed” particles for the final run of “Auto Expansion,” resulting in 5,765, 4,043, 5,006, and 2,838 particles for tomograms TS_01, TS_43, TS_45, and TS_54, respectively, which were imported into Relion for high-resolution refinement.
Following the same procedure as in the Relion4 tutorial, together with TomoNet 3D classification, the Gag hexamer structure was resolved at 3.2 Å resolution with 13,558 particles from four tomograms. Resolution was calculated in Relion and on the 3DFSC Processing Server.Reference Tan, Baldwin, Davis, Williamson, Potter, Carragher and Lyumkis 62 The global resolution reported is based on the “gold standard” refinement procedure and the 0.143 FSC criterion (Figure 6c).
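The particle counts reported for the HIV VLP dataset can be tallied in a few lines. The numbers are taken from the text; the variable names are illustrative, and the retained fraction is simply the ratio of the particles in the final map to the particles imported into Relion.

```python
# Per-tomogram particle counts reported in the text.
predicted = {"TS_01": 4860, "TS_43": 3704, "TS_45": 4550, "TS_54": 2101}
expanded = {"TS_01": 5765, "TS_43": 4043, "TS_45": 5006, "TS_54": 2838}
used_in_final_map = 13558

total_predicted = sum(predicted.values())   # after "AI AutoPicking"
total_expanded = sum(expanded.values())     # after the final "Auto Expansion"
retained = used_in_final_map / total_expanded

# Unbinned pixel size implied by 4x binning at 5.4 angstroms per pixel.
unbinned_pixel_size = 5.4 / 4

print(total_predicted, total_expanded, f"{retained:.1%}", unbinned_pixel_size)
```

The final “Auto Expansion” run thus adds about 2,400 particles to the network prediction, and roughly three quarters of the imported particles contribute to the final map.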
4.7. Processing one tomogram of T. brucei axoneme
The tomogram of T. brucei axoneme is from our previous work.Reference Imhof, Zhang, Wang, Bui, Nguyen, Atanasov, Hui, Yang, Zhou and Hill 21 Initially, one “seed” particle was manually picked for each DMT, followed by four iterations of “Auto Expansion” applied to 9 “seed” particles, resulting in a total of 75 particles. EMD-20012 was used for subtomogram placing back to validate our picking results and visualize the entire axoneme architecture.
4.8. Processing one tomogram of C. crescentus S-layer
One reconstructed tomogram of FIB-milled C. crescentus was downloaded from the Electron Microscopy Data Bank (EMDB) with the accession code EMD-23622.Reference Lasker, Boeynaems, Lam, Scholl, Stainton, Briner, Jacquemyn, Daelemans, Deniz, Villa, Holehouse, Gitler and Shapiro 54 This tomogram was used directly for “seed” particle preparation on two of the cells. Around 30 “seed” particles were manually picked and averaged using PEET to generate an initial reference map. “Auto Expansion” was applied to the “seed” particles for 5 iterations to obtain more particles and thereby refine the reference map. With the improved reference map, another run of “Auto Expansion” was applied to the same “seed” particles for 15 iterations to search for all particles on the outer surface of the cells, finally yielding ~1,500 S-layer particles of hexamer subunits (Figure 9d).
4.9. Processing tomograms of NEC budding in vitro
The cryoET grid preparation and data collection were previously described.Reference Draganova, Wang, Wu, Liao, Vu, Gonzalez-Del Pino, Zhou, Roller and Heldwein 57 Motion correction, tomogram reconstruction, and CTF estimation were performed using TomoNet. Around 50–150 “seed” particles were manually picked for each tomogram. “Auto Expansion” was applied to a total of 35 tomograms and yielded ~48,000 particles before Relion refinement. Following one round of 3D auto-refine at four-times-binned pixel size, several rounds of 3D auto-refine at two-times-binned pixel size, and one round of 3D auto-refine at unbinned pixel size, together with TomoNet 3D classifications, the NEC hexamer structure was resolved at 5.4 Å resolution with a total of 35,039 particles.
4.10. 3D visualization
IMODReference Kremer, Mastronarde and McIntosh 43 was used to visualize the 2D tomographic and segmentation map slices. UCSF ChimeraXReference Meng, Goddard, Pettersen, Couch, Pearson, Morris and Ferrin 61 was used to visualize the STA results and the lattices generated by the 3D subtomogram place back. The atomic models were fitted into the density map using the “fit in map” tool in ChimeraX.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S2633903X24000060.
Data availability statement
The TomoNet code is available on GitHub at https://github.com/logicvay2010/TomoNet, together with a user manual. For the HIV VLP dataset, the raw data were downloaded from the Electron Microscopy Public Image Archive (EMPIAR) with accession code EMPIAR-10164,Reference Schur, Obr, Hagen, Wan, Jakobi, Kirkpatrick, Sachse, Kräusslich and Briggs 40 and the Gag atomic model was downloaded from the Protein Data Bank (PDB) with accession code 5L93.Reference Schur, Obr, Hagen, Wan, Jakobi, Kirkpatrick, Sachse, Kräusslich and Briggs 40 For the C. crescentus S-layer dataset, the reconstructed tomogram was downloaded from the Electron Microscopy Data Bank (EMDB) with accession code EMD-23622,Reference Lasker, Boeynaems, Lam, Scholl, Stainton, Briner, Jacquemyn, Daelemans, Deniz, Villa, Holehouse, Gitler and Shapiro 54 and the subunit model was generated using an atomic model with PDB accession code 6P5T.Reference Herrmann, Li, Jabbarpour, Chan, Rajkovic, Matsui, Shapiro, Smit, Weiss, Murphy and Wakatsuki 63 The STA results of the NEC hexamerReference Draganova, Wang, Wu, Liao, Vu, Gonzalez-Del Pino, Zhou, Roller and Heldwein 57 and HIV can be obtained from EMDB with accession codes EMD-40224 and EMD-43869, respectively.
Acknowledgments
We thank Elizabeth Draganova and Ekaterina Heldwein for the NEC dataset.
Author contribution
H.W. and Z.H.Z. initiated the research, and Z.H.Z. supervised it. H.W. wrote the code and developed the software GUI with help from S.L. H.W., S.L., and X.Y. tested the software on different datasets. H.W. and Z.H.Z. wrote the manuscript. X.Y. and J.Z. assisted in writing the article. All authors reviewed and approved the paper.
Funding statement
We acknowledge funding from the US National Institutes of Health (GM071940 to Z.H.Z.) and the National Science Foundation (DMR-1548924 to Z.H.Z.).
Competing interest
The authors declare none.