Introduction
Optical tweezers force spectroscopy
Optical tweezers (OT) can isolate single nucleic acid (NA) molecules in suspension by tethering both ends to functionalized micron-sized beads that are held in one or more laser traps. The system directly controls the end-to-end extension of the NA molecule through the relative displacement of the two trapped beads while simultaneously measuring the applied force from the deflection of the trapping laser as the beads shift from the center of the trap. One method to measure the properties of the NA molecule is to produce a force–extension curve (FEC), in which the force response is measured over a wide range of extensions (Figure 1a), typically by increasing or decreasing the extension in fixed steps at a continuous rate starting from low or high extensions, respectively. Both DNA and RNA are well represented by polymer chain models, with double-stranded (ds) and single-stranded (ss) polymers having very different properties. While dsNA is modeled as an extensible worm-like chain (WLC) (Baumann et al., 1997; Odijk, 2002), the end-to-end extension (X) of ssNA as a function of applied force (F) is best modeled as a freely jointed chain (FJC) (Smith et al., 1996):
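For reference, the extensible FJC is commonly written in the form below. This is a sketch of the standard expression, assuming a Kuhn length equal to 2P; the thermal energy k_B T (Boltzmann constant times absolute temperature) is introduced here and is not defined in the surrounding text.

```latex
X(F) = L\left[\coth\!\left(\frac{2PF}{k_B T}\right) - \frac{k_B T}{2PF}\right]\left(1 + \frac{F}{S}\right)
```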
Both ssDNA and ssRNA have a contour length (L) of ~0.56 nm/nt and a persistence length (P) of ~0.75 nm, which reflect the length and flexibility, respectively, of the sugar-phosphate backbone (Figure 1a). The backbone can elastically stretch as well, with elongation linearly proportional to applied force, with the elastic modulus (S, ~600 pN for ssDNA and ssRNA) indicating the force required to double the length of the substrate. Thus, the FEC has two main regimes: at low force, the FEC gently curves upwards as the ssNA molecule is straightened, approaching its contour length, and at high force, the extension continues to increase linearly past the contour length. Note that the FJC does not account for secondary structure, which can form at low force. Structures such as large defined hairpins found in certain biological systems can persist at higher forces. OT can be used to study specific interactions between these sequences and binding proteins, but here we consider long (>1 knt) substrates with 50% GC content and without stable secondary structure (at >5 pN applied tension), in order to focus on non-sequence-specific interactions between ssNA and proteins. That is, we average over the behavior of many sequences present on the substrate, as the proteins discussed below also must interact with a wide range of sequences to coat exposed ssNA, as opposed to interacting only with specific sequences or motifs.
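The two regimes described above can be reproduced numerically. The following is a minimal sketch, assuming the standard extensible FJC expression with a Kuhn length of 2P and a thermal energy of 4.11 pN·nm (roughly room temperature); the function name `fjc_extension` and the sampled forces are illustrative choices, not taken from the text.

```python
import math

KBT = 4.11  # thermal energy in pN*nm (assumed, ~298 K)

def fjc_extension(force_pn, contour=0.56, persistence=0.75, modulus=600.0):
    """Extension (nm/nt) of an extensible FJC, assuming Kuhn length = 2 * persistence.

    contour:     L in nm/nt
    persistence: P in nm
    modulus:     S in pN
    """
    if force_pn <= 0:
        return 0.0
    a = 2.0 * persistence * force_pn / KBT
    entropic = 1.0 / math.tanh(a) - 1.0 / a    # fraction of contour length aligned by force
    elastic = 1.0 + force_pn / modulus         # linear backbone stretching
    return contour * entropic * elastic

# Sample the curve from the entropic regime into the elastic regime.
for force in (1, 5, 10, 20, 40, 60):
    print(f"F = {force:>2} pN  ->  X = {fjc_extension(force):.3f} nm/nt")
```

With these parameter values the computed extension stays below the contour length at low force and exceeds it at the highest forces sampled, which is the behavior of the two regimes described in the text.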
The polymer properties of the ssNA molecule determine the shape of its FEC, and if its properties change, so too will its FEC. For instance, an increase in persistence length both lowers the force required to straighten the ssNA and sharpens the transition between the low-force entropic regime and the high-force elastic regime (Figure 1b). In comparison, a reduction in contour length proportionally decreases ssNA extension at all forces, since extension scales linearly with contour length. In particular, the binding of a protein (or other small molecule) to ssNA can drastically impact its conformation. During binding, a length of ssNA adheres to a binding surface on the protein, such that its conformation is determined by the structure of the NA–protein complex. This has two primary effects. First, if the binding interface is not straight, bound ssNA must bend to adhere to the protein, which effectively lowers the substrate’s contour length (Figure 1c). This effect is more pronounced for more circuitous paths through the binding surface, such as ssNA fully wrapping around the protein. Since ssDNA extension is directly proportional to contour length (eq. (1)), a consistent decrease in ssDNA extension is observed for all forces in the FEC. Second, since the binding interface is nearly always longer than the persistence length of bare ssNA (~0.75 nm or less than 2 nt), bound protein increases the effective persistence length (Figure 2c). The new effective persistence length can be determined by the number of nt bound by each individual protein or can become even longer if multiple proteins form a filament that remains rigid over many repeating protein subunits. ssDNA extension has a non-linear response to a change in persistence length (eq. (1)), such that increased persistence length greatly increases ssDNA extension at low force but only mildly increases extension at high force when ssDNA is mostly straightened.
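The different signatures of a persistence length change and a contour length change can be checked with a short calculation. This is a sketch under the same assumed extensible FJC form (Kuhn length 2P, thermal energy 4.11 pN·nm); the doubled persistence length, the 10% contour length reduction, and the two probe forces are hypothetical values chosen only for illustration.

```python
import math

KBT = 4.11  # thermal energy in pN*nm (assumed)

def fjc(force, L=0.56, P=0.75, S=600.0):
    """Extensible FJC extension in nm/nt, assuming Kuhn length = 2 * P."""
    a = 2.0 * P * force / KBT
    return L * (1.0 / math.tanh(a) - 1.0 / a) * (1.0 + force / S)

def relative_change(force, **changed):
    """Fractional change in extension when one polymer parameter is altered."""
    return fjc(force, **changed) / fjc(force) - 1.0

low_force, high_force = 2.0, 40.0  # pN, hypothetical probe forces

# Doubling P: large effect at low force, small effect at high force.
stiff_low = relative_change(low_force, P=1.5)
stiff_high = relative_change(high_force, P=1.5)

# Reducing L by 10%: the same fractional effect at every force.
short_low = relative_change(low_force, L=0.9 * 0.56)
short_high = relative_change(high_force, L=0.9 * 0.56)

print(f"P doubled:  {stiff_low:+.1%} at {low_force} pN, {stiff_high:+.1%} at {high_force} pN")
print(f"L reduced:  {short_low:+.1%} at {low_force} pN, {short_high:+.1%} at {high_force} pN")
```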
Proteins binding to ssDNA typically display both of these effects, which can be observed in a single FEC: increased persistence length increases extension at low force, while decreased contour length decreases extension at high force (Cashen et al., 2023; Morse et al., 2019). However, applied force on the ssDNA may also impact the binding affinity and/or binding conformation of proteins, as tension inhibits wrapping.
In either case, when a ssNA substrate is incubated with a protein that has such an effect when bound, the extension change over time has the shape of exponential decay, the rate and amplitude of which depend on the fundamental nature of the protein–NA interaction as discussed below.
NA–protein binding kinetics
The simplest protein–NA interaction to characterize is one in which the protein has only one binding conformation (Figure 2a). Specifically, each bound protein has the same binding site size (occludes the same number of nts). In this case, the ssNA substrate can be modeled as an array of binding sites, where each site acts independently and can occupy one of two states, protein-free and protein-bound.
Transition rates between these two states depend on the specific protein–NA interaction (e.g., different proteins have different binding affinities), but the rates are equivalent for each identical binding site. The rate of binding (of unbound sites becoming bound) is dependent on the concentration of free protein (c) and is denoted as c·k_on, while the rate of dissociation (of bound sites becoming unbound) is concentration-independent and is denoted as k_off. Zooming out, the degree of protein saturation (θ) for the entire NA substrate can be simply calculated as the number of binding sites occupied by a protein divided by the total number of binding sites.
This value will vary with free protein concentration and over time if the system is not in equilibrium. The OT system does not directly measure protein binding, but rather NA conformation (the substrate’s change in extension, ΔX). Note that the measured NA length (X) and change in length (ΔX) are typically reported in normalized units of nm/nt, where the absolute length in nm is divided by the substrate length in nt. This allows results to be compared and interpreted consistently regardless of the specific substrate used in experiments, which is typically arbitrary and determined by (commercial) availability or technical constraints rather than a specific biological function. For example, dsDNA consistently has a normalized contour length of 0.34 nm per base pair for both of the commercially available and widely used substrates lambda phage DNA (48.5 kbp) and plasmid pUC19 (2.7 kbp), even though their absolute extensions differ by more than an order of magnitude.
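The normalization can be illustrated with the two substrates named above. This is a small sketch using the rounded substrate sizes quoted in the text and treating the 0.34 value as a length per base pair; the function names are illustrative.

```python
NM_PER_BP = 0.34  # normalized dsDNA contour length quoted in the text

def absolute_length_nm(n_bp, nm_per_bp=NM_PER_BP):
    """Absolute contour length in nm for a substrate of n_bp base pairs."""
    return n_bp * nm_per_bp

def normalized_length(absolute_nm, n_bp):
    """Length per base pair, i.e. the absolute length divided by substrate size."""
    return absolute_nm / n_bp

# Substrate sizes as rounded in the text.
substrates_bp = {"lambda phage DNA": 48_500, "pUC19": 2_700}
lengths_nm = {name: absolute_length_nm(n_bp) for name, n_bp in substrates_bp.items()}

for name, n_bp in substrates_bp.items():
    absolute_um = lengths_nm[name] / 1000.0
    per_bp = normalized_length(lengths_nm[name], n_bp)
    print(f"{name}: {absolute_um:.2f} um absolute, {per_bp:.2f} nm/bp normalized")

ratio = lengths_nm["lambda phage DNA"] / lengths_nm["pUC19"]
print(f"absolute length ratio: {ratio:.1f}")
```

The absolute lengths differ by more than a factor of ten, while the normalized value is identical for both substrates.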
If only one protein binding conformation is possible, then the extension change and degree of protein saturation are directly proportional.
The maximal extension change, which is achieved when the NA is fully protein-saturated, can be experimentally determined by titrating protein concentration (at sufficiently high protein concentration, the amplitude of the extension change will asymptote, indicating saturation). If all proteins bind in the same conformation, then each binding site (size N nt) will have its extension altered by a constant value Δx.
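In symbols, the relations described in the preceding paragraphs can be summarized as follows. This is a sketch under stated assumptions: n_bound and n_sites are labels introduced here for the number of occupied and total binding sites, ΔX_max denotes the extension change at saturation, and the last relation assumes that Δx is expressed in nm per binding site while ΔX is normalized in nm/nt.

```latex
\theta = \frac{n_{\mathrm{bound}}}{n_{\mathrm{sites}}}, \qquad
\Delta X = \theta \, \Delta X_{\max}, \qquad
\Delta X_{\max} = \frac{\Delta x}{N}
```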
OT systems allow the free protein concentration around the isolated NA molecule to be changed, either by flowing different buffers into the sample while keeping the trap stationary or by moving the trap to a different location within the sample. This drives the system out of equilibrium, which reveals the kinetics of the protein–NA interaction. When protein-free NA is suddenly introduced to free protein (incubation), the extension change over time takes the form of an exponential decay (Figure 2b).
The rate constant of equilibration (k_eq) is the sum of the two fundamental rates of binding (c·k_on) and dissociation (k_off).
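The exponential approach to equilibrium follows from the two-state scheme and can be verified numerically. The sketch below assumes that the fractional occupancy obeys dθ/dt = c·k_on·(1 − θ) − k_off·θ, starting from a protein-free substrate; the rate constants and concentration are hypothetical values chosen for illustration, and the function names are not from the text.

```python
import math

def k_eq(c, k_on, k_off):
    """Equilibration rate: sum of the binding rate (c * k_on) and dissociation rate."""
    return c * k_on + k_off

def theta_analytic(t, c, k_on, k_off):
    """Closed-form occupancy for an initially protein-free substrate."""
    rate = k_eq(c, k_on, k_off)
    theta_equilibrium = c * k_on / rate
    return theta_equilibrium * (1.0 - math.exp(-rate * t))

def theta_numeric(t_end, c, k_on, k_off, dt=1e-3):
    """Forward-Euler integration of d(theta)/dt = c*k_on*(1 - theta) - k_off*theta."""
    theta = 0.0
    for _ in range(int(round(t_end / dt))):
        theta += dt * (c * k_on * (1.0 - theta) - k_off * theta)
    return theta

# Hypothetical parameters: c in nM, k_on in 1/(nM*s), k_off in 1/s.
c, k_on, k_off = 10.0, 0.01, 0.05
for t in (5.0, 10.0, 30.0):
    print(t, round(theta_analytic(t, c, k_on, k_off), 4),
          round(theta_numeric(t, c, k_on, k_off), 4))
```

The numerical and closed-form curves agree, and both relax with the single rate constant c·k_on + k_off.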
Thus, higher protein concentrations (c) result in the NA substrate reaching its equilibrium extension on a shorter timescale. k_off can be easily isolated by measuring the rate of dissociation when free protein is removed from the sample (c = 0). In contrast, k_on can be calculated from a single incubation–dissociation cycle by subtracting the dissociation rate (measured after protein is removed) from the equilibration rate (measured during protein incubation) and dividing by the protein concentration. Alternatively, the equilibration rate can be measured for several protein concentrations and then linearly fit, with the zero-concentration intercept and the concentration-proportional slope yielding k_off and k_on, respectively.
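Both routes to the rate constants can be sketched in a few lines. The example below uses noise-free synthetic equilibration rates generated from hypothetical values of k_on and k_off, so it only illustrates the arithmetic and is not a fit to real data; `fit_line` is a plain least-squares helper written for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical ground truth used only to generate synthetic rates.
TRUE_K_ON, TRUE_K_OFF = 0.01, 0.05           # 1/(nM*s) and 1/s
concentrations = [1.0, 2.0, 5.0, 10.0, 20.0]  # nM
equilibration_rates = [c * TRUE_K_ON + TRUE_K_OFF for c in concentrations]

# Route 1: linear fit over several concentrations.
k_on_fit, k_off_fit = fit_line(concentrations, equilibration_rates)

# Route 2: a single incubation-dissociation cycle at one concentration.
c_single = concentrations[3]
k_eq_single = equilibration_rates[3]   # measured during incubation
k_off_single = TRUE_K_OFF              # measured after protein removal (c = 0)
k_on_single = (k_eq_single - k_off_single) / c_single

print(f"fit:          k_on = {k_on_fit:.4f}, k_off = {k_off_fit:.4f}")
print(f"single cycle: k_on = {k_on_single:.4f}, k_off = {k_off_single:.4f}")
```

With real measurements the two routes will differ by experimental noise; the multi-concentration fit additionally tests whether the equilibration rate is in fact linear in concentration.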
Additionally, the amplitude of the extension change at equilibrium increases with protein concentration (Figure 2c), as the extension change of the NA is directly proportional to the degree of protein saturation (Eq. (4)). The fraction of binding sites occupied at equilibrium (θ_eq) can be written in terms of the fundamental rates of binding and dissociation, or in relation to the effective dissociation constant (K_D), which indicates the protein concentration at which the rates of binding and dissociation are equal, and equivalently, half of the binding sites are protein-occupied (Figure 2d).
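Written out, the equilibrium occupancy follows from setting the binding and dissociation fluxes of the two-state scheme equal. The expression below is a sketch of that standard result, assuming independent, identical binding sites; it gives an occupancy of one half when c equals K_D.

```latex
\theta_{\mathrm{eq}} = \frac{c\,k_{\mathrm{on}}}{c\,k_{\mathrm{on}} + k_{\mathrm{off}}}
                     = \frac{c}{c + K_D},
\qquad K_D = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}
```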
The degree of protein saturation can also be directly measured based on the amplitude of the extension change at equilibrium, where the extension before protein introduction indicates θ_eq = 0 and the maximum extension change approached at the highest protein concentrations signifies θ_eq = 1. Agreement between calculations of these fundamental parameters using either rate or amplitude measurements confirms that the system follows a simple two-state mechanism, governed by bimolecular binding. Thus, when agreement is not found, such as for the three different protein systems explored below, the possibility of proteins binding in more than one conformation must be explored.
Single-stranded binding proteins
Single-stranded binding proteins (SSBs) are highly abundant proteins that have been identified in all domains of life, in both prokaryotic and eukaryotic cells, as well as in viruses. All SSBs have ssDNA (or ssRNA) binding grooves that do not discriminate between ssNA sequences but do not bind double-stranded (ds) NA regions. SSBs are generally involved in DNA (and in some viruses RNA) replication, repair, and recombination. SSBs are able to promptly engage all available ssDNA (and ssRNA) templates, thereby protecting them from degradation by nucleases, as well as eliminating NA secondary structure. SSBs are known to facilitate all NA metabolic processes and, most notably, to increase the rate of DNA replication by the polymerase complex. As an extremely important class of proteins, SSBs from many organisms have been studied extensively over the past several decades, and much information on their structure and ssDNA binding modes has accumulated. Despite their similar roles in various organisms from bacteria to humans, SSBs were found to be surprisingly diverse in their structure, ssDNA binding modes, and binding cooperativity. Most importantly, despite decades of research, it remains unclear: (i) how variable amounts of ssDNA template always remain protected by the variable bulk concentrations of SSBs during DNA processing and (ii) how strongly, and often cooperatively, bound SSBs are able to promptly dissociate from ssDNA to clear the way for the rapidly moving polymerase complex as it synthesizes the complementary DNA strand with rates of ~100–1000 nt/s. Many SSBs have C-terminal unstructured or poorly structured anionic tails that compete with ssDNA for binding to their SSB binding sites, while also serving as the attachment points for multiple cellular proteins that regulate ssDNA processing. It was suggested that the binding of C-terminal tails to these regulatory factors helps to promptly dissociate SSBs from ssDNA. However, no direct evidence of such activity was provided.
Moreover, SSBs are also routinely used to improve the yields of in vitro PCR reactions where no such cellular cofactors are provided.
Despite commonalities in function, different SSBs need not be structurally similar. In this review, we specifically examine three SSB proteins from various sources (bacteria, bacteriophage, and retrotransposon). Despite these differences, we will show that they all exhibit similar collective behaviors when examined in single-molecule experiments. Perhaps the most well-studied such protein is the SSB of E. coli (EcSSB). EcSSB is a homotetramer, with each 19 kDa monomer comprising an N-terminal domain (NTD) containing an oligonucleotide binding (OB) fold (the protein’s ssDNA binding site/groove), a C-terminal domain (CTD) with a conserved 9-amino-acid acidic tip, and a poorly conserved intrinsically disordered linker (IDL) (Antony et al., 2013; Kozlov et al., 2015; Raghunathan et al., 2000; Raghunathan et al., 1997; Tan et al., 2017). The OB domain contains both the high-affinity DNA binding grooves and interfaces for interprotein interactions responsible for stable tetramerization. EcSSB can bind ssDNA in multiple conformations, typically identified by the total number of nt occluded, such that higher binding site size states wrap more NA substrate around the OB tetramer (Bujalowski & Lohman, 1986; Bujalowski et al., 1988; Lohman et al., 1988; Lohman & Overman, 1985; Lohman et al., 1986).
Free protein concentration (or equivalently protein:nt ratio), salt conditions, and template tension (for force spectroscopy studies) affect the occupancy of these distinct binding modes (Bujalowski & Lohman, 1986; Bujalowski et al., 1988; Kozlov et al., 2019; Lohman et al., 1988; Lohman & Overman, 1985; Lohman et al., 1986; Suksombat et al., 2015). Three wrapping states are typically observed for EcSSB bound to ssDNA in the absence of applied tension, with binding site sizes of 65, 56, and 35 nt. However, OT experiments observing single proteins binding to a 70 nt ssDNA substrate identified binding of as few as 17 nt by an individual tetramer under increased tension (Suksombat et al., 2015).
A model of the path through which a ~65 nt ssDNA substrate accesses the binding grooves of all four OB domains has been established based on X-ray crystallographic structural data (Raghunathan et al., 1997). While the exact topologies of other binding modes have not been structurally resolved, they are generally consistent with a discrete number of the OB folds being occupied by ssDNA. Some experiments have observed evidence of ssDNA segments as short as 8 nt binding to EcSSB, including sedimentation of 8 nt poly dT oligos with a stoichiometry of more than 3 oligos per tetramer (Krauss et al., 1981), increased protein-mediated hairpin destabilization upon the addition of a poly dT ssDNA overhang to a hairpin (Grieb et al., 2017), and AFM observation of EcSSB localization to 8 nt poly dT overhangs at the end of dsDNA substrates (Naufer et al., 2021).
Even when NA-bound, EcSSB is highly dynamic. Single-molecule FRET experiments measured a dynamic equilibrium between structural states (Roy et al., 2007), and diffusion of wrapped protein along its ssDNA substrate (Roy et al., 2009). Fluorescent imaging of EcSSB-ssDNA complexes has even resolved multiple sequential kinetic steps, from measurements of the concentration-dependent rate of free protein binding (Kozlov & Lohman, 2002b) and the concentration-independent rate of wrapping (Kuznetsov et al., 2006), to the slow addition of additional protein to an EcSSB-occupied substrate (Kunzelmann et al., 2010) and direct transfer of an EcSSB tetramer between two different ssDNA substrates (Kozlov & Lohman, 2002a).
The depth of research on EcSSB makes it a good model to compare with other less well-defined protein systems. The results we observe using optical tweezers (Naufer et al., 2021) can be directly related to previous research using different experimental systems. Since then, we have also observed significant commonalities with how both the gene 32 protein of T4 bacteriophage (T4 gp32) (Cashen et al., 2023; Cashen et al., 2024a) and the open reading frame (ORF) 1 protein of the LINE 1 retrotransposon (L1-ORF1p) (Cashen et al., 2022; Cashen et al., 2024b) interact with an ssNA substrate. As we will discuss further, the known framework in which EcSSB interacts with ssDNA in multiple binding conformations can be generalized to explain both our experimental data and the function of these other proteins.
T4 gp32 is a 33.5 kDa monomer comprising three domains: a central ssDNA binding core (residues 22–253), a positively charged NTD (residues 1–21), and a negatively charged CTD (residues 254–301) (Karpel, 1990). The gp32 core domain binds ssDNA (7 nt occluded site size) in a small, positively charged cleft created by a single OB-fold, conferring on the protein largely sequence-independent binding and the ability to effectively discriminate against dsDNA (Shamoo et al., 1995; Theobald et al., 2003; Wu et al., 1999). gp32 forms highly stable protein filaments on ssDNA mediated by cooperative interprotein interactions between the NTD of a nucleic acid-bound monomer and the core domain of an adjacently bound protein (Casas-Finet et al., 1992; Lonberg et al., 1981). Previous light scattering and circular dichroism measurements suggested that NA-bound gp32 filaments wind ssDNA, resulting in a relatively stiff, helical protein-DNA structure (Kuil et al., 1990; Kuil et al., 1988; Scheerhagen et al., 1989; Scheerhagen et al., 1985a; Scheerhagen et al., 1985b; van Amerongen et al., 1990). These findings were recapitulated by recent single-molecule DNA stretching experiments, which showed that cooperatively bound gp32 simultaneously rigidifies and compacts ssDNA, characterized by its increased persistence length and reduced contour length, respectively (Cashen et al., 2023), an expected signature of a helical protein filament (Griffith & Formosa, 1985; Lee et al., 2004; Takahashi & Norden, 1994; Wu et al., 2004; Xu et al., 2017; Yang et al., 2001; Yu et al., 2004).
The gp32 CTD, on the other hand, is believed to primarily help coordinate DNA replication via direct (heterotypic) interactions with other constituents of the T4 replisome (Alberts & Frey, 1970a; Alberts & Frey, 1970b; Krassa et al., 1991; Lefebvre et al., 1999; Morrical et al., 1996; Nelson et al., 2008) while also competing with the ssDNA for the central domain’s binding cleft, thereby moderating the strength of the individual protein–ssDNA interactions in a salt-dependent manner (Pant et al., 2005).
The exact structural details of the gp32-ssDNA complex remain incomplete. An initial X-ray crystal structure of the gp32 core (ssDNA binding) domain complexed to a dT6 oligonucleotide revealed only weak electron density for the ssDNA lattice bound within the protein’s OB-fold (Shamoo et al., 1995), suggesting that ssDNA is fairly mobile within the gp32 binding groove, allowing the protein to freely translocate (slide) along ssDNA (Jose et al., 2015a; Lee et al., 2016; Lohman & Kowalczykowski, 1981). However, the authors modeled four nucleotides of the dT6 chain into the gp32 binding cleft, and the resulting structure suggested that at least two nucleotides were tightly bound within the protein’s core. A more recent low-resolution crystal structure of gp32 in complex with the T4 Dda helicase and a dT17 oligo (He et al., 2024) further defined the entire ssDNA binding surface of the gp32 monomer, as well as its interaction with Dda. Consistent with these structural studies, oligonucleotide-based binding measurements using proteolysis and DNA Tm depression methods demonstrated that at least 2–3 adjacent phosphodiester bonds are required for gp32-ssDNA binding (Wu et al., 1999). However, this study also showed an increase in gp32-ssDNA binding affinity when the oligos were increased in length from 5 to 8 nt, suggesting that the number of interactive residues within the core may be variable and dependent on substrate length.
Bulk studies of gp32 binding have often utilized relatively short ssDNA substrates, limiting measurements of gp32-ssDNA dynamics to either single noncontiguous monomers or small clusters thereof (Camel et al., 2021; Jose et al., 2015a; Jose et al., 2015b; Lee et al., 2016; Lohman & Kowalczykowski, 1981). However, the length of a typical Okazaki fragment in T4-infected E. coli is 1000–2000 nt (Maloy & Hughes, 2013) (i.e., can accommodate ~150–300 proteins), indicating that greater ssDNA lengths are likely needed for a complete understanding of gp32 filament structure and organizational dynamics in vivo. In this regard, single-molecule DNA stretching experiments are able to extend our understanding of gp32 behavior by probing its binding to long ssDNA substrates, which may accommodate >1000 proteins. Previous measurements on force-melted λ-phage DNA revealed how competing interactions of the acidic CTD for access to the protein’s OB-groove regulate its salt-dependent binding to ssDNA (Pant et al., 2004; Pant et al., 2005; Pant et al., 2003; Rouzina et al., 2005). These studies also helped explain the origin of the “kinetic block” to dsDNA helix-destabilization (melting) by full-length gp32 that was observed in thermal melting experiments.
Similar to EcSSB, gp32 has the seemingly paradoxical requirement to both stably bind and protect regions of ssDNA transiently exposed during replication while also ensuring their rapid release upon synthesis of the complementary strand. While gp32’s high-affinity binding facilitates efficient coating of the discrete Okazaki fragments, such stable and highly cooperative binding could prevent the protein from being easily displaced from its ssDNA template. Indeed, previous stopped-flow measurements revealed that gp32 primarily dissociates from the ends of its cooperative clusters and that the rate of unbinding is too slow to account for the observed rate of DNA synthesis by T4 polymerase (Lohman, 1984a; Lohman, 1984b). Efficient protein recycling during DNA replication remains an important, open question for all SSBs.
LINE-1 (L1) is an intragenomic parasitic DNA element, comprising ~20% of the human genome, that amplifies within its host through a “copy-paste” mechanism known as retrotransposition (Furano, 2000; Goodier et al., 2013; Kazazian & Moran, 2017; Lander et al., 2001). L1 encodes two proteins, ORF1p and ORF2p, which assemble on their encoding transcript (cis preference) to form the L1 ribonucleoprotein (RNP), an essential intermediate of retrotransposition (Doucet et al., 2010; Howell & Usdin, 1997; Kulpa & Moran, 2005; Kulpa & Moran, 2006; Martin, 1991; Martin, 2010; Moran et al., 1996; Sahakyan et al., 2017). ORF2p provides reverse transcriptase and endonuclease activity (Feng et al., 1996; Luan et al., 1993; Mathias et al., 1991; Miller et al., 2021; Moran et al., 1996; Thawani et al., 2024). ORF1p, the major component of the L1 RNP, is a homotrimeric phosphoprotein that binds ssNA nonspecifically with high affinity and exhibits NA chaperone activity (i.e., facilitates annealing and exchange of NA strands).
ORF1p contains a 51 amino acid intrinsically disordered NTD, which harbors two highly conserved phosphorylation sites necessary for retrotransposition (Cook et al., 2015; Furano & Cook, 2016), followed by a 14-heptad coiled-coil (CC), which mediates the trimerization of ORF1p monomers (Boissinot & Sookdeo, 2016; Callahan et al., 2012; Khazina et al., 2011; Khazina & Weichenrieder, 2018; Martin et al., 2003). The ORF1p coiled coil is evolutionarily labile (subject to rampant amino acid substitutions) (Furano et al., 2020). However, despite such variability, mutational analysis has shown that ORF1p activity is exquisitely sensitive to its CC sequence (Adney et al., 2019; Goodier et al., 2007; Naufer et al., 2016), suggesting that the persistence of L1 activity requires periodic remodeling of the ORF1p coiled coil. In contrast, the carboxy-terminal half is highly conserved and comprises two domains: a noncanonical RNA recognition motif (RRM) (Khazina & Weichenrieder, 2009), which contains two additional phosphorylation sites required for retrotransposition, and a CTD, which terminates in a 46 amino acid intrinsically disordered sequence.
Residues within the RRM and CTD endow ORF1p with high-affinity ssNA binding and NA chaperone activity in vitro. However, these properties are only evident in the context of the trimer (Basame et al., 2006; Callahan et al., 2012; Januszyk et al., 2007; Khazina et al., 2011; Khazina & Weichenrieder, 2009; Kolosha & Martin, 2003; Kulpa & Moran, 2005; Martin, 2010; Martin et al., 2003; Martin et al., 2008; Martin et al., 2005; Martin et al., 2000; Moran et al., 1996). Mutations in the RRM or CTD domains that eliminate NA chaperone activity also abolish retrotransposition, suggesting a primary role of ORF1p chaperone activity in L1 replication (Martin et al., 2005). However, the mechanistic details of this activity are not known. FRET-based assays showed that ORF1p can stabilize mismatched oligonucleotide duplexes (Callahan et al., 2012), which are likely to be encountered during the hybridization of the target-site DNA and L1 transcript to generate a productive primer for cDNA synthesis by ORF2p.
Review of single-molecule experiments
Measuring multimode binding
Recently published work using similar OT single-molecule techniques for three different protein systems observed strong evidence that more than one binding state must be present (Cashen et al., 2023; Cashen et al., 2024a; Cashen et al., 2022; Cashen et al., 2024b; Naufer et al., 2021). In particular, incubation experiments show a non-monotonic extension change response, both in terms of extension over time and equilibrium extension change (Figure 3a–c). Low protein concentrations (on the order of 1 nM) result in incubation curves that resemble simple bimolecular binding, as the ssDNA’s extension decreases monotonically before approaching a highly compact equilibrium. However, while increasing protein concentration does increase the rate of initial compaction, the final equilibrium compaction is reduced as the extension begins to increase over time upon reaching a minimum value. Thus, to analyze the kinetics of the incubation process, the data must be split into two regimes (Figure 3d). First, the ssDNA extension decreases as the binding proteins wrap and compact the substrate. Eventually, the ssDNA becomes sufficiently saturated that, in order to accommodate additional protein, already bound protein must decrease its binding site size by switching to a less wrapped, and less compacted, state. As a result, both rates (initial compaction and subsequent elongation) increase with free protein concentration, but the secondary binding rate is an order of magnitude slower. This discrepancy in rate can be explained by the energetic barrier of partially unwrapping already bound protein to accommodate additional protein into the saturated complex, which is not a necessary step when the ssNA is initially protein-free. The equilibrium ssDNA extension can also be interpreted as a competition between a more and a less compacted binding state by extending Eq. (2) into a two-step, three-state reaction (Naufer et al., 2021).
In this scheme, the binding and wrapping of ssNA by protein are separated into two distinct steps. If the maximally and minimally compacted ssDNA extensions observed are associated with full occupancy of the wrapped (θ_w) and unwrapped (θ_u) states, respectively, then the occupancy of both states can be calculated for any intermediate extension through interpolation (Figure 3e). Such analysis shows a smooth transition between the two states as a function of protein concentration that can be reproduced by simulating the three-state reaction in Eq. (8) or by approximating the interconversion between the states as a simple binding isotherm. The analysis shown here for EcSSB (Naufer et al., 2021) was also used to separate the multiple binding states and kinetic steps for L1-ORF1p (Cashen et al., 2022; Cashen et al., 2024b) and T4 gp32 (Cashen et al., 2023; Cashen et al., 2024a). One advantage of using EcSSB as a model system is that its ability to interconvert between different wrapping states with different binding site sizes has been well-documented using many different experimental assays. For example, one assay that also directly measures the conformation of an NA substrate (rather than directly measuring protein bound) uses a FRET pair of dyes placed at the two ends of a binding substrate (Roy et al., 2007) (Figure 3f). When no protein is present, the ends are effectively uncoupled, leading to low FRET intensity. For EcSSB, when one protein tetramer binds a 70 dT substrate, the two ends are brought together, resulting in high FRET efficiency. In contrast, if two tetramers bind the ssNA simultaneously (each occupying 35 nt), the resulting structure places the two dyes further apart. As a result, EcSSB titration experiments with this assay also return a non-monotonic response with concentration, similar to the OT experiments described above.
In contrast, any experiment that directly measures bound protein (such as directly fluorescently labeling the protein) will instead simply measure a continuous increase in signal as free protein concentration is increased, obfuscating conformational transitions of the protein–NA complex.
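The occupancy interpolation described for Figure 3e can be illustrated with a minimal Python sketch. It assumes a simple linear interpolation between the two reference extensions and that the two bound states account for all of the substrate (θw + θu = 1). The function name, the clamping step, and the numerical values are illustrative assumptions, not the published analysis code.

```python
def state_occupancies(extension, x_wrapped, x_unwrapped):
    """Return (theta_w, theta_u) by linear interpolation between two reference extensions.

    x_wrapped   : extension at maximal compaction (taken as theta_w = 1)
    x_unwrapped : extension at minimal compaction (taken as theta_u = 1)
    All three extensions must share the same units and be measured at the same force.
    """
    if x_unwrapped <= x_wrapped:
        raise ValueError("x_unwrapped must be larger than x_wrapped")
    theta_w = (x_unwrapped - extension) / (x_unwrapped - x_wrapped)
    # Clamp so that measurement noise outside the reference range
    # cannot produce occupancies below 0 or above 1 (an added assumption).
    theta_w = min(1.0, max(0.0, theta_w))
    return theta_w, 1.0 - theta_w


# Hypothetical extensions in arbitrary units, not values from the experiments.
theta_w, theta_u = state_occupancies(3.0, x_wrapped=2.0, x_unwrapped=4.0)
print(theta_w, theta_u)
```

Applying such a function to the equilibrium extension measured at each protein concentration would give the occupancy curves that are then compared with a kinetic simulation or a binding isotherm.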
Evidence of facilitated dissociation
Removing protein from the sample allows for the direct observation of protein dissociation from the ssNA substrate. One difficulty in working with proteins that have very strong binding affinity, however, is that binding to the substrate may be too stable to observe measurable dissociation on experimental timescales. For EcSSB binding to ssDNA held at low force, removing free protein does not result in a significant change in ssDNA extension (Naufer et al., Reference Naufer2021) (Figure 4a). Subsequently increasing the EcSSB concentration, however, does result in a change in ssDNA extension. Compaction is reduced, and the ssDNA equilibrates to the same length as when the substrate is initially incubated with the same high protein concentration. In contrast to the equilibrium reached when the ssDNA is incubated with low EcSSB concentration, this less compact state is very unstable, and removing free protein at this point causes the ssDNA to quickly recompact. After reaching the same highly compact state observed during the initial incubation with low protein concentration, the system becomes stable again and no further extension change is observed. These results demonstrate that even under conditions where EcSSB does not fully dissociate (protein-free ssDNA is never recovered), the protein–NA complex must be able to rapidly reorganize based on the local density of protein present.
If the ssDNA tension is increased, binding is further destabilized such that full protein dissociation can be observed. That is, the ssDNA eventually returns to its original protein-free conformation (Figure 4b). However, even under these conditions, the ssDNA-EcSSB complex first rapidly recompacts when free protein is removed before slowly elongating as the rest of the protein dissociates. The same effect is observed for L1-ORF1p (Cashen et al., Reference Cashen2022; Cashen et al., Reference Cashen2024b) (Figure 4c) and T4 gp32 (Cashen et al., Reference Cashen2023; Cashen et al., Reference Cashen2024a) (Figure 4d).
Similar to the incubation experiments, the dissociation data must be split into two distinct regimes. The rapid reorganization that results from protein re-wrapping the ssDNA when excess protein is removed occurs at a constant rate regardless of the initial EcSSB concentration during incubation (Figure 4e). For EcSSB, when compared to the two rates of binding observed during incubation, this initial dissociation rate intersects the initial binding rate at <1 nM, consistent with low protein concentrations being able to fully saturate the ssDNA substrate. In contrast, the secondary binding rate approaches the initial dissociation rate asymptotically at high concentration (>10 nM), consistent with the high EcSSB concentration needed for the less wrapped state to become dominant. The secondary dissociation rate, where the ssDNA extends back to its original protein-free length, is much slower and is only detectable under conditions that promote full dissociation, such as destabilizing binding using excess applied force or a high salt buffer that screens electrostatic binding interactions (Figure 4f). The order of magnitude difference between these two dissociation rates indicates that some biophysical process must be stimulating protein dissociation from the oversaturated state. This can be explained by the presence of multiple binding modes with different effective binding site sizes. If the ssDNA is oversaturated with protein (bound protein is in its lower binding site size state to accommodate excess protein), then when a single protein dissociates, neighboring proteins can switch to the higher binding site size state, effectively absorbing the released NA substrate (Figure 4g). This process, in which the ssNA substrate released by a dissociating protein is reabsorbed by another protein, is referred to as facilitated dissociation (Erbaş & Marko, Reference Erbaş and Marko2019). In contrast, when protein dissociates from an under-saturated substrate, protein-free NA is left behind.
Thus, full dissociation has an additional energy barrier due to the loss of protein–NA binding energy, while facilitated dissociation is closer to an isoenergetic process in which the loss of the final ssNA contact before a protein is released from the substrate is replaced by the analogous interaction with a neighboring protein.
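The two dissociation regimes described above (rapid recompaction followed by slow elongation to the protein-free length) can be pictured with a purely phenomenological sketch. It assumes the extension after protein removal is a sum of two exponentials, which is only reasonable when the fast rate is much larger than the slow rate. This is not the kinetic model of the cited studies, and every name and number below is hypothetical.

```python
import math


def biphasic_extension(t, x_start, x_compact, x_free, k_fast, k_slow):
    """Extension versus time after free protein is removed (phenomenological).

    x_start   : extension of the oversaturated (less compact) complex at t = 0
    x_compact : extension of the fully wrapped (most compact) complex
    x_free    : extension of protein-free ssDNA at the same force
    k_fast    : rate of the initial recompaction (facilitated dissociation)
    k_slow    : rate of the final, full dissociation
    """
    fast = (x_start - x_compact) * math.exp(-k_fast * t)   # decays first
    slow = (x_compact - x_free) * math.exp(-k_slow * t)    # decays later
    return x_free + slow + fast


# Hypothetical parameters: arbitrary extension units, rates in 1/s,
# with the two rates differing by one order of magnitude.
params = dict(x_start=0.40, x_compact=0.30, x_free=0.50, k_fast=1.0, k_slow=0.1)
for t in (0.0, 5.0, 1000.0):
    print(t, round(biphasic_extension(t, **params), 4))
```

With these assumed values the curve starts at the oversaturated extension, dips toward the compact state, and then relaxes to the protein-free length, which mirrors the qualitative sequence described for Figure 4b.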
Protein structure and function
The ssNA interactions exhibited by all three proteins discussed here can be related to the common features of their complexes with ssNA, despite clear differences in structure and multimerization (Figure 5a). EcSSB and L1-ORF1p naturally form homotetramers and homotrimers, respectively. As a result, each homo-oligomer intrinsically has multiple binding domains. If each binding domain can bind ssNA semi-independently, with the substrate winding around the protein oligomer in different conformations to access the binding grooves of a discrete number of subunits, then the proteins inherently have the ability to alter their effective binding site size.
T4 gp32, conversely, is primarily monomeric in solution. However, interactions between the NTD and the core domains of neighboring proteins bound to an ssDNA substrate confer the protein with cooperative binding, resulting in the formation of long protein clusters on the substrate. Again, this results in many binding interfaces present on each homo-oligomeric filament, which must allow for the modulation of protein:ssDNA stoichiometry. Our results are consistent with the ssDNA substrate helically winding around the protein filament while remaining highly dynamic due to the ability to partially unwind to accommodate additional protein at high protein:ssDNA stoichiometries (Cashen et al., Reference Cashen2023). Moreover, we found that critical protein oversaturation resulted in filament unwinding such that the cooperative interprotein interactions largely vanished, enabling rapid protein displacement from across the entire ssDNA substrate, relieving oversaturation.
However, the structural differences between these proteins confer different biophysical interactions with the ssNA that are measurable using OT. First, L1-ORF1p differs from the other proteins in that it also exhibits an RNA packaging function mediated by intertrimer interactions, in which the protein must stably bind and compact a copy of the L1 sequence. Correspondingly, when an L1-ORF1p-saturated ssDNA is held at low force (≤5 pN), the substrate continues to compact over time (Figure 5b). Additionally, when left for increasingly long incubation times, high-force stretching of the ssDNA reveals permanent compaction. When WT L1-ORF1p is replaced with an inactive (retrotransposition-deficient) mosaic of modern and ancestral strains of L1, this secondary compaction ability is lost (Cashen et al., Reference Cashen2022). Second, as T4 gp32 forms long, continuous protein filaments rather than discrete tetramers or trimers, the length scale of interprotein interaction is greatly increased relative to that observed for L1-ORF1p or EcSSB. In addition to decreasing the contour length of the protein-saturated ssDNA substrate, a large (~30-fold) increase in persistence length is also observed (Figure 5c). ssDNA bound with T4 gp32 has an effective persistence length of approximately 20 nm, which is much longer than the length scale of the protein itself (Cashen et al., Reference Cashen2023). Thus, the orientation of neighboring proteins in the filament must be preserved, resulting in the protein–ssNA complex behaving like a semirigid structure.
Our hypothesis that the dynamic multistate binding of these proteins is related to their ability to multimerize is supported by comparison experiments with non-multimerizing protein variants that exhibit neither oversaturated protein binding nor facilitated dissociation. Instead, simple single-state binding is recovered (Figure 6). First, a point mutation in EcSSB’s OB domain (H55Y) prevents the formation of homotetramers in solution but does not inhibit the binding of the domain to ssDNA (Figure 6a) (Naufer et al., Reference Naufer2021). As a result, the binding of this monomeric mutant results in minimal ssDNA compaction, similar in amplitude to oversaturating WT protein conditions, where wrapping is destabilized to accommodate additional protein binding. Additionally, the binding is completely reversible, and the protein immediately begins dissociating when free protein is removed, extending the ssDNA back to its protein-free conformation. However, the rates of initial binding and full dissociation are still consistent with the rates observed for WT protein. Similarly, truncation of the L1-ORF1p protein at the 128th residue (m128p), removing the NTD and 10.5 heptads of the 14-heptad coiled coil, prevents trimer formation while leaving the binding domains in the RRM-CTD intact (Cashen et al., Reference Cashen2024b). Again, compaction is reduced and single-phased, and binding is reversible (Figure 6b). Finally, truncation of T4 gp32 to remove the entire NTD responsible for cooperative binding has the same effect (Figure 6c) (Cashen et al., Reference Cashen2023). In addition to the loss of multistate binding, the large increase in persistence length is no longer observed; the measured persistence length is instead consistent with the length scale of a single binding site, indicating that neighboring proteins are no longer associated in a continuous filament but instead act independently of one another.
Biological function of variable conformation binding
The role of these proteins’ multistate binding may be related to their need to perform the seemingly opposed functions of stably binding ssNA for protection while remaining dynamic enough to reorganize as NA processing proceeds. All three proteins bind lengths of ssNA that must be polymerized into ds form: ssDNA Okazaki fragments formed during lagging strand synthesis for EcSSB and T4 gp32, and the L1 RNA transcript during its replication for L1-ORF1p. When first formed, the ssNA region has a discrete length, and the limited pool of binding proteins must be sufficient to fully saturate this length and all other ssNA regions present at a given time (Figure 7). Under these conditions, it is beneficial that ssNA binding proteins can occupy and occlude as many nts as possible. For example, by fully wrapping around all available domains, a single EcSSB tetramer can fully occlude 65 nt of ssNA (~35 nm of linear length). However, for polymerization to proceed, these occluded nts must eventually be accessed by the polymerase enzyme. As the ssNA binding proteins do not naturally dissociate on a short timescale (as required by the protection function), the proteins must be removed by an active process. One candidate is a specific interaction between the polymerase and the binding proteins. Such interactions would be limited, however, to a single protein at the ds-ss junction and would have to proceed in a stepwise fashion. Removing such strongly bound proteins one at a time, in sufficiently rapid sequence to allow efficient polymerization, would appear difficult. However, the presence of multiple binding states alleviates this bottleneck. As polymerization proceeds, bound proteins can switch to a lower binding site size state, giving the polymerase access to additional nts. Additionally, since these states have reduced contact with the ssNA substrate, binding is weakened, and dissociation is enabled for all proteins across the ssNA region.
Finally, this mechanism also allows for facilitated dissociation, as observed in the OT experiments, where the dissociation of proteins from an oversaturated substrate is an order of magnitude faster than dissociation that leaves behind bare ssNA. Thus, the ssNA binding proteins can promptly reorganize and dissociate in front of the advancing polymerase so as not to delay DNA synthesis.
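The trade-off between occluding as many nts as possible and accommodating more protein can be made concrete with a small arithmetic sketch. It uses the values quoted in this review (0.56 nm/nt contour length, and EcSSB binding site sizes of 65 nt and 35 nt); the 1000 nt region length and the function names are hypothetical, and gaps between neighboring proteins are ignored.

```python
CONTOUR_NM_PER_NT = 0.56  # ssNA contour length per nucleotide, as quoted in the text


def occluded_length_nm(site_size_nt):
    """Linear (contour) length of ssNA covered by one protein in a given binding mode."""
    return site_size_nt * CONTOUR_NM_PER_NT


def max_bound_proteins(region_nt, site_size_nt):
    """Largest number of proteins that fit side by side on an ssNA region (no gaps)."""
    return region_nt // site_size_nt


region_nt = 1000  # hypothetical length of the exposed ssNA region
for site_size_nt in (65, 35):
    print(site_size_nt,
          round(occluded_length_nm(site_size_nt), 1),
          max_bound_proteins(region_nt, site_size_nt))
```

Under these assumptions the fully wrapped mode covers roughly 36 nm per tetramer, in line with the ~35 nm quoted above, while the less wrapped mode fits nearly twice as many tetramers on the same region, each holding fewer nts.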
Conclusions
Our OT experiments shed new light on the mechanism of prompt protein dissociation in front of a moving DNA polymerase. We observed that ssNA oversaturation (crowding) with any of the three studied ssNA binding proteins leads to rapid non-cooperative dissociation of excess protein from along the ssNA template. This mechanism is enabled by the structure of the proteins and the oligomers they form, in which multiple binding interfaces are present, allowing proteins to interconvert between stable and unstable conformations with distinct dissociation kinetics. Competition between these binding interfaces, in which the NA substrate released from one site can be reabsorbed by another empty site, enables rapid protein reorganization and facilitated dissociation.
Open peer review
To view the open peer review materials for this article, please visit http://doi.org/10.1017/qrd.2024.21.
Data availability statement
All data are available in previously published manuscripts and upon reasonable request from M.C.W.
Author contribution
M.M., B.A.C., and I.R. wrote the original manuscript. M.M., B.A.C., I.R., and M.C.W. reviewed and edited the manuscript.
Financial support
This work was supported by the National Science Foundation [MCB-1817712 to M.C.W.].
Competing interest
The authors declare no conflicts of interest.
Comments
Fredrik Westerlund and Felix Ritort
Guest Editors, QRB Discovery
June 24, 2024
Dear Fredrik and Felix,
We are pleased to submit our manuscript entitled “Diverse single-stranded nucleic acid binding proteins enable both stable protection and rapid exchange required for biological function” by Michael Morse, Benjamin A. Cashen, Ioulia Rouzina, and Mark C. Williams, to be considered for publication in QRB Discovery as an invited review in the special issue “Special Collection on Single Molecule Challenges in the 21st Century”. This review details recent challenges emerging in the measurement of interactions between a wide variety of proteins and single stranded nucleic acids at a single molecule level.
Force spectroscopy tools are sensitive to minor changes in nucleic acid conformation, revealing details of biophysical interaction with binding proteins. However, a wide range of proteins are able to bind nucleic acids in multiple conformations, leading to data that convolve multiple different kinetic steps and must be analyzed and modeled as a multi-state system. We review recent studies that examine three different binding proteins, of vastly different origin and structure, that nevertheless exhibit remarkably similar behavior where binding conformation and stoichiometry are altered in response to changes in protein-nucleic acid ratio. Furthermore, we demonstrate that this behavior can be related to the common biological function of these protein systems, abetting the polymerization of duplex DNA. These results also elucidate the seemingly paradoxical function of these proteins in stably binding and protecting single-stranded nucleic acids, while simultaneously remaining dynamic enough to rapidly reorganize to allow continued nucleic acid processing.
We suggest the following reviewers:
Gijs Wuite, VU University Amsterdam, gwuite@nat.vu.nl (expert on single molecule biophysics, force spectroscopy)
Jie Yan, National University of Singapore, phyyj@nus.edu.sg (expert on single molecule biophysics, force spectroscopy)
Taekjip Ha, Harvard Medical School, taekjip.ha@childrens.harvard.edu (expert on single molecule studies of SSB)
Yann Chemla, University of Illinois, ychemla@illinois.edu (expert on single molecule studies of SSB)
Antoine van Oijen, University of Wollongong, vanoijen@uow.edu.au (expert on single molecule studies of E. coli replication)
Erwin Peterman, VU University Amsterdam, e.j.g.peterman@vu.nl (expert on single molecule studies of nucleic acid-protein interactions)
Sincerely,
Mark C. Williams, Professor and Chair of Physics