A molecular engineering toolbox for the structural biologist

Galia T. Debelouchina; Tom W. Muir

doi:10.1017/S0033583517000051


Published online by Cambridge University Press: 02 May 2017


Galia T. Debelouchina: Department of Chemistry, Princeton University, Princeton, NJ 08540, USA
Tom W. Muir*: Department of Chemistry, Princeton University, Princeton, NJ 08540, USA
*Author for correspondence: T. W. Muir, Department of Chemistry, Princeton University, Princeton, NJ 08540, USA. Email: muir@princeton.edu



Abstract

Exciting new technological developments have pushed the boundaries of structural biology, and have enabled studies of biological macromolecules and assemblies that would have been unthinkable not long ago. Yet, the enhanced capabilities of structural biologists to pry into the complex molecular world have also placed new demands on the abilities of protein engineers to reproduce this complexity in the test tube. With this challenge in mind, we review the contents of the modern molecular engineering toolbox that allow the manipulation of proteins in a site-specific and chemically well-defined fashion. Thus, we cover concepts related to the modification of cysteines and other natural amino acids, native chemical ligation, intein and sortase-based approaches, amber suppression, as well as chemical and enzymatic bio-conjugation strategies. We also describe how these tools can be used to aid methodology development in X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy and in studies of dynamic interactions. It is our hope that this monograph will inspire structural biologists and protein engineers alike to apply these tools to novel systems, and to enhance and broaden their scope to meet the outstanding challenges in understanding the molecular basis of cellular processes and disease.

Type: Review
Information: Quarterly Reviews of Biophysics, Volume 50, 2017, e7

DOI: https://doi.org/10.1017/S0033583517000051
Copyright: © Cambridge University Press 2017

1. Introduction

Since the first crystal structure of myoglobin (Kendrew et al. 1958), the three-dimensional (3D) reconstruction image of T4 bacteriophage tails by electron microscopy (EM) (De Rosier & Klug, 1968), and the solution nuclear magnetic resonance (NMR) structure of proteinase inhibitor IIA (Williamson et al. 1985), structural biology has made tremendous strides toward revealing intimate atomic-level details that guide the function of biological molecules. We live at a time when we know the structures of more than 120 000 (and counting – source: http://www.rcsb.org/pdb/statistics/holdings.do) biological macromolecules, when we can visualize the inner workings of the ribosome (Ben-Shem et al. 2011), or the nucleosome interactions that preserve the integrity and identity of our genome (Luger et al. 1997). At the same time, advances in instrumentation engineering have pushed the frontiers of structural biology methodologies and have allowed experiments and accomplishments that would have been unthinkable 30 years ago. Thus, it is now possible to record high-resolution movies of fast protein motions using X-rays (Tenboer et al. 2014), obtain cryo-EM electron density maps at sub-3 Å resolution (Campbell et al. 2015; Merk et al. 2016), or record multidimensional NMR spectra of protein crystals (Igumenova et al. 2004). Yet, the task in front of the structural biologist is getting harder and harder. The wealth of structural, biochemical and biological data has revealed that many mammalian cellular proteins are very large (>50 kD) (Brocchieri & Karlin, 2005), that they are often part of complex assemblies composed of many interchangeable molecular players, and that their function is often defined and regulated by an intricate layer of post-translational modifications (PTMs). In addition, many disease-related biological macromolecules do not have a defined secondary or tertiary structure at all, and function, instead, through intrinsic disorder and numerous weak, transient interactions (Hyman et al. 2014; Tompa, 2012). To make sense of this complicated, multilayered and often chaotic biological world, the structural biologist will become more and more dependent on the ability of protein engineers to faithfully and efficiently reproduce this complexity in the test tube.

Analogous to the advances in instrumentation design and engineering that have allowed structural biology to travel far, the tools of protein engineering have also become much more sophisticated, efficient and ultimately broader in scope over time. It is now possible to routinely synthesize polypeptide chains that are 50 amino acids long, to stitch them together into much longer chains without leaving any chemical scars (Dawson et al. 1994), and to decorate them with PTMs, biophysical probes and chemical moieties that perturb or enhance their function. It is also possible to ‘persuade’ the cellular protein synthesis machinery to produce polypeptide chains incorporating completely unnatural amino acids, thus expanding the genetic code of engineered living organisms (Wang et al. 2001). The current protein engineering toolbox contains many biocompatible chemical reactions, proteins with unique polypeptide ‘stitching’ abilities, and concepts and ideas that might ultimately prove essential in solving the interesting and relevant structural biology problems of today (Fig. 1). As the structural biologist might not be aware of all the current developments in protein chemistry, we intend this review as a resource that describes the state-of-the-art protein engineering tools, keeping an eye on the past and future to provide context for their limitations and the exciting new possibilities that undoubtedly lie ahead. We start with a very brief overview of recent advances in X-ray crystallography, cryo-EM and NMR, and outline challenges where the tools of protein engineering might be the most impactful. We then describe the contents of the molecular engineering toolbox that allow the construction of large modified proteins and complex macromolecular assemblies. We continue with a discussion of the concepts and ideas that directly concern structural biology methodology development. Our monograph ends with an outlook toward emerging trends in structural and chemical biology and exciting new developments that will guide the two fields in the future.

Fig. 1. Molecular engineering toolbox for the structural biologist.

2. Advances and challenges in structural biology

2.1 X-ray crystallography

The workhorse of structural biology, X-ray crystallography, is more than 100 years old and has contributed nearly 90% of the macromolecular structures deposited in the PDB (http://www.rcsb.org/pdb/statistics/holdings.do). Protein engineering has long been part of the everyday life of the crystallographer as mutations, truncations and fusion proteins are often required to ‘trick’ proteins into adopting a crystal form. There are many crystal structures of proteins containing PTMs or their analogs, and chemical approaches are often used to trap interesting functional states, stabilize dynamic interactions or aid the formation of crystals (e.g. racemic crystallography (Yeates & Kent, 2012)). Yet, the voracious need to test hundreds if not thousands of single crystal growth conditions has certainly challenged the protein chemist to optimize her tools and deliver relevant samples with much greater yields. Recent instrumentation developments such as X-ray free-electron lasers may potentially alleviate this need as these sources allow the acquisition of room-temperature data from easier-to-obtain micro-, nano- and 2D crystals (Neutze et al. 2015). Currently, there is also a growing demand for the construction of homogeneous, chemically well-defined and stable complex biological assemblies such as those relevant for chromatin biology.

2.2 Cryo-EM

Exciting developments in the last few years have propelled cryo-EM into the spotlight and turned this method into a mainstream and vital structural biology technique that can achieve crystallographic resolution (Cheng, 2015; Nogales, 2016). The commercialization of direct electron-detection cameras has allowed the acquisition of images with higher contrast and fast readouts that can overcome beam-induced motion and radiation damage (Brilot et al. 2012; McMullan et al. 2009). On the other hand, improvements in data analysis approaches have made it possible to characterize heterogeneous samples and even rare structural states (Fernandez et al. 2013; Scheres, 2012). Coupled with other cryo-EM advantages (no need for crystallization and only small amounts of sample required), these advances have made it possible to obtain subnanometer (and in some cases <3 Å) resolution maps of integral membrane proteins (Matthies et al. 2016), biological polymers (von der Ecken et al. 2015), chromatin (Song et al. 2014), as well as biological assemblies such as the transcription and translation initiation complexes (Fernandez et al. 2013; He et al. 2013; Plaschka et al. 2016). In this context, protein chemistry and engineering can have a tremendous benefit for the cryo-EM structural biologist in the design and construction of relevant biological samples such as post-translationally modified proteins. Perhaps more importantly, however, chemical biology approaches such as cross-linking can allow the preparation of samples that are more robust and do not fall apart during sample vitrification. The addition of cross-linkers can also be extremely useful in integrated cryo-EM/mass-spec structural approaches for samples where the high-resolution identification of protein–protein interfaces might not be possible (Leitner et al. 2016).

2.3 NMR spectroscopy

NMR spectroscopy detects the magnetic properties of nuclei in molecules, which in turn provide a window into their surrounding chemical environment. Uniquely suited to probe molecular structure and dynamics in solution at physiologically relevant conditions (temperature, pH and salts) and ultimately non-destructive in its readout, NMR spectroscopy has long been hampered by its intrinsically low sensitivity. The introduction of ‘NMR-visible’ isotopic labels into biological macromolecules has become a standard practice in the field, and efficient molecular engineering approaches that allow the installation of nuclear isotopes at specific positions within the polypeptide or polynucleotide chain are highly desirable. Recent advances such as (methyl)-TROSY and dark-state exchange saturation transfer experiments have pushed the molecular size limits of solution NMR into the MDa regime (Fawzi et al. 2011; Pervushin et al. 1997; Tugarinov et al. 2003), while the rapid instrumentation and pulse sequence developments in magic angle spinning NMR have made it possible to pursue the structures of large biological polymers such as amyloid fibrils (Fitzpatrick et al. 2013; Lu et al. 2013; Wasmer et al. 2008), bacterial secretion needles (Loquet et al. 2012), membrane proteins embedded in their native lipid environments (Cady et al. 2010; Wang et al. 2013), or even the molecular composition of bones (Chow et al. 2014). Thus, molecular engineering approaches can have a profound impact on the assembly of homogeneous, isotopically labeled and yet native substrates for structural investigation by in vitro NMR. Also, uniquely suited to probe structure and dynamics in the cellular milieu (Burz et al. 2006; Frederick et al. 2015; Inomata et al. 2009; Sakakibara et al. 2009), NMR spectroscopy can benefit tremendously from chemical and molecular biology techniques that allow the specific isotopic labeling of macromolecules in the cell.

3. Molecular engineering toolbox for complex biological samples

Before we delve into the chemistry, it is important to note that the methods described below complement the well-established molecular biology framework that allows the manipulation of protein sequences at the genetic level. Such manipulations can now be achieved in several different organisms ranging from bacteria (Escherichia coli, Lactococcus lactis) and yeast (Saccharomyces cerevisiae, Pichia pastoris) to insect cells and stable mammalian expression cell lines such as HEK293 and CHO. Thus, we will start with a review of the methodologies that can selectively modify natural amino acids introduced at specific protein positions with site-directed mutagenesis. We will then describe tools that can be used to ligate modified peptides and proteins into longer polypeptide chains, including native chemical ligation (NCL), inteins and transpeptidases. We will then discuss the molecular engineering toolbox afforded by incorporation of unnatural amino acids by amber suppression. These chemical and genetic tools give chemists the ability to position bio-orthogonal reactive handles into polypeptide chains with extraordinary precision and control, and we will end this section with a discussion of bioconjugation approaches that take advantage of this rapidly evolving expertise.

3.1 Cysteine chemistry

The field of protein chemistry would have been very different in a world without cysteines. Indeed, many of the protein chemistry tools that exist today have been developed to exploit the reactivity of the cysteine sulfhydryl group that uniquely stands out among a sea of other protein side-chains. Cysteines are relatively rare in Nature (<2% abundance), and their high nucleophilicity makes them good candidates for the development of selective chemical reactions that work well in protein-compatible conditions (aqueous solution, physiological pH and temperature). Generally, these reactions can be divided into three types: (1) cysteine alkylations, (2) oxidations and (3) desulfurization reactions, each providing a unique way to exploit the reactivity of this amino acid and a pathway to build proteins with distinct and desirable properties (reviewed in (Chalker et al. 2009; Spicer & Davis, 2014)) (Fig. 2).

Fig. 2. Chemical modification of cysteine residues.

Alkylation reactions have long been used to modify cysteine-containing proteins and rely on familiar reagents such as iodoacetamide and maleimide. By careful control of the buffer pH, these electrophiles can selectively react with the cysteine sulfhydryl group and tether additional functionalities to the protein of interest. Iodoacetamides and other α-halocarbonyls, for example, have been used to attach carbohydrate moieties to proteins and create glycoprotein mimics (Macmillan et al. 2001; Tey et al. 2010). Maleimides, on the other hand, are commercially available and user-friendly means to add spectroscopic probes to proteins, including fluorescent, electron paramagnetic resonance (EPR) and paramagnetic relaxation enhancement (PRE) labels. Aminoethylation of cysteines also provides a cheap and convenient way to generate lysine methylation analogs, available in the mono-, di- and tri-methylation states (Simon et al. 2007). In recent years, the chemist's attention has turned toward the development of more advanced cysteine bioconjugation protocols that circumvent, for example, the reversibility of maleimide additions in the presence of external bases and thiols (Lyon et al. 2014), or provide the opportunity to conjugate additional functionalities (such as aryl groups) under an expanded range of reaction conditions (Vinogradova et al. 2015).

Cysteine oxidation in the protein context is usually associated with the formation of disulfide bonds, a unique structural transformation with often dramatic consequences to protein function. In the test tube, disulfide bonds are very easy to build – all that is required is basic pH and exposure to air. Furthermore, the reaction is fast and does not require a large excess of reagents. Therefore, it comes as no surprise that protein chemists often exploit disulfide bonds as a means to attach useful functionalities to proteins of interest. The challenging aspect of such protocols is to ensure that only the desired disulfide bonds are formed and the final product is not a mixture of homo- and heterodimer species. A common strategy used to alleviate this problem relies on controlled cysteine activation and disulfide exchange based on the lower pKa of aromatic thiols. In this case, a selected cysteine side-chain can be activated (and protected) at low pH with the aromatic thiol, and upon addition of the other thiol-containing component and increase in pH, the aromatic disulfide bond is exchanged with the desired connectivity (Pollack & Schultz, 1989; Rabanal et al. 1996). This strategy has found diverse applications ranging from the design of cytochrome peptides (Rabanal et al. 1996) to the tethering of ubiquitin moieties to various proteins (Chatterjee et al. 2010; Chen et al. 2010a; Meier et al. 2012). Other activating molecules include methanethiosulfonates, glycosyl and allylic thiols with applications in protein glycosylation, prenylation and sulfenylation (Gamblin et al. 2008; Grayson et al. 2005; van Kasteren et al. 2007).

While the formation of disulfide bonds is fast and facile, they are just as easily destroyed in the presence of common biochemical reducing reagents or in the cellular environment where glutathione is present at high concentrations. If linkages compatible with such environments are desired, it is possible to convert the disulfide to a thioether bond, which is stable under reducing conditions. This is essentially a desulfurization reaction that proceeds with the formation of a dehydroalanine intermediate. Dehydroalanines, in turn, can be useful stepping stones to a vast number of protein PTMs, reactions that will be discussed in more detail in Section 3.7. Another key desulfurization reaction involves the conversion of cysteine to alanine. This transformation, discussed in more detail in Section 3.3, is particularly important for NCL, as it allows the construction of polypeptide chains without cysteine ‘scars’.

Cysteines are relatively rare in polypeptide chains and are usually essential for protein function. Thus, the outstanding challenge for the protein chemist is to find new approaches and reaction conditions that target only the desired residues in a polypeptide chain. Many of these efforts are focused on the search for suitable peptide sequences that can provide the necessary amino acid context for tuning side-chain reactivity. For example, Tsien and co-workers have developed tetra-cysteine motifs that can be selectively targeted with biarsenic reagents, including in living cells (Griffin et al. 1998). Pentelute and co-workers, on the other hand, have reported the so-called ‘π-clamp’ sequence (Phe–Pro–Cys–Phe) that reacts preferentially with perfluoroaromatic moieties in aqueous solvents (Zhang et al. 2016a). Excellent selectivity can be obtained by replacing cysteine with selenocysteine (reviewed in (Metanis et al. 2009; Yoshizawa & Bock, 2009)), a rare natural amino acid that shares many of the desirable properties of the cysteine side-chain, yet is more acidic and substantially more reactive at lower pH (selenol pKa is ~5·2 versus 8·3 for thiols).
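The pKa argument above can be made concrete with the Henderson–Hasselbalch relation. The following minimal sketch, written in Python purely for illustration, estimates the fraction of thiol and selenol side-chains present in the deprotonated (more nucleophilic) form at several pH values. It assumes the approximate pKa values quoted above (8·3 and 5·2) and ideal single-site behavior; the function and constant names are hypothetical, and actual side-chain pKa values shift with the local protein environment.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an acid present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Approximate pKa values quoted in the text (assumed, environment-independent).
THIOL_PKA = 8.3    # cysteine side-chain
SELENOL_PKA = 5.2  # selenocysteine side-chain

if __name__ == "__main__":
    for ph in (5.0, 6.0, 7.0, 8.0):
        thiolate = fraction_deprotonated(THIOL_PKA, ph)
        selenolate = fraction_deprotonated(SELENOL_PKA, ph)
        print(f"pH {ph:.1f}: thiolate {thiolate:.4f}, selenolate {selenolate:.4f}")
```

Under these assumptions, most selenol groups are ionized near pH 6 while only a small fraction of thiols are, which is consistent with the selectivity window described above. Deprotonation is only one contribution to reactivity, so the numbers should be read as a rough guide.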

3.2 Chemical modification of other amino acids

In addition to cysteine, several other natural amino acids present functional groups that can be targeted for protein modification (reviewed in (Basle et al. 2010; Spicer & Davis, 2014)). These include lysine, tyrosine, arginine, glutamate, aspartate, serine, threonine, methionine, histidine and tryptophan side-chains, as well as N-terminal amines or C-terminal carboxyls (Basle et al. 2010; Hu et al. 2016; Lin et al. 2017). The modification of these residues is less precise as they are significantly more abundant than cysteine, yet some selectivity can be achieved in a context-dependent manner. The primary amines of lysine side-chains are often popular targets due to favorable reaction kinetics that can be achieved with activated esters such as N-hydroxysuccinimide (Kalkhof & Sinz, 2008) (Fig. 3a), isothiocyanates (Nakamura et al. 2009), or aldehydes in a reductive alkylation reaction with sodium cyanoborohydride (Jentoft & Dearborn, 1979; McFarland & Francis, 2005). Since these reactions usually modify all accessible lysine side-chains, they can be used in applications that require multiple modifications (e.g. therapeutic protein conjugates) or in protein cross-linking for mass spectrometry analysis (Holding, 2015). An example of a more discriminating lysine-based modification strategy involves the 6π-aza-electrocyclization reaction with unsaturated aldehyde esters that targets solvent-accessible lysine residues with excellent selectivity and reaction kinetics, and has been used for the attachment of fluorescent or positron emission tomography probes (Tanaka et al. 2011). Further selectivity can be achieved by exploiting the lower pKa of N-terminal amino groups (~8) relative to that of the ε-amine of a lysine side-chain (~10·5). The identity of the N-terminal amino acid may further change the reactivity of the α-amine, although more general modification strategies have been developed. For example, functionalized ketenes can preferentially react with the α-amine in the context of 13 different N-terminal amino acids (Chan et al. 2012), while 2-pyridinecarboxyaldehydes provide efficient and specific N-terminal labeling for all amino acids except proline (MacDonald et al. 2015) (Fig. 3b). The unique reactivity of the N-terminus is also central to the mechanism of native chemical ligation, discussed in Section 3.3.
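As a rough worked example of the pKa argument, the fraction of an amine in its unprotonated, nucleophilic form at a given pH follows from the Henderson–Hasselbalch relation. The estimate below assumes the approximate pKa values quoted above (~8 for the α-amine and ~10·5 for the lysine ε-amine), an illustrative pH of 7 and ideal behavior; it ignores differences in intrinsic nucleophilicity and the number of lysines present.

```latex
\begin{align}
f_{\mathrm{free}} &= \frac{1}{1 + 10^{\,\mathrm{p}K_a - \mathrm{pH}}} \\
f_{\alpha} &= \frac{1}{1 + 10^{\,8 - 7}} = \frac{1}{11} \approx 0.091 \\
f_{\varepsilon} &= \frac{1}{1 + 10^{\,10.5 - 7}} \approx 3.2 \times 10^{-4} \\
\frac{f_{\alpha}}{f_{\varepsilon}} &\approx 2.9 \times 10^{2}
\end{align}
```

Under these assumptions, an N-terminal amine is roughly 300 times more likely than a single lysine ε-amine to be unprotonated at neutral pH, although the presence of many lysines in a protein can erode this advantage.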

Fig. 3. Chemical modification of natural amino acids. (a) Modification of lysine ε-amines with activated esters such as N-hydroxysuccinimide. (b) Modification of terminal α-amines with 2-pyridinecarboxyaldehydes. (c) Three-component Mannich reaction for tyrosine modification at the ortho-position. (d) Coupling of carboxyls and amines with carbodiimides such as EDC.

A set of chemical reactions has also been developed for the specific modification of the aromatic electron-rich tyrosine side-chain. The selectivity of tyrosine-focused reactions usually exploits the low exposure of this residue on native protein surfaces and its susceptibility to oxidation (ElSohly & Francis, 2015; Seim et al. 2011). In particular, modification of the ortho-position can be achieved with diazonium salts (Schlick et al. 2005) or the three-component Mannich-type reaction with aldehydes and anilines (Joshi et al. 2004; McFarland et al. 2008) (Fig. 3c). While some side-reactions have been known to occur, it has been possible to optimize the selectivity of the Mannich-type reaction to achieve efficient modification of a single tyrosine residue with an EPR spin label in the presence of disulfide and tryptophan functionalities (Mileo et al. 2013). Proteins containing few surface-exposed tyrosine residues can also be modified at low concentrations (5 µM) in aqueous solvents with π-allylpalladium reagents (Tilley & Francis, 2006).

Carboxyl groups in glutamate and aspartate side-chains can be targeted with water-soluble carbodiimides such as N-ethyl-3-N′-N′-dimethylaminopropylcarbodiimide (EDC) (Fig. 3d). In what is essentially a standard peptide coupling reaction, this reagent pairs carboxyls and amines in a covalent amide bond under aqueous conditions in a pH-dependent manner (Gilles et al. 1990). Commonly employed as a cross-linking reagent for the identification of protein–protein interactions in biochemical assays and mass spectrometry, EDC can also be used to modify protein assemblies such as viral capsids with a variety of functional and biophysical probes (Schlick et al. 2005). While EDC-based reactions do not discriminate between side-chain and C-terminal carboxyl groups, unique reactivity at the C-terminus can be generated by replacing the carboxyl functionality with a thioester, as discussed in the sections below.

3.3 Native chemical ligation

The unique reactivity of the cysteine side-chain and the positional control afforded by the protein N- and C-termini are at the heart of a simple, yet powerful, chemical reaction that allows the construction of native polypeptide chains from peptide building blocks. This methodology, called NCL (Dawson et al. 1994), grants unprecedented control over the chemical functionalities that can be introduced in a polypeptide chain and is a fundamental tool in the repertoire of protein chemists. NCL simply requires two components with the following properties: (1) a peptide with a C-terminal thioester, and (2) a peptide with an N-terminal cysteine residue (or functional equivalent). When these components are mixed, the cysteine side-chain attacks the thioester of the other peptide, resulting in the formation of an intermolecular thioester (Fig. 4). This intermediate rapidly and irreversibly rearranges to form a native peptide bond, thus ligating the two fragments. Since both peptides can be produced by solid-phase peptide synthesis, virtually any modified natural or unnatural amino acid, spectroscopic probe, cross-linker or isotopic label can be incorporated at a well-defined position into the ligated sequence (Dawson & Kent, 2000). Moreover, in contrast to biosynthetic approaches such as amber suppression (see Section 3.6), there is no practical restriction on the number, or type, of unnatural amino acids that can be introduced – this point is driven home by the total synthesis of enantiomeric proteins composed of all D-amino acids (Mandal et al. 2012). The remarkable chemoselectivity of the NCL reaction means that ligations can be performed under aqueous conditions in the presence of internal cysteine residues. NCL is also compatible with protein denaturants or detergents, allowing the construction of aggregation-prone polypeptide chains or even polytopic membrane proteins (Hejjaoui et al. 2012; Kwon et al. 2015; Valiyaveetil et al. 2006). The fragment bearing the N-terminal cysteine can also be produced recombinantly, usually preceded by a cleavable tag or fusion protein to avoid N-terminal cysteine processing complications in bacteria. Alternatively, the thioester component can be generated recombinantly with the help of proteins called inteins (Muir et al. 1998) (see Section 3.4). It is also possible to perform sequential NCL reactions, allowing three or more building blocks to be assembled in a regioselective fashion (Mandal et al. 2012; Torbeev & Hilvert, 2013). These advances thus allow the construction of considerably larger polypeptide chains than would be accessible from two synthetic peptides alone.
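To illustrate how ligation junctions constrain a synthetic design, the following minimal sketch splits a target sequence at cysteine residues so that every fragment stays within a chosen length limit. It is an illustrative planning aid rather than an established protocol: the function name `plan_ncl_fragments`, the greedy strategy and the default limit of 50 residues (taken from the chain length cited in the Introduction as routinely accessible by synthesis) are assumptions, and the sketch ignores practical factors such as junction sterics, solubility and thioester preparation.

```python
from typing import List, Optional


def plan_ncl_fragments(seq: str, max_len: int = 50) -> Optional[List[str]]:
    """Split a one-letter protein sequence at Cys residues (hypothetical helper).

    Every fragment after the first starts with Cys, the N-terminal residue
    that NCL requires; every fragment except the last would need a C-terminal
    thioester. Returns None when no Cys-only split keeps all fragments
    within max_len residues.
    """
    fragments: List[str] = []
    start = 0
    while len(seq) - start > max_len:
        # Candidate junctions: Cys positions that keep this fragment <= max_len.
        window = range(start + 1, start + max_len + 1)
        cut = max((i for i in window if seq[i] == "C"), default=None)
        if cut is None:
            return None  # no usable cysteine within reach
        fragments.append(seq[start:cut])
        start = cut
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    # Toy sequence (not a real protein): two cysteines in a poly-Ala background.
    toy = "A" * 30 + "C" + "A" * 30 + "C" + "A" * 20
    for fragment in plan_ncl_fragments(toy) or []:
        print(len(fragment), fragment[0])
```

A return value of None signals that the native cysteines alone do not give fragments of accessible length, which is the situation in which alternative junctions or recombinant fragments would have to be considered.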

Fig. 4. Native chemical ligation at cysteine followed by desulfurization to alanine for the construction of larger polypeptide chains without any ‘scars’.

While NCL results in a native peptide bond, the required cysteine may still produce an unwanted ‘scar’ at the ligation junction – i.e. in cases where the target does not contain a native cysteine at an appropriate ligation point. If the final ligation product does not contain other cysteine residues, then the undesirable cysteine can be converted to alanine in a subsequent desulfurization step. Options for desulfurization reactions include reduction with metals or radical desulfurization mediated by the familiar tris(2-carboxyethyl)phosphine (TCEP) reagent (Dawson, Reference Dawson2011; Wan & Danishefsky, Reference Wan and Danishefsky2007; Yan & Dawson, Reference Yan and Dawson2001). Selective desulfurization reactions are also possible in the presence of other cysteine residues, provided that compatible protecting groups are used on the non-targeted cysteine side-chains (Ficht et al. Reference Ficht, Payne, Brik and Wong2007; Pentelute & Kent, Reference Pentelute and Kent2007). Alternatively, selective ligation and desulfurization of selenocysteine can provide additional sequence positional control (Hondal et al. Reference Hondal, Nilsson and Raines2001; Reddy et al. Reference Reddy, Dery and Metanis2016). Today, desulfurization has become a routine part of NCL protocols and the abundant alanine residue is a commonplace choice for a ligation site. To expand the junction amino acid set, a more advanced strategy involves the incorporation of β- and γ-thio amino acids. These moieties replace the cysteine-like residue at the N-terminal position, and provide a reactive thiol for trans-thioesterification. Desulfurization protocols can then produce the desired native side-chain. While ligations at such sites proceed more slowly due to increased steric hindrance, NCL can now be performed at phenylalanine (Crich & Banerjee, Reference Crich and Banerjee2007), valine (Chen et al. Reference Chen, Wan, Yuan, Zhu and Danishefsky2008; Haase et al. 
Reference Haase, Rohde and Seitz2008), leucine (Harpaz et al. Reference Harpaz, Siman, Kumar and Brik2010; Tan et al. Reference Tan, Shang and Danishefsky2010), threonine (Chen et al. Reference Chen, Wang, Zhu, Wan and Danishefsky2010b), lysine (El Oualid et al. Reference El Oualid, Merkx, Ekkebus, Hameed, Smit, De Jong, Hilkmann, Sixma and Ovaa2010; Kumar et al. Reference Kumar, Haj-Yahya, Olschewski, Lashuel and Brik2009; Yang et al. Reference Yang, Pasunooti, Li, Liu and Liu2009), proline (Shang et al. Reference Shang, Tan, Dong and Danishefsky2011), glutamine (Siman et al. Reference Siman, Karthikeyan and Brik2012), arginine (Malins et al. Reference Malins, Cergol and Payne2013), tryptophan (Malins et al. Reference Malins, Cergol and Payne2014), aspartate (Thompson et al. Reference Thompson, Chan, Radom, Jolliffe and Payne2013), glutamate (Cergol et al. Reference Cergol, Thompson, Malins, Turner and Payne2014) and asparagine (Sayers et al. Reference Sayers, Thompson, Perry, Malins and Payne2015). Work also continues on more streamlined ligation/desulfurization approaches that remove purification steps and increase the yield of ligated products (Moyal et al. Reference Moyal, Hemantha, Siman, Refua and Brik2013; Thompson et al. Reference Thompson, Liu, Alonso-Garcia, Pereira, Jolliffe and Payne2014).
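The junction rules described above can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the function name `candidate_junctions`, its parameters and the example sequence are hypothetical, and real junction selection also depends on fragment length, solubility and the steric demands of the thioester residue, none of which are modeled here.

```python
# Residues the text lists as reachable with beta-/gamma-thio amino acids
# followed by desulfurization (one-letter codes).
THIO_ANALOG_SITES = set("FVLTKPQRWDEN")


def candidate_junctions(seq, allow_desulfurization=True, allow_thio_analogs=False):
    """List possible NCL junctions in a target sequence (hypothetical helper).

    A junction at index i means the C-terminal fragment starts at seq[i],
    so seq[i] is the residue that must carry the reactive thiol.
    Returns tuples of (index, residue, strategy, needs_cys_protection).
    """
    seq = seq.upper()
    has_native_cys = "C" in seq
    junctions = []
    for i in range(1, len(seq)):
        res = seq[i]
        if res == "C":
            strategy = "native Cys"
        elif res == "A" and allow_desulfurization:
            strategy = "Cys then desulfurization to Ala"
        elif res in THIO_ANALOG_SITES and allow_thio_analogs:
            strategy = "thio amino acid then desulfurization"
        else:
            continue
        # Desulfurization would also convert native cysteines elsewhere in
        # the product, so those would need protecting groups.
        needs_protection = strategy != "native Cys" and has_native_cys
        junctions.append((i, res, strategy, needs_protection))
    return junctions


if __name__ == "__main__":
    for junction in candidate_junctions("MKAGCLAV"):
        print(junction)
```

In this sketch, alanine junctions in a sequence that also contains a native cysteine are flagged as requiring cysteine protection, mirroring the caveat in the text about selective desulfurization.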

The preceding discussion highlights just a few of the many refinements and extensions the NCL strategy has undergone since its introduction over 20 years ago (Harmand et al. Reference Harmand, Murar and Bode2014; Malins & Payne, Reference Malins and Payne2015). As a consequence of this massive effort, the technique has become a central tool in protein science, having been applied to literally hundreds of protein targets. Of particular relevance here, it has provided the raw materials for numerous structural biology studies that have employed a broad range of spectroscopic or crystallographic methods (Fig. 5) (Grosse et al. Reference Grosse, Essen and Koert2011; Kent et al. Reference Kent, Sohma, Liu, Bang, Pentelute and Mandal2012; Muralidharan & Muir, Reference Muralidharan and Muir2006). In general, the ability to modify any atom in the protein of interest with the precision afforded by synthetic organic chemistry is enormously powerful for dissecting protein function, especially when combined with high-resolution structural approaches such as NMR spectroscopy and X-ray crystallography. Thus, we imagine that NCL will continue to evolve as an approach and be integrated into structural biology campaigns.

Fig. 5. Examples of constructs prepared by NCL and EPL for X-ray crystallography studies. (a) D-alanine was introduced at position 77 in the sequence of the potassium channel KcsA to elucidate its ion selectivity mechanism (Valiyaveetil et al. Reference Valiyaveetil, Leonetti, Muir and Mackinnon2006) (PDB ID: 2IH3). (b) Acetylated lysine (Ac) was incorporated at positions 401 and 408 in S-adenosylhomocysteine hydrolase (SAHH) to evaluate the structural basis of enzyme inhibition (Wang et al. Reference Wang, Kavran, Chen, Karukurichi, Leahy and Cole2014b) (PDB ID: 4PFJ). (c) Chemical synthesis of HIV protease afforded the site-specific incorporation of unnatural amino acids such as 2-aminoisobutyric acid to modulate conformational dynamics and catalysis (Torbeev et al. Reference Torbeev, Raghuraman, Hamelberg, Tonelli, Westler, Perozo and Kent2011) (PDB ID: 3IAW). (d) Semi-synthesis of Mxe GyrA and the installation of β-thienyl-alanine instead of the native histidine at position 187 provided a route to trap the branched intermediate of the intein (Liu et al. Reference Liu, Frutos, Bick, Vila-Perello, Debelouchina, Darst and Muir2014b) (PDB ID: 4OZ6).

3.4 Inteins

Inteins (intervening proteins) are a peculiar group of proteins that can excise themselves from a larger precursor polypeptide chain, a process that leads to the formation of a native peptide bond between the flanking extein (external protein) fragments. This auto-processing event, called protein splicing, is analogous to the self-splicing of RNA introns and is spontaneous, i.e. it does not require external factors or ATP. Since they were first discovered in the early 1990s (Hirata et al. Reference Hirata, Ohsumk, Nakano, Kawasaki, Suzuki and Anraku1990; Kane et al. Reference Kane, Yamashiro, Wolczyk, Neff, Goebl and Stevens1990), thousands of putative intein domains have been identified in the genomes of many unicellular organisms and viruses, with some containing multiple inteins in their genomes or even within the same gene (Perler, Reference Perler2002; Shah & Muir, Reference Shah and Muir2014). A small fraction of the known inteins has an even more curious property – the intein is split into two fragments, each fused to a separate extein fragment (the N- and C-exteins) (Wu et al. Reference Wu, Hu and Liu1998). These intein fragments, called split inteins, are transcribed and translated separately, and upon a spontaneous and non-covalent association in the cellular milieu, they carry out protein splicing in trans to unite the extein fragments into a single polypeptide chain. While it is known that many inteins are embedded within essential protein genes (such as DNA or RNA polymerase, ribonucleotide reductase or metabolic enzymes), their evolutionary origins and biological significance remain mysterious, and only a small percentage of the identified intein domains have been carefully characterized (Pietrokovski, Reference Pietrokovski2001; Shah & Muir, Reference Shah and Muir2014). Despite these big gaps in our knowledge, the unique reactivity of inteins has turned them into a versatile and transformative tool in protein chemistry and chemical biology. 
For a detailed overview of intein applications, we refer the interested reader elsewhere (Shah & Muir, Reference Shah and Muir2014; Topilina & Mills, Reference Topilina and Mills2014; Volkmann & Mootz, Reference Volkmann and Mootz2013; Wood & Camarero, Reference Wood and Camarero2014). Here, we will focus on aspects of intein function that would be of use to the structural biologist looking to install site-specific PTMs, segmentally label proteins with NMR isotopes, or aid the purification of recombinant polypeptides. Inteins have come a long way since their first applications in structural biology (Xu et al. Reference Xu, Ayers, Cowburn and Muir1999; Yamazaki et al. Reference Yamazaki, Otomo, Oda, Kyogoku, Uegaki, Ito, Ishino and Nakamura1998), so we will end this section with a discussion of the current members of the intein toolbox and research directions taken to circumvent their limitations.

3.4.1 The intein splicing mechanism

Despite the low sequence homology of known intein domains, they share a common protein splicing mechanism that relies on several conserved residues in the intein/extein polypeptide (Fig. 6a) (Volkmann & Mootz, Reference Volkmann and Mootz2013). One of these key residues is a cysteine (or in some cases a serine) at position 1 of the intein sequence. This nucleophilic side-chain attacks the amide carbon of the N-extein at position −1 (Fig. 6b), resulting in an N to S(O) acyl shift and the formation of a linear thio(oxy)ester intermediate. This intermediate is subject to a nucleophilic attack by a side-chain (cysteine, serine or threonine) at position +1 on the C-extein, leading to trans-(thio)esterification and the generation of a branched intermediate. The branched intermediate is resolved through the cyclization of the C-terminal asparagine of the intein, which results in intein excision from the polypeptide chain. Next, the spliced exteins quickly undergo an S(O) to N acyl shift to form a native peptide bond (i.e. identical to the last step in NCL). The protein splicing mechanism is facilitated by several conserved threonine and histidine residues occupying strategic positions in the intein structural fold (Frutos et al. Reference Frutos, Goger, Giovani, Cowburn and Muir2010). The efficiency and kinetics of the splicing mechanism may also depend on the identity of the residues immediately flanking the intein, placing important constraints on the choice of ligation junction (Cheriyan et al. Reference Cheriyan, Pedamallu, Tori and Perler2013; Iwai et al. Reference Iwai, Züger, Jin and Tam2006; Shah et al. Reference Shah, Dann, Vila-Perello, Liu and Muir2012).
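The residue requirements of the splicing mechanism can be summarized in a minimal Python sketch. It works purely at the sequence level and says nothing about folding, kinetics or the flanking-residue effects mentioned above; the function name `splice`, the index convention and the toy precursor sequence are illustrative assumptions, not taken from the literature.

```python
def splice(precursor, intein_start, intein_end):
    """Return (spliced_exteins, excised_intein) for a contiguous intein.

    Hypothetical helper: intein_start and intein_end are 0-based,
    half-open indices of the intein within the precursor sequence.
    """
    n_extein = precursor[:intein_start]
    intein = precursor[intein_start:intein_end]
    c_extein = precursor[intein_end:]

    if not n_extein or not intein or not c_extein:
        raise ValueError("precursor must contain N-extein, intein and C-extein")
    # Position 1 of the intein: nucleophile for the N to S(O) acyl shift.
    if intein[0] not in "CS":
        raise ValueError("intein position 1 must be Cys or Ser")
    # Position +1 of the C-extein: nucleophile for trans-(thio)esterification.
    if c_extein[0] not in "CST":
        raise ValueError("C-extein position +1 must be Cys, Ser or Thr")
    # C-terminal Asn of the intein: cyclizes to release the intein.
    if intein[-1] != "N":
        raise ValueError("intein must end with Asn")

    # After the final S(O) to N acyl shift the exteins are joined
    # by a native peptide bond.
    return n_extein + c_extein, intein


if __name__ == "__main__":
    # Toy precursor: N-extein MGSKE, mock intein CLSAAN, C-extein SGHHH.
    product, intein = splice("MGSKECLSAANSGHHH", 5, 11)
    print(product, intein)
```

The three checks correspond to the three nucleophilic steps in the mechanism; a precursor that fails any of them is rejected rather than spliced.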

Fig. 6. Intein structure and mechanism. (a) Intein/extein residues important for splicing. (b) Protein splicing mechanism of contiguous inteins. In some cases, the hydroxyl groups of Ser/Thr act as nucleophiles in the first two steps.

3.4.2 Applications in protein engineering

Expressed protein ligation (EPL) is an extension of NCL that employs a contiguous intein to recombinantly generate a protein bearing a C-terminal thioester (Muir et al. Reference Muir, Sondhi and Cole1998). In this case, the N-extein is fused to a modified intein construct lacking the ability to perform trans-thioesterification. Instead, this step is performed by an exogenously added thiol, resulting in cleavage of the N-extein α-thioester intermediate (Fig. 7a). The resultant thioester can be used in NCL reactions as described in Section 3.3, while the recombinant origin of this fragment allows the construction of much larger semi-synthetic proteins as compared to total chemical synthesis by NCL. The efficiency of thioester generation rests on the propensity of the intein to avoid unwanted side reactions that result in premature N-extein cleavage and hydrolyzed products, problems that can be alleviated by the use of streamlined EPL protocols (Vila-Perello et al. Reference Vila-Perello, Liu, Shah, Willis, Idoyaga and Muir2013). Alternatively, hydrolysis of the N-extein under slightly basic conditions can be exploited in the so-called tagless protein purification protocols (Batjargal et al. Reference Batjargal, Walters and Petersson2015; Guan et al. Reference Guan, Ramirez and Chen2013; Southworth et al. Reference Southworth, Amaya, Evans, Xu and Perler1999) (Fig. 7b). In this case, the protein of interest is fused to an intein carrying the appropriate mutations and a suitable purification tag. The tag can be used for affinity column enrichment of the construct, followed by an increase in the buffer pH. This results in the release of the tagless protein while the intein remains on the column.

Fig. 7. Protein engineering with inteins. (a) Expressed protein ligation. (b) Tagless protein purification. (c) Protein trans-splicing and recombinant production of segmentally isotopically labeled proteins.

Harnessing the protein trans-splicing (PTS) process mediated by split inteins offers an alternative approach to the ligation of polypeptide building blocks (Fig. 7c). Natural split inteins are especially attractive in this regard due to the extremely high affinity between the fragments (Shah et al. Reference Shah, Vila-Perello and Muir2011, Reference Shah, Eryilmaz, Cowburn and Muir2013) – this renders the ligation reaction less dependent on reagent concentration as compared to strictly chemical processes like NCL/EPL. Using orthogonal split intein pairs, it is also possible to perform one-pot three-piece ligations (Carvajal-Vallejos et al. Reference Carvajal-Vallejos, Pallisse, Mootz and Schmidt2012; Shah et al. Reference Shah, Vila-Perello and Muir2011; Shi & Muir, Reference Shi and Muir2005), resulting in the regiospecific assembly of the associated extein building blocks. While most natural split inteins have N- and C-fragments that are relatively large, it is possible to generate artificially split inteins with fragments that are as short as six or eleven residues (Appleby et al. Reference Appleby, Zhou, Volkmann and Liu2009; Ludwig et al. Reference Ludwig, Pfeiff, Linne and Mootz2006). There is also an efficient natural split intein pair (AceL–TerL) where the N-intein fragment is only 25 amino acids long (Thiel et al. Reference Thiel, Volkmann, Pietrokovski and Mootz2014). Thus, it is now possible to use PTS with either synthetic or recombinant intein fragments and to install a wide range of N- or C-terminal chemical modifications, including biophysical probes. One of the most important applications of PTS in structural biology is protein segmental isotopic labeling, discussed in Section 4.2. Other applications include the cyclization of proteins and peptides (Lennard & Tavassoli, Reference Lennard and Tavassoli2014; Scott et al. Reference Scott, Abel-Santos, Wall, Wahnon and Benkovic1999), conditional protein splicing (Mootz et al. Reference Mootz, Blum, Tyszkiewicz and Muir2003; Schwartz et al. Reference Schwartz, Saez, Young and Muir2007), and protein semi-synthesis in cells (David et al. Reference David, Vila-Perello, Verma and Muir2015).

3.4.3 Toward fast and promiscuous inteins

The first intein tools were introduced in the mid-to-late 1990s. One of the ‘early’ inteins, the 198-residue gyrase A intein from Mycobacterium xenopi (Mxe GyrA) (Southworth et al. Reference Southworth, Amaya, Evans, Xu and Perler1999), still used today, exhibits many desirable properties for applications in protein engineering – it is relatively small, can be efficiently expressed in E. coli, works in moderate concentrations of denaturants and its activity can be controlled with temperature. The first ‘natural’ split intein was discovered in 1998 in the cyanobacterium Synechocystis sp. strain PCC6803 (Ssp) where it was found to ligate two fragments of the catalytic subunit of DNA polymerase III (DnaE) (Wu et al. Reference Wu, Hu and Liu1998). This discovery paved the road to more efficient protein trans-splicing and opened the way to using split inteins in other applications such as the cyclization of proteins and peptides and the creation of large cyclized libraries for potential therapeutic applications (Scott et al. Reference Scott, Abel-santos, Wall, Wahnon and Benkovic1999).

While these first-generation intein tools were certainly enabling (Vila-Perello & Muir, Reference Vila-Perello and Muir2010), they were not without their limitations – in retrospect, one could even say they were rather fussy and slow. For example, depending on the fusion partners, splicing (or thiolysis) could take hours to days and was often inefficient (Muralidharan & Muir, Reference Muralidharan and Muir2006). These ‘idiosyncrasies’ constrained the application of inteins in structural biology and fueled the search for faster, more promiscuous and efficient inteins. Several important discoveries in the mid-2000s challenged the view that all natural inteins are inefficient and slow. A genomic study of cyanobacterial genes expanded the DnaE intein family (Caspi et al. Reference Caspi, Amitai, Belenkiy and Pietrokovski2003) and the characterization of one newly discovered member, the DnaE intein from Nostoc punctiforme (Npu), revealed a few surprises. This split intein could perform protein trans-splicing reactions in vitro on a minute timescale and was much more tolerant to sequence deviations on the attached exteins than Ssp (Iwai et al. Reference Iwai, Züger, Jin and Tam2006; Zettler et al. Reference Zettler, Schutz and Mootz2009). We now know that many members of the DnaE family are fast (Shah et al. Reference Shah, Dann, Vila-Perello, Liu and Muir2012), thus greatly expanding the choice of natural intein tools for the efficient generation of protein α-thioesters for EPL or the ligation of protein fragments in PTS (Table 1). Furthermore, efficient natural split inteins that are not part of the DnaE family have also been discovered and these include the gp41-1 and gp41-8 inteins (with insertion sites in the gp41 DNA gyrase gene), the IMPDH-1 intein (splitting a gene coding for inosine-5′-monophosphate dehydrogenase), the NrdJ intein (splitting the gene coding for the ribonucleotide reductase subunit NrdJ) (Carvajal-Vallejos et al. Reference Carvajal-Vallejos, Pallisse, Mootz and Schmidt2012), and the AceL–TerL pair (discovered in metagenomics data from the Antarctic permanently stratified saline Ace Lake) (Thiel et al. Reference Thiel, Volkmann, Pietrokovski and Mootz2014). These proteins bring more diversity to the intein molecular engineering toolbox, including splicing rates that are an order of magnitude faster than those for NpuDnaE; N- and C-intein fragments that are relatively short and can be made by peptide synthesis rather than recombinantly; the option of utilizing serine instead of cysteine at the +1 position; and the possibility of exploiting orthogonality in one-pot multi-piece ligations.

Table 1. Intein toolbox for protein semi-synthesis

* Optimal splicing kinetics in the presence of native sequences at the immediate intein-extein junctions. Variation from this sequence context can lead to less efficient splicing. See also ref. (Shah & Muir, Reference Shah and Muir2014).

Careful biochemical characterization of newly discovered inteins has provided insights into the principles governing fast splicing and extein tolerance. For example, a batch mutagenesis approach that compared the slow split intein DnaE family member Ssp and the fast Npu split intein revealed that speed is determined by a handful of ‘accelerator’ residues located in the second shell of the folded protein, adjacent to the intein active site (Stevens et al. Reference Stevens, Brown, Shah, Sekar, Cowburn and Muir2016). These residues were used as a filter in an informatics analysis of the DnaE sequence database, leading to the identification of several dozens of other split inteins predicted to support ultrafast splicing. A consensus split intein sequence, termed Cfa, was then derived from this putative fast set and was found to possess quite remarkable properties; in addition to splicing faster than Npu at ambient conditions, Cfa is extremely robust, maintaining efficient activity at 80 °C or in the presence of up to 4 M guanidinium chloride or 8 M urea. As a result of these attributes, Cfa was found to be a superior tool for several PTS applications (Stevens et al. Reference Stevens, Brown, Shah, Sekar, Cowburn and Muir2016).

3.5 Sortases

Sortases are a class of cysteine transpeptidases responsible for the attachment of virulence proteins to the cell wall of Gram-positive bacteria (Mazmanian et al. Reference Mazmanian, Liu, Ton-That and Schneewind1999). They are also involved in the polymerization of pilin subunits to form the pilus structures responsible for bacterial attachment to the host and biofilm formation (Mandlik et al. Reference Mandlik, Swierczynski, Das and Ton-That2008). As important players in bacterial virulence, they have evolved to recognize a specific sorting sequence (LPXTG in the case of Staphylococcus aureus) and to attach the virulence factor to the cell wall using a pentaglycine cross-bridge (Ton-That et al. Reference Ton-That, Mazmanian, Faull and Schneewind2000). Naturally, sortases are of considerable interest as drug targets, but they have also become an important and versatile protein engineering tool (Mao et al. Reference Mao, Hart, Schink and Pollok2004). Most sortase-based applications utilize the soluble fragment of wild-type or modified sortase A from S. aureus. These enzymes recognize the LPXTG motif and use their catalytic cysteine residue to cleave the backbone between the threonine and glycine residues within the recognition sequence (Fig. 8). The cleavage reaction involves a thioacyl intermediate similar to the intermediates generated by cysteine proteases (Aulabaugh et al. Reference Aulabaugh, Ding, Kapoor, Tabei, Alksne, Dushin, Zatz, Ellestad and Huang2007). Unlike proteases, which resolve this intermediate with a water molecule, sortases use a nucleophilic attack from the N-terminus of an oligoglycine motif to create a peptide bond between the acyl donor and acceptor. This results in the ligation of polypeptide chains that are subsequently connected with an LPXT(G)₅ linker. The sortase mechanism also requires binding of Ca²⁺ to a dynamic loop of the enzyme, an event that slows down the loop motion and allows enough time for the substrate to find the catalytic site (Naik et al. Reference Naik, Suree, Ilangovan, Liew, Thieu, Campbell, Clemens, Jung and Clubb2006). Another peculiarity of sortase-based ligations is their reversibility: the generated ligation site has the recognition sequence LPXTG and can serve as an acyl donor, while the released fragment contains an aminoglycine acyl acceptor. Thus, to obtain efficient ligations, the donor or acceptor polypeptide chain typically has to be added in large excess (Guimaraes et al. Reference Guimaraes, Witte, Theile, Bozkurt, Kundrat, Blom and Ploegh2013).
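The sequence-level outcome of a sortase A ligation can be sketched in a few lines of Python. This is an illustrative model only: the function name `sortase_ligate`, the `min_gly` parameter and the example sequences are hypothetical, the first LPXTG match is used as the cleavage site, and enzyme kinetics, Ca²⁺ dependence and site accessibility are ignored.

```python
import re

# S. aureus sortase A sorting motif; X is any residue.
SORTING_MOTIF = re.compile(r"LP.TG")


def sortase_ligate(donor, acceptor, min_gly=1):
    """Return (ligation_product, released_fragment) for a donor/acceptor pair.

    Hypothetical helper: min_gly is the minimum number of N-terminal
    glycines required on the acceptor (an assumption of this sketch).
    """
    match = SORTING_MOTIF.search(donor)
    if match is None:
        raise ValueError("donor lacks an LPXTG motif")
    n_gly = len(acceptor) - len(acceptor.lstrip("G"))
    if n_gly < min_gly:
        raise ValueError("acceptor lacks an N-terminal oligoglycine motif")

    # Cleavage occurs between Thr and Gly of the motif.
    cut = match.end() - 1
    product = donor[:cut] + acceptor
    released = donor[cut:]  # starts with Gly, so it is itself an acyl acceptor
    return product, released


if __name__ == "__main__":
    product, released = sortase_ligate("MKVLPETGHHHHHH", "GGGGGK")
    print(product, released)
    # The product regenerates an LPXTG site, which is why the reaction is reversible.
    print(bool(SORTING_MOTIF.search(product)))
```

The last check makes the reversibility explicit: the ligation product still carries the recognition motif, and the released fragment begins with glycine, so both can re-enter the reaction.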

Fig. 8. C-terminal protein labeling with sortase. The acyl donor requires the LPXTG recognition motif, while the acyl acceptor often contains a pentaglycine sequence.

Analogous to intein technology development, sortases have considerably improved as protein engineering tools since their introduction in 2004 (Antos et al. Reference Antos, Truttmann and Ploegh2016). The sortase-based protein engineering toolbox now contains evolved variants that exhibit much faster kinetics or that eliminate Ca²⁺-dependence (albeit at the cost of slightly reduced enzyme activity) (Chen et al. Reference Chen, Dorr and Liu2011; Hirakawa et al. Reference Hirakawa, Ishikawa and Nagamune2015; Wuethrich et al. Reference Wuethrich, Peeters, Blom, Theile, Li, Spooner, Ploegh and Guimaraes2014). There are also alternatives based on S. aureus sortase A or homologs from other organisms that can recognize variations of the LPXTG motif and/or allow non-glycine amino acids as the acyl acceptor (Antos et al. Reference Antos, Truttmann and Ploegh2016; Dorr et al. Reference Dorr, Ham, An, Chaikof and Liu2014; Glasgow et al. Reference Glasgow, Salit and Cochran2016). To increase the yields of the ligation reaction, several clever strategies have been employed. In situations where the released aminoglycine peptide fragment is relatively small, it can be removed by dialysis or centrifugation while the reaction is proceeding (Freiburger et al. Reference Freiburger, Sonntag, Hennig, Li, Zou and Sattler2015). Affinity immobilization strategies or flow-based platforms have also been used for the selective removal of reaction components (Policarpo et al. Reference Policarpo, Kang, Liao, Rabideau, Simon and Pentelute2014; Warden-Rothman et al. Reference Warden-Rothman, Caturegli, Popik and Tsourkas2013). Alternatively, the equilibrium of the reaction can be controlled by ligation product or by-product deactivation. In the first case, a WTWTW motif was added to the donor and acceptor, and upon ligation this sequence promoted a stable hairpin at the ligation junction, rendering the site inaccessible for cleavage (Yamamura et al. 
Reference Yamamura, Hirakawa, Yamaguchi and Nagamune2011). In the latter case, the acyl donor glycine was chemically modified such that upon release, chemical rearrangements occurred on the by-product transforming it into a poor nucleophile (Liu et al. Reference Liu, Luo, Flora and Mezo2014a; Williamson et al. Reference Williamson, Webb and Turnbull2014).
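The need for excess substrate or by-product removal follows from simple mass action, which can be sketched in Python. The model below treats the ligation as donor + acceptor ⇌ product + by-product in a closed system. The equilibrium constant K = 1 is an assumption made here for illustration (the forward and reverse reactions use chemically similar substrates), not a measured value, and the function name `ligation_yield` is hypothetical.

```python
import math


def ligation_yield(donor, acceptor, K=1.0):
    """Equilibrium yield relative to the limiting substrate.

    Solves K = x**2 / ((donor - x) * (acceptor - x)) for the product
    concentration x, with concentrations in arbitrary but equal units.
    """
    if donor <= 0 or acceptor <= 0 or K <= 0:
        raise ValueError("concentrations and K must be positive")
    total = donor + acceptor
    if math.isclose(K, 1.0):
        # The quadratic term cancels when K = 1.
        x = donor * acceptor / total
    else:
        # (K - 1) x^2 - K (donor + acceptor) x + K donor acceptor = 0
        disc = (K * total) ** 2 - 4 * (K - 1) * K * donor * acceptor
        x = (K * total - math.sqrt(disc)) / (2 * (K - 1))
    return x / min(donor, acceptor)


if __name__ == "__main__":
    for excess in (1, 2, 5, 10):
        print(excess, round(ligation_yield(1.0, float(excess)), 3))
```

Under these assumptions, equimolar substrates give a 50% yield and a tenfold excess of one partner raises it to about 91%. Strategies that remove or deactivate the by-product are not captured by this closed-system model; they act by preventing the reverse reaction altogether.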

One important advantage of sortase-based ligations is that the acyl donor and acceptor polypeptide chains can be very short (only the LPXTG tag is required on the donor, and the oligoglycine motif is necessary on the acceptor) and thus are easily accessible by solid-phase peptide synthesis. Therefore, N- and C-terminal labeling reactions of large proteins are relatively straightforward (assuming by-products are efficiently removed) (Guimaraes et al. Reference Guimaraes, Witte, Theile, Bozkurt, Kundrat, Blom and Ploegh2013; Theile et al. Reference Theile, Witte, Blom, Kundrat, Ploegh and Guimaraes2013). Larger polypeptides, on the other hand, can be expressed recombinantly with the appropriate donor and acceptor tags and the ligation reaction unites them in a single polypeptide chain with an LPXT(G)_n ‘scar’. In such cases it is recommended that the ligation junction is chosen on an unstructured region where it will not affect the function and/or fold of the protein and will be accessible to the sortase catalytic site (Guimaraes et al. Reference Guimaraes, Witte, Theile, Bozkurt, Kundrat, Blom and Ploegh2013). Such reactions can be applied to create polymers and cyclized polypeptides (van ‘t Hof et al. Reference Van ‘T Hof, Hansenova Manaskova, Veerman and Bolscher2015), or to stitch together domains into bifunctional or segmentally labeled proteins (Matsumoto et al. Reference Matsumoto, Furuta, Tanaka and Kondo2016; Williams et al. Reference Williams, Milbradt, Embrey and Bobby2016; Witte et al. Reference Witte, Cragnolini, Dougan, Yoder, Popp and Ploegh2012). Sortase-based labeling has also been used in the functionalization of solid supports, nanoparticles, antibodies or cell surfaces, as well as in the labeling of proteins in vivo. We refer the interested reader to several comprehensive reviews on the subject (Popp & Ploegh, Reference Popp and Ploegh2011; Ritzefeld, Reference Ritzefeld2014; Schmohl & Schwarzer, Reference Schmohl and Schwarzer2014).

The success of sortase-based ligations has stimulated efforts to discover other protein ligases with expanded capabilities. A promising candidate is butelase-1, which was isolated from the plant Clitoria ternatea (Nguyen et al. Reference Nguyen, Wang, Qiu, Hemu, Lian and Tam2014). Butelase-1 is the fastest ligase known, with catalytic efficiencies as high as 542 000 M⁻¹ s⁻¹. Furthermore, it only requires the recognition sequence NHV on the acyl donor and produces ligation junctions with a minimal ‘scar’ (NX). Currently, the major limitation of this technology is that the enzyme is not available in recombinant form, and therefore has to be extracted and purified from the native plant (Nguyen et al. Reference Nguyen, Kam, Loo, Jansson, Pan and Tam2015). There are, however, evolutionarily related ligases that may be more amenable to protein engineering approaches (Yang et al. Reference Yang, Wong, Nguyen, Tam, Lescar and Wu2017).

3.6 Genetic code expansion

After billions of years of evolution, Nature has engineered extraordinary functional diversity into proteins with only 20 amino acid building blocks. Yet, many of these building blocks are often modified post-translationally, clearly indicating the need of living organisms to enhance and modulate their protein repertoire with additional chemical functionalities. There are also organisms from all domains of life that can produce and incorporate other building blocks into their proteins. This includes selenocysteine, often called the 21st amino acid that provides a unique reactive site for precise tuning of biological function in cells. Interestingly, this amino acid is incorporated into proteins by a natural reassignment of the UGA stop codon coupled with the recognition of a specific structural element on the mRNA transcript known as the selenocysteine-insertion sequence (reviewed in (Metanis et al. Reference Metanis, Beld, Hilvert and Rappoport2009; Yoshizawa & Bock, Reference Yoshizawa and Bock2009)). Similarly, there are methane producing Archaea species that have evolved a specialized tRNA/aminoacyl-tRNA synthetase (tRNA/aaRS) pair to exploit the UAG stop codon and insert pyrrolysine site-specifically into certain methyltransferase proteins (Srinivasan et al. Reference Srinivasan, James and Krzycki2002). Exploiting the natural translation machinery, protein engineers have worked hard to ‘persuade’ living organisms to incorporate additional building blocks into polypeptide chains (reviewed in (Liu & Schultz, Reference Liu and Schultz2010)). One approach involves the use of cell lines that are auxotrophic for one of the 20 amino acids, for example methionine, and that will only grow when the missing amino acid is included in the culture medium. Replacing this amino acid with a close structural analog that is utilized by the wild-type aminoacyl-tRNA synthetase, results in incorporation of the unnatural amino acid (UAA) into overexpressed proteins. 
Since this results in global incorporation of the UAA, this approach is often used for the replacement of rare amino acids with their structural analogs. For example, methionine can be substituted with selenomethionine to provide a heavy atom for phasing crystallographic data (Barton et al. Reference Barton, Tzvetkova-Robev, Erdjument-Bromage, Tempst and Nikolov2006; Yang et al. Reference Yang, Hendrickson, Crouch and Satow1990).

A second approach involves the semi-synthesis of tRNAs that are pre-loaded with the UAA of interest (Hecht et al. Reference Hecht, Alford, Kuroda and Kitano1978; Noren et al. Reference Noren, Anthony-cahill, Griffith and Schultz1989). These tRNAs have been used in in vitro translation systems that bypass the need for a matching aaRS, and since the identity of the UAA is decoupled from the information content of the tRNA, any coding or blank codon can be used for reassignment (Cornish et al. Reference Cornish, Benson, Altenbach, Hideg, Hubbell and Schultz1994; Judice et al. Reference Judice, Gamble, Murphy, De Vos and Schultz1993; Koh et al. Reference Koh, Cornish and Schultz1997). While the semi-synthesis of acylated tRNAs can be technically challenging, more efficient production strategies have been developed. This includes flexizymes, flexible tRNA acylation ribozymes that accept a versatile range of aminoacyl substrates and tRNAs with different sequences (Goto et al. Reference Goto, Katoh and SUGA2011). Pre-loaded tRNAs can also be injected or transfected directly into living cells (England et al. Reference England, Zhang, Dougherty and Lester1999; Kohrer et al. Reference Kohrer, Yoo, Bennett, Schaack and Rajbhandary2003), although the success of such strategies has been limited by their short lifetimes in the cellular environment and challenges associated with in-cell delivery.

Today, UAA incorporation in living cells is almost exclusively performed following the strategy introduced by Peter Schultz and co-workers in 2001 (Wang et al. Reference Wang, Brock, Herberich and Schultz2001). This methodology, commonly referred to as amber suppression, relied on the development of an orthogonal tRNA/aaRS pair that could be expressed in E. coli and was used to incorporate O-methyl-L-tyrosine into dihydrofolate reductase with 99% fidelity. In the 15 years since this landmark study, the unnatural building block palette for genetic incorporation has grown more than 100 amino acids strong (Lang & Chin, Reference Lang and Chin2014; Liu & Schultz, Reference Liu and Schultz2010; Neumann-Staubitz & Neumann, Reference Neumann-Staubitz and Neumann2016). This includes amino acids carrying natural modifications (e.g. phosphoserine or acetyllysine), biophysical and structural probes, cross-linkers, reactive handles for bio-orthogonal reactions, and site-specific protein engineering functionalities that can modify their attendant proteins upon a specific cellular or chemical cue. Here, we describe the basic principles of the technology, review UAAs of particular interest to the structural biologist, and discuss current limitations and efforts to improve the efficiency of UAA incorporation.

3.6.1 Amber codon suppression in living cells

The successful incorporation of a UAA into a protein synthesized by a living cell requires several important considerations and components (Fig. 9a). First and foremost, the UAA of interest must be chemically and metabolically stable, cell permeable or otherwise biosynthetically accessible in the cellular environment. It also must be tolerated by the ribosome and the cellular elongation factors without being recognized as a substrate by any of the endogenous synthetases. The UAA then requires its own unique codon, with the amber stop codon (UAG) being a popular choice due to its low occurrence in both prokaryotic and eukaryotic systems. The successful site-specific incorporation of the UAA, however, rests on the presence of a dedicated tRNA/aaRS pair that is highly specific for the UAA of interest, yet orthogonal in the context of all endogenous tRNA/aaRS pairs. Developing such pairs for a chemically diverse set of UAAs is currently one of the most time-consuming and difficult steps of this technology. Since tRNA recognition by aaRS is often species specific, it is sometimes possible to import a heterologous pair into the cell of interest and use it as a starting point to build orthogonality and specificity into the system. For example, many UAA incorporation systems in E. coli are based on the heterologous tRNA^Tyr/TyrRS pair from Methanococcus jannaschii, while the tRNA^Tyr/TyrRS and tRNA^Leu/LeuRS pairs from E. coli have been used in eukaryotic cells (reviewed in (Chin, Reference Chin2014)). The tRNA^Pyl/PylRS pair from methanogenic archaea that can incorporate pyrrolysine has also been a very useful tool, as it is orthogonal in E. coli, yeast and mammalian cell lines, and has allowed the incorporation of many lysine-based UAAs, including acetyllysine (Neumann et al. Reference Neumann, Peak-Chew and Chin2008). 
While these systems provide a useful starting point, it is usually necessary to use mutagenesis and rounds of negative and positive selection to improve on the selectivity and orthogonality of the pair. Directed evolution approaches can also be used for the generation of de novo tRNA/aaRS pairs, or to expand the function of other components of the translational machinery (reviewed in (Chin, Reference Chin2014)). Once an appropriate tRNA/aaRS pair is developed, however, the practical implementation of amber suppression for the UAA is relatively straightforward. E. coli cells, for example, can be transformed with two plasmids: (1) a plasmid encoding the protein of interest and an appropriate point mutation with the amber TAG codon, and (2) a plasmid carrying the appropriate DNA sequence to produce the optimized tRNA/aaRS pair. After addition of UAA to the media, gene expression is induced for both plasmids and the UAA is incorporated into the protein of interest by the bacterial translational machinery. To separate the full-length protein from prematurely truncated species, often a purification tag is added to the protein C-terminus – notably, these can involve ‘silent’ intein- or sortase-based purification tags (Batjargal et al. Reference Batjargal, Walters and Petersson2015; Warden-Rothman et al. Reference Warden-Rothman, Caturegli, Popik and Tsourkas2013). Well-established protocols for UAA incorporation are now available for yeast (Hancock et al. Reference Hancock, Uprety, Deiters and Chin2010), mammalian (Chen et al. Reference Chen, Groff, Guo, Ou, Cellitti, Geierstanger and Schultz2009) and insect cells (Koehler et al. Reference Koehler, Sauter, Wawryszyn, Girona, Gupta, Landry, Fritz, Radic, Hoffmann, Chen, Zou, Tan, Galik, Junttila, Stolt-Bergner, Pruneri, GYENESEI, Schultz, Biskup, Besir, Benes, Rappsilber, Jechlinger, Korbel, Berger, Braese and Lemke2016; Mukai et al. 
Reference Mukai, Wakiyama, Sakamoto and Yokoyama2010b) and amber suppression can even be performed in multicellular organisms including C. elegans (Greiss & Chin, Reference Greiss and Chin2011), D. melanogaster (Bianco et al. Reference Bianco, Townsley, Greiss, Lang and Chin2012), mice (Ernst et al. Reference Ernst, Krogager, Maywood, Zanchi, Beranek, Elliott, Barry, Hastings and Chin2016; Kang et al. Reference Kang, Kawaguchi, Coin, Xiang, O'leary, Slesinger and Wang2013) and plants (Li et al. Reference Li, Zhang, Sun, Pan, Zhou and Wang2013b).

Fig. 9. Unnatural amino acid (UAA) incorporation by amber suppression. (a) An orthogonal aminoacyl tRNA synthetase charges a matching tRNA with the UAA of interest. The ribosome incorporates the UAA into a growing polypeptide chain by decoding the amber stop codon (UAG) on the messenger RNA. The UAA toolbox includes UAAs that represent (b) protein post-translational modifications, (c) spectroscopic probes, (d) cross-linkers, (e) bio-orthogonal reactive handles.

3.6.2 The amber suppression toolbox

The amber suppression toolbox contains many UAAs designed with structural biology applications in mind (Fig. 9b–e ). For example, heavy atoms can be incorporated site-specifically for solving the phase problem in X-ray crystallography – appropriate UAAs include p-iodo-L-phenylalanine (Xie et al. Reference Xie, Wang, Wu, Brock, Spraggon and Schultz2004) and 3-iodo-L-tyrosine (Sakamoto et al. Reference Sakamoto, Murayama, Oki, Iraha, Kato-Murayama, Takahashi, Ohtake, Kobayashi, Kuramitsu, Shirouzu and Yokoyama2009), as well as metal-ion chelating amino acids (Lee et al. Reference Lee, Spraggon, Schultz and Wang2009b). For NMR spectroscopists, amber suppression offers a relatively cheap and efficient way to install site-specific isotopic labels in otherwise unlabeled proteins. Many of the NMR ‘friendly’ UAAs are fluorinated derivatives that exploit the unique spectroscopic properties of ¹⁹F as a reporter of global protein folding and dynamics (Jones et al. Reference Jones, Cellitti, Hao, Zhang, Jahnz, Summerer, Schultz, Uno and Geierstanger2010; Yang et al. Reference Yang, Yu, Liu, Qu, Gong, Liu, Li, Wang, He, Yi, Song, Tian, Xiao, Wang and Sun2015). Similarly, amber suppression has been used to install nitroxide spin labels for distance measurements by EPR, thus overcoming the problems often associated with cysteine-based approaches (Park et al. Reference Park, Wang, Radoicic, De Angelis, Berkamp and Opella2015; Schmidt et al. Reference Schmidt, Fedoseev, Bucker, Borbas, Peter, Drescher and Summerer2015). More importantly, however, amber suppression is a living cell protein engineering tool; thus, it is ideally suited for NMR or EPR studies designed to follow the structural fate of proteins in the cellular milieu.

Of particular interest to the structural biologist are UAAs carrying natural PTMs. Currently, amber suppression can directly incorporate the following modifications: phosphoserine (Rogerson et al. Reference Rogerson, Sachdeva, Wang, Haq, Kazlauskaite, Hancock, Huguenin-Dezot, Muqit, Fry, Bayliss and Chin2015), acetyllysine (Neumann et al. Reference Neumann, Peak-Chew and Chin2008), several lysine acylations (Gattner et al. Reference Gattner, Vrabel and Carell2013; Kim et al. Reference Kim, Kang, Kim, Chatterjee and Schultz2012), phosphotyrosine (Fan et al. Reference Fan, Ip and Soll2016) and sulfated tyrosine (Liu et al. Reference Liu, Brustad, LIU and Schultz2007). To expand this toolbox, however, amber suppression can be used to install a reactive handle at the position of interest, and then the appropriate modification can be chemically generated after protein purification. Using this strategy, for example, the UAA δ-thiol-lysine can be incorporated, followed by traceless attachment of a ubiquitin moiety with NCL (Virdee et al. Reference Virdee, Kapadnis, Elliott, Lang, Madrzak, Nguyen, Riechmann and Chin2011). Similarly, methylated lysines can be generated by the incorporation of a suitable precursor UAA (Nguyen et al. Reference Nguyen, Garcia Alai, Virdee and Chin2010; Wang et al. Reference Wang, Zeng, Kurra, Wang, Tharp, Vatansever, Hsu, Dai, Fang and Liu2017). Phosphoserine, on the other hand, can serve as a starting point for the generation of dehydroalanine, which in turn can be converted into a number of modified side-chains (Wright et al. Reference Wright, Bower, Chalker, Bernardes, Wiewiora, Ng, Raj, Faulkner, Vallee, Phanumartwiwath, Coleman, Thezenas, Khan, Galan, Lercher, Schombs, Gerstberger, Palm-Espling, Baldwin, Kessler, Claridge, Mohammed and Davis2016).

Amber suppression also allows the facile installation of site-specific cross-linkers, a valuable tool for the identification of protein–protein and protein–ligand interactions both in vitro and in vivo. There are several options for UV-activatable cross-linkers that exploit different cross-linking mechanisms, and afford temporal and spatial control of the reaction. The oldest members of this toolbox are p-benzoyl-L-phenylalanine (Chin et al. Reference Chin, Martin, King, Wang and Schultz2002a) and p-azido-L-phenylalanine (Chin et al. Reference Chin, Santoro, Martin, King, Wang and Schultz2002b), both available for bacterial and eukaryotic systems, and extensively used for cross-linking experiments of purified proteins or in the cellular environment. More recently, diazirine-modified lysine-based amber suppression systems have been developed (Ai et al. Reference Ai, Shen, Sagi, Chen and Schultz2011; Chou et al. Reference Chou, Uprety, Davis, Chin and Deiters2011; Zhang et al. Reference Zhang, Lin, Song, Liu, Fu, Ge, Fu, Chang and Chen2011); these exhibit superior cross-linking efficiency and smaller structural perturbation effects, and are ideally suited for cross-linking experiments of lysine-rich proteins such as histones. UAAs that can cross-link proteins to nucleic acids are also available, and these include p-benzoyl-L-phenylalanine (Winkelman et al. Reference Winkelman, Vvedenskaya, Zhang, Zhang, Bird, Taylor, Gourse, Ebright and Nickels2016), and a furan-based cross-linker activated upon red-light irradiation (Schmidt & Summerer, Reference Schmidt and Summerer2013). Interestingly, p-azido-L-phenylalanine can also serve as an infrared (IR) spectroscopy probe, and as such can be used to report on the structural transitions of a protein along its activation pathway (Ye et al. Reference Ye, Zaitseva, Caltabiano, Schertler, Sakmar, Deupi and Vogel2010). 
Similarly, para-cyanophenylalanine can be installed as a site-specific and sensitive vibrational probe of ligand binding (Schultz et al. Reference Schultz, Supekova, Ryu, Xie, Perera and Schultz2006).

A significant fraction of the amber suppression UAA library has been designed for imaging and fluorescence-based biophysical applications. Some fluorescent UAAs can be incorporated directly into polypeptide chains and these include coumarin- and dansyl-based modifications (Kuhn et al. Reference Kuhn, Rubini, Muller and Skerra2011; Summerer et al. Reference Summerer, Chen, Wu, Deiters, Chin and Schultz2006), as well as environmentally sensitive aminonaphthalene-based probes available for both yeast and mammalian applications (Chatterjee et al. Reference Chatterjee, Guo, Lee and Schultz2013a; Lee et al. Reference Lee, Guo, Lemke, Dimla and Schultz2009a). More commonly, however, optical probes are attached site-specifically using bio-orthogonal reactions. In this case, amber suppression is used to install a reactive handle at a site in the polypeptide chain, while the fluorescent probe is modified with a compatible reactive functionality. The conjugation reaction can be carried out in vitro with purified components, or the optical label can be added to the media and/or delivered into cells for bio-orthogonal reaction chemistry within the cellular milieu. While this places important UAA-fluorescent label design constraints with respect to cell permeability, chemical stability and reaction kinetics, this approach allows the incorporation of optical labels that work at a variety of wavelengths, amid reduced background fluorescence. Currently, UAAs with a diverse set of functionalities for bio-orthogonal reactions are available (reviewed in (Lang & Chin, Reference Lang and Chin2014)), and it is likely that this list will become much more expansive in the future as new fast and specific bio-compatible approaches are developed (Section 3.7).

3.6.3 Limitations and future directions

The remarkable plasticity of the natural and evolved cellular protein synthesis machinery has allowed chemical biologists to create a large and diverse set of UAAs that can be incorporated into biological polymers assembled in vivo. Yet, the incorporation of many of these UAAs is inefficient, context dependent and essentially limited to the inclusion of a single modification per protein. Since UAA incorporation relies on the reassignment of natural stop codons, the suppressor tRNAs compete for binding sites with the endogenous release factors that terminate translation. Therefore, truncations of the desired protein are often produced, resulting in compromised yields, complicated purification protocols and potential toxicity for the recombinant cell. In E. coli, translation termination involves release factor protein 1 (RF1) responsible for recognition of ochre (UAA) and amber (UAG) stop codons, and release factor protein 2 (RF2) that identifies ochre (UAA) and opal (UGA) stop codons. Thus, it has been possible to engineer bacterial strains that lack RF1 to enhance amber codon translation efficiency, and in particular improve the incorporation of the same UAA at multiple positions in the protein sequence (Johnson et al. Reference Johnson, Xu, Shen, Takimoto, Schultz, Schmitz, Xiang, Ecker, Briggs and Wang2011; Mukai et al. Reference Mukai, Hayashi, Iraha, Sato, Ohtake, Yokoyama and Sakamoto2010a). These strains, however, can be problematic as misincorporation of glutamine and codon skipping have also been reported (George et al. Reference George, Aguirre, Spratt, Bi, Jeffery, Shaw and O'donoghue2016). More dramatically, organisms can be genomically recoded to replace the UAG codon completely with the synonymous UAA stop codon, and free this new unique codon for more efficient amber suppression (Lajoie et al. Reference Lajoie, Rovner, Goodman, Aerni, Haimovich, Kuznetsov, Mercer, Wang, Carr, Mosberg, Rohland, Schultz, Jacobson, Rinehart, Church and Isaacs2013).
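The logic behind RF1-deficient strains can be made explicit with a small lookup table. The Python sketch below encodes only the codon specificities stated in the text for E. coli (RF1: UAA and UAG; RF2: UAA and UGA); the names `RELEASE_FACTORS` and `unterminated_stop_codons` are hypothetical and chosen for illustration.

```python
# Stop codons recognized by each E. coli release factor, as stated in the text.
RELEASE_FACTORS = {
    "RF1": {"UAA", "UAG"},  # ochre and amber
    "RF2": {"UAA", "UGA"},  # ochre and opal
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def unterminated_stop_codons(present_factors):
    """Return the stop codons that no release factor in the strain recognizes."""
    recognized = set()
    for factor in present_factors:
        recognized |= RELEASE_FACTORS[factor]
    return STOP_CODONS - recognized


if __name__ == "__main__":
    print(unterminated_stop_codons(["RF1", "RF2"]))  # wild type: nothing is free
    print(unterminated_stop_codons(["RF2"]))         # RF1 deleted: only UAG is free
```

Because UAA is read by both factors, removing RF1 leaves ochre and opal termination intact while removing the competitor at amber codons, which is why UAG is the stop codon that can be freed in this way.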

Protein yields are also affected by the lower efficiency of the evolved synthetases and suboptimal interactions of the tRNA with the wild-type elongation factors and ribosomes. These problems can be partially alleviated by the design of more efficient plasmid systems that have optimized promoters and can produce higher copy numbers of the synthetase and the tRNA. For example, the pEVOL and the pUltra plasmids have significantly increased incorporation efficiency in E. coli (Chatterjee et al. Reference Chatterjee, Sun, Furman, Xiao and Schultz2013b; Young et al. Reference Young, Ahmad, Yin and Schultz2010). For mammalian cells, protein expression is further affected by the transfection efficiency of the delivered constructs; thus, it is usually desirable to incorporate all of the necessary genetic components (multiple copies of the synthetase and tRNA, gene of interest, engineered release factor, etc.) on the same plasmid (Cohen & Arbely, Reference Cohen and Arbely2016). Baculovirus vectors that can deliver a large cargo of genetic material (>30 kb) to a variety of mammalian cells have also been developed for more efficient UAA incorporation (Chatterjee et al. Reference Chatterjee, Xiao, Bollong, Ai and Schultz2013c; Zheng et al. Reference Zheng, Lewis, Igo, Polleux and Chatterjee2016). To avoid the heterogeneous expression levels associated with transient transfection and viral transduction altogether, the creation of stable mammalian cell lines capable of defined UAA incorporation is highly desirable and efforts have already been undertaken in this regard (Elsasser et al. Reference Elsasser, Ernst, Walker and Chin2016; Tian et al. Reference Tian, Lu, Manibusan, Sellers, Tran, Sun, Phuong, Barnett, Hehli, Song, Deguzman, Ensari, Pinkstaff, Sullivan, Biroc, Cho, Schultz, Dijoseph, Dougher, Ma, Dushin, Leal, Tchistiakova, Feyfant, Gerber and Sapra2014).

The incorporation of several different modifications into the same polypeptide chain not only requires optimized suppression of the most commonly reassigned amber stop codon, but also the availability of other unique or rare codons that can be used to develop orthogonal tRNA/aaRS pairs. Not surprisingly, the ochre and opal stop codons are often combined with the amber codon for dual incorporation of UAAs, primarily with fluorescence-type applications in mind (Chatterjee et al. Reference Chatterjee, Sun, Furman, Xiao and Schultz2013b; Wan et al. Reference Wan, Huang, Wang, Russell, Pai, Russell and Liu2010; Xiao et al. Reference Xiao, Chatterjee, Choi, Bajjuri, Sinha and Schultz2013). The use of two stop codons, however, suffers from and amplifies many of the drawbacks described above and, if successful, results in very low protein yields. To circumvent this problem, translation machinery engineered to recognize quadruplet codons has been developed (Neumann et al. Reference Neumann, Wang, Davis, Garcia-Alai and Chin2010; Wang et al. Reference Wang, Sachdeva, Cox, Wilf, Lang, Wallace, Mehl and Chin2014a). The use of quadruplet codons can in principle provide 256 blank codons, thus dramatically expanding the capabilities of the natural 64-codon based genomes. Such systems, however, require the engineering of orthogonal ribosomes that can efficiently recognize quadruplet messages (Neumann et al. Reference Neumann, Wang, Davis, Garcia-Alai and Chin2010). Alternatively, the genomes of organisms such as E. coli can be reprogrammed to create compressed codon schemes and free up unique codons for reassignment (Ostrov et al. Reference Ostrov, Landon, Guell, Kuznetsov, Teramoto, Cervantes, Zhou, Singh, Napolitano, Moosburner, Shrock, Pruitt, Conway, Goodman, Gardner, Tyree, Gonzales, Wanner, Norville, Lajoie and Church2016; Wang et al. Reference Wang, Fredens, Brunner, Kim, Chia and Chin2016). 
In the future, it may also be possible to create new coding schemes by expanding the genetic alphabet of living organisms and utilizing synthetic DNA bases in new synthetic organisms (Malyshev et al. Reference Malyshev, Dhami, Lavergne, Chen, Dai, Foster, Correa and Romesberg2014).
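The codon counts quoted above follow from simple combinatorics: a code with b distinct bases read in words of length n has b^n codons. The short Python sketch below reproduces the 64 and 256 figures from the text. The six-letter case is an illustrative assumption (four natural bases plus one additional synthetic base pair read as triplets) and is not a number taken from the text; the function name `codon_space` is hypothetical.

```python
from itertools import product


def codon_space(n_bases: int, codon_length: int) -> int:
    """Number of distinct codons for a given alphabet size and codon length."""
    return n_bases ** codon_length


if __name__ == "__main__":
    print(codon_space(4, 3))  # natural triplet code
    print(codon_space(4, 4))  # quadruplet codons
    print(codon_space(6, 3))  # assumed six-letter alphabet, triplet codons

    # Cross-check the formula by explicit enumeration of quadruplet codons.
    quadruplets = ["".join(p) for p in product("ACGU", repeat=4)]
    print(len(quadruplets) == codon_space(4, 4))
```

These are upper bounds on the coding space; how many of these codons can actually be assigned in a cell depends on the availability of orthogonal ribosomes and tRNA/aaRS pairs, as noted in the text.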

3.7 Chemical modification of unnatural amino acids

The ability to install unnatural amino acids in polypeptide chains with exquisite specificity has opened wide the door to a post-cysteine world of bio-orthogonal reactions. There is now a rapidly expanding collection of reactions that work efficiently and selectively with low concentrations of reactants in physiological buffers or the cellular milieu. Here, we will review a small sample of these tools focusing primarily on reactions that might be used to install site-specific PTMs or their mimics, as well as probes for structural investigations. For a comprehensive discussion of the bio-orthogonal literature, we refer the interested reader to several excellent reviews on the subject (Lang & Chin, Reference Lang and Chin2014; Shieh & Bertozzi, Reference Shieh and Bertozzi2014; Spicer & Davis, Reference Spicer and Davis2014; Stephanopoulos & Francis, Reference Stephanopoulos and Francis2011).

The azido functional group is one of the most versatile modifications that can be installed on peptides and proteins by amber suppression, total chemical synthesis and/or peptide ligation. In addition to serving as a light-activatable cross-linker or infra-red probe, this small moiety is also at the heart of several important bio-orthogonal reactions. For example, in an extension of the classical Staudinger reduction reaction, azides react with triarylphosphines bearing an electrophilic trap (usually an ester) to form a stable amide bond (Saxon & Bertozzi, Reference Saxon and Bertozzi2000). While the first version of this methodology generated a residual phosphine oxide at the ligation junction, ‘traceless’ Staudinger ligation variants are now available (Nilsson et al. Reference Nilsson, Kiessling and Raines2000; Saxon et al. Reference Saxon, Armstrong and Bertozzi2000) (Fig. 10a ), as well as reactions based on azobenzene or caged phosphine reagents that can be activated with light (Shah et al. Reference Shah, Laughlin and Carrico2016; Szymanski et al. Reference Szymanski, Wu, Poloni, Janssen and Feringa2013). This reaction can be used to ligate a peptide fragment containing a C-terminal thioester to another fragment carrying an N-terminal azide resulting in an extended native polypeptide chain and bypassing the need for a cysteine residue at the ligation junction (Nilsson et al. Reference Nilsson, Kiessling and Raines2000). While displaying slower reaction kinetics compared with some other bio-orthogonal reactions, the Staudinger ligation can be impactful in protein engineering applications where versatility and orthogonality are desired. For example, it has been used to create native isopeptide linkages between ubiquitin and other proteins (Andersen & Raines, Reference Andersen and Raines2015), install multiple probes into proteins such as RNA polymerase or GPCRs (Chakraborty et al. 
Reference Chakraborty, Mazumder, Lin, Hasemeyer, Xu, Wang, Ebright and Ebright2015; Huber et al. Reference Huber, Naganathan, Tian, Ye and Sakmar2013), prepare glycoprotein or phosphoprotein mimics (Bernardes et al. Reference Bernardes, Linderoth, Doores, Boutureira and Davis2011; Serwa et al. Reference Serwa, Wilkening, Del Signore, Muhlberg, Claussnitzer, Weise, Gerrits and Hackenberger2009), and turn on the fluorescence of optical probes upon protein labeling (Lemieux et al. Reference Lemieux, De Graffenried and Bertozzi2003).

Fig. 10. Chemical modification of unnatural amino acids.

Another popular and versatile bio-orthogonal reaction is the azide-alkyne 1,3-dipolar cycloaddition (Rostovtsev et al. Reference Rostovtsev, Green, Fokin and Sharpless2002; Tornoe et al. Reference Tornoe, Christensen and Meldal2002), commonly referred to as ‘click chemistry’ (Kolb et al. Reference Kolb, Finn and Sharpless2001). The cycloaddition can be promoted either by the presence of Cu(I) ligands or the use of highly strained cyclooctyne systems (Agard et al. Reference Agard, Prescher and Bertozzi2004). Both versions are specific, exhibit relatively fast reaction kinetics and are easy to use, which has led to numerous applications in the protein conjugation area. Recent examples include the incorporation of dual EPR or FRET probes for distance measurements in proteins or RNA (Kucher et al. Reference Kucher, Korneev, Tyagi, Apfelbaum, Grohmann, Lemke, Klare, Steinhoff and Klose2016; Lavergne et al. Reference Lavergne, Lamichhane, Malyshev, Li, Li, Sperling, Williamson, Millar and Romesberg2016), and the attachment of rigid lanthanide tags to large proteins for paramagnetic relaxation experiments (Mallagaray et al. Reference Mallagaray, Dominguez, Peters and Perez-Castells2016). In vivo applications of the Cu(I) version have been limited by copper toxicity, although extracellular labeling of cells has been reported (Link et al. Reference Link, Vink and Tirrell2004; Uttamapinant et al. Reference Uttamapinant, Tangpeerachaikul, Grecian, Clarke, Singh, Slade, Gee and Ting2012). The strained cyclooctyne systems, on the other hand, are compatible with living systems and improved versions that exhibit faster labeling kinetics are available for genetic incorporation through amber suppression (Dommerholt et al. Reference Dommerholt, Schmidt, Temming, Hendriks, Rutjes, Van Hest, Lefeber, Friedl and Van Delft2010; Lang et al. Reference Lang, Davis, Wallace, Mahesh, Cox, Blackman, Fox and Chin2012b; Plass et al. Reference Plass, Milles, Koehler, Schultz and Lemke2011) (Fig. 
10b ). These copper-free cycloadditions have thus been valuable in imaging and proteomic applications in living cells and organisms (Laughlin et al. Reference Laughlin, Baskin, Amacher and Bertozzi2008; Smits et al. Reference Smits, Borrmann, Roosjen, Van Hest and Vermeulen2016; Xie et al. Reference Xie, Dong, Huang, Hong, Lei and Chen2014), and can be useful for the structural biologist interested in understanding the dynamics of macromolecules in the cellular environment either by FRET or EPR (Kucher et al. Reference Kucher, Korneev, Tyagi, Apfelbaum, Grohmann, Lemke, Klare, Steinhoff and Klose2016).

The success of click chemistry bioconjugation has inspired the search for faster and more efficient reactions for labeling in the cellular milieu. An important development along this direction involves the use of inverse-electron demand Diels-Alder reactions between strained dienophiles and tetrazine dienes (Blackman et al. Reference Blackman, Royzen and Fox2008; Devaraj et al. Reference Devaraj, Weissleder and Hilderbrand2008). The dienophile system in this case can be a trans-cyclooctene or a norbornene, both exhibiting significantly faster reaction kinetics as compared to the 1,3-dipolar cycloadditions described above, and thus allowing labeling reactions of proteins at much lower concentrations (Fig. 10c ). These bio-orthogonal reactive handles are also available for genetic incorporation by amber suppression (Lang et al. Reference Lang, Davis, torres-Kolbus, Chou, Deiters and Chin2012a, Reference Lang, Davis, Wallace, Mahesh, Cox, Blackman, Fox and Chin2012b; Seitchik et al. Reference Seitchik, Peeler, Taylor, Blackman, Rhoads, Cooley, refakis, Fox and Mehl2012) and provide an expanded and rapidly evolving toolbox for bio-orthogonal protein manipulations and modifications in the cellular environment.

While most of the reactions described above are incredibly useful for the conjugation of small molecules and biophysical probes to proteins in vitro and in vivo, they generally leave too large a chemical ‘scar’ at the conjugation site to be practical for the generation of PTMs or their mimics. A versatile approach for the site-specific installation of PTMs involves the UAA dehydroalanine. This UAA can be generated from a cysteine residue installed by site-directed mutagenesis (Chalker et al. Reference Chalker, Lercher, Rose, Schofield and Davis2012) or from an O-phosphoserine precursor incorporated by amber suppression (Yang et al. Reference Yang, Ha, Ahn, Kim, Kim, Lee, Kim, Soll, Lee and Park2016). Under biocompatible conditions, dehydroalanine can then be reacted with a variety of alkyl halides, via a radical-mediated process, to produce an impressive list of more than 50 modified functionalities at the site of interest (Wright et al. Reference Wright, Bower, Chalker, Bernardes, Wiewiora, Ng, Raj, Faulkner, Vallee, Phanumartwiwath, Coleman, Thezenas, Khan, Galan, Lercher, Schombs, Gerstberger, Palm-Espling, Baldwin, Kessler, Claridge, Mohammed and Davis2016; Yang et al. Reference Yang, Ha, Ahn, Kim, Kim, Lee, Kim, Soll, Lee and Park2016). These include methylated lysine and arginine residues, as well as fluorinated, glycosylated, phosphorylated, alkylated or isotopically labeled natural and UAA side chains. It should be noted that this methodology currently leads to the generation of both D and L configurations at the mutation site (i.e. the reaction lacks stereochemical control), a potential limitation for some applications.

3.8 Enzymatic bioconjugation approaches

The site-specific incorporation of bio-orthogonal handles on proteins can also be achieved enzymatically. These enzymes choose their targets based on the recognition of a specific amino acid sequence and can perform various chemical modifications of cysteine, lysine, serine, glutamine or glycine residues within their target sequences. This toolbox, for example, includes the formylglycine generating enzyme (FGE) that recognizes the CXPXR sequence motif and converts the cysteine residue to formylglycine, thus introducing an aldehyde functional group (Wu et al. Reference Wu, Shui, Carlson, Hu, Rabuka, Lee and Bertozzi2009) (Fig. 11a). The aldehyde handle can be subsequently elaborated with various probes via bio-orthogonal transformations such as oximation and hydrazino-Pictet-Spengler reactions (Agarwal et al. Reference Agarwal, Kudirka, Albers, Barfield, DE Hart, Drake, Jones and Rabuka2013; Dirksen & Dawson, Reference Dirksen and Dawson2008). Another popular tool is lipoic acid ligase, an enzyme that modifies a lysine side-chain within a 13-residue target sequence (Uttamapinant et al. Reference Uttamapinant, White, Baruah, Thompson, Fernandez-Suarez, Puthenveetil and Ting2010). Engineered versions of this enzyme can accommodate lipoic acid analogs and have been used to introduce bio-orthogonal handles, including azides (Plaks et al. Reference Plaks, Falatach, Kastantin, Berberich and Kaar2015; Uttamapinant et al. Reference Uttamapinant, Tangpeerachaikul, Grecian, Clarke, Singh, Slade, Gee and Ting2012) (Fig. 11b), aryl aldehydes and hydrazines (Cohen et al. Reference Cohen, Zou and Ting2012), p-iodophenyl derivatives (Hauke et al. Reference Hauke, Best, Schmidt, Baalmann, Krause and Wombacher2014), norbornenes (Best et al. Reference Best, Degen, Baalmann, Schmidt and Wombacher2015) and trans-cyclooctenes (Liu et al. Reference Liu, Tangpeerachaikul, Selvaraj, Taylor, Fox and Ting2012). 
It is also possible to engineer fluorescent lipoic acid analogs that can be installed directly on the protein of interest (Uttamapinant et al. Reference Uttamapinant, White, Baruah, Thompson, Fernandez-Suarez, Puthenveetil and Ting2010). Other members of the enzymatic toolbox include biotin ligase, farnesyltransferase, transglutaminase and N-myristoyltransferase (reviewed in (Rashidian et al. Reference Rashidian, Dozier and Distefano2013)).
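Because FGE selects its substrate by sequence, candidate modification sites can be located with a simple pattern search. The Python sketch below scans a protein sequence for the CXPXR motif named in the text, where X is taken to be any residue. The function name `find_fge_sites` and the example sequence are hypothetical, and a motif match says nothing about whether the site is accessible to the enzyme in the folded protein.

```python
import re

# CXPXR: Cys, any residue, Pro, any residue, Arg. The lookahead allows
# overlapping motif occurrences to be reported as well.
FGE_MOTIF = re.compile(r"(?=C[A-Z]P[A-Z]R)")


def find_fge_sites(sequence: str) -> list[int]:
    """Return 1-based positions of cysteines that start a CXPXR motif."""
    return [match.start() + 1 for match in FGE_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    tagged_protein = "MGSLCTPSRGSA"  # hypothetical sequence with one motif
    print(find_fge_sites(tagged_protein))
```

The same approach applies to other sequence-directed enzymes in this toolbox, provided their recognition sequence can be written as a pattern.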

Fig. 11. Examples of enzymatic bioconjugation approaches. (a) Site-specific modification of cysteine with formylglycine generating enzyme, followed by oxime ligation to attach a chemical or optical probe. (b) Introduction of ‘click’ handles into proteins using lipoic acid ligase and lipoic acid analogs.

Enzymatic bioconjugation is an increasingly useful tool for cell imaging applications, functionalization of therapeutic proteins, immobilization of proteins on solid supports or the preparation of protein–polymer or protein–nanoparticle conjugates (Hu et al. Reference Hu, Berti and Adamo2016; Slavoff et al. Reference Slavoff, Liu, Cohen and Ting2011; Walper et al. Reference Walper, Turner and Medintz2015). For the structural biologist and biophysicist, it offers an orthogonal strategy for protein modification that is based on genetically encodable peptide tags (Stephanopoulos & Francis, Reference Stephanopoulos and Francis2011). For example, it provides a conceptually straightforward way to introduce a second modification to a construct that already contains a UAA incorporated by amber suppression. Enzymatic bioconjugation can also be useful in the multiplex labeling of complex protein mixtures, and the modification of constructs that are large, difficult to purify or not easily amenable to the approaches outlined above. Since the recognition sequence is fused to the protein of interest, the modification site is usually limited in location to the N- and C-termini, or to a surface exposed and flexible loop of the protein. Other important considerations include the reaction kinetics, the stability and solubility of the enzymes, and the relatively large size of the installed modifications, which can perturb the function of the protein target.

4. Protein engineering approaches for tackling outstanding challenges in structural biology

4.1 X-ray crystallography

The molecular engineering toolbox presented here can aid crystallographers in all stages of the structure determination process. For example, when phase values cannot be determined by direct methods or molecular replacement, protein crystallographers can incorporate heavy atoms (e.g. Se, I or Br) into the protein crystal and use their anomalous diffraction patterns to solve the structure. Heavy atom incorporation can be accomplished in several ways (reviewed in (Pike et al. Reference Pike, Garman, Krojer, Von Delft and Carpenter2016)). For example, cysteine residues can be derivatized with mercury by pre-labeling, co-crystallization or soaking of the protein crystals in mercury salts (Martinez et al. Reference Martinez, De Geus, Stanssens, Lauwereys and Cambillau1993). If necessary, cysteine accessibility can be pre-evaluated with Ellman's reagent (5,5′-dithiobis-2-nitrobenzoic acid) (Li et al. Reference Li, Pye and Caffrey2015). Another popular heavy atom labeling methodology relies on the incorporation of the modified amino acid selenomethionine (Hendrickson et al. Reference Hendrickson, Horton and Lemaster1990). In this case, selenomethionine is added to the culture of a methionine-auxotrophic bacterial strain resulting in substitution of all methionine residues in the protein. Selenomethionine incorporation is also possible in non-bacterial expression systems, albeit with somewhat lower substitution efficiency (Cronin et al. Reference Cronin, Lim and Rogers2007; Nettleship et al. Reference Nettleship, Assenberg, Diprose, Rahman-Huq and Owens2010). Site-specific incorporation of appropriately modified unnatural amino acids (e.g. p-iodo-L-phenylalanine, p-bromo-L-phenylalanine and 3-iodo-L-tyrosine) can be performed with amber suppression or chemical synthesis (Kwon et al. Reference Kwon, Wang and Tirrell2006; Sakamoto et al. 
Reference Sakamoto, Murayama, Oki, Iraha, Kato-Murayama, Takahashi, Ohtake, Kobayashi, Kuramitsu, Shirouzu and Yokoyama2009; Xie et al. Reference Xie, Wang, Wu, Brock, Spraggon and Schultz2004; Yeung et al. Reference Yeung, Squire, Yosaatmadja, Panjikar, Lopez, Molina, Baker, Harris and Brimble2016). Solving the phase problem can also be aided by the development of metal-binding peptide tags (e.g. Tb³⁺) that can be genetically fused to the protein of interest (Silvaggi et al. Reference Silvaggi, Martin, Schwalbe, Imperiali and Allen2007).

Total chemical synthesis of proteins is an important tool for racemic and quasi-racemic crystallography (Yeates & Kent, 2012) (Fig. 12). These methods are based on the observation that mixing D- and L-forms of proteins can aid crystallization, as the racemic mixture has access to a much larger set of crystallographic space groups, including those that contain mirror or center of inversion operations (Wukovitz & Yeates, 1995). D-enantiomers of proteins can be made using solid-phase peptide synthesis and NCL, with synthesis efficiencies that have allowed the structure determination of constructs in the 200-amino acid range (Mandal et al. 2012; Pan et al. 2016). The synthetic origin of such polypeptides also allows the facile co-incorporation of amino acids containing heavy atoms (Yeung et al. 2016), or chemically well-defined post-translational modifications (Okamoto et al. 2014).

Fig. 12. Principle of racemic crystallography. The L- and D-forms of the polypeptide chains are prepared separately by solid-phase peptide synthesis and native chemical ligation. The proteins are subsequently mixed and co-crystallized, thus gaining access to a much larger set of crystallographic space groups.

For structure characterization of larger biological macromolecules and complexes, it is often more practical to use mimics rather than the native PTM. For example, methylation mimics are accessible through the alkylation of cysteine residues, and produce constructs with high yields and chemical purity in all possible methylation states (Simon et al. 2007). Such approaches have been invaluable in elucidating the impact of histone methylation on the nucleosome surface, a structural problem that requires the efficient crystallization of a biological assembly composed of four different proteins and DNA (Lu et al. 2008). Sometimes, non-native linkages and modifications are essential in trapping an important functional state of a protein complex, for example between a ubiquitylated protein and the corresponding deubiquitinating enzyme (Morgan et al. 2016). The ongoing optimization of amber suppression systems for in-cell and cell-free protein synthesis has also made it easier to produce certain native modifications in high enough yield for protein crystallization. There are, for example, several crystal structures of acetylated and phosphorylated proteins prepared by amber suppression in E. coli (Arbely et al. 2011; Huguenin-Dezot et al. 2016; Kuhlmann et al. 2016; Lammers et al. 2010). This technology has also enabled the structural analysis of ubiquitin chains (Virdee et al. 2010). Crystallographic studies of proteins containing multiple modifications have been successfully addressed with cell-free protein synthesis (Wakamori et al. 2015) and EPL (Wang et al. 2014b).

Beyond the incorporation of PTMs, the modern protein engineering toolbox provides the means to precisely alter the covalent structure of proteins, importantly allowing access to both the amino acid side-chains and the polypeptide backbone. Crystallographic analysis of such modified proteins can be extremely powerful, yielding insights into processes as diverse as ion conduction (Grosse et al. 2011; Valiyaveetil et al. 2006), enzyme catalysis (Torbeev et al. 2011; Wang et al. 2014b) and protein–protein interactions (Lu et al. 1999; Morgan et al. 2016).

4.2 Nuclear magnetic resonance

The site-specific incorporation of NMR probes into biological macromolecules presents two challenging requirements to the protein chemist. First, as in X-ray crystallography, large amounts of sample are required, and the incorporation methods should be robust and efficient. Second, the isotopic labeling precursors are often prohibitively expensive. It therefore comes as no surprise that most NMR labeling strategies exploit the metabolic pathways in bacteria to produce uniform or sparse isotopic labeling schemes, or incorporate isotopic labels in an amino acid-specific fashion (for a comprehensive series of reviews on isotopic labeling of biomolecules, the interested reader is referred to volume 565 of Methods in Enzymology, 2015). Here, we will focus on labeling methodologies that allow more control over the placement of spectroscopic probes in a polypeptide chain, and that have historically been demonstrated to produce sufficient protein amounts for NMR analysis.

4.2.1 Segmental isotopic labeling

Segmental isotopic labeling is a powerful protein engineering approach that allows the generation of an intact and typically natively folded and functional protein where only a certain segment of the polypeptide chain is ‘visible’ by NMR (Xu et al. 1999; Yamazaki et al. 1998). This strategy is particularly impactful if the protein of interest is large and/or has a degenerate amino acid sequence, as it simplifies the assignment and interpretation of crowded and overlapped NMR spectra. Such constructs can be crucial in generating unambiguous structural constraints in large proteins and assemblies, and have been exploited in structural investigations both by solution and solid-state NMR (Mehler et al. 2015; Schubeis et al. 2015; Tremblay et al. 2015; Williams et al. 2016). Several approaches can be used to achieve segmental labeling of polypeptide chains. For example, a protein domain can be fused to a contiguous intein and produced recombinantly in labeled media (e.g. supplemented with ¹⁵N- and ¹³C-enriched nutrient sources). Following thiolysis of the intein, the thioester derivative of the labeled domain can then be ligated, via EPL, to a protein fragment that does not contain NMR-active isotopes (Xu et al. 1999). In principle, the NMR-silent segment can be a synthetic peptide carrying a PTM or a paramagnetic tag, thus allowing the facile incorporation of C-terminal protein modifications. For example, this approach has been used to introduce a phosphorylated tyrosine residue at position 125 of the amyloid-related protein α-synuclein, while residues 1–106 were prepared recombinantly and isotopically labeled for NMR analysis (Hejjaoui et al. 2012). In practice, however, the second fragment is usually produced recombinantly due to cost and yield considerations.

PTS offers an alternative intein-based approach for segmental isotopic labeling. The high affinity of naturally split intein fragments means that the splicing reactions can be conducted at very low concentrations of reactants (low μM), a capability that distinguishes the PTS strategy from EPL, where reactant concentrations in the mM range are typically required. In practice, the target protein is divided into two segments, for example at domain boundaries, each fused to an appropriate intein fragment (Fig. 13a). These constructs are expressed separately, allowing the differential incorporation of NMR-active isotopes. After purification, the two extein–intein fragments are mixed, protein trans-splicing takes place and the leftover intein components can be removed using an appropriate affinity tag or chromatographic separation. Notably, the splicing reaction can also be performed directly in the cellular environment. In this case, both constructs are transformed into the same cell, but with different promoters so that the expression of each construct can be controlled separately, and therefore timed with the addition of isotopically enriched nutrients to the media (Muona et al. 2010) (Fig. 13b). This strategy can be particularly helpful for labeling proteins that are hard to refold in vitro, e.g. membrane proteins (Mehler et al. 2015). The first segmental labeling protocols relied on artificially split or natural inteins that were not very efficient, thus limiting the utility of the technology to a few favorable cases (Yamazaki et al. 1998). Today, however, the toolbox of useful split inteins has greatly expanded, and it is now possible to choose from inteins with diverse properties, e.g. those giving better expression yields for the protein of interest, capable of carrying out ligations at higher temperatures or denaturant concentrations, or more forgiving of the choice of extein junction site (see Section 3.4.3).

Fig. 13. Strategies for segmental isotopic labeling of proteins for NMR analysis. (a) Intein-based segmental labeling. (b) Strategy for segmental labeling of proteins in cells. (c) Labeling with sortases.

Another option for segmental isotopic labeling is provided by sortase-mediated ligation (Fig. 13c). As discussed in Section 3.5, this method relies on the use of Sortase A, a transpeptidase that can ligate two constructs provided that one contains an LPXTG recognition sequence, and the other a glycine repeat motif. Since it results in an LPXTG ‘scar’ at the ligation junction, sortase-mediated ligation is best suited for the segmental labeling of protein domains separated by a flexible, mutation-tolerant linker (Williams et al. 2016). The reversibility of the sortase ligation also necessitates the removal of cleaved peptide byproducts during the course of the reaction for optimal yields (Freiburger et al. 2015). On the other hand, segmental labeling using sortase works well with low μM concentrations of reagents, and is a good alternative for constructs that do not express well as intein fusions.
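The way sortase-mediated ligation leaves its recognition motif at the junction can be illustrated with a short sequence-level sketch. This is an idealized model rather than a description of any published protocol: it assumes cleavage between the threonine and glycine of LPXTG, followed by attack of an N-terminal glycine on the acceptor, and it ignores reversibility and yield. The function name `sortase_ligate` and the example sequences are hypothetical.

```python
import re

# Sortase A recognition motif: Leu-Pro-X-Thr-Gly, where X is any residue.
LPXTG = re.compile(r"LP.TG")


def sortase_ligate(donor: str, acceptor: str) -> str:
    """Return the idealized ligation product of two one-letter sequences.

    donor    -- sequence carrying an LPXTG motif (the last match is used)
    acceptor -- sequence that starts with at least one glycine
    """
    matches = list(LPXTG.finditer(donor))
    if not matches:
        raise ValueError("donor lacks an LPXTG motif")
    if not acceptor.startswith("G"):
        raise ValueError("acceptor must start with glycine")
    last = matches[-1]
    # Assumed cleavage between T and G: keep everything through the threonine.
    acyl_part = donor[: last.end() - 1]
    return acyl_part + acceptor


# Hypothetical example: labeled domain with LPETG and a His tag,
# unlabeled partner with an N-terminal triglycine.
donor = "MKVDELPETGGHHHHHH"
acceptor = "GGGSAQW"
product = sortase_ligate(donor, acceptor)
print(product)  # MKVDELPETGGGSAQW
```

In this toy example the LPETG motif is still present at the junction of the product, which is why a flexible, mutation-tolerant linker is the natural place to put the ligation site.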

The segmental labeling approaches described here have also been used to generate protein constructs with NMR-silent solubility enhancement tags (Kobashigawa et al. 2009; Zuger & Iwai, 2005). To keep proteins more stable in solution during a long multidimensional experiment, NMR spectroscopists can resort to fusing the protein of interest to a soluble domain such as SUMO, GB1, MBP or thioredoxin. If prepared as a single construct, however, the solubility tag will be isotopically labeled and will contribute to the NMR spectrum, a complication that can be avoided with segmental labeling.

4.2.2 Site-specific incorporation of magnetic resonance probes

The introduction of a single isotopic label at a well-defined position in a polypeptide chain provides NMR spectroscopists with a benign and unambiguous reporter of protein structure, dynamics and ligand binding. Strategically positioned isotopes can also be crucial in disentangling higher order structural interactions in complex biological assemblies such as amyloid fibrils (Debelouchina et al. 2013; Petkova et al. 2006). The incorporation of such probes is relatively straightforward in short polypeptides prepared by solid-phase peptide synthesis, and many protected isotopically labeled amino acids are available commercially at a reasonable cost. Longer polypeptide chains containing specific labels can be prepared by NCL, as exemplified by the synthesis of the membrane protein M2 containing five ¹³C,¹⁵N-labeled amino acids dispersed throughout the sequence (Kwon et al. 2015). A serendipitously positioned native cysteine residue in this case provided a convenient ligation site without the need for desulfurization.

In recent years, the biomolecular NMR field has seen a surge in the popularity of ¹⁹F as a tool to investigate large and complex systems both by solution and solid-state NMR spectroscopy. This nucleus is bio-orthogonal, has magnetic properties that ensure high sensitivity (~80% of the sensitivity of ¹H nuclei) and a wide chemical shift range (~100-fold larger than the range of ¹H). It can therefore serve as a highly sensitive probe of protein and peptide structure, folding and aggregation, dynamics and ligand binding (reviewed in Chen et al. 2013; Sharaf & Gronenborn, 2015). Specific ¹⁹F labels can be installed by peptide synthesis, conjugation or fluorination of cysteine and lysine residues, or by supplementing the culture media with fluorinated amino acids (which results in amino acid-specific labeling). Increasingly, amber suppression is becoming the site-specific incorporation method of choice (Marsh & Suzuki, 2014; Sharaf & Gronenborn, 2015). There are several orthogonal tRNA/synthetase systems and fluorinated UAAs available, each imparting different spectroscopic signatures with respect to chemical shift anisotropy and relaxation. These include 3,5-difluorotyrosine (Li et al. 2013a; Yang et al. 2015), p-trifluoromethoxyphenylalanine (Cellitti et al. 2008) and p-trifluoromethylphenylalanine (Jackson et al. 2007; Loscha et al. 2012). While most protein ¹⁹F NMR applications rely on amber suppression in E. coli, magnetic resonance imaging studies are already exploiting the unique spectroscopic properties of ¹⁹F in mammalian systems or whole organisms (Yu et al. 2013). Site-specific ¹³C and ¹⁵N probes can also be installed by amber suppression, including ¹³C/¹⁵N-labeled p-methoxyphenylalanine and ¹⁵N-labeled o-nitrobenzyl-tyrosine (Cellitti et al. 2008). Ligand-binding studies of large molecular systems may also benefit from incorporation of o-tert-butyltyrosine, where rapid bond rotations of the tert-butyl group ensure a narrow linewidth of the characteristic peak in solution NMR spectra (Chen et al. 2015).
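The sensitivity figure quoted above for ¹⁹F can be rationalized with a back-of-the-envelope estimate. For spin-1/2 nuclei at the same field and with equal numbers of spins, the relative NMR sensitivity scales approximately with the cube of the gyromagnetic ratio. The sketch below rests on that approximation; the γ/2π values are rounded textbook numbers assumed for illustration (they are not taken from this review), and the helper name is hypothetical.

```python
# Approximate gyromagnetic ratios, gamma / (2*pi), in MHz per tesla.
# Rounded textbook values, assumed here for illustration.
GAMMA_MHZ_PER_T = {"1H": 42.577, "19F": 40.078}


def relative_sensitivity(nucleus: str, reference: str = "1H") -> float:
    """Sensitivity relative to `reference`, using the gamma-cubed scaling
    that applies to spin-1/2 nuclei at equal field and spin count."""
    ratio = GAMMA_MHZ_PER_T[nucleus] / GAMMA_MHZ_PER_T[reference]
    return ratio ** 3


print(f"19F relative to 1H: {relative_sensitivity('19F'):.2f}")  # 0.83
```

The estimate of roughly 0.8 is in line with the value quoted in the text, and because ¹⁹F is essentially 100% naturally abundant no isotopic enrichment is needed to reach it.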

Another problem of particular relevance to the magnetic resonance spectroscopist concerns the site-directed installation of EPR and PRE probes. These probes carry stable nitroxide radicals or chelated metal ions such as Mn²⁺, Cu²⁺ or Gd³⁺, and can be used to measure distances by EPR (up to 80 Å), or change the relaxation properties of nuclear spins in their vicinity (up to 35 Å). Usually, paramagnetic tags are incorporated at available cysteine residues using alkylation with maleimides, and many appropriately functionalized reagents are available commercially. As discussed in Section 3.1, however, cysteine-based approaches often have limited selectivity, the generated maleimide-based linker can be unstable, and the chemistry is generally not applicable to cellular investigations. The development of amber suppression and bio-orthogonal reactions has certainly provided more options for EPR and PRE label incorporation both in vitro and in vivo. For example, orthogonal tRNA/tRNA synthetase systems are available for direct incorporation of nitroxide-containing (Jones et al. 2010; Schmidt et al. 2015) and metal-chelating UAAs (Park et al. 2015). Another amber suppression incorporation route involves the UAA p-acetyl-L-phenylalanine, which can react with paramagnetic tags functionalized with a hydroxylamine moiety (Fleissner et al. 2009). An NMR-specific and often prohibitive requirement for the incorporation of PRE tags with amber suppression, however, is the concurrent need to introduce isotopic labels in the protein of interest. Thus, general protocols that provide efficient UAA incorporation under suboptimal bacterial growth conditions (minimal ¹³C/¹⁵N-supplemented or perdeuterated media, for example) are desperately needed (Evans & Millhauser, 2015; Venditti et al. 2012).

Another demanding problem concerns the site-specific installation of isotopically labeled PTMs such as different methylation states, glycosylation, ubiquitylation and various acylations (phosphorylation being somewhat of an exception as the naturally abundant ³¹P is the NMR-active isotope). Coupled with the often problematic behavior of the modified proteins (e.g. aggregation-prone intrinsically disordered and amyloidogenic peptides and proteins, insoluble transmembrane domains or chromatin effectors), PTM installation in itself is a challenge even without the extra complication and associated cost of isotopic labeling. Not surprisingly, therefore, most NMR studies involving PTMs are performed with constructs that have been modified enzymatically. Enzymatic modification, however, is often incomplete and can result in a heterogeneous set of modifications, thus complicating the interpretation of NMR data. Well-established non-enzymatic site-directed protocols are available for ubiquitylation, taking advantage of the possibility to prepare and isotopically label ubiquitin recombinantly. The labeled ubiquitin moiety can then be installed at a specific position of the modified polypeptide using a variety of strategies including an asymmetric disulfide (Debelouchina et al. 2017), a condensation reaction of a C-terminal ubiquitin thioester and the ε-amine of a targeted lysine residue (Castaneda et al. 2011a), or a combined recombinant expression–chemical synthesis approach (Castaneda et al. 2011b). Site-specific phosphorylation can be incorporated by solid-phase peptide synthesis, followed by EPL to construct segmentally labeled and modified proteins (Hejjaoui et al. 2012). Studies of glycosylated proteins also require the attachment of homogeneous glycans that are often isotopically labeled to facilitate resonance assignments and structure determination (Skrisovska et al. 2010). Cysteine-based ¹³C-labeled methyllysine analogs are a convenient and affordable way to generate methylated constructs, and have been used in the NMR structural analysis of proteins such as HP1 and p53 (Cui et al. 2012; Munari et al. 2012).

4.3 Studies of dynamic interactions

4.3.1 Incorporation of optical probes

The advances in single-molecule optical spectroscopy and imaging have propelled the search for better fluorescent probes and the development of more efficient and targeted strategies for their incorporation in vitro and in cells. The information obtained from such experiments can complement high-resolution structural studies and provide data not easily accessible by other methods, including dynamic protein–protein interactions, conformational states and protein folding pathways. There are, of course, many excellent reviews on the subject (Dimura et al. 2016; Haney et al. 2015; Minoshima & Kikuchi, 2017; Nikic & Lemke, 2015). Therefore, we will limit our discussion to the problem of incorporating two small FRET probes for biophysical and structural investigations, still a surprisingly challenging task for the protein chemist.

A protein construct prepared for FRET experiments can be labeled with two different fluorophores (A and B) by cysteine mutagenesis at two appropriately selected sites. Incubation with the maleimide, iodoacetamide or methyl bromide derivatives of the two probes results in a statistical mixture of labeled constructs even if labeling is efficient (A–A, A–B, B–A, B–B). Since the mixture of products can complicate data analysis (Husada et al. 2015), more selective strategies for labeling are highly desirable and have been explored in the literature. A popular approach, for example, combines cysteine labeling with amber suppression that can introduce a second fluorescent moiety either directly or through bio-orthogonal chemistries (Brustad et al. 2008; Haney et al. 2016; Milles et al. 2012; Ratzke et al. 2014). Suitable UAAs include p-acetylphenylalanine for labeling with commercially available hydroxylamine probe derivatives (Fig. 14a), as well as azido- and alkyne-functionalized phenylalanine, tyrosine and lysine residues for click reactions. Cysteine labeling can also be combined with the site-specific incorporation of 1,2-aminothiols that react selectively with cyanobenzothiazole fluorescent-probe derivatives (Nguyen et al. 2011). A conceptually different cysteine-based labeling approach involves the FlAsH system that targets biarsenic reagents to the genetically encoded tetracysteine motif CCPGCC (Griffin et al. 1998). This technology has the advantage that the small molecule ligands are cell permeable and virtually non-fluorescent until they bind their recognition motif, and are therefore well-suited for in-cell applications. In the context of dual labeling, FlAsH has been combined with amber suppression (Perdios et al. 2017) (Fig. 14b) or fluorescent proteins such as CFP (Hoffmann et al. 2005) for the investigation of protein conformational states in vitro or in the cellular environment.
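The statistical outcome of labeling two cysteines with two dyes, as described at the start of this section, can be enumerated directly. The sketch below assumes complete labeling, independent sites and equal reactivity of both sites toward both dyes, with `p_a` the probability that a given site receives dye A. These are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
from itertools import product


def labeling_mixture(p_a: float = 0.5) -> dict:
    """Fractions of the four doubly labeled species for two equivalent sites.

    Keys are written site1-site2, so "A-B" and "B-A" are distinct
    positional isomers carrying the same pair of dyes.
    """
    probs = {"A": p_a, "B": 1.0 - p_a}
    return {
        f"{first}-{second}": probs[first] * probs[second]
        for first, second in product("AB", repeat=2)
    }


mixture = labeling_mixture(0.5)
for species, fraction in mixture.items():
    print(f"{species}: {fraction:.2f}")

# Fraction of molecules carrying one A and one B (either orientation).
hetero = mixture["A-B"] + mixture["B-A"]
print(f"hetero-labeled: {hetero:.2f}")
```

Under these assumptions no more than half of the molecules carry one copy of each dye, and that half is itself split between two positional isomers, which is the heterogeneity that the more selective strategies in this section aim to avoid.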

Fig. 14. Dual labeling of proteins with fluorophores. (a) A labeling strategy based on the combination of cysteine chemistry and amber suppression (Brustad et al. 2008). (b) The FlAsH labeling system can be used for the selective modification of a genetically encoded peptide tag, in combination with amber suppression (Perdios et al. 2017). (c) A dual labeling strategy based on native chemical ligation and amber suppression (Wissner et al. 2013). (d) Genetic incorporation of two UAAs using orthogonal ribosomes that can decode the AGTA quadruplet codon (Sachdeva et al. 2014).

When the protein construct of interest contains functionally or structurally important cysteines, dual labeling can be attempted with NCL, EPL, the incorporation of genetically encodable UAAs and peptide tags for enzymatic bioconjugation, or a combination of these approaches. For example, a fluorescent probe and a thioamide quencher can be installed using a combination of amber suppression and protein ligation, where the quencher is a backbone modification that is only accessible through protein synthesis (Wissner et al. 2013) (Fig. 14c). As discussed in Section 3.6.3, incorporation of two UAAs in a polypeptide chain by genetic means is still difficult; however, the growing number of successful applications has been developed almost exclusively for FRET labeling. For example, using cells containing the orthogonal ribosome ribo-Q1, two fluorescent labels could be incorporated efficiently into calmodulin, one in response to the amber TAG codon, the other at the newly assigned AGTA quadruplet codon (Sachdeva et al. 2014) (Fig. 14d). Rapid, quantitative, one-pot labeling was achieved by using mutually orthogonal cycloadditions at UAAs bearing a terminal alkyne and a cyclopropene. Alternatively, amber–ochre or amber–opal dual incorporation schemes can be used (Chatterjee et al. 2013b; Wan et al. 2010; Xiao et al. 2013).

4.3.2 Incorporation of vibrational probes

Two-dimensional infrared spectroscopy (2D IR) has emerged as a powerful technique to characterize the dynamic states and interactions of biological macromolecules (Baiz et al. 2013; Le Sueur et al. 2015). Dynamic and structural information is encoded in the vibrational modes of functional groups such as backbone amides and carbonyls, as well as side-chains with aromatic, carbonyl and guanidinium groups. Since even polypeptides of moderate length contain many overlapping signals, site-specific incorporation of vibrational reporters with distinct spectroscopic signatures is essential for data interpretation. The installation of such probes can take advantage of several of the protein engineering approaches described above (Zhang et al. 2016b). Segmental isotopic labeling, for example, can be used to distinguish the spectroscopic behavior of protein domains (Moran et al. 2012). In this case, one domain is labeled with ¹³C, while the other domain remains at natural abundance. The higher mass of the ¹³C isotope changes the vibrational modes of the backbone carbonyls, and results in a downward shift of the vibrational frequency, effectively decoupling the spectroscopic signatures of the two domains. Higher resolution in linear and multidimensional IR spectra can be achieved by the site-specific installation of isotopic labels such as ¹³C–¹⁸O pairs by solid-phase peptide synthesis, EPL, NCL or the incorporation of appropriately labeled methionine during recombinant protein expression (Courter et al. 2014; Davis et al. 2015; Dhayalan et al. 2016; Marecek et al. 2007; Zhang et al. 2016b). Resolution and specificity can also be achieved by the introduction of orthogonal vibrational probes such as nitrile, cyano, azido and thiocyanate functional groups. To this end, several UAAs suitable for amber suppression are available, including 4-cyano-, 4-azido- and 4-azidomethyl-phenylalanine (Bazewicz et al. 2013; Schultz et al. 2006; Ye et al. 2010). ¹³C, ¹⁵N-labeled thiocyanates, on the other hand, can be obtained by cyanylation of cysteine residues (van Wilderen et al. 2014).
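The downward frequency shift caused by heavier isotopes can be estimated with a simple harmonic oscillator model, in which the frequency scales with the inverse square root of the reduced mass. The sketch below treats the backbone carbonyl as an isolated C=O diatomic and takes 1650 cm⁻¹ as a representative starting frequency. Both are assumptions made for illustration, and because real amide modes are not pure C=O stretches, measured shifts will differ from these numbers. The function names are hypothetical.

```python
from math import sqrt

# Approximate isotopic masses in atomic mass units.
MASS = {"12C": 12.0, "13C": 13.00335, "16O": 15.9949, "18O": 17.9992}


def reduced_mass(a: str, b: str) -> float:
    """Reduced mass of a two-atom oscillator."""
    return MASS[a] * MASS[b] / (MASS[a] + MASS[b])


def shifted_frequency(nu_ref: float, carbon: str, oxygen: str) -> float:
    """Frequency (cm^-1) after isotope substitution, relative to 12C=16O,
    assuming an unchanged force constant (harmonic approximation)."""
    mu_ref = reduced_mass("12C", "16O")
    mu_new = reduced_mass(carbon, oxygen)
    return nu_ref * sqrt(mu_ref / mu_new)


NU_REF = 1650.0  # assumed representative carbonyl frequency, cm^-1
for carbon, oxygen in [("13C", "16O"), ("13C", "18O")]:
    nu = shifted_frequency(NU_REF, carbon, oxygen)
    print(f"{carbon}={oxygen}: {nu:.0f} cm^-1 (shift {nu - NU_REF:+.0f})")
```

Even in this crude model the double ¹³C–¹⁸O label moves the band considerably further than ¹³C alone, which is the reason such pairs are attractive for isolating a single residue from the unlabeled background.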

5. Outlook

Our goal in this review has been to examine the contents of the modern protein engineering toolbox, with the particular needs of the structural biologist in mind. As is hopefully evident from the preceding sections, there are now many highly versatile strategies to manipulate and decorate protein constructs, allowing the creation of molecules and assemblies that faithfully represent the complexity of biological systems. We hope that this resource will serve as an inspiration to the structural biologist looking to improve on sample preparation protocols, construct interesting and relevant samples, or devise new strategies that push the boundaries of modern structural biology methods. Undoubtedly, these tools will continue to improve and evolve, expanding the range of options for both in vitro and in vivo applications. Yet, many challenges still lie ahead. For example, better tools are desperately needed to reign in obstinate systems such as membrane proteins, intrinsically disordered domains or amyloidogeneic polypeptides (Butterfield et al. Reference Butterfield, Hejjaoui, Fauvet, Awad and Lashuel2012; Uversky, Reference Uversky2015; Zuo et al. Reference Zuo, Tang and Zheng2015). These constructs, traditionally recalcitrant to protein engineering and structural biology analysis alike, beg for more efficient (semi)-synthetic, genetic or intein/sortase-based approaches that circumvent low expression yields and poor solubility issues. Next, the site-specific manipulation of large proteins and assemblies (especially those prepared in eukaryotic systems) is still an overwhelmingly difficult task, and new chemical tools and ideas are needed to supplement the existing genetic and enzymatic approaches. At the same time, the demand for constructs bearing multiple modifications will continue to increase. 
For example, dissecting the molecular basis of biological cross-talk requires access to homogeneous samples of proteins carrying multiple chemically diverse PTMs (Allis & Muir, 2011; Bah & Forman-Kay, 2016). Many biological molecules and assemblies are also too complex for structural analysis by one technique alone (Cramer, 2016; McGinty & Tan, 2015; Tynan et al. 2016), and the increasing need for method integration will necessitate the construction of samples suitable for multi-modal studies.

Concurrently, structural biologists are devising new strategies to improve the sensitivity of their respective methodologies and reduce the amounts of precious biological materials required for structural analysis. New technological developments such as X-ray free-electron lasers (Neutze et al. 2015), polarization enhancement strategies for NMR spectroscopy (Ardenkjaer-Larsen et al. 2015; Maly et al. 2008), and the enhanced capabilities of cryo-EM instrumentation (Nogales, 2016) will certainly be central to these efforts. The rise of mass spectrometry-based structural approaches, including hydrogen–deuterium exchange, ion mobility-mass spectrometry and cross-linking-based analysis, also holds great promise for revealing structural information from samples in the picomole regime (Lossl et al. 2016). At the same time, structural biologists have started to turn to the cellular milieu as the future arena of structural endeavors (Beck & Baumeister, 2016; Freedberg & Selenko, 2014). In this undoubtedly daunting task, structural biologists are not alone. Protein engineers, with their long track record of successes in the selective manipulation of complex systems both in vitro and in vivo, are ready to meet these exciting new challenges through innovations that will continue to push the boundaries of chemical biology.

Acknowledgements

We would like to thank Adam Stevens, Antony Burton, Felix Wojcik and Robert Thompson for many helpful discussions. We also acknowledge the contributions of many colleagues whose work could not be included in this review due to space limitations. This work was supported by US National Institutes of Health grants R37-GM086868, R01-GM107047 and P01-CA196539.

References

Agard, N. J., Prescher, J. A. & Bertozzi, C. R. (2004). A strain-promoted [3 + 2] azide-alkyne cycloaddition for covalent modification of biomolecules in living systems. Journal of the American Chemical Society 126(46), 15046–15047.

Agarwal, P., Kudirka, R., Albers, A. E., Barfield, R. M., De Hart, G. W., Drake, P. M., Jones, L. C. & Rabuka, D. (2013). Hydrazino-Pictet-Spengler ligation as a biocompatible method for the generation of stable protein conjugates. Bioconjugate Chemistry 24(6), 846–851.

Ai, H. W., Shen, W., Sagi, A., Chen, P. R. & Schultz, P. G. (2011). Probing protein–protein interactions with a genetically encoded photo-crosslinking amino acid. ChemBioChem 12(12), 1854–1857.

Allis, C. D. & Muir, T. W. (2011). Spreading chromatin into chemical biology. ChemBioChem 12(2), 264–279.

Andersen, K. A. & Raines, R. T. (2015). Creating site-specific isopeptide linkages between proteins with the traceless Staudinger ligation. Methods in Molecular Biology 1248, 55–65.

Antos, J. M., Truttmann, M. C. & Ploegh, H. L. (2016). Recent advances in sortase-catalyzed ligation methodology. Current Opinion in Structural Biology 38, 111–118.

Appleby, J. H., Zhou, K., Volkmann, G. & Liu, X. Q. (2009). Novel split intein for trans-splicing synthetic peptide onto C terminus of protein. Journal of Biological Chemistry 284(10), 6194–6199.

Arbely, E., Natan, E., Brandt, T., Allen, M. D., Veprintsev, D. B., Robinson, C. V., Chin, J. W., Joerger, A. C. & Fersht, A. R. (2011). Acetylation of lysine 120 of p53 endows DNA-binding specificity at effective physiological salt concentration. Proceedings of the National Academy of Sciences of the United States of America 108(20), 8251–8256.

Ardenkjaer-Larsen, J. H., Boebinger, G. S., Comment, A., Duckett, S., Edison, A. S., Engelke, F., Griesinger, C., Griffin, R. G., Hilty, C., Maeda, H., Parigi, G., Prisner, T., Ravera, E., Van Bentum, J., Vega, S., Webb, A., Luchinat, C., Schwalbe, H. & Frydman, L. (2015). Facing and overcoming sensitivity challenges in biomolecular NMR spectroscopy. Angewandte Chemie (International edition in English) 54(32), 9162–9185.

Aulabaugh, A., Ding, W., Kapoor, B., Tabei, K., Alksne, L., Dushin, R., Zatz, T., Ellestad, G. & Huang, X. (2007). Development of an HPLC assay for Staphylococcus aureus sortase: evidence for the formation of the kinetically competent acyl enzyme intermediate. Analytical Biochemistry 360(1), 14–22.

Bah, A. & Forman-Kay, J. D. (2016). Modulation of intrinsically disordered protein function by post-translational modifications. Journal of Biological Chemistry 291(13), 6696–6705.

Baiz, C. R., Reppert, M. & Tokmakoff, A. (2013). An introduction to protein 2D IR spectroscopy. Ultrafast Infrared Vibrational Spectroscopy 361–403.

Barton, W. A., Tzvetkova-Robev, D., Erdjument-Bromage, H., Tempst, P. & Nikolov, D. B. (2006). Highly efficient selenomethionine labeling of recombinant proteins produced in mammalian cells. Protein Science 15(8), 2008–2013.

Basle, E., Joubert, N. & Pucheault, M. (2010). Protein chemical modification on endogenous amino acids. Chemistry & Biology 17(3), 213–227.

Batjargal, S., Walters, C. R. & Petersson, E. J. (2015). Inteins as traceless purification tags for unnatural amino acid proteins. Journal of the American Chemical Society 137(5), 1734–1737.

Bazewicz, C. G., Liskov, M. T., Hines, K. J. & Brewer, S. H. (2013). Sensitive, site-specific, and stable vibrational probe of local protein environments: 4-azidomethyl-L-phenylalanine. Journal of Physical Chemistry B 117(30), 8987–8993.

Beck, M. & Baumeister, W. (2016). Cryo-electron tomography: can it reveal the molecular sociology of cells in atomic detail? Trends in Cell Biology 26(11), 825–837.

Ben-Shem, A., Garreau De Loubresse, N., Melnikov, S., Jenner, L., Yusupova, G. & Yusupov, M. (2011). The structure of the eukaryotic ribosome at 3.0 Å resolution. Science 334(6062), 1524–1529.

Bernardes, G. J., Linderoth, L., Doores, K. J., Boutureira, O. & Davis, B. G. (2011). Site-selective traceless Staudinger ligation for glycoprotein synthesis reveals scope and limitations. ChemBioChem 12(9), 1383–1386.

Best, M., Degen, A., Baalmann, M., Schmidt, T. T. & Wombacher, R. (2015). Two-step protein labeling by using lipoic acid ligase with norbornene substrates and subsequent inverse-electron demand Diels–Alder reaction. ChemBioChem 16(8), 1158–1162.

Bianco, A., Townsley, F. M., Greiss, S., Lang, K. & Chin, J. W. (2012). Expanding the genetic code of Drosophila melanogaster. Nature Chemical Biology 8(9), 748–750.

Blackman, M. L., Royzen, M. & Fox, J. M. (2008). Tetrazine ligation: fast bioconjugation based on inverse-electron-demand Diels-Alder reactivity. Journal of the American Chemical Society 130(41), 13518–13519.

Brilot, A. F., Chen, J. Z., Cheng, A., Pan, J., Harrison, S. C., Potter, C. S., Carragher, B., Henderson, R. & Grigorieff, N. (2012). Beam-induced motion of vitrified specimen on holey carbon film. Journal of Structural Biology 177(3), 630–637.

Brocchieri, L. & Karlin, S. (2005). Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Research 33(10), 3390–3400.

Brustad, E. M., Lemke, E. A., Schultz, P. G. & Deniz, A. A. (2008). A general and efficient method for the site-specific dual-labeling of proteins for single molecule fluorescence resonance energy transfer. Journal of the American Chemical Society 130(52), 17664–17665.

Burz, D. S., Dutta, K., Cowburn, D. & Shekhtman, A. (2006). Mapping structural interactions using in-cell NMR spectroscopy (STINT-NMR). Nature Methods 3(2), 91–93.

Butterfield, S., Hejjaoui, M., Fauvet, B., Awad, L. & Lashuel, H. A. (2012). Chemical strategies for controlling protein folding and elucidating the molecular mechanisms of amyloid formation and toxicity. Journal of Molecular Biology 421(2–3), 204–236.

Cady, S. D., Schmidt-Rohr, K., Wang, J., Soto, C. S., DeGrado, W. F. & Hong, M. (2010). Structure of the amantadine binding site of influenza M2 proton channels in lipid bilayers. Nature 463(7281), 689–692.

Campbell, M. G., Veesler, D., Cheng, A., Potter, C. S. & Carragher, B. (2015). 2.8 Å resolution reconstruction of the Thermoplasma acidophilum 20S proteasome using cryo-electron microscopy. Elife 4, e06380.

Carvajal-Vallejos, P., Pallisse, R., Mootz, H. D. & Schmidt, S. R. (2012). Unprecedented rates and efficiencies revealed for new natural split inteins from metagenomic sources. Journal of Biological Chemistry 287(34), 28686–28696.

Caspi, J., Amitai, G., Belenkiy, O. & Pietrokovski, S. (2003). Distribution of split DnaE inteins in cyanobacteria. Molecular Microbiology 50(5), 1569–1577.

Castaneda, C., Liu, J., Chaturvedi, A., Nowicka, U., Cropp, T. A. & Fushman, D. (2011a). Nonenzymatic assembly of natural polyubiquitin chains of any linkage composition and isotopic labeling scheme. Journal of the American Chemical Society 133(44), 17855–17868.

Castaneda, C. A., Spasser, L., Bavikar, S. N., Brik, A. & Fushman, D. (2011b). Segmental isotopic labeling of ubiquitin chains to unravel monomer-specific molecular behavior. Angewandte Chemie (International edition in English) 50(47), 11210–11214.

Cellitti, S. E., Jones, D. H., Lagpacan, L., Hao, X., Zhang, Q., Hu, H., Brittain, S. M., Brinker, A., Caldwell, J., Bursulaya, B., Spraggon, G., Brock, A., Ryu, Y., Uno, T., Schultz, P. G. & Geierstanger, B. H. (2008). In vivo incorporation of unnatural amino acids to probe structure, dynamics, and ligand binding in a large protein by nuclear magnetic resonance spectroscopy. Journal of the American Chemical Society 130(29), 9268–9281.

Cergol, K. M., Thompson, R. E., Malins, L. R., Turner, P. & Payne, R. J. (2014). One-pot peptide ligation-desulfurization at glutamate. Organic Letters 16(1), 290–293.

Chakraborty, A., Mazumder, A., Lin, M., Hasemeyer, A., Xu, Q., Wang, D., Ebright, Y. W. & Ebright, R. H. (2015). Site-specific incorporation of probes into RNA polymerase by unnatural-amino-acid mutagenesis and Staudinger–Bertozzi ligation. Methods in Molecular Biology 1276, 101–131.

Chalker, J. M., Bernardes, G. J., Lin, Y. A. & Davis, B. G. (2009). Chemical modification of proteins at cysteine: opportunities in chemistry and biology. Chemistry, an Asian Journal 4(5), 630–640.

Chalker, J. M., Lercher, L., Rose, N. R., Schofield, C. J. & Davis, B. G. (2012). Conversion of cysteine into dehydroalanine enables access to synthetic histones bearing diverse post-translational modifications. Angewandte Chemie (International edition in English) 51(8), 1835–1839.

Chan, A. O., Ho, C. M., Chong, H. C., Leung, Y. C., Huang, J. S., Wong, M. K. & Che, C. M. (2012). Modification of N-terminal alpha-amino groups of peptides and proteins using ketenes. Journal of the American Chemical Society 134(5), 2589–2598.

Chatterjee, A., Guo, J., Lee, H. S. & Schultz, P. G. (2013a). A genetically encoded fluorescent probe in mammalian cells. Journal of the American Chemical Society 135(34), 12540–12543.

Chatterjee, A., Sun, S. B., Furman, J. L., Xiao, H. & Schultz, P. G. (2013b). A versatile platform for single- and multiple-unnatural amino acid mutagenesis in Escherichia coli. Biochemistry 52(10), 1828–1837.

Chatterjee, A., Xiao, H., Bollong, M., Ai, H. W. & Schultz, P. G. (2013c). Efficient viral delivery system for unnatural amino acid mutagenesis in mammalian cells. Proceedings of the National Academy of Sciences of the United States of America 110(29), 11803–11808.

Chatterjee, C., McGinty, R. K., Fierz, B. & Muir, T. W. (2010). Disulfide-directed histone ubiquitylation reveals plasticity in hDot1L activation. Nature Chemical Biology 6(4), 267–269.

Chen, H., Viel, S., Ziarelli, F. & Peng, L. (2013). 19F NMR: a valuable tool for studying biological events. Chemical Society Reviews 42(20), 7971–7982.

Chen, I., Dorr, B. M. & Liu, D. R. (2011). A general strategy for the evolution of bond-forming enzymes using yeast display. Proceedings of the National Academy of Sciences of the United States of America 108(28), 11399–11404.

Chen, J., Ai, Y., Wang, J., Haracska, L. & Zhuang, Z. (2010a). Chemically ubiquitylated PCNA as a probe for eukaryotic translesion DNA synthesis. Nature Chemical Biology 6(4), 270–272.

Chen, J., Wan, Q., Yuan, Y., Zhu, J. L. & Danishefsky, S. J. (2008). Native chemical ligation at valine: a contribution to peptide and glycopeptide synthesis. Angewandte Chemie (International edition in English) 47(44), 8521–8524.

Chen, J., Wang, P., Zhu, J. L., Wan, Q. & Danishefsky, S. J. (2010b). A program for ligation at threonine sites: application to the controlled total synthesis of glycopeptides. Tetrahedron 66(13), 2277–2283.

Chen, P. R., Groff, D., Guo, J., Ou, W., Cellitti, S., Geierstanger, B. H. & Schultz, P. G. (2009). A facile system for encoding unnatural amino acids in mammalian cells. Angewandte Chemie (International edition in English) 48(22), 4052–4055.

Chen, W. N., Kuppan, K. V., Lee, M. D., Jaudzems, K., Huber, T. & Otting, G. (2015). O-tert-Butyltyrosine, an NMR tag for high-molecular-weight systems and measurements of submicromolar ligand binding affinities. Journal of the American Chemical Society 137(13), 4581–4586.

Cheng, Y. (2015). Single-particle cryo-EM at crystallographic resolution. Cell 161(3), 450–457.

Cheriyan, M., Pedamallu, C. S., Tori, K. & Perler, F. (2013). Faster protein splicing with the Nostoc punctiforme DnaE intein using non-native extein residues. Journal of Biological Chemistry 288(9), 6202–6211.

Chin, J. W. (2014). Expanding and reprogramming the genetic code of cells and animals. Annual Review of Biochemistry 83, 379–408.

Chin, J. W., Martin, A. B., King, D. S., Wang, L. & Schultz, P. G. (2002a). Addition of a photocrosslinking amino acid to the genetic code of Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America 99(17), 11020–11024.

Chin, J. W., Santoro, S. W., Martin, A. B., King, D. S., Wang, L. & Schultz, P. G. (2002b). Addition of p-azido-L-phenylalanine to the genetic code of Escherichia coli. Journal of the American Chemical Society 124(31), 9026–9027.

Chou, C. J., Uprety, R., Davis, L., Chin, J. W. & Deiters, A. (2011). Genetically encoding an aliphatic diazirine for protein photocrosslinking. Chemical Science 2(3), 480–483.

Chow, W. Y., Rajan, R., Muller, K. H., Reid, D. G., Skepper, J. N., Wong, W. C., Brooks, R. A., Green, M., Bihan, D., Farndale, R. W., Slatter, D. A., Shanahan, C. M. & Duer, M. J. (2014). NMR spectroscopy of native and in vitro tissues implicates polyADP ribose in biomineralization. Science 344(6185), 742–746.

Cohen, J. D., Zou, P. & Ting, A. Y. (2012). Site-specific protein modification using lipoic acid ligase and bis-aryl hydrazone formation. ChemBioChem 13(6), 888–894.

Cohen, S. & Arbely, E. (2016). Single-plasmid-based system for efficient noncanonical amino acid mutagenesis in cultured mammalian cells. ChemBioChem 17(11), 1008–1011.

Cornish, V. W., Benson, D. R., Altenbach, C. A., Hideg, K., Hubbell, W. L. & Schultz, P. G. (1994). Site-specific incorporation of biophysical probes into proteins. Proceedings of the National Academy of Sciences of the United States of America 91(8), 2910–2914.

Courter, J. R., Abdo, M., Brown, S. P., Tucker, M. J., Hochstrasser, R. M. & Smith, A. B. III (2014). The design and synthesis of alanine-rich alpha-helical peptides constrained by an S,S-tetrazine photochemical trigger: a fragment union approach. Journal of Organic Chemistry 79(2), 759–768.

Cramer, P. (2016). Structure determination of transient transcription complexes. Biochemical Society Transactions 44(4), 1177–1182.

Crich, D. & Banerjee, A. (2007). Native chemical ligation at phenylalanine. Journal of the American Chemical Society 129(33), 10064.

Cronin, C. N., Lim, K. B. & Rogers, J. (2007). Production of selenomethionyl-derivatized proteins in baculovirus-infected insect cells. Protein Science 16(9), 2023–2029.

Cui, G., Park, S., Badeaux, A. I., Kim, D., Lee, J., Thompson, J. R., Yan, F., Kaneko, S., Yuan, Z., Botuyan, M. V., Bedford, M. T., Cheng, J. Q. & Mer, G. (2012). PHF20 is an effector protein of p53 double lysine methylation that stabilizes and activates p53. Nature Structural & Molecular Biology 19(9), 916–924.

David, Y., Vila-Perello, M., Verma, S. & Muir, T. W. (2015). Chemical tagging and customizing of cellular chromatin states using ultrafast trans-splicing inteins. Nature Chemistry 7(5), 394–402.

Davis, C. M., Cooper, A. K. & Dyer, R. B. (2015). Fast helix formation in the B domain of protein A revealed by site-specific infrared probes. Biochemistry 54(9), 1758–1766.

Dawson, P. E. (2011). Native chemical ligation combined with desulfurization and deselenization: a general strategy for chemical protein synthesis. Israel Journal of Chemistry 51(8–9), 862–867.

Dawson, P. E. & Kent, S. B. (2000). Synthesis of native proteins by chemical ligation. Annual Review of Biochemistry 69, 923–960.

Dawson, P. E., Muir, T. W., Clark-Lewis, I. & Kent, S. B. (1994). Synthesis of proteins by native chemical ligation. Science 266(5186), 776–779.

Debelouchina, G. T., Bayro, M. J., Fitzpatrick, A. W., Ladizhansky, V., Colvin, M. T., Caporini, M. A., Jaroniec, C. P., Bajaj, V. S., Rosay, M., MacPhee, C. E., Vendruscolo, M., Maas, W. E., Dobson, C. M. & Griffin, R. G. (2013). Higher order amyloid fibril structure by MAS NMR and DNP spectroscopy. Journal of the American Chemical Society 135(51), 19237–19247.

Debelouchina, G. T., Gerecht, K. & Muir, T. W. (2017). Ubiquitin utilizes an acidic surface patch to alter chromatin structure. Nature Chemical Biology 13(1), 105–110.

De Rosier, D. J. & Klug, A. (1968). Reconstruction of three dimensional structures from electron micrographs. Nature 217(5124), 130–134.

Devaraj, N. K., Weissleder, R. & Hilderbrand, S. A. (2008). Tetrazine-based cycloadditions: application to pretargeted live cell imaging. Bioconjugate Chemistry 19(12), 2297–2299.

Dhayalan, B., Fitzpatrick, A., Mandal, K., Whittaker, J., Weiss, M. A., Tokmakoff, A. & Kent, S. B. (2016). Efficient total chemical synthesis of (13)C=(18)O isotopomers of human insulin for isotope-edited FTIR. ChemBioChem 17(5), 415–420.

Dimura, M., Peulen, T. O., Hanke, C. A., Prakash, A., Gohlke, H. & Seidel, C. A. (2016). Quantitative FRET studies and integrative modeling unravel the structure and dynamics of biomolecular systems. Current Opinion in Structural Biology 40, 163–185.

Dirksen, A. & Dawson, P. E. (2008). Rapid oxime and hydrazone ligations with aromatic aldehydes for biomolecular labeling. Bioconjugate Chemistry 19(12), 2543–2548.

Dommerholt, J., Schmidt, S., Temming, R., Hendriks, L. J., Rutjes, F. P., Van Hest, J. C., Lefeber, D. J., Friedl, P. & Van Delft, F. L. (2010). Readily accessible bicyclononynes for bioorthogonal labeling and three-dimensional imaging of living cells. Angewandte Chemie (International edition in English) 49(49), 9422–9425.

Dorr, B. M., Ham, H. O., An, C., Chaikof, E. L. & Liu, D. R. (2014). Reprogramming the specificity of sortase enzymes. Proceedings of the National Academy of Sciences of the United States of America 111(37), 13343–13348.

El Oualid, F., Merkx, R., Ekkebus, R., Hameed, D. S., Smit, J. J., De Jong, A., Hilkmann, H., Sixma, T. K. & Ovaa, H. (2010). Chemical synthesis of ubiquitin, ubiquitin-based probes, and diubiquitin. Angewandte Chemie (International edition in English) 49(52), 10149–10153.

Elsasser, S. J., Ernst, R. J., Walker, O. S. & Chin, J. W. (2016). Genetic code expansion in stable cell lines enables encoded chromatin modification. Nature Methods 13(2), 158–164.

Elsohly, A. M. & Francis, M. B. (2015). Development of oxidative coupling strategies for site-selective protein modification. Accounts of Chemical Research 48(7), 1971–1978.

England, P. M., Zhang, Y., Dougherty, D. A. & Lester, H. A. (1999). Backbone mutations in transmembrane domains of a ligand-gated ion channel: implications for the mechanism of gating. Cell 96(1), 89–98.

Ernst, R. J., Krogager, T. P., Maywood, E. S., Zanchi, R., Beranek, V., Elliott, T. S., Barry, N. P., Hastings, M. H. & Chin, J. W. (2016). Genetic code expansion in the mouse brain. Nature Chemical Biology 12(10), 776–778.

Evans, E. G. & Millhauser, G. L. (2015). Genetic incorporation of the unnatural amino acid p-acetyl phenylalanine into proteins for site-directed spin labeling. Methods in Enzymology 563, 503–527.

Fan, C., Ip, K. & Soll, D. (2016). Expanding the genetic code of Escherichia coli with phosphotyrosine. FEBS Letters 590(17), 3040–3047.

Fawzi, N. L., Ying, J., Ghirlando, R., Torchia, D. A. & Clore, G. M. (2011). Atomic-resolution dynamics on the surface of amyloid-beta protofibrils probed by solution NMR. Nature 480(7376), 268–272.

Fernandez, I. S., Bai, X. C., Hussain, T., Kelley, A. C., Lorsch, J. R., Ramakrishnan, V. & Scheres, S. H. (2013). Molecular architecture of a eukaryotic translational initiation complex. Science 342(6160), 1240585.

Ficht, S., Payne, R. J., Brik, A. & Wong, C. H. (2007). Second-generation sugar-assisted ligation: a method for the synthesis of cysteine-containing glycopeptides. Angewandte Chemie (International edition in English) 46(31), 5975–5979.

Fitzpatrick, A. W., Debelouchina, G. T., Bayro, M. J., Clare, D. K., Caporini, M. A., Bajaj, V. S., Jaroniec, C. P., Wang, L., Ladizhansky, V., Muller, S. A., MacPhee, C. E., Waudby, C. A., Mott, H. R., De Simone, A., Knowles, T. P., Saibil, H. R., Vendruscolo, M., Orlova, E. V., Griffin, R. G. & Dobson, C. M. (2013). Atomic structure and hierarchical assembly of a cross-beta amyloid fibril. Proceedings of the National Academy of Sciences of the United States of America 110(14), 5468–5473.

Fleissner, M. R., Brustad, E. M., Kalai, T., Altenbach, C., Cascio, D., Peters, F. B., Hideg, K., Peuker, S., Schultz, P. G. & Hubbell, W. L. (2009). Site-directed spin labeling of a genetically encoded unnatural amino acid. Proceedings of the National Academy of Sciences of the United States of America 106(51), 21637–21642.

Frederick, K. K., Michaelis, V. K., Corzilius, B., Ong, T. C., Jacavone, A. C., Griffin, R. G. & Lindquist, S. (2015). Sensitivity-enhanced NMR reveals alterations in protein structure by cellular milieus. Cell 163(3), 620–628.

Freedberg, D. I. & Selenko, P. (2014). Live cell NMR. Annual Review of Biophysics 43, 171–192.

Freiburger, L., Sonntag, M., Hennig, J., Li, J., Zou, P. & Sattler, M. (2015). Efficient segmental isotope labeling of multi-domain proteins using Sortase A. Journal of Biomolecular NMR 63(1), 1–8.

Frutos, S., Goger, M., Giovani, B., Cowburn, D. & Muir, T. W. (2010). Branched intermediate formation stimulates peptide bond cleavage in protein splicing. Nature Chemical Biology 6(7), 527–533.

Gamblin, D. P., Van Kasteren, S., Bernardes, G. J., Chalker, J. M., Oldham, N. J., Fairbanks, A. J. & Davis, B. G. (2008). Chemical site-selective prenylation of proteins. Molecular BioSystems 4(6), 558–561.

Gattner, M. J., Vrabel, M. & Carell, T. (2013). Synthesis of epsilon-N-propionyl-, epsilon-N-butyryl-, and epsilon-N-crotonyl-lysine containing histone H3 using the pyrrolysine system. Chemical Communications (Cambridge) 49(4), 379–381.

George, S., Aguirre, J. D., Spratt, D. E., Bi, Y., Jeffery, M., Shaw, G. S. & O'Donoghue, P. (2016). Generation of phospho-ubiquitin variants by orthogonal translation reveals codon skipping. FEBS Letters 590(10), 1530–1542.

Gilles, M. A., Hudson, A. Q. & Borders, C. L. Jr. (1990). Stability of water-soluble carbodiimides in aqueous solution. Analytical Biochemistry 184(2), 244–248.

Glasgow, J. E., Salit, M. L. & Cochran, J. R. (2016). In vivo site-specific protein tagging with diverse amines using an engineered sortase variant. Journal of the American Chemical Society 138(24), 7496–7499.

Goto, Y., Katoh, T. & Suga, H. (2011). Flexizymes for genetic code reprogramming. Nature Protocols 6(6), 779–790.

Grayson, E. J., Ward, S. J., Hall, A. L., Rendle, P. M., Gamblin, D. P., Batsanov, A. S. & Davis, B. G. (2005). Glycosyl disulfides: novel glycosylating reagents with flexible aglycon alteration. Journal of Organic Chemistry 70(24), 9740–9754.

Greiss, S. & Chin, J. W. (2011). Expanding the genetic code of an animal. Journal of the American Chemical Society 133(36), 14196–14199.

Griffin, B. A., Adams, S. R. & Tsien, R. Y. (1998). Specific covalent labeling of recombinant protein molecules inside live cells. Science 281(5374), 269–272.

Grosse, W., Essen, L. O. & Koert, U. (2011). Strategies and perspectives in ion-channel engineering. ChemBioChem 12(6), 830–839.

Guan, D., Ramirez, M. & Chen, Z. (2013). Split intein mediated ultra-rapid purification of tagless protein (SIRP). Biotechnology and Bioengineering 110(9), 2471–2481.

Guimaraes, C. P., Witte, M. D., Theile, C. S., Bozkurt, G., Kundrat, L., Blom, A. E. & Ploegh, H. L. (2013). Site-specific C-terminal and internal loop labeling of proteins using sortase-mediated reactions. Nature Protocols 8(9), 1787–1799.

Haase, C., Rohde, H. & Seitz, O. (2008). Native chemical ligation at valine. Angewandte Chemie (International edition in English) 47(36), 6807–6810.Google Scholar

Hancock, S. M., Uprety, R., Deiters, A. & Chin, J. W. (2010). Expanding the genetic code of yeast for incorporation of diverse unnatural amino acids via a pyrrolysyl-tRNA synthetase/tRNA pair. Journal of the American Chemical Society 132(42), 14819–14824.Google Scholar

Haney, C. M., Wissner, R. F. & Petersson, E. J. (2015). Multiply labeling proteins for studies of folding and stability. Current Opinion in Chemical Biology 28, 123–130.Google Scholar

Haney, C. M., Wissner, R. F., Warner, J. B., Wang, Y. J., Ferrie, J. J., Covell, D. J., Karpowicz, R. J., LEE, V. M. & Petersson, E. J. (2016). Comparison of strategies for non-perturbing labeling of alpha-synuclein to study amyloidogenesis. Organic & Biomolecular 14(5), 1584–1592.Google Scholar

Harmand, T. J., Murar, C. E. & Bode, J. W. (2014). New chemistries for chemoselective peptide ligations and the total synthesis of proteins. Current Opinion in Chemical Biology 22, 115–121.Google Scholar

Harpaz, Z., Siman, P., Kumar, K. S. A. & Brik, A. (2010). Protein synthesis assisted by native chemical ligation at leucine. ChemBioChem 11(9), 1232–1235.Google Scholar

Hauke, S., Best, M., Schmidt, T. T., Baalmann, M., Krause, A. & Wombacher, R. (2014). Two-step protein labeling utilizing lipoic acid ligase and Sonogashira cross-coupling. Bioconjugate Chemistry 25(9), 1632–1637.Google Scholar

He, Y., Fang, J., Taatjes, D. J. & Nogales, E. (2013). Structural visualization of key steps in human transcription initiation. Nature 495(7442), 481–486.Google Scholar

Hecht, S. M., Alford, B. L., Kuroda, Y. & Kitano, S. (1978). ‘Chemical aminoacylation’ of tRNA's. Journal of Biological Chemistry 253(13), 4517–4520.Google Scholar

Hejjaoui, M., Butterfield, S., Fauvet, B., Vercruysse, F., Cui, J., Dikiy, I., Prudent, M., Olschewski, D., ZHANG, Y., Eliezer, D. & Lashuel, H. A. (2012). Elucidating the role of C-terminal post-translational modifications using protein semisynthesis strategies: alpha-synuclein phosphorylation at tyrosine 125. Journal of the American Chemical Society 134(11), 5196–5210.Google Scholar

Hendrickson, W. A., Horton, J. R. & Lemaster, D. M. (1990). Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD): a vehicle for direct determination of three-dimensional structure. EMBO Journal 9(5), 1665–1672.Google Scholar

Hirakawa, H., Ishikawa, S. & Nagamune, T. (2015). Ca²⁺ -independent sortase-A exhibits high selective protein ligation activity in the cytoplasm of Escherichia coli . Biotechnol Journal 10(9), 1487–1492.Google Scholar

Hirata, R., Ohsumk, Y., Nakano, A., Kawasaki, H., Suzuki, K. & Anraku, Y. (1990). Molecular structure of a gene, VMA1, encoding the catalytic subunit of H(+)-translocating adenosine triphosphatase from vacuolar membranes of Saccharomyces cerevisiae . Journal of Biological Chemistry 265(12), 6726–6733.Google Scholar

Hoffmann, C., Gaietta, G., Bunemann, M., Adams, S. R., Oberdorff-Maass, S., Behr, B., Vilardaga, J. P., Tsien, R. Y., Ellisman, M. H. & Lohse, M. J. (2005). A FlAsH-based FRET approach to determine G protein-coupled receptor activation in living cells. Nature Methods 2(3), 171–176.Google Scholar

Holding, A. N. (2015). XL-MS: protein cross-linking coupled with mass spectrometry. Methods 89, 54–63.Google Scholar

Hondal, R. J., Nilsson, B. L. & Raines, R. T. (2001). Selenocysteine in native chemical ligation and expressed protein ligation. Journal of the American Chemical Society 123(21), 5140–5141.Google Scholar

Hu, Q. Y., Berti, F. & Adamo, R. (2016). Towards the next generation of biomedicines by site-selective conjugation. Chemical Society Reviews 45(6), 1691–1719.Google Scholar

Huber, T., Naganathan, S., Tian, H., Ye, S. & Sakmar, T. P. (2013). Unnatural amino acid mutagenesis of GPCRs using amber codon suppression and bioorthogonal labeling. Methods in Enzymology 520, 281–305.Google Scholar

Huguenin-Dezot, N., De Cesare, V., Peltier, J., Knebel, A., Kristaryianto, Y. A., Rogerson, D. T., Kulathu, Y., Trost, M. & Chin, J. W. (2016). Synthesis of isomeric phosphoubiquitin chains reveals that phosphorylation controls deubiquitinase activity and specificity. Cell Reports 16(4), 1180–1193.Google Scholar

Husada, F., Gouridis, G., Vietrov, R., Schuurman-Wolters, G. K., Ploetz, E., De Boer, M., Poolman, B. & Cordes, T. (2015). Watching conformational dynamics of ABC transporters with single-molecule tools. Biochemical Society Transactions 43(5), 1041–1047.Google Scholar

Hyman, A. A., Weber, C. A. & Julicher, F. (2014). Liquid-liquid phase separation in biology. Annual Review of Cell and Developmental Biology 30, 39–58.

Igumenova, T. I., McDermott, A. E., Zilm, K. W., Martin, R. W., Paulson, E. K. & Wand, A. J. (2004). Assignments of carbon NMR resonances for microcrystalline ubiquitin. Journal of the American Chemical Society 126(21), 6720–6727.

Inomata, K., Ohno, A., Tochio, H., Isogai, S., Tenno, T., Nakase, I., Takeuchi, T., Futaki, S., Ito, Y., Hiroaki, H. & Shirakawa, M. (2009). High-resolution multi-dimensional NMR spectroscopy of proteins in human cells. Nature 458(7234), 106–109.

Iwai, H., Zuger, S., Jin, J. & Tam, P. H. (2006). Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme. FEBS Letters 580(7), 1853–1858.

Jackson, J. C., Hammill, J. T. & Mehl, R. A. (2007). Site-specific incorporation of a (19)F-amino acid into proteins as an NMR probe for characterizing protein structure and reactivity. Journal of the American Chemical Society 129(5), 1160–1166.

Jentoft, N. & Dearborn, D. G. (1979). Labeling of proteins by reductive methylation using sodium cyanoborohydride. Journal of Biological Chemistry 254(11), 4359–4365.

Johnson, D. B., Xu, J., Shen, Z., Takimoto, J. K., Schultz, M. D., Schmitz, R. J., Xiang, Z., Ecker, J. R., Briggs, S. P. & Wang, L. (2011). RF1 knockout allows ribosomal incorporation of unnatural amino acids at multiple sites. Nature Chemical Biology 7(11), 779–786.

Jones, D. H., Cellitti, S. E., Hao, X., Zhang, Q., Jahnz, M., Summerer, D., Schultz, P. G., Uno, T. & Geierstanger, B. H. (2010). Site-specific labeling of proteins with NMR-active unnatural amino acids. Journal of Biomolecular NMR 46(1), 89–100.

Joshi, N. S., Whitaker, L. R. & Francis, M. B. (2004). A three-component Mannich-type reaction for selective tyrosine bioconjugation. Journal of the American Chemical Society 126(49), 15942–15943.

Judice, J. K., Gamble, T. R., Murphy, E. C., De Vos, A. M. & Schultz, P. G. (1993). Probing the mechanism of staphylococcal nuclease with unnatural amino acids: kinetic and structural studies. Science 261(5128), 1578–1581.

Kalkhof, S. & Sinz, A. (2008). Chances and pitfalls of chemical cross-linking with amine-reactive N-hydroxysuccinimide esters. Analytical and Bioanalytical Chemistry 392(1–2), 305–312.

Kane, P. M., Yamashiro, C. T., Wolczyk, D. F., Neff, N., Goebl, M. & Stevens, T. H. (1990). Protein splicing converts the yeast TFP1 gene product to the 69-kD subunit of the vacuolar H(+)-adenosine triphosphatase. Science 250(4981), 651–657.

Kang, J. Y., Kawaguchi, D., Coin, I., Xiang, Z., O'Leary, D. D., Slesinger, P. A. & Wang, L. (2013). In vivo expression of a light-activatable potassium channel using unnatural amino acids. Neuron 80(2), 358–370.

Kendrew, J. C., Bodo, G., Dintzis, H. M., Parrish, R. G., Wyckoff, H. & Phillips, D. C. (1958). A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature 181(4610), 662–666.

Kent, S., Sohma, Y., Liu, S., Bang, D., Pentelute, B. & Mandal, K. (2012). Through the looking glass: a new world of proteins enabled by chemical synthesis. Journal of Peptide Science 18(7), 428–436.

Kim, C. H., Kang, M., Kim, H. J., Chatterjee, A. & Schultz, P. G. (2012). Site-specific incorporation of epsilon-N-crotonyllysine into histones. Angewandte Chemie (International edition in English) 51(29), 7246–7249.

Kobashigawa, Y., Kumeta, H., Ogura, K. & Inagaki, F. (2009). Attachment of an NMR-invisible solubility enhancement tag using a sortase-mediated protein ligation method. Journal of Biomolecular NMR 43(3), 145–150.

Koehler, C., Sauter, P. F., Wawryszyn, M., Girona, G. E., Gupta, K., Landry, J. J., Fritz, M. H., Radic, K., Hoffmann, J. E., Chen, Z. A., Zou, J., Tan, P. S., Galik, B., Junttila, S., Stolt-Bergner, P., Pruneri, G., Gyenesei, A., Schultz, C., Biskup, M. B., Besir, H., Benes, V., Rappsilber, J., Jechlinger, M., Korbel, J. O., Berger, I., Braese, S. & Lemke, E. A. (2016). Genetic code expansion for multiprotein complex engineering. Nature Methods 13, 997–1000.

Koh, J. T., Cornish, V. W. & Schultz, P. G. (1997). An experimental approach to evaluating the role of backbone interactions in proteins using unnatural amino acid mutagenesis. Biochemistry 36(38), 11314–11322.

Kohrer, C., Yoo, J. H., Bennett, M., Schaack, J. & RajBhandary, U. L. (2003). A possible approach to site-specific insertion of two different unnatural amino acids into proteins in mammalian cells via nonsense suppression. Chemistry & Biology 10(11), 1095–1102.

Kolb, H. C., Finn, M. G. & Sharpless, K. B. (2001). Click chemistry: diverse chemical function from a few good reactions. Angewandte Chemie (International edition in English) 40(11), 2004–2021.

Kucher, S., Korneev, S., Tyagi, S., Apfelbaum, R., Grohmann, D., Lemke, E. A., Klare, J. P., Steinhoff, H. J. & Klose, D. (2016). Orthogonal spin labeling using click chemistry for in vitro and in vivo applications. Journal of Magnetic Resonance 275, 38–45.

Kuhlmann, N., Wroblowski, S., Knyphausen, P., De Boor, S., Brenig, J., Zienert, A. Y., Meyer-Teschendorf, K., Praefcke, G. J., Nolte, H., Kruger, M., Schacherl, M., Baumann, U., James, L. C., Chin, J. W. & Lammers, M. (2016). Structural and mechanistic insights into the regulation of the fundamental Rho regulator RhoGDIalpha by lysine acetylation. Journal of Biological Chemistry 291(11), 5484–5499.

Kuhn, S. M., Rubini, M., Muller, M. A. & Skerra, A. (2011). Biosynthesis of a fluorescent protein with extreme pseudo-Stokes shift by introducing a genetically encoded non-natural amino acid outside the fluorophore. Journal of the American Chemical Society 133(11), 3708–3711.

Kumar, K. S. A., Haj-Yahya, M., Olschewski, D., Lashuel, H. A. & Brik, A. (2009). Highly efficient and chemoselective peptide ubiquitylation. Angewandte Chemie (International edition in English) 48(43), 8090–8094.

Kwon, B., Tietze, D., White, P. B., Liao, S. Y. & Hong, M. (2015). Chemical ligation of the influenza M2 protein for solid-state NMR characterization of the cytoplasmic domain. Protein Science 24(7), 1087–1099.

Kwon, I., Wang, P. & Tirrell, D. A. (2006). Design of a bacterial host for site-specific incorporation of p-bromophenylalanine into recombinant proteins. Journal of the American Chemical Society 128(36), 11778–11783.

Lajoie, M. J., Rovner, A. J., Goodman, D. B., Aerni, H. R., Haimovich, A. D., Kuznetsov, G., Mercer, J. A., Wang, H. H., Carr, P. A., Mosberg, J. A., Rohland, N., Schultz, P. G., Jacobson, J. M., Rinehart, J., Church, G. M. & Isaacs, F. J. (2013). Genomically recoded organisms expand biological functions. Science 342(6156), 357–360.

Lammers, M., Neumann, H., Chin, J. W. & James, L. C. (2010). Acetylation regulates cyclophilin A catalysis, immunosuppression and HIV isomerization. Nature Chemical Biology 6(5), 331–337.

Lang, K. & Chin, J. W. (2014). Cellular incorporation of unnatural amino acids and bioorthogonal labeling of proteins. Chemical Reviews 114(9), 4764–4806.

Lang, K., Davis, L., Torres-Kolbus, J., Chou, C., Deiters, A. & Chin, J. W. (2012a). Genetically encoded norbornene directs site-specific cellular protein labelling via a rapid bioorthogonal reaction. Nature Chemistry 4(4), 298–304.

Lang, K., Davis, L., Wallace, S., Mahesh, M., Cox, D. J., Blackman, M. L., Fox, J. M. & Chin, J. W. (2012b). Genetic encoding of bicyclononynes and trans-cyclooctenes for site-specific protein labeling in vitro and in live mammalian cells via rapid fluorogenic Diels-Alder reactions. Journal of the American Chemical Society 134(25), 10317–10320.

Laughlin, S. T., Baskin, J. M., Amacher, S. L. & Bertozzi, C. R. (2008). In vivo imaging of membrane-associated glycans in developing zebrafish. Science 320(5876), 664–667.

Lavergne, T., Lamichhane, R., Malyshev, D. A., Li, Z., Li, L., Sperling, E., Williamson, J. R., Millar, D. P. & Romesberg, F. E. (2016). FRET characterization of complex conformational changes in a large 16S ribosomal RNA fragment site-specifically labeled using unnatural base pairs. ACS Chemical Biology 11(5), 1347–1353.

Lee, H. S., Guo, J., Lemke, E. A., Dimla, R. D. & Schultz, P. G. (2009a). Genetic incorporation of a small, environmentally sensitive, fluorescent probe into proteins in Saccharomyces cerevisiae. Journal of the American Chemical Society 131(36), 12921–12923.

Lee, H. S., Spraggon, G., Schultz, P. G. & Wang, F. (2009b). Genetic incorporation of a metal-ion chelating amino acid into proteins as a biophysical probe. Journal of the American Chemical Society 131(7), 2481–2483.

Leitner, A., Faini, M., Stengel, F. & Aebersold, R. (2016). Crosslinking and mass spectrometry: an integrated technology to understand the structure and function of molecular machines. Trends in Biochemical Sciences 41(1), 20–32.

Lemieux, G. A., De Graffenried, C. L. & Bertozzi, C. R. (2003). A fluorogenic dye activated by the Staudinger ligation. Journal of the American Chemical Society 125(16), 4708–4709.

Lennard, K. R. & Tavassoli, A. (2014). Peptides come round: using SICLOPPS libraries for early stage drug discovery. Chemistry 20(34), 10608–10614.

Le Sueur, A. L., Horness, R. E. & Thielges, M. C. (2015). Applications of two-dimensional infrared spectroscopy. Analyst 140(13), 4336–4349.

Li, D., Pye, V. E. & Caffrey, M. (2015). Experimental phasing for structure determination using membrane-protein crystals grown by the lipid cubic phase method. Acta Crystallographica D, Biological Crystallography 71(Pt 1), 104–122.

Li, F., Shi, P., Li, J., Yang, F., Wang, T., Zhang, W., Gao, F., Ding, W., Li, D., Li, J., Xiong, Y., Sun, J., Gong, W., Tian, C. & Wang, J. (2013a). A genetically encoded 19F NMR probe for tyrosine phosphorylation. Angewandte Chemie (International edition in English) 52(14), 3958–3962.

Li, F., Zhang, H., Sun, Y., Pan, Y., Zhou, J. & Wang, J. (2013b). Expanding the genetic code for photoclick chemistry in E. coli, mammalian cells, and A. thaliana. Angewandte Chemie (International edition in English) 52(37), 9700–9704.

Lin, S., Yang, X., Jia, S., Weeks, A. M., Hornsby, M., Lee, P. S., Nichiporuk, R. V., Iavarone, A. T., Wells, J. A., Toste, F. D. & Chang, C. J. (2017). Redox-based reagents for chemoselective methionine bioconjugation. Science 355(6325), 597–602.

Link, A. J., Vink, M. K. & Tirrell, D. A. (2004). Presentation and detection of azide functionality in bacterial cell surface proteins. Journal of the American Chemical Society 126(34), 10598–10602.

Liu, C. C., Brustad, E., Liu, W. & Schultz, P. G. (2007). Crystal structure of a biosynthetic sulfo-hirudin complexed to thrombin. Journal of the American Chemical Society 129(35), 10648–10649.

Liu, C. C. & Schultz, P. G. (2010). Adding new chemistries to the genetic code. Annual Review of Biochemistry 79, 413–444.

Liu, D. S., Tangpeerachaikul, A., Selvaraj, R., Taylor, M. T., Fox, J. M. & Ting, A. Y. (2012). Diels-Alder cycloaddition for fluorophore targeting to specific proteins inside living cells. Journal of the American Chemical Society 134(2), 792–795.

Liu, F., Luo, E. Y., Flora, D. B. & Mezo, A. R. (2014a). Irreversible sortase A-mediated ligation driven by diketopiperazine formation. Journal of Organic Chemistry 79(2), 487–492.

Liu, Z., Frutos, S., Bick, M. J., Vila-Perello, M., Debelouchina, G. T., Darst, S. A. & Muir, T. W. (2014b). Structure of the branched intermediate in protein splicing. Proceedings of the National Academy of Sciences of the United States of America 111(23), 8422–8427.

Loquet, A., Sgourakis, N. G., Gupta, R., Giller, K., Riedel, D., Goosmann, C., Griesinger, C., Kolbe, M., Baker, D., Becker, S. & Lange, A. (2012). Atomic model of the type III secretion system needle. Nature 486(7402), 276–279.

Loscha, K. V., Herlt, A. J., Qi, R., Huber, T., Ozawa, K. & Otting, G. (2012). Multiple-site labeling of proteins with unnatural amino acids. Angewandte Chemie (International edition in English) 51(9), 2243–2246.

Lossl, P., Van De Waterbeemd, M. & Heck, A. J. (2016). The diverse and expanding role of mass spectrometry in structural and molecular biology. EMBO Journal 35(24), 2634–2657.

Lu, J. X., Qiang, W., Yau, W. M., Schwieters, C. D., Meredith, S. C. & Tycko, R. (2013). Molecular structure of beta-amyloid fibrils in Alzheimer's disease brain tissue. Cell 154(6), 1257–1268.

Lu, W., Randal, M., Kossiakoff, A. & Kent, S. B. (1999). Probing intermolecular backbone H-bonding in serine proteinase-protein inhibitor complexes. Chemistry & Biology 6(7), 419–427.

Lu, X., Simon, M. D., Chodaparambil, J. V., Hansen, J. C., Shokat, K. M. & Luger, K. (2008). The effect of H3K79 dimethylation and H4K20 trimethylation on nucleosome and chromatin structure. Nature Structural & Molecular Biology 15(10), 1122–1124.

Ludwig, C., Pfeiff, M., Linne, U. & Mootz, H. D. (2006). Ligation of a synthetic peptide to the N terminus of a recombinant protein using semisynthetic protein trans-splicing. Angewandte Chemie (International edition in English) 45(31), 5218–5221.

Luger, K., Mader, A. W., Richmond, R. K., Sargent, D. F. & Richmond, T. J. (1997). Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389(6648), 251–260.

Lyon, R. P., Setter, J. R., Bovee, T. D., Doronina, S. O., Hunter, J. H., Anderson, M. E., Balasubramanian, C. L., Duniho, S. M., Leiske, C. I., Li, F. & Senter, P. D. (2014). Self-hydrolyzing maleimides improve the stability and pharmacological properties of antibody-drug conjugates. Nature Biotechnology 32(10), 1059–1062.

MacDonald, J. I., Munch, H. K., Moore, T. & Francis, M. B. (2015). One-step site-specific modification of native proteins with 2-pyridinecarboxyaldehydes. Nature Chemical Biology 11(5), 326–331.

Macmillan, D., Bill, R. M., Sage, K. A., Fern, D. & Flitsch, S. L. (2001). Selective in vitro glycosylation of recombinant proteins: semi-synthesis of novel homogeneous glycoforms of human erythropoietin. Chemistry & Biology 8(2), 133–145.

Malins, L. R., Cergol, K. M. & Payne, R. J. (2013). Peptide ligation-desulfurization chemistry at Arginine. ChemBioChem 14(5), 559–563.

Malins, L. R., Cergol, K. M. & Payne, R. J. (2014). Chemoselective sulfenylation and peptide ligation at tryptophan. Chemical Science 5(1), 260–266.

Malins, L. R. & Payne, R. J. (2015). Modern extensions of native chemical ligation for chemical protein synthesis. Topics in Current Chemistry 362, 27–87.

Mallagaray, A., Dominguez, G., Peters, T. & Perez-Castells, J. (2016). A rigid lanthanide binding tag to aid NMR studies of a 70 kDa homodimeric coat protein of human norovirus. Chemical Communications (Cambridge) 52(3), 601–604.

Maly, T., Debelouchina, G. T., Bajaj, V. S., Hu, K. N., Joo, C. G., Mak-Jurkauskas, M. L., Sirigiri, J. R., Van Der Wel, P. C., Herzfeld, J., Temkin, R. J. & Griffin, R. G. (2008). Dynamic nuclear polarization at high magnetic fields. Journal of Chemical Physics 128(5), 052211.

Malyshev, D. A., Dhami, K., Lavergne, T., Chen, T., Dai, N., Foster, J. M., Correa, I. R. Jr. & Romesberg, F. E. (2014). A semi-synthetic organism with an expanded genetic alphabet. Nature 509(7500), 385–388.

Mandal, K., Uppalapati, M., Ault-Riche, D., Kenney, J., Lowitz, J., Sidhu, S. S. & Kent, S. B. (2012). Chemical synthesis and X-ray structure of a heterochiral {D-protein antagonist plus vascular endothelial growth factor} protein complex by racemic crystallography. Proceedings of the National Academy of Sciences of the United States of America 109(37), 14779–14784.

Mandlik, A., Swierczynski, A., Das, A. & Ton-That, H. (2008). Pili in Gram-positive bacteria: assembly, involvement in colonization and biofilm development. Trends in Microbiology 16(1), 33–40.

Mao, H., Hart, S. A., Schink, A. & Pollok, B. A. (2004). Sortase-mediated protein ligation: a new method for protein engineering. Journal of the American Chemical Society 126(9), 2670–2671.

Marecek, J., Song, B., Brewer, S., Belyea, J., Dyer, R. B. & Raleigh, D. P. (2007). A simple and economical method for the production of 13C,18O-labeled Fmoc-amino acids with high levels of enrichment: applications to isotope-edited IR studies of proteins. Organic Letters 9(24), 4935–4937.

Marsh, E. N. & Suzuki, Y. (2014). Using (19)F NMR to probe biological interactions of proteins and peptides. ACS Chemical Biology 9(6), 1242–1250.

Martinez, C., De Geus, P., Stanssens, P., Lauwereys, M. & Cambillau, C. (1993). Engineering cysteine mutants to obtain crystallographic phases with a cutinase from Fusarium solani pisi. Protein Engineering 6(2), 157–165.

Matsumoto, T., Furuta, K., Tanaka, T. & Kondo, A. (2016). Sortase A-mediated metabolic enzyme ligation in Escherichia coli. ACS Synthetic Biology 5(11), 1284–1289.

Matthies, D., Dalmas, O., Borgnia, M. J., Dominik, P. K., Merk, A., Rao, P., Reddy, B. G., Islam, S., Bartesaghi, A., Perozo, E. & Subramaniam, S. (2016). Cryo-EM structures of the magnesium channel CorA reveal symmetry break upon gating. Cell 164(4), 747–756.

Mazmanian, S. K., Liu, G., Ton-That, H. & Schneewind, O. (1999). Staphylococcus aureus sortase, an enzyme that anchors surface proteins to the cell wall. Science 285(5428), 760–763.

McFarland, J. M. & Francis, M. B. (2005). Reductive alkylation of proteins using iridium catalyzed transfer hydrogenation. Journal of the American Chemical Society 127(39), 13490–13491.

McFarland, J. M., Joshi, N. S. & Francis, M. B. (2008). Characterization of a three-component coupling reaction on proteins by isotopic labeling and nuclear magnetic resonance spectroscopy. Journal of the American Chemical Society 130(24), 7639–7644.

McGinty, R. K. & Tan, S. (2015). Nucleosome structure and function. Chemical Reviews 115(6), 2255–2273.

McMullan, G., Chen, S., Henderson, R. & Faruqi, A. R. (2009). Detective quantum efficiency of electron area detectors in electron microscopy. Ultramicroscopy 109(9), 1126–1143.

Mehler, M., Eckert, C. E., Busche, A., Kulhei, J., Michaelis, J., Becker-Baldus, J., Wachtveitl, J., Dotsch, V. & Glaubitz, C. (2015). Assembling a correctly folded and functional heptahelical membrane protein by protein trans-splicing. Journal of Biological Chemistry 290(46), 27712–27722.

Meier, F., Abeywardana, T., Dhall, A., Marotta, N. P., Varkey, J., Langen, R., Chatterjee, C. & Pratt, M. R. (2012). Semisynthetic, site-specific ubiquitin modification of alpha-synuclein reveals differential effects on aggregation. Journal of the American Chemical Society 134(12), 5468–5471.

Merk, A., Bartesaghi, A., Banerjee, S., Falconieri, V., Rao, P., Davis, M. I., Pragani, R., Boxer, M. B., Earl, L. A., Milne, J. L. & Subramaniam, S. (2016). Breaking Cryo-EM resolution barriers to facilitate drug discovery. Cell 165(7), 1698–1707.

Metanis, N., Beld, J. & Hilvert, D. (2009). The chemistry of selenocysteine. In PATAI'S Chemistry of Functional Groups. Edited by Rappoport, Z. John Wiley & Sons, Ltd. doi: 10.1002/9780470682531.pat0582.

Mileo, E., Etienne, E., Martinho, M., Lebrun, R., Roubaud, V., Tordo, P., Gontero, B., Guigliarelli, B., Marque, S. R. & Belle, V. (2013). Enlarging the panoply of site-directed spin labeling electron paramagnetic resonance (SDSL-EPR): sensitive and selective spin-labeling of tyrosine using an isoindoline-based nitroxide. Bioconjugate Chemistry 24(6), 1110–1117.

Milles, S., Tyagi, S., Banterle, N., Koehler, C., Vandelinder, V., Plass, T., Neal, A. P. & Lemke, E. A. (2012). Click strategies for single-molecule protein fluorescence. Journal of the American Chemical Society 134(11), 5187–5195.

Minoshima, M. & Kikuchi, K. (2017). Photostable and photoswitching fluorescent dyes for super-resolution imaging. Journal of Biological Inorganic Chemistry. doi: 10.1007/s00775-016-1435-y.

Mootz, H. D., Blum, E. S., Tyszkiewicz, A. B. & Muir, T. W. (2003). Conditional protein splicing: a new tool to control protein structure and function in vitro and in vivo. Journal of the American Chemical Society 125(35), 10561–10569.

Moran, S. D., Woys, A. M., Buchanan, L. E., Bixby, E., Decatur, S. M. & Zanni, M. T. (2012). Two-dimensional IR spectroscopy and segmental C-13 labeling reveals the domain structure of human gamma D-crystallin amyloid fibrils. Proceedings of the National Academy of Sciences of the United States of America 109(9), 3329–3334.

Morgan, M. T., Haj-Yahya, M., Ringel, A. E., Bandi, P., Brik, A. & Wolberger, C. (2016). Structural basis for histone H2B deubiquitination by the SAGA DUB module. Science 351(6274), 725–728.

Moyal, T., Hemantha, H. P., Siman, P., Refua, M. & Brik, A. (2013). Highly efficient one-pot ligation and desulfurization. Chemical Science 4(6), 2496–2501.

Muir, T. W., Sondhi, D. & Cole, P. A. (1998). Expressed protein ligation: a general method for protein engineering. Proceedings of the National Academy of Sciences of the United States of America 95(12), 6705–6710.

Mukai, T., Hayashi, A., Iraha, F., Sato, A., Ohtake, K., Yokoyama, S. & Sakamoto, K. (2010a). Codon reassignment in the Escherichia coli genetic code. Nucleic Acids Research 38(22), 8188–8195.

Mukai, T., Wakiyama, M., Sakamoto, K. & Yokoyama, S. (2010b). Genetic encoding of non-natural amino acids in Drosophila melanogaster Schneider 2 cells. Protein Science 19(3), 440–448.

Munari, F., Soeroes, S., Zenn, H. M., Schomburg, A., Kost, N., Schroder, S., Klingberg, R., Rezaei-Ghaleh, N., Stutzer, A., Gelato, K. A., Walla, P. J., Becker, S., Schwarzer, D., Zimmermann, B., Fischle, W. & Zweckstetter, M. (2012). Methylation of lysine 9 in histone H3 directs alternative modes of highly dynamic interaction of heterochromatin protein hHP1beta with the nucleosome. Journal of Biological Chemistry 287(40), 33756–33765.

Muona, M., Aranko, A. S., Raulinaitis, V. & Iwai, H. (2010). Segmental isotopic labeling of multi-domain and fusion proteins by protein trans-splicing in vivo and in vitro. Nature Protocols 5(3), 574–587.

Muralidharan, V. & Muir, T. W. (2006). Protein ligation: an enabling technology for the biophysical analysis of proteins. Nature Methods 3(6), 429–438.

Naik, M. T., Suree, N., Ilangovan, U., Liew, C. K., Thieu, W., Campbell, D. O., Clemens, J. J., Jung, M. E. & Clubb, R. T. (2006). Staphylococcus aureus Sortase A transpeptidase. Calcium promotes sorting signal binding by altering the mobility and structure of an active site loop. Journal of Biological Chemistry 281(3), 1817–1826.

Nakamura, T., Kawai, Y., Kitamoto, N., Osawa, T. & Kato, Y. (2009). Covalent modification of lysine residues by allyl isothiocyanate in physiological conditions: plausible transformation of isothiocyanate from thiol to amine. Chemical Research in Toxicology 22(3), 536–542.

Nettleship, J. E., Assenberg, R., Diprose, J. M., Rahman-Huq, N. & Owens, R. J. (2010). Recent advances in the production of proteins in insect and mammalian cells for structural biology. Journal of Structural Biology 172(1), 55–65.

Neumann, H., Peak-Chew, S. Y. & Chin, J. W. (2008). Genetically encoding N(epsilon)-acetyllysine in recombinant proteins. Nature Chemical Biology 4(4), 232–234.

Neumann, H., Wang, K., Davis, L., Garcia-Alai, M. & Chin, J. W. (2010). Encoding multiple unnatural amino acids via evolution of a quadruplet-decoding ribosome. Nature 464(7287), 441–444.

Neumann-Staubitz, P. & Neumann, H. (2016). The use of unnatural amino acids to study and engineer protein function. Current Opinion in Structural Biology 38, 119–128.

Neutze, R., Branden, G. & Schertler, G. F. (2015). Membrane protein structural biology using X-ray free electron lasers. Current Opinion in Structural Biology 33, 115–125.

Nguyen, D. P., Elliott, T., Holt, M., Muir, T. W. & Chin, J. W. (2011). Genetically encoded 1,2-aminothiols facilitate rapid and site-specific protein labeling via a bio-orthogonal cyanobenzothiazole condensation. Journal of the American Chemical Society 133(30), 11418–11421.

Nguyen, D. P., Garcia Alai, M. M., Virdee, S. & Chin, J. W. (2010). Genetically directing ε-N,N-dimethyl-L-lysine in recombinant histones. Chemistry & Biology 17(10), 1072–1076.

Nguyen, G. K., Kam, A., Loo, S., Jansson, A. E., Pan, L. X. & Tam, J. P. (2015). Butelase 1: a versatile ligase for peptide and protein macrocyclization. Journal of the American Chemical Society 137(49), 15398–15401.

Nguyen, G. K., Wang, S., Qiu, Y., Hemu, X., Lian, Y. & Tam, J. P. (2014). Butelase 1 is an Asx-specific ligase enabling peptide macrocyclization and synthesis. Nature Chemical Biology 10(9), 732–738.

Nikic, I. & Lemke, E. A. (2015). Genetic code expansion enabled site-specific dual-color protein labeling: superresolution microscopy and beyond. Current Opinion in Chemical Biology 28, 164–173.

Nilsson, B. L., Kiessling, L. L. & Raines, R. T. (2000). Staudinger ligation: a peptide from a thioester and azide. Organic Letters 2(13), 1939–1941.

Nogales, E. (2016). The development of cryo-EM into a mainstream structural biology technique. Nature Methods 13(1), 24–27.

Noren, C. J., Anthony-Cahill, S. J., Griffith, M. C. & Schultz, P. G. (1989). A general method for site-specific incorporation of unnatural amino acids into proteins. Science 244(4901), 182–188.

Okamoto, R., Mandal, K., Sawaya, M. R., Kajihara, Y., Yeates, T. O. & Kent, S. B. (2014). (Quasi-)racemic X-ray structures of glycosylated and non-glycosylated forms of the chemokine Ser-CCL1 prepared by total chemical synthesis. Angewandte Chemie (International edition in English) 53(20), 5194–5198.

Ostrov, N., Landon, M., Guell, M., Kuznetsov, G., Teramoto, J., Cervantes, N., Zhou, M., Singh, K., Napolitano, M. G., Moosburner, M., Shrock, E., Pruitt, B. W., Conway, N., Goodman, D. B., Gardner, C. L., Tyree, G., Gonzales, A., Wanner, B. L., Norville, J. E., Lajoie, M. J. & Church, G. M. (2016). Design, synthesis, and testing toward a 57-codon genome. Science 353(6301), 819–822.

Pan, M., Gao, S., Zheng, Y., Tan, X., Lan, H., Tan, X., Sun, D., Lu, L., Wang, T., Zheng, Q., Huang, Y., Wang, J. & Liu, L. (2016). Quasi-racemic X-ray structures of K27-linked ubiquitin chains prepared by total chemical synthesis. Journal of the American Chemical Society 138(23), 7429–7435.

Park, S. H., Wang, V. S., Radoicic, J., De Angelis, A. A., Berkamp, S. & Opella, S. J. (2015). Paramagnetic relaxation enhancement of membrane proteins by incorporation of the metal-chelating unnatural amino acid 2-amino-3-(8-hydroxyquinolin-3-yl)propanoic acid (HQA). Journal of Biomolecular NMR 61(3–4), 185–196.

Pentelute, B. L. & Kent, S. B. H. (2007). Selective desulfurization of cysteine in the presence of Cys(Acm) in polypeptides obtained by native chemical ligation. Organic Letters 9(4), 687–690.

Perdios, L., Lowe, A. R., Saladino, G., Bunney, T. D., Thiyagarajan, N., Alexandrov, Y., Dunsby, C., French, P. M., Chin, J. W., Gervasio, F. L., Tate, E. W. & Katan, M. (2017). Conformational transition of FGFR kinase activation revealed by site-specific unnatural amino acid reporter and single molecule FRET. Scientific Reports 7, 39841.

Perler, F. B. (2002). InBase: the intein database. Nucleic Acids Research 30(1), 383–384.

Pervushin, K., Riek, R., Wider, G. & Wuthrich, K. (1997). Attenuated T2 relaxation by mutual cancellation of dipole-dipole coupling and chemical shift anisotropy indicates an avenue to NMR structures of very large biological macromolecules in solution. Proceedings of the National Academy of Sciences of the United States of America 94(23), 12366–12371.

Petkova, A. T., Yau, W. M. & Tycko, R. (2006). Experimental constraints on quaternary structure in Alzheimer's beta-amyloid fibrils. Biochemistry 45(2), 498–512.

Pietrokovski, S. (2001). Intein spread and extinction in evolution. Trends in Genetics 17(8), 465–472.

Pike, A. C., Garman, E. F., Krojer, T., Von Delft, F. & Carpenter, E. P. (2016). An overview of heavy-atom derivatization of protein crystals. Acta Crystallographica D, Biological Crystallography 72(Pt 3), 303–318.

Plaks, J. G., Falatach, R., Kastantin, M., Berberich, J. A. & Kaar, J. L. (2015). Multisite clickable modification of proteins using lipoic acid ligase. Bioconjugate Chemistry 26(6), 1104–1112.

Plaschka, C., Hantsche, M., Dienemann, C., Burzinski, C., Plitzko, J. & Cramer, P. (2016). Transcription initiation complex structures elucidate DNA opening. Nature 533(7603), 353–358.

Plass, T., Milles, S., Koehler, C., Schultz, C. & Lemke, E. A. (2011). Genetically encoded copper-free click chemistry. Angewandte Chemie (International edition in English) 50(17), 3878–3881.

Policarpo, R. L., Kang, H., Liao, X., Rabideau, A. E., Simon, M. D. & Pentelute, B. L. (2014). Flow-based enzymatic ligation by sortase A. Angewandte Chemie (International edition in English) 53(35), 9203–9208.

Pollack, S. J. & Schultz, P. G. (1989). A semisynthetic catalytic antibody. Journal of the American Chemical Society 111(5), 1929–1931.

Popp, M. W. & Ploegh, H. L. (2011). Making and breaking peptide bonds: protein engineering using sortase. Angewandte Chemie (International edition in English) 50(22), 5024–5032.

Rabanal, F., DeGrado, W. F. & Dutton, P. L. (1996). Use of 2,2′-dithiobis(5-nitropyridine) for the heterodimerization of cysteine containing peptides. Introduction of the 5-nitro-2-pyridinesulfenyl group. Tetrahedron Letters 37(9), 1347–1350.

Rashidian, M., Dozier, J. K. & Distefano, M. D. (2013). Enzymatic labeling of proteins: techniques and approaches. Bioconjugate Chemistry 24(8), 1277–1294.

Ratzke, C., Hellenkamp, B. & Hugel, T. (2014). Four-colour FRET reveals directionality in the Hsp90 multicomponent machinery. Nature Communications 5, 4192.

Reddy, P. S., Dery, S. & Metanis, N. (2016). Chemical synthesis of proteins with non-strategically placed cysteines using selenazolidine and selective deselenization. Angewandte Chemie (International edition in English) 55(3), 992–995.

Ritzefeld, M. (2014). Sortagging: a robust and efficient chemoenzymatic ligation strategy. Chemistry 20(28), 8516–8529.

Rogerson, D. T., Sachdeva, A., Wang, K., Haq, T., Kazlauskaite, A., Hancock, S. M., Huguenin-Dezot, N., Muqit, M. M., Fry, A. M., Bayliss, R. & Chin, J. W. (2015). Efficient genetic encoding of phosphoserine and its nonhydrolyzable analog. Nature Chemical Biology 11(7), 496–503.

Rostovtsev, V. V., Green, L. G., Fokin, V. V. & Sharpless, K. B. (2002). A stepwise Huisgen cycloaddition process: copper(I)-catalyzed regioselective ‘ligation’ of azides and terminal alkynes. Angewandte Chemie (International edition in English) 41(14), 2596–2599.

Sachdeva, A., Wang, K., Elliott, T. & Chin, J. W. (2014). Concerted, rapid, quantitative, and site-specific dual labeling of proteins. Journal of the American Chemical Society 136(22), 7785–7788.

Sakakibara, D., Sasaki, A., Ikeya, T., Hamatsu, J., Hanashima, T., Mishima, M., Yoshimasu, M., Hayashi, N., Mikawa, T., Walchli, M., Smith, B. O., Shirakawa, M., Guntert, P. & Ito, Y. (2009). Protein structure determination in living cells by in-cell NMR spectroscopy. Nature 458(7234), 102–105.

Sakamoto, K., Murayama, K., Oki, K., Iraha, F., Kato-Murayama, M., Takahashi, M., Ohtake, K., Kobayashi, T., Kuramitsu, S., Shirouzu, M. & Yokoyama, S. (2009). Genetic encoding of 3-iodo-L-tyrosine in Escherichia coli for single-wavelength anomalous dispersion phasing in protein crystallography. Structure 17(3), 335–344.

Saxon, E., Armstrong, J. I. & Bertozzi, C. R. (2000). A ‘traceless’ Staudinger ligation for the chemoselective synthesis of amide bonds. Organic Letters 2(14), 2141–2143.

Saxon, E. & Bertozzi, C. R. (2000). Cell surface engineering by a modified Staudinger reaction. Science 287(5460), 2007–2010.

Sayers, J., Thompson, R. E., Perry, K. J., Malins, L. R. & Payne, R. J. (2015). Thiazolidine-protected beta-thiol asparagine: applications in one-pot ligation-desulfurization chemistry. Organic Letters 17(19), 4902–4905.

Scheres, S. H. (2012). A Bayesian view on cryo-EM structure determination. Journal of Molecular Biology 415(2), 406–418.

Schlick, T. L., Ding, Z., Kovacs, E. W. & Francis, M. B. (2005). Dual-surface modification of the tobacco mosaic virus. Journal of the American Chemical Society 127(11), 3718–3723.

Schmidt, M. J., Fedoseev, A., Bucker, D., Borbas, J., Peter, C., Drescher, M. & Summerer, D. (2015). EPR distance measurements in native proteins with genetically encoded spin labels. ACS Chemical Biology 10(12), 2764–2771.

Schmidt, M. J. & Summerer, D. (2013). Red-light-controlled protein-RNA crosslinking with a genetically encoded furan. Angewandte Chemie (International edition in English) 52(17), 4690–4693.

Schmohl, L. & Schwarzer, D. (2014). Sortase-mediated ligations for the site-specific modification of proteins. Current Opinion in Chemical Biology 22, 122–128.

Schubeis, T., Yuan, P., Ahmed, M., Nagaraj, M., Van Rossum, B. J. & Ritter, C. (2015). Untangling a repetitive amyloid sequence: correlating biofilm-derived and segmentally labeled curli fimbriae by solid-state NMR spectroscopy. Angewandte Chemie (International edition in English) 54(49), 14669–14672.

Schultz, K. C., Supekova, L., Ryu, Y., Xie, J., Perera, R. & Schultz, P. G. (2006). A genetically encoded infrared probe. Journal of the American Chemical Society 128(43), 13984–13985.

Schwartz, E. C., Saez, L., Young, M. W. & Muir, T. W. (2007). Post-translational enzyme activation in an animal via optimized conditional protein splicing. Nature Chemical Biology 3(1), 50–54.

Scott, C. P., Abel-Santos, E., Wall, M., Wahnon, D. C. & Benkovic, S. J. (1999). Production of cyclic peptides and proteins in vivo. Proceedings of the National Academy of Sciences United States of America 96(24), 13638–13643.

Seim, K. L., Obermeyer, A. C. & Francis, M. B. (2011). Oxidative modification of native protein residues using cerium(IV) ammonium nitrate. Journal of the American Chemical Society 133(42), 16970–16976.

Seitchik, J. L., Peeler, J. C., Taylor, M. T., Blackman, M. L., Rhoads, T. W., Cooley, R. B., Refakis, C., Fox, J. M. & Mehl, R. A. (2012). Genetically encoded tetrazine amino acid directs rapid site-specific in vivo bioorthogonal ligation with trans-cyclooctenes. Journal of the American Chemical Society 134(6), 2898–2901.

Serwa, R., Wilkening, I., Del Signore, G., Muhlberg, M., Claussnitzer, I., Weise, C., Gerrits, M. & Hackenberger, C. P. (2009). Chemoselective Staudinger-phosphite reaction of azides for the phosphorylation of proteins. Angewandte Chemie (International edition in English) 48(44), 8234–8239.

Shah, L., Laughlin, S. T. & Carrico, I. S. (2016). Light-activated Staudinger-Bertozzi ligation within living animals. Journal of the American Chemical Society 138(16), 5186–5189.

Shah, N. H., Dann, G. P., Vila-Perello, M., Liu, Z. & Muir, T. W. (2012). Ultrafast protein splicing is common among cyanobacterial split inteins: implications for protein engineering. Journal of the American Chemical Society 134(28), 11338–11341.

Shah, N. H., Eryilmaz, E., Cowburn, D. & Muir, T. W. (2013). Naturally split inteins assemble through a ‘capture and collapse’ mechanism. Journal of the American Chemical Society 135(49), 18673–18681.

Shah, N. H. & Muir, T. W. (2014). Inteins: nature's gift to protein chemists. Chemical Science 5(1), 446–461.

Shah, N. H., Vila-Perello, M. & Muir, T. W. (2011). Kinetic control of one-pot trans-splicing reactions by using a wild-type and designed split intein. Angewandte Chemie (International edition in English) 50(29), 6511–6515.

Shang, S. Y., Tan, Z. P., Dong, S. W. & Danishefsky, S. J. (2011). An advance in proline ligation. Journal of the American Chemical Society 133(28), 10784–10786.

Sharaf, N. G. & Gronenborn, A. M. (2015). (19)F-modified proteins and (19)F-containing ligands as tools in solution NMR studies of protein interactions. Methods in Enzymology 565, 67–95.

Shi, J. & Muir, T. W. (2005). Development of a tandem protein trans-splicing system based on native and engineered split inteins. Journal of the American Chemical Society 127(17), 6198–6206.

Shieh, P. & Bertozzi, C. R. (2014). Design strategies for bioorthogonal smart probes. Organic & Biomolecular Chemistry 12(46), 9307–9320.

Silvaggi, N. R., Martin, L. J., Schwalbe, H., Imperiali, B. & Allen, K. N. (2007). Double-lanthanide-binding tags for macromolecular crystallographic structure determination. Journal of the American Chemical Society 129(22), 7114–7120.

Siman, P., Karthikeyan, S. V. & Brik, A. (2012). Native chemical ligation at glutamine. Organic Letters 14(6), 1520–1523.

Simon, M. D., Chu, F., Racki, L. R., De La Cruz, C. C., Burlingame, A. L., Panning, B., Narlikar, G. J. & Shokat, K. M. (2007). The site-specific installation of methyl-lysine analogs into recombinant histones. Cell 128(5), 1003–1012.

Skrisovska, L., Schubert, M. & Allain, F. H. (2010). Recent advances in segmental isotope labeling of proteins: NMR applications to large proteins and glycoproteins. Journal of Biomolecular NMR 46(1), 51–65.

Slavoff, S. A., Liu, D. S., Cohen, J. D. & Ting, A. Y. (2011). Imaging protein–protein interactions inside living cells via interaction-dependent fluorophore ligation. Journal of the American Chemical Society 133(49), 19769–19776.

Smits, A. H., Borrmann, A., Roosjen, M., Van Hest, J. C. & Vermeulen, M. (2016). Click-MS: tagless protein enrichment using bioorthogonal chemistry for quantitative proteomics. ACS Chemical Biology 11(12), 3245–3250.

Song, F., Chen, P., Sun, D., Wang, M., Dong, L., Liang, D., Xu, R. M., Zhu, P. & Li, G. (2014). Cryo-EM study of the chromatin fiber reveals a double helix twisted by tetranucleosomal units. Science 344(6182), 376–380.

Southworth, M. W., Amaya, K., Evans, T. C., Xu, M. Q. & Perler, F. B. (1999). Purification of proteins fused to either the amino or carboxy terminus of the Mycobacterium xenopi gyrase A intein. Biotechniques 27(1), 110–114, 116, 118–120.

Spicer, C. D. & Davis, B. G. (2014). Selective chemical protein modification. Nature Communications 5, 4740.

Srinivasan, G., James, C. M. & Krzycki, J. A. (2002). Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA. Science 296(5572), 1459–1462.

Stephanopoulos, N. & Francis, M. B. (2011). Choosing an effective protein bioconjugation strategy. Nature Chemical Biology 7(12), 876–884.

Stevens, A. J., Brown, Z. Z., Shah, N. H., Sekar, G., Cowburn, D. & Muir, T. W. (2016). Design of a split intein with exceptional protein splicing activity. Journal of the American Chemical Society 138(7), 2162–2165.

Summerer, D., Chen, S., Wu, N., Deiters, A., Chin, J. W. & Schultz, P. G. (2006). A genetically encoded fluorescent amino acid. Proceedings of the National Academy of Sciences United States of America 103(26), 9785–9789.

Szymanski, W., Wu, B., Poloni, C., Janssen, D. B. & Feringa, B. L. (2013). Azobenzene photoswitches for Staudinger-Bertozzi ligation. Angewandte Chemie (International edition in English) 52(7), 2068–2072.

Tan, Z. P., Shang, S. Y. & Danishefsky, S. J. (2010). Insights into the finer issues of native chemical ligation: an approach to cascade ligations. Angewandte Chemie (International edition in English) 49(49), 9500–9503.

Tanaka, K., Kitadani, M. & Fukase, K. (2011). Target-selective fluorescent ‘switch-on’ protein labeling by 6pi-azaelectrocyclization. Organic & Biomolecular Chemistry 9(15), 5346–5349.

Tenboer, J., Basu, S., Zatsepin, N., Pande, K., Milathianaki, D., Frank, M., Hunter, M., Boutet, S., Williams, G. J., Koglin, J. E., Oberthuer, D., Heymann, M., Kupitz, C., Conrad, C., Coe, J., Roy-Chowdhury, S., Weierstall, U., James, D., Wang, D., Grant, T., Barty, A., Yefanov, O., Scales, J., Gati, C., Seuring, C., Srajer, V., Henning, R., Schwander, P., Fromme, R., Ourmazd, A., Moffat, K., Van Thor, J. J., Spence, J. C., Fromme, P., Chapman, H. N. & Schmidt, M. (2014). Time-resolved serial crystallography captures high-resolution intermediates of photoactive yellow protein. Science 346(6214), 1242–1246.

Tey, L. H., Loveridge, E. J., Swanwick, R. S., Flitsch, S. L. & Allemann, R. K. (2010). Highly site-selective stability increases by glycosylation of dihydrofolate reductase. FEBS Journal 277(9), 2171–2179.

Theile, C. S., Witte, M. D., Blom, A. E., Kundrat, L., Ploegh, H. L. & Guimaraes, C. P. (2013). Site-specific N-terminal labeling of proteins using sortase-mediated reactions. Nature Protocols 8(9), 1800–1807.

Thiel, I. V., Volkmann, G., Pietrokovski, S. & Mootz, H. D. (2014). An atypical naturally split intein engineered for highly efficient protein labeling. Angewandte Chemie (International edition in English) 53(5), 1306–1310.

Thompson, R. E., Chan, B., Radom, L., Jolliffe, K. A. & Payne, R. J. (2013). Chemoselective peptide ligation-desulfurization at aspartate. Angewandte Chemie (International edition in English) 52(37), 9723–9727.

Thompson, R. E., Liu, X., Alonso-Garcia, N., Pereira, P. J., Jolliffe, K. A. & Payne, R. J. (2014). Trifluoroethanethiol: an additive for efficient one-pot peptide ligation-desulfurization chemistry. Journal of the American Chemical Society 136(23), 8161–8164.

Tian, F., Lu, Y., Manibusan, A., Sellers, A., Tran, H., Sun, Y., Phuong, T., Barnett, R., Hehli, B., Song, F., Deguzman, M. J., Ensari, S., Pinkstaff, J. K., Sullivan, L. M., Biroc, S. L., Cho, H., Schultz, P. G., Dijoseph, J., Dougher, M., Ma, D., Dushin, R., Leal, M., Tchistiakova, L., Feyfant, E., Gerber, H. P. & Sapra, P. (2014). A general approach to site-specific antibody drug conjugates. Proceedings of the National Academy of Sciences United States of America 111(5), 1766–1771.

Tilley, S. D. & Francis, M. B. (2006). Tyrosine-selective protein alkylation using pi-allylpalladium complexes. Journal of the American Chemical Society 128(4), 1080–1081.

Tompa, P. (2012). Intrinsically disordered proteins: a 10-year recap. Trends in Biochemical Sciences 37(12), 509–516.

Ton-That, H., Mazmanian, S. K., Faull, K. F. & Schneewind, O. (2000). Anchoring of surface proteins to the cell wall of Staphylococcus aureus. Sortase catalyzed in vitro transpeptidation reaction using LPXTG peptide and NH(2)-Gly(3) substrates. Journal of Biological Chemistry 275(13), 9876–9881.

Topilina, N. I. & Mills, K. V. (2014). Recent advances in in vivo applications of intein-mediated protein splicing. Mobile DNA 5(1), 5.

Torbeev, V. Y. & Hilvert, D. (2013). Both the cis-trans equilibrium and isomerization dynamics of a single proline amide modulate beta2-microglobulin amyloid assembly. Proceedings of the National Academy of Sciences United States of America 110(50), 20051–20056.

Torbeev, V. Y., Raghuraman, H., Hamelberg, D., Tonelli, M., Westler, W. M., Perozo, E. & Kent, S. B. (2011). Protein conformational dynamics in the mechanism of HIV-1 protease catalysis. Proceedings of the National Academy of Sciences United States of America 108(52), 20982–20987.

Tornoe, C. W., Christensen, C. & Meldal, M. (2002). Peptidotriazoles on solid phase: [1,2,3]-triazoles by regiospecific copper(i)-catalyzed 1,3-dipolar cycloadditions of terminal alkynes to azides. Journal of Organic Chemistry 67(9), 3057–3064.

Tremblay, M. L., Xu, L., Lefevre, T., Sarker, M., Orrell, K. E., Leclerc, J., Meng, Q., Pezolet, M., Auger, M., Liu, X. Q. & Rainey, J. K. (2015). Spider wrapping silk fibre architecture arising from its modular soluble protein precursor. Scientific Reports 5, 11502.

Tugarinov, V., Hwang, P. M., Ollerenshaw, J. E. & Kay, L. E. (2003). Cross-correlated relaxation enhanced 1H–13C NMR spectroscopy of methyl groups in very high molecular weight proteins and protein complexes. Journal of the American Chemical Society 125(34), 10420–10428.

Tynan, C. J., Lo Schiavo, V., Zanetti-Domingues, L., Needham, S. R., Roberts, S. K., Hirsch, M., Rolfe, D. J., Korovesis, D., Clarke, D. T. & Martin-Fernandez, M. L. (2016). A tale of the epidermal growth factor receptor: the quest for structural resolution on cells. Methods 95, 86–93.

Uttamapinant, C., Tangpeerachaikul, A., Grecian, S., Clarke, S., Singh, U., Slade, P., Gee, K. R. & Ting, A. Y. (2012). Fast, cell-compatible click chemistry with copper-chelating azides for biomolecular labeling. Angewandte Chemie (International edition in English) 51(24), 5852–5856.

Uttamapinant, C., White, K. A., Baruah, H., Thompson, S., Fernandez-Suarez, M., Puthenveetil, S. & Ting, A. Y. (2010). A fluorophore ligase for site-specific protein labeling inside living cells. Proceedings of the National Academy of Sciences United States of America 107(24), 10914–10919.

Uversky, V. N. (2015). Biophysical methods to investigate intrinsically disordered proteins: avoiding an ‘elephant and blind men’ situation. Advances in Experimental Medicine and Biology 870, 215–260.

Valiyaveetil, F. I., Leonetti, M., Muir, T. W. & Mackinnon, R. (2006). Ion selectivity in a semisynthetic K+ channel locked in the conductive conformation. Science 314(5801), 1004–1007.

Van Kasteren, S. I., Kramer, H. B., Jensen, H. H., Campbell, S. J., Kirkpatrick, J., Oldham, N. J., Anthony, D. C. & Davis, B. G. (2007). Expanding the diversity of chemical protein modification allows post-translational mimicry. Nature 446(7139), 1105–1109.

Van ’t Hof, W., Hansenova Manaskova, S., Veerman, E. C. & Bolscher, J. G. (2015). Sortase-mediated backbone cyclization of proteins and peptides. Biological Chemistry 396(4), 283–293.

Van Wilderen, L. J., Kern-Michler, D., Muller-Werkmeister, H. M. & Bredenbeck, J. (2014). Vibrational dynamics and solvatochromism of the label SCN in various solvents and hemoglobin by time dependent IR and 2D-IR spectroscopy. Physical Chemistry Chemical Physics 16(36), 19643–19653.

Venditti, V., Fawzi, N. L. & Clore, G. M. (2012). An efficient protocol for incorporation of an unnatural amino acid in perdeuterated recombinant proteins using glucose-based media. Journal of Biomolecular NMR 52(3), 191–195.

Vila-Perello, M., Liu, Z., Shah, N. H., Willis, J. A., Idoyaga, J. & Muir, T. W. (2013). Streamlined expressed protein ligation using split inteins. Journal of the American Chemical Society 135(1), 286–292.

Vila-Perello, M. & Muir, T. W. (2010). Biological applications of protein splicing. Cell 143(2), 191–200.

Vinogradova, E. V., Zhang, C., Spokoyny, A. M., Pentelute, B. L. & Buchwald, S. L. (2015). Organometallic palladium reagents for cysteine bioconjugation. Nature 526(7575), 687–691.

Virdee, S., Kapadnis, P. B., Elliott, T., Lang, K., Madrzak, J., Nguyen, D. P., Riechmann, L. & Chin, J. W. (2011). Traceless and site-specific ubiquitination of recombinant proteins. Journal of the American Chemical Society 133(28), 10708–10711.

Virdee, S., Ye, Y., Nguyen, D. P., Komander, D. & Chin, J. W. (2010). Engineered diubiquitin synthesis reveals Lys29-isopeptide specificity of an OTU deubiquitinase. Nature Chemical Biology 6(10), 750–757.

Volkmann, G. & Mootz, H. D. (2013). Recent progress in intein research: from mechanism to directed evolution and applications. Cellular and Molecular Life Sciences 70(7), 1185–1206.

Von der Ecken, J., Muller, M., Lehman, W., Manstein, D. J., Penczek, P. A. & Raunser, S. (2015). Structure of the F-actin-tropomyosin complex. Nature 519(7541), 114–117.

Wakamori, M., Fujii, Y., Suka, N., Shirouzu, M., Sakamoto, K., Umehara, T. & Yokoyama, S. (2015). Intra- and inter-nucleosomal interactions of the histone H4 tail revealed with a human nucleosome core particle with genetically-incorporated H4 tetra-acetylation. Scientific Reports 5, 17204.

Walper, S. A., Turner, K. B. & Medintz, I. L. (2015). Enzymatic bioconjugation of nanoparticles: developing specificity and control. Current Opinion in Biotechnology 34, 232–241.

Wan, Q. & Danishefsky, S. J. (2007). Free-radical-based, specific desulfurization of cysteine: a powerful advance in the synthesis of polypeptides and glycopolypeptides. Angewandte Chemie (International edition in English) 46(48), 9248–9252.

Wan, W., Huang, Y., Wang, Z., Russell, W. K., Pai, P. J., Russell, D. H. & Liu, W. R. (2010). A facile system for genetic incorporation of two different noncanonical amino acids into one protein in Escherichia coli. Angewandte Chemie (International edition in English) 49(18), 3211–3214.

Wang, K., Fredens, J., Brunner, S. F., Kim, S. H., Chia, T. & Chin, J. W. (2016). Defining synonymous codon compression schemes by genome recoding. Nature 539(7627), 59–64.

Wang, K., Sachdeva, A., Cox, D. J., Wilf, N. M., Lang, K., Wallace, S., Mehl, R. A. & Chin, J. W. (2014a). Optimized orthogonal translation of unnatural amino acids enables spontaneous protein double-labelling and FRET. Nature Chemistry 6(5), 393–403.

Wang, L., Brock, A., Herberich, B. & Schultz, P. G. (2001). Expanding the genetic code of Escherichia coli. Science 292(5516), 498–500.

Wang, S., Munro, R. A., Shi, L., Kawamura, I., Okitsu, T., Wada, A., Kim, S. Y., Jung, K. H., Brown, L. S. & Ladizhansky, V. (2013). Solid-state NMR spectroscopy structure determination of a lipid-embedded heptahelical membrane protein. Nature Methods 10(10), 1007–1012.

Wang, Y., Kavran, J. M., Chen, Z., Karukurichi, K. R., Leahy, D. J. & Cole, P. A. (2014b). Regulation of S-adenosylhomocysteine hydrolase by lysine acetylation. Journal of Biological Chemistry 289(45), 31361–31372.

Wang, Z. A., Zeng, Y., Kurra, Y., Wang, X., Tharp, J. M., Vatansever, E. C., Hsu, W. W., Dai, S., Fang, X. & Liu, W. R. (2017). A genetically encoded allysine for the synthesis of proteins with site-specific lysine dimethylation. Angewandte Chemie (International edition in English) 56(1), 212–216.

Warden-Rothman, R., Caturegli, I., Popik, V. & Tsourkas, A. (2013). Sortase-tag expressed protein ligation: combining protein purification and site-specific bioconjugation into a single step. Analytical Chemistry 85(22), 11090–11097.

Wasmer, C., Lange, A., Van Melckebeke, H., Siemer, A. B., Riek, R. & Meier, B. H. (2008). Amyloid fibrils of the HET-s(218–289) prion form a beta solenoid with a triangular hydrophobic core. Science 319(5869), 1523–1526.

Williams, F. P., Milbradt, A. G., Embrey, K. J. & Bobby, R. (2016). Segmental isotope labelling of an individual bromodomain of a tandem domain BRD4 using sortase A. PLoS ONE 11(4), e0154607.

Williamson, D. J., Webb, M. E. & Turnbull, W. B. (2014). Depsipeptide substrates for sortase-mediated N-terminal protein ligation. Nature Protocols 9(2), 253–262.

Williamson, M. P., Havel, T. F. & Wuthrich, K. (1985). Solution conformation of proteinase inhibitor IIA from bull seminal plasma by 1H nuclear magnetic resonance and distance geometry. Journal of Molecular Biology 182(2), 295–315.

Winkelman, J. T., Vvedenskaya, I. O., Zhang, Y., Zhang, Y., Bird, J. G., Taylor, D. M., Gourse, R. L., Ebright, R. H. & Nickels, B. E. (2016). Multiplexed protein-DNA cross-linking: scrunching in transcription start site selection. Science 351(6277), 1090–1093.

Wissner, R. F., Batjargal, S., Fadzen, C. M. & Petersson, E. J. (2013). Labeling proteins with fluorophore/thioamide Forster resonant energy transfer pairs by combining unnatural amino acid mutagenesis and native chemical ligation. Journal of the American Chemical Society 135(17), 6529–6540.

Witte, M. D., Cragnolini, J. J., Dougan, S. K., Yoder, N. C., Popp, M. W. & Ploegh, H. L. (2012). Preparation of unnatural N-to-N and C-to-C protein fusions. Proceedings of the National Academy of Sciences United States of America 109(30), 11993–11998.

Wood, D. W. & Camarero, J. A. (2014). Intein applications: from protein purification and labeling to metabolic control methods. Journal of Biological Chemistry 289(21), 14512–14519.

Wright, T. H., Bower, B. J., Chalker, J. M., Bernardes, G. J., Wiewiora, R., Ng, W. L., Raj, R., Faulkner, S., Vallee, M. R., Phanumartwiwath, A., Coleman, O. D., Thezenas, M. L., Khan, M., Galan, S. R., Lercher, L., Schombs, M. W., Gerstberger, S., Palm-Espling, M. E., Baldwin, A. J., Kessler, B. M., Claridge, T. D., Mohammed, S. & Davis, B. G. (2016). Posttranslational mutagenesis: a chemical strategy for exploring protein side-chain diversity. Science 354(6312).

Wu, H., Hu, Z. & Liu, X. Q. (1998). Protein trans-splicing by a split intein encoded in a split DnaE gene of Synechocystis sp. PCC6803. Proceedings of the National Academy of Sciences United States of America 95(16), 9226–9231.

Wu, P., Shui, W., Carlson, B. L., Hu, N., Rabuka, D., Lee, J. & Bertozzi, C. R. (2009). Site-specific chemical modification of recombinant proteins produced in mammalian cells by using the genetically encoded aldehyde tag. Proceedings of the National Academy of Sciences United States of America 106(9), 3000–3005.

Wuethrich, I., Peeters, J. G., Blom, A. E., Theile, C. S., Li, Z., Spooner, E., Ploegh, H. L. & Guimaraes, C. P. (2014). Site-specific chemoenzymatic labeling of aerolysin enables the identification of new aerolysin receptors. PLoS ONE 9(10), e109883.

Wukovitz, S. W. & Yeates, T. O. (1995). Why protein crystals favour some space-groups over others. Nature Structural Biology 2(12), 1062–1067.

Xiao, H., Chatterjee, A., Choi, S. H., Bajjuri, K. M., Sinha, S. C. & Schultz, P. G. (2013). Genetic incorporation of multiple unnatural amino acids into proteins in mammalian cells. Angewandte Chemie (International edition in English) 52(52), 14080–14083.

Xie, J., Wang, L., Wu, N., Brock, A., Spraggon, G. & Schultz, P. G. (2004). The site-specific incorporation of p-iodo-L-phenylalanine into proteins for structure determination. Nature Biotechnology 22(10), 1297–1301.

Xie, R., Dong, L., Huang, R., Hong, S., Lei, R. & Chen, X. (2014). Targeted imaging and proteomic analysis of tumor-associated glycans in living animals. Angewandte Chemie (International edition in English) 53(51), 14082–14086.

Xu, R., Ayers, B., Cowburn, D. & Muir, T. W. (1999). Chemical ligation of folded recombinant proteins: segmental isotopic labeling of domains for NMR studies. Proceedings of the National Academy of Sciences United States of America 96(2), 388–393.

Yamamura, Y., Hirakawa, H., Yamaguchi, S. & Nagamune, T. (2011). Enhancement of sortase A-mediated protein ligation by inducing a beta-hairpin structure around the ligation site. Chemical Communications (Cambridge) 47(16), 4742–4744.

Yamazaki, T., Otomo, T., Oda, N., Kyogoku, Y., Uegaki, K., Ito, N., Ishino, Y. & Nakamura, H. (1998). Segmental isotope labeling for protein NMR using peptide splicing. Journal of the American Chemical Society 120(22), 5591–5592.

Yan, L. Z. & Dawson, P. E. (2001). Synthesis of peptides and proteins without cysteine residues by native chemical ligation combined with desulfurization. Journal of the American Chemical Society 123(4), 526–533.

Yang, A., Ha, S., Ahn, J., Kim, R., Kim, S., Lee, Y., Kim, J., Soll, D., Lee, H. Y. & Park, H. S. (2016). A chemical biology route to site-specific authentic protein modifications. Science 354(6312), 623–626.

Yang, F., Yu, X., Liu, C., Qu, C. X., Gong, Z., Liu, H. D., Li, F. H., Wang, H. M., He, D. F., Yi, F., Song, C., Tian, C. L., Xiao, K. H., Wang, J. Y. & Sun, J. P. (2015). Phospho-selective mechanisms of arrestin conformations and functions revealed by unnatural amino acid incorporation and (19)F-NMR. Nature Communications 6, 8202.

Yang, R., Wong, Y. H., Nguyen, G. K., Tam, J. P., Lescar, J. & Wu, B. (2017). Engineering a catalytically efficient recombinant protein ligase. Journal of the American Chemical Society. doi: 10.1021/jacs.6b12637.

Yang, R. L., Pasunooti, K. K., Li, F. P., Liu, X. W. & Liu, C. F. (2009). Dual native chemical ligation at lysine. Journal of the American Chemical Society 131(38), 13592.

Yang, W., Hendrickson, W. A., Crouch, R. J. & Satow, Y. (1990). Structure of ribonuclease H phased at 2 Å resolution by MAD analysis of the selenomethionyl protein. Science 249(4975), 1398–1405.

Ye, S., Zaitseva, E., Caltabiano, G., Schertler, G. F., Sakmar, T. P., Deupi, X. & Vogel, R. (2010). Tracking G-protein-coupled receptor activation using genetically encoded infrared probes. Nature 464(7293), 1386–1389.

Yeates, T. O. & Kent, S. B. (2012). Racemic protein crystallography. Annual Review of Biophysics 41, 41–61.

Yeung, H., Squire, C. J., Yosaatmadja, Y., Panjikar, S., Lopez, G., Molina, A., Baker, E. N., Harris, P. W. & Brimble, M. A. (2016). Radiation damage and racemic protein crystallography reveal the unique structure of the GASA/Snakin Protein Superfamily. Angewandte Chemie (International edition in English) 55(28), 7930–7933.

Yoshizawa, S. & Bock, A. (2009). The many levels of control on bacterial selenoprotein synthesis. Biochimica et Biophysica Acta 1790(11), 1404–1414.

Young, T. S., Ahmad, I., Yin, J. A. & Schultz, P. G. (2010). An enhanced system for unnatural amino acid mutagenesis in E. coli. Journal of Molecular Biology 395(2), 361–374.

Yu, J. X., Hallac, R. R., Chiguru, S. & Mason, R. P. (2013). New frontiers and developing applications in 19F NMR. Progress in Nuclear Magnetic Resonance Spectroscopy 70, 25–49.

Zettler, J., Schutz, V. & Mootz, H. D. (2009). The naturally split Npu DnaE intein exhibits an extraordinarily high rate in the protein trans-splicing reaction. FEBS Letters 583(5), 909–914.

Zhang, C., Welborn, M., Zhu, T., Yang, N. J., Santos, M. S., Van Voorhis, T. & Pentelute, B. L. (2016a). Pi-Clamp-mediated cysteine conjugation. Nature Chemistry 8(2), 120–128.

Zhang, M., Lin, S., Song, X., Liu, J., Fu, Y., Ge, X., Fu, X., Chang, Z. & Chen, P. R. (2011). A genetically incorporated crosslinker reveals chaperone cooperation in acid resistance. Nature Chemical Biology 7(10), 671–677.

Zhang, T. Q. O., Grechko, M., Moran, S. D. & Zanni, M. T. (2016b). Isotope-Labeled Amyloids via Synthesis, Expression, and Chemical Ligation for Use in FTIR, 2D IR, and NMR Studies. Protein Amyloid Aggregation: Methods and Protocols 1345, 21–41.

Zheng, Y., Lewis, T. L. Jr., Igo, P., Polleux, F. & Chatterjee, A. (2017). Virus-enabled optimization and delivery of the genetic machinery for efficient unnatural amino acid mutagenesis in mammalian cells and tissues. ACS Synthetic Biology 6(1), 13–18.

Zuger, S. & Iwai, H. (2005). Intein-based biosynthetic incorporation of unlabeled protein tags into isotopically labeled proteins for NMR studies. Nature Biotechnology 23(6), 736–740.

Zuo, C., Tang, S. & Zheng, J. S. (2015). Chemical synthesis and biophysical applications of membrane proteins. Journal of Peptide Science 21(7), 540–549.

Fig. 1. Molecular engineering toolbox for the structural biologist.

Fig. 2. Chemical modification of cysteine residues.

Fig. 3. Chemical modification of natural amino acids. (a) Modification of lysine ε-amines with activated esters such as N-hydroxysuccinimide. (b) Modification of terminal α-amines with 2-pyridinecarboxyaldehydes. (c) Three-component Mannich reaction for tyrosine modification at the ortho-position. (d) Coupling of carboxyls and amines with carbodiimides such as EDC.

Fig. 4. Native chemical ligation at cysteine followed by desulfurization to alanine for the construction of larger polypeptide chains without any ‘scars’.

Fig. 5. Examples of constructs prepared by NCL and EPL for X-ray crystallography studies. (a) D-alanine was introduced at position 77 in the sequence of the potassium channel KcsA to elucidate its ion selectivity mechanism (Valiyaveetil et al. 2006) (PDB ID: 2IH3). (b) Acetylated lysine (Ac) was incorporated at positions 401 and 408 in S-adenosylhomocysteine hydrolase (SAHH) to evaluate the structural basis of enzyme inhibition (Wang et al. 2014b) (PDB ID: 4PFJ). (c) Chemical synthesis of HIV protease afforded the site-specific incorporation of unnatural amino acids such as 2-aminoisobutyric acid to modulate conformational dynamics and catalysis (Torbeev et al. 2011) (PDB ID: 3IAW). (d) Semi-synthesis of Mxe GyrA and the installation of β-thienyl-alanine instead of the native histidine at position 187 provided a route to trap the branched intermediate of the intein (Liu et al. 2014b) (PDB ID: 4OZ6).

Fig. 7. Protein engineering with inteins. (a) Expressed protein ligation. (b) Tagless protein purification. (c) Protein trans-splicing and recombinant production of segmentally isotopically labeled proteins.

Table 1. Intein toolbox for protein semi-synthesis

Fig. 8. C-terminal protein labeling with sortase. The acyl donor requires the LPXTG recognition motif, while the acyl acceptor often contains a pentaglycine sequence.

Fig. 10. Chemical modification of unnatural amino acids.

Fig. 14. Dual labeling of proteins with fluorophores. (a) A labeling strategy based on the combination of cysteine chemistry and amber suppression (Brustad et al. 2008). (b) The FlAsH labeling system can be used for the selective modification of a genetically encoded peptide tag, in combination with amber suppression (Perdios et al. 2017). (c) A dual labeling strategy based on native chemical ligation and amber suppression (Wissner et al. 2013). (d) Genetic incorporation of two UAAs using orthogonal ribosomes that can decode the AGTA quadruplet codon (Sachdeva et al. 2014).
