Introduction
Non-covalent interactions are fundamental for protein folding, structural stability and function. Traditionally, hydrogen bonds (H-bonds), hydrophobic effects and electrostatic and van der Waals interactions are assumed to be the major drivers of protein folding and stability (Dill and MacCallum, Reference Dill and MacCallum2012; Pace et al., Reference Pace, Fu, Fryar, Landua, Trevino, Schell, Thurlkill, Imura, Scholtz, Gajiwala, Sevcik, Urbanikova, Myers, Takano, Hebert, Shirley and Grimsley2014). However, the essentiality of other weak interactions in sculpting protein structures is also being discovered. For example, the importance of weak H-bonds, cation/anion-π, n → π* and dihydrogen bonding (H···H) interactions for the stability of protein structures is well understood (Derewenda et al., Reference Derewenda, Lee and Derewenda1995; Gallivan and Dougherty, Reference Gallivan and Dougherty1999; Manikandan and Ramakumar, Reference Manikandan and Ramakumar2004; Matta et al., Reference Matta, Hernández-Trujillo, Tang and Bader2003; Bartlett et al., Reference Bartlett, Choudhary, Raines and Woolfson2010; Lucas et al., Reference Lucas, Bauzá, Frontera and Quiñonero2016; Forbes et al., Reference Forbes, Sinha, Ganguly, Bai, Yap, Patel and Zondlo2017; Newberry and Raines, Reference Newberry and Raines2019; Juanes et al., Reference Juanes, Saragi, Jin, Zingsheim, Schlemmer and Lesarri2020). Apart from C, O, N and H that form these non-covalent interactions, divalent sulfur (S) is present in methionines (Met-Sδ), cysteines (Cys-Sγ) and cystines of proteins. S has unique bonding properties that allow it to interact with both electrophiles and nucleophiles (Rosenfield et al., Reference Rosenfield, Parthasarathy and Dunitz1977; Guru Row and Parthasarathy, Reference Guru Row and Parthasarathy1981). Consequently, methionine, cystine and cysteine are expected to contribute to distinct polar interactions in proteins, making it imperative to study them to better understand their structural properties and folding.
Although the polar bonding properties of S have long been known (Fig. 1a) (Rosenfield et al., Reference Rosenfield, Parthasarathy and Dunitz1977; Guru Row and Parthasarathy, Reference Guru Row and Parthasarathy1981), it has gained prominence in the field of organic molecules over the last decade (Andersen et al., Reference Andersen, Jensen, Mackeprang, Du, Jørgensen and Kjaergaard2014; Beno et al., Reference Beno, Yeung, Bartberger, Pennington and Meanwell2015; Rao Mundlapati et al., Reference Rao Mundlapati, Ghosh, Bhattacherjee, Tiwari and Biswal2015; Pascoe et al., Reference Pascoe, Ling and Cockroft2017; Wang and Fujii, Reference Wang and Fujii2017; Motherwell et al., Reference Motherwell, Moreno, Pavlakos, Arendorf, Arif, Tizzard, Coles and Aliev2018; Riwar et al., Reference Riwar, Trapp, Root, Zenobi and Diederich2018; Scilabra et al., Reference Scilabra, Terraneo and Resnati2019; Haberhauer and Gleiter, Reference Haberhauer and Gleiter2020; Kolb et al., Reference Kolb, Oliver and Werz2020; Wang et al., Reference Wang, Zhu, Feng, Yu, Hao, Zhu and Wang2020). In contrast, these properties of S are often overlooked in the study of proteins. Many standard textbooks of biochemistry categorise methionine as a non-polar hydrophobic amino acid (Lehninger et al., Reference Lehninger, Nelson and Cox2000; Berg et al., Reference Berg, Tymoczko and Stryer2002). However, the lone pairs of S can interact with electrophiles, such as H atoms, to form H-bonds with donor O or N (Fig. 1a) (Zhou et al., Reference Zhou, Tian, Lv and Shang2009; Rao Mundlapati et al., Reference Rao Mundlapati, Ghosh, Bhattacherjee, Tiwari and Biswal2015; Chand et al., Reference Chand, Sahoo, Rana, Jena and Biswal2020). Although S is less electronegative than O/N, the strength of S-mediated H-bonds such as N–H···S is comparable to the conventional N–H···O bond mainly because of its larger size, higher polarisability and the diffuse electron cloud (Gregoret et al., Reference Gregoret, Rader, Fletterick and Cohen1991; Rao Mundlapati et al., Reference Rao Mundlapati, Ghosh, Bhattacherjee, Tiwari and Biswal2015). Additionally, the divalent S has two electropositive regions on the extension of its two covalent bonds, referred to as σ-holes, which can interact with various nucleophiles (Fig. 1a) (Murray et al., Reference Murray, Lane, Clark, Riley and Politzer2012; Politzer et al., Reference Politzer, Murray and Clark2014; Politzer et al., Reference Politzer, Murray, Clark and Resnati2017). The interaction made by a σ-hole of S with a nucleophile is categorised as a chalcogen bond (Ch-bond) (Aakeroy et al., Reference Aakeroy, Bryce, Desiraju, Frontera, Legon, Nicotra, Rissanen, Scheiner, Terraneo, Metrangolo and Resnati2019).
Ch-bonds have been shown to play important roles in the self-assembly and catalysis of organic molecules (Iwaoka et al., Reference Iwaoka, Takemoto and Tomoda2002; Benz et al., Reference Benz, López-Andarias, Mareda, Sakai and Matile2017; Mahmudov et al., Reference Mahmudov, Kopylovich, Guedes Da Silva and Pombeiro2017; Chen et al., Reference Chen, Xiang, Zhao and Yan2018; Lim and Beer, Reference Lim and Beer2018; Vogel et al., Reference Vogel, Wonner and Huber2019; Carugo et al., Reference Carugo, Resnati and Metrangolo2021; Jena et al., Reference Jena, Dutta, Tulsiyan, Sahu, Choudhury and Biswal2022). Ch-bonds have a specific directionality, and a recent spectroscopic study on thiophenes has shown that the bond can be as strong as a conventional H-bond (Pascoe et al., Reference Pascoe, Ling and Cockroft2017). However, unlike the latter, the strength of a Ch-bond is independent of solvent polarity (Pascoe et al., Reference Pascoe, Ling and Cockroft2017). The occurrence of a divalent S-mediated Ch-bond in proteins has been documented previously and hypothesised to be functionally significant (Pal and Chakrabarti, Reference Pal and Chakrabarti2001; Iwaoka et al., Reference Iwaoka, Takemoto and Tomoda2002; Iwaoka and Isozumi, Reference Iwaoka and Isozumi2012, Reference Iwaoka and Isozumi2006; Iwaoka and Babe, Reference Iwaoka and Babe2015). Despite its potential importance, the precise role of the Ch-bond in protein structures and its effect on protein stability have remained unaddressed.
Many functional groups in proteins contain both electrophilic and nucleophilic centres with which S can interact to form H- or Ch-bonds (Fig. 1a). Hence, the direction of approach of the functional groups with respect to S and the nature of the bond formed are interlinked and could influence protein conformation. This prompted us also to ask whether S forms an H-bond or a Ch-bond with a functional group having both electrophilic and nucleophilic centres and what determines the choice between them (Fig. 1a). Here, we have addressed the above questions through extensive computational, cheminformatics and bioinformatics analyses. The study reveals the importance of Ch-bonds in the stability of protein structures. Using potential energy surface (PES) scan and atoms in molecules (AIM) analysis, we show that both H- and Ch-bonds can augment the stability of protein secondary structures, and through bioinformatics analyses, we identify some of the mechanisms of fold stabilisation. Analysis of the structures in the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB), and molecular electrostatic surface potential (MESP) calculations reveal the factors that influence the choice between H-bond and Ch-bond by S. The study, therefore, unravels the underappreciated role of S-mediated interactions, particularly Ch-bonds, in protein structure, stability and function.
Methods
Computational methods
All monomer and dimeric complexes used in this study were optimised at the M06/6-311++G(3DF,3PD) level using the Gaussian09 programme (Zhao and Truhlar, Reference Zhao and Truhlar2008; Frisch et al., Reference Frisch, Trucks, Schlegel, Scuseria, Robb, Cheeseman, Scalmani, Barone, Mennucci, Petersson, Nakatsuji, Caricato, Li, Hratchian, Izmaylov, Bloino, Zheng and Sonnenber2009). The positive eigenvalues of the Hessian evaluation confirmed that these optimised complexes are minima on the PES. The (CH3)2S:OH2 and (CH3)2S:NH3 complexes were used to investigate energetically favourable regions for S···H–O/N interaction. Similar investigations were performed on Cl(CH3)S:OH2 for S···O and Cl(CH3)S:NH3 for S···N interactions. To carry out a spherical energy scan using these complexes, d (distance between S and H/O/N) was kept constant, whereas θ (the angle between the centroid (c) of a triangle defined by C–S–C/Cl, S and H/O/N) and δ (the torsion angle between C, c, S and H/O/N) were varied in steps of 2˚. δ was varied from 90˚ to −90˚ and θ from 60˚ to 178˚. Complexation energies (ΔEs) for different θ and δ values without basis set superposition error correction were calculated for all these complexes (Hobza and Řezáč, Reference Hobza and Řezáč2016). MESP topographical analyses characterising the strength of lone pairs (V min) of divalent S were carried out using the rapid topography mapping implemented in DAMQT (Yeole et al., Reference Yeole, López and Gadre2012; Kumar et al., Reference Kumar, Yeole, Gadre, Lõpez, Rico, Ramírez, Ema and Zorrilla2015). Texturing of MESP of molecules at defined density surface (V S,max) was carried out using Gaussview 5.0 (Dennington et al., Reference Dennington, Keith and Millam2009). Model systems used to investigate the role of S···O interactions in α-helices and a β-strands were from the PDB coordinates of 1PVH and 4KT1. All the side chain atoms in these fragments were deleted, and the Cβ atom was replaced by H. A partial energy minimisation was carried out where all atoms other than hydrogen were frozen. The torsion angles (χ) between N, Cα, Cβ and S were varied for the PES scan. AIM analysis was carried out using AIM2000 (König et al., Reference König, Schönbohm and Bayles2001).
CSD analysis
Fragments F1–F6 provided in Fig. 1b were queried in structural data retrieved from CSD 5.43 (Groom and Allen, Reference Groom and Allen2014) (version 5.43 with updates till March 2022) using ConQuest version 2022 1.0 (Bruno et al., Reference Bruno, Cole, Edgington, Kessler, Macrae, McCabe, Pearson and Taylor 2002a). The following criteria of ConQuest were used for the search: (1) only intermolecular contacts; (2) 3D coordinates determined for all the atoms; (3) structures with crystallographic R-factor ≤10%; (4) no disorder in crystallographic data; (5) no error in 3D atomic coordinates; (6) no polymeric structures; and (7) normalisation of terminal H position. Data thus obtained were further processed and analysed using Mercury 2022 1.0 (Bruno et al., Reference Bruno, Cole, Edgington, Kessler, Macrae, McCabe, Pearson and Taylor 2002b).
Searches were done for only intermolecular contacts using the criteria dS···O ≤ 3.32 Å and dS···N ≤ 3.35 Å. For the identification of H-bond, dS···H ≤ 2.8 Å, which is 0.2 Å shorter than the sum of the van der Waal radii of S and H (Bondi, Reference Bondi1964), was chosen for higher stringency. The number of bonded atoms to S in F1–F6 were 2. The number of bonded atoms to O and N in F3–F6 were 2 and 3, respectively. The number of such fragments in CSD is provided in Table 1. The approach of H or O/N towards S in space was investigated using the 3D parameters provided in ConQuest (Fig. 2a). To segregate H- and Ch-bonds based on their θ and δ values, we calculated the mean values for clusters in F1 and F3. The range of θ and δ defining H- and Ch-bonds about the respective mean values were obtained by taking their mean ± standard deviation at 1 sigma of the calculated values. The values for the limits were rounded off to the closest value that was a multiple of 5. The mean of the angular values of δ was calculated using their modulus. This angular range of θ and δ for H- and Ch-bonds thus obtained were used throughout the study. All the plots and figures were generated using OriginPro 9.0 (Seifert, Reference Seifert2014).
Note: NT is the total number of independent pairs of fragments found in the CSD structures. NC is the total number of S···O or S···H–O and S···N or S···H–N contacts found in the CSD and the PDB, having distance between S and the atom less than the sum of their van der Waals radii. The values in the parentheses are equal to (NC/NT) × 100 and represent the frequency of occurrence of the above-mentioned contacts in the CSD.
Abbreviations: CSD, Cambridge Structural Database; PDB, Protein Data Bank.
a Fragments are from protein structures in the PDB.
PDB analyses
Protein structures determined using X-ray crystallography in the PDB (Rose et al., Reference Rose, Prlić, Altunkaya, Bi, Bradley, Christie, Di Costanzo, Duarte, Dutta, Feng, Green, Goodsell, Hudson, Kalro, Lowe, Peisach, Randle, Rose, Shao, Tao, Valasatava, Voigt, Westbrook, Woo, Yang, Young, Zardecki, Berman and Burley2017) were downloaded in July 2022. A set of protein structures with resolution ≤2.0 Å, R-factor≤25% and pairwise sequence identity ≤90% was generated using the PISCES server (Wang and Dunbrack, Reference Wang and Dunbrack2005), resulting in 25,107 structures. The criterion for pairwise sequence identity was excluded for searching ligands containing aromatic S. These structures were analysed using an in-house script written in Python 3.7.1. Search for H-bonds and Ch-bonds was made using the criterion of d (Å) ≤ (sum of van der Waals radii of S and O/N). To minimise the effect of structural constraints on the direction parameters, we excluded the contacts where S and O/N were separated by less than seven covalent bonds or were intramolecular. The direction criteria obtained using the CSD analysis were used to distinguish between H- and Ch-bonds.
To understand the effects of H- and Ch-bonds on proteins’ secondary structures, we searched the PDB for S···H–N (dS···N ≤ 3.6 Å, relaxed from 3.35 Å because it usually peaks at 3.6 Å in proteins (Zhou et al., Reference Zhou, Tian, Lv and Shang2009)) and S···O (dS···O ≤ 3.32 Å) bonds made by the peptide backbone with S. An additional criterion of 120˚ ≤ ζ ≤ 240˚, where ζ is a torsion angle (see Supplementary Fig. 1 for more details), was applied to exclude structures where N–H was not pointing towards the lone pair regions of S. Secondary structure information was obtained from the header in the PDB files. For N-terminal capping, we searched for S···H–N bonds where the amino group was of N1, N2 or N3 residue at the N-terminus of the helix. While for C-terminal capping, we searched for S···O interactions where the carbonyl O was of C1, C2 or C3 residue at the C-terminus of the helix (Aurora and Rose, Reference Aurora and Rose1998). A similar analysis was performed to investigate the effects of S···H–N, S···O and S···N interactions on the stability of α-helices and β-strands. In this case, the last four residues at the N- and C-termini of the helix were excluded, whereas all the residues belonging to the strand were considered. Search for metal-chelating cysteine used a distance between S and the metal to be within 1.9–2.8 Å, which ensured that interactions with metals of different ionic radii were identified. This distance range was calculated using an in-house script for all interactions between S and metals in protein structures in the PDB. Figures of protein structures were made using Chimera 1.13.1 (Pettersen et al., Reference Pettersen, Goddard, Huang, Couch, Greenblatt, Meng and Ferrin2004).
Results and discussion
Geometrical features that distinguish S-mediated Ch-bond from H-bond
Structures in the PDB solved using X-ray crystallography often lack positional information of H atoms, making it difficult to identify if the non-covalent bond between S and O/N is H-bond or Ch-bond. We devised a methodology to distinguish between the H- and Ch-bonds even if positional information of the H atom was absent. Towards this, we first analysed fragments F1 and F2 in the CSD having positional information of H to identify the preferred direction of approach of H towards S to form H-bond (Fig. 1b). Fragments F3 and F4 were studied to characterise the preferred direction of approach of O/N towards S to form Ch-bond. θ and δ were used to characterise the angular distribution for H- and Ch-bonds (Fig. 2a). The parameters measured for all such contacts in the CSD were plotted to illustrate the preferred range of distances and angles of these interactions (Fig. 2b, c and Supplementary Fig. 2a,b). The angular distribution thus obtained was compared with the complexation energy ΔE obtained from PES scans at different values of θ and δ for the model systems (Fig. 2b, c).
Two distinct clusters were observed in the θ–δ plot for S···H–O contacts. The boundary values of the two clusters were (i) 95° ≤ θ ≤ 145° and − 90° ≤ δ < −50° and (ii) 95° ≤ θ ≤ 145° and 50° < δ ≤ 90°, respectively. This matched with the location of the PES scan minima (Fig. 2b and Supplementary Fig. 2a). The clusters represented the direction of approach of the electrophile towards the lone pairs of S (Supplementary Fig. 3a). In the case of S···O contacts, a single cluster was observed at a different region of the θ–δ plot (115° ≤ θ ≤ 155° and − 50° ≤ δ ≤ 50°), which overlapped with the PES scan minimum (Fig. 2c and Supplementary Fig. 2b). The direction corresponded to the approach of the nucleophile towards the σ-hole on S (Supplementary Fig. 3a; Politzer et al., Reference Politzer, Murray and Clark2013; Aakeroy et al., Reference Aakeroy, Bryce, Desiraju, Frontera, Legon, Nicotra, Rissanen, Scheiner, Terraneo, Metrangolo and Resnati2019). Outliers in the plots were due to other strong interactions within the molecules, such as other H-bonds and stacking interactions (Supplementary Fig. 3b–d). The number of S···N interactions was much less than S···O (Table 1) possibly because of N being conjugated in most of the structures resulting in the lack of lone pair electrons for the formation of Ch-bond.
Delineation of Ch-bond from H-bond in groups having electrophilic and nucleophilic centres
We next sought the nature of bonding between S and functional groups having both electrophilic and nucleophilic centres because they are common in proteins. For this, we studied fragments F5 and F6 in the CSD (Fig. 1a,b and Table 1). Using distance criteria, we ensured that these fragments formed either S···H–O/N or S···O/N interaction. Due to structural constraints, both interactions did not occur simultaneously. From this set of interactions, contacts satisfying dS···H ≤ 2.8 Å were assigned as H-bond and the rest as Ch-bond. Note that this filtering strategy excluded those H-bonds having a distance between S and O/N greater than 2.8 Å. The CSD analysis revealed three clusters in the θ–δ plot (Fig. 2d). Interactions in two of these clusters had θ–δ values expected for H-bond and most of them satisfied the criterion dS···H ≤ 2.8 Å (Fig. 2b). S···O/N interactions with dS···H > 2.8 Å primarily clustered with θ–δ distribution that matched the directionality of Ch-bond (Fig. 2c). A few interactions in this cluster had dS···H ≤ 2.8 Å. Note that H–O/N groups that formed Ch-bond with S could also form H-bond with a neighbouring acceptor atom (Supplementary Fig. 3a).
In proteins, Ch- or H-bond can be formed between methionine and cystine with side chains of serine, threonine or tyrosine, or the backbone amide or water (fragments F7–F9) is equivalent to fragments F5 and F6. As there were very few examples of X1–S–H in the CSD analysis discussed above, we excluded interactions made by free cysteine from the analysis of PDB structures. As in the case of fragments F5 and F6, the θ–δ plot obtained by analysing F1–F4 revealed the segregation of angular values into three clusters corresponding to either H-bond or Ch-bond (Fig. 2e and Supplementary Figs 2c,d and 4). In summary, the θ–δ plot allowed us to identify and distinguish Ch-bond from H-bond in protein structures without positional information of H atoms.
Electronic environment of S determines the choice between Ch- and H-bond
We next sought to find what dictated the choice between the formation of Ch-bond and H-bond. In general, the formation of H- or Ch-bond depends on the strength of lone pairs and σ-holes on S, respectively, which are in turn affected by the nature of substituent groups (Adhikari and Scheiner, Reference Adhikari and Scheiner2014; Kumar et al., Reference Kumar, Gadre, Mohan and Suresh2014). Using MESP, we found the same in model systems relevant to biomolecules (Fig. 3a, b). The MESP analysis revealed two V min (MESP minimum), corresponding to the lone pairs on S in all the model systems (Fig. 3a). The electrostatic potential maps also showed the presence of two σ-holes on the extension of the S–X bonds, except in the case of [Fe(SCH3)4] complex (Fig. 3b) because of the anionic nature of metal-chelated S (Hirano et al., Reference Hirano, Takeda and Miki2016). Interestingly, we noted the ability of substituents to modulate the strength of the lone pairs and σ-holes on S in a reciprocal manner, which consequently was expected to affect the nature of the bond formed.
To see if this was true, we categorised all the contacts obtained from CSD based on the substituents linked to S (Fig. 4a and Supplementary Table 1). We analysed the corresponding structures to check if S formed Ch-bond or H-bond. Eighty-eight percent of S(Ar) and 73% of E–S–Y formed Ch-bond. In sharp contrast, more than 97% of M–S–Y formed H-bond. In comparison, saturated C/S/H substituents (R–S–R) appeared to have a lesser influence on the choice of the bond formed. The number of H-bonds (52%) was almost comparable to the number of Ch-bonds (48%) (Supplementary Table 1). The observations matched the expectations from the MESP analysis performed on the model systems. In summary, divalent S, when part of an aromatic ring or bonded to an electron-withdrawing group, is most likely to form a Ch-bond, whereas S coordinated with a metal can form an H- but not a Ch-bond.
Disulphide-linked S preferentially forms Ch-bonds, whereas metal-chelated cysteine form H-bonds in proteins
We studied interactions made by S of methionine or cystine with the hydroxyl, amino or carbonyl group of backbone amide, side chains of serine, threonine, tyrosine, aspartate, glutamate, arginine, lysine, histidine, asparagine, glutamine, tryptophan and bound water (Supplementary Table 2). Our analysis revealed that the disulphide-linked Cys-Sγ was more frequently involved in Ch-bonds (85%) than H-bonds (15%). In comparison, Met-Sδ appeared to form H-bonds (57%) only marginally more than Ch-bonds (43%) (Fig. 4b and Supplementary Table 3). The MESP analysis showed that S bonded to two methyl groups (C–S–C), as in methionine, had comparable values of V min and V S,max (Fig. 3a, b). In contrast, V S,max on a disulphide-linked S (C–S–S) was larger than V min, thus providing a rationale for cystine to preferentially form Ch-bonds.
Aromatic S preferentially formed Ch-bonds with groups containing O. This observation was consistent with previous reports of Ch-bonds between S in the aromatic rings of drugs containing thiophene, thiazole and thiadiazole groups and O in target proteins (Thomas et al., Reference Thomas, Jayatilaka and Guru Row2015; Zhang et al., Reference Zhang, Gong, Li and Lu2015; Koebel et al., Reference Koebel, Cooper, Schmadeke, Jeon, Narayan and Sirimulla2016; Kristian et al., Reference Kristian, Fanfrlík and Lepšík2018). S-mediated interactions with functional groups containing N did not show these features presumably because the delocalised lone pairs of N in backbone amide or the side chain precluded the formation of Ch-bonds. We next analysed the PDB for non-covalent interactions formed by metal-chelated cysteines, which occur in many metalloproteins. Consistent with the rules stated above and independent of the identity of the metal, the thiolate of cysteine preferentially formed H-bonds (M–S–Y in Fig. 4a). In summary, our analyses of the structures in the CSD and the PDB revealed that the nature of the interaction between functional groups containing electrophilic and nucleophilic centres and S was influenced by the substituent-dependent see-saw change in the strength of the lone pairs and σ-holes on S (Fig. 4b).
Role of S in helix capping
Capping satisfies the H-bond-forming abilities of the free backbone N–H or C=O of the terminal residues of an α-helix and is essential for the stability of α-helices in proteins and peptides (Aurora and Rose, Reference Aurora and Rose1998). The role of polar side chains of serine, threonine and asparagine, the acidic side chain of aspartate, the backbone amide of a neighbouring residue and metal-chelated S of cysteine in helix capping are well documented (Doig and Baldwin, Reference Doig and Baldwin1995; Aurora and Rose, Reference Aurora and Rose1998), but not those of methionine and cystine. As the N-terminus and C-terminus of α-helices have free backbone N–H (electrophile) and free backbone C=O (nucleophile), respectively, we asked if Met-Sδ or Cys-Sγ would interact and cap them.
We analysed protein structures in the PDB and found a number of examples of Met-Sδ or Cys-Sγ interacting with backbone amino at the N-terminus or backbone carbonyl at the C-terminus of α-helix (Fig. 5a and Supplementary Table 4). Consistent with previous observations (Doig and Baldwin, Reference Doig and Baldwin1995), we found that metal-chelated thiolates capped only the N-terminus of the helix by H-bonds. In contrast, 75% of Cys-Sγ capped the C-terminus by Ch-bonds, whereas the remaining 25% capped the N-terminus by H-bonds (Fig. 5b). Amongst the examples involving Met-Sδ, 37% of the interactions were H-bonds with backbone N–H of the N-terminal residues and 63% Ch-bonds with backbone C=O of the C-terminal residues (Fig. 5c).
Augmentation of the stability of regular secondary structures by S
In addition to backbone H-bonds, other non-covalent interactions such as C–H···O and n → π* are important for the structural stability of α-helices or β-sheets (Derewenda et al., Reference Derewenda, Lee and Derewenda1995; Manikandan and Ramakumar, Reference Manikandan and Ramakumar2004; Bartlett et al., Reference Bartlett, Choudhary, Raines and Woolfson2010). An earlier study reported methionine-forming intra-helical and inter-strand Ch-bonds with backbone O (Pal and Chakrabarti, Reference Pal and Chakrabarti2001). This prompted us to find if S could contribute to the stability of α-helices and β-strands through H- and Ch-bonds. We analysed structures in the PDB for Ch- and H-bonds between Met-Sδ or Cys-Sγ with backbone O or N–H of residues of α-helices or β-strands (Supplementary Fig. 5 and Table 5). Interestingly, we found an S-mediated H-bond involving backbone N–H of β-strands at the edge of β-sheets (Fig. 5d). We also found Ch-bond formed by S with free backbone C=O of edge strands.
Many elements of negative design, a mechanism that prevents the β-strand dimerisation and stabilises an edge strand of β-sheets, have been documented previously (Richardson and Richardson, Reference Richardson and Richardson2002; Koga et al., Reference Koga, Tatsumi-Koga, Liu, Xiao, Acton, Montelione and Baker2012). This includes the interaction of other regions of the protein with the edge β-strand, disruption of backbone H-bond formation by proline or a β-bulge or use of inward-pointing charged residues to prevent strand-mediated dimerisation. Our analysis revealed that H-bond or Ch-bond formed by backbone N–H or C=O of edge β-strand, respectively, with a neighbouring Met-Sδ or Cys-Sγ, is another element of negative-design that can stabilise β-sheets. Additionally, we found that in some proteins, the free backbone N–H of the insertion residue of a classical β-bulge formed an H-bond with Met-Sδ located two residues ahead (Supplementary Fig. 5).
To gauge the potential contribution of Ch-bonds to stabilise regular secondary structures, we performed a PES scan by varying the torsion angle χ about the covalent bond between Cα and Cβ of the cystines whose S formed a Ch-bond. One fragment each from an α-helix and a β-strand were chosen for the calculations (Fig. 6a). A plot of relative conformational energy versus χ showed that the minimum conformational energy corresponded to the χ of the respective crystal structures (Fig. 6b). The AIM analysis for minimum energy conformations showed the presence of a bond critical point (BCP) between S and O. ρ-values of these BCPs were 0.007 au for 1PVH and 0.011 au for 4KT1 (Fig. 6b), which were in the range suggested previously for favourable non-covalent interactions, that is, 0.002–0.035 au (Bader, Reference Bader1991). The analysis, thus, strongly suggested that Ch-bonds could provide extra stability to a particular conformation in protein molecules.
Conclusions
In this study, we have tried to understand the role of H- and Ch-bonds formed by divalent S in proteins. Computational analyses showed that the S-mediated interactions contributed to the stability of protein conformation and secondary structures. Hence, we conclude that S-mediated Ch- and H-bonds, like other weak interactions, are an essential aspect of the energy landscape in protein folding that compensates for unfavourable conformational entropy changes through favourable interactions (Grantcharova et al., Reference Grantcharova, Alm, Baker and Horwich2001; Dobson, Reference Dobson2003). Furthermore, we envisage that cooperativity among S-mediated and other weak interactions is likely to modulate their strengths with direct implications for protein function, which remains to be studied (Adhav et al., Reference Adhav, Pananghat and Saikrishnan2022). For example, we speculate that the propensity and strength of Ch-bonds would increase upon the delocalisation of lone-pair electron density of Cys-Sγ to form an n → π* interaction with a vicinal carbonyl group (Kilgore and Raines, Reference Kilgore and Raines2018). Also, the computational analyses reported here do not delineate the contribution of hydrophobic and van der Waals interactions from those of the polar Ch-bond and H-bond interactions towards structural stability.
S-mediated Ch- and H-bonds can contribute to the structural stability and substrate specificity of proteins, like other interactions formed by polar amino acids. However, S-mediated interactions can have properties different from other polar non-covalent interactions, for instance, the resistance of Ch-bond strength to solvent polarity (Pascoe et al., Reference Pascoe, Ling and Cockroft2017), thus bringing additional diversity to the repertoire of weak interactions essential for biomolecular functions. This could be a reason why, despite their high biosynthetic cost (Doig, Reference Doig2017), nature selected S-containing amino acids as part of the 20 building blocks of proteins. The wide variety of functionally relevant interactions made by S in proteins necessitates that these non-covalent interactions too are considered in the energy functions used for determining protein structures, folding pathways and binding properties. Also, the design and engineering of proteins and peptides would benefit from a better understanding of the distinct bonding properties of methionine and cysteine/cystine.
Open peer review
To view the open peer review materials for this article, please visit http://doi.org/10.1017/qrd.2023.3.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/qrd.2023.3.
Comments
Comments to Author: The manuscript entitled “Sulfur-mediated chalcogen versus hydrogen bonds in proteins: a seesaw effect in the conformational space” is nice piece of work. However, the following corrections/clarifications are needed before its publication.
1. Several important references are missing in the introduction part while discussing non-covalent interactions with sulfur (hydrogen bonding and chalcogen bonding). A few noticeable references are J. Phys. Chem. Lett. 2015, 8, 1385-1389, J. Am. Chem. Soc. 2017, 139, 1842−1855, Chem. Sci. 2017, 8, 2667−2670, Acc. Chem. Res. 2020, 53, 8, 1580-1592, J. Mol. Struct. 2020, 1211, 128080, Chem. Soc. Rev., 2022,51, 4261-4286. These references need to be cited and discussed in the revised manuscript.
2. Some of the important parameters such as lower electronegativity, charge and higher polarizability of sulfur should not be ignored while discussing hydrogen bonding with sulfur.
3. Why have different basis sets been used to optimize the monomer and dimer?
4. Why have the criteria for S---H bond distance taken less than 2.8 Å, whereas their sum of van der Walls radii is 3Å? Also, the S---N bond distance criteria were different for CSD (3.35 Å) and PDB (3.6 Å) search. The authors should provide a logical explanation for this.
5. A more detailed explanation should be given in the context of negligible sigma-hole formation on S when it is chelated to a metal center.
6. In the PDB analysis (page 8), the authors mentioned that coordinate files greater than 1 MB in size were excluded from both sets. What is the reason behind choosing based on file size?
7. Fluorescence-based studies:
Why are the experiments done at pH 7.6, slightly higher than the average body pH? Is there any particular reason for this?
The emission maximum for isolated tryptophan in a buffer medium and a protein environment is 350 nm±10 nm. However, in this report, it is 330 nm. Why such a blue shift in the tryptophan emission for MetRS?
For curiosity, can the author explain why fluorescence intensity changes upon ligand binding?
The Kd units in Figures 6C and G need to be correctly written for consistency.
Also, it would be more informative for the readers if the authors provided the emission spectra for titration of proteins and ligands in the supporting information.
8. There is no discussion on CD experiments in the manuscript’s main text. Instead of supplementary 10, the authors can write Figure S10 while discussing it. (page 23)
9. Did the authors perform ITC experiments for all the studied protein-ligand complexes like fluorescence experiments or only for MetRS:Met and MetRS:Nle ? For MetRS:Met complex, the enthalpy contribution is 40%, whereas the entropy contribution is 60% of the total free energy. However, the authors concluded this as prominently entropy-driven; how? Also, how the hydrophobic interactions are responsible for the entropy-driven process?