Application of linker technique to trap transiently interacting protein complexes for structural studies

Protein-protein interactions are key events controlling several biological processes. We have developed and employed a method to trap transiently interacting protein complexes for structural studies using glycine-rich linkers to fuse interacting partners, one of which is unstructured. Initial steps involve isothermal titration calorimetry to identify the minimum binding region of the unstructured protein in its interaction with its stable binding partner. This is followed by computational analysis to identify the approximate site of the interaction and to design an appropriate linker length. Subsequently, fused constructs are generated and characterized using size exclusion chromatography and dynamic light scattering experiments. The structure of the chimeric protein is then solved by crystallization, and validated both in vitro and in vivo by substituting key interacting residues of the full length, unlinked proteins with alanine. This protocol offers the opportunity to study crucial and currently unattainable transient protein interactions involved in various biological processes.


BACKGROUND
Molecular recognition lies at the heart of all biological processes. Interactions between two proteins can be characterized in terms of binding affinity, which describes how tightly two binding partners interact. The binding process is regarded as an equilibrium, resulting from a balance between association and dissociation events [1]. Protein-protein binding affinity can be influenced by various non-covalent intermolecular interactions, such as hydrogen bonds and electrostatic, hydrophobic and van der Waals forces, as well as by macromolecular crowding caused by high concentrations of macromolecules other than the binding partner(s) [2]. Protein complexes with a low dissociation constant (in the nM range) are regarded as high-affinity complexes, such as those formed between antigens and antibodies, whereas protein complexes with a high dissociation constant have a low affinity, such as those involved in intracellular signaling. In biological systems, these associations are usually reversible, but irreversible covalent bonding can also be observed [3][4][5].
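To make the affinity scale concrete, the following minimal sketch computes the fraction of a protein occupied by its partner for simple 1:1 binding at equilibrium, assuming the free partner concentration is known (for example, because the partner is in large excess). The function name and the example concentrations are illustrative assumptions and are not values from this protocol.

```python
def fraction_bound(free_partner_molar: float, kd_molar: float) -> float:
    """Fraction of protein bound for 1:1 equilibrium binding: [L] / (Kd + [L])."""
    if free_partner_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be non-negative and Kd must be positive")
    return free_partner_molar / (kd_molar + free_partner_molar)


if __name__ == "__main__":
    partner = 1e-6  # 1 µM free partner (illustrative value, not from the protocol)
    examples = [
        ("high affinity, Kd = 1 nM", 1e-9),
        ("low affinity, Kd = 10 µM", 1e-5),
    ]
    for label, kd in examples:
        print(f"{label}: {fraction_bound(partner, kd):.3f} of the protein is bound")
```

At the same partner concentration, the nanomolar complex is almost fully occupied while the micromolar complex is mostly dissociated, which is why transient complexes are hard to trap for structural work.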
Since the realization that many diseases stem from abnormalities in protein-protein interactions, it has become imperative to elucidate how changes in binding influence disease progression in order to develop appropriate therapeutics. While stable, high affinity protein-protein complexes can be generated following well-established procedures [6,7], most intracellular signaling events are transient, adopting a 'hit-and-run' strategy that makes it difficult to trap the interaction for structure determination. These transient interactions can be classified into two types: (1) interactions where binding partners have an ordered structure or (2) interactions where one or more of the binding partners only gains secondary structure upon binding.
In recent years, a large number of proteins in the eukaryotic proteome have been determined to be intrinsically unstructured proteins (IUPs) [8][9][10][11]; i.e., they contain no or very little well-defined structure and lack a compact globular fold. IUPs are involved in transcriptional regulation, cellular signaling, cell cycle control, endocytosis, replication, and biogenesis of the cytoskeleton [12,13], and they can bind to different proteins using the same or different interfaces [14]. Unfolded proteins have been reported to fold or attain secondary structure upon binding with their interacting partner(s) [15,16], and studies note that the disordered regions that undergo a disorder-to-order transition are generally short, in some cases fewer than 30 residues [17]. Nevertheless, the highly flexible state of IUPs is fundamental to their biological role, allowing them to bind multiple partners and adopt different conformations. The intrinsic flexibility of these proteins also confers several functional advantages, such as specificity without excessive binding strength, increased speed of interaction, and binding promiscuity, all highly desirable for signaling and regulatory processes [17].
Methods such as X-ray crystallography, NMR spectroscopy, and electron microscopy require stable protein complexes. Thus, in order to understand the nature of transient and unstructured protein-protein interactions, a method is required to trap them in a stable conformation. Here, we present a protocol to trap transiently interacting protein complexes for which one of the binding partners is unstructured (Fig. 1). We employed a flexible glycine (Gly) linker to fuse the two interacting partners. Being highly flexible, the poly-Gly linker will not impose any spatial restriction on the mobility of the linked proteins. The presence of the linker keeps the two proteins of interest in close proximity and so retains the interaction. Following a review of the available literature [18], we devised a method to design and produce a chimeric protein construct using a suitable Gly or Gly-rich linker, and proceeded to test this method using an interaction that could not previously be captured by co-crystallization [19]: Calmodulin (CaM) binding with two of its intrinsically disordered, neuron-specific substrate proteins, Neuromodulin (Nm) and Neurogranin (Ng). Initial attempts to co-crystallize CaM with the IQ motifs of Nm and Ng did not yield crystals of the complex, despite strong evidence for complex formation. Employing this method, we generated stable complexes of CaM with the IQ motifs of Nm and Ng using a flexible Gly linker, crystallized them and solved the structures, and validated the results with unlinked full-length proteins and alanine substitution mutagenesis. Here we present, for the first time, a detailed protocol for the design and validation of Gly linkers to trap transiently interacting protein complexes in which one of the binding partners is intrinsically disordered.
When a protein complex is first targeted for structural study, initial attempts are made to determine the affinity and stability of the complex using biophysical techniques such as isothermal titration calorimetry (ITC), surface plasmon resonance (SPR) and fluorescence methods. Working with the full-length unstructured protein is often troublesome, and it can be replaced with a peptide comprising the minimum binding region (MBR) to mimic the binding of the full-length protein. The MBR can be determined using co-immunoprecipitation, gel-shift assays and biophysical binding studies. In the present case, the MBRs of Nm and Ng were previously determined to be the IQ motif [20,21]. We thus designed peptides of two different lengths (19 and 24 amino acids) from Nm and Ng and assessed their binding affinities towards CaM using ITC.
If the specific site of the interaction is known, a Gly or Gly-rich linker of an appropriate length can be designed to facilitate positioning of the MBR peptide within proximity of the binding region of the stable protein; this can be achieved with the knowledge that each glycine residue corresponds to a length of ~3.8 Å [22]. If the site of the interaction is unknown, various studies such as mutagenesis, docking experiments, limited proteolysis and hydrogen/deuterium (H/D) exchange experiments can be employed, and various linker lengths can be tested to identify the appropriate length that facilitates an interaction. In the present case, we used DeepView analysis of a known CaM-peptide complex, the CaM-myosin V IQ motif peptide complex (PDB code: 2IX7), to estimate the approximate site of the interaction for Nm and Ng with CaM. Thus, using literature and computational analyses, we identified a (Gly)5 linker as sufficient to link the Nm/Ng MBR to the C-terminus of CaM.
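The ~3.8 Å per glycine rule of thumb translates directly into a lower bound on linker length. The sketch below is a minimal illustration, assuming a fully extended linker; the function name, the optional slack parameter and the example distances are hypothetical and are not measurements from this study.

```python
import math

ANGSTROM_PER_GLY = 3.8  # approximate span of one glycine residue, as stated in the text


def min_glycines(distance_angstrom: float, slack_residues: int = 0) -> int:
    """Smallest number of Gly residues that can span a terminus-to-terminus distance.

    slack_residues is a hypothetical safety margin; the protocol itself only
    states that the model must be inspected closely before fixing the length.
    """
    if distance_angstrom < 0 or slack_residues < 0:
        raise ValueError("distance and slack must be non-negative")
    # The tiny tolerance avoids rounding up on exact multiples of 3.8 Å.
    spanning = math.ceil(distance_angstrom / ANGSTROM_PER_GLY - 1e-9)
    return max(spanning, 0) + slack_residues


if __name__ == "__main__":
    for d in (10.0, 18.0, 25.0):  # illustrative distances in Å
        print(f"{d:5.1f} Å -> at least {min_glycines(d)} Gly residues")
```

Because a measured distance is a straight line through the model, the number returned here is only a lower bound; a linker that has to route around the protein surface will need to be longer.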
A three-step fusion PCR procedure was then employed to link the two proteins: the sequence for the linker is incorporated into the reverse primer of the CaM gene and the forward primer of the Nm/Ng IQ motif gene, and fusion is performed in the next round of PCR. The fused constructs are expressed in E. coli, the proteins are purified using Ni-NTA affinity chromatography, and the chimeric proteins are characterized by size-exclusion chromatography (SEC) and dynamic light scattering (DLS). Analytical ultracentrifugation (AUC) and circular dichroism (CD) can also be performed to verify the presence of a well-folded, intact complex. While we describe here the use of hanging drop vapor diffusion for crystallization of the linked complexes, sitting drop and under-oil methods can also be used. Finally, the structural results obtained with the chimeric protein were validated in vitro and in vivo by mutating, in the full-length unlinked proteins/domains, key interacting residues identified from the linked complex. We validated our structural findings using ITC and relevant electrophysiological experiments in vivo.

Recipes
Different lengths of Nm and Ng IQ motif peptides were commercially synthesized and obtained from GL Biochem Ltd (Shanghai, China). Stock solutions of 1 mM of Nm/Ng IQ peptides were prepared by dissolving the appropriate amount of peptide in 20 mM imidazole pH 8.0, 100 mM NaCl and 2 mM EGTA (Buffer A). After mixing, the peptide stock solutions were stored at 4°C to ensure complete peptide solubility. The choice of buffer is based on obtaining maximum solubility of the peptides. NOTE: Synthetically derived peptides should be verified for solubility when dissolved in a buffer of interest. Preparing high concentrations of peptide stock solution may sometimes result in precipitation; lower concentrations of stock solutions can be prepared to avoid this problem.
TAE buffer: Tris-acetate-EDTA buffer is used for preparing agarose gels and for electrophoresis.
Double digestion: NdeI and XhoI restriction enzymes were from New England Biolabs, UK. Double digestion was performed using these enzymes according to the manufacturer's instructions.
T4 DNA ligase: Double-digested vector and insert were ligated using T4 DNA ligase (Fermentas) according to the manufacturer's instructions.
Gel extraction kit: PCR products were purified using the GeneAll kit (GeneAll Biotechnology, Korea). Other commercially available kits can also be used.
Plasmid preparation kit: Plasmids were purified using the Qiagen Plasmid Prep Kit. Other commercially available kits can also be used.
Coomassie protein assay reagent: The Coomassie protein assay was used to measure protein concentration. Mix 0.5 ml of water and 0.5 ml of reagent thoroughly and use this as the blank. For each sample, add 1 µl of protein solution to this blank solution and mix well. Measure the absorbance of the sample at 595 nm using the blank as a reference. NOTE: If the concentration of the protein is above 5 mg/ml, it is advisable to dilute the protein for accurate measurement of protein concentration.
Optimization of crystallization conditions: Once the initial crystallization conditions are identified from commercial screens, further optimize these conditions to obtain the best diffraction-quality crystals. In the present case, we obtained crystals in magnesium acetate, PEG3350 and sodium citrate tribasic conditions. Hence, the following stock solutions are prepared: 1.0 M magnesium acetate tetrahydrate (dissolve 10.7 g of magnesium acetate tetrahydrate in 45 ml of water and top up to 50 ml), 50% w/v PEG3350 (dissolve 25 g of PEG3350 in 30 ml of water, using a hot plate if necessary to dissolve completely, and make up to a final volume of 50 ml) and 1.6 M sodium citrate tribasic (dissolve 23.5 g of sodium citrate tribasic in 40 ml of water and top up to 50 ml). NOTE: Prepare fresh before use.
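The masses quoted for the crystallization stock solutions can be cross-checked with a short calculation. This is a minimal sketch: the helper names are illustrative, the molecular weight of magnesium acetate tetrahydrate is taken as 214.45 g/mol, and the sodium citrate is assumed to be the dihydrate (294.10 g/mol), an assumption that is consistent with the 23.5 g stated above but is not spelled out in the recipe.

```python
def grams_for_molar_stock(molarity_mol_per_l: float, volume_ml: float,
                          mw_g_per_mol: float) -> float:
    """Mass of solute (g) needed for a stock of the given molarity and final volume."""
    return molarity_mol_per_l * (volume_ml / 1000.0) * mw_g_per_mol


def grams_for_percent_wv(percent_wv: float, volume_ml: float) -> float:
    """Mass of solute (g) for a % w/v solution (grams per 100 ml of final volume)."""
    return percent_wv / 100.0 * volume_ml


MW_MG_ACETATE_TETRAHYDRATE = 214.45      # g/mol
MW_TRISODIUM_CITRATE_DIHYDRATE = 294.10  # g/mol; dihydrate form is an assumption

if __name__ == "__main__":
    mg = grams_for_molar_stock(1.0, 50, MW_MG_ACETATE_TETRAHYDRATE)
    citrate = grams_for_molar_stock(1.6, 50, MW_TRISODIUM_CITRATE_DIHYDRATE)
    peg = grams_for_percent_wv(50, 50)
    print(f"1.0 M magnesium acetate tetrahydrate, 50 ml: {mg:.1f} g")
    print(f"1.6 M sodium citrate tribasic, 50 ml: {citrate:.1f} g")
    print(f"50% w/v PEG3350, 50 ml: {peg:.1f} g")
```

If a different hydrate or salt form is used, the molecular weight constant should be replaced with the value printed on the reagent bottle.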
1.1. Wild-type (WT) CaM, Nm and Ng are purified as described previously [19].
1.4. Thoroughly wash the sample cell and the syringe using Buffer A. NOTE: Extensive washing of the cell and syringe is necessary to avoid any contaminants carried over from previous runs.
1.5. Fill the reference cell with degassed buffer solution using a long-needle syringe.
1.6. Fill the sample cell with 10 µM of Nm/Ng full-length protein, taking care to avoid the appearance of air bubbles in the cell. For the VP-ITC instrument, the net volume is ~1.5 ml.
1.14. After the experiment, wash and maintain the ITC system according to the manufacturer's recommendations.

2.2. At the PDB website (www.rcsb.org), search for the template model and download the PDB file. In the present case, we identified PDB entry 2IX7 as a template model to design the linked construct. NOTE: In some cases, a search of the PDB may need to be performed using the sequence of the protein of interest to identify the closest homolog complex, if available.
2.3. Use the identified template PDB file and modify the template (in this case, 2IX7) to generate the models for the selected interaction (in this case, apo-CaM-NmIQ and apo-CaM-NgIQ complexes). TROUBLESHOOTING
2.4. Mutate the existing motif residues (in this case, myosin V IQ motif residues in PDB entry 2IX7) to the sequence from the MBR of the peptide of interest (in this case, Nm/Ng IQ motif) using the "MUTATE" operation.
2.5. Use "ENERGY MINIMIZATION" to repair distorted geometries resulting from the mutated residues and to release internal constraints. Positions of the side chains of the mutated residues can then be refined using this option.
2.6. Once all residues are mutated based on the MBR of the peptide of interest (in this case, Nm/Ng IQ motif sequence), use the "SAVE" option to save the current model with an appropriate name (in this case, "Model_Nm.pdb" or "Model_Ng.pdb").

Run PyMOL and open the model (Model_Nm.pdb), then use the "Measurement" option to measure the distance between the selected terminus of the protein (C-terminus, CaM) and the terminus of the peptide (N-terminus, Nm). Repeat the same for other peptides (Ng). NOTE: This is an approximate way to determine the distance between the protein and peptides (CaM and Nm/Ng IQ motifs). Measurements obtained using this option provide only a linear distance, so the model needs to be inspected closely to determine the length of linker actually required.
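The same terminus-to-terminus distance can be cross-checked outside PyMOL by reading the coordinates straight from the saved model. The sketch below is an illustrative alternative, not the procedure used in this protocol: it parses fixed-column ATOM records and returns the straight-line distance between two Cα atoms. The chain identifiers and residue numbers in the usage comment are hypothetical and must be replaced with those in the actual model file.

```python
import math


def atom_xyz(pdb_lines, chain_id, res_seq, atom_name="CA"):
    """Return (x, y, z) of the first matching atom in fixed-column PDB records."""
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if (line[12:16].strip() == atom_name      # atom name, columns 13-16
                and line[21] == chain_id          # chain identifier, column 22
                and int(line[22:26]) == res_seq): # residue number, columns 23-26
            return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    raise KeyError(f"atom {atom_name} of residue {chain_id}{res_seq} not found")


def distance(a, b):
    """Straight-line (Euclidean) distance between two coordinate triples, in Å."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def terminus_distance(pdb_path, protein_end, peptide_start):
    """Distance between two (chain, residue number) Cα positions in a PDB file."""
    with open(pdb_path) as handle:
        lines = handle.readlines()
    return distance(atom_xyz(lines, *protein_end), atom_xyz(lines, *peptide_start))


# Hypothetical usage; chain IDs and residue numbers are placeholders:
# d = terminus_distance("Model_Nm.pdb", ("A", 148), ("B", 1))
```

As with the PyMOL measurement, the value is a linear distance only, so it gives a lower bound on the path the linker must follow.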
3. Fusion PCR and cloning: Linking MBR to structurally stable protein
3.4. Set up a PCR reaction (25 µl) to fuse the two genes, using the genes amplified in the above step as templates. Use the same PCR program described in step 3.3.
3.5. Verify gene fusion using 1.5% agarose gel electrophoresis. PCR products obtained from step 3.4 should show a size corresponding to the sum of the two parts: in this case, CaM and the IQ motif (Fig. 2C). TROUBLESHOOTING

3.6. Purify the PCR product from the gel using the GeneAll Gel extraction kit.
3.7. Double digest the purified PCR product (step 3.6) and the pGS21a vector (Genscript, USA) with NdeI and XhoI restriction enzymes and purify the double digests using the GeneAll PCR purification kit.
3.8. Ligate the double-digested PCR product and pGS21a vector using T4 DNA ligase. TROUBLESHOOTING
3.9. Transform the ligated product into chemically competent E. coli DH5α cells using heat shock (Recipes).
3.10. Inoculate colonies in 3 ml of LB broth and grow the culture at 37°C for 12-16 h. Perform a plasmid extraction using the Qiagen mini prep kit.
3.11. Verify the plasmids with DNA sequencing.
3.12. Transform plasmids that contain the fused gene into chemically competent E. coli BL21 cells using heat shock (Recipes).
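The fusion step (3.4) and the size check on the gel (3.5) can be previewed at the sequence level before any primers are ordered. The following is a minimal sketch, assuming that both first-round amplicons carry the same linker-encoding sequence at the junction; the function name and the toy sequences are hypothetical and are not the CaM or IQ motif sequences. In this representation the fused product is the sum of the two amplicons minus the overlap they share, which for a short linker is close to the sum of the two parts.

```python
def fuse_by_overlap(upstream: str, downstream: str, min_overlap: int = 9) -> str:
    """Join two amplicons through the longest shared end/start overlap."""
    longest_possible = min(len(upstream), len(downstream))
    for size in range(longest_possible, min_overlap - 1, -1):
        if upstream.endswith(downstream[:size]):
            return upstream + downstream[size:]
    raise ValueError("amplicons do not share an overlap of the required length")


if __name__ == "__main__":
    linker = "GGT" * 5                      # five glycine codons (toy linker)
    gene_a = "ATGGCTGAT" + linker           # toy upstream amplicon ending in the linker
    gene_b = linker + "GCTACCAAATAA"        # toy downstream amplicon starting with it
    fused = fuse_by_overlap(gene_a, gene_b)
    print(f"fused length: {len(fused)} bp")
    print(f"in frame: {len(fused) % 3 == 0}")
```

Checking that the fused length is a multiple of three is a quick way to confirm that the linker keeps the downstream peptide in frame with the upstream protein.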

4.1. For initial protein expression, inoculate a single colony into 100 ml of LB medium and grow overnight at 37°C. Transfer the inoculum into 1 L of LB medium (supplemented with 100 µg/ml ampicillin) and grow the culture at 37°C until the OD600 reaches 0.6-0.8. The culture should then be maintained at 16°C before protein expression is induced with 0.15 mM IPTG. Cells are then grown for 16 h at 16°C. NOTE: Culture conditions, such as the IPTG concentration and the temperature for IPTG induction, may vary from protein to protein.
4.2. For single-wavelength anomalous dispersion (SAD) phasing, seleno-L-methionine (SeMet) labeled proteins are produced using LeMaster medium (Recipes). The same culture conditions as described in step 4.1 are used, with the exception that LeMaster medium is used and the plasmids are transformed into DL41 cells (a methionine auxotrophic strain).
4.3. Cells from the 1 L culture are then collected by centrifugation at 9,000 g for 30 min using an Avanti J-26 XP centrifuge (or similar). Resuspend the cell pellet obtained in 40 ml of lysis buffer (Recipes) in a 50 ml Falcon tube. HINT: Cell pellets obtained after centrifugation can be stored at -20°C for future use.
4.4. Sonicate the cell suspension using 1 s ON/OFF pulses for 5 min. Centrifuge the cell lysate at 39,000 g for 30 min using an Avanti J-26 XP centrifuge (or similar) to obtain a clear supernatant. NOTE: Make sure that the tip of the sonicator probe does not touch the sides or bottom of the Falcon tube. Adjust the tube if an unusually loud noise is heard. NOTE: Make sure that the cell lysate is centrifuged to separate the cell debris from the soluble protein fraction.
4.5. Mix the supernatant with 5 ml of Ni-NTA resin that has been pre-equilibrated with lysis buffer and incubate for 1 h to allow binding. Wash the resin three times with lysis buffer without Triton X-100 (wash buffer) and elute the bound proteins using 10 ml of wash buffer supplemented with 500 mM imidazole. NOTE: Make sure that the Ni-NTA resin is washed thoroughly with water and then equilibrated with lysis buffer.
4.12. Start the machine by clicking "start" (left panel). The counts for the blank should be less than 10 kilo counts/s (kCnt/s). Click "stop" (left panel) to stop the reading. NOTE: Extensive washing is required to avoid any interference from contaminants from previous runs.
4.13. Remove the water and dispense 20 µl of concentrated protein into the cuvette. Click "start" (right panel), collect 20 readings and then click "stop" (right panel). The schematic representation of dynamic light scattering is depicted in Figure 2E. TROUBLESHOOTING
4.14. Once the readings are saved, remove the cuvette from the DLS machine, wash it with water and re-test the blank. Switch off the machine once the readings have been taken for both fused proteins.

5.1. Crystallization trials are performed with 12 mg/ml of apo-CaM-(Gly)5-NmIQ and apo-CaM-(Gly)5-NgIQ using hanging drop vapor diffusion, as described previously [26]. The schematic representation of crystallization and structure determination is depicted in Figure 2F. NOTE: Do not disturb the crystallization trays frequently after setting up the drops.

NOTE: Concentrations should be varied gradually, as drastic variations in the concentrations of crystallization conditions may result in the loss of crystal formation.

POL Scientific

5.5. In this case, crystals are then tested using the in-house Rigaku Raxis V+. To test the crystals, single crystals are transferred, using a loop of appropriate size, to crystallization buffer supplemented with 10% glycerol (cryoprotectant) and soaked for 1 min. The crystals are then flash-cooled in an N2 cold stream at 100 K.
5.6. The following parameters are then used to test the diffraction quality of the crystal: distance to detector (60 mm), oscillation width (0.5°/image), oscillation range (1°), exposure time (30 sec/frame).
5.7. The best crystals (diffracting to 3 Å with mosaicity < 1°) are then transferred and stored in liquid nitrogen storage cans. CAUTION: Wear cryoprotective gloves and a face mask when handling liquid nitrogen.
5.8. Ship the crystals to the NSLS, Brookhaven National Laboratory, USA or the National Synchrotron Radiation Research Center (NSRRC), Taiwan, to collect complete SAD datasets for structure solution.
5.10. Process all the datasets using HKL2000 [27]; the .sca files generated are then used for locating heavy atom (Se) positions, phasing and density modification using ShelxC/D/E [28] from CCP4. NOTE: It is important to follow the step-by-step procedure for scaling and phasing. CRITICAL STEP: Other programs, such as Phenix Autosol, can also be used for heavy atom (Se) phasing.
5.11. Heavy atom locations obtained from ShelxC/D/E are then used for sequence-based automated model building using Buccaneer [29] from CCP4. NOTE: Other programs, such as Phenix Autobuild, can also be used for sequence-based model building from heavy atom locations.
5.12. Check the model obtained from the Buccaneer program in COOT [30]. Where necessary, the model can be manually built in COOT. NOTE: Build the model only where significant electron density is available.
5.13. In the present case, the crystal analysis revealed the presence of a twin fraction. Hence, twin refinement was carried out during the refinement stage using Refmac5 [31].
5.14. The stereochemistry of the models obtained is then verified using PROCHECK [32] in CCP4.
6.1. Closely examine the structures (in this case, those of apo-CaM-(Gly)5-NmIQ/NgIQ) to identify key interacting residues (in this case, from CaM and the Nm/Ng IQ motifs). NOTE: Run Contact from CCP4 to identify the atoms involved in interactions and the distances between the interacting atoms (contacts <3.8 Å are considered hydrophobic interactions and contacts <3.2 Å are considered H-bonding contacts).
6.2. Site-directed mutagenesis of the interacting residues from the full-length protein can then be performed using Inverse PCR [33]. In this case, the mutations for CaM were D81A, E85A, F90A, E115A, E121A, M125A, F142A, E85A/F142A and F90A/E121A; for Nm, F42A, R43A and I46A; and for Ng, I33A, Q34A, F37A and R38A. Primers were designed as described in the paper by Dominy et al. [33]. NOTE: In this case, double mutations of CaM were obtained sequentially by performing the second mutation on the plasmid consisting of the first mutation.
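Steps 6.1 and 6.2 can be sketched in a few lines: first label contacts using the distance cut-offs quoted above, then apply alanine substitutions written in the usual notation (for example, "F90A" or "E85A/F142A") to a sequence while checking that the wild-type residue is where the label says it is. This is a minimal illustration; the element-based criteria for calling a contact an H-bond or a hydrophobic contact are a simplifying assumption, and the function names and the toy sequence in the usage comment are hypothetical.

```python
import re

HBOND_CUTOFF = 3.2        # Å, from the NOTE in step 6.1
HYDROPHOBIC_CUTOFF = 3.8  # Å, from the NOTE in step 6.1


def classify_contact(elem_a: str, elem_b: str, distance: float):
    """Label a contact by distance; the element rules are a simplification."""
    polar = {"N", "O"}
    if elem_a in polar and elem_b in polar and distance < HBOND_CUTOFF:
        return "h-bond"
    if elem_a == "C" and elem_b == "C" and distance < HYDROPHOBIC_CUTOFF:
        return "hydrophobic"
    return None


MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str):
    """Split a label such as 'F90A' into (wild type, position, replacement)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutations(sequence: str, label: str, first_residue_number: int = 1) -> str:
    """Apply one or more '/'-separated substitutions to a one-letter sequence."""
    residues = list(sequence)
    for single in label.split("/"):
        wild_type, position, replacement = parse_mutation(single)
        index = position - first_residue_number
        if not 0 <= index < len(residues):
            raise ValueError(f"{single}: position lies outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(f"{single}: found {residues[index]} at this position")
        residues[index] = replacement
    return "".join(residues)


# Hypothetical usage with a toy sequence (not CaM, Nm or Ng):
# apply_mutations("MDEF", "D2A/F4A")
```

Checking the wild-type residue before substituting catches numbering offsets, for instance when a construct carries an affinity tag or lacks the initiator methionine.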

ANTICIPATED RESULTS
The Gly-linker strategy has been previously used to stabilize various protein complexes for structural studies [18]. We have adopted this procedure to determine the crystal structure of CaM in complex with two of its intrinsically unstructured binding partners, Nm and Ng, and further validated the findings in vitro and in vivo [19]. We identified the MBR of Nm and Ng that interacted with CaM by ITC, using Nm/Ng peptides of various lengths. We show that the binding of the MBR determined with ITC is comparable with the binding of full-length Nm/Ng to CaM (Fig. 3A and 3B). We employed known structures of CaM-IQ motif complexes to generate a template model of the CaM-Nm/Ng complex using DeepView, in which the IQ motifs of Nm/Ng interacted with the C-lobe of CaM. This is consistent with previous findings that Nm and Ng interact with the C-lobe of CaM [34][35][36]. Thus, we linked the IQ motif of Nm/Ng to the C-terminus of CaM using a (Gly)5 linker. It is important to note that the distance between the two termini of the binding partners will vary depending upon the protein-protein complex being studied.
Expression of fused constructs avoids the need to purify individual proteins and alleviates the inherent difficulties associated with generating a homogeneous and stable complex. With our strategy, we observed homogeneous complexes of both Nm and Ng linked with CaM (Fig. 3C). Subsequently, the linked complexes were crystallized and diffracted to 2.7 Å. Molecular replacement did not yield a structure solution. Thus, SAD phasing was performed using seleno-L-methionine labeled proteins (Fig. 3D). Notably, the IQ motif of Ng and CaM adopted an intermolecular interaction [19]. This result demonstrates that if the linker length is not sufficient for the chimeric protein to engage in an intramolecular interaction, it will instead adopt an intermolecular interaction to stabilize the complex. This type of interaction has been observed previously for linked complexes [37].
It is important to validate the interactions obtained with linked binding partners. We performed this validation by mutating key residues involved in the interactions in the full-length proteins and testing the binding in vitro with ITC and in vivo with electrophysiological experiments. In ITC experiments, substituting any key residue from Nm and Ng with alanine resulted in loss of interaction with CaM, while substituting any key residue from CaM with alanine resulted in reduced affinity towards Nm and Ng [19]. Furthermore, the in vivo experiments showed that mutating a key residue in the Ng IQ motif resulted in its inability to potentiate synaptic transmission in CA1 hippocampal neurons [19]. Similarly, any appropriate in vivo experiments could be performed for any particular protein complex studied using this protocol. This protocol can be adopted to study other transiently interacting protein complexes involving structured or unstructured proteins, and offers potential application in studying the transient protein-protein interactions involved in various biological processes.

TROUBLESHOOTING
Possible cause: Protein has leaked due to tubing connections. Solution: Check the connections of the loop. Try to reconnect and inject the protein.
Possible cause: UV lamp is not sensitive; this is particularly a concern if the protein concentration is low.