C687 Tutorial: Homology Module


This tutorial includes ASSIGNMENT 2 (submission deadline Mon 2/24). See the instructions at the end of the tutorial.

The Homology Module allows you to build a 3D model of a protein whose 3D structure has not been experimentally determined IF you already know the structure of one or more homologous protein(s).

The protein(s) whose structure(s) is/are already known are referred to as the "reference" or "real" protein(s). The protein whose structure you are trying to build is called the "model", "unknown", or "sequence" protein.

The steps involved in this process are:

  1. Obtain the sequences of both the unknown and the reference proteins.
  2. Align these sequences using your favorite alignment tools.
  3. Obtain a structure file for each reference protein
  4. Read the sequences and structures into InsightII
  5. Identify which sequence corresponds to which structure
  6. Find structurally conserved regions (SCRs) in the reference proteins (only possible if there is more than one reference protein)
  7. Copy the coordinates of the conserved regions in one of the reference proteins to the model protein.
  8. Propose structures for the loops or variable regions (VRs) between the SCRs.
  9. Make corrections to sidechain conformations as appropriate.
  10. Refine the structure by energy minimization and/or molecular dynamics.

In this tutorial, we will build a model of the structure of a zinc finger domain whose structure has not been determined. Our model will be based on the similar sequence and known structure of another zinc finger protein.

Unknown protein:
>976347 Human zinc finger homeodomain protein (Res 724-750)
KPFRCEVCNYSTTTKGNLSIHMQSDKH

Reference protein:
>3znf.pdb Human enhancer binding protein zinc finger
RPYHCSYCNFSFKTKGNLTKHMKSKAHSKK
(minimized average NMR structure)

Note 1: This is a relatively simple case because:

  1. Zinc fingers are small and very highly conserved domains. In our example, 14 of the 27 aligned residues are identical and there are no gaps in the alignment, although the unknown protein (27 residues) is slightly shorter than the reference protein (30 residues).
  2. The alignment is obvious.
  3. We will only use a single reference protein.

Note 2: Although it is simpler to use a single reference protein, IT IS MUCH MORE RELIABLE TO USE SEVERAL REFERENCE PROTEINS. Comparing several known structures allows you to identify regions of STRUCTURAL conservation in addition to regions of sequence conservation. In a real-life example, you should use several reference proteins if possible.
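The identity count quoted in Note 1 can be checked directly from the two sequences listed above. The following is a minimal Python sketch, not part of InsightII; the function name count_identities is made up for illustration, and it assumes the ungapped alignment starts at residue 1 of both sequences.

```python
# Sequences copied from the tutorial. The alignment is assumed to be
# ungapped and to start at residue 1 of each sequence.
unknown = "KPFRCEVCNYSTTTKGNLSIHMQSDKH"
reference = "RPYHCSYCNFSFKTKGNLTKHMKSKAHSKK"


def count_identities(seq_a, seq_b):
    """Count identical residues over the overlapping, ungapped region."""
    # zip() stops at the end of the shorter sequence, so the extra
    # C-terminal residues of the reference are ignored.
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


identical = count_identities(unknown, reference)
aligned_length = min(len(unknown), len(reference))
print(f"{identical}/{aligned_length} residues identical")
```

For a gapped alignment this simple position-by-position comparison would not be enough; the gap characters would have to be handled explicitly.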

Step 1: Getting Started

Duration: ~10 minutes
Purpose: This section should teach you how to start up Insight/Homology remotely and give you a brief overview of the program layout.

  1. The Homology module is currently licensed only on chemvgx and splatter. Therefore, when working on other machines, you must run the program remotely on chemvgx or splatter and display it on your local monitor. Half of the class should use each machine. For example, to run remotely on chemvgx:
    xhost chemvgx
    telnet chemvgx (then login)
    cd to appropriate directory
    insightII
  2. Choose the Homology module and take a few minutes to browse through the menu options

Step 2: Reading in and Aligning Sequences

Duration: ~25 minutes
Purpose: This section should teach you how to read in, manipulate, and align sequences, and how to work with boxes.

The sequence alignment and pdb files needed in this tutorial are in the directory /ruser/instruct1/stone/C687/homology.

  1. molecule-get
    Get 3znf.pdb
    The name of the molecule you get is ZNF
  2. Sequences-get
    Choose alignment and get zf.align
    A sequence window should appear containing two sequences
  3. Sequences-copy
    Copy from: ZNF1
    Copy to: ZNF
    This generates a new sequence which is identical to the ZNF1 sequence but is understood by the program to be the sequence of the displayed protein ZNF whose structure is known.
  4. Sequences-delete
    Delete ZNF1 (which is no longer useful)
  5. There are many different manipulations you can do with the sequences. One of the simplest is to align two sequences automatically within Insight.
    Alignment-Pairwise_sequence-Automatic
    Specify the 2 sequences and execute.
    Note that they are now perfectly aligned within the sequence window.
  6. Now spend the next ~15 minutes experimenting with the various options for manipulating the sequences. Most of these are mouse-driven options. They are described in detail in the help menu invoked within the sequences window. They fall into 2 categories, distinguished by the mode specified in the sequences window:
    Boxes are used to specify regions of 2 sequences that are aligned and regions of a sequence that should be mapped onto a particular region of a known structure. Boxes cannot contain gaps!

    This section is going to be particularly important for people who are doing any homology modeling in their projects.

  7. Once you have become reasonably comfortable with the mouse-driven options, go back to where you were after point 5 above. You can do this either by deleting the boxes and realigning the sequences or by deleting everything and starting over.
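Because boxes cannot contain gaps, a gapped alignment has to be split into several boxes. The Python sketch below illustrates the idea only; it is not how InsightII defines boxes internally. The function name ungapped_blocks and the short example alignment are made up, and "-" is assumed to be the gap character.

```python
def ungapped_blocks(aligned_a, aligned_b, gap="-"):
    """Return (start, end) column ranges (0-based, end exclusive)
    in which neither aligned sequence contains a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    blocks = []
    start = None  # first column of the block currently being extended
    for col, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        gapped = a == gap or b == gap
        if not gapped and start is None:
            start = col                    # a new block begins here
        elif gapped and start is not None:
            blocks.append((start, col))    # a gap closes the block
            start = None
    if start is not None:
        blocks.append((start, len(aligned_a)))
    return blocks


# Made-up alignment for illustration only (not from the tutorial files).
seq_a = "ACDEF-GHIK"
seq_b = "ACD-FWGHIK"
print(ungapped_blocks(seq_a, seq_b))
```

For the two zinc finger sequences in this tutorial, which align without gaps, the same function returns a single block covering all 27 aligned positions.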

Step 3: Assigning Coordinates to the Unknown Protein

Duration: ~5 minutes
Purpose: To learn how to assign coordinates to the unknown protein.

  1. Draw a box around the region of the 2 sequences that is perfectly aligned. Under boxes-freeze specify the box number (boxes are numbered starting from 0). Freezing prevents the box from being changed and is necessary before you assign coordinates. Frozen boxes are colored red.
  2. Sequences-AssignCoords
    Give box number and choose bump check (which will look for steric violations as the new structure is generated). Spend some time looking at the new structure and comparing it to the old structure.
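    To give a feel for what a bump check does, here is a minimal Python sketch that flags non-bonded atom pairs that sit too close together. It is not InsightII's algorithm: the 2.0 angstrom cutoff, the function name find_bumps, and the coordinates are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def find_bumps(atoms, bonded=frozenset(), min_dist=2.0):
    """Return pairs of atom names closer than min_dist (in angstroms).

    atoms  : dict mapping atom name -> (x, y, z)
    bonded : set of frozenset({name1, name2}) pairs to skip, because
             covalently bonded atoms are legitimately close together
    """
    bumps = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(atoms.items(), 2):
        if frozenset((name_a, name_b)) in bonded:
            continue
        if math.dist(xyz_a, xyz_b) < min_dist:
            bumps.append((name_a, name_b))
    return bumps


# Made-up coordinates for illustration only.
atoms = {
    "CA": (0.0, 0.0, 0.0),
    "CB": (1.5, 0.0, 0.0),  # bonded to CA, so this short distance is fine
    "OX": (2.5, 1.0, 0.0),  # about 1.4 A from CB and not bonded to it
    "CZ": (6.0, 0.0, 0.0),  # far from everything
}
bonded = {frozenset(("CA", "CB"))}
print(find_bumps(atoms, bonded))
```

    A real check would use element-specific radii rather than one cutoff for every pair.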

    Note: When you assign the coordinates for proteins that are not perfectly aligned, the process is somewhat more complicated. Basically it involves:


Step 4: Fixing the Geometry

Duration: ~35 minutes
Purpose: To learn how to get from the initial 3D structural model of the unknown protein to a structure that is more stable and structurally and chemically reasonable.

  1. Comparison of the unknown and reference protein structures. Take 10 minutes to look at various details of the two structures, which should be superimposed on the screen, and observe the similarities and differences.

    In particular, look at the conformations of the backbone, the sidechains of residues that are identical in both proteins, the sidechains of residues that are similar in the two proteins, and the sidechains of residues that are quite different in the two proteins. Think about how the program has assigned conformations to each of these parts of the unknown protein.

  2. In regions where the structure of the unknown protein does not seem to be optimal, we can change it either by changing specific dihedral angles or by global energy minimization.
  3. First try changing rotamers of dihedral angles. Experiment with the Residue-Manual_Rotamer and Residue-Auto_Rotamer options.
  4. Then try doing an energy minimization. You should save your work now, then log out of the remote CPU and start up Insight on your local computer to do this exercise. Otherwise it will take forever!

The zinc finger is an awkward example for energy minimization because of the zinc atom (which is not present in our current model). We will use the Discover module to minimize the energy of the model protein but we will FIX the backbone conformation and we will FIX the sidechain conformations of the zinc ligands (the two conserved Cys and two conserved His residues).
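The positions of the two conserved Cys and two conserved His residues can be read off the alignment. The Python sketch below lists the positions where both sequences carry the same Cys or His. It assumes the ungapped alignment starting at residue 1, and the function name conserved_positions is made up for illustration. Conservation alone does not prove that a residue binds zinc, so treat the result as a consistency check against the known zinc finger motif.

```python
# Sequences copied from the tutorial.
unknown = "KPFRCEVCNYSTTTKGNLSIHMQSDKH"
reference = "RPYHCSYCNFSFKTKGNLTKHMKSKAHSKK"


def conserved_positions(seq_a, seq_b, residues="CH"):
    """Return 1-based positions where both sequences carry the same
    residue and that residue is one of `residues`."""
    return [
        i
        for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a == b and a in residues
    ]


ligands = conserved_positions(unknown, reference)
print(ligands)
```

These are the residue numbers whose sidechains are fixed in the Discover step of this tutorial.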

Currently, the name of the molecule starts with a "$" character. You must change the name of the molecule before proceeding: click on Object/Rename and rename the molecule.

Under Biopolymer, choose a forcefield and fix potentials and charges.

Under Discover:

Constraint-Fix
    Fix the backbone conformations of all residues and the sidechain conformations of residues 5, 8, 21, and 27.
Parameters-Minimize
    Steepest descent algorithm
    2000 iterations
    Derivative of 1.0
Run
    run
    Local host
    Interactive
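
To illustrate what steepest descent with fixed atoms means, here is a toy Python sketch: a one-dimensional chain of beads joined by harmonic springs, with the end beads held in place. It is not Discover's force field or algorithm. The spring constant, step size, and function name are assumptions, and "Derivative of 1.0" is assumed to be a convergence threshold on the largest gradient component.

```python
def steepest_descent(x, fixed, rest=1.0, k=100.0,
                     step=0.001, max_iter=2000, tol=1.0):
    """Minimize a 1D chain of harmonic springs by steepest descent.

    x      : list of starting positions, one per bead
    fixed  : set of bead indices that must not move
    Stops after max_iter steps, or earlier once the largest gradient
    component on any movable bead drops below tol.
    Returns (positions, iterations_used).
    """
    x = list(x)
    for iteration in range(max_iter):
        grad = [0.0] * len(x)
        # Energy: sum over springs of 0.5 * k * (x[i+1] - x[i] - rest)**2
        for i in range(len(x) - 1):
            stretch = x[i + 1] - x[i] - rest
            grad[i] -= k * stretch
            grad[i + 1] += k * stretch
        # Zero the gradient on fixed beads so they never move
        for i in fixed:
            grad[i] = 0.0
        if max(abs(g) for g in grad) < tol:
            return x, iteration
        # Take a small step downhill, along the negative gradient
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x, max_iter


start = [0.0, 0.2, 2.9, 3.0]  # beads 0 and 3 are held in place
final, n_iter = steepest_descent(start, fixed={0, 3})
print(final, n_iter)
```

The two free beads relax toward evenly spaced positions while the fixed beads stay exactly where they started, which is the same idea as fixing the backbone and the zinc ligand sidechains while the remaining sidechains relax.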

Step 5: SUBMIT ASSIGNMENT 2

Display your final minimized structure so it looks pretty and save it in a .psv folder entitled your_name.A2.psv

Put a copy of the folder in the directory /ruser/instruct1/stone/C687/assignments NO LATER THAN Mon 2/24


Send comments to chemvis@indiana.edu
Last updated: 01/23/2001