C687 Tutorial: Homology Module
This tutorial includes ASSIGNMENT 2 (submission deadline Mon 2/24)
See instructions at the end of the tutorial.
The Homology Module allows you to build a 3D model of a protein whose 3D
structure has not been experimentally determined IF you already know the
structure of one or more homologous protein(s).
The protein(s) whose structure(s) is/are already known are referred to as
the "reference" or "real" protein(s).
The protein whose structure you are trying to build is called the "model",
"unknown", or "sequence" protein.
The steps involved in this process are:
- Obtain the sequences of both the unknown and the reference proteins.
- Align these sequences using your favorite alignment tools.
- Obtain a structure file for each reference protein
- Read the sequences and structures into InsightII
- Identify which sequence corresponds to which structure
- Find structurally conserved regions (SCRs) in the reference proteins (only
possible if there is more than one reference protein)
- Copy the coordinates of the conserved regions in one of the reference
proteins to the model protein.
- Propose structures for the loops or variable regions (VRs) between the SCRs.
- Make corrections to sidechain conformations as appropriate.
- Refine the structure by energy minimization and/or molecular dynamics.
In this tutorial, we will build a model of the structure of a zinc finger
domain whose structure has not been determined. Our model will be based on
the similar sequence and known structure of another zinc finger protein.
Unknown protein:
>976347 Human zinc finger homeodomain protein (Res 724-750)
KPFRCEVCNYSTTTKGNLSIHMQSDKH
Reference protein:
>3znf.pdb Human enhancer binding protein zinc finger
RPYHCSYCNFSFKTKGNLTKHMKSKAHSKK
(minimized average NMR structure)
Note 1: This is a relatively simple case because:
- Zinc fingers are small and very highly conserved domains. In our
example 14/27 residues are identical and there are no gaps in the alignment
although the unknown protein is slightly shorter than the reference protein.
- The alignment is obvious.
- We will only use a single reference protein.
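The identity count in Note 1 can be checked with a few lines of Python. This is a minimal sketch that assumes, as stated above, that the two sequences are aligned from their first residue with no gaps, so residues can be compared position by position. The function name is hypothetical and not part of InsightII.

```python
# Sequences from the tutorial: unknown = zinc finger homeodomain protein
# (residues 724-750); reference = 3znf.pdb.
UNKNOWN = "KPFRCEVCNYSTTTKGNLSIHMQSDKH"
REFERENCE = "RPYHCSYCNFSFKTKGNLTKHMKSKAHSKK"


def identical_positions(a: str, b: str) -> list[int]:
    """Return 1-based positions where two gap-free sequences, aligned from
    their first residue, carry the same residue. Only the overlap is compared."""
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x == y]


positions = identical_positions(UNKNOWN, REFERENCE)
n = min(len(UNKNOWN), len(REFERENCE))  # length of the overlap (27)
print(f"{len(positions)}/{n} identical ({100 * len(positions) / n:.0f}%)")
print("identical positions:", positions)
```

The identical positions include Cys5, Cys8, His21, and His27, the zinc ligands whose sidechains are held fixed during the minimization in Step 4.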
Note 2: Although it is simpler to use a single reference protein, IT IS MUCH
MORE RELIABLE TO USE SEVERAL REFERENCE PROTEINS. This is because comparison
of the several known structures allows you to identify regions of STRUCTURAL
conservation in addition to regions of sequence conservation. In a real-life
example you should use several reference proteins whenever possible.
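To make the idea of structural conservation concrete: once several reference structures have been superimposed, stretches of aligned residues whose CA atoms lie close together in all of them are candidate SCRs. The Python sketch below compares just two references and assumes the superposition has already been done and the residues are already paired by the alignment. The 1.0 Angstrom cutoff, the 3-residue minimum length, and the function name are illustrative assumptions, not InsightII settings.

```python
import math


def structurally_conserved_regions(ca_a, ca_b, cutoff=1.0, min_length=3):
    """Return 1-based (start, end) runs of aligned residues whose CA atoms lie
    within `cutoff` (Angstrom) of each other for at least `min_length`
    consecutive residues.

    ca_a, ca_b: equal-length lists of (x, y, z) CA coordinates, one entry per
    aligned residue pair, taken from structures that are ALREADY superimposed.
    """
    if len(ca_a) != len(ca_b):
        raise ValueError("coordinate lists must have equal length")
    regions, start = [], None
    for i, (p, q) in enumerate(zip(ca_a, ca_b), start=1):
        if math.dist(p, q) <= cutoff:
            if start is None:
                start = i  # a conserved run begins here
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(ca_a) - start + 1 >= min_length:
        regions.append((start, len(ca_a)))
    return regions


# Toy coordinates (not real data): residues 4 and 5 deviate by 3 Angstrom.
ref1 = [(float(i), 0.0, 0.0) for i in range(8)]
ref2 = [(float(i), 3.0 if i in (3, 4) else 0.0, 0.0) for i in range(8)]
print(structurally_conserved_regions(ref1, ref2))  # [(1, 3), (6, 8)]
```

With three or more references, a residue would only count as conserved if it passes the distance test in every pairwise comparison.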
Step 1: Getting Started
Duration: ~10 minutes
Purpose: This section should teach you how to start up Insight/Homology
remotely and give you a brief overview of the program layout.
- The Homology module is currently licensed only on chemvgx and splatter.
Therefore, when working on other machines it is necessary to run the program
remotely on chemvgx or splatter, then display on your local monitor.
Half of the class should choose each machine.
For example, to run remotely on chemvgx:
xhost chemvgx (run this on your local machine first)
telnet chemvgx (then log in)
cd to the appropriate directory
insightII
- Choose the Homology module and take a few minutes to browse through
the menu options
Step 2: Reading in and Aligning Sequences
Duration: ~25 minutes
Purpose: This section should teach you how to read in, manipulate, and
align sequences, and how to work with boxes.
The sequence alignment and pdb files needed in this tutorial are in the
directory /ruser/instruct1/stone/C687/homology.
- molecule-get
Get 3znf.pdb
The name of the molecule you get is ZNF
- Sequences-get
Choose alignment and get zf.align
A sequence window should appear containing two sequences
- Sequences-copy
Copy from: ZNF1
Copy to: ZNF
This generates a new sequence which is identical to the ZNF1 sequence but
is understood by the program to be the sequence of the displayed protein
ZNF whose structure is known.
- Sequences-delete
Delete ZNF1 (which is no longer useful)
- There are many different manipulations you can do with the sequences.
One of the simplest is to align two sequences automatically within Insight.
Alignment-Pairwise_sequence-Automatic
Specify the 2 sequences and execute.
Note that they are now perfectly aligned within the sequence window.
- Now spend the next ~15 minutes experimenting with the various options
you have to manipulate the sequence. Most of these are mouse driven options.
They are described in detail in the help menu invoked within the sequences
window. They basically fall into 2 categories distinguished by specifying
the mode in the sequences window:
- Ways of moving a sequence, or of inserting, moving, or deleting
gaps in a sequence.
- Ways of creating, changing, moving, and deleting boxes.
Boxes are used to specify regions of 2 sequences that are aligned and
regions of a sequence that should be mapped onto a particular region of
a known structure. Boxes cannot contain gaps!
This section is going to be particularly important for people who are
doing any homology modeling in their projects.
- Once you have become reasonably comfortable with the mouse-driven options,
go back to where you were right after the automatic pairwise alignment above.
You can do this either by deleting boxes and realigning the sequences or by
deleting everything and starting over.
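The tutorial does not say which algorithm or scoring scheme Alignment-Pairwise_sequence-Automatic uses. As a hedged illustration of what an automatic pairwise alignment does in general, here is a textbook Needleman-Wunsch global alignment in Python. The scores (match +1, mismatch -1, gap -2) are illustrative choices, and the function name is hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def pair(i, j):
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("KPFRCEVCNYSTTTKGNLSIHMQSDKH",
                                      "RPYHCSYCNFSFKTKGNLTKHMKSKAHSKK")
print(score)
print(top)
print(bottom)
```

Real alignment tools usually score substitutions with a matrix such as BLOSUM62 and use separate gap-opening and gap-extension penalties. With the simple scores used here, end gaps cost the same as internal ones, so where the three extra reference residues end up is not guaranteed to match what InsightII produces.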
Step 3: Assigning Coordinates for the Unknown Protein
Duration: ~5 minutes
Purpose: To learn how to assign coordinates to the unknown protein.
- Draw a box around the region of the 2 sequences that is perfectly aligned.
Under boxes-freeze specify the box number (boxes are numbered starting from 0).
Freezing prevents the box from being changed and is necessary before you
assign coordinates.
Frozen boxes are colored red.
- Sequences-AssignCoords
Give box number and choose bump check (which will look for steric violations
as the new structure is generated).
Spend some time looking at the new structure and comparing it to the old
structure.
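Conceptually, Sequences-AssignCoords transfers reference coordinates onto the model residues inside a frozen box. The tutorial does not document exactly which atoms InsightII copies, so the Python sketch below assumes one common simplified scheme: every atom is copied for identical residues, and only backbone atoms (N, CA, C, O) for different residues. The bump check is likewise a crude stand-in that flags atom pairs in different residues closer than a hypothetical 2.0 Angstrom cutoff; real programs compare distances against van der Waals radii. The function names and the dictionary layout are hypothetical.

```python
import math
from itertools import combinations

BACKBONE = {"N", "CA", "C", "O"}


def assign_coords(ref_coords, ref_seq, model_seq, box):
    """Copy reference coordinates into the model for residues box[0]..box[1]
    (1-based, inclusive). Assumes the box is gap-free and that residue i of
    the model is aligned with residue i of the reference.

    ref_coords: {residue_number: {atom_name: (x, y, z)}}
    Identical residues keep every atom; other residues keep only the
    backbone, so their sidechains still have to be built afterwards.
    """
    start, end = box
    model = {}
    for res in range(start, end + 1):
        atoms = ref_coords[res]
        if ref_seq[res - 1] == model_seq[res - 1]:
            model[res] = dict(atoms)
        else:
            model[res] = {name: xyz for name, xyz in atoms.items() if name in BACKBONE}
    return model


def bump_check(model, cutoff=2.0):
    """List pairs of atoms in different residues closer than `cutoff` Angstrom,
    ignoring the peptide bond C(i)-N(i+1)."""
    atoms = [(res, name, xyz) for res, names in model.items()
             for name, xyz in names.items()]
    bumps = []
    for (r1, n1, p1), (r2, n2, p2) in combinations(atoms, 2):
        if r1 == r2:
            continue  # same-residue contacts are not checked in this sketch
        if (r2 == r1 + 1 and n1 == "C" and n2 == "N") or \
           (r1 == r2 + 1 and n2 == "C" and n1 == "N"):
            continue  # bonded peptide neighbours
        d = math.dist(p1, p2)
        if d < cutoff:
            bumps.append(((r1, n1), (r2, n2), round(d, 2)))
    return bumps


# Toy two-residue reference (invented coordinates, not from 3znf.pdb).
ref = {
    1: {"N": (0.0, 0.0, 0.0), "CA": (1.46, 0.0, 0.0), "C": (2.0, 1.4, 0.0),
        "O": (1.5, 2.5, 0.0), "CB": (2.0, -1.0, 1.0)},
    2: {"N": (3.3, 1.5, 0.0), "CA": (4.0, 2.8, 0.0), "C": (5.5, 2.6, 0.0),
        "O": (6.1, 1.6, 0.0)},
}
model = assign_coords(ref, "RP", "KP", (1, 2))
print(sorted(model[1]))   # ['C', 'CA', 'N', 'O']  (R vs K: backbone only)
print(bump_check(model))  # []
```

The invented coordinates only exercise the functions; they are not meant to be a realistic dipeptide.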
Note: When you assign the coordinates for proteins that are not perfectly
aligned, the process is somewhat more complicated.
Basically it involves:
- Creating and freezing boxes for EACH region of conservation without
any gaps. Where there is a gap, the boxes should end at least two residues
away from that gap so that the residues that span the gap in the model
structure are not too harshly constrained.
- Assigning coordinates for EACH boxed region.
- Modeling the structures of the "loop" or non-conserved regions
that span the gaps. This can be done either by (a) searching the PDB for
conformations that match the regions surrounding and including the loops,
then copying those conformations; or (b) generating a number of
possibilities de novo and then "manually" sorting through them for
those that seem most sensible. This whole process is tedious and inherently
biased but necessary. Most of the uncertainty in your final
structure will correspond to these regions and they are often the most
interesting regions biochemically. If you plan on doing any homology
modeling during your project or in the future, you should take the time now
to learn about these methods. The best way is to work through the online
Homology pilot tutorial. Also check out the Homology manual.
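The box rules above (boxes contain no gaps and stop short of each gap) can be expressed in a few lines. The Python sketch below interprets "at least two residues away" as leaving two aligned residues unboxed on each side of an internal gap, and it does not trim overhangs at the ends of the alignment, since no model residues span them. The function name and the second alignment are hypothetical.

```python
def ungapped_boxes(aln_a, aln_b, margin=2):
    """Split a pairwise alignment into gap-free boxes.

    aln_a, aln_b: aligned sequences of equal length, '-' marking a gap.
    margin: residues trimmed from each box end that faces an internal gap
    (overhangs at the ends of the alignment are not trimmed).
    Returns [((a_start, a_end), (b_start, b_end)), ...] as 1-based,
    inclusive residue numbers in each sequence.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    runs, current = [], []
    res_a = res_b = 0  # residues of each sequence seen so far
    for x, y in zip(aln_a, aln_b):
        if x != "-":
            res_a += 1
        if y != "-":
            res_b += 1
        if x != "-" and y != "-":
            current.append((res_a, res_b))  # aligned, gap-free column
        elif current:
            runs.append(current)  # a gap ends the current run
            current = []
    if current:
        runs.append(current)

    boxes = []
    for k, run in enumerate(runs):
        lo = margin if k > 0 else 0              # a gap precedes this run
        hi = margin if k < len(runs) - 1 else 0  # a gap follows this run
        kept = run[lo:len(run) - hi]
        if kept:
            boxes.append(((kept[0][0], kept[-1][0]), (kept[0][1], kept[-1][1])))
    return boxes


# Tutorial alignment: no internal gaps, 3 unmatched residues at the C-terminus.
print(ungapped_boxes("KPFRCEVCNYSTTTKGNLSIHMQSDKH---",
                     "RPYHCSYCNFSFKTKGNLTKHMKSKAHSKK"))
# Hypothetical alignment with a 2-residue insertion in the second sequence.
print(ungapped_boxes("KPFRCEVC--NYSTTT", "RPYHCSYCGGNFSFKT"))
```

For the tutorial alignment this gives one box covering residues 1-27 of both proteins; the hypothetical alignment gives two boxes, with the residues nearest the insertion left unboxed for loop modeling.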
Step 4: Fixing the Geometry
Duration: ~35 minutes
Purpose: To learn how to get from the initial 3D structural model of
the unknown protein to a structure that is more stable and structurally
and chemically reasonable.
- Comparison of the unknown and reference protein structures.
Take 10 minutes to look at various details of the two structures which
should be superimposed on the screen and observe the similarities and
differences.
In particular, look at the conformations of the backbone, the sidechains of
residues that are identical in both proteins, the sidechains of residues
that are similar in the two proteins, and the sidechains of residues
that are quite different in the two proteins. Think about how the program
has assigned conformations to each of these parts of the unknown protein.
- In regions where the structure of the unknown protein does not seem to
be optimal, we can change it either by changing specific dihedral angles
or by global energy minimization.
- First try changing sidechain rotamers (sets of sidechain dihedral angles).
Experiment with the Residue-Manual_Rotamer and Residue-Auto_Rotamer
options.
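Both rotamer options come down to measuring and setting sidechain dihedral angles such as chi1 (N-CA-CB-CG). The Python sketch below shows the geometry involved, assuming atoms are plain (x, y, z) tuples: `dihedral` measures an angle with the IUPAC sign convention, and `rotate_about_bond` spins the atoms beyond a bond using Rodrigues' rotation formula. Both are hypothetical helpers, not InsightII commands.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range -180..180
    (IUPAC sign convention), e.g. N-CA-CB-CG for a sidechain chi1 angle."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def rotate_about_bond(points, p1, p2, angle_deg):
    """Rotate points right-handedly about the axis p1 -> p2 by angle_deg.
    With the convention above, this adds angle_deg to the dihedral of each
    rotated atom."""
    axis = _sub(p2, p1)
    length = math.sqrt(_dot(axis, axis))
    k = tuple(c / length for c in axis)  # unit vector along the bond
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    rotated = []
    for p in points:
        v = _sub(p, p2)  # position relative to a point on the axis
        kxv = _cross(k, v)
        kv = _dot(k, v)
        r = tuple(v[i] * cos_t + kxv[i] * sin_t + k[i] * kv * (1 - cos_t)
                  for i in range(3))
        rotated.append(tuple(r[i] + p2[i] for i in range(3)))
    return rotated


# Toy geometry (not real atom positions).
p0, p1, p2, p3 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)
print(round(dihedral(p0, p1, p2, p3), 1))      # 90.0
(new_p3,) = rotate_about_bond([p3], p1, p2, -30.0)
print(round(dihedral(p0, p1, p2, new_p3), 1))  # 60.0
```

To set a rotamer to a target angle, rotate all atoms beyond the bond (for chi1, everything beyond CB) by the target minus the current value.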
- Then try doing an energy minimization.
You should save your work now, then log out of the remote CPU and start up
Insight on your local computer to do this exercise. Otherwise it will take
forever!
The zinc finger is an awkward example for energy minimization because
of the zinc atom (which is not present in our current model).
We will use the Discover module to minimize the energy of the model protein
but we will FIX the backbone conformation and we will FIX the sidechain
conformations of the zinc ligands (the two conserved Cys and two conserved
His residues).
Currently, the name of the molecule starts with a "$" character.
You must change the name of the molecule before proceeding:
Click on Object/Rename, and rename the molecule.
Under Biopolymer, choose a forcefield and fix potentials and charges.
Under Discover
- Constraint-Fix
- to fix the backbone conformations of all residues and
the sidechain conformations of residues 5, 8, 21, and 27 (Cys5, Cys8,
His21, and His27).
- Parameters-Minimize
- Steepest descent algorithm
2000 iterations
Derivative of 1.0
- Run
- run
Local host
Interactive
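Discover performs the actual minimization, and this tutorial does not describe its internals. As a hedged illustration of what the steepest descent, 2000 iterations, derivative 1.0, and Constraint-Fix settings mean, here is a toy steepest-descent minimizer in Python. Its assumptions: the energy is harmonic bond stretching only (no force field, no nonbonded terms), step size is controlled by a simple accept-or-shrink rule, and convergence is tested on the largest gradient component of the free atoms. Whether Discover's derivative criterion refers to the maximum or the RMS derivative is not stated here. All names are hypothetical.

```python
import math


def bond_energy(coords, bonds, k=1.0):
    """Toy force field: E = sum k*(d - r0)**2 over bonds (i, j, r0).

    Returns (energy, gradient); gradient[i] is dE/d(x, y, z) of atom i.
    Assumes bonded atoms never sit exactly on top of each other (d > 0).
    """
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, r0 in bonds:
        diff = [a - b for a, b in zip(coords[i], coords[j])]
        d = math.sqrt(sum(c * c for c in diff))
        energy += k * (d - r0) ** 2
        factor = 2.0 * k * (d - r0) / d
        for axis in range(3):
            grad[i][axis] += factor * diff[axis]
            grad[j][axis] -= factor * diff[axis]
    return energy, grad


def steepest_descent(coords, bonds, fixed=frozenset(), max_iter=2000,
                     max_deriv=1.0, step=0.1):
    """Move free atoms along -gradient until every gradient component on a
    free atom is below max_deriv, or max_iter iterations have been used.
    Atoms whose indices are in `fixed` never move (cf. Constraint-Fix)."""
    coords = [list(p) for p in coords]
    energy, grad = bond_energy(coords, bonds)
    steps = 0
    while steps < max_iter:
        largest = max((abs(g) for i, gs in enumerate(grad) if i not in fixed
                       for g in gs), default=0.0)
        if largest < max_deriv:
            break  # converged
        trial = [p if i in fixed else [c - step * g for c, g in zip(p, gs)]
                 for i, (p, gs) in enumerate(zip(coords, grad))]
        trial_energy, trial_grad = bond_energy(trial, bonds)
        if trial_energy < energy:
            coords, energy, grad = trial, trial_energy, trial_grad
            step *= 1.2  # accept and try a longer step next time
        else:
            step *= 0.5  # overshot: retry with a shorter step
        steps += 1
    return coords, energy, steps


# Toy chain of three atoms; atom 0 is fixed, both bonds want length 1.0.
start = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.5, 0.0, 0.0)]
bonds = [(0, 1, 1.0), (1, 2, 1.0)]
final, energy, steps = steepest_descent(start, bonds, fixed={0}, max_deriv=1e-3)
print([round(x, 3) for x in final[2]], round(energy, 6), steps)
```

The fixed atom plays the role of the constrained backbone and zinc-ligand sidechains: its gradient is ignored, so only the free atoms relax.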
Step 5: SUBMIT ASSIGNMENT 2
Display your final minimized structure so it looks pretty and
save it in a .psv folder entitled your_name.A2.psv
Put a copy of the folder in the directory
/ruser/instruct1/stone/C687/assignments NO LATER THAN Mon 2/24
Send comments to chemvis@indiana.edu
Last updated: 01/23/2001