Finding a local homology domain by BLAST and pairwise alignment
a)
HINT: Use the lookup program
SOLUTION: The entry name is XRC1_HUMAN.
b)
HINT: Call blast with the option -filter=xs
SOLUTION: See the BLAST output file.
NOTE: If you want to see what happens if the search is run without filters,
look at the following file.
c)
HINT: For pairwise comparison, use the BLOSUM45 comparison matrix. Since we are
interested in local homologies, use the bestfit program with gap creation
penalties of 20-30 and gap extension penalties of 2-3. To assess the
statistical significance of pairwise matches, use the following command line:
bestfit -data=blosum62.cmp -gap=18 -len=2 swiss:rad4_schpo swiss:xrcc_human -ran=100
SOLUTION: The only significant match here is Rad4 from S. pombe (see corresponding file).
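The -ran=100 option asks bestfit to repeat the alignment against randomized (shuffled) versions of one sequence, so that the real score can be judged against a background. A minimal Python sketch of that idea follows. It is not GCG's implementation: the scoring scheme (match 2, mismatch -1, linear gap -2), the function names local_score and shuffle_z_score, and the fixed seed are all assumptions made for illustration. A real run would use a comparison matrix such as BLOSUM with separate gap creation and extension penalties.

```python
import random
import statistics


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (toy scoring, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def shuffle_z_score(a, b, n_random=100, seed=0):
    """Compare the real score with scores against shuffled copies of b."""
    rng = random.Random(seed)
    real = local_score(a, b)
    residues = list(b)
    random_scores = []
    for _ in range(n_random):
        rng.shuffle(residues)  # keeps length and composition of b
        random_scores.append(local_score(a, "".join(residues)))
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    # z is undefined when all shuffled scores are identical.
    z = (real - mean) / sd if sd > 0 else None
    return real, mean, sd, z
```

A real score that lies several standard deviations above the shuffled mean suggests that the match is not explained by sequence composition alone.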
d)
SOLUTION: Use the combination of GCG's compare and dotplot programs. For
compare, use a window of 35 and a stringency of 20.
See the resulting file.
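The window and stringency settings can be pictured with a small Python sketch: every pair of window positions is scored, and a dot is recorded only when the window score reaches the stringency. This is an illustration under simplifying assumptions, not the compare program itself. In particular, the identity scoring (1 per matching residue) and the function name window_dots are hypothetical, and dots are reported at window start positions.

```python
def window_dots(a, b, window=35, stringency=20):
    """Return (i, j) window start positions whose score reaches the stringency."""
    dots = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            # Toy scoring: count identical residues inside the window.
            score = sum(1 for k in range(window) if a[i + k] == b[j + k])
            if score >= stringency:
                dots.append((i, j))
    return dots
```

Raising the stringency or the window size removes isolated dots and leaves only the longer diagonals that indicate a shared region.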
a)
HINT: Create a list file for pileup using the Begin and End specifications.
SOLUTION: Use the following list_file (MAPP.4.list), indicating the
approximate conserved regions.
b)
HINT: Suppose your alignment file is pileup.msf. Invoke lineup by saying
'lineup -MSF pileup.msf'.
NOTE: This differs from e.g. reformat, where you would have to say
'reformat -msf pileup.msf{*}'. The reason is that lineup always expects a
multiple-alignment file, while reformat expects one (or more) sequence(s).
SOLUTION: An example edited alignment file.
c)
HINT: Suppose your edited MSF file is called edited.msf. The command line is:
profilemake -stringent -nologwgt -data=blosum62.cmp edited.msf{*}
See the above comment for the {*} syntax.
SOLUTION: An example output of profilemake is in this file.
d)
HINT: For searching small to medium-size databases, profilesearch is
suitable. However, profilesearch has a built-in restriction to take at most
100,000 sequences into consideration. For big databases, the EGCG program
tprofilesearch with the option -nosixframe can be used. tprofilesearch also
has a restriction, to 80,000 sequences, but it can use the -minscore=xx
parameter. With this option, only sequences with a score higher than xx are
considered (and counted). See the tprofilesearch documentation (EGCG package)
for details.
Suitable commands for searching our example against SwissProt are:
profilesearch -noave -nor -gap=21 -len=2 -batch
or in EGCG
tprofilesearch -nosixframe -noaverage -normalize -minscore=5.0 -list=100 -batch ..
SOLUTION: The output file of this search is in this file.
f) HINT: Significant hits have Z-scores > 7 or 8.
In this example, YD97_SCHPO, YHV4_YEAST, YM8K_YEAST, DNLJ_THESC, DNLJ_THETH,
DNL4_HUMAN and DNLJ_ECOLI are to be considered significant.
SOLUTION: The output file of profilesegments
is given in this file.
Advanced exercises
e) HINT: Create matching segments with a command line like
profilegap -outfile2=newsegment.seg
In the newly found matches, only a part of the profile is matched to the
sequence. In cases like this, it is a good idea to check manually whether the
flanking regions of the sequence can also be forced to match the profile.
This can be checked, for example, with
profilesegments -global.
f) NOTE: You should be able to find successively more significant matches,
e.g. bacterial ligases, yeast Rev1, mammalian Ect, later also mammalian and
yeast ligases, yeast Rad9, Rfc1, PARP, 53BP1 and even Brca1.
a)
HINT: Do not use the full XRCC_HUMAN. If you do so, another domain at the
N-terminus will obscure the hits of interest. The output file you get using
the whole sequence is shown here. The major problem is that the long
N-terminal domain makes it difficult to see the shorter domain we are
interested in.
SOLUTION: The correct solution will be obtained in several rounds by
submitting only the first 400 amino acids of XRCC_HUMAN. See the file
here for the third iteration.
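Cutting the query down to its first 400 residues before submitting it can be done in many ways. The Python sketch below shows one, assuming the sequence is available as FASTA text. The FASTA input format, the function name first_residues and the 60-column line wrapping are assumptions for illustration, not something the exercise prescribes.

```python
def first_residues(fasta_text, n=400):
    """Return a single FASTA record truncated to its first n residues."""
    lines = fasta_text.strip().splitlines()
    header = lines[0]
    sequence = "".join(line.strip() for line in lines[1:])
    fragment = sequence[:n]
    # Re-wrap the fragment at 60 residues per line.
    wrapped = [fragment[i:i + 60] for i in range(0, len(fragment), 60)]
    return "\n".join([header + f" (residues 1-{len(fragment)})"] + wrapped)
```

The returned record can then be used as the query for each search round.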
b)
HINT: Pileup is not sensitive enough to make this alignment. Instead, use
ClustalW, but use pileup to extract the sequence fragments you wish to align.
Use readseq to reformat your sequences.
SOLUTION: You can use the following list_file and feed it to pileup. The
output alignment is rather unconvincing (here).
c)
You can use Boxshade or prettyview to produce a more convincing alignment in
PostScript format (here is a black-and-white example), obtained with the
command: