Finding a local homology domain by BLAST and pairwise alignment
     a)
     HINT: Use the lookup program
     SOLUTION: The entry name is XRC1_HUMAN.
     b)
     HINT: Call blast with the option -filter=xs
     SOLUTION: See the blast output file.
     NOTE: If you want to see what happens if the search is run without
     filters, look at the following file.
     c)
     HINT: For pairwise comparison, use the BLOSUM45 comparison matrix.
     Since we are interested in local homologies, use the bestfit program
     with gap creation penalties of 20-30 and gap extension penalties of
     2-3. To assess the statistical significance of pairwise matches, use
     the following command line:
         bestfit -data=blosum62.cmp -gap=18 -len=2 swiss:rad4_schpo swiss:xrcc_human -ran=100
     SOLUTION: The only significant match here is Rad4 from S. pombe (see corresponding file).
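
     The -ran=100 option asks bestfit to repeat the alignment with
     shuffled sequences. One common way to judge the real alignment
     against those randomizations is a Z-score. The sketch below is in
     Python and assumes that you read three numbers off the output by
     hand: the quality of the real alignment, and the average and
     standard deviation of the randomized qualities. The function name
     and the numbers in the usage line are illustrative only; they do not
     come from a real bestfit run.

```python
def z_score(observed, random_mean, random_sd):
    """Distance of the observed quality from the randomized average,
    measured in standard deviations of the randomized qualities."""
    if random_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (observed - random_mean) / random_sd


if __name__ == "__main__":
    # Illustrative numbers only (assumption, not real program output).
    print(z_score(50.0, 20.0, 5.0))
```

     A larger Z-score means the real alignment stands further out from
     the shuffled background.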
     d)
     SOLUTION: Use a combination of GCG's compare and dotplot programs.
     For compare, use a window of 35 and a stringency of 20.
     See the resulting file.
 
 
     a)
     HINT: Create a list file for pileup using the Begin and End
     specifications.
     SOLUTION: Use the following list_file (MAPP.4.list), indicating the
     approximate conserved regions.
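
     If you prefer to generate such a list file rather than type it, the
     following Python sketch writes one line per sequence with Begin and
     End attributes. The layout (a comment, a ".." divider line, then one
     "name  Begin: n  End: m" line per entry) is an assumption about the
     GCG list-file format, so check it against MAPP.4.list before use. The
     entry name and coordinates in the usage line are hypothetical
     placeholders, not the real conserved regions.

```python
def make_list_file(regions, comment="Approximate conserved regions"):
    """Build the text of a list file.

    regions: iterable of (sequence_spec, begin, end) tuples,
             with 1-based inclusive coordinates.
    """
    lines = [comment, "", ".."]  # ".." divider assumed to end the comment part
    for spec, begin, end in regions:
        if begin < 1 or end < begin:
            raise ValueError(f"bad range for {spec}: {begin}-{end}")
        lines.append(f"{spec}  Begin: {begin}  End: {end}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical entry and coordinates, for illustration only.
    print(make_list_file([("swiss:example_entry", 10, 90)]), end="")
```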
 
     b)
     HINT: Suppose your alignment file is pileup.msf. Invoke lineup by
     saying 'lineup -MSF pileup.msf'.
     NOTE: Note the difference from e.g. reformat, where you would have
     to say 'reformat -msf pileup.msf{*}'. This is because lineup always
     expects a multiple-alignment file, while reformat expects one (or
     more) sequence(s).
     SOLUTION: An example edited alignment file.
     c)
     HINT: Suppose your edited MSF file is called edited.msf. The command
     line is:
         profilemake -stringent -nologwgt -data=blosum62.cmp edited.msf{*}
     See the above comment for the {*} syntax.
     SOLUTION: An example output of profilemake is in this file.
     d)
     HINT: For searching small to medium size databases, profilesearch is
     suitable. However, profilesearch has a built-in restriction to take
     at most 100,000 sequences into consideration. For big databases, the
     EGCG program tprofilesearch with the option -nosixframe can be used.
     tprofilesearch also has a restriction to 80,000 sequences, but it can
     use the -minscore=xx parameter. Using this option, only sequences
     with a score higher than xx are considered (and counted). See the
     tprofilesearch documentation (EGCG package) for details.
     Suitable commands for searching our example against SwissProt are:

         profilesearch -noave -nor -gap=21 -len=2 -batch

     or in EGCG:

         tprofilesearch -nosixframe -noaverage -normalize -minscore=5.0 -list=100 -batch ..

     SOLUTION: The output file of this search is in this file.
     f) HINT: Significant hits have Z-scores > 7 or 8.
     In this example, YD97_SCHPO, YHV4_YEAST, YM8K_YEAST, DNLJ_THESC,
     DNLJ_THETH, DNL4_HUMAN and DNLJ_ECOLI are to be considered
     significant.
     SOLUTION: The output file of profilesegments is given in this file.
 
 
Advanced exercises
     e) HINT: Create matching segments with a command line like
         profilegap -outfile2=newsegment.seg
     In the newly found matches, only a part of the profile is matched to
     the sequence. In cases like this, it might be a good idea to check
     manually whether the flanking regions of the sequence can be forced
     to match the profile as well. This can be checked e.g. by
         profilesegments -global
     f) NOTE: You should be able to find successively more significant
     matches, e.g. bacterial ligases, yeast Rev1, mammalian Ect, later
     also mammalian and yeast ligases, yeast Rad9, Rfc1, PARP, 53BP1 and
     even Brca1.
 
     a)
     HINT: Do not use the full XRCC_HUMAN. If you do so, another domain
     at the N-terminus will obscure the hits of interest. The output file
     you get using the whole sequence is shown here. The major problem is
     that the long N-terminal domain makes it difficult to see the shorter
     domain we are interested in.
     SOLUTION: The correct solution will be obtained in several rounds by
     submitting only the first 400 amino acids of XRCC_HUMAN. See the file
     here for the third iteration.
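
     Cutting the query down to its first 400 residues can be done with
     any sequence editor. As a minimal sketch, the Python function below
     does it for a single FASTA record held in a string. The FASTA input
     format, the function name and the 60-column line width are
     assumptions made for this illustration; the exercise itself does not
     prescribe them.

```python
def truncate_fasta(text, length=400, width=60):
    """Return a single-record FASTA string cut to its first `length` residues."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    header = lines[0]
    # Join the sequence lines, then keep only the leading residues.
    sequence = "".join(line.strip() for line in lines[1:])[:length]
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + wrapped) + "\n"


if __name__ == "__main__":
    # Toy record for illustration; not the real XRCC_HUMAN sequence.
    print(truncate_fasta(">toy\nABCDEF\nGHIJ\n", length=8), end="")
```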
     b)
     HINT: Pileup is not sensitive enough to make this alignment.
     Instead, use ClustalW, but use pileup to produce the sequence
     fragments you wish to align. Use readseq to reformat your sequences.
     SOLUTION: You can use the following list_file and feed it to pileup.
     The output alignment is rather unconvincing (here).
     c)
     You can use Boxshade or prettyview to produce a more convincing
     alignment in PostScript format (here is a black-and-white example),
     obtained with the command: