b) SOLUTION: vav_human.motifs
c) HINT: the necessary command lines are:
motifs -frequent
motifs -mismatch=1
SOLUTION: the resulting output files are vav_human_freq.motifs
and vav_human.mism1.motifs
d) SOLUTION: the PROSITE patterns DAG_PE_BINDING_DOMAIN and GDS_CDC2MAPP. apply to the sequence. The frequent patterns MYRISTYL, RGD, and ASN_GLYCOSYLATION certainly do not apply. Most of the various phosphorylation sites are probably not used either.
e) SOLUTION: The following domains are found: PH_DOMAIN,
SH2, 2x SH3, GRF_DBL, CH_DOMAIN, DAG_PE_BIND.
Back to Problems
b) HINT: For GCG pattern format documentation see GCG Manual,
subheadings Gene finding and pattern recognition - motifs
- defining patterns. For PROSITE pattern format documentation,
see the URL http://expasy.hcuge.ch/txt/prosuser.txt
SOLUTION: The PROSITE pattern is: "k-x(0,1)-k-x-x>",
the GCG pattern is "kx{0,1}kxx>"
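
To check such a pattern outside GCG, it can be translated into a regular expression. The following is only a minimal sketch: it handles just the pattern elements used in this exercise (single residues, x, x(n), x(n,m) and the < and > anchors), the function names are made up for illustration, and the test sequences are invented, not real SwissProt entries.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Only residues, x, x(n), x(n,m) and the '<' / '>' anchors are supported.
    """
    anchored_start = pattern.startswith("<")
    anchored_end = pattern.endswith(">")
    core = pattern.strip("<>")
    parts = []
    for element in core.split("-"):
        m = re.fullmatch(r"([A-Za-z])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        # 'x' stands for any residue; everything else is a literal residue.
        token = "." if residue.lower() == "x" else residue.upper()
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return ("^" if anchored_start else "") + "".join(parts) + ("$" if anchored_end else "")


def has_motif(sequence: str, pattern: str) -> bool:
    """Return True if the sequence contains the PROSITE pattern."""
    return re.search(prosite_to_regex(pattern), sequence.upper()) is not None


# Invented example sequences (not taken from any database).
print(prosite_to_regex("k-x(0,1)-k-x-x>"))
print(has_motif("MAAAKKLL", "k-x(0,1)-k-x-x>"))
print(has_motif("MKKLLAAA", "k-x(0,1)-k-x-x>"))
```

The trailing ">" becomes the "$" anchor, so a KKxx stretch in the middle of a sequence is not reported.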
c) SOLUTION: In SwissProt, there are 1397 sequences bearing this
motif at the C-terminus. A complete list can be found here.
Most of these proteins are not type I transmembrane proteins and never
see the ER retention machinery.
Back to Problems
b) HINT: Call blast with the option -filter=xs
SOLUTION: See the blast output file.
NOTE:
If you want to see what happens if the search is run without filters, look
at the following file
c) HINT: For pairwise comparison, use the BLOSUMMAPP. comparison matrix. Since we are interested in local homologies, use the bestfit program with gap creation penalties of 20-30 and gap extension penalties of 2-3. For assessing the statistical significance of pairwise matches, use the following command line:
d) SOLUTION: Use the combination of the GCG compare and dotplot
programs. For compare, use a window of 35 and a stringency
of 20. See the resulting file
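
The idea behind the window and stringency parameters can be sketched in a few lines. This is not the GCG implementation: it assumes that a window is scored by simply counting identical residues, whereas the real program may sum scoring-matrix values, and the function name and the short test sequences are invented for illustration.

```python
def window_hits(seq_a, seq_b, window=35, stringency=20):
    """Return (i, j) window start positions that would give a dot.

    A dot is recorded when the windows starting at seq_a[i] and seq_b[j]
    have at least `stringency` identical positions (simplified scoring).
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= stringency:
                hits.append((i, j))
    return hits


# Tiny invented example with a short window so the result is easy to check:
# comparing a sequence with itself gives dots on the main diagonal only.
print(window_hits("ACDEFGHIK", "ACDEFGHIK", window=5, stringency=5))
```

Raising the stringency relative to the window removes isolated dots, so only longer diagonal runs, that is, candidate homologous regions, remain visible.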
MAPP.5 Multiple alignment of homology domains and profile
searches
a)
HINT: create a list file for pileup using the Begin and End
specifications.
SOLUTION:
Use the following list_file (MAPP.MAPP.list), indicating the approximate
conserved regions.
******************************************
swiss:RADMAPP.SCHPO Begin:1 End:80
swiss:RADMAPP.SCHPO Begin:100 End:180
swiss:XRCC_HUMAN Begin:310 End:390
******************************************
and the command
b)
HINT: Suppose your alignment file is pileup.msf.
Invoke lineup by saying '
c)
HINT: Suppose your edited MSF file is called edited.msf.
The command line is:
d)
HINT: For searching small to medium-sized databases, profilesearch
is suitable. However, profilesearch has a built-in limit and takes
at most 100,000 sequences into consideration. For big databases,
the EGCG program tprofilesearch with the option -nosixframe
can be used. tprofilesearch is also limited, to 80,000
sequences, but it accepts the
-minscore=xx parameter. With this option, only sequences
with a score higher than xx are considered (and counted). See the tprofilesearch
documentation (EGCG package) for details. Suitable commands for searching
our example against SwissProt are:
f) HINT: Significant hits have Z-scores > 7 or 8.
In this example, YD97_SCHPO,
YHVMAPP.YEAST, YM8K_YEAST, DNLJ_THESC, DNLJ_THETH, DNLMAPP.HUMAN
and DNLJ_ECOLI are to be considered significant.
SOLUTION: The output file of profilesegments is
given in this file.
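
As a reminder of what the Z-score cutoff in the hint means, here is a minimal sketch. It assumes the plain definition of a Z-score (distance from the mean of the background scores in units of their standard deviation). It leaves out any length normalization the search programs may apply, and the function names and all numbers are invented.

```python
from statistics import mean, pstdev


def z_score(score, background_scores):
    """Number of standard deviations `score` lies above the background mean."""
    mu = mean(background_scores)
    sigma = pstdev(background_scores)
    return (score - mu) / sigma


def is_significant(score, background_scores, cutoff=7.0):
    """Apply the rule of thumb from the hint: Z-score above the cutoff."""
    return z_score(score, background_scores) > cutoff


# Invented background: 100 unrelated sequences scoring 9 or 11.
background = [9, 11] * 50
print(z_score(20, background))
print(is_significant(20, background))
print(is_significant(12, background))
```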
Back to Problems
c) NOTE: You should be able to find progressively more significant
matches, e.g. bacterial ligases, yeast Rev1, mammalian Ect, later also
mammalian and yeast ligases, yeast Rad9, Rfc1, PARP, 53BP1 and even Brca1.
Back to Problems