Exercises for Day 4 of the EMBNET course:

Multiple Sequence Alignments, Phylogeny and Profiles

MAPP.1 Checking a sequence for the existence of motifs

a) Find the sequence for the human proto-oncogene Vav in SwissProt and create a local copy

b) Use this sequence to search the PROSITE database of patterns with the motifs program.

c) Repeat the motifs run, this time allowing one mismatch per pattern. Then repeat it once more, this time including frequent patterns. **NOTE**: never use both options at the same time!

d) Try the same search on the WWW server, using the URLs:
http://expasy.hcuge.ch/sprot/scnpsit1.html or
http://www.isrec.isb-sib.ch/software/PSTSCAN_form.html
Read the documentation entries of the reported PROSITE matches and see if they apply to your sequence.

e) Use this sequence to search the profile section of PROSITE and PFAM A, using the URL:
http://www.isrec.isb-sib.ch/software/PFSCAN_form.html
Hints and Solutions:

MAPP.2 Finding occurrences of patterns in protein databases

a) Use the findpatterns program to check whether the patterns 'ISREC' or 'ELVIS' occur in the SwissProt database.

b) A motif for ER-localization for type I membrane proteins is the pattern KKxx or KxKxx at the extreme C-terminus of a sequence. Formulate a PROSITE and a GCG pattern that covers both variants of this motif.

c) Check how many times this pattern occurs in SwissProt, using either the GCG program findpatterns or one of the WWW servers:
http://expasy.hcuge.ch/sprot/scnpsit2.html or
http://www.isrec.isb-sib.ch/software/PATFND_mailform.html
Why do you find so many copies of this motif in non-ER proteins?

Hints and Solutions:
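
To experiment with the pattern from b) outside GCG, the following minimal Python sketch translates a PROSITE-style pattern into a regular expression and estimates how often the motif would appear purely by chance. The function names, the pattern "K-x(0,1)-K-x(2)>" (one possible formulation, not necessarily the intended solution) and the assumed lysine frequency of about 6 percent are illustrative assumptions, not part of the course material. The translator only handles the basic PROSITE elements (x, [..], {..}, repeat counts, < and >).

```python
import re

def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # '<' anchors the pattern at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # '>' anchors the pattern at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (2) or (0,1).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                  # x means any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # {..} means any residue except these
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi is not None else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

# Assumed formulation of the motif from b): KKxx or KxKxx at the C-terminus.
ER_MOTIF = prosite_to_regex("K-x(0,1)-K-x(2)>")

def has_c_terminal_motif(sequence):
    """True if the sequence ends in KKxx or KxKxx."""
    return re.search(ER_MOTIF, sequence) is not None

def chance_fraction(p_lys):
    """Fraction of random C-termini matching the motif.

    A match needs K at position -3 and K at position -4 or -5,
    assuming residues are independent with lysine frequency p_lys.
    """
    return p_lys * (1.0 - (1.0 - p_lys) ** 2)
```

With an assumed lysine frequency of 0.06, chance_fraction gives roughly 0.007, so about 7 in every 1000 unrelated sequences would be expected to match by chance alone. Such a short, degenerate motif therefore cannot be specific on its own.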

MAPP.3 Multiple alignments and Phylogeny

a)-Produce a multiple alignment and a phylogenetic tree of the following three sequences from Swiss_prot:
                 -SCG1_HUMAN
                 -SCGA_XENLA
                 -STHM_MOUSE
Hint   Solution

b)-Discuss the results. What is wrong? Propose a strategy.
Hint Solution

c)-Add sequences from Swiss_prot to your list, then make a new multiple sequence alignment and a tree.
Hint Solution

d)-Discuss the results and propose a new strategy in order to obtain a better tree.
Hint Solution

e)-Obtain the nucleotide sequences corresponding to your protein sequences from the EMBL database and align them.
(NOTE: the nucleotide sequences of sg1_mouse and sg1_human are not available)
Hint Solution

f)-Use your multiple sequence alignment with the programs distance and growtree. Produce the rooted tree corresponding to your nucleotide alignment.
Hint Solution
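
As background for step f), the sketch below shows in Python how pairwise distances can be derived from an aligned set of nucleotide sequences before a tree is built from them. It computes the uncorrected fraction of differing sites and the Jukes-Cantor correction. This is a generic illustration, not a reimplementation of the GCG programs, and the function names and the gap characters '-' and '.' are assumptions.

```python
import math

GAP_CHARS = "-."  # assumed gap symbols in the aligned sequences

def p_distance(seq_a, seq_b):
    """Fraction of differing sites, skipping columns with a gap in either sequence."""
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x not in GAP_CHARS and y not in GAP_CHARS
    ]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed nucleotide difference p."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(aligned):
    """Corrected distances for every pair in a {name: aligned_sequence} dict."""
    names = sorted(aligned)
    return {
        (a, b): jukes_cantor(p_distance(aligned[a], aligned[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

The correction matters most for divergent pairs: the more sites differ, the more multiple substitutions at the same site are hidden, and the further the corrected distance exceeds the raw fraction.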

MAPP.4 Finding a local homology domain by BLAST and pairwise alignment

a) Find the sequence for the human DNA repair gene Xrcc1 in SwissProt and create a local copy.

b) Perform a blast search against the non-redundant protein database. **NOTE**: this sequence contains low complexity regions, so use blast's filter option; otherwise a lot of junk will be reported.

c) By using pairwise alignment, check which of the matches are significant and which are potentially interesting. Check the high scoring matches for putative biological relevance and note the results.

d) Find the conserved region of Xrcc1 and Rad4 by using dotplots (use Window=35, Stringency=20). Note the internal repeat in Rad4.
Hints and Solutions:
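
To make the Window and Stringency parameters in d) concrete, here is a minimal Python sketch of a windowed dotplot. For every pair of start positions it counts identical residues across the window and records a dot when the count reaches the stringency. Scoring by simple identity is a simplifying assumption (the GCG programs score with a comparison table, so the same numbers will not give the same plot), and the function name is hypothetical.

```python
def dotplot(seq_a, seq_b, window=35, stringency=20):
    """Return (i, j) start positions whose windows share >= stringency identities."""
    points = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical residues along the diagonal segment starting at (i, j).
            score = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if score >= stringency:
                points.append((i, j))
    return points
```

Comparing a sequence with itself, as in dotplot(seq, seq), always yields the main diagonal. An internal repeat shows up as additional runs of dots parallel to it, offset by the distance between the repeat copies.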

MAPP.5 Multiple alignment of homology domains and profile searches

a) Use pileup to create a multiple alignment of these three sequence fragments (in MSF format).

b) Use lineup to look at the multiple alignment and to remove potential non-conserved overhanging ends. Save the edited multiple alignment in MSF format.

c) Use profilemake to create a profile from the edited MSF file. Create the profile with linear symbol weighting and stringent treatment of non-observed symbols.

d) Use profilesearch to search the profile against a protein sequence database. Run this search with compositional averaging switched off. Don't forget to restrict the output list to the 100 best sequences, otherwise ALL SEQUENCES in the database will be reported! Profile searches against big databases take a lot of time and CPU resources. The output file of a profilesearch against the complete SwissProt database will be provided.
NOTE: use the substitution matrix blosum45.cmp (with the option -data=genmoredata:blosum45.cmp)

e) Check the output list for high scoring matches of potential relevance. Create profile-to-sequence alignments with selected examples using profilesegments or profilegap.
Hints and Solutions:
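
The idea behind steps c) to e) can be illustrated with a much simpler stand-in for a profile: a position-specific scoring matrix built from column frequencies of the multiple alignment, then slid along a sequence without gaps. The Python sketch below is not the algorithm used by profilemake or profilesearch (which also use a substitution matrix and gap penalties). The uniform background, the pseudocount and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Log-odds score per column and residue from equal-length aligned rows."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    pssm = []
    for col in range(length):
        # Gap characters and unknown symbols are simply not counted.
        counts = Counter(row[col] for row in alignment if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def best_window(pssm, sequence):
    """Return (score, start) of the best ungapped match of the matrix."""
    width = len(pssm)
    best = (-math.inf, -1)
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[k].get(sequence[start + k], 0.0) for k in range(width))
        if score > best[0]:
            best = (score, start)
    return best
```

Ranking database sequences by their best window score mirrors, in spirit, the sorted output list of a profile search. The pseudocount plays a role loosely comparable to the treatment of non-observed symbols: a larger value penalizes unseen residues less harshly.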

MAPP.6 Advanced exercises

The following exercises might be too time-consuming to perform now; maybe try them at home!

a) Use profilegap to extract from the high-scoring sequences those fragments that match with the profile.

b) Add these new sequence segments to the existing multiple alignment, either by pileup or by manual editing.

c) Repeat the profilemake/profilesearch cycle iteratively as described above.
Hints and Solutions: