Anonymous Sequences

Cedric Notredame (c) 2003, 2004, 2005, 2006, 2007

Purpose of the Exercises

Choose one of the sequences listed below and characterize it using bioinformatics tools. Among other things, identify its domain structure and try to predict its putative function(s) and its putative structure. All the sequences are of similar complexity.

Characterizing a sequence means finding out as much as you can about its function. Is it an enzyme, and if so, what does it do? Where is its active site, and is that site actually active? Does it have repeated elements, and if so, what are they? This is a complicated investigation, and you must remember that no individual bioinformatics result is enough on its own: each one must be supported by independent results. Have fun!

List of Sample Sequences



Where to start?

There is no definite rule for studying a sequence, and the purpose of this exercise is to let you invent and discover your own way of looking at sequences. Still, in case you need them, here are a few guidelines.

Remember that there may be more than one story to uncover in your sequence (two or three domains, for instance). If this is more than you can handle, take things one at a time: start with the domain you find most exciting, then move on to the next one. You do not need to be exhaustive; we will be happy with a nice study of at least one portion of your sequence!

Here is a list of simple things you could do, along with the chapter of Bioinformatics For Dummies you could use:

Bioinformatics For Dummies
Type of Investigation

The easiest thing you can do with your sequence is to find out about its various physico-chemical properties, make simple predictions, or check whether your sequence contains known domains.
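To give an idea of what "physico-chemical properties" means in practice, here is a minimal Python sketch, assuming a protein sequence written only with the 20 standard one-letter codes. It computes the amino acid composition, the GRAVY score (mean Kyte-Doolittle hydropathy) and a sliding-window hydropathy profile. Online tools such as ExPASy ProtParam report these values and many more; the function names here are illustrative only. The flag in the usage part follows the commonly cited Kyte-Doolittle rule of thumb (window of about 19 residues, mean above about 1.6 hints at a transmembrane segment), which is a clue to confirm, not a result.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def clean(seq):
    """Upper-case the sequence; reject empty input and non-standard residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KD)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return seq


def composition(seq):
    """Percentage of each residue present in the sequence."""
    seq = clean(seq)
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over the whole sequence."""
    seq = clean(seq)
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each sliding window; entry k covers residues k+1..k+window (1-based)."""
    values = [KD[aa] for aa in clean(seq)]
    return [sum(values[k:k + window]) / window for k in range(len(values) - window + 1)]


if __name__ == "__main__":
    toy = "MKTLLILAVVAAALAVLSAQPEDKRR"  # made-up sequence, not one of the exercise sequences
    print(composition(toy))
    print(round(gravy(toy), 2))
    for k, h in enumerate(hydropathy_profile(toy)):
        # Rule of thumb only: mean > 1.6 over ~19 residues may indicate a transmembrane segment
        flag = "  <- possible transmembrane segment" if h > 1.6 else ""
        print(f"{k + 1:4d} {h:6.2f}{flag}")
```

A positive GRAVY suggests an overall hydrophobic protein; a negative one, a rather hydrophilic (often soluble) protein.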


The second easiest thing to do is to compare the sequence with a database, using the resources in Chapter 7 of Bioinformatics For Dummies. Of course, the secret here is to use the right database to ask the right question. If you are not too sure about the databases, indications for proteins can be found in Chapter 4. These databases also come in very handy for finding obscure chemical information and complicated post-translational modifications.


If you have started gathering sequences, you may want to build a multiple sequence alignment. Before you do this, though, remember that simply comparing the sequence with itself can yield valuable clues. You can use Dotlet (among other tools) for this purpose, as explained in Chapter 8.
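The idea behind comparing a sequence with itself can be sketched in a few lines of Python. This is a minimal illustration, assuming a simplified scoring scheme: two windows of `window` residues are compared position by position, and a dot is placed when at least `threshold` residues are identical. Dotlet scores windows with a substitution matrix and lets you tune the display interactively, so this is not how Dotlet itself works; the function names are hypothetical. The main diagonal is trivial (every residue matches itself); runs of dots away from it are what point to internal repeats.

```python
def dot_plot(seq_a, seq_b, window=10, threshold=7):
    """Return the set of (i, j) pairs (0-based window starts) where the ungapped
    windows seq_a[i:i+window] and seq_b[j:j+window] share at least `threshold`
    identical residues."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            identical = sum(a == b for a, b in zip(seq_a[i:i + window], seq_b[j:j + window]))
            if identical >= threshold:
                hits.add((i, j))
    return hits


def off_diagonal_hits(seq, window=10, threshold=7):
    """Self comparison: keep hits above the main diagonal (i < j).
    The plot is symmetric, so the hits below the diagonal carry no extra information."""
    return sorted((i, j) for i, j in dot_plot(seq, seq, window, threshold) if i < j)


def render(hits, n_rows, n_cols):
    """ASCII dot plot: '*' for a hit, '.' otherwise (rows = seq_a, columns = seq_b)."""
    return "\n".join(
        "".join("*" if (i, j) in hits else "." for j in range(n_cols))
        for i in range(n_rows)
    )


if __name__ == "__main__":
    # Made-up toy sequence: one 10-residue block repeated twice, separated by a linker
    toy = "ACDEFGHIKL" + "MNPQ" + "ACDEFGHIKL"
    print(off_diagonal_hits(toy, window=10, threshold=10))  # [(0, 14)]
    n = len(toy) - 5 + 1
    print(render(dot_plot(toy, toy, window=5, threshold=5), n, n))
```

Lowering `threshold` below `window` tolerates mismatches, which is what you need for diverged repeats; it also makes the plot noisier, exactly the trade-off you will face with Dotlet's settings.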


Multiple alignments are the best way to present biological sequence information. If you think you have gathered the right sequences (or portions of sequences), you can try some of the online tools presented in Chapter 9. Go to Chapter 10 if you want to make your alignment look flashy.


Structures also help! Use Chapter 11 to find out whether there is a way to estimate the structure of your sequence. If your active site, or your phosphorylation site, appears to lie right in the middle of the protein core, buried as deep as it can be, you are in trouble!


And here comes the complicated part: making sense of all the information. There is a lot of noise in the data gathered with bioinformatics methods. You must clean it using consistency rules, just as you would in an inquiry, confronting the testimonies of different witnesses. Try to follow these three rules:

1. Use a previous result to propose a hypothesis. Example: according to PROSITE, residue 25 could be a phosphorylation site.
2. Design an experiment to test your hypothesis. Example: predict the secondary or the tertiary structure.
3. Conclude. Example: residue 25 seems to be deeply buried and is therefore not a good candidate for being a phosphorylation site.
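The three rules can be illustrated with the phosphorylation example. Below is a minimal Python sketch, assuming a simple PROSITE pattern: the one used, [ST]-x-[RK], is the protein kinase C phosphorylation site pattern (PROSITE PS00005). Short patterns like this one match almost any protein several times, which is precisely why a motif hit is only a hypothesis. The structural evidence is represented by a hypothetical set of buried positions that you would take from your own structure or solvent-accessibility prediction. The helper names are illustrative and not part of any existing tool; for real scans, use ScanProsite.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern (e.g. '[ST]-x-[RK]') into a Python regex.
    Handles x, [..], {..}, repeat counts such as x(2) or x(2,4), and the < > anchors.
    This is a sketch, not a complete PROSITE parser."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Split an element such as 'x(2,4)' into its core ('x') and repeat count ('2,4')
        core, count = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element).groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {P} means "anything but P"
        regex += prefix + core + ("{" + count + "}" if count else "") + suffix
    return regex


def scan(seq, pattern):
    """Rule 1: list every (possibly overlapping) match as (1-based start, matched residues)."""
    finder = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in finder.finditer(seq.upper())]


def triage(sites, buried_positions):
    """Rules 2-3: confront each site with a structural prediction.
    `buried_positions` is a hypothetical set of 1-based positions predicted to be buried.
    Assumes the modifiable residue is the first residue of the motif, as in [ST]-x-[RK]."""
    return [
        (pos, motif, "doubtful: buried" if pos in buried_positions else "plausible")
        for pos, motif in sites
    ]


if __name__ == "__main__":
    toy = "AASKRTAK"  # made-up sequence
    sites = scan(toy, "[ST]-x-[RK]")  # PROSITE PS00005, protein kinase C site
    print(sites)  # [(3, 'SKR'), (6, 'TAK')]
    print(triage(sites, buried_positions={6}))
```

The lookahead `(?=(...))` is what lets overlapping motifs be reported; a plain `finditer` would skip a match that starts inside a previous one.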