Analysis of Two Sequences

1-MUTATIONS AND GENETIC CODE
1.Given a random ORF, what probability of appearance do you expect for each amino acid?
2.What are the two residues that are the most likely to mutate by chance?
3.Does this prediction correlate well with the observation?
MATERIAL: PAM250 and the Genetic Code
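As a starting point for question 1, the expected frequencies can be sketched from the genetic code alone. The snippet below is a minimal illustration, not part of the course material: it assumes that all 61 sense codons of the standard genetic code are equally likely (equal base frequencies, no codon bias), which real genomes do not satisfy. The names `CODON_TABLE` and `expected_frequencies` are made up for this example.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid (or '*').
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def expected_frequencies(codon_table):
    """Expected amino acid frequencies, ASSUMING every sense codon is equally likely."""
    counts = Counter(aa for aa in codon_table.values() if aa != "*")
    total = sum(counts.values())  # 61 sense codons in the standard code
    return {aa: n / total for aa, n in counts.items()}


freqs = expected_frequencies(CODON_TABLE)
# Print amino acids from most to least expected.
for aa, p in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{aa}  {p:.3f}")
```

Under this assumption the expected frequency of an amino acid is simply its number of codons divided by 61; comparing these values with observed compositions shows how far real sequences depart from the uniform model.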
The simplest form of multiple sequence analysis occurs when you have only two sequences! Such pairwise analyses can be carried out with the programs compare, gap and bestfit.

2-COMPARE & DOTPLOT

Compare compares two protein or nucleic acid sequences and creates a file of the points of similarity between them for plotting with dotplot. Compare finds the points using either a window/stringency or a word match criterion. The window/stringency comparison is the slower but more sensitive of the two.

Dotplot makes a dot-plot with the output file from compare, foldrna, or stemloop.

Exercise:
To do this exercise you will first need to fetch the following two sequences from the GCG database. These are the E.coli and Mycobacterium tuberculosis recA sequences:

  em:ecreca (v00328)
  em:mtreca (x58485)

The default filenames are ecreca.em_ba and mtreca.em_ba. To compare these, enter the following:

  % compare

  Compare compares two protein or nucleic acid sequences and creates a file of the points of similarity between them for plotting with DotPlot. Compare finds the points using either a window/stringency or a word match criterion. The word comparison is 1,000 times faster than the window/stringency comparison, but somewhat less sensitive.

  COMPARE what horizontal sequence ? em:ecreca
  Begin (* 1 *) ?
  End (* 1391 *) ?
  Reverse (* No *) ?
  to what vertical sequence (* ecreca.em_ba *) ? em:mtreca
  Begin (* 1 *) ?
  End (* 2762 *) ?
  Reverse (* No *) ?
  What comparison window size (* 21 *) ?
  What stringency (* 14.0 *) ?
  What should I call the output file (* ecreca.pnt *) ?
  Number of points: 947
  Writing .......

To view this comparison, dotplot is used as follows.

NB! You must set your GCG graphics environment before using this or any other GCG graphics program (see Appendix I of this document).

  % dotplot

  DotPlot makes a dot-plot with the output file from Compare, FoldRNA, or StemLoop.

  DOTPLOT what point file ? ecreca.pnt

  ecreca.pnt contains COMPARE results of

  Axis        Name          Check  Start  End   Dir
  Horizontal  ecreca.em_ba  3229   1      1391  for
  Vertical    mtreca.em_ba  8528   1      2762  for

  Window . . . . . . . . . 21
  Stringency . . . . . . . 14.0
  Number of points . . . . 947
  Percent of possible . . 0.023

  The minimum density for a one-page plot is 5118.3 bases/100 platen units on each axis.
  What point density would you like (* 2303.3 *) ?
  DOTPLOT will take 1 pages. Would you like to:
  P)lot the points
  D)ifferent density
  G)et another point file to plot
  Q)uit
  Please select one (* P *):
  P)lot the points
  D)ifferent density
  G)et another point file to plot
  Q)uit
  Please select one (* Q *):

Ask yourself why there is a big discontinuity in the sequence comparison. Clue: Mycobacterium has inteins.

This graph can be improved by changing the settings for the compare command. The previous example used the default settings, and the plot contains a lot of confounding information (the small dots). This noise can be reduced by increasing the window size, with a correspondingly higher stringency.

3-Comparison Using Dynamic Programming: GAP and Bestfit

Gap uses the algorithm of Needleman and Wunsch to find the alignment of two complete sequences that maximises the number of matches and minimises the number of gaps.

Exercise:
First fetch your sequences. Use the Haemophilus influenzae rec1 sequence and the Escherichia coli recA gene:

  em:hearec (L07521)
  em:ecreca (v00328)

To do the gap comparison, enter the following:

  % gap

  Gap uses the algorithm of Needleman and Wunsch to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps.

  GAP of what sequence 1 ? em:hearec
  Begin (* 1 *) ?
  End (* 1484 *) ?
  Reverse (* No *) ?
  to what sequence 2 (* hearec.em_ba *) ? em:ecreca
  Begin (* 1 *) ?
  End (* 1391 *) ?
  Reverse (* No *) ?
  What is the gap creation penalty (* 50.00 *) ?
  What is the gap extension penalty (* 3.0 *) ?
  What should I call the paired output display file (* hearec.pair *) ?
  Aligning .................................................. ...................-.
  Aligning .................................................. ...................-.......
  Gaps: 4
  Quality: 7888
  Quality Ratio: 5.671
  % Similarity: 58.531
  Length: 1486

To view the alignment, type:

  % more hearec.pair

It is much quicker to put all the instructions on the command line:

  % gap em:hearec em:ecreca -def

However, "-def" (default) assumes that you want to align the whole of each sequence (and not, for example, just the coding sequence) and that you are happy with the default gap penalties. This will not always be true, especially as you move on to multiple sequence alignments.

Gap is a global alignment program: it attempts to align two complete sequences. Bestfit offers another way of aligning two sequences: it tries to find the best local alignment. Each is suited to a different sort of problem.

Exercise:
Compare the results you get from bestfit and gap under various gap-penalty regimes.

4 Looking at the Structure of Genes

Using the same tools as in the above exercises, examine the following sequence pairs:
1.PAPA_CARPA - SERA_PLAFG
What biological phenomenon do your observations reflect?

5 Identification of Multiple Repeats

Do the same analysis as previously using the sequence TF3A_XENLA. Make a dot-plot and then use Lalign. Explain the results and propose a list of repeats.
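To see why a self-comparison dot-plot reveals repeats, it can help to sketch the window/stringency idea in a few lines. The code below is a simplified illustration and not the GCG Compare implementation: it assumes plain identity scoring (one point per matching residue, no scoring matrix), and the function names and the toy sequence are invented for this example.

```python
from collections import Counter


def window_dotplot(seq_a, seq_b, window=21, stringency=14):
    """Return (i, j) window start positions whose windows share >= stringency identities.

    Simplified sketch: identity scoring only, quadratic time.
    """
    points = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(1 for k in range(window) if seq_a[i + k] == seq_b[j + k])
            if matches >= stringency:
                points.append((i, j))
    return points


def repeat_offsets(seq, window, stringency):
    """Self-compare seq and count the points on each off-diagonal (offset = j - i > 0)."""
    counts = Counter(
        j - i for i, j in window_dotplot(seq, seq, window, stringency) if j > i
    )
    return dict(counts)


# Hypothetical toy sequence: a 10-residue unit repeated three times.
toy = "ACDEFGHIKL" * 3
offsets = repeat_offsets(toy, window=5, stringency=5)
print(offsets)  # {10: 16, 20: 6}
```

In a self-comparison the main diagonal is trivial; the informative signal is the set of parallel off-diagonals. Here they sit at offsets 10 and 20, multiples of the repeat length, which is the kind of pattern to look for when proposing a list of repeats from a real dot-plot.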