1 Using PILEUP and ClustalW
Multiple alignment of (homologous) sequences is a very powerful tool for
finding biologically significant features in sequences and is also an essential
prerequisite for phylogenetic analysis.
CLUSTALW

ClustalW is a multiple alignment program that also draws phylogenetic trees. This software was described in: Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) Nucleic Acids Research, 22(22):4673-4680.

Input sequences must all be in one file (or two files for a profile alignment) and in one format. The acceptable formats are: FASTA (Pearson); NBRF/PIR; EMBL/Swiss-Prot; GDE; CLUSTAL; GCG/MSF. Note that for Clustal format and MSF format (output from the GCG program pileup), the sequences are already aligned. You can use this facility to read in an alignment in order to calculate a phylogenetic tree OR to output the same alignment in a different format (from the output format options menu of the multiple alignment menu), e.g. read in a GCG/MSF format alignment and output a PHYLIP format alignment. It is also useful for reading in a reference alignment and adding one or more new sequences to it using the "profile alignment" facilities.

The default output format for clustal-created trees is New Hampshire format (in which the tree topology is indicated by a hierarchy of nested brackets). This format is compatible with PHYLIP, and you can use PHYLIP programs such as RETREE or DRAWTREE/DRAWGRAM to view the output tree. The phylogenetic trees in ClustalW use the Neighbour-Joining method of Saitou and Nei, based on a matrix of "distances" between all sequences. Note: do NOT use the .dnd file (a guide tree used to decide which sequences in the dataset align first) as the definitive phylogeny.

PILEUP

This is a program available under GCG. Pileup creates a multiple sequence alignment from a group of related sequences using a simplification of the progressive alignment method of Feng and Doolittle. The multiple alignment procedure begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences. This cluster can then be aligned to the next most related sequence or cluster of aligned sequences.
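To make the "hierarchy of nested brackets" of New Hampshire format more concrete, here is a minimal Python sketch that lists the leaf names in such a tree and reports how deeply the brackets nest. The tree string and its branch lengths are invented purely for illustration (they are not ClustalW output), and `leaf_names` and `max_depth` are hypothetical helper names, not part of ClustalW or PHYLIP.

```python
import re

# Invented example tree in New Hampshire (Newick) format.
# Topology and branch lengths are illustrative only, not real ClustalW output.
EXAMPLE_TREE = (
    "((CAS1_BOVIN:0.10,CAS1_SHEEP:0.10):0.20,"
    "(CAS1_MOUSE:0.15,CAS1_RAT_I:0.15):0.30,"
    "CAS1_HUMAN:0.40);"
)


def leaf_names(newick):
    """Return the leaf labels in left-to-right order.

    A leaf label directly follows "(" or "," and ends at ":", ",", ")" or ";".
    Labels on internal nodes (which would follow ")") are skipped.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def max_depth(newick):
    """Return the deepest level of bracket nesting in the tree string."""
    depth = 0
    deepest = 0
    for ch in newick:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


if __name__ == "__main__":
    print(leaf_names(EXAMPLE_TREE))
    print("nesting depth:", max_depth(EXAMPLE_TREE))
```

Each pair of brackets groups the sequences that share a branch point, so the two innermost groups in this invented tree are (CAS1_BOVIN, CAS1_SHEEP) and (CAS1_MOUSE, CAS1_RAT_I).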
Before alignment, the sequences are first clustered by similarity to produce a dendrogram, or tree representation of clustering relationships. It is this dendrogram that directs the order of the subsequent pairwise alignments. Distance along the vertical axis is proportional to the difference between sequences; distance along the horizontal axis has no significance.

GCG 9 incorporates a useful X Windows based tool called seqlab, which enables you to view, edit by hand, and save multiple sequence alignments. Window-based software is best learned by being shown how to use it in a basic way and then experimenting.

Exercise:

The aim of this exercise is to demonstrate how to operate multiple sequence alignment programs and also to highlight their restrictions and some differences between them.

For the purposes of this exercise it will be necessary to create a new directory. This will be used to hold only the sequences and files relevant to the exercise, which will enable you to use the wild-card * to select all sequences, as demonstrated later. To create a new directory called "mult_seqs" enter the command:

% mkdir mult_seqs

You can then move into this directory (change directory):

% cd mult_seqs

It might be better to use cas instead of mult_seqs because it's quicker to type and the first dataset is mammalian casein peptides. Later you will be analysing the corresponding DNA sequences and, later still, analogous somatotropin genes.

The next step is to fetch the sequences for alignment from the database. To do this you must be in GCG, so you may first need to enter:

% gcg

Fetch these individually into your working directory:

% fetch sw:database_name

where database_name is the name as listed below or the appropriate accession number.

The first alignment is of the alpha-S1 casein precursor of selected mammals.
These sequences have the Swiss-Prot names: CAS1_BOVIN, CAS1_HUMAN, CAS1_MOUSE, CAS1_PIG, CAS1_RABIT, CAS1_RAT and CAS1_SHEEP.
If you enter

% ls

you will see a list of the files in your present directory (mult_seqs). Note that these all end in .sw. This is useful because you can use *.sw to refer to all the sequences which you want to align.

Because these files were brought to your directory using the GCG program fetch, they are all in GCG format. For ClustalW to read in these sequences they need to be all in one file and in one of the accepted formats. We can do both of these at once using the GCG command tofasta, which will write all the sequences into one file in FastA/Pearson format. To do this enter the following:

% tofasta *.sw

ToFastA converts GCG sequence(s) into FastA format.

What should I call the output file (* tofasta.tfa *) ? cas1.tfa

CAS1_BOVIN 214 characters.
CAS1_HUMAN 185 characters.
CAS1_MOUSE 313 characters.
CAS1_PIG 206 characters.
CAS1_RABIT 215 characters.
CAS1_RAT 284 characters.
CAS1_SHEEP 206 characters.

1,623 symbols written into "cas1.tfa".

Note that it is important to use *.sw here, because if you repeat the tofasta command for each sequence they will all be in separate files. You now have a single file (cas1.tfa) containing all the sequences for alignment, so you can now run ClustalW:

% clustalw

**************************************************************
******** CLUSTAL W(1.60) Multiple Sequence Alignments ********
**************************************************************

1. Sequence Input From Disc
2. Multiple Alignments
3. Profile / Structure Alignments
4. Phylogenetic trees

S. Execute a system command
H. HELP
X. EXIT (leave program)

Your choice:1

Sequences should all be in 1 file.

7 formats accepted:
NBRF/PIR, EMBL/SwissProt, Pearson (Fasta), GDE, Clustal, GCG/MSF,RSF

Enter the name of the sequence file:cas1.tfa

Sequence format is Pearson
Sequences assumed to be PROTEIN

Sequence 1: CAS1_BOVIN 214 aa
Sequence 2: CAS1_HUMAN 185 aa
Sequence 3: CAS1_MOUSE 313 aa
Sequence 4: CAS1_PIG_I 206 aa
Sequence 5: CAS1_RABIT 215 aa
Sequence 6: CAS1_RAT_I 284 aa
Sequence 7: CAS1_SHEEP 206 aa

You will now have returned to the main menu. Enter 2 after the prompt "Your choice: " to choose to do a multiple alignment. The Multiple Alignment Menu (as follows) should then appear on your screen.

****** MULTIPLE ALIGNMENT MENU ******

1. Do complete multiple alignment now (Slow/Accurate)
2. Produce guide tree file only
3. Do alignment using old guide tree file
4. Toggle Slow/Fast pairwise alignments = SLOW
5. Pairwise alignment parameters
6. Multiple alignment parameters
7. Reset gaps between alignments? = OFF
8. Toggle screen display = ON
9. Output format options

S. Execute a system command
H. HELP
or press [RETURN] to go back to main menu

Your choice:

You can choose 6 at this prompt to change the algorithm parameters (i.e. gap creation and extension penalties etc.), or just choose 1 to do the multiple alignment now. In this case choose 1. Accept the default filenames by just hitting <return> twice. When the multiple alignment is complete the Multiple Alignment Menu will again be on your screen. Hit <return> to go back to the main menu (as before). Here choose option x to exit from ClustalW.
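As a rough illustration of what the single FastA/Pearson file contains, and of the name and length listing that ClustalW prints when it reads it, here is a minimal Python sketch of a FASTA reader. The three records are only the first 20 residues of the corresponding casein sequences (truncated for brevity, so the lengths do not match the real ones), and `read_fasta` is a hypothetical helper, not part of GCG or ClustalW.

```python
# Truncated example records (first 20 residues only), laid out the way a
# FastA/Pearson file is: a ">" header line followed by sequence lines.
EXAMPLE_FASTA = """\
>CAS1_BOVIN
MKLLILTCLV
AVALARPKHP
>CAS1_HUMAN
MRLLILTCLVAVALARPKLP
>CAS1_MOUSE
MKLLILTCLVAAAFAMPRLH
"""


def read_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is the first word after ">"; the rest is description.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    for number, (name, seq) in enumerate(read_fasta(EXAMPLE_FASTA).items(), 1):
        print(f"Sequence {number}: {name} {len(seq)} aa")
```

Because every record sits in the same file under its own header line, a program reading the file can pick up all the sequences in one pass, which is why ClustalW needs them combined.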
Look at the alignment:

% more cas1.aln

CLUSTAL W(1.60) multiple sequence alignment

CAS1_BOVIN MKLLILTCLVAVALARPKHPIKHQGLP-------QEVLNEN-LLRFFVAPFPEVFGKEKV
CAS1_HUMAN MRLLILTCLVAVALARPKLPLRYPERLQNP---SESSE-------PIP----LESREEYM
CAS1_MOUSE MKLLILTCLVAAAFAMPRLHSRNAVSSQTQQQHSSSEE-------IFKQPKYLNLNQEFV
CAS1_PIG_I MKLLIFICLAAVALARPKPPLRHQEHLQNEPDSREELFKERKFLRFPEVPLLSQFRQEII
CAS1_RABIT MKLLILTCLVATALARHKFHLGHLKLTQEQPESSEQEILKERKLLRFVQTVPLELREEYV
CAS1_RAT_I MKLLILTCLVAAALALPRAHRRNAVSSQTQQENSSSEEQE-----IVKQPKYLSLNEEFV
CAS1_SHEEP MKLLILTCLVAVALARPKHPIKHQGLS-------PEVLNEN-LLRFVVAPFPEVFRKENI
           *.*** ** * * * . * .

CAS1_BOVIN NELSKDIGSESTEDQAMEDIKQMEAESISSSEEIVPNSVEQKHIQKE-------------
CAS1_HUMAN NGMNRQRNILREK----QTDEIKDTRNESTQNCVVAEPEKMESSISSS------------
CAS1_MOUSE NNMNRQRALLTE-----QNDEIKVTMDAASEEQAMASAQEDSSISSSS-EESEEAIPNIT
CAS1_PIG_I NELNRNHG--------MEGHEQRGS-SSSSSEEVVGNSAEQKHVQKEE------------
CAS1_RABIT NELNRQRELLREK----ENEEIKGTRNEVTEEHVLADRETEASISSSS----EEIVPSST
CAS1_RAT_I NNLNRQRELLTE-----QDNEIKITMDSSAEEQATASAQEDSSSSSSSSEESKDAIPSAT
CAS1_SHEEP NELSKDIGSESIEDQAMEDAKQMKAGSSSSSEEIVPNSAEQKYIQKE-------------
           * . . . . .

CAS1_BOVIN -----DVPSERYLGYLEQLLRLKKYKVPQLEIVPNSAEERLHSMKE---GIHAQQKEPMI
CAS1_HUMAN ------SEEMSLSKCAEQFCRLNEYNQLQLQAAH--AQEQIRRMN-----ENSHVQVP--
CAS1_MOUSE EQKNIANEDMLNQCTLEQLQRQFKYNQLLQKASL--AKQASLFQQPSLVQQASLFQQPSL
CAS1_PIG_I -----DVPSQSYLGHLQGLN---KYKLRQLEAIH---DQELHRTNE---DKHTQQGEPMK
CAS1_RABIT KQKYVPREDLAYQPYVQQQLLRMKERYQIQE------REPMRVVN---QELAQLYLQP--
CAS1_RAT_I EQKNIANKEILNRCTLEQLQRQIKYSQLLQQASL--AQQASLAQQASLAQQALLAQQP--
CAS1_SHEEP -----DVPSERYLGYLEQLLRLKKYNVPQLEIVPKSAEEQLHSMKE---GNPAHQKQPMI
           . . *

CAS1_BOVIN GVNQELAYF-----------------YPELFRQFYQLD--AYPSGAWYYVPLGTQYTDAP
CAS1_HUMAN ------------------------------FQQLNQL---AAYPYAVWYYPQIMQYVPFP
CAS1_MOUSE LQQASLFQQPSMAQQASLLQQLLLAQQPSLALQVSPAQQSSLVQQAFLAQQASLAQKHHP
CAS1_PIG_I GVNQEQAYF-----------------YFEPLHQFYQLD--AYPYATWYYPP---QYIAHP
CAS1_RABIT -----------------------------FEQPYQLD---AYLPAPWYYTPEVMQYVLSP
CAS1_RAT_I ----------------------------SLAQQAALAQQASLAQQASLAQQASLAQKHHP
CAS1_SHEEP AVN-------------------------QLFRQFYQLD--AYPSGAWYYLPLGTQYTDAP
           . *

CAS1_BOVIN SFSDIPNPIGSENSE-KTTMPLW-------------------------------------
CAS1_HUMAN PFSDISNPTAHENYEKNNVMLQW-------------------------------------
CAS1_MOUSE RLSQSYYPHMEQPYRMNAYSQVQMRHPMSVVDQALAQFSVQPFPQIFQYDAFPLWAYFPQ
CAS1_PIG_I LFTNIPQPTAPEKGGKTEIMPQW-------------------------------------
CAS1_RABIT LFYDLVTPSAFESAEKTDVIPEWLKN----------------------------------
CAS1_RAT_I RLSQVYYPNMEQPYRMNAYSQVQMRHPMSVVDQ--AQFSVQSFPQLSQYGAYPLWLYFPQ
CAS1_SHEEP SFSDIPNPIGSENSG-KITMPLW-------------------------------------
           * .

CAS1_BOVIN ----------------------------
CAS1_HUMAN ----------------------------
CAS1_MOUSE DMQYLTPKAVLNTFKPIVSKDTEKTNVW
CAS1_PIG_I ----------------------------
CAS1_RABIT ----------------------------
CAS1_RAT_I DMQYLTPEAVLNTFKPIAPKDAENTNVW
CAS1_SHEEP ----------------------------

If you examine this alignment you might decide that it is not very sensible, and that consequently any tree drawn on the basis of this alignment would be unsatisfactory. For example, the alignment above contains short stretches of similar sequence that you might think should be aligned but are not. Change the alignment parameters to see how this affects the alignment. To do this re-enter ClustalW

% clustalw

and input the same sequences as before. Choose option 2 to do an alignment. Now choose option 6 to change the multiple alignment parameters.

********* MULTIPLE ALIGNMENT PARAMETERS *********

1. Gap Opening Penalty :10.00
2. Gap Extension Penalty :0.05
3. Delay divergent sequences :40 %
4. DNA Transitions Weight :0.50
5. Protein weight matrix :BLOSUM series
6. DNA weight matrix :IUB
7. Use negative matrix :OFF
7. Protein Gap Parameters
H. HELP

Enter number (or [RETURN] to exit):

As you can see, the gap opening penalty is quite large (in pileup v8.1 the default was only 3.00, although it is now 12). Change these parameters to be the same as the pileup ones (the gap extension penalty for pileup 8.1 was 0.10; it is now 4). To change the gap opening or extension penalties choose the appropriate menu number and you will be prompted for a value. Depending on what changes you make, this may give a quite different alignment. If you choose the same parameters as pileup, the alignment is as follows. (Note that it is a good idea to give the new alignment a name other than the default, otherwise you will overwrite the pre-existing alignment.)

CLUSTAL W(1.60) multiple sequence alignment

CAS1_BOVIN MKLLILTCLVAVALARPKHPIK--HQG-LP------QE--VLNEN-LLRFFVAPFPEVFG
CAS1_HUMAN MRLLILTCLVAVALARPKLPLR--YPERLQNPSESSEP--IP--------------LESR
CAS1_MOUSE MKLLILTCLVAAAFAMPRLHSRNAVSSQTQQQHSSSEE--IFK---------QPKYLNLN
CAS1_PIG_I MKLLIFICLAAVALARPKPPLR--HQEHLQNEPDSREE--LFKERKFLRFPEVPLLSQFR
CAS1_RABIT MKLLILTCLVATALARHKFHLG--HLKLTQEQPESSEQ-EILKER-KLLRFVQTVPLELR
CAS1_RAT_I MKLLILTCLVAAALALPRAHRRNAVSSQTQQENSSSEEQEIVK---------QPKYLSLN
CAS1_SHEEP MKLLILTCLVAVALARPKHPIK--HQG-LS------PE--VLNEN-LLRFVVAPFPEVFR
           *.*** ** * * * . .

CAS1_BOVIN KEKVNELSKDIGSESTEDQAMEDIKQMEAESISSSEEIVPNSVEQKHIQKE-DVPSE---
CAS1_HUMAN EEYMNGMNRQRNILR-EK----QTDEIKDTRNESTQNCVVAEPEKMESSISSSS-EE--M
CAS1_MOUSE QEFVNNMNRQRALLT-E-----QNDEIKVTMDAASEEQAMASAQE-DSSISSSS-EESEE
CAS1_PIG_I QEIINELNRNHG--------MEGHEQ-RGSSSSSSEEVVGNSAEQKHVQKEEDVPSQ---
CAS1_RABIT EEYVNELNRQRELLR-EK----ENEEIKGTRNEVTEEHVLADRET-EASISSSS-EE---
CAS1_RAT_I EEFVNNLNRQRELLT-E-----QDNEIKITMDSSAEEQATASAQE-DSSSSSSSSEESKD
CAS1_SHEEP KENINELSKDIGSESIEDQAMEDAKQMKAGSSSSSEEIVPNSAEQKYIQKE-DVPSE---
           * .* . . . .. . .

CAS1_BOVIN RYLGYLEQLLRLKKYKVPQLEIVPNSAEERLHSMKEGIHAQQKEPMIGVNQE--------
CAS1_HUMAN SLSKCAEQFCRLNEYNQLQLQAAH--AQEQIRR------MNENSHVQVPFQQ--------
CAS1_MOUSE AIPNITEQKNIANEDMLNQCTLEQ--LQRQFKY------NQLLQKASLAKQASLFQQPSL
CAS1_PIG_I SYLGHLQG---LNKYKLRQLEAIH---DQELHRTNEDKHTQQGEPMKGVNQE--------
CAS1_RABIT IVPSSTKQKYVPREDLAYQPYVQQQLLRMKERYQ-----IQEREPMRVVNQE--------
CAS1_RAT_I AIPSATEQKNIANKEILNRCTLEQ--LQRQIKY------SQLLQQASLAQQA--------
CAS1_SHEEP RYLGYLEQLLRLKKYNVPQLEIVPKSAEEQLHSMKEGNPAHQKQPMIAVNQ---------
           . . *

CAS1_BOVIN --LAYFYPE----------LFRQFYQLDAYPSGAWYYV-PLGTQYTDAPSFSDIPNPI--
CAS1_HUMAN -----------------------LNQLAAYPYAVWYY--PQIMQYVPFPPFSDISNPT--
CAS1_MOUSE VQQASLFQQPSLLQQASLFQQPSMAQQASLLQQLLLAQQPSLALQVSPAQQSSLVQQAFL
CAS1_PIG_I --QAYFYFE----------PLHQFYQLDAYPYATWYYP-P---QYIAHPLFTNIPQPT--
CAS1_RABIT --LAQLYLQP----------FEQPYQLDAYLPAPWYYT-PEVMQYVLSPLFYDLVTPS--
CAS1_RAT_I ----------------------SLAQQASLAQQALLAQQPSLAQQAALAQQASLAQQASL
CAS1_SHEEP -------------------LFRQFYQLDAYPSGAWYYL-PLGTQYTDAPSFSDIPNPI--
           * . * .

CAS1_BOVIN GSEN--SEKTT-MPLW--------------------------------------------
CAS1_HUMAN AHEN--YEKNNVMLQW--------------------------------------------
CAS1_MOUSE AQQASLAQKHHPRLSQSYYPHMEQPYRMNAYSQVQMRHPMSVVDQALAQFSVQPFPQIFQ
CAS1_PIG_I APEK--GGKTEIMPQW--------------------------------------------
CAS1_RABIT AFES--AEKTDVIPEWLKN-----------------------------------------
CAS1_RAT_I AQQASLAQKHHPRLSQVYYPNMEQPYRMNAYSQVQMRHPMSVVDQ--AQFSVQSFPQLSQ
CAS1_SHEEP GSEN--SGKIT-MPLW--------------------------------------------
           . *

CAS1_BOVIN ----------------------------------------
CAS1_HUMAN ----------------------------------------
CAS1_MOUSE YDAFPLWAYFPQDMQYLTPKAVLNTFKPIVSKDTEKTNVW
CAS1_PIG_I ----------------------------------------
CAS1_RABIT ----------------------------------------
CAS1_RAT_I YGAYPLWLYFPQDMQYLTPEAVLNTFKPIAPKDAENTNVW
CAS1_SHEEP ----------------------------------------

This alignment is better than the one above but not as good as the pileup alignment (see later).
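To get a feel for why the size of the gap opening penalty matters, the following Python sketch compares the cost of a single gap under the two parameter sets quoted above (ClustalW defaults of 10.00 and 0.05; pileup v8.1 defaults of 3.00 and 0.10). It assumes the simple affine form "opening penalty plus extension penalty times gap length"; the exact convention differs between programs, ClustalW also adjusts its penalties internally, and `gap_cost` is a hypothetical helper rather than anything taken from either program.

```python
def gap_cost(length, gap_open, gap_extend):
    """Cost of one gap of the given length under a simple affine model.

    Assumption: every gapped position pays the extension penalty and the
    opening penalty is paid once. Real programs may differ in detail.
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * length


# Parameter values quoted in the text above.
CLUSTALW_DEFAULT = (10.00, 0.05)
PILEUP_V8_DEFAULT = (3.00, 0.10)

if __name__ == "__main__":
    for length in (1, 7, 20):
        clustal = gap_cost(length, *CLUSTALW_DEFAULT)
        pileup = gap_cost(length, *PILEUP_V8_DEFAULT)
        print(f"gap of {length:2d}: ClustalW default {clustal:.2f}, "
              f"pileup 8.1 default {pileup:.2f}")
```

Under this model the opening penalty dominates the cost of short gaps, so lowering it from 10 to 3 makes the program much more willing to open new gaps, which is consistent with the second alignment above containing more, shorter gaps.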
These casein genes are obviously homologous but difficult to align, and may require a lot of hand editing with seqlab, a unix editor or a word-processing package. Note that there are many sites where there are no asterisks (*). This means there are only a few sites where all of the sequences are identical. Any site where there is not an asterisk holds information about the relationship of these proteins to each other. If you have a very "good" alignment with most sites marked with an asterisk, not much information about the phylogeny is available, i.e. there are very few "informative sites", and any derived tree may not be a true representation of the real phylogeny. It might be worth looking at the DNA sequences if the proteins are all nearly identical. On the other hand, if the relationship between the sequences is distant, you may have an alignment filled with gaps. You will not have much success drawing reliable trees with such data either, especially if you use the "toss all gaps" option (as you should!) in, say, ClustalW.

The GCG program pileup can also be used to align these casein sequences. To run pileup you must first be in GCG. The guide tree on which the order of sequence alignment is based is plotted as graphics output by pileup. To view this tree you will have to set your "graphics environment" before running pileup (see Appendix I).

To use two alignment viewing tools later on (prettyplot and prettybox), you must be using GCGv8 for this next part. Whereas GCGv9 can read and understand files in GCGv8 format, the reverse is not true. Both prettyplot and prettybox are GCGv8 tools, so convert to GCGv8 now. To do this enter:
To convert back to GCGv9 enter:
NOTE: Local instructions may vary. In the example below a printable postscript file is created. To start the program enter:

% pileup

PileUp creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment.

PileUp of what sequences ? *.sw

1 cas1_bovin.sw 214 aa
2 cas1_human.sw 185 aa
3 cas1_mouse.sw 313 aa
4 cas1_pig.sw 206 aa
5 cas1_rabit.sw 215 aa
6 cas1_rat.sw 284 aa
7 cas1_sheep.sw 206 aa

What is the gap creation penalty (* 12 *) ?
What is the gap extension penalty (* 4 *) ?

This program can display the clustering relationships graphically. Do you want to:

A) Plot to a FIGURE file called "pileup.figure"
B) Plot graphics on LASERWRITER attached to PlotPort
C) Suppress the plot

Please choose one (* A *): C

The minimum density for a one-page plot is 6.0 sequences/100 platen units.
What density do you want (* 6.0 *) ?

What should I call the output file name (* pileup.msf *) ?

Determining pairwise similarity scores...

1 x 2 0.59   1 x 3 0.52   1 x 4 0.80   1 x 5 0.65   1 x 6 0.48   1 x 7 1.35
2 x 3 0.57   2 x 4 0.67   2 x 5 0.77   2 x 6 0.60   2 x 7 0.59
3 x 4 0.54   3 x 5 0.66   3 x 6 1.22   3 x 7 0.51
4 x 5 0.72   4 x 6 0.52   4 x 7 0.74
5 x 6 0.61   5 x 7 0.61
6 x 7 0.49

Aligning...

1 ..........-..
2 ..............-..
3 ..........-. ..........-.
4 ..........-.
5 ..........-. ..........-..
6 ............-..

Total sequences: 7
Alignment length: 330
CPU time: 01.23
Output file: pileup.msf

This tree has an implied root at the top which should be ignored. This tree is quite different to the ClustalW neighbour-joining and PHYLIP PROTPARS trees. This is not really a fair comparison because this is only a UPGMA guide tree, not a tree based on the alignment.

You could look at the pileup alignment simply by entering

% more pileup.msf

But this output is in a different format to the ones previously obtained.
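The pairwise similarity scores printed by pileup above can be used to illustrate how a UPGMA-style guide tree decides the order of alignment. The Python sketch below repeatedly joins the two most similar clusters, scoring a pair of clusters by the average of the pairwise scores between their members. This is a simplified illustration under that averaging assumption, not pileup's actual code, and `mean_similarity` and `cluster_order` are hypothetical names.

```python
from itertools import combinations

# Sequence numbers as listed by pileup above.
NAMES = {1: "cas1_bovin", 2: "cas1_human", 3: "cas1_mouse", 4: "cas1_pig",
         5: "cas1_rabit", 6: "cas1_rat", 7: "cas1_sheep"}

# Pairwise similarity scores copied from the pileup output above.
SCORES = {
    (1, 2): 0.59, (1, 3): 0.52, (1, 4): 0.80, (1, 5): 0.65, (1, 6): 0.48,
    (1, 7): 1.35, (2, 3): 0.57, (2, 4): 0.67, (2, 5): 0.77, (2, 6): 0.60,
    (2, 7): 0.59, (3, 4): 0.54, (3, 5): 0.66, (3, 6): 1.22, (3, 7): 0.51,
    (4, 5): 0.72, (4, 6): 0.52, (4, 7): 0.74, (5, 6): 0.61, (5, 7): 0.61,
    (6, 7): 0.49,
}


def mean_similarity(a, b):
    """Average pairwise score between the members of clusters a and b."""
    pairs = [SCORES[(min(i, j), max(i, j))] for i in a for j in b]
    return sum(pairs) / len(pairs)


def cluster_order():
    """Return the clusters formed at each step, most similar pair first."""
    clusters = [frozenset([i]) for i in NAMES]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: mean_similarity(*pair))
        clusters = [c for c in clusters if c not in (a, b)]
        merged = a | b
        clusters.append(merged)
        merges.append(merged)
    return merges


if __name__ == "__main__":
    for step, cluster in enumerate(cluster_order(), 1):
        print(step, sorted(NAMES[i] for i in cluster))
```

With these scores the first two joins pair bovin with sheep (1.35) and mouse with rat (1.22), which fits with those sequences sitting next to each other in the pileup alignment.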
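To make the alignments more easily comparable, change the format of this alignment to clustal format. To do this, enter

% clustalw

and choose 1 to input your alignment (pileup.msf). The alignment can then be written out in clustal format using the output format options of the Multiple Alignment Menu.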
Look at the alignment (first exit from ClustalW):

% more pileup.aln

The pileup alignment in clustal format follows.

CLUSTAL W(1.60) multiple sequence alignment

cas1_mouse MKLLILTCLVAAAFAMPRLHSRNAVSSQTQQQHSSSE--E-----IFKQPKYLNLNQEFV
cas1_rat   MKLLILTCLVAAALALPRAHRRNAVSSQTQQENSSSEEQE-----IVKQPKYLSLNEEFV
cas1_bovin MKLLILTCLVAVALARPKHPIKHQ-------GLPQEVLNE-NLLRFFVAPFPEVFGKEKV
cas1_sheep MKLLILTCLVAVALARPKHPIKHQ-------GLSPEVLNE-NLLRFVVAPFPEVFRKENI
cas1_pig   MKLLIFICLAAVALARPKPPLRHQEHLQNEPDSREELFKERKFLRFPEVPLLSQFRQEII
cas1_human MRLLILTCLVAVALARPKLPLRYPERLQNPSESSE--------------PIPLESREEYM
cas1_rabit MKLLILTCLVATALARHKFHLGHLKLTQEQPESSEQEILKERKLLRFVQTVPLELREEYV
           *.*** ** * * * . * .

cas1_mouse NNMNRQRALLTE-QNDEIKVTMDAASEEQAMASAQED-SSISSSSEESEEAIPNITEQKN
cas1_rat   NNLNRQRELLTE-QDNEIKITMDSSAEEQATASAQEDSSSSSSSSEESKDAIPSATEQKN
cas1_bovin NELSKDIGS--------------ESTEDQAMEDIKQMEAESISSSEE---IVPNSVEQKH
cas1_sheep NELSKDIGS--------------ESIEDQAMEDAKQMKAGSSSSSEE---IVPNSAEQKY
cas1_pig   NELNRNHGM--------------EGHEQ---------RGSSSSSSEE---VVGNSAEQKH
cas1_human NGMNRQRNILREKQTDEIKDTRNESTQNCVVAEPEKMESSISSSSEE---MSLSKCAEQF
cas1_rabit NELNRQRELLREKENEEIKGTRNEVTEEHVLADRET-EASISSSSEE---IVPSSTKQKY
           * . . . ***** ..

cas1_mouse IANEDMLNQCTLEQLQRQFKYNQLLQKASLAKQASLFQQPSLVQQASLFQQPSLLQQASL
cas1_rat   IANKEILNRCTLEQLQRQIKYSQLLQQASLAQQASL------------------------
cas1_bovin IQK-E-------------------------------------------------------
cas1_sheep IQK-E-------------------------------------------------------
cas1_pig   VQKEE-------------------------------------------------------
cas1_human CRLNE-------------------------------------------------------
cas1_rabit VPRED-------------------------------------------------------
           .

cas1_mouse FQQPSMAQQASLLQQLLLAQQPSLALQVSPAQQSSLVQQAFLAQQASLAQKHHPRLSQSY
cas1_rat   ------AQQASLAQQALLAQQPSLAQQAALAQQASLAQQASLAQQASLAQKHHPRLSQVY
cas1_bovin -------------------------------DVPSERYLGYLEQLLRLKKYKVPQLEIVP
cas1_sheep -------------------------------DVPSERYLGYLEQLLRLKKYNVPQLEIVP
cas1_pig   -------------------------------DVPSQSYLGHLQG---LNKYKLRQLEAIH
cas1_human ----------------------------------------------------YNQLQLQA
cas1_rabit -------------------------------------------------------LAYQ-
           *

cas1_mouse YPHMEQPYRMNAYSQVQMRHPMSVVDQALAQFSVQPFPQIFQYDAFP--LWAYFPQDMQY
cas1_rat   YPNMEQPYRMNAYSQVQMRHPMSVVDQ--AQFSVQSFPQLSQYGAYP--LWLYFPQDMQY
cas1_bovin NSAEERLHSMKEGIHAQQKEPMIGVNQELAYFYPELFRQFYQLDAYPSGAWYYVPLGTQY
cas1_sheep KSAEEQLHSMKEGNPAHQKQPMIAVNQ--------LFRQFYQLDAYPSGAWYYLPLGTQY
cas1_pig   ---DQELHRTNEDKHTQQGEPMKGVNQEQAYFYFEPLHQFYQLDAYPYATWYYPP---QY
cas1_human AHAQEQIRRMNENSHVQ-----------------VPFQQLNQLAAYPYAVWYY-PQIMQY
cas1_rabit PYVQQQLLRMKERYQIQEREPMRVVNQELAQLYLQPFEQPYQLDAYLPAPWYYTPEVMQY
           . . * * *. * * * **

cas1_mouse LTPKAVLNTFKPIVSKDTEKTNVW------
cas1_rat   LTPEAVLNTFKPIAPKDAENTNVW------
cas1_bovin TDAPSFSDIPNPIGSENSEKT-TMPLW---
cas1_sheep TDAPSFSDIPNPIGSENSGKI-TMPLW---
cas1_pig   IAHPLFTNIPQPTAPEKGGKTEIMPQW---
cas1_human VPFPPFSDISNPTAHENYEKNNVMLQW---
cas1_rabit VLSPLFYDLVTPSAFESAEKTDVIPEWLKN
           *

It is interesting to note that the area of the sequences that we were looking at earlier is perfectly aligned in this alignment. You can feed the pileup alignment into ClustalW (filename: pileup.msf) and use clustal to draw a tree. This tree can then be viewed as before using DRAWTREE. This is a fairer comparison. Note that, in this case, it is essentially the same tree as obtained from the ClustalW alignment.

GCG's alignment viewing software

A GCG .msf file is particularly useless for viewing the aligned sequences. Use GCG pretty to create and display a consensus sequence and generally make the alignment easier to read.
If you have X Windows capability, be sure to try seqlab to view and manipulate multiple sequence alignment files. Otherwise you are advised to try EGCG's prettybox and prettyplot to prepare your sequences for publication.

PRETTYPLOT

Prettyplot is an EGCG alternative to the GCG program pretty. It displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it. Prettyplot displays the aligned sequences with boxes around identical sites.

PRETTYBOX

Prettybox displays multiple sequence alignments as shaded boxes in Postscript format (i.e. the output file must be printed and/or displayed on a Postscript-compatible device). Prettybox will optionally calculate a consensus sequence. The program does not create the alignment; it simply displays it.

Like the GCG program pretty, both of these will take pileup output. Note carefully the format of the input. If, for example, your pileup output file was called pileup.msf (the default filename), then when asked "what sequences ?" you must enter pileup.msf{*}. And don't forget the {*} !

Exercise:

Use prettyplot and prettybox to display the casein sequences pileup output from the "Multiple Sequence Alignment" section. To use that output file you will first need to be in that directory.

% cd mult_seqs

Then run prettyplot:

% prettyplot

PRETTYPLOT displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment, it simply displays it. PRETTYPLOT uses any sequences

PRETTYPLOT of what sequence(s) ? pileup.msf{*}

Start (* 1 *) ?
End (* 330 *) ?

cas1_mouse len: 330 wgt: 1.00
cas1_rat len: 330 wgt: 1.00
cas1_bovin len: 330 wgt: 1.00
cas1_sheep len: 330 wgt: 1.00
cas1_pig len: 330 wgt: 1.00
cas1_human len: 330 wgt: 1.00
cas1_rabit len: 330 wgt: 1.00

Find consensus to what minimum plurality (* 4.0 *) ?

PostScript instructions for a LASERWRITER are now being sent to output.ps.

To print this output enter

% lpr -Pprintername output.ps

Similarly, to run prettybox:

% prettybox

PRETTYBOX displays multiple sequence alignments as shaded boxes in Postscript format (e.g., the output file must be printed and/or displayed on a Postscript-compatible device). PrettyBox will optionally calculate a consensus sequence. The program does not create the alignment; it simply displays it. PRETTYBOX uses any sequences

PRETTYBOX of what sequence(s) ? pileup.msf{*}

pileup.msf{cas1_mouse}, len: 330
pileup.msf{cas1_rat}, len: 330
pileup.msf{cas1_bovin}, len: 330
pileup.msf{cas1_sheep}, len: 330
pileup.msf{cas1_pig}, len: 330
pileup.msf{cas1_human}, len: 330
pileup.msf{cas1_rabit}, len: 330

Start (* 1 *) ?
End (* 330 *) ?

Orient output as:

L) Landscape
P) Portrait

Please choose one (* L *) ?

Display a consensus (* No *) ? yes

Find consensus to what plurality (* 3.6 *) ?

Do numbering on:

R) Right side
T) Top side
N) None

Please choose one (* R *) ?

Printing the output file is exactly as for prettyplot. Remember that the file output.ps will keep being overwritten until you issue another % postscript command.

For the casein dataset the alignments are likely to be "unconvincing" whatever parameters you choose for creating them. Under these circumstances you might want to use GCG 9.1's seqlab program to try editing the .msf{*} file by hand. Remember to downgrade to GCG 8.1 for the EGCG programs prettybox and prettyplot. Also note that to run any of these pretty* programs the input file is of the form file.msf{*}. Here the {*} indicates that you wish to include all the sequences in the .msf file. If you leave off the {*} then the programs will not work for you. It's not a bug, but a GCG feature !

You might also like to try a terminal-screen-based tree manipulation and printing program such as njplot, which is widely available for unix and also for Macintosh. The latter is particularly handy because it prints the tree out in a de-constructable PICT format that can rapidly be brought to camera-ready print quality.

You should also compare the clustalw server which, among many other
useful, point-and-clickable features and options, enables you to see an
alignment with colour-coded residues.
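The "plurality" prompts in prettyplot and prettybox ask how many sequences must agree before a residue is shown in the consensus. The Python sketch below illustrates the idea on the first ten columns of the pileup alignment shown earlier, using plain identity counting with equal weights. This is a simplification (it is an assumption that the GCG and EGCG programs score agreement in a more elaborate way), and `consensus` is a hypothetical helper.

```python
from collections import Counter

# First ten columns of the pileup alignment shown earlier in this section.
ROWS = {
    "cas1_mouse": "MKLLILTCLV",
    "cas1_rat":   "MKLLILTCLV",
    "cas1_bovin": "MKLLILTCLV",
    "cas1_sheep": "MKLLILTCLV",
    "cas1_pig":   "MKLLIFICLA",
    "cas1_human": "MRLLILTCLV",
    "cas1_rabit": "MKLLILTCLV",
}


def consensus(rows, plurality, blank="."):
    """Return a consensus string for equal-length aligned rows.

    A column contributes its most common residue if at least `plurality`
    sequences share it (gap characters never count); otherwise `blank`.
    """
    out = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count >= plurality:
            out.append(residue)
        else:
            out.append(blank)
    return "".join(out)


if __name__ == "__main__":
    sequences = list(ROWS.values())
    print("plurality 4:", consensus(sequences, 4))
    print("plurality 7:", consensus(sequences, 7))
```

Setting the plurality equal to the number of sequences keeps only the columns where every sequence is identical, which are the same columns that ClustalW marks with an asterisk.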