Pro-Coffee: A Multiple Sequence Alignment Tool for Promoter Regions

What is Pro-Coffee?
Pro-Coffee is a multiple sequence alignment method specifically designed
for promoter regions. It is part of the T-Coffee distribution. Pro-Coffee
takes nearest-neighbour nucleotide correlations into account when aligning
DNA sequences. To do so, it first translates the sequences into a
di-nucleotide alphabet and then aligns them using a specifically designed
di-nucleotide substitution matrix. This matrix was constructed from
alignments of binding sites taken from the Transfac database. A benchmark
on multi-species ChIP-seq data shows that Pro-Coffee aligns validated
binding sites better than off-the-shelf methods do.
An article about the method and its evaluation is in preparation.
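To make the di-nucleotide translation concrete, here is a minimal Python sketch. It assumes that overlapping di-nucleotides are used (so a sequence of n bases becomes n-1 symbols) and maps the 16 di-nucleotides to a hypothetical one-letter code from A to P; the encoding Pro-Coffee uses internally is not described on this page and may differ. The point is only that each symbol carries a base together with its right-hand neighbour, which is what allows a di-nucleotide substitution matrix to score nearest-neighbour context.

```python
from itertools import product

# Hypothetical encoding (an assumption, not Pro-Coffee's documented alphabet):
# the 16 di-nucleotides AA, AC, ..., TT are mapped to the letters A..P.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]
DINUC_TO_SYMBOL = {d: chr(ord("A") + i) for i, d in enumerate(DINUCS)}


def to_dinucleotides(seq: str) -> list[str]:
    """Split a DNA sequence into overlapping di-nucleotides (n bases -> n-1 pairs)."""
    seq = seq.upper()
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def encode(seq: str) -> str:
    """Translate DNA into one symbol per di-nucleotide.

    Di-nucleotides containing a non-ACGT character (e.g. N) become 'X'.
    """
    return "".join(DINUC_TO_SYMBOL.get(d, "X") for d in to_dinucleotides(seq))


if __name__ == "__main__":
    print(to_dinucleotides("ACGTA"))
    print(encode("ACGTA"))
```

Because neighbouring symbols overlap by one base, a single substitution in the DNA changes two adjacent di-nucleotide symbols, so the substitution matrix sees the mutation in both of its local contexts.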
- Pro-Coffee is a special mode of T-Coffee. Download the latest T-Coffee version for Pro-Coffee here.
- Download benchmark data sets used in the paper "Use of ChIP-Seq data for the design of a multiple promoter alignment method" here.

Given a sequence file regions.fa, Pro-Coffee outputs three different kinds of files:
- regions.dnd: the guide tree used to assemble the progressive alignment;
- regions.aln: the final alignment in ClustalW format;
- regions.html: the final alignment, colored according to a consistency index ranging from red (very consistent) to blue (poorly consistent).

The example below shows part of an alignment of the 2000 bp upstream region of the human gene c18orf19 with various orthologous
regions. Highlighted in yellow are ChIP-seq regions of the CEBPA transcription factor. Predicted binding sites falling
within these regions are shown in green; predicted sites outside of them are shown in red. Pro-Coffee aligns
the proven binding sites, whereas default T-Coffee fails to align them.
(ChIP-seq raw data taken from: Dominic Schmidt, et al. Science 328, 1036 (2010))

The full documentation is on the T-Coffee Homepage, but the following shortcuts may be useful.

To run Pro-Coffee, type:
t_coffee regions.fa -mode=procoffee
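If you drive Pro-Coffee from a Python pipeline, a thin wrapper like the one below may help. It is a sketch, not part of T-Coffee: the function names are hypothetical, and it assumes that t_coffee is on your PATH and that it writes regions.dnd, regions.aln and regions.html to the current working directory, named after the input file.

```python
import shutil
import subprocess
from pathlib import Path


def procoffee_command(fasta: str) -> list[str]:
    """Command line for: t_coffee <fasta> -mode=procoffee"""
    return ["t_coffee", fasta, "-mode=procoffee"]


def expected_outputs(fasta: str) -> dict[str, Path]:
    """Output paths, ASSUMED to be <input stem>.{dnd,aln,html} in the working directory."""
    stem = Path(fasta).stem  # e.g. "regions" for "regions.fa"
    return {
        "tree": Path(f"{stem}.dnd"),       # guide tree
        "alignment": Path(f"{stem}.aln"),  # ClustalW-format alignment
        "html": Path(f"{stem}.html"),      # consistency-colored alignment
    }


def run_procoffee(fasta: str) -> dict[str, Path]:
    """Hypothetical wrapper: run Pro-Coffee and return the paths of its output files."""
    if shutil.which("t_coffee") is None:
        raise FileNotFoundError("t_coffee not found on PATH")
    # check=True raises CalledProcessError if t_coffee exits with a non-zero status.
    subprocess.run(procoffee_command(fasta), check=True)
    outputs = expected_outputs(fasta)
    missing = [str(p) for p in outputs.values() if not p.exists()]
    if missing:
        raise FileNotFoundError(f"missing output files: {missing}")
    return outputs
```

For example, run_procoffee("regions.fa")["alignment"] would return the path of the ClustalW alignment, and an exception is raised if the run fails or any of the three files is missing.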
To modify your gap costs (default gap opening -60, gap extension -1), type:
t_coffee regions.fa -method promo_pair@EP@GOP@-60@GEP@-1
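The gap-cost option packs both values into a single @-separated argument. The hypothetical helper below only builds that string, and the matching command line, from two integers, using the defaults quoted above; it does not check whether T-Coffee accepts a given pair of values.

```python
import shlex


def promo_pair_method(gop: int = -60, gep: int = -1) -> str:
    """Encode gap opening (GOP) and gap extension (GEP) costs in the
    @-separated method string shown on this page (hypothetical helper)."""
    return f"promo_pair@EP@GOP@{gop}@GEP@{gep}"


def gap_cost_command(fasta: str, gop: int = -60, gep: int = -1) -> list[str]:
    """Command line for: t_coffee <fasta> -method promo_pair@EP@GOP@<gop>@GEP@<gep>"""
    return ["t_coffee", fasta, "-method", promo_pair_method(gop, gep)]


if __name__ == "__main__":
    # Illustrative values only: a cheaper gap opening than the default -60.
    print(shlex.join(gap_cost_command("regions.fa", gop=-40, gep=-1)))
```

The returned list can be passed to subprocess.run(..., check=True) to execute it; only the two numbers following GOP and GEP vary, while the promo_pair@EP prefix is copied unchanged from the command above.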
Our projects rely on your feedback. Please send me an
e-mail if you would like to make a request or a comment, or to report a bug!
*******************************************
Dr. Cedric Notredame, PhD.
Group Leader
Comparative Bioinformatics Group
Bioinformatics and Genomics Programme
Center for Genomic Regulation (CRG)
Dr Aiguader, 88
08003 Barcelona
Spain
Email: cedric.notredame@gmail.com
HOME : http://www.tcoffee.org/homepage.html
GROUP: CRG
Phone: +34 933 160 271
*******************************************