Regular expression-based motifs, which are at the origin of the concept of
functional signatures for proteins, are progressively being supplanted by more sophisticated probabilistic methods, such as position weight matrices, hidden Markov models and neural networks.
We present a new strategy to turn multiple alignments into specific regular expressions using an interactive Java-based software
system called REAL (Regular Expression Analysis and Location). Our approach is based on the computation and parallel display of
various profiles derived from both the average value and the variability of numerous amino acid properties. Relevant positions can also
be selected on the basis of their information content, relative to a given property. The Information Content provides a quantitative
measure of the contrast between residue variability and property invariance. In the automatic mode, a simple probabilistic framework
is used to extract the most informative positions. Once the positions have been selected on the basis of one (or several) of the many available
criteria, a regular expression is automatically generated from the multiple alignment using a simple quantitative rule. Tentative
signatures are tested with the companion database scanning program LOOKFOR. Possible applications of the REAL software will be
presented through two examples.
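
As a rough illustration of the idea described above, the following sketch scores each alignment column by the contrast between residue variability and property invariance, then builds a regular expression from a set of selected positions. It is not the REAL implementation: the property grouping (PROPERTY_CLASS), the entropy-difference score (contrast), the rule "list every residue observed at a selected position, use a wildcard elsewhere" (alignment_to_regex) and the toy alignment are all assumptions made for this example, and the alignment is assumed to be gap-free.

```python
import math
import re
from collections import Counter

# Hypothetical coarse property classes (an assumption, not taken from REAL).
PROPERTY_CLASS = {
    **dict.fromkeys("AVLIMFWC", "hydrophobic"),
    **dict.fromkeys("STNQYGP", "polar"),
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
}


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def contrast(column):
    """Illustrative score: residue entropy minus property-class entropy.

    High when residues vary but the property class stays the same.
    """
    return entropy(column) - entropy([PROPERTY_CLASS[r] for r in column])


def alignment_to_regex(alignment, selected):
    """Build a regex: observed residues at selected columns, '.' elsewhere."""
    parts = []
    for i, column in enumerate(zip(*alignment)):
        if i in selected:
            residues = "".join(sorted(set(column)))
            parts.append(residues if len(residues) == 1 else f"[{residues}]")
        else:
            parts.append(".")
    return "".join(parts)


# Toy gap-free alignment (made up for illustration only).
alignment = ["GKSTL", "GRSAI", "GKTCV", "GRSDL"]

# Per-column contrast scores, as a user might inspect them in a profile view.
scores = [contrast(column) for column in zip(*alignment)]

# Positions picked by hand here, standing in for an interactive selection.
selected = {0, 1, 4}
pattern = alignment_to_regex(alignment, selected)

print([round(s, 3) for s in scores])
print(pattern)
```

In this toy case the second and fifth columns vary in residue but not in property class, so they score highest under the assumed measure, and the resulting pattern matches every sequence of the input alignment. A real signature would then need to be checked against a sequence database for specificity, which is the role the text assigns to LOOKFOR.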