ProMot: A Tool for the Fast Discovery of Functional Motifs from Aligned Protein Sequences
In this paper, we present an algorithm to quickly identify conserved patterns in a set of aligned protein sequences. Using contribution statistics, the proposed method derives a motif that describes the given set of sequences. It is flexible enough to identify variable-length wildcard regions and to build motif elements from regions whose amino acids share similar physicochemical properties. We compare its performance against other well-known motif-discovery algorithms on three datasets: snake toxins, insulin proteins, and methylated-DNA-protein-cysteine methyltransferase (MGMT) active-site enzymes. When tested on 91 neurotoxin protein sequences from 45 species of elapid snakes, the algorithm generated a motif with 97% precision. The motifs it generated achieved 92% precision on the insulin family and 96.5% on the MGMT family of proteins. The algorithm is fast and efficient, on average outperforms commonly used motif-generation algorithms in accuracy, and, unlike some other algorithms, always reports a motif.
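The idea of deriving a PROSITE-style pattern from an alignment can be illustrated with a minimal Python sketch. It is not the published method: a plain per-column conservation threshold stands in for the contribution statistics, the residue groups in `GROUPS` are an assumed physicochemical grouping, and all function names are hypothetical. Gapped or unconserved columns are merged into variable-length wildcards, and precision is taken here to mean true positives divided by all reported matches, which is an assumption about how the figures above were computed.

```python
import re
from collections import Counter

# Assumed physicochemical groupings (illustrative only, not from the paper).
GROUPS = ["ILVM", "FWY", "KRH", "DE", "ST", "NQ"]


def column_element(column, threshold):
    """Return a residue, a residue class like '[KR]', or None for a wildcard column."""
    if "-" in column:
        return None  # any gap turns the column into part of a wildcard region
    counts = Counter(column)
    residue, n = counts.most_common(1)[0]
    if n / len(column) >= threshold:
        return residue
    for group in GROUPS:
        if sum(counts[a] for a in group) / len(column) >= threshold:
            return "[" + "".join(a for a in group if counts[a]) + "]"
    return None


def build_motif(alignment, threshold=1.0):
    """Turn aligned sequences into a list of elements; (lo, hi) tuples are wildcards."""
    elements, run_start = [], None
    ncols = len(alignment[0])
    for i in range(ncols + 1):
        # The extra iteration (i == ncols) only closes a trailing wildcard run.
        elem = column_element([s[i] for s in alignment], threshold) if i < ncols else ""
        if elem is None:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            lengths = [sum(c != "-" for c in s[run_start:i]) for s in alignment]
            # Leading and trailing wildcard runs are dropped.
            if elements and i < ncols and max(lengths) > 0:
                elements.append((min(lengths), max(lengths)))
            run_start = None
        if i < ncols:
            elements.append(elem)
    return elements


def to_prosite(elements):
    """Render elements in PROSITE pattern notation, e.g. C-x(0,1)-G."""
    parts = []
    for e in elements:
        if isinstance(e, tuple):
            lo, hi = e
            parts.append("x" if lo == hi == 1 else f"x({lo})" if lo == hi else f"x({lo},{hi})")
        else:
            parts.append(e)
    return "-".join(parts)


def to_regex(elements):
    """Render the same elements as a regular expression for scanning sequences."""
    return "".join(f".{{{e[0]},{e[1]}}}" if isinstance(e, tuple) else e for e in elements)


def precision(regex, positives, negatives):
    """Fraction of matched sequences that belong to the family: TP / (TP + FP)."""
    pattern = re.compile(regex)
    tp = sum(bool(pattern.search(s)) for s in positives)
    fp = sum(bool(pattern.search(s)) for s in negatives)
    return tp / (tp + fp) if tp + fp else 0.0


# Toy alignment (made-up sequences, for illustration only).
alignment = ["CKL-GDC", "CRIAGEC", "CKV-GDC"]
motif = build_motif(alignment)
family = [s.replace("-", "") for s in alignment]
others = ["AKLGDC", "MCRVAGDCQ"]
print(to_prosite(motif))
print(precision(to_regex(motif), family, others))
```

With `threshold=1.0` every family member is guaranteed to match the resulting pattern; lowering the threshold trades that guarantee for a more specific motif.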
motif generation; PROSITE; protein families; patterns; motifs; flexible wildcard regions; snake toxins; insulin
This work is licensed under a Creative Commons Attribution 3.0 License.