Indian Journal of Science and Technology
Year: 2017, Volume: 10, Issue: 25, Pages: 1-12
Akash Nag* and Sunil Karforma
Department of Computer Science, The University of Burdwan, Rajbati Burdwan – 713104, West Bengal, India; [email protected], [email protected]
*Author for correspondence
Objectives: We present an algorithm to quickly identify conserved patterns in a set of aligned protein sequences. Method: Using contribution statistics, the proposed method identifies a motif describing the given set of sequences. It is flexible enough to identify variable-length wildcard regions, and it can also derive motif elements from regions containing amino acids with similar physicochemical properties. We compare its performance against other well-known motif-discovery algorithms on three datasets: snake toxins, insulin proteins, and methylated-DNA-protein-cysteine methyltransferase (MGMT) active-site enzymes. Findings: When tested with 91 neurotoxin protein sequences from 45 species of Elapid snakes, the algorithm generated a motif with 97% precision. The motifs generated by our algorithm had 92% precision on the insulin family and 96.5% on the MGMT family of proteins. Novelty: Our algorithm is fast and efficient, on average outperforms the commonly used motif-generation algorithms in terms of accuracy, and, unlike some other algorithms, always reports a motif.
Keywords: Flexible Wildcard Regions, Insulin, Motifs, Motif Generation, Patterns, PROSITE, Protein Families, Snake Toxins
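To make the notion of a motif with variable-length wildcard regions concrete, the following minimal Python sketch translates a PROSITE-style pattern into a regular expression and measures the fraction of sequences it matches. It is not the algorithm proposed in the paper: the function names `prosite_to_regex` and `match_rate`, the example pattern, and the toy sequences are all hypothetical, and the match rate computed here is only an illustrative stand-in that may differ from the precision measure used by the authors.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex (illustrative helper).

    Supported syntax: 'x' (any residue), [ABC] (alternatives), {ABC} (excluded
    residues), (n) and (n,m) repetition, and the '<' / '>' terminal anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."  # wildcard position
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low and high:
            core += "{" + low + "," + high + "}"  # variable-length region
        elif low:
            core += "{" + low + "}"  # fixed-length repeat
        parts.append(prefix + core + suffix)
    return "".join(parts)


def match_rate(pattern: str, sequences: list[str]) -> float:
    """Fraction of sequences containing at least one match of the pattern."""
    if not sequences:
        return 0.0
    regex = re.compile(prosite_to_regex(pattern))
    hits = sum(1 for seq in sequences if regex.search(seq))
    return hits / len(sequences)


# Made-up pattern and toy sequences, used only to exercise the helpers.
example_pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]"
toy_sequences = ["ACAACAAAL", "ACACAAAL"]
print(prosite_to_regex(example_pattern))
print(match_rate(example_pattern, toy_sequences))
```

Here the element `x(2,4)` plays the role of a variable-length wildcard region: it accepts any two to four residues between the two conserved cysteines, so sequences whose spacing falls outside that range are not matched.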