ProMot: A Tool for the Fast Discovery of Functional Motifs from Aligned Protein Sequences

Akash Nag and Sunil Karforma

doi:10.17485/ijst/2017/v10i25/101419

Indian Journal of Science and Technology

Year: 2017, Volume: 10, Issue: 25, Pages: 1-12

Original Article

Akash Nag^* and Sunil Karforma

Department of Computer Science, The University of Burdwan, Rajbati Burdwan – 713104, West Bengal, India; [email protected], [email protected]

^*Author for correspondence
Akash Nag
Department of Computer Science, The University of Burdwan, Rajbati Burdwan – 713104, West Bengal, India; [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: We present an algorithm that quickly identifies conserved patterns in a set of aligned protein sequences. Method: Using contribution statistics, the proposed method derives a motif describing the given set of sequences. It is flexible enough to identify variable-length wildcard regions and to form motif elements from regions containing amino acids with similar physicochemical properties. We compare its performance against other well-known motif-discovery algorithms on three datasets: snake toxins, insulin proteins, and methylated-DNA-protein-cysteine methyltransferase (MGMT) active-site enzymes. Findings: When tested on 91 neurotoxin protein sequences from 45 species of elapid snakes, the algorithm generated a motif with 97% precision. The motifs generated for the insulin family and the MGMT family of proteins had 92% and 96.5% precision, respectively. Novelty: Our algorithm is fast and efficient, outperforms commonly used motif-generation algorithms on average in terms of accuracy, and, unlike some other algorithms, never fails to report a motif.

Keywords: Flexible Wildcard Regions, Insulin, Motifs, Motif Generation, Patterns, PROSITE, Protein Families, Snake Toxins
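
The general idea in the abstract, turning aligned sequences into a pattern with conserved positions and variable-length wildcard regions, can be illustrated with a minimal sketch. This is not the ProMot algorithm: the abstract does not specify its contribution statistics, so the sketch substitutes a plain per-column frequency threshold, and it omits grouping by physicochemical properties. The function names and the parameters `min_share` and `max_alternatives` are assumptions introduced here for illustration; only the PROSITE-style output notation (`x`, `x(n)`, `x(lo,hi)`, `[ABC]`) is standard.

```python
from collections import Counter


def column_element(column, min_share=0.9, max_alternatives=3):
    """Return a literal residue, a bracket group, or None for a wildcard column.

    Assumption: a simple frequency threshold stands in for the paper's
    contribution statistics, which the abstract does not describe.
    """
    if "-" in column:                      # gapped columns become wildcards
        return None
    counts = Counter(column)
    residue, count = counts.most_common(1)[0]
    if count / len(column) >= min_share:   # one residue dominates the column
        return residue
    if len(counts) <= max_alternatives:    # a small set of alternatives
        return "[" + "".join(sorted(counts)) + "]"
    return None                            # too variable: wildcard


def wildcard(aligned, start, end):
    """Format the wildcard run over columns [start, end) as x, x(n) or x(lo,hi)."""
    # Gaps shorten a sequence's share of the run, which makes the run variable-length.
    lengths = [len(seq[start:end].replace("-", "")) for seq in aligned]
    lo, hi = min(lengths), max(lengths)
    if lo == hi:
        return "x" if lo == 1 else f"x({lo})"
    return f"x({lo},{hi})"


def sketch_motif(aligned, min_share=0.9, max_alternatives=3):
    """Build a PROSITE-style pattern from equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    elements = [column_element(col, min_share, max_alternatives)
                for col in zip(*aligned)]
    # Trim wildcard columns at both ends so the pattern starts and ends
    # on a conserved position.
    conserved = [i for i, e in enumerate(elements) if e is not None]
    if not conserved:
        return ""
    parts, run_start = [], None
    for i in range(conserved[0], conserved[-1] + 1):
        if elements[i] is None:
            if run_start is None:
                run_start = i              # a wildcard run begins here
            continue
        if run_start is not None:          # close the pending wildcard run
            parts.append(wildcard(aligned, run_start, i))
            run_start = None
        parts.append(elements[i])
    return "-".join(parts)


# Toy alignment (made up for illustration, not data from the paper).
toy_alignment = ["CAKG-TC", "CSKGATC", "CAKG-SC", "CTRGATC"]
print(sketch_motif(toy_alignment))
```

On the toy alignment, the gapped fifth column yields a wildcard that may span zero or one residue, while the fully conserved columns become literal residues. A real tool would also need a rule for merging residues with similar properties into one element, which this sketch leaves out.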