Indian Journal of Science and Technology
Year: 2013, Volume: 6, Issue: 8, Pages: 1-12
G. M. Karthik and Ramachandra V. Pujeri
Searching for periodicity in the longest common subsequences of multiple strings is an interesting data mining problem with a number of applications, and periodicity detection is a common step in longest common subsequence mining algorithms. This work introduces a new parallel algorithm for finding periodicity in multiple strings. Some existing algorithms scale poorly, fail to find all of the longest patterns, or do not detect symbol, partial, and full periodicity. We designed the algorithm around an FP-tree to find periodicity in the most common longest substring across multiple sources. We introduce a parallel version of the Constraint Based Periodic Pattern Mining (CBPPM) algorithm, which takes O(kN) time to find periodicity and O((N × L × h)/p) time to find the MLCS pattern. We tested the parallel algorithm on a coarse-grained multi-computer (BSP/CGM) model with p < m processors; it takes O((N × L)/p) space per processor and O(log p) communication rounds. We derive a practical implementation that works well for input sequences of arbitrary length. The algorithm is noise resilient, and we show its performance in the presence of replacement, insertion, and deletion noise, as well as mixtures of these. Experiments with synthetic and real data reveal near-linear speedup with scalable performance. A comparative study shows the algorithm's applicability and effectiveness and indicates that it is generally more noise resilient.
Keywords: Frequent Pattern (FP) Tree, Multiple Longest Common Subsequence (MLCS), Periodicity Mining, Noise Resilient, Parallel Processing, Data Mining.
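To make the notion of symbol periodicity in the abstract concrete, the following minimal Python sketch measures how consistently one symbol recurs at a fixed period in a single string. It is an illustration only and is not the CBPPM algorithm: the function name symbol_periodicity, its parameters, and the idea of reporting the result as a fraction of matching positions are assumptions introduced here, and the sketch uses neither an FP-tree nor parallelism.

```python
def symbol_periodicity(series, symbol, period, start):
    """Fraction of positions start, start+period, ... that hold `symbol`.

    Hypothetical helper for illustration; not taken from the paper.
    A value of 1.0 means the symbol recurs perfectly at this period,
    while lower values indicate noise (e.g. replaced symbols).
    """
    if period <= 0:
        raise ValueError("period must be a positive integer")
    # Positions where the symbol is expected if it were perfectly periodic.
    positions = range(start, len(series), period)
    if len(positions) == 0:
        return 0.0
    hits = sum(1 for i in positions if series[i] == symbol)
    return hits / len(positions)


# One replacement ('c' -> 'd' at index 8) in an otherwise periodic string.
series = "abcabcabdabc"
conf_a = symbol_periodicity(series, "a", period=3, start=0)
conf_c = symbol_periodicity(series, "c", period=3, start=2)
```

In this example the symbol "a" matches at every expected position, while "c" misses one of its four expected positions because of the replacement, so its score drops below 1. A noise-resilient method would typically accept such a pattern when the score stays above a chosen threshold.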