Indian Journal of Science and Technology
Year: 2017, Volume: 10, Issue: 38, Pages: 1-9
S. Naveen Kumar1 and S. Vijayaragavan2
1Department of Computer Science (Category-B), Bharathiar University, Coimbatore – 641046, Tamil Nadu, India; [email protected] 2Department of Computer Science and Engineering, Paavai Engineering College, Namakkal – 637018, Tamil Nadu, India; [email protected]
Objectives: Increasing data generation demands a highly scalable and powerful processing framework for root-cause analysis. The objective is to identify such a framework by analyzing existing processing architectures.
Methods/Analysis: To identify the best processing architecture for root-cause analysis, the existing architectures are divided into four categories: sequential processing using Python, CPU-based parallelization, Hadoop MapReduce, and Spark-based parallel in-memory processing. Pre-processing the input text was identified as the most processing-intensive component of any text-based processing framework. Hence, this module of the proposed root-cause analysis framework is implemented and used for the analysis.
Findings: Performance is measured in terms of scalability, processing time, applicability, and usability, considering the streaming nature of the data. The pre-processing module of the proposed framework is implemented in each of the considered processing architectures, and the throttle points of each technique are documented. The scalability provided by sequential systems was found to be insufficient for handling voluminous data. Among the parallel approaches, namely CPU parallel, Hadoop MapReduce, and Spark, the CPU parallel approach performs effectively up to a certain level, after which the architecture fails. The Hadoop- and Spark-based techniques exhibit high scalability, owing to the underlying HDFS structure. However, their pros and cons in terms of the other metrics indicate that the in-memory technique used by Spark works best in terms of both scalability and time complexity. Given the dynamic nature of the data under consideration, Spark was identified as the best architecture for root-cause analysis.
Novelty/Improvement: A novel root-cause analysis framework is proposed that incorporates pre-processing modules, aspect extraction, and fuzzy-based sentiment identification of aspects in place of conventional polarity analysis.
Keywords: Aspect Extraction, In-Memory Processing, Parallelization, Root Cause Analysis, Sentiment Analysis
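The contrast between the sequential and CPU-parallel pre-processing approaches compared in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the pre-processing steps (lowercasing, tokenization, stop-word removal), the stop-word list, and the function names `preprocess`, `preprocess_sequential`, and `preprocess_parallel` are assumptions chosen for illustration, and the worker count and chunk size are arbitrary.

```python
import re
from concurrent.futures import ProcessPoolExecutor

# Hypothetical stop-word list; the paper does not specify one.
STOP_WORDS = frozenset({"a", "an", "and", "is", "of", "the", "to"})
TOKEN_RE = re.compile(r"[a-z0-9]+")


def preprocess(text):
    """Lowercase, tokenize on alphanumeric runs, and drop stop words."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOP_WORDS]


def preprocess_sequential(docs):
    """Sequential baseline: one document at a time on a single core."""
    return [preprocess(d) for d in docs]


def preprocess_parallel(docs, workers=4, chunksize=64):
    """CPU-parallel variant: documents are distributed across processes.

    Results come back in input order, so the output matches the
    sequential baseline element for element.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess, docs, chunksize=chunksize))


if __name__ == "__main__":
    docs = ["The battery life is great", "Screen of the phone cracked"]
    # Both strategies must produce identical token lists.
    assert preprocess_parallel(docs, workers=2) == preprocess_sequential(docs)
    print(preprocess_sequential(docs))
```

In this sketch the parallel variant is bounded by the cores and memory of a single machine, which is consistent with the abstract's observation that the CPU-parallel approach works only up to a certain data volume. Distributed frameworks such as Hadoop MapReduce and Spark apply the same per-document function across a cluster instead.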