Fast and Effective Root cause Analysis of Streaming Data using In-Memory Processing Techniques

S. Naveen Kumar and S. Vijayaragavan

doi:10.17485/ijst/2017/v10i38/114003

Indian Journal of Science and Technology

Year: 2017, Volume: 10, Issue: 38, Pages: 1-9

Original Article

S. Naveen Kumar¹ and S. Vijayaragavan²

¹Department of Computer Science (Category-B), Bharathiar University, Coimbatore – 641046, Tamil Nadu, India
²Department of Computer Science and Engineering, Paavai Engineering College, Namakkal – 637018, Tamil Nadu, India

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: The growing volume of generated data calls for a highly scalable and powerful processing framework for root-cause analysis. The objective is to identify such a framework by analyzing existing processing architectures.
Methods/Analysis: To identify the best processing architecture for root-cause analysis, the existing architectures are divided into sequential processing using Python, CPU-based parallelization, Hadoop MapReduce, and Spark-based parallel in-memory processing. Pre-processing the input text was identified as the most processing-intensive component of any text-based processing framework. Hence, this module of the proposed root-cause analysis framework is implemented and used for the analysis.
Findings: Performance is measured in terms of scalability, processing time, applicability, and usability, considering the streaming nature of the data. The pre-processing module of the proposed framework is implemented in all of the considered processing architectures, and the throttle points of each technique are documented. The scalability provided by sequential systems was found to be insufficient for handling voluminous data. Among the parallel approaches, namely CPU parallel, Hadoop MapReduce, and Spark, the CPU parallel approach performs effectively up to a certain level, after which the architecture fails. The Hadoop- and Spark-based techniques exhibit high scalability due to the underlying HDFS structure. However, their pros and cons in terms of the other metrics indicate that the in-memory technique used by Spark works best in terms of both scalability and time complexity. Given the dynamic nature of the data under consideration, the Spark architecture was identified as the best fit for a root-cause analysis architecture.
Novelty/Improvement: A novel root-cause analysis framework is proposed that incorporates pre-processing modules, aspect extraction, and fuzzy-based sentiment identification of aspects in place of conventional polarity analysis.

Keywords: Aspect Extraction, In-Memory Processing, Parallelization, Root Cause Analysis, Sentiment Analysis
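
As an illustration of the comparison described in the abstract, the following is a minimal sketch of a text pre-processing step run sequentially and with CPU-based parallelization in Python. It is not the authors' implementation: the abstract does not specify the pre-processing operations, so lowercasing, tokenization, and stop-word removal are assumed here, and the names `preprocess`, `preprocess_sequential`, `preprocess_parallel`, and the small `STOPWORDS` set are hypothetical.

```python
import re
from multiprocessing import Pool

# Assumption: a tiny illustrative stop-word list, not the one used in the paper.
STOPWORDS = frozenset(
    {"a", "an", "the", "is", "was", "and", "or", "of", "to", "in", "it"}
)
# Assumption: tokens are runs of lowercase letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def preprocess(text):
    """Lowercase one document, tokenize it, and drop stop words."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def preprocess_sequential(docs):
    """Sequential baseline: process documents one after another."""
    return [preprocess(d) for d in docs]


def preprocess_parallel(docs, workers=2, chunksize=64):
    """CPU-parallel variant: spread documents across worker processes.

    Results come back in input order. All documents and results are held
    in the memory of a single machine, which bounds how far this scales.
    """
    with Pool(processes=workers) as pool:
        return pool.map(preprocess, docs, chunksize=chunksize)
```

Both functions return the same token lists for the same input, so only processing time and the data volume a single machine can hold differ. When run as a script, `preprocess_parallel` should be called from under an `if __name__ == "__main__":` guard so that worker processes can be started safely on every platform.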