Indian Journal of Science and Technology
Year: 2017, Volume: 10, Issue: 30, Pages: 1-8
Sachin Sharma1*, Sandip Kumar Goyal1 and Kamal Kumar2
1Department of Computer Engineering, M. M. Engineering College, M. M. University, Ambala – 133207, Haryana, India; [email protected] 2School of Computer Science and Engineering (SoCSE), UPES, Dehradun – 248007, Uttarakhand, India; [email protected]
*Author for correspondence:
Department of Computer Engineering, M. M. Engineering College, M. M. University, Ambala – 133207, Haryana, India; [email protected]
Objectives: To apply the ETLR (Extraction, Transformation, Loading and Retrieval) paradigm to build an efficient and cost-effective data warehouse (DWH) for the telecom industry. The focus is on optimizing every layer of the telecom DWH.
Methods: The approach uses the existing telecom infrastructure, i.e. MSC files, and applies segregation logic at the source layer, i.e. the mediation layer. Files are pushed to predefined, separate destinations, and a technology mix consisting mainly of built-in database utilities and custom scripts is applied to avoid the use of commercial ETL tools while improving performance at every layer. The technology mix includes source optimization, external table implementation and switching, the DB copy utility and retrieval-level optimization. Data loading statistics are used to compare the results.
Findings: The outcome is a telecom data warehouse in which the ETLR paradigm improved data processing manyfold. The aim is to optimize every layer involved in building the data warehouse. Source-level optimization moves data cleaning to the source itself, shifting load onto the source system and reducing the load on the DWH servers. Batches of files are supplied to external tables, so OS storage is used for tabular data. Transforming the data using views and pushing it into partitioned tables with the DB copy utility improved overall performance. Query optimization techniques and DB-level tuning make the data available in minimum time. Data availability in a standard DWH is sysdate-1 (i.e. the previous day); in our case it is reduced to approximately 4 hours, with indexes intact. Scalability is also a strong point of the ETLR paradigm. Telecom operators now have a better system for building their data warehouse without paying heavy licence fees for commercial tools.
Application/Improvements: The paradigm applies to almost any sector where data processing is a major challenge and cost is a major factor. This paper presents its application in the telecom sector. The same approach can be implemented in banking, insurance, social media, etc., and it can also be deployed on the cloud where hardware is a constraint.
Keywords: Big-Data, ETL, Mobile Data, Retrieval, Scripts, Telecom Sector
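As an illustration of the source-level segregation mentioned in the Methods, the following is a minimal sketch only, not the authors' implementation. It assumes that each file name begins with a record-type prefix followed by an underscore. The prefixes, the folder names and the `segregate` function are hypothetical and do not come from the paper.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical mapping from a file-name prefix to a destination folder.
# The paper does not specify its naming scheme; these values are assumptions.
ROUTES = {
    "VOICE": "voice",
    "SMS": "sms",
    "DATA": "data",
}


def segregate(source_dir: Path, dest_root: Path) -> dict:
    """Move each file into the folder matching its prefix.

    Returns the number of files moved into each destination folder.
    Files with an unknown prefix go to an 'unclassified' folder.
    """
    counts = {}
    for path in sorted(source_dir.iterdir()):
        if not path.is_file():
            continue
        # Text before the first underscore is treated as the record type.
        prefix = path.name.split("_", 1)[0].upper()
        folder = ROUTES.get(prefix, "unclassified")
        target_dir = dest_root / folder
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        counts[folder] = counts.get(folder, 0) + 1
    return counts


if __name__ == "__main__":
    # Demonstration on throwaway files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "mediation_out"
        dst = Path(tmp) / "landing"
        src.mkdir()
        for name in ("VOICE_001.cdr", "SMS_001.cdr", "DATA_001.cdr", "misc.txt"):
            (src / name).write_text("sample\n")
        print(segregate(src, dst))
```

Routing files into separate destinations before loading means each destination folder could back its own external table, so no sorting or filtering would be needed on the DWH server itself.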