
Creating Values from Noisy Accumulated Contents Based on Data Analysis


  • NTIS Center, Korea Institute of Science and Technology Information, Daejeon, 305-806, Republic of Korea


The contents of an information system play a major role in user services. The quality of contents depends not only on their accuracy and availability but also on the depth of the information. As the size and quality of the information items increase, the need to create meaningful analyzed data also increases. However, it is not easy to extract valuable information from unfiltered, noisy data. Using these accumulated data, we aim to add valuable information based on data analysis. In a preliminary validation of data items during a preparation step, we found that about 70% of data items could be used as a source for statistics. After applying time series analysis, correlation analysis between data items, and regression analysis, we found several informative relations between the data items. This value-added information can be added to the original data set as a source for further analysis.
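The abstract does not give the data schema, the validity rules, or the tools used, so the following is only a minimal sketch of the pipeline it outlines, not the authors' implementation. It assumes hypothetical records with `year`, `budget`, and `papers` fields and illustrative range checks (`VALID_RANGES`, `SAMPLE`, and all function names are invented for illustration). It shows a validation pass that reports the usable fraction, a yearly time series with year-over-year change, Pearson correlation between two data items, and an ordinary least-squares regression.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical validity rules: field name -> (min, max). The paper does not
# publish its schema; these fields and bounds are illustrative assumptions.
VALID_RANGES = {
    "year": (1990, 2030),
    "budget": (0.0, 1e12),
    "papers": (0, 10_000),
}


def is_valid(record):
    """True if every checked field is present, numeric, and within range."""
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if not low <= value <= high:
            return False
    return True


def validate(records):
    """Preparation step: keep usable records and report the usable fraction."""
    usable = [r for r in records if is_valid(r)]
    ratio = len(usable) / len(records) if records else 0.0
    return usable, ratio


def yearly_totals(records, field):
    """Aggregate one field by year into a sorted (year, total) time series."""
    totals = defaultdict(float)
    for r in records:
        totals[r["year"]] += r[field]
    return sorted(totals.items())


def year_over_year(series):
    """Relative change between consecutive (year, value) points, skipping zero bases."""
    return [
        (y1, (v1 - v0) / v0)
        for (_, v0), (y1, v1) in zip(series, series[1:])
        if v0
    ]


def _centered_sums(xs, ys):
    """Return (Sxy, Sxx, Syy, mean_x, mean_y) for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy, mx, my


def pearson(xs, ys):
    """Pearson correlation coefficient; undefined if either input is constant."""
    sxy, sxx, syy, _, _ = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / sqrt(sxx * syy)


def ols(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    sxy, sxx, _, mx, my = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("regression is undefined for a constant predictor")
    slope = sxy / sxx
    return slope, my - slope * mx


# Small hypothetical sample containing two noisy records.
SAMPLE = [
    {"year": 2010, "budget": 100.0, "papers": 2},
    {"year": 2011, "budget": 200.0, "papers": 4},
    {"year": 2011, "budget": None, "papers": 3},   # missing value: rejected
    {"year": 2012, "budget": 300.0, "papers": 6},
    {"year": 2012, "budget": -5.0, "papers": 1},   # out of range: rejected
]

if __name__ == "__main__":
    usable, ratio = validate(SAMPLE)
    series = yearly_totals(usable, "budget")
    budgets = [r["budget"] for r in usable]
    papers = [r["papers"] for r in usable]
    print(f"usable fraction: {ratio:.0%}")
    print("budget by year:", series)
    print("year-over-year change:", year_over_year(series))
    print("correlation(budget, papers):", pearson(budgets, papers))
    print("papers ~ budget (slope, intercept):", ols(budgets, papers))
```

On this toy sample three of five records pass validation, and budget and paper counts are perfectly correlated by construction; real accumulated data would be far noisier. The sketch sticks to the standard library so it runs anywhere. For larger data sets, pandas or Python's `statistics.correlation` and `statistics.linear_regression` (Python 3.10+) would be natural substitutes.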


Analysis, Regression, Relation, Time-Series.






This work is licensed under a Creative Commons Attribution 3.0 License.