Mining Issues in Traditional Indian Web Documents

Kolla Bhanu Prakash

doi:10.17485/ijst/2015/v8i32/77056

Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: 32, Pages: 1-11

Original Article

Kolla Bhanu Prakash^1,2*

¹Faculty of Computer Science Engineering, Sathyabama University, Chennai - 600119, Tamil Nadu, India
²Faculty of Computing, Chirala Engineering College, Chirala - 523157, Andhra Pradesh, India

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Recent developments in information technology are concentrated in areas where information, content creation and knowledge integration are the driving forces. Having begun by adapting to the complexities of internet and mobile communications, these developments are becoming significant sources of knowledge and creators of expertise, and this is where countries like India and China play a major role. Indian tradition is considered to be more than 5000 years old, and evidence of it survives today in written, oral and physical forms, such as the Mahabharata as text or the Mohenjo-Daro and Harappa sites as structures. This study presents the issues in extracting information from traditional Indian web documents and a method of evaluating their content, given that the language, script and form of these documents vary significantly. The method works at the pixel level to keep the approach generic. The study presents results for some basic issues at the text level and discusses how the approach can be extended to the word and document levels.
Keywords: Attribute Generation, Data Mining, Data Preparation, Information Extraction, Tradition, Voxel