P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: 30, Pages: 1-6

Original Article

Automatic Extraction of Objects and their Attributes from Semi-Structured Web Tables for E-commerce Tasks

Abstract

Most business Web documents present key decision-making information in the form of text tables. Fast retrieval and analysis of such tables is of great interest to business and, at the same time, a major challenge for researchers. The difficulty is that text tables in Web documents are rarely self-describing, i.e., they carry no data schema. In this paper we consider a special case of the problem, restricted to the domain of e-commerce. The main objects of e-commerce are goods, works, services and other objects, which are represented in text tables as sets of characteristics (attributes). In effect, these text tables compactly map domain objects into sets of “attribute-value” pairs. Thus, extracting meaningful information from the tables can be interpreted as extracting sets of “attribute-value” pairs. These attributes must be interpreted correctly using a domain knowledge base, since the search engine has no information about either the nature of the attributes or their internal relations. As the knowledge base we use a semi-extensible e-commerce ontology based on the economic classifiers of the Kazakhstan unified system of classification and coding of technical, economic and social information.
Keywords: E-Commerce, Information Retrieval, Ontology-Based Text Mining, Table Understanding
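
The idea of reading a text table as a set of “attribute-value” pairs and then interpreting the attribute names against a knowledge base can be illustrated with a minimal sketch. It is not the method of the paper: the `ONTOLOGY` dictionary is a hypothetical toy stand-in for the e-commerce ontology, the function names `parse_rows` and `normalize` are invented for illustration, and the table is assumed to have one attribute and one value per line, separated by a colon, a tab, a pipe, or a run of two or more spaces.

```python
import re

# Hypothetical stand-in for a domain ontology (an assumption, not from the paper):
# canonical attribute name -> set of known surface forms.
ONTOLOGY = {
    "brand": {"brand", "manufacturer", "make"},
    "weight": {"weight", "net weight", "mass"},
    "color": {"color", "colour"},
}


def parse_rows(table_text):
    """Split each non-empty line of a two-column text table into (attribute, value)."""
    pairs = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Column separator: a colon, pipe or tab (with optional surrounding
        # whitespace), or a run of two or more whitespace characters.
        parts = re.split(r"\s*[:|\t]\s*|\s{2,}", line, maxsplit=1)
        if len(parts) == 2 and parts[0] and parts[1]:
            pairs.append((parts[0].strip(), parts[1].strip()))
    return pairs


def normalize(pairs, ontology=ONTOLOGY):
    """Map raw attribute names to canonical ones; keep unmatched names separately."""
    resolved, unknown = {}, {}
    for attr, value in pairs:
        key = attr.lower()
        canonical = next((c for c, forms in ontology.items() if key in forms), None)
        if canonical is None:
            unknown[attr] = value
        else:
            resolved[canonical] = value
    return resolved, unknown


table = """
Manufacturer: Acme
Net weight   1.5 kg
Colour | red
Warranty: 2 years
"""

resolved, unknown = normalize(parse_rows(table))
print(resolved)
print(unknown)
```

Attributes the toy dictionary does not know, such as `Warranty` here, are returned separately rather than discarded, which is one simple way to leave room for extending the knowledge base later.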
