Automatic Extraction of Objects and their Attributes from Semi-Structured Web Tables for E-commerce Tasks

Yerzhan Baiburin and Aliya Nugumanova

doi:10.17485/ijst/2015/v8i30/88013

Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: 30, Pages: 1-6

Original Article

Yerzhan Baiburin and Aliya Nugumanova^*

Department of Information Technologies, D. Serikbayev East Kazakhstan State Technical University, Ust Kamenogorsk, Kazakhstan

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Most business Web documents present the key information for decision-making in the form of text tables. Fast retrieval and analysis of such tables is of great interest to business and, at the same time, a major challenge for researchers. The difficulty is that text tables in Web documents are rarely self-describing, i.e. they usually carry no data schema. In this paper we consider a special case of the problem, restricted to the e-commerce domain. The main objects of e-commerce are goods, works, services and other items, which are represented in text tables as sets of characteristics (attributes). In effect, these text tables compactly map domain objects into sets of “attribute-value” pairs, so extracting meaningful information from the tables can be treated as extracting sets of “attribute-value” pairs. These attributes must be interpreted correctly using a domain knowledge base, since the search engine has no information about either the nature of the attributes or their internal relations. As the knowledge base we use a semi-extensible e-commerce ontology based on the economic classifiers of the Kazakhstan unified system of classification and coding of technical, economic and social information.
Keywords: E-Commerce, Information Retrieval, Ontology-Based Text Mining, Table Understanding
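
The idea in the abstract, reading a product table as a set of "attribute-value" pairs and then interpreting the attribute names against a knowledge base, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simple two-column HTML table, and the names `PairTableParser`, `ATTRIBUTE_SYNONYMS` and `extract_pairs` are hypothetical. The small synonym dictionary is only a stand-in for the e-commerce ontology mentioned above.

```python
from html.parser import HTMLParser


class PairTableParser(HTMLParser):
    """Collect the cell texts of every table row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Assumption: a toy synonym map standing in for the domain ontology.
ATTRIBUTE_SYNONYMS = {
    "weight": "mass",
    "net weight": "mass",
    "colour": "color",
    "color": "color",
}


def extract_pairs(html):
    """Return {canonical attribute: value} for rows with exactly two cells."""
    parser = PairTableParser()
    parser.feed(html)
    parser.close()
    pairs = {}
    for row in parser.rows:
        if len(row) != 2:
            continue  # not an attribute-value row in this simplified model
        label, value = row
        key = label.rstrip(":").strip().lower()
        # Unknown attributes keep their normalized label.
        pairs[ATTRIBUTE_SYNONYMS.get(key, key)] = value
    return pairs


SAMPLE = """
<table>
  <tr><td>Net weight:</td><td>1.2 kg</td></tr>
  <tr><td>Colour</td><td>black</td></tr>
  <tr><td>Warranty</td><td>12 months</td></tr>
</table>
"""

print(extract_pairs(SAMPLE))
```

In this sketch an attribute missing from the synonym map ("Warranty") is kept under its normalized label. A real system would more likely resolve such attributes against the ontology or flag them for review.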