
A Framework for Generation of RDF Data from HTML


  • Department of Computer Science & Engineering, University of Kalyani, Kalyani - 741235, West Bengal, India
  • C.I.R.M. University of Kalyani, Kalyani - 741235, West Bengal, India


Legacy data from many domains has been published on the web in a variety of formats, and much of that content is still available only as static HTML. Most current web pages, by contrast, are designed and developed in dynamic HTML, often using advanced templates. As a result, a large number of static HTML pages have yet to be described in RDF. This article proposes an ontology-based framework that takes static HTML pages and expresses their content in RDF, both for backup purposes and for better reusability in the Semantic Web.


HTML, Ontology, RDF, Regular Expression, Semantic Web






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.