A Review on Ontology Based Document and Image Retrieval Methods



Introduction
The web is a huge collection of data. Search engines retrieve information from web pages and order the results using various ranking algorithms. Searching techniques fall into two types: content-based and statistical. Without opening each web page separately, the user cannot know what it contains. Taking this as the key point, we have created an ontology-based Object-Attribute-Value (O-A-V) information extraction model for the web. The model helps users rephrase the keywords of their query on subsequent attempts.
Almost all image data on the web are in digital form. Internet usage has grown substantially, which allows access to images from remote places. Retrieving the images of interest from a big dataset is a challenging problem. Often the results contain unwanted images because of the gaps studied in the literature. Many researchers started with keyword-based image retrieval, but it returned many spurious images, since a few words are not sufficient to describe an image completely.
The image feature extraction process analyzes each pixel in the image to extract the possible features at that pixel. If the same process is repeated for all image pixels, it is called global feature extraction. This also increases the size of the feature vector. Sometimes it is enough to extract salient features from parts of an image, called local features, that can represent the entire image. Once the feature descriptor or feature vector is formed, it is used for similarity matching in the retrieval process. The basic content-based image retrieval (CBIR) system works in the following manner.
The feature extraction step is common to the query image and the database images. The feature vectors can be used for indexing and can also be stored in the database, so there is no need to extract the features of the database images each time. The similarity between the query and each database image is computed. The resulting distance measures are then sorted to produce the ranking of images. Incorporating ontology into this process can reduce the "semantic gap" [1] between the user's understanding of images and the system's computation of image semantics.
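The flow described above (extract features once, store them as an index, compute distances to the query, sort) can be sketched as follows. This is a minimal illustration, not the method of any surveyed system: the coarse RGB histogram, the Euclidean distance, the function names and the toy pixel lists are all assumptions chosen for brevity.

```python
import math

def color_histogram(pixels, bins_per_channel=2):
    """Global feature: normalized joint RGB histogram.
    `pixels` is a list of (r, g, b) tuples with values in 0..255."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1.0
    total = float(len(pixels)) or 1.0
    return [h / total for h in hist]

def euclidean(u, v):
    """Distance between two feature vectors; 0 means identical features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def build_index(database):
    """Extract features of the database images once and keep them."""
    return {name: color_histogram(pixels) for name, pixels in database.items()}

def retrieve(query_pixels, index, top_k=3):
    """Rank database images by increasing distance to the query features."""
    q = color_histogram(query_pixels)
    ranked = sorted(index.items(), key=lambda item: euclidean(q, item[1]))
    return [name for name, _ in ranked[:top_k]]

# Hypothetical toy "images": flat lists of RGB pixels.
database = {
    "sunset": [(250, 120, 30)] * 6 + [(200, 40, 20)] * 2,
    "forest": [(20, 140, 40)] * 7 + [(60, 90, 30)],
    "ocean": [(10, 60, 200)] * 8,
}
index = build_index(database)
query = [(240, 110, 40)] * 8
print(retrieve(query, index, top_k=2))  # ['sunset', 'forest']
```

Because the index is built once, only the query image needs feature extraction at search time, which is the saving the paragraph above refers to.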

Ontology Based Retrieval Methods Related to Document Retrieval
The first ontology-based knowledge discovery system for the World Wide Web was developed in [2], but it was weak in handling high-level semantic features. A step toward a semantic web browser was taken in [3], which used older retrieval techniques and lacked many semantic features. The question answering system AquaLog [4] was developed, but it cannot capture the full semantics of the user's intent. The Text Retrieval Conference (TREC) [5] established a benchmark for text retrieval and remains one of the best benchmarking efforts for text and document retrieval applications. TREC gives better results, but annotation is tedious. DBpedia [6] is a document retrieval system that extracts information from Wikipedia and organizes and serves it using an ontology. Its major drawback is the trustworthiness of Wikipedia content. The dissertation [7] combined keywords into SPARQL queries using knowledge bases, but it fails to build Resource Description Framework (RDF) data automatically and could not be applied directly to semantic web retrieval. Intelligent generation of SPARQL queries is found in the Bio-Semantic framework [8], but it is specific to bioinformatics applications. An image prediction model based on relevance [9] using the query context was built with a retrieval technique called bag-of-object, but it made no attempt to fill the semantic gap between user intention and system computation. The ontology-based knowledge graph [10] is an enriched semantic search technique. It uses a probabilistic modelling framework for linking and representing facts. However, it creates a compulsion to follow the suggested links. A similar concept was improved in [11], which organizes information with well-structured entity relationships, but it was developed for the Chinese language. A semantic document retrieval system [12] was developed using an enhanced ontology-based approach. The attempt was made with text only, and its use in image retrieval applications is not guaranteed. The comparison of this review is summarized in Table 1.
The amount of information on the web has increased exponentially over the past years. Search engines retrieve the best-ranked web results, but they give the user no knowledge for selecting the most suitable one. A web browser just renders the web page with no understanding of the content. With the increasing amount of data and users' unwillingness to go through the entire result list, an enhanced search method is needed that represents results so that users get an idea of their inner contents. A comparative analysis of data retrieval using data mining and data retrieval using a dynamic decision quadtree is found in the literature. Information retrieval in speech recognition using neural networks is also found in the literature.

Extraction of O-A-V (Object-Attribute-Value) using Ontology Which is Applied in Search Engine Retrieval Result
Ontology-based O-A-V extraction is found in [13]. The proposed architecture of the O-A-V based method is shown in Figure 1.
Algorithm for Developing Ontology
    if there are communal classes with common ancestors then
        add individuals to it;
        eliminate those individuals and include the ancestor as an alternative individual in the set;
    until there are no individuals found
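The algorithm text above is incomplete, so the following is only one possible reading of it, offered as a sketch: individuals that share a direct ancestor are removed from the set and replaced by that ancestor, and the step repeats until no further merge is possible. The function name `generalize`, the `parent_of` map and the toy class hierarchy are hypothetical and do not come from [13].

```python
def generalize(individuals, parent_of):
    """Repeatedly replace groups of individuals that share a direct
    ancestor by that ancestor (one reading of the loop sketched above)."""
    current = set(individuals)
    changed = True
    while changed:
        changed = False
        groups = {}
        for item in current:
            parent = parent_of.get(item)
            if parent is not None:
                groups.setdefault(parent, set()).add(item)
        for parent, children in sorted(groups.items()):
            if len(children) >= 2:
                current -= children   # eliminate the individuals
                current.add(parent)   # include their common ancestor instead
                changed = True
                break                 # regroup after every merge
    return current

# Hypothetical class hierarchy: child -> direct ancestor.
parent_of = {
    "laptop": "computer",
    "desktop": "computer",
    "computer": "device",
    "phone": "device",
    "novel": "book",
}
print(sorted(generalize({"laptop", "desktop", "phone", "novel"}, parent_of)))
# ['device', 'novel']
```

An individual with no sibling in the set, such as "novel" here, is left as it is under this reading.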

Search Engine with Light-Weight Ontology
A light-weight ontology-based search engine built using the proposed method is shown in Figure 2. This model represents the retrieved contents in a semantic way and assists the user in searching them. The model can be extended to organize the retrieved contents into various classes or topics within the web browser itself. Users can save time by viewing only the required topics rather than going through unnecessary ones.

Ontology Based Image Retrieval Methods
Keyword-based image retrieval techniques [14,15,16] used text, field and structure based methods, but keywords are not enough to capture the complete semantics. Retrieval methods using low-level color features such as color similarity and color coherence vectors are found in [17,18]. Other retrieval works using low-level texture features such as texture descriptors and wavelet-based CBIR techniques are found in [19,20]. The shape-based retrieval method in [21], which uses template matching, relies only on shape features. It is evident that a single low-level feature cannot capture the full semantics of an image, and hence these retrieval methods suffer from the semantic gap. The Scale Invariant Feature Transform (SIFT) feature found in [22] was used to predict the amino acid changes in protein structure. However, the work in [23] concluded that a better image feature with faster computation needed to be found.
A novel scale- and rotation-invariant feature descriptor called Speeded Up Robust Features (SURF) was proposed in [24] and is used in the CBIR visual attention model [25]. However, SURF was not designed to fill the semantic gap. The survey on CBIR with high-level semantics [1] addressed the main issues in image retrieval techniques. Another ontology-based cognitive vision method [26] used only limited visual features. The comparison of this review is summarized in Table 2.
In conventional image retrieval systems, indexing and retrieval are done based on keywords. Keywords cannot convey as much as the content of an image does. In text-based image retrieval, text descriptions define the image content, which often creates ambiguity and inadequacy in query processing and image database search. The process of automatically assigning metadata such as captions or keywords to a digital image is known as automatic image annotation or automatic image tagging. Text-based retrieval is lexically rather than conceptually motivated and hence leads to unrelated results. Lexically motivated retrieval operates at the level of words, not the meaning of words. The basic idea of ontologies, in contrast, is that they are conceptually motivated: they can be applied to specify the actual meaning of things rather than treating words as textual strings. The evaluations of image retrieval are shown in Table 3.

Content-Based Image Retrieval
The basic CBIR system works as shown in Figure 3. It includes feature extraction, indexing, feature distance computation, ranking and retrieval. CBIR methods advance every year: a boosting framework, a fusion of contourlet transforms and Zernike moments, and computational intelligence based hybrid approaches are found in the literature. CBIR uses computer vision methods to retrieve digital images from databases. "Content-based search" analyzes the actual contents of the image instead of keywords or tags annotated with it. The word 'content' here may refer to colors, shapes, textures or spatial orientation that can be obtained from the image. Web image search engines rely on metadata, which generates many irrelevant results; hence CBIR is desirable in this case. Manually assigning keywords to images in a large database may retrieve wrong results. It is also costly and may not identify all the keywords that describe an image, and hence it is inefficient. A good indexing technique based on the actual contents of images can produce accurate results. CBIR improves the usefulness of Picture Archiving and Communication Systems (PACS), since it retrieves pictures by image patterns instead of alphanumeric indices. CBIR systems mostly suffer from the "semantic gap", the gap between the high-level understanding of an image by the human mind and the low-level image description computed by machines. Recent CBIR techniques include both low-level features such as texture, color and shape and high-level features such as facial expressions. In CBIR, feature detection and extraction are low-level image processing operations. Feature detection examines each pixel to determine whether a feature exists at that pixel, and it is usually the initial operation performed on an image. If it is part of a larger algorithm, that algorithm typically examines the image only in the region of the detected features.
Before feature detection, the input image is usually smoothed with a Gaussian kernel in a scale-space representation, and one or more feature images are computed, often expressed in terms of local derivative operations. If feature detection is computationally expensive, a higher-level algorithm may be used to guide the detection so that only certain parts of the image are searched for features. Once features have been detected, a local image patch around each feature can be extracted using image processing techniques. The result is a feature descriptor or feature vector. The extracted features support similarity matching in the CBIR retrieval process. Low-level features such as color, texture, shape, spatial location, SIFT and SURF, and their extraction, are described in the following sections.
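The Gaussian smoothing step mentioned above can be illustrated with a small separable filter. This is a sketch under stated assumptions: the image is a list of rows of gray values, the kernel is truncated at three standard deviations, and borders are handled by clamping; none of these choices comes from the surveyed works.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian weights, normalized to sum to 1."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(image, kernel):
    """Filter every row with the 1-D kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), n - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable 2-D Gaussian blur: filter rows, then columns."""
    kernel = gaussian_kernel(sigma)
    rows_done = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(rows_done), kernel))

# A single bright pixel is spread over its neighbourhood after smoothing.
impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
blurred = gaussian_blur(impulse, sigma=1.0)
```

Smoothing suppresses pixel-level noise so that the derivative operations applied afterwards respond to image structure rather than to isolated noisy pixels.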

Features Based on Color
Color is the most commonly adopted feature in image retrieval. Various color spaces are used to define colors, depending on the application. Many color spaces are discussed in [26,27], including LAB, RGB (Red, Green, Blue), HSV (Hue, Saturation, Value), HSL (Hue, Saturation, Lightness), LUV, YCrCb and Hue-Min-Max-Difference (HMMD) [28,29]. Color covariance, color histogram and color moments [30,31] are the most used color features in region-based image retrieval (RBIR). Dominant color, scalable color, color structure and color layout are the main color features in Moving Picture Experts Group (MPEG)-7 [32]. From three color features, hue-hue pair and hue are estimated and the color invariants are built. These color features are not directly related to high-level semantics. To map region colors to semantic color names, the average color of a region can be used as its color feature [29,33].
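As an illustration of one of the color features named above, the sketch below computes color moments for a list of RGB pixels. It assumes the common formulation with three moments per channel (mean, standard deviation, and the cube root of the third central moment as a skewness term); the function name and the input format are hypothetical.

```python
import math

def color_moments(pixels):
    """Return [mean, std, skew] for each of the R, G, B channels.
    `pixels` is a non-empty list of (r, g, b) tuples."""
    n = float(len(pixels))
    features = []
    for c in range(3):
        values = [p[c] for p in pixels]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        std = math.sqrt(variance)
        # Signed cube root keeps the direction of the asymmetry.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        features.extend([mean, std, skew])
    return features

# A uniformly colored region has zero spread and zero skewness.
print(color_moments([(10, 20, 30)] * 4))
```

The result is a compact nine-element vector, which is much smaller than a color histogram and is therefore convenient for indexing.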
If the segmentation is erroneous, the average color may be visually different from the original region. From [34], it is understood that in many cases the dominant color and the average color are very similar, but in a few cases they look very different. The choice of color feature depends on the segmentation results: the average color is not a good choice when segmentation yields objects that do not have homogeneous color. In the literature, color-based CBIR techniques are found to use images that are not preprocessed. Appropriate color filters [27,35] are essential to enhance retrieval efficiency because color images are often corrupted by noise.
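The difference between average and dominant color for a badly segmented region can be seen in a small example. This is an illustrative sketch: the dominant color is taken here as the center of the most frequent bin after a coarse quantization (`step=64` is an arbitrary assumption), which is simpler than the MPEG-7 dominant color descriptor.

```python
from collections import Counter

def average_color(pixels):
    """Channel-wise integer mean of the region's pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) // n for c in range(3))

def dominant_color(pixels, step=64):
    """Center of the most frequent quantized color bin."""
    bins = Counter(tuple(v // step for v in p) for p in pixels)
    best_bin, _ = bins.most_common(1)[0]
    return tuple(b * step + step // 2 for b in best_bin)

# Hypothetical region that wrongly merges a red object with a blue one.
region = [(255, 0, 0)] * 6 + [(0, 0, 255)] * 4
print(average_color(region))   # (153, 0, 102), a purple no pixel actually has
print(dominant_color(region))  # (224, 32, 32), close to the red majority
```

For a region of homogeneous color the two values nearly coincide, which matches the observation attributed to [34] above.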

Texture Features
Some systems do not use texture features [29,17] for image retrieval, unlike color features. Texture is another salient feature for describing high-level semantics in image retrieval, because it provides essential details for image cataloguing: it describes the content of many real-world images such as clouds, fruit skin, bricks, fabric and trees. The outputs of the wavelet transform [36] or Gabor filtering [37], local statistical measures such as the Wold features proposed in [38], and the six Tamura texture features [39] are the commonly used texture features in image retrieval. The Tamura features are regularity, line-likeness, roughness, directionality, contrast and coarseness. Among them, coarseness, directionality and regularity are the most important [39]. The other three are related to these and are less effective for texture description.
The texture browsing descriptor [19,32] in MPEG-7 consists of regularity, directionality and coarseness. Wold features, namely randomness, directionality and periodicity, perform very well on Brodatz textures [40]. The Tamura features do not work at multiple resolutions. The Wold features are affected by image distortions such as scale and orientation differences caused by viewpoint change [41]. If the texture regions in an image are not structured and homogeneous [40], retrieval on natural scene images is poor, although the approach works well for Brodatz textures. Wavelet and Gabor features match well with the results of human vision studies [31,37] in most image retrieval tasks. However, the Gabor filter and wavelet transform are designed for rectangular images, whereas in RBIR the image regions have irregular shapes. Hence, in such retrieval methods, texture features are obtained from the texture properties of the pixels or small blocks within the region [29,37]. For natural image representation [32], the Edge Histogram Descriptor (EHD) is most suitable and effective.
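Of the six Tamura features listed above, contrast has the simplest closed form and is sketched here as an example. The code assumes the usual definition, the standard deviation divided by the fourth root of the kurtosis, computed over a gray-level image given as a list of rows; the function name is hypothetical and the other Tamura features are not covered.

```python
def tamura_contrast(gray):
    """Tamura contrast: sigma / kurtosis ** 0.25, with kurtosis = mu4 / sigma ** 4."""
    values = [float(v) for row in gray for v in row]
    n = float(len(values))
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0.0:
        return 0.0  # a flat image has no contrast
    mu4 = sum((v - mean) ** 4 for v in values) / n
    kurtosis = mu4 / (variance * variance)
    return (variance ** 0.5) / (kurtosis ** 0.25)

def checkerboard(low, high, size=4):
    """Toy texture alternating between two gray levels."""
    return [[high if (x + y) % 2 else low for x in range(size)]
            for y in range(size)]

strong = tamura_contrast(checkerboard(0, 255))
weak = tamura_contrast(checkerboard(100, 110))
```

A checkerboard that alternates between black and white scores far higher than one that alternates between two similar grays, which is the behaviour expected of a contrast measure.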

Shape Feature
Shape is one of the most distinctive concepts. Shape features include consecutive boundary segments, aspect ratio, Fourier descriptors, circularity and moment invariants [42]. Color and texture features are more useful in domain-specific images such as man-made objects. Shape features are still essential, but they are not as popular in Region-Based Image Retrieval (RBIR) as texture and color features, mainly because segmentation is inaccurate. To exploit the inherent benefits of RBIR, some systems use shape features. For instance, orientation and eccentricity are used for this purpose in [29].
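Two of the shape features named above, circularity and aspect ratio, can be computed directly from a region boundary. The sketch assumes the boundary is a simple polygon given as ordered (x, y) vertices, uses one common definition of circularity (4πA/P², equal to 1 for a circle), and takes the aspect ratio from the axis-aligned bounding box; other definitions exist.

```python
import math

def _edges(points):
    """Consecutive vertex pairs, closing the polygon."""
    return zip(points, points[1:] + points[:1])

def polygon_area(points):
    """Shoelace formula for a simple polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in _edges(points):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(points):
    return sum(math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in _edges(points))

def circularity(points):
    """4 * pi * area / perimeter^2: 1.0 for a circle, smaller for elongated shapes."""
    p = polygon_perimeter(points)
    return 4.0 * math.pi * polygon_area(points) / (p * p)

def aspect_ratio(points):
    """Long side over short side of the bounding box (non-degenerate shapes only)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return max(width, height) / float(min(width, height))

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
bar = [(0, 0), (4, 0), (4, 1), (0, 1)]
```

Both quantities depend on an accurate boundary, which is why segmentation errors limit the usefulness of shape features, as noted above.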

Spatial Location
Besides texture, color and shape, spatial location is very useful in region cataloguing. For instance, trees and grass in an image may have comparable texture and color features, but their spatial locations differ: tree leaves normally appear at the top of an image, while grass appears at the bottom. Spatial location can therefore be defined simply as 'left', 'right', 'top' and 'bottom' depending on the position of the region in the image [43,44]. The minimum bounding rectangle and region centroid are used to describe spatial location in [45]. The spatial center of a region is used to describe spatial location in [29].
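The coarse 'left, right, top, bottom' labelling described above can be derived from the region centroid, as in this sketch. It assumes a region is a list of (x, y) pixel coordinates with the origin at the top-left corner of the image and y growing downward; the function names and the toy regions are hypothetical.

```python
def centroid(region):
    """Mean x and y of the region's pixel coordinates."""
    n = float(len(region))
    return (sum(x for x, _ in region) / n, sum(y for _, y in region) / n)

def bounding_rectangle(region):
    """Minimum axis-aligned bounding rectangle as (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return (min(xs), min(ys), max(xs), max(ys))

def spatial_labels(region, width, height):
    """Label the region by the image half in which its centroid falls."""
    cx, cy = centroid(region)
    horizontal = "left" if cx < width / 2.0 else "right"
    vertical = "top" if cy < height / 2.0 else "bottom"
    return horizontal, vertical

# Toy 10 x 10 image: leaves occupy the top rows, grass the bottom rows.
leaves = [(x, y) for x in range(10) for y in range(0, 3)]
grass = [(x, y) for x in range(10) for y in range(7, 10)]
print(spatial_labels(leaves, 10, 10)[1])  # top
print(spatial_labels(grass, 10, 10)[1])   # bottom
```

Two green, similarly textured regions thus receive different labels, which is the extra cue that spatial location adds to color and texture.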
In semantic feature extraction, relative spatial relationships are more important than absolute spatial location. Directional relationships between objects such as 'right/left' and 'above/below' are easily described using 2D-strings [46] and their variants. Directional relationships alone are not enough, however; topological relationships must also be considered when representing semantic image contents. The spatial context modeling algorithm in [47], which considers the relationships touch, front, right, up, left and down, improves performance in semantic image retrieval.

SIFT Feature
The SIFT algorithm is invariant to changes in image orientation and scale. In the first stage, a scale space is constructed using the Gaussian function. The next key stage computes the difference of Gaussians, in which potential interest points are identified. A consistent orientation is then assigned to each key point: a histogram of gradient orientations is built from sample points around the key point and its highest peak is noted. A few top peaks close to that value are used to create key points with those orientations. Finally, the key point descriptors are built from orientation histograms over a 4 × 4 grid of subregions in the neighbourhood of the key point, with 8 bins each. Hence the SIFT descriptor has 4 × 4 × 8 = 128 elements. For matching, a k-d tree is used to identify nearest neighbours with less computation time.
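The orientation assignment step can be sketched with a gradient orientation histogram. This is a simplified illustration rather than a full SIFT implementation: gradients come from central differences, the Gaussian weighting window is omitted, and the 36 bins and the 0.8 peak ratio are values commonly cited in the SIFT literature that are assumed here, not taken from the text above.

```python
import math

# 4 x 4 subregions with 8 orientation bins each.
DESCRIPTOR_LENGTH = 4 * 4 * 8

def dominant_orientations(patch, num_bins=36, peak_ratio=0.8):
    """Return the histogram bins whose magnitude-weighted count is within
    `peak_ratio` of the highest peak. `patch` is a list of rows of gray values."""
    hist = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(angle // bin_width) % num_bins] += magnitude
    peak = max(hist)
    if peak == 0:
        return []
    return [i for i, h in enumerate(hist) if h >= peak_ratio * peak]

# Intensity ramps with a known gradient direction.
ramp_x = [[x for x in range(5)] for _ in range(5)]          # gradient at 0 degrees
ramp_diag = [[x + y for x in range(5)] for y in range(5)]   # gradient at 45 degrees
print(dominant_orientations(ramp_x), dominant_orientations(ramp_diag))  # [0] [4]
```

Each returned bin would give rise to one key point orientation, so a location with two strong gradient directions yields two key points.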

SURF Feature
SURF stands for Speeded Up Robust Features. It was partly inspired by the SIFT algorithm. The standard SURF version extracts a small number of points that are the strongest features of a given image. The points of interest are detected using the determinant of the Hessian matrix. In the next step, a descriptor vector is created for each interest point, so the number of interest points and the number of SURF features are always the same. Because it extracts only a limited number of points, each described by 64 values (a 64-column matrix), it is computationally faster than SIFT. A method that fuses SIFT and SURF features is discussed in [48]; it gave better retrieval efficiency than either method alone.

Image Similarity Using Visual Signatures
After obtaining the image signature, the next step is accurate image retrieval. Various basic frameworks have already been defined for image similarity. The most desirable properties are local linearity (satisfying the triangle inequality in a neighbourhood), agreement with semantics, invariance to background (allowing region-based querying) and robustness to noise, along with efficiency at large scale and in real time. Figure 4 shows a design incorporating various methods: supervised, semi-supervised or unsupervised learning; region-based similarity, global similarity or both; segmentation-based similarity matching and computation; using hypothetical representations, vectors or aggregates of features; deterministic or fuzzy similarity; and computing similarity over linear spaces or non-linear regions. The most widely applied method for image retrieval is content based. Image content can be described by metadata derived from the image or by metadata attached by humans. Human annotation is a hard and time-consuming process, and hence retrieval has to be automated. Relevance feedback involves the user in the retrieval process: the user refines the image search by repeatedly marking each result as 'relevant', 'irrelevant' or 'neutral'. To compare a given image with a database image, CBIR relies only on a distance measure. It compares the closeness of two images along several dimensions such as texture, color, spatial location and shape. A distance of zero therefore means a perfect match with the query in these dimensions; values greater than zero indicate varying degrees of similarity between the images.
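Relevance feedback as described above can be illustrated with a Rocchio-style query update, a classical technique from text retrieval that is used here only as an example and is not attributed to any surveyed system. The query feature vector is moved toward the mean of the results marked 'relevant' and away from the mean of those marked 'irrelevant'; 'neutral' results are ignored. The weights `alpha`, `beta` and `gamma` are conventional but arbitrary assumptions.

```python
def mean_vector(vectors, size):
    """Component-wise mean; a zero vector if there are no vectors."""
    if not vectors:
        return [0.0] * size
    return [sum(col) / float(len(vectors)) for col in zip(*vectors)]

def rocchio_update(query, feedback, alpha=1.0, beta=0.75, gamma=0.25):
    """`feedback` is a list of (feature_vector, label) pairs with label in
    {'relevant', 'irrelevant', 'neutral'}; returns the refined query vector."""
    relevant = [v for v, label in feedback if label == "relevant"]
    irrelevant = [v for v, label in feedback if label == "irrelevant"]
    r = mean_vector(relevant, len(query))
    nr = mean_vector(irrelevant, len(query))
    return [alpha * q + beta * a - gamma * b for q, a, b in zip(query, r, nr)]

# Hypothetical two-dimensional feature vectors.
query = [1.0, 0.0]
feedback = [
    ([0.0, 1.0], "relevant"),
    ([1.0, 0.0], "irrelevant"),
    ([0.5, 0.5], "neutral"),
]
refined = rocchio_update(query, feedback)
print(refined)  # [0.75, 0.75]
```

The refined vector is then used for the next round of distance computation, so each round of marking pulls the ranking closer to what the user meant.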
The major categories of CBIR gaps are defined as follows. The semantic gap is the disparity between the low-level features extracted by computers from an image and the high-level human understanding of the image. The sensory gap is the disparity between an object in the real world and its captured representation in the image. The integration gap is the disparity in the level of integration of CBIR with general-purpose image retrieval systems. The feature extraction automation gap is a gap introduced by the algorithm. The catalogue of various gaps is discussed in [49]. Showing how ontology helps image retrieval by shrinking the semantic disparity between low-level and high-level features will provide a better solution for CBIR systems. A signature-based similarity search in CBIR is found in the literature.

Ontology Assisted Image Retrieval
An ontology is an explicit specification of a conceptualization. It represents a domain in a formal way. In earlier times, only text tags were used in web image retrieval. Text-dependent image retrieval systems such as Google and Yahoo are already available for the web. Those engines use text features such as file names as indices for searching images on the web. Numerous image retrieval engines are under construction. The low-level descriptors of these engines are far from semantic concepts; apart from those descriptors, the systems rely only on human annotations. Hence, an intermediate approach to image understanding needs to be defined. A few systems describe a particular domain with the help of domain experts by identifying the vocabulary used to describe objects of interest. The most desirable thing in image retrieval is a domain-independent visual concept ontology. This type of ontology-driven description supports automatic recognition based on image processing techniques. The visual concept ontology is described in [26].
Ontology-driven knowledge acquisition is necessary for building the visual concept ontology. Here, a domain is specified as a tree structure with a class hierarchy of its sub-elements at each level. Example domains are medical pathology and biological organisms.
In ontology-based retrieval, the knowledge acquisition process is as follows. The visual concept ontology in Figure 5 has three parts: color, spatio-temporal and texture concepts. The architecture of the ontology-based image retrieval process is given in Figure 6. Image feature extraction by the computer yields meaningful image concepts, which may be color, texture, shape or spatial location. Mapping one or more of these concepts into the ontology interprets the conceptual meaning of an image. If the retrieval query captures the user's actual intention through this ontology representation, the semantic disparity between human and machine is reduced. In intelligent image retrieval, different types of indexing schemes have been applied, from text based, keyword annotated, field based, structure based and content based to ontology based. Still, image retrieval is in its infancy because of the semantic disparity. The image context description can be framed using an ontology built from the above image concepts. The knowledge representation can be formed by applying Description Logics (DL). DAML (DARPA Agent Markup Language) and OIL (Ontology Inference Layer), which are available with OWL (Web Ontology Language), are used for this implementation. Rules describing relations between image features in the ontology can also be defined using DL. Once the concept ontologies in Figure 7 and Figure 8 have been framed (for example, a spatial ontology), the similarity between the user query and the extracted image features is estimated through the ontology hierarchy. This brings the database images closer to the user query. Tools such as "OntoVis" [50-57] have been developed, which perform three tasks: domain knowledge acquisition, ontology-driven visual concept acquisition and image example management.
The benefit of using the visual concept ontology is that it fills the semantic gap between low-level and high-level concepts as far as possible.
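Similarity matching through an ontology hierarchy can be illustrated with a small concept tree. The sketch below uses the Wu-Palmer measure, a standard hierarchy-based similarity that is chosen here as an assumption; the text above does not say which measure the surveyed systems use. The `PARENT` map is a hypothetical fragment of a visual concept ontology with color, texture and spatial branches.

```python
# Hypothetical concept hierarchy: concept -> parent concept.
PARENT = {
    "color": "visual_concept",
    "texture": "visual_concept",
    "spatial": "visual_concept",
    "red": "color",
    "green": "color",
    "coarse": "texture",
    "top": "spatial",
    "bottom": "spatial",
}

def path_to_root(concept):
    """List of concepts from `concept` up to the root of the hierarchy."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))

def wu_palmer(a, b):
    """2 * depth(lowest common ancestor) / (depth(a) + depth(b))."""
    ancestors_of_b = set(path_to_root(b))
    lca = next(c for c in path_to_root(a) if c in ancestors_of_b)
    return 2.0 * depth(lca) / (depth(a) + depth(b))

print(round(wu_palmer("red", "green"), 3))  # 0.667, siblings under 'color'
print(round(wu_palmer("red", "top"), 3))    # 0.333, related only through the root
```

A query concept and an extracted image concept that sit close together in the hierarchy therefore score higher than concepts from different branches, which is how the hierarchy can bring semantically related images nearer to the query.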

Figure 7.
Ontological concept description in image interpretation.

The Ontology Formation Model Using Protégé
The prototype model is used here because ontology formation cannot follow the usual design and implementation phases, as shown in Figure 9. The initial step in framing the ontology is knowledge acquisition, so the domain knowledge is acquired from domain experts. It is then refined with feedback from clients. For complex ontologies, user acceptance is the main criterion. In this formation process, the top-level classes can be selected using RDF Schema (RDFS). These classes should tell what image context is described, and the remaining ontology should tell how it is described. A higher-level class alone will not describe an entire image, but the ontology as a whole should serve to annotate and retrieve images.

Discussion and Conclusion
The data on the web has grown exponentially in recent years, from bits to Big Data. Clicking each web result is time consuming, especially for an impatient user who needs the best results with little work. Ranked results do not serve the purpose if the user has to make further clicks to find the best one; providing semantically extracted O-A-V triplets for every web link gives the user valuable insight and saves time. The O-A-V representation not only provides the semantic relations of objects but also helps to integrate and share data among different web resources. Web agents can easily use this machine-interpretable information to perform compound operations and deliver the best search results to users. Even though this method shows an improvement in search results over current data retrieval methods, an independent benchmark standard is required for assessing semantic search systems. The absence of such standard and specific benchmarks made it difficult to assess the proposed system precisely. At present, web document retrieval is at a very early stage. In future, the proposed model will try to extend the semantic relations within a web page to relations between web pages, along with the amalgamation of data using the ontologies of the targeted web resources. The algorithm developed for the proposed model determines the most appropriate triplets to show with every web link. It offers a way of capturing the mindset and searching patterns of users by developing ontologies that improve the search experience. The ontology not only connects web resources but also helps to validate the classification groups to which an entity using a web resource belongs.
In addition, a keyword proximity based scoring method, one among many techniques, can be used for ranking the web pages.
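A keyword proximity score of the kind mentioned above can be sketched as follows. The scoring rule is an assumption made for illustration: a page scores the number of distinct query terms divided by the length of the shortest token window that contains all of them, and zero if any term is missing. The function names and the sample pages are hypothetical.

```python
def min_window(tokens, terms):
    """Length of the shortest run of tokens containing every term, or None."""
    needed = set(terms)
    counts = {}
    have = 0
    best = None
    left = 0
    for right, token in enumerate(tokens):
        if token in needed:
            counts[token] = counts.get(token, 0) + 1
            if counts[token] == 1:
                have += 1
        while have == len(needed):
            span = right - left + 1
            if best is None or span < best:
                best = span
            dropped = tokens[left]
            if dropped in needed:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    have -= 1
            left += 1
    return best

def proximity_score(text, query):
    """Closer query terms give a score nearer to 1.0; missing terms give 0.0."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    span = min_window(text.lower().split(), terms)
    return 0.0 if span is None else len(terms) / float(span)

pages = {
    "a": "ontology based methods for image retrieval",
    "b": "image retrieval ontology survey",
    "c": "speech recognition with neural networks",
}
ranked = sorted(pages, key=lambda p: proximity_score(pages[p], "ontology retrieval"),
                reverse=True)
print(ranked)  # ['b', 'a', 'c']
```

Such a score could be combined with other ranking signals; on its own it only rewards pages in which the query keywords occur near each other.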
Classical research in image retrieval focused on CBIR and low-level feature extraction. In CBIR, low-level image features cannot always represent the high-level semantic concepts in the user's mind. Hence, CBIR schemes should provide maximum support for bridging the 'semantic gap' between low-level visual features and the richness of human semantics. This work delivers a wide-ranging survey of current work on narrowing the 'semantic gap'. High-level semantics can be combined with CBIR systems to shrink the semantic gap using computer vision and machine learning techniques. Different ways of reducing the semantic gap include: using an ontology representation of image objects to interpret the high-level concepts in the user's mind; machine learning approaches that associate the user's query concepts with low-level image features; collecting relevance feedback from users to learn their real intention; providing a semantic template for the user to support high-level interpretation; and combining the evidence from the visual content of web images and the HyperText Markup Language (HTML) text.
However, there is no generic, automated algorithm for image retrieval that is free of the above gaps. Comparing CBIR with high-level semantics against classical systems with low-level features, this work gives an idea for reducing the 'semantic gap' using ontology, as shown in Figure 10. Implementing a fully automated image retrieval system with high-level semantics requires a visual concept ontology to be built so that extracted low-level features can be mapped to high-level semantics.