Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: Special Issue 1, Pages: 1-6
Deepak Sahoo and Rakesh Balabantaray*
Department of Computer Science and Engineering, IIIT, Bhubaneswar - 751003, Odisha, India; [email protected]
*Author for correspondence
Clustering is an important step in many text mining and information retrieval tasks, such as text summarization and domain identification. We are working on automatic abstractive text summarization, which requires finding tightly coupled sentences that can be merged and compressed to generate a compact abstractive summary. Our pipeline requires a clustering technique that does not take the number of clusters as input in advance. Therefore, we studied two clustering techniques, the density-based DBSCAN and the graph-based Markov Clustering Algorithm (MCL), in association with some sentence-level relationships. Neither technique requires the number of clusters to be specified in advance, which is necessary for generating the summary of a text document without any intervention. Sentence clustering is evaluated using the purity metric. The purity of both sentence clustering techniques is compared with that of a baseline K-means clustering technique. MCL with some sentence-level features performs better than the others and fits into our pipeline.
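The purity metric mentioned above can be illustrated with a minimal sketch. It assumes the standard definition of purity (each cluster is credited with its most frequent gold label, and the credited counts are summed and divided by the total number of items); the abstract does not state the exact formulation the authors used. The function name `purity` and the toy labels are hypothetical and serve illustration only.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, gold_labels):
    """Fraction of items that carry the majority gold label of their cluster.

    Assumes the standard purity definition; not taken from the paper itself.
    """
    if len(cluster_ids) != len(gold_labels):
        raise ValueError("cluster_ids and gold_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each gold label occurs inside each predicted cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, gold_labels):
        label_counts[cluster][label] += 1

    # Credit each cluster with the size of its largest gold-label group.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six sentences, three predicted clusters, two gold topics.
predicted = [0, 0, 0, 1, 1, 2]
gold = ["a", "a", "b", "b", "b", "a"]
score = purity(predicted, gold)  # majority counts 2 + 2 + 1 out of 6 items
```

Because purity only needs a cluster assignment per sentence, the same function can score DBSCAN, MCL and K-means output alike, regardless of how many clusters each method produces.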
Keywords: Clustering, DBSCAN, Markov Clustering, Transition Relationship, Anaphoric Relationship