• P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2016, Volume: 9, Issue: Special Issue 1, Pages: 1-6

Original Article

Sentence Clustering: A Comparative Study


Clustering is an important step in many text mining and information retrieval tasks, such as text summarization and domain identification. We are working on automatic abstractive text summarization, which requires finding tightly coupled sentences that can be merged and compressed to generate a compact abstractive summary. Our pipeline requires a clustering technique that does not take the number of clusters as an input. Therefore, we studied two clustering techniques, density-based DBSCAN and the graph-based Markov Clustering Algorithm (MCL), in association with some sentence-level relationships. Neither technique requires the number of clusters to be specified in advance, which is necessary for generating a summary of a text document without any intervention. Sentence clustering is evaluated using the purity metric. The purity of both techniques is compared with that of a baseline K-means clustering technique. MCL with some sentence-level features performs better than the others and fits into our pipeline.
Keywords: Clustering, DBSCAN, Markov Clustering, Transition Relationship, Anaphoric Relationship
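
The abstract names purity as the evaluation metric but does not define it. As an illustration only, the sketch below uses the standard textbook definition: each cluster is credited with the count of its most frequent gold label, and the credits are summed and divided by the total number of sentences. The function name `purity`, the toy labels, and the choice to treat every cluster id alike (including any noise label a method such as DBSCAN might emit) are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def purity(cluster_labels, gold_labels):
    """Standard clustering purity: sum of per-cluster majority counts / N.

    Hypothetical helper for illustration; the paper's exact evaluation
    protocol (e.g. how noise points are handled) is not given in the abstract.
    """
    if len(cluster_labels) != len(gold_labels):
        raise ValueError("cluster_labels and gold_labels must have equal length")
    if not cluster_labels:
        raise ValueError("at least one labelled sentence is required")

    # Count gold labels separately inside each predicted cluster.
    members = {}
    for cluster_id, gold in zip(cluster_labels, gold_labels):
        members.setdefault(cluster_id, Counter())[gold] += 1

    # Each cluster contributes the size of its majority gold class.
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(cluster_labels)


# Toy example (made-up labels): six sentences, three predicted clusters.
predicted = [0, 0, 0, 1, 1, 2]
gold = ["a", "a", "b", "b", "b", "c"]
score = purity(predicted, gold)  # majority counts are 2 + 2 + 1 out of 6
```

Because purity only rewards agreement within each cluster, it rises as clusters get smaller and reaches 1.0 when every sentence sits in its own cluster, so it is most informative when the compared methods produce a similar number of clusters.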

