P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology


Year: 2015, Volume: 8, Issue: 29, Pages: 1-8

Original Article

Tumour Classification and Analysis from Breast Cancer Pathology Reports using Natural Language Processing


Breast cancer is the prime cause of death among Indian women. Hospitals in India collect and report data electronically. One such report is the pathology report, which describes the patient's condition in natural-language narrative. This work aims to extract details of the tumour (T) in the breast using pattern-matching rules and to derive the pathological classification of T by applying the pTNM classification protocol of the American Joint Committee on Cancer (AJCC). Information Retrieval (IR), Natural Language Processing (NLP) and Information Extraction (IE) techniques are applied to develop an automated system for this task. The system compares the extracted and classified values of T against gold-standard values, which are derived by manual scrutiny of the reports. Evaluated on three sets of pathology reports, the automated system achieved an average precision of 86%, recall of 82.7%, specificity of 75.1% and accuracy of 79.53%.
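As a rough illustration of rule-based extraction followed by classification, the sketch below pulls size mentions from a report sentence and maps the greatest dimension to a size-based pT category. It is not the authors' rule set: the regular expression, the function names and the sample sentences are all assumptions made for this example. The size thresholds (20 mm and 50 mm) follow the AJCC breast T categories; pT4 depends on chest-wall or skin extension rather than size, so it is deliberately left out, and a real system would also need to exclude non-tumour measurements such as margin distances.

```python
import re

# Hypothetical pattern (not from the paper): one or more numbers joined by
# "x" or "×", followed by a cm or mm unit, e.g. "2.5 x 1.8 x 1.2 cm".
SIZE_RE = re.compile(
    r"((?:\d+(?:\.\d+)?\s*[x×]\s*)*\d+(?:\.\d+)?)\s*(cm|mm)\b",
    re.IGNORECASE,
)
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def extract_sizes_mm(text):
    """Return every dimension mentioned in the text, in millimetres."""
    sizes = []
    for dims, unit in SIZE_RE.findall(text):
        factor = 10.0 if unit.lower() == "cm" else 1.0
        sizes.extend(float(n) * factor for n in NUMBER_RE.findall(dims))
    return sizes


def t_category(greatest_mm):
    """Map the greatest tumour dimension to a size-based pT category.

    pT4 (extension to chest wall or skin) is not size-based and is
    intentionally not handled in this sketch.
    """
    if greatest_mm <= 20:
        return "pT1"
    if greatest_mm <= 50:
        return "pT2"
    return "pT3"


def classify_report(text):
    """Return the pT category for a report, or None if no size is found."""
    sizes = extract_sizes_mm(text)
    if not sizes:
        return None
    return t_category(max(sizes))


if __name__ == "__main__":
    # Illustrative sentence, not taken from the study's reports.
    print(classify_report("Tumour measures 2.5 x 1.8 x 1.2 cm."))
```

Taking the maximum over all extracted dimensions is a simplification chosen for brevity; it stands in for whatever rule the original system uses to identify the greatest tumour dimension.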
Keywords: Breast Cancer, Information Extraction, Natural Language Processing, Pathology
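
The four measures reported in the abstract have standard definitions in terms of true/false positives and negatives, counted by comparing system output with the gold standard. The sketch below shows those definitions only; the function name and the counts in the usage example are illustrative assumptions and are not data from the study.

```python
def evaluate(tp, fp, tn, fn):
    """Compute precision, recall, specificity and accuracy from counts.

    tp/fp/tn/fn are the true-positive, false-positive, true-negative and
    false-negative counts of system output against the gold standard.
    A ratio with a zero denominator is reported as None.
    """
    def ratio(num, den):
        return num / den if den else None

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in evaluate(tp=8, fp=2, tn=6, fn=4).items():
        print(f"{name}: {value:.3f}")
```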

