Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: 29, Pages: 1-8
G. Johanna Johnsi Rani1*, D. Gladis2, Joy John Mammen3 and Marie Therese Manipadam4
1 Department of Computer Science, Madras Christian College, Tambaram, Chennai - 600 059, India
2 Department of Computer Science, Presidency College, Chennai - 600 005, India
3 Department of Transfusion Medicine, Christian Medical College, Vellore - 632 004, India
4 Department of Pathology, Christian Medical College, Vellore - 632 004, India
Breast cancer is the prime cause of death in Indian women. Hospitals in India collect and report data electronically. One such report is the pathology report, which contains natural-language narratives of patients' conditions. This work aims to extract details of the tumour (T) in the breast using pattern-matching rules and to derive the pathological classification of T by applying the pTNM classification protocol of the American Joint Committee on Cancer (AJCC). Information Retrieval (IR), Natural Language Processing (NLP) and Information Extraction (IE) techniques are applied to develop an automated system for this task. The system compares the extracted and classified values of T against gold-standard values, which are derived by manual scrutiny of the reports. Evaluated on three sets of pathology reports, the automated system achieved an average Precision of 86%, Recall of 82.7%, Specificity of 75.1% and Accuracy of 79.53%.
Keywords: Breast Cancer, Information Extraction, Natural Language Processing, Pathology
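The abstract does not list the pattern-matching rules themselves, so the following is only a minimal sketch of the general idea: pull a tumour size out of free text with a regular expression, map the greatest dimension to a size-based pT category, and compute the four reported metrics from confusion-matrix counts. The regular expression, the function names and the example sentence are assumptions made for illustration and are not the authors' rules. The size thresholds follow the size-based AJCC breast T categories (pT1mi to pT3); categories that do not depend on size alone (pTX, pT0, pTis, pT4) are deliberately left out.

```python
import re

# Hypothetical pattern (not from the paper): a size such as "18 mm" or
# "2.5 x 1.8 x 1.2 cm"; all dimensions before the unit are captured together.
SIZE_RE = re.compile(
    r"(\d+(?:\.\d+)?(?:\s*[x×]\s*\d+(?:\.\d+)?)*)\s*(cm|mm)\b",
    re.IGNORECASE,
)


def extract_tumour_size_mm(text):
    """Return the greatest dimension of the first size mention in mm, or None."""
    match = SIZE_RE.search(text)
    if match is None:
        return None
    dims = [float(d) for d in re.findall(r"\d+(?:\.\d+)?", match.group(1))]
    factor = 10.0 if match.group(2).lower() == "cm" else 1.0
    return max(dims) * factor


def classify_pt_by_size(size_mm):
    """Map the greatest tumour dimension (mm) to a size-based pT category.

    Simplification: pT4 (chest wall or skin involvement), pTis, pT0 and pTX
    cannot be decided from size and are not handled here.
    """
    if size_mm <= 0:
        raise ValueError("size must be positive")
    if size_mm <= 1:
        return "pT1mi"
    if size_mm <= 5:
        return "pT1a"
    if size_mm <= 10:
        return "pT1b"
    if size_mm <= 20:
        return "pT1c"
    if size_mm <= 50:
        return "pT2"
    return "pT3"


def evaluation_metrics(tp, fp, tn, fn):
    """Precision, recall, specificity and accuracy from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Invented sentence, used only to exercise the sketch.
    sentence = "The tumour measures 2.5 x 1.8 x 1.2 cm."
    size = extract_tumour_size_mm(sentence)
    print(size, classify_pt_by_size(size))
```

A real system would need to tell tumour size apart from other measurements in the report, such as specimen dimensions or margin distances, which this sketch does not attempt. The counts passed to `evaluation_metrics` would come from comparing system output with the gold-standard values, one report set at a time, before averaging.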