P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2022, Volume: 15, Issue: 23, Pages: 1124-1132

Original Article

Named Entity Recognition for Sheko Language Using Bidirectional LSTM

Received Date: 20 March 2022, Accepted Date: 09 May 2022, Published Date: 24 June 2022

Abstract

Objectives: This study develops named entity recognition (NER) for the Sheko language, the first work of its kind. NER is one of the most important text-processing tasks underlying machine translation, text summarization, and information retrieval. The study addresses the use of a bidirectional Long Short-Term Memory (LSTM) model to classify tokens into predefined classes. Methods: A bidirectional LSTM is used to model NER for the Sheko language, assigning words to seven predefined classes: Person, Organization, Geography, Natural Phenomenon, Geopolitical Entity, Time, and Other. Because feature selection plays a vital role in the LSTM framework, experiments were conducted to discover the most suitable features for the Sheko NER tagging task, using 63,813 words to train and test the model, of which 70% were used for training and 30% for testing. The datasets were collected from Sheko Mizan Aman Radio Station (MARS), Sheko southern region mass media, and the Language and Literature Department. Findings: Across several experiments, the Sheko NER model achieved 97% test accuracy. The experimental results indicate that tag context is a significant feature for named entity recognition and classification in the Sheko language. Finally, we contribute a new architecture for Sheko NER that learns features automatically, does not depend on other NLP tasks, and adds some preprocessing steps. We also provide a comprehensive comparison with traditional NER algorithms.
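To make the figures in the abstract concrete, the following minimal sketch works out the 70%/30% split of the 63,813 words and shows one common way to compute test accuracy. The article page does not include the authors' code, so the helper names `split_counts` and `token_accuracy`, the word-level split, the rounding rule, and the token-level definition of accuracy are all assumptions made for illustration.

```python
# Illustrative sketch only: the helper names and the token-level accuracy
# definition are assumptions, not the authors' published implementation.

# The seven classes named in the abstract.
TAGS = [
    "Person", "Organization", "Geography", "Natural Phenomenon",
    "Geopolitical Entity", "Time", "Other",
]


def split_counts(total, train_percent=70):
    """Return (train, test) sizes, rounding the training share down."""
    train = total * train_percent // 100
    return train, total - train


def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("tag sequences must be non-empty and equally long")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Assumed word-level split of the corpus size reported in the abstract.
train_size, test_size = split_counts(63_813)
print(train_size, test_size)
```

Under these assumptions the split gives 44,669 training words and 19,144 test words. If the authors split by sentence rather than by word, the exact counts would differ.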

Keywords: Named Entity Recognition; Sheko language; Recurrent Neural Network; Bidirectional layer; embedding layer

Copyright

© 2022 Ashebir & Gantela. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Published By Indian Society for Education and Environment (iSee)
