P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2022, Volume: 15, Issue: 41, Pages: 2129-2142

Original Article

Cluster Predictive Model Using Affinity Propagation Algorithm to Group Mushroom 5.8s rRNA Sequences

Received Date: 31 July 2022, Accepted Date: 02 October 2022, Published Date: 03 November 2022

Abstract

Background: This article concerns 5.8S rRNA sequence data collected for distinct mushroom species of the phylum Basidiomycota. Macrofungi from this phylum are widely used as therapeutic mushrooms in several countries. During the rainy season, hundreds of macrofungal basidiocarps were found in Tamil Nadu. Internal transcribed spacer (ITS) and 5.8S rRNA gene sequence markers, collected from NCBI, were used to identify at least thirty of these strains, which belong to the phylum Basidiomycota (orders Polyporales, Hymenochaetales, and Russulales) and have therapeutic properties.

Objectives: The main objective is to group the sequences by similarity using multiple sequence alignment and an algorithmic approach.

Methods: We apply affinity propagation to a 30 × 30 pairwise similarity matrix of the thirty 5.8S rRNA mushroom sequences, obtained with the Clustal Omega tool. As a continuation of earlier work, the approach is evaluated against k-means and hierarchical clustering in terms of the optimal number of clusters and time and space complexity.

Findings: Affinity propagation does not require the number of clusters to be specified in advance. The optimal number of clusters and the cluster groupings it produced are the same as those obtained in the earlier work using k-means and hierarchical agglomerative clustering.

Novelty: The proposed technique applies the cluster validation metrics Silhouette score, Calinski-Harabasz index, and Davies-Bouldin index to find the optimal number of clusters. The CD-HIT clustering tool does not offer these metrics, and the Clustal Omega tool does not support this kind of extension. This follow-up work helps bioinformatics researchers obtain useful results from existing software before working in wet laboratories; rather than expending large quantities of chemical resources, the result opens the door to a targeted approach.
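The clustering step described in the Methods can be illustrated with a minimal sketch. It assumes a symmetric matrix of pairwise similarities (for example, percent identities) and uses only the Python standard library. The function names, the six-sequence toy matrix, the median preference, the damping value, and the conversion distance = 100 − similarity are illustrative assumptions and not the authors' implementation or data. The Calinski-Harabasz and Davies-Bouldin indices, as usually defined, rely on cluster centroids and are therefore omitted here; only the Silhouette score is shown.

```python
from statistics import mean, median


def affinity_propagation(S, damping=0.5, iterations=200):
    """Cluster items from a square similarity matrix S (list of lists).

    Returns one label per item: the index of the exemplar it is assigned to.
    """
    n = len(S)
    # Preference (self-similarity): median of the off-diagonal similarities,
    # a common default choice (an assumption of this sketch).
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    S = [[pref if i == k else S[i][k] for k in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(v for k, v in enumerate(scores) if k != best)
            for k in range(n):
                new = S[i][k] - (runner_up if k == best else scores[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # self-responsibility enters unclipped
            total = sum(pos)
            for i in range(n):
                new = total - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # Exemplars are the points whose combined self-messages are positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise RuntimeError("no exemplars found; try another damping value")
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


def silhouette(D, labels):
    """Mean silhouette coefficient from a distance matrix D and labels."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: coefficient taken as 0
            scores.append(0.0)
            continue
        a = mean(D[i][j] for j in own)  # mean distance within own cluster
        b = min(mean(D[i][j] for j in members)  # nearest other cluster
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Hypothetical percent-identity values for six sequences (not the paper's data).
toy = [
    [100, 95, 93, 45, 42, 44],
    [95, 100, 90, 41, 46, 43],
    [93, 90, 100, 47, 40, 48],
    [45, 41, 47, 100, 94, 91],
    [42, 46, 40, 94, 100, 96],
    [44, 43, 48, 91, 96, 100],
]
labels = affinity_propagation(toy)
distances = [[100 - s for s in row] for row in toy]  # assumed conversion
print(labels, round(silhouette(distances, labels), 3))
```

Note that the number of clusters is never passed in: it emerges from the preference value and the message passing, which is the property the Findings rely on. For a real 30 × 30 matrix, a maintained library implementation would be a more practical choice than this sketch.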

Keywords: Affinity propagation; Cluster metrics; Kmeans; mushroom sequences; Bioinformatics; Data science; Silhouette score; Calinski-Harabasz index; Davies-Bouldin index


Copyright

© 2022 Sudhasini & Ashadevi. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)
