The novel coronavirus (SARS-CoV-2) was first detected in December 2019 in Wuhan, a city in China's Hubei province.
However, in real-world situations, considering risk over multiple periods is more realistic and more interesting because it exposes the dynamic nature of the problem. COVID-19 typically spreads in "waves" of infection, which underscores the importance of multi-period analysis: a myopic, single-period view is not enough. A multi-period analysis of risk, however, poses technical challenges.
New variants or sub-lineages of the SARS-CoV-2 virus could contain mutations that distinguish them from the reference sequences or from the predominant types already circulating in the population. Compared with previous or currently circulating viruses, different strains of SARS-CoV-2 can have different characteristics: they may spread more easily, resist existing treatment options, or show no change in impact relative to previous strains.
Researchers, clinicians, health-care workers, epidemiologists, and decision makers face a formidable challenge as a result of this pandemic. As part of the global effort, BMC Medical Research Methodology has set up a collection of articles called "Methodologies for COVID-19 research and data analysis". Rapid assessment of a dynamic research area such as COVID-19, where the body of evidence has grown at an impressive rate, requires an approach that is more direct and has a wider scope than the current gold standard methods, such as scoping and systematic reviews. Many research articles have discussed the potential use of machine learning and artificial intelligence in the fight against the COVID-19 crisis.
Machine learning (ML) is a branch of artificial intelligence in which algorithms analyze large data sets to detect patterns and make predictions. In the healthcare industry, machine learning algorithms have considerable potential because a large amount of data is generated for each patient. It is therefore no surprise that there are multiple successful applications of machine learning in the healthcare industry, including:
• Classification
• Recommendations
• Clustering
• Prediction
• Anomaly detection
• Automation
• Ranking
Since SARS-CoV-2 mutation rates are high, developing a vaccine that protects against its many subtypes poses a huge challenge to public health worldwide. If an existing vaccine cannot protect against the virus, a new vaccine must be developed.
The article explains how the dataset was created, the key decisions made, how CORD-19 has been applied, and how the dataset has been used to build shared tasks. To find efficient diagnosis and treatment policies for COVID-19, researchers will continue to bring together the computing community, biomedical experts, and policymakers. Among the many tools that can be brought to bear against the COVID-19 epidemic, automated text analysis stands out, although it cannot provide a full solution on its own. Additionally, this work illustrates the value of public access to full-text literature, since computational access allows users to interact with and discover texts.
COVID-19 can currently only be treated with comprehensive support. In mild cases, patients can often return to full health after appropriate clinical treatment. In order to determine what happened during cancer treatment, this study will analyze patients' blood parameters before and after their complete remission. The paper examines four potential mechanisms leading to lymphocyte deficiency. First, the virus may directly attack lymphocytes and kill them; infection of lymphocytes may be facilitated by the presence of the coronavirus receptor ACE2. Viruses can also attack the lymphatic system directly. There is evidence of acute lymphocyte decline, which may be due to dysfunction in the cells themselves. It might also be due to a direct invasion by the coronavirus, which attacks organs such as the thymus and spleen.
Clinical research companies have been under tremendous pressure since the COVID-19 pandemic broke out. A redistribution of resources and a temporary suspension of face-to-face visits will inevitably restrict research in other therapeutic areas. The clinical research center screens participants for COVID-19 symptoms, fever, and exposure history before admitting them to the study. Participants can choose remote visits when possible. Masks and physical distancing are required for face-to-face interactions, and visitors are not allowed. SARS-CoV-2 infections may lower or raise blood pressure in hypertensive patients, worsen glycemic control in diabetics, or accelerate disease progression in patients with chronic kidney disease. A delay in treatment due to fear of contracting the virus can result in acute illness, hospitalization, and even death. Researchers, coordinators, and clinicians have rekindled a sense of perseverance and a mission to use science to solve problems that are important to patients and the public.
In this systematic review and meta-analysis, LitCOVID and Embase were searched. Studies published before January 1, 2021 that included at least 100 patients were eligible. Participants in the study samples were between 17 and 87 years old. It was estimated that more than 80% of patients infected with SARS-CoV-2 have one or more symptoms, such as fatigue, headache, attention deficit, hair loss, and breathing difficulties.
In this research article, we apply machine learning techniques to predict COVID-19 mutation using the J48 classification algorithm and the Linear Regression prediction algorithm. Details of the COVID-19 variant mutation data set from the WHO are included. The J48 and Linear Regression models are used to predict further mutation of COVID-19 variants. A significant number of results have been achieved with J48 and REP Tree in the areas of representing, utilizing, and learning statistical knowledge. Our test option uses training data that we collected from the World Health Organization web portal. With the Weka interface, all algorithms could be run at the same time and their results compared. Detailed explanations of the algorithms applied to the COVID-19 variant data sets are given in the section below.
Previous COVID-19 studies used data collected from a variety of sources. Earlier studies using machine learning, a tool of artificial intelligence, did not discuss the prediction of further SARS-CoV-2 mutation. We chose the Weka tool because its features can be extended to diagnose and predict different diseases. In addition, it gives medical practitioners and researchers a cost-effective and time-saving way to expand their research activities, and its various applications can help solve problems in clinical research. Another advantage of using Weka for disease prediction is that it can be applied to any kind of disease for which the necessary dataset is available.
A dataset is a collection of statistical data in which every attribute represents a variable, and every instance has its own description. To compare the accuracy of the algorithms in Weka for the prediction of COVID-19 mutations, we used mutation data for prediction and classification. For classification and for assessing the accuracy of further coronavirus mutation prediction, we used 6 attributes and 98232 instances from 120 countries. We applied the J48 and REP Tree algorithms with the Weka data mining tool for our data analysis. This study focused on the disease (COVID-19) rather than the virus, as well as on predicting vulnerability to further mutation.
The data pool was collected from the WHO web portal and the official website of the European Union.
Machine learning algorithms can be used to identify the effects of mutations and so determine why mutated variants of SARS-CoV-2 spread more rapidly. In this way, concerning variants could be identified before they spread, informing the response of medical authorities. By applying machine learning models for classification, the study sought to find prognostic factors for predicting COVID-19 mutation. In preprocessing, data are cleaned, missing values are replaced, data are transformed, and data imbalances are reduced. In this study, 98232 records of mutation data from 120 nations were included, as shown in
Feature selection is the process of selecting the best subset of relevant variables for use in model construction. It selects a suitable set of variables for classification so as to improve prediction accuracy. Good variable selection helps to evaluate the effectiveness of the variables included in the training dataset. The dataset and its description, which we derived from the WHO portal, are shown in
| Attribute | Description |
| --- | --- |
| Location | Name of country |
| Date | Mutation date |
| Variant | COVID-19 variant |
| num_sequences | Mutation sequence |
| Perc_sequences | Percentage of mutation |
| num_sequences_total | Total sequence |
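The preprocessing steps named earlier (cleaning, replacing missing values, and reducing class imbalance) can be illustrated with a minimal sketch. This is not the Weka configuration used in the study: the duplicate-removal rule, mean imputation, and random oversampling are assumptions chosen for illustration, the transformation step is omitted, and the lower-case field names simply mirror the attribute table above.

```python
import random
from collections import Counter
from statistics import mean

NUMERIC_COLUMNS = ("num_sequences", "perc_sequences", "num_sequences_total")

def preprocess(records, label_key="variant", seed=0):
    """Clean, impute, and rebalance a list of dict records (illustrative only)."""
    # 1. Cleaning: drop exact duplicate rows while keeping the original order.
    seen, cleaned = set(), []
    for row in records:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(dict(row))

    # 2. Missing data: replace missing numeric values with the column mean.
    for col in NUMERIC_COLUMNS:
        known = [r[col] for r in cleaned if r.get(col) is not None]
        fill = mean(known) if known else 0.0
        for r in cleaned:
            if r.get(col) is None:
                r[col] = fill

    # 3. Imbalance: randomly oversample minority classes up to the majority size.
    by_label = {}
    for r in cleaned:
        by_label.setdefault(r[label_key], []).append(r)
    if not by_label:
        return []
    rng = random.Random(seed)
    target = max(len(rows) for rows in by_label.values())
    balanced = []
    for rows in by_label.values():
        balanced.extend(rows)
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced

# Hypothetical records, invented only to exercise the function.
example = [
    {"location": "A", "date": "2021-01-04", "variant": "Alpha",
     "num_sequences": 10, "perc_sequences": 50.0, "num_sequences_total": 20},
    {"location": "A", "date": "2021-01-04", "variant": "Alpha",
     "num_sequences": 10, "perc_sequences": 50.0, "num_sequences_total": 20},
    {"location": "A", "date": "2021-01-18", "variant": "Alpha",
     "num_sequences": None, "perc_sequences": 25.0, "num_sequences_total": 40},
    {"location": "B", "date": "2021-01-04", "variant": "Delta",
     "num_sequences": 30, "perc_sequences": 75.0, "num_sequences_total": 40},
]
print(Counter(r["variant"] for r in preprocess(example)))
```

Oversampling is only one way to reduce imbalance; undersampling the majority class or weighting instances are common alternatives.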
The predictive classifier models were developed to accurately identify COVID-19 variant mutation. The J48 classification model and Linear Regression were used to develop the prediction models. We considered these models because of the following characteristics. Linear regression is one of the most widely used methods of empirical analysis in sociology, biostatistics, clinical medicine, quantitative psychology, econometrics, and marketing, and it is often used as a baseline for comparison in machine learning studies. It has many advantages, including high power and accuracy. J48 is Weka's implementation of the C4.5 decision tree algorithm. The J48 classifier was built independently by applying the general technique of bootstrap aggregating to a sample selected for the training set. J48 remains a dominant model for prediction with a degree of certainty. The validation results from the J48 classifier models were then combined to provide a measure of overall performance.
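To make the decision-tree side of this comparison concrete, the following sketch computes the gain ratio, the criterion C4.5 uses to choose which nominal attribute to split on. It is an illustration only: the real J48 implementation also handles numeric attributes, missing values, and pruning, and the function names here are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of nominal values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio of a nominal attribute with respect to the class labels."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# An attribute that separates the classes perfectly versus one that does not.
print(gain_ratio(["A", "A", "B", "B"], ["x", "x", "y", "y"]))
print(gain_ratio(["A", "B", "A", "B"], ["x", "x", "y", "y"]))
```

An attribute with a higher gain ratio is preferred as the split at a given node.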
Statistical variables for prediction were presented as means and standard deviations and were analyzed with the J48 classifier in addition to Linear Regression. The performance of the classification models in predicting COVID-19 mutation was measured by the Receiver Operating Characteristic (ROC). We also calculated the accuracy (AC), Kappa statistic, Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Relative Absolute Error (RAE). Weka (V.3.9.5) was used to build the classifier models. The Weka tool consists of a collection of graphical user interfaces and visualizations for easily running algorithms.
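The measures listed above have standard textbook definitions, sketched below under simplifying assumptions. The error measures are written for numeric predictions, and the relative measures use the mean of the evaluated values as the baseline; Weka's own evaluation differs in detail (for classifiers it computes the error measures over predicted class probabilities), so the numbers from this sketch are not expected to match Weka output exactly.

```python
from collections import Counter
from math import sqrt

def accuracy_and_kappa(actual, predicted):
    """Accuracy and Cohen's kappa for two equal-length label sequences."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    a_counts, p_counts = Counter(actual), Counter(predicted)
    # Agreement expected by chance, from the marginal label frequencies.
    expected = sum(a_counts[c] * p_counts[c] for c in a_counts) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0
    return observed, kappa

def error_measures(actual, predicted):
    """MAE, RMSE, relative absolute error, and root relative squared error."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": sqrt(sum(sq_err) / n),
        # Relative measures compare against always predicting the mean.
        "RAE": sum(abs_err) / sum(abs(a - mean_actual) for a in actual),
        "RRSE": sqrt(sum(sq_err) / sum((a - mean_actual) ** 2 for a in actual)),
    }

print(accuracy_and_kappa(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
print(error_measures([1, 2, 3], [1, 2, 5]))
```

A kappa near 0 indicates agreement no better than chance, and relative errors near 100% indicate performance close to that of the mean-only baseline.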
The COVID-19 pandemic is a disaster that affects a large number of people, and it raises three logistical challenges. First, in places where the pandemic is severe, the timely delivery of food, medicines, masks, and other necessities can save lives, yet city lockdowns may obstruct transportation. Second, it is important to protect the health of service agents, volunteers, and logistics workers: the COVID-19 virus is highly infectious and spreads rapidly, so keeping the people who help safe is crucial. The third challenge is to ensure that healthcare products, vaccines, and drugs are delivered safely to the people in need while the respective supply chains operate smoothly. Designing such a logistics system is therefore an important research topic that should be explored in more depth in the future. Different vaccine techniques have not been able to control COVID-19 because of its variants, which have been identified from 2019 onwards. This study uses machine learning techniques to analyze how the COVID-19 variants change over time, as well as their effect on humans.
In the healthcare sector, machine learning algorithms have been implemented in many applications; for example, they can automate time-consuming and complex tasks. Today, artificial intelligence-based algorithms benefit from faster processing power and are used in a variety of health care fields, including drug discovery, disease prediction, and improved therapeutic diagnosis. We applied machine learning algorithms to predict further mutation situations and vulnerability statistics based on the COVID-19 variant dataset, which contains 98232 instances from 120 nations. Our goal was to analyze the data with classification models and predict further mutations and their effects. The J48 classifier was chosen because it performs well on large datasets with a variety of attributes. Moreover, the J48 algorithm processed the data without obscuring the selected attributes. Performance metrics were computed after attribute selection, parameter tuning, and calibration, which is a standard process for evaluating algorithms. J48 correctly classified 8979 instances (9.1406%) and incorrectly classified 89253 instances (90.8594%), for a total of 98232 instances. According to further statistical analysis, the kappa statistic was 0.0519, the mean absolute error was 0.0769, the root mean squared error was 0.01961, the relative absolute error was 96.2989%, and the root relative squared error was 98.132%.
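The reported percentages follow directly from the two instance counts, which sum to 98232. The short check below reproduces them; the function name is hypothetical.

```python
def split_percentages(correct, incorrect):
    """Return the total and the percentage of correct and incorrect instances."""
    total = correct + incorrect
    return total, round(100 * correct / total, 4), round(100 * incorrect / total, 4)

# Counts reported for the J48 classifier.
print(split_percentages(8979, 89253))  # (98232, 9.1406, 90.8594)
```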
The purpose of this work is to predict the chances of further mutation and the associated vulnerability. We evaluated the datasets with different performance measures using machine learning. All data were preprocessed and used for test prediction. The highlights of the resulting output are given below.
In this study, COVID-19 variants were classified using the J48 and Linear Regression models on different data sets, and the results were compared both to arrive at a feasible solution and to cross-verify the predicted result. These algorithms were applied to predict the COVID-19 mutation situation. The data set details are mentioned in
The output of the Linear regression classifier result is shown in
We then applied the J48 classifier to cross-verify the linear regression result and to predict any further mutations. The J48 tree is used in this paper to decide the target value based on the time taken to build the model, the correctly and incorrectly classified instances, and the accuracy of all classifiers.
After running the J48 classifier, we analyzed its output. The output reports several statistics based on a 90% percentage split, with a prediction made for each instance of the dataset.
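A percentage split trains the model on the stated percentage of the instances and tests it on the remainder. The sketch below shows the idea for a 90% split; the shuffling, the seed, and the rounding rule are assumptions made for illustration and are not taken from the Weka run described here.

```python
import random

def percentage_split(instances, train_percent=90.0, seed=1):
    """Shuffle the instances and split them into training and test parts."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]

train, test = percentage_split(range(100))
print(len(train), len(test))  # 90 10
```

Only the held-out part is used to compute the evaluation statistics, so every test instance is unseen during training.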
We performed experiments on different numbers of selected attributes of the COVID-19 mutation dataset. In addition, we verified the results using various preprocessing setups, both supervised and unsupervised, with various metrics. We then applied different classifiers using supervised learning, unsupervised learning, cross-validation, and percentage split.
Healthcare data has expanded rapidly in recent years, and machine learning makes it possible to analyze massive amounts of data quickly. This creates an opportunity to apply machine learning models to the care of individual patients in medical practice. This new type of coronavirus, which largely evades immunity, is spreading at an alarming rate. Within just two weeks, between December 25, 2021 and January 7, 2022, it affected 24,151,332 people worldwide.
| Period | No. of deaths | Total deaths | Average (monthly) |
| --- | --- | --- | --- |
| Up to 31 January 2020 | 266 | 266 | - |
| 01-2-2020 to 30-6-2020 | 573125 | 573391 | 95565 |
| 01-7-2020 to 31-12-2020 | 1368371 | 1941762 | 228061 |
| 01-1-2021 to 30-6-2021 | 2045731 | 3987493 | 340955 |
| 01-7-2021 to 31-12-2021 | 1465986 | 5453479 | 244331 |
| 01-01-2022 to 31-3-2022 | 708524 | 6162003 | 236174 |
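The cumulative totals and monthly averages in the table can be recomputed from the per-period death counts. The sketch below does so, assuming six-month periods for the middle rows and a three-month final period; the shortened period labels and the use of integer division (which matches the truncated averages in the table) are assumptions of this sketch.

```python
# (period label, deaths in period, months in period or None for the initial row)
PERIODS = [
    ("Up to 31 January 2020", 266, None),
    ("February to June 2020", 573125, 6),
    ("July to December 2020", 1368371, 6),
    ("January to June 2021", 2045731, 6),
    ("July to December 2021", 1465986, 6),
    ("January to March 2022", 708524, 3),
]

def summarize(periods):
    """Return (label, deaths, cumulative total, truncated monthly average) rows."""
    rows, running = [], 0
    for label, deaths, months in periods:
        running += deaths
        monthly = deaths // months if months else None
        rows.append((label, deaths, running, monthly))
    return rows

for row in summarize(PERIODS):
    print(row)
```

The recomputed cumulative totals agree with the table. For the first averaged period the sketch gives 95520, whereas the tabulated 95565 appears to correspond to the cumulative total 573391 divided by six.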
COVID-19 prediction models are rapidly entering the academic literature to support medical decision making at a time when they are urgently needed. Using unreliable predictions to guide clinical decisions could cause more harm than good, which is why methodological guidance should be followed. Predicting COVID-19 outcomes is not always as simple as a binary classification, and the complexity of the data must be dealt with. Some reviews excluded data from earlier studies from their analysis because participants had neither recovered nor died within the study period.
The present study has several limitations that need to be addressed. First, the information was collected from the WHO at a time when the different variants of COVID-19 were still novel. SARS-CoV-2 is still causing a lot of discussion and confusion, and the World Health Organization itself has been unable to identify how and where the SARS-CoV-2 virus originated. Second, the Weka data mining tool was used to predict further mutation and vulnerability, but the results differ from those of clinical and laboratory-based experiments. Scientists, researchers, and clinicians would be able to produce better outcomes by using clinically collected datasets, rather than the publicly available datasets, to make therapeutic decisions. Our experiment relies entirely on computer-based evaluation using the J48 and linear regression algorithms. We used a large dataset of 98232 instances with six attributes, which suggests that a machine learning algorithm may be able to provide good results. However, such a large amount of data is difficult to validate. We treated a large number of records as the sample, even though most of them are not statistically significant because they are repeated in the dataset. Because the dataset is large, the estimates of the prediction results may be nearly unbiased. Finally, we used a classification approach with automatic variable integration, but a deep learning approach might have improved prediction.
The purpose of this paper is to predict further mutations of COVID-19 and determine the associated vulnerability. To verify and validate the prediction results, we used the J48 algorithm and the Linear Regression algorithm. After running these algorithms in the Weka data mining tool, we analyzed their results based on the accuracy they provided. The J48 and linear regression algorithms were used to analyze the COVID-19 variant data set on the basis of the given attribute values, model building time, mean absolute error, and ROC area.
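As a reminder of what the ROC area measures, the sketch below computes it for a binary problem as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, counting ties as one half. This is a simplified illustration: for a multi-class problem such as variant classification, Weka reports the ROC area per class together with a weighted average, which this sketch does not reproduce.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.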
The results show that the J48 algorithm achieved a ROC area of 0.591 with a mean region size of 83.7053. In contrast, linear regression gave a correlation coefficient of 0.27, and its root mean squared error was greater than 1. Based on this prediction, we concluded that the likelihood of further mutation is 73% (1 - 0.27 = 0.73). The mutation is likely to spread to more than 83% of the countries (a mean region size of 83.7053), and there are some favorable results regarding death rates. According to the WHO report, the total number of deaths as of 31/3/2022 was 6162003, and the report shows that the number of deaths is falling. The highest number of deaths, 2045731, occurred between January 2021 and June 2021, a monthly average of 340955. There were 212542 deaths recorded in December 2021 and 162907 in March 2022, showing a lower casualty rate than in previous years. Our prediction is based only on the attributes in the WHO database, as evaluated by the prediction algorithms. We did not evaluate the data on a clinical basis, and some ambiguity may arise from computer-based data analysis. On this basis, the machine learning prediction algorithm predicts that a further mutation is likely to cause fewer casualties.
In the future, this research should lead to more clinically oriented experiments that optimize the predictive performance of these classifiers for COVID-19 related diagnoses using other feature selection algorithms and optimization techniques. In addition, it may give physicians a better understanding of real-world clinical practice, which would let them better identify the vulnerabilities of novel coronavirus diagnoses and preventative measures.
This research is supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R194), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.