Machine Learning-Based Classification of COVID-19 and Influenza Variants Using Nucleotide Frequency Analysis
Thaliah Fauz Ardamayanti (a*), Mohammad Isa Irawan (a), Ridho Nur Rohman Wijaya (a)
a) Department of Mathematics, Institut Teknologi Sepuluh Nopember
Jalan Raya ITS Sukolilo, Surabaya 60111, Indonesia
*thaliahfauz[at]gmail.com
Abstract
This research classifies COVID-19 and influenza variants using machine learning applied to nucleotide frequency analysis. The dataset consists of COVID-19 and influenza DNA sequences. We calculate the nucleotide frequencies of each sequence and use them as features to train machine learning algorithms. The experimental results show that nucleotide frequencies are an effective feature for classifying COVID-19 and influenza variants, with a highest accuracy of 89.99%. This finding has significant implications for developing diagnostic tools and for the epidemiological monitoring of COVID-19. The classification results enable quick and accurate identification, supporting timely clinical decision-making and helping to control the spread of disease. The research highlights the role of machine learning in genomic analysis and the detection of viral variants, with nucleotide frequency proving to be an informative feature for distinguishing between variants.
Keywords: Classification, COVID-19, DNA Sequences, Influenza, Machine Learning, Nucleotide Frequency
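The feature extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how frequencies are normalized, whether longer k-mers are used, or how ambiguous bases are handled, so the function name `nucleotide_frequencies`, the four-letter alphabet, the example sequences, and the choice to ignore symbols such as N are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

# Assumed alphabet: the four unambiguous DNA bases.
NUCLEOTIDES = "ACGT"


def nucleotide_frequencies(sequence, alphabet=NUCLEOTIDES):
    """Return the relative frequency of each base, in alphabet order.

    Symbols outside the alphabet (for example the ambiguity code N)
    are ignored; this is an assumption, not taken from the abstract.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in alphabet)
    if total == 0:
        # No countable bases: return an all-zero feature vector.
        return [0.0] * len(alphabet)
    return [counts[base] / total for base in alphabet]


# Hypothetical toy sequences; real genomes are thousands of bases long.
sequences = ["ATGCGATACG", "AAATTTGGCN"]

# One fixed-length feature vector per sequence.
features = [nucleotide_frequencies(seq) for seq in sequences]
```

Each sequence is reduced to a fixed-length numeric vector regardless of its length, which is what allows a standard classifier to be trained on the resulting feature matrix.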