New and emerging variants of the SARS-CoV-2 virus continue to pose a threat to the health of populations across the globe. As of January 2022, more than 6,000 mutations had been reported in the spike gene of SARS-CoV-2.
Early prediction of the emergence of new strains is critical for pandemic preparedness.
Most of the currently available predictive models are based on the reported infections and deaths.
Researchers have now developed the Strainflow model, a supervised predictive model that uses features of SARS-CoV-2 genome sequences.
Strainflow Model
Earlier models do not incorporate features from the virus sequences in a predictive manner.
Strainflow plugs this gap by taking a sequence-driven approach to predicting future surges, using a novel artificial intelligence pipeline.
The study was based on a simple hypothesis: virus sequences can be treated as documents that natural language understanding (NLU) models can read like a book. Further, the models can discover the underlying “grammar” patterns which are causally predictive of future surges.
Thus, Strainflow is a genomic surveillance model for SARS-CoV-2 genome sequences.
Here, each sequence is treated as a document whose words are codons, and the skip-gram algorithm is used to learn the codon context of 0.9 million spike genes.
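The idea of treating codons as words can be illustrated with a small sketch. This is not the Strainflow code; it only assumes that a sequence is read in non-overlapping triplets from the first base and that skip-gram training pairs are formed from each codon and its neighbours within a fixed window. The function names, the window size and the example sequence are all hypothetical.

```python
from typing import List, Tuple


def to_codons(seq: str) -> List[str]:
    """Split a nucleotide sequence into non-overlapping 3-letter codons.

    Assumes reading frame 0; trailing bases that do not fill a codon are dropped.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (target, context) pairs: each token paired with its neighbours
    up to `window` positions away on either side."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


# Made-up toy sequence, for illustration only.
codons = to_codons("ATGTTTGTTTTTCT")
pairs = skipgram_pairs(codons, window=1)
print(codons)
print(pairs)
```

A skip-gram model is trained to predict the context codon from the target codon in pairs like these, and the weights it learns along the way become the codon embeddings.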
The team experimented with several NLU models optimised for efficiently learning the “grammar of the spike gene”.
The best model compressed the viral sequences into 36 dimensions. Each of these 36 dimensions is a different mix of codon-level relationships. Some of these 36 mixtures may encode the patterns that make the virus spread faster.
Time series analysis of the information in these dimensions shows a leading relationship with monthly COVID-19 cases in seven countries (including the USA, Japan and India).
Machine learning models built on these features could help develop an epidemiological early warning system for COVID-19 caseloads.
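What a "leading relationship" means can be shown with a lagged correlation: a feature leads case counts if its value in month t correlates with cases in month t + lag. The sketch below is an illustration under that assumption, not the analysis used in the study. The helper names are hypothetical and the numbers are invented toy data, constructed so that cases follow the feature by two months.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def lagged_correlation(feature: Sequence[float], cases: Sequence[float], lag: int) -> float:
    """Correlate feature[t] with cases[t + lag] for a non-negative lag in months."""
    if len(feature) != len(cases):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(feature) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    if lag == 0:
        return pearson(feature, cases)
    return pearson(feature[:-lag], cases[lag:])


# Invented toy series (not study data): cases[t + 2] == 10 * feature[t].
feature = [1, 3, 2, 5, 4, 6, 8, 7]
cases = [5, 5, 10, 30, 20, 50, 40, 60]

best_lag = max(range(4), key=lambda k: lagged_correlation(feature, cases, k))
print(best_lag, lagged_correlation(feature, cases, best_lag))
```

A high correlation at a positive lag only suggests that the feature moves ahead of the case counts; it does not by itself show causation.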