Big Data Analytics and AI in the War Against COVID-19

The 13-year, US$3 billion Human Genome Project, an international scientific project to determine the base pairs which make up human DNA (deoxyribonucleic acid) and to map all the genes of the human genome, was completed on 14 April 2003. It provided biotechnologists with valuable resources to help them understand diseases, including the genotyping of specific viruses to direct appropriate treatment, as well as many other applications.

Despite some remaining challenges, big data analytics and artificial intelligence (AI), coupled with the proliferation of more affordable computing hardware, have since reduced the cost of sequencing a genome to a few hundred dollars and cut the turnaround time to a few days.

Right now, scientists around the world are using such tools in their rush to find a vaccine or an anti-viral against the deadly Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the virus which has caused the coronavirus disease (COVID-19) pandemic. At the time of writing, it has infected 1,272,737 known persons and resulted in 69,418 deaths worldwide, including 3,662 known cases and 61 deaths in Malaysia.

The SARS-CoV-2 virus

Before we go any further, we need to understand how a virus infects human cells and how it can be stopped.


Bacteria are complete, microscopic, single-celled living organisms which, like the amoeba we learned about back in school, can replicate by splitting into 2, 4, 8, 16 and so on. A virus, on the other hand, consists of genetic material at its core enclosed within a layer of proteins. Since viruses cannot replicate independently, they need to hijack complete living cells, such as human cells. The virus releases its genetic material into the cell, and this material re-purposes the cell to make hundreds or even thousands of copies of the virus. At some point the newly created viruses exit the cell, sometimes causing it to self-destruct, and go on to infect other healthy cells. The process repeats unless checked by our immune system or a suitable medicine.

This process is similar to how computer viruses, i.e. fragments of binary code created by people, hijack the operating system or software in a computer to make the computer do what they want, and replicate themselves by infecting the operating systems or software of other computers. Computers can be infected if they either do not have antivirus software installed, or have it installed but lack the latest virus signatures, whether because these have not been updated or are not yet available.

Some biological viruses have a fragment of DNA at their core, but SARS-CoV-2, shown in the infographic above, has a coil of RNA (ribonucleic acid) at its core, enclosed within a membrane of lipids (a fatty substance) from which glycoprotein spikes stick out. It is the molecules at the tips of these spikes which enable the virus to attach to receptors on the surface of human lung cells and hijack those cells to multiply itself.

This lipid membrane is also why washing hands well with soap and water is effective against SARS-CoV-2: soap attaches itself to the lipids in the membrane and carries them away, destroying the virus in the process.

Since SARS-CoV-2 is a relatively new SARS virus strain, our bodies are not yet able to recognise it as an undesirable alien and deny it the opportunity to attach to and infect our cells, nor do we yet have the antibodies which can neutralise or destroy it.


One solution is to develop a vaccine, which often consists of weakened or inactivated SARS-CoV-2 or similar viruses which are injected into healthy people, so that their immune system responds and develops the ability to recognise SARS-CoV-2 viruses which may get into their bodies sometime later, and to reject or destroy them. In computer terms, vaccination enables our immune system and cells to recognise the virus signature as malevolent and block or remove it.

A preferable solution, especially for already infected patients, is an anti-viral drug which can either prevent the virus from hijacking our cells or destroy the virus.

In computer terms, an anti-viral is like a virus removal program which scans your computer’s memory and hard disk and cleans away any viruses or malware it finds.

Back to the SARS-CoV-2 virus: developing a suitable vaccine and getting it approved for use to immunise healthy humans against the virus can take up to 18 months or even more.

The second solution, which doctors and scientists are now researching, is to develop an anti-viral to cure patients suffering from COVID-19, though such a new anti-viral will also have to go through a rather long and rigorous testing and approval process for use on humans.

Alternatively, researchers are also trying to identify existing approved drugs for earlier viral or other kinds of diseases which will either cure COVID-19 patients or at least relieve some of their worst symptoms and help them recover, and this is where the use of bioinformatics and big data analytics comes into play.


Genetic Engineering & Biotechnology News (GEN) of 5 February 2020 reported that researchers at South Korea-based Deargen and Dankook University in collaboration with researchers at Emory University in the U.S., have published a prediction model for antiviral drugs that may be effective on SARS-CoV-2, on the bioRxiv preprint server in an article – “Predicting commercially available antiviral drugs that may act on the novel coronavirus (2019-nCov), Wuhan, China, through a drug-target interaction deep learning model”.

“It was purely out of scientific curiosity that we wanted to look at whether our AI model can suggest any drug that could be used against SARS-CoV-2,” said Keunsoo Kang, PhD, assistant professor at Dankook University and senior author on the paper. He described this as a “drug re-purposing” approach, which applies existing anti-virals to another virus, so “only those anti-viral drugs that are available on the market were presented from the raw results”.

The team used their pre-trained deep learning-based drug-target interaction model, Molecule Transformer-Drug Target Interaction (MT-DTI), to identify commercially available drugs that could act on viral proteins of SARS-CoV-2. MT-DTI is a self-attention-based deep learning model designed for predicting an affinity score between a drug and a protein.
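To make the idea of ranking drugs by a predicted affinity score more concrete, here is a minimal sketch in Python. It is not MT-DTI and does not reproduce the team's code: `rank_drugs` and `toy_affinity` are hypothetical names, the drug and protein strings are made-up placeholders, and the scoring function is a deliberately crude stand-in with no chemical meaning. It also assumes that a higher score means stronger predicted binding. Only the overall shape of the screen is illustrated, i.e. score every drug against a target protein and keep the top few.

```python
from typing import Callable, Dict, List, Tuple


def rank_drugs(
    drugs: Dict[str, str],
    target_sequence: str,
    predict_affinity: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score each drug against one target and return the top_k by score.

    drugs maps a drug name to a text encoding of its molecule.
    predict_affinity is any model taking (molecule, protein) and
    returning a score; higher is assumed to mean stronger binding.
    """
    scored = [
        (name, predict_affinity(molecule, target_sequence))
        for name, molecule in drugs.items()
    ]
    # Strongest predicted binders first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_affinity(molecule: str, protein: str) -> float:
    """Placeholder scorer, NOT a real model: the fraction of distinct
    characters in the molecule string that also occur in the protein."""
    mol_chars = set(molecule.upper())
    shared = mol_chars & set(protein.upper())
    return len(shared) / max(len(mol_chars), 1)


if __name__ == "__main__":
    # Entirely made-up inputs, for illustration only.
    candidates = {"drug_a": "CCO", "drug_b": "NNN", "drug_c": "XYZ"}
    for name, score in rank_drugs(candidates, "MNCK", toy_affinity, top_k=2):
        print(f"{name}: {score:.2f}")
```

In a real screen, the placeholder scorer would be replaced by a trained drug-target interaction model, and the candidate list would be limited to drugs already on the market, as Kang describes.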

The result showed that atazanavir, an antiretroviral medication used to treat and prevent the human immunodeficiency virus (HIV), is the most promising chemical compound.

Kang speculated that the high antiviral effects of atazanavir “may be explained through the MT-DTI results showing the highest inhibitory potency on the viral proteinase.” He added that the team found it surprising that Deargen’s AI-based prediction supports previous research.

Meanwhile, researchers from Army Medical University in Chongqing, China, posted a bioRxiv preprint titled, “Therapeutic Drugs Targeting 2019-nCov Main Protease by High-Throughput Screening.” Using high-throughput screening based on 8,000 clinical drug libraries, they identified four small molecular drugs that bind the SARS-CoV-2 main protease. The authors noted that these drugs have been proven to be safe and, therefore, may be promising candidates to roll out in the current outbreak.

Also, Kang and his team found that several antiviral agents, such as Kaletra (a lopinavir/ritonavir combination) could be used for the treatment of SARS-CoV-2. Overall, the authors suggest that the list of antiviral drugs identified by the MT-DTI model should be considered when establishing effective treatment strategies for SARS-CoV-2.

Whilst Deargen has no plans to develop an anti-viral against SARS-CoV-2 as yet, the company is open to considering development in the future. Deargen has a reinforcement learning-based molecule optimisation/generation AI model named “molecule equalizer (MolEQ)”, so if the company decides to go ahead with development, the team could generate putative molecules as candidates that are predicted to bind strongly to target proteins.

One problem, however, is that regulations in circumstances like the SARS-CoV-2 outbreak are unclear. Considering the urgency of the situation, Kang wonders whether a loosening of regulations, combined with extensive support from governments, may help clinicians to prescribe these optional measures more easily to patients. Allowing atazanavir and other top-ranked anti-viral drugs to be used as expanded experimental therapeutic options may be one way to help the patients affected by the SARS-CoV-2 outbreak.

Meanwhile, the U.S.-based Gilead Sciences said it would partner with Chinese health authorities on a randomised, controlled trial designed to assess its antiviral drug candidate remdesivir (GS-5734) as a potential treatment for COVID-19.

Gilead said that it has offered remdesivir, a nucleotide analogue (“Nuc”) inhibitor, for emergency use in a small number of patients with COVID-19 in the absence of any approved treatment options. The company is also expediting appropriate laboratory testing of remdesivir against SARS-CoV-2 samples.

More recently, Pharmaceutical Technology of 18 March 2020 reported that China’s Science and Technology Ministry official Zhang Xinmin said that Japan-based Fujifilm Toyama Chemical’s anti-flu drug Favipiravir (also known as Avigan) helped COVID-19 patients recover.

Favipiravir was approved in Japan in 2014, and in 2016 Japan provided it as emergency aid for the Ebola virus outbreak in Guinea, according to Reuters.

Favipiravir received Chinese approval for manufacturing by Zhejiang Hisun Pharmaceutical to treat adults with new or recurring influenza, according to a filing by the company in February 2020. Also that month, Chinese media reported that the drug had received approval as an investigational therapy for SARS-CoV-2 infection.


Meanwhile, according to Genetic Engineering & Biotechnology News, a number of companies have already reported working on vaccine production, including a collaboration between the mRNA (messenger RNA) company Moderna and the National Institute of Allergy and Infectious Diseases (NIAID), a branch of the Maryland, U.S.-based National Institutes of Health (NIH), an agency under the U.S. Department of Health and Human Services.

In a press release on 23 January 2020, Moderna, a clinical stage biotechnology company pioneering mRNA therapeutics and vaccines, said that the Coalition for Epidemic Preparedness Innovations (CEPI) would fund it to develop and manufacture an mRNA vaccine against 2019-nCoV, as SARS-CoV-2 was then known.

The Vaccine Research Center (VRC) of the NIAID collaborated with Moderna to design the vaccine, and NIAID will conduct Investigational New Drug-enabling studies and a Phase 1 clinical study in the U.S.

Then again, even quick vaccine development may be too slow to catch up with the fast-growing outbreak.

Preventive measures

Meanwhile, big data analytics and AI are also being used in non-medical applications to predict and help prevent COVID-19 outbreaks.

For instance, AI startup company BlueDot has developed intelligent systems that sift through data about people to determine the chances of disease occurrence. Its AI platform is amongst the latest technological advances using data analytics to map and prevent diseases. BlueDot is notable in that it was widely reported to have flagged the outbreak in Wuhan in late December 2019, days before official international alerts were issued.

Geographic Information System (GIS) technology has become an important tool for stopping the spread of the coronavirus, with Johns Hopkins University leading the way in this area. One example is the Johns Hopkins dashboard, which shows all cases of coronavirus around the world.

Data mining is critical for GIS technology to work, because it supplies the information used to detect areas where people are talking about the disease. Social media sites are good information sources for GIS, as it can map the areas of interest where people are talking about the coronavirus. Prevention measures can then be implemented accordingly, since these heatmaps can better track both the location and the spread of a disease.
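The heatmap idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular GIS product works: the function names `grid_cell` and `mention_heatmap`, the keyword list, the one-degree grid size and the sample posts are all hypothetical. The sketch simply counts how many geotagged posts mentioning the disease fall into each latitude/longitude grid cell.

```python
import math
from collections import Counter
from typing import Iterable, Sequence, Tuple

# A post is (latitude, longitude, text); this shape is an assumption.
Post = Tuple[float, float, str]


def grid_cell(lat: float, lon: float, cell_deg: float = 1.0) -> Tuple[int, int]:
    """Map a coordinate to the index of the grid cell that contains it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def mention_heatmap(
    posts: Iterable[Post],
    keywords: Sequence[str] = ("coronavirus", "covid"),
    cell_deg: float = 1.0,
) -> Counter:
    """Count posts per grid cell whose text mentions any keyword."""
    counts: Counter = Counter()
    for lat, lon, text in posts:
        lowered = text.lower()
        if any(keyword in lowered for keyword in keywords):
            counts[grid_cell(lat, lon, cell_deg)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample posts, for illustration only.
    sample = [
        (3.14, 101.69, "Worried about coronavirus in my area"),
        (3.90, 101.20, "COVID-19 cases rising nearby"),
        (1.35, 103.80, "Nice weather today"),
    ]
    for cell, count in mention_heatmap(sample).most_common():
        print(cell, count)
```

Cells with the highest counts are the "hot" areas on the map. A real system would also need to handle duplicate posts, misleading mentions and posts without location data, none of which is attempted here.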

Tracking diseases in this way was practically impossible 10 years ago, but today, with AI, machine learning and GIS, data mining and the extraction of insights are both easier and more powerful at locating viruses, thus enabling quicker prevention response times.

And in China

China relies heavily on AI technologies to deal with the COVID-19 outbreak, with its technology companies developing applications to help people confirm their movements during the period of the outbreak as a safety measure and to avoid further spread. Such applications are being used on trains for passenger screening, where they check the movements and contacts of people.

Technology companies in China are also analysing passenger flight information. In other cases, such as in Guangzhou, AI robots are reminding people to wear masks for disease prevention.

Predictive analytics enables newer approaches to outbreak management by delivering a continuous stream of updates as new data arrives.

Data analytics, AI and machine learning have been pivotal in addressing the coronavirus outbreak. According to the World Health Organisation (WHO), the situation is stabilising despite new infections still being reported and deaths still increasing, though the WHO cautions that more needs to be done to eradicate this outbreak.

China’s government is working round the clock with the WHO and governments worldwide by using technology to fight the outbreak.