Developing an Effective Model for COVID-19
As the COVID-19 pandemic raged in early 2020, modeling its spread became critical.
The London School of Hygiene and Tropical Medicine (LSHTM) built a model to track the virus’ reproduction rate around the world. The model was based on data from the European Centre for Disease Prevention and Control.
Presented on their website via a simple dashboard, the model quickly became a valuable tool for policymakers and journalists, as well as a resource for other researchers. As its value became clear, LSHTM wanted to expand the model to provide detail at sub-national levels, including individual regions and states.
Standardizing Disparate Data Sets with AI
To collect data from separate regional repositories, the LSHTM built an R package – a set of computer code – for data collection and preparation. However, this new incoming data was hard to compare, due to inconsistent reporting formats, different systems for case reporting, and changes in testing over time.
Recognizing the importance of getting the fundamentals right, and lacking the capacity to do it themselves, LSHTM looked for help through the Royal Society’s Rapid Assistance in Modelling the Pandemic (RAMP) initiative. This called for volunteers to support modeling the pandemic and guide the UK’s response.
Keen to help in the fight against COVID, we offered our AI and data science expertise pro bono.
Using AI and Data Science to Refine LSHTM’s Model
Tessella helped develop code, built around advanced AI algorithms, to make sense of the different data sources available to the LSHTM.
Our first challenge was to standardize incoming data variables. This was difficult because raw data varied hugely and was often presented in different formats. Some countries published cases and mortality rates only, while others provided more granular detail, such as figures for hospitalizations and patient recovery.
To make things even trickier, the definition of “COVID-related deaths” differed from country to country. Some restricted their figures to deaths caused directly by COVID-19, whereas others included any deaths following a positive COVID diagnosis – regardless of the ultimate cause.
Using our knowledge and experience of data science and AI in life science projects, we worked with the LSHTM to explore how complete the data was and agree on the most useful parameters. This involved a trade-off: a smaller set of parameters for which good data existed around the world, or a larger set with many missing values.
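One way to picture this trade-off is to measure, for each candidate parameter, what share of sources actually report it. The following is a minimal Python sketch of that idea, not the project's code (the package itself is written in R). The records, the field list, and the 0.75 cut-off are all hypothetical values chosen for illustration.

```python
# Hypothetical records from four sources; None marks a value that was not reported.
records = [
    {"region": "A", "cases": 120, "deaths": 3, "recoveries": None, "tests": 900},
    {"region": "B", "cases": 80, "deaths": 1, "recoveries": 40, "tests": None},
    {"region": "C", "cases": 45, "deaths": None, "recoveries": None, "tests": None},
    {"region": "D", "cases": 200, "deaths": 5, "recoveries": 150, "tests": 1500},
]

FIELDS = ["cases", "deaths", "recoveries", "tests"]


def completeness(rows, fields):
    """Return, for each field, the fraction of rows that report a value."""
    return {
        field: sum(row.get(field) is not None for row in rows) / len(rows)
        for field in fields
    }


scores = completeness(records, FIELDS)

# Assumed cut-off: keep a parameter only if at least 75% of sources report it.
THRESHOLD = 0.75
kept = [field for field in FIELDS if scores[field] >= THRESHOLD]

for field in FIELDS:
    print(f"{field}: {scores[field]:.0%} complete")
print("kept:", kept)
```

Raising the threshold yields fewer, better-populated parameters; lowering it admits more parameters at the cost of more gaps.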
Eventually, we agreed on standard parameters (cases, deaths, recoveries, hospitalizations, tests), stratified by date and region. We updated the R package to adhere to these new data standards, which allowed the LSHTM to:
- Classify different data sets
- Account for differences in reporting
- Feed data into the model in a consistent format
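The kind of standardization described above can be sketched as a mapping from each source's own column names and date format onto one shared schema. This Python example is only an illustration under stated assumptions: the real package is written in R, and the source names, column names, and date formats below are invented for the example.

```python
from datetime import date, datetime

# The standard parameters named in the text; records are keyed by date and region.
STANDARD_FIELDS = ("cases", "deaths", "recoveries", "hospitalizations", "tests")

# Hypothetical per-source settings: where the date and region live, how the
# date is written, and which source columns map to which standard field.
SOURCES = {
    "source_a": {
        "date_col": "Date",
        "date_fmt": "%d/%m/%Y",
        "region_col": "Province",
        "columns": {"Confirmed": "cases", "Deceased": "deaths"},
    },
    "source_b": {
        "date_col": "report_date",
        "date_fmt": "%Y-%m-%d",
        "region_col": "state",
        "columns": {
            "new_cases": "cases",
            "new_deaths": "deaths",
            "hosp": "hospitalizations",
            "tested": "tests",
        },
    },
}


def standardize(row, source):
    """Convert one raw row from a named source into the shared schema."""
    cfg = SOURCES[source]
    out = {
        "date": datetime.strptime(row[cfg["date_col"]], cfg["date_fmt"]).date(),
        "region": row[cfg["region_col"]].strip(),
    }
    # Every standard field is always present; unreported values stay None.
    for field in STANDARD_FIELDS:
        out[field] = None
    for source_col, field in cfg["columns"].items():
        value = row.get(source_col)
        if value not in (None, ""):
            out[field] = int(value)
    return out


row_a = standardize(
    {"Date": "05/04/2020", "Province": " Region X ", "Confirmed": "1337", "Deceased": "25"},
    "source_a",
)
row_b = standardize(
    {"report_date": "2020-04-05", "state": "Region Y", "new_cases": "10",
     "new_deaths": "", "hosp": "2", "tested": "100"},
    "source_b",
)
```

Both rows come out with the same keys, so downstream code can treat them identically. Fields a source never reports are kept as explicit missing values rather than dropped.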
Finally, we leveraged AI technology to make the package more robust and “production-ready” through better tests, documentation, and IT infrastructure.
Providing Expert Insight in Trying Times
The challenges presented by this project are far from uncommon. Although researchers have the data they need to build usable models, they don’t always have the time and technical expertise required to optimize them, especially when complex technologies like AI are needed to process large and disparate volumes of data.
"This is quite a common scenario in academia", says Dr Sebastian Funk, associate professor at the LSHTM. "We can build models that we hope are useful to the world, but there is software engineering work to do around them that takes a lot of expert time, which we don’t always have enough of".
It was invaluable to have Tessella who understood the challenge and could just step in and take a lot of that work on. We were fortunate that in this difficult time, experts like Tessella were offering support for free. It’s just a shame we couldn’t keep them to support the work on an ongoing basis.
Dr Sebastian Funk
Associate Professor, LSHTM
This isn’t the first time our AI and data science expertise has been sought in connection with a large-scale healthcare project, and that experience gave us a better understanding of the challenges facing the LSHTM.
We often see these challenges where complex and disparate datasets are involved. Modelers don’t always have the capacity to find all the data they need, ensure it’s collected and curated in appropriate ways, check code, and integrate the model with backend IT or online user interfaces. These are all areas that Tessella can support.
Computing Consultant, Tessella
The open-source code is now available as a community resource. The LSHTM and others, including Médecins Sans Frontières, are currently using it in their research.