Covid-19 analysis version 5
Supporting different data sources
On December 14th the ECDC decided to stop providing daily data on new infections and deaths for the countries of the world. Instead they came up with weekly reports that are published on Fridays starting with December 18th.
For people like us, who had been using the ECDC data for almost 10 months, this decision added a burden. Changing a data source on the fly has critical implications: all historical data have to be recalculated so that no calculation ‘breaks’ at the date of the switch.
Luckily, we had already started redesigning our CovidCases class in early December to support different data sources, such as county or city data from different countries. We quickly extended this work to support different sources of world data. The sources we implemented are the WHO (World Health Organization) and OWID (Our World in Data), an open-source platform that collects a huge variety of numbers in many different areas. OWID uses the data from Johns Hopkins University in Baltimore, USA.
Of course, changing the data source requires a comparison of the different sources. This was a big first step (with interesting results), and you can read about it here.
The following is an overview of the changes in version 5. For each change we have a separate post providing the details.
New version of the CovidCases class
The major change is that the class is now an abstract class (in a C++ analogy, it contains some pure virtual functions) that acts as the parent of the sub-classes. The sub-classes are responsible for downloading the data, while the base class provides the functions to process it. See the documentation here for all details.
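To illustrate the split between loading and processing, here is a minimal sketch using Python's abc module. It is not the actual CovidCases code: the names CovidCasesSketch, InMemorySource, load_rows and total_cases, as well as the row layout, are assumptions made up for this example.

```python
from abc import ABC, abstractmethod


class CovidCasesSketch(ABC):
    """Hypothetical base class: processing lives here, loading in sub-classes."""

    def __init__(self):
        # Each row is (geo_id, date, new_cases); this layout is an assumption.
        self.rows = self.load_rows()

    @abstractmethod
    def load_rows(self):
        """Return a list of (geo_id, date, new_cases) tuples."""

    def total_cases(self, geo_id):
        # Shared processing that works the same for every data source.
        return sum(cases for gid, _, cases in self.rows if gid == geo_id)


class InMemorySource(CovidCasesSketch):
    """Stand-in for a real sub-class: returns fixed rows instead of downloading."""

    def load_rows(self):
        return [
            ("DE", "2020-12-14", 100),
            ("DE", "2020-12-15", 150),
            ("FR", "2020-12-14", 80),
        ]


source = InMemorySource()
print(source.total_cases("DE"))  # 250
```

Because load_rows is abstract, Python refuses to instantiate CovidCasesSketch directly, much as a C++ class with a pure virtual function cannot be instantiated. A new data source then only has to provide its own load_rows.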
New sub-classes CovidCasesECDC, CovidCasesWHO and CovidCasesOWID
These three new classes implement the download and preparation of the data from the ECDC, the WHO and OWID. Any one of them can be used as a template for adding new classes, such as one for the Worldometer data.
Click here to read more about the sub-classes.
New GeoInformationWorld class
The ECDC used ISO 3166 alpha-2 strings as country codes, but its list deviates from the standard in two places: it used UK instead of GB for the United Kingdom and EL instead of GR for Greece. An alpha-2 country code uses two letters, such as DE, US or JP, to identify a country.
At the same time, the OWID platform uses ISO 3166 alpha-3 strings such as DEU, USA or JPN. The population data differ as well: the ECDC and the WHO use the population data of 2019, while OWID doesn’t mention the source and year of its population data. Even the continents are not treated consistently: OWID splits America into North America and South America, while the WHO uses regions such as South-East Asia. Of course, this makes it difficult to compare continent numbers. The same inconsistency occurs with country names: a country might appear as Vietnam or as Viet Nam.
Therefore we introduced this class, which contains the alpha-2 and alpha-3 country codes as well as the country names (based on the WHO) and population values (based on 2019). Read about the GeoInformationWorld class here.
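The kind of normalization such a class has to perform can be sketched as follows. This is not the GeoInformationWorld implementation: the table COUNTRIES, the mapping ECDC_FIXES and the function to_alpha2 are hypothetical names, the table holds only a few illustrative entries, and population values are left out on purpose.

```python
# Hypothetical lookup table keyed by ISO 3166 alpha-2 code (illustrative subset).
COUNTRIES = {
    "DE": {"alpha3": "DEU", "name": "Germany"},
    "GB": {"alpha3": "GBR", "name": "United Kingdom"},
    "GR": {"alpha3": "GRC", "name": "Greece"},
    "VN": {"alpha3": "VNM", "name": "Viet Nam"},
}

# Non-standard codes used by the ECDC, mapped to their ISO equivalents.
ECDC_FIXES = {"UK": "GB", "EL": "GR"}

# Reverse index so that alpha-3 codes resolve to alpha-2 codes.
ALPHA3_TO_ALPHA2 = {info["alpha3"]: a2 for a2, info in COUNTRIES.items()}


def to_alpha2(code):
    """Normalize an alpha-2, ECDC-style or alpha-3 code to ISO alpha-2."""
    code = code.strip().upper()
    code = ECDC_FIXES.get(code, code)
    if code in COUNTRIES:
        return code
    if code in ALPHA3_TO_ALPHA2:
        return ALPHA3_TO_ALPHA2[code]
    raise KeyError(f"unknown country code: {code}")


print(to_alpha2("UK"))   # GB
print(to_alpha2("DEU"))  # DE
```

Once every source is mapped onto the same alpha-2 key, names and population values can be taken from a single table, so that numbers from the ECDC, the WHO and OWID become comparable.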
Other changes include the REST API, which now makes use of the WHO data, and the CovidClassSnippet, which shows you how to use the different data sources. The CovidDataClass Jupyter Notebook and the PlotterBuilder and CovidMap modules reflect the changes in the classes as well. You may want to read about the changes in the REST API here.
All updated source code can be downloaded from our Covid-19 analysis repository on GitHub.