The CovidCases analysis class
CovidCases class documentation
The class is under continuous development, and we keep adding functions. Its usage is simple: first download the current data, then call methods to access the data for a given list of countries. The functions are described in detail below:
@staticmethod
def download_CSV_file():
Automatically downloads the database file from the ECDC if it doesn't exist yet. Call it inside a try-except block, as it may raise a FileNotFoundError or an IOError exception. The function checks whether the file YYYY-MM-DD-db.csv already exists in the covid-19-analysis/data folder. If so, it returns the fully qualified filename. If not, it connects to the ECDC server, downloads the file and returns its filename as well.
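The check-then-download behaviour can be sketched in plain Python. This is a minimal sketch, not the class's code. `cached_csv_path` and the `fetch` callback are hypothetical names. `fetch` stands in for the real ECDC download so that the example runs without network access. With the real class, the same try-except would wrap the call to CovidCases.download_CSV_file().

```python
import datetime
import os
import tempfile


def cached_csv_path(data_dir, fetch, today=None):
    """Return the path of today's YYYY-MM-DD-db.csv, downloading it only if missing.

    `fetch` is a hypothetical callback that writes the file; it replaces the
    real ECDC download in this sketch.
    """
    today = today or datetime.date.today()
    filename = os.path.abspath(os.path.join(data_dir, f"{today:%Y-%m-%d}-db.csv"))
    if not os.path.exists(filename):
        fetch(filename)  # may raise FileNotFoundError or IOError, like a real download
    return filename


def fake_fetch(path):
    # Stand-in for the download: writes dummy CSV content.
    with open(path, "w") as f:
        f.write("geoId,cases\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        try:
            path = cached_csv_path(data_dir, fake_fetch)
            print("using", path)
        except (FileNotFoundError, IOError) as err:
            print("download failed:", err)
```

Calling the function twice on the same day touches the network hook only once, because the second call finds the date-stamped file.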
def __init__(self, filename):
The constructor takes the fully qualified name of the CSV file downloaded from the ECDC. The class keeps a copy of all data in the file.
def get_country_data_by_geoid_list(self, geoIDs, lastNdays=0, sinceNcases=0):
Returns a Pandas DataFrame for a list of strings containing the GeoIDs of countries, such as ['DE', 'UK']. Here you will find a list of GeoIDs and countries. Optional parameters are:
- lastNdays: returns only the data of the last n days.
- sinceNcases: returns, per country, only the data from the point at which the number of cases exceeded n.
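The two optional parameters are easiest to see in a pandas sketch. The sketch assumes a long-format DataFrame with a `Date` column, which is an assumption because the class may use the date as the index. It also assumes the GeoID and Cases columns described in the table below. `filter_countries` is a hypothetical name. Applying sinceNcases before lastNdays when both are given is our choice, not documented behaviour.

```python
import pandas as pd


def filter_countries(df, geoIDs, lastNdays=0, sinceNcases=0):
    """Hypothetical re-implementation of the documented filters on a long-format df."""
    parts = []
    for geo_id in geoIDs:
        country = df[df["GeoID"] == geo_id].sort_values("Date")
        if sinceNcases > 0:
            # cummax keeps every row after the threshold was first exceeded,
            # even if a later data correction lowers the cumulative count again
            country = country[country["Cases"].cummax() > sinceNcases]
        if lastNdays > 0:
            country = country.tail(lastNdays)
        parts.append(country)
    return pd.concat(parts, ignore_index=True)
```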
Each row of the returned DataFrame corresponds to a date, and the columns contain the following data:
Column | Description |
---|---|
Country | The name of the country. |
GeoID | The GeoID of the country. Refer to this post to get a list of GeoIDs and country names. |
Population | The population of the country based on 2019 data. |
Continent | The continent of the country. |
Cases | The cumulative number of confirmed infections (here called cases) since December 31, 2019, as published by the ECDC. |
DailyCases | The daily number of confirmed cases. |
Deaths | The cumulative number of deaths among confirmed cases. |
DailyDeaths | The daily number of deaths among confirmed cases. |
PercentDeaths | The percentage of confirmed cases that died, also called the Case-Fatality-Rate (CFR). The CFR serves as an estimate of the Infection-Fatality-Rate (IFR), which also includes unconfirmed (hidden or dark) infections. |
DoublingTime | The time in days after which the number of Cases doubles. |
CasesPerMillionPopulation | The number of Cases divided by the population in millions. |
DeathsPerMillionPopulation | The number of Deaths divided by the population in millions. |
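The ratio columns follow directly from Cases, Deaths and Population. The minimal sketch below recomputes them, and the function name is hypothetical. DoublingTime is left out because the text does not say how the class computes it.

```python
import pandas as pd


def add_ratio_columns(df):
    """Recompute PercentDeaths and the per-million columns from the raw counts."""
    out = df.copy()
    population_in_millions = out["Population"] / 1_000_000
    # a country with 0 cases (and 0 deaths) gets NaN here instead of an exception
    out["PercentDeaths"] = 100 * out["Deaths"] / out["Cases"]
    out["CasesPerMillionPopulation"] = out["Cases"] / population_in_millions
    out["DeathsPerMillionPopulation"] = out["Deaths"] / population_in_millions
    return out
```

For example, a country of 80 million people with 1000 cases and 20 deaths gets PercentDeaths 2.0, CasesPerMillionPopulation 12.5 and DeathsPerMillionPopulation 0.25.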
After calling add_r0, add_incidence_7day_per_100Kpopulation or add_lowpass_filter_for_attribute (see the function definitions below), you will find additional attributes such as:
Column | Description |
---|---|
R | An estimate of the reproduction number R0. The values are low-pass filtered with a kernel size of 1x7. |
Incidence7DayPer100Kpopulation | The accumulated 7-day incidence, that is, the sum of the daily cases of the last 7 days divided by the population in units of 100,000 people. |
DailyCases7 | Added by calling add_lowpass_filter_for_attribute with the attribute name DailyCases and a filter size of 7. It holds the average number of DailyCases over the last 7 days. Any attribute in the list above can be filtered with any filter size. |
def get_country_data_by_geoid_string_list(self, geoIDstringList, lastNdays=0, sinceNcases=0):
Same as the function above, but the GeoIDs are given as a comma-separated string such as "DE, UK".
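Turning the string form into the list form comes down to splitting on commas and trimming blanks. The sketch below uses a hypothetical helper name. How the real method treats stray spaces or empty entries is not documented, so the handling below is an assumption.

```python
def parse_geoid_string(geoIDstringList):
    """Split a comma-separated string such as "DE, UK" into ['DE', 'UK'].

    Surrounding blanks are stripped and empty entries (e.g. from a trailing
    comma) are dropped; both are assumptions about the real parsing.
    """
    codes = (part.strip() for part in geoIDstringList.split(","))
    return [code for code in codes if code]


print(parse_geoid_string("DE, UK"))  # ['DE', 'UK']
```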
def get_all_country_data(self, lastNdays=0, sinceNcases=0):
Works like the two functions above, but returns a DataFrame for all countries in the CSV file. Note that this call may take some time to return.
def save_df_to_csv(self, df, filename):
Saves the given DataFrame df to a CSV file. The file contains all columns of the DataFrame, including those added by the functions below.
def add_r0(self, df):
Adds an attribute named 'R' to the given DataFrame df that holds an estimate of the reproduction number R0 for each country. The values in the returned DataFrame are low-pass filtered with a kernel size of 1x7. If the attribute already exists in df, the function returns the given df.
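The text does not say which estimator add_r0 uses. The sketch below is an illustration only. It uses a common simple approach that assumes a generation time of about 4 days: R on a given day is the sum of the daily cases of the last 4 days divided by the sum of the 4 days before. The result is then smoothed with a trailing 7-day mean to mirror the documented 1x7 kernel. The column names Date, GeoID and DailyCases, the function name and its extra parameters are all assumptions.

```python
import pandas as pd


def add_r_estimate(df, generation_days=4, kernel=7):
    """Assumed estimator, not necessarily the one used by add_r0.

    R(t) = sum of DailyCases over the last `generation_days` days divided by
    the same sum `generation_days` days earlier, then smoothed with a
    trailing mean over `kernel` days (the 1x7 low-pass from the text).
    """
    if "R" in df.columns:
        return df  # documented behaviour: keep an existing R attribute
    out = df.sort_values(["GeoID", "Date"]).copy()

    def per_country(daily):
        recent = daily.rolling(generation_days).sum()
        previous = recent.shift(generation_days)
        ratio = recent / previous  # NaN until two full windows exist
        # min_periods=1 averages whatever non-NaN ratios fall into the window
        return ratio.rolling(kernel, min_periods=1).mean()

    out["R"] = out.groupby("GeoID")["DailyCases"].transform(per_country)
    return out
```

With constant daily cases the estimate settles at 1. If the daily cases double every 4 days, it settles at 2.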
def add_incidence_7day_per_100Kpopulation(self, df):
Adds an attribute to df that holds the accumulated 7-day incidence of each country, that is, the sum of the daily cases of the last 7 days divided by the population in units of 100,000 people. If the attribute already exists, the function returns the given df.
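The incidence attribute is a rolling 7-day sum per country, scaled by the population. The sketch assumes Date, GeoID, DailyCases and Population columns and uses a hypothetical function name. Days before the first full 7-day window are left NaN, which may differ from what the class does.

```python
import pandas as pd


def add_incidence_7day(df):
    """Add Incidence7DayPer100Kpopulation: 7-day case sum per 100,000 people."""
    column = "Incidence7DayPer100Kpopulation"
    if column in df.columns:
        return df  # documented behaviour: existing attribute -> return df as given
    out = df.sort_values(["GeoID", "Date"]).copy()
    # rolling sum within each country, so windows never span two countries
    last7 = out.groupby("GeoID")["DailyCases"].transform(lambda s: s.rolling(7).sum())
    out[column] = last7 / (out["Population"] / 100_000)
    return out
```

For example, 100 daily cases for 7 days in a population of 1,000,000 give an incidence of 700 / 10 = 70.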
def add_lowpass_filter_for_attribute(self, df, attribute, n):
Adds an attribute to the given DataFrame df that holds, for each country, the low-pass filtered data of the given attribute (the attribute name is passed as a string). The width of the low-pass filter is given by n. The new attribute is named after the given attribute with n appended. For example, Cases with n = 7 adds a new attribute named Cases7. If the attribute already exists, the function returns the given DataFrame df.
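A trailing moving average per country is one straightforward low-pass filter that matches the description of DailyCases7 as the average over the last 7 days. The sketch assumes Date and GeoID columns and uses a hypothetical function name. The class may use a different kernel or treat the first n-1 days differently (here they are NaN).

```python
import pandas as pd


def add_lowpass(df, attribute, n):
    """Add attribute + str(n) holding a trailing n-day moving average per country."""
    column = f"{attribute}{n}"
    if column in df.columns:
        return df  # documented behaviour: existing attribute -> return df as given
    out = df.sort_values(["GeoID", "Date"]).copy()
    # rolling(n) without min_periods leaves the first n-1 days of each country NaN
    out[column] = out.groupby("GeoID")[attribute].transform(
        lambda s: s.rolling(n).mean()
    )
    return out
```

Calling `add_lowpass(df, "Cases", 7)` would add a column named Cases7, following the naming rule above.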
def get_available_GeoID_list(self):
Returns a DataFrame with just two columns, GeoID and Country. You may want to store the returned DataFrame as a CSV file.
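Extracting the GeoID/Country pairs is essentially a de-duplication. The sketch uses a hypothetical function name and assumes the long-format DataFrame used in the other examples. Sorting by GeoID is an extra choice for readability.

```python
import pandas as pd


def available_geoids(df):
    """Return one row per country with just the GeoID and Country columns."""
    return (
        df[["GeoID", "Country"]]
        .drop_duplicates()
        .sort_values("GeoID")  # sorted for readability; not documented behaviour
        .reset_index(drop=True)
    )
```

Saving the result is then a single call such as `available_geoids(df).to_csv("geoids.csv", index=False)`, where the file name is arbitrary.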
Methods added in version 4.0.0 (generation of the heatmaps)
Version 4.0.0 of our repository added the ability to generate heatmaps of the world. To plot the world maps, we need to know which ISO2 code belongs to which country. The following static methods all return a list of ISO2 codes that are understood by the Pygal world map package. This is needed because Pygal uses different country codes than the ECDC, and some countries are not available in Pygal. Also be aware that the ECDC writes the codes in uppercase while Pygal expects lowercase codes. The United Kingdom is called gb in Pygal and UK in the ECDC data. Similarly, Greece is called gr in Pygal and EL in the ECDC data.
The following methods return the intersection of the country codes used by the ECDC and those understood by Pygal, grouped by continent. Once a DataFrame has been generated for these countries, the CovidMap class can be used to generate a world map from it. Refer to the code in the Jupyter Notebook to see how to do this.
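The code translation described above can be sketched with a small lookup table. The two exceptions, UK→gb and EL→gr, are the ones named in the text. Lower-casing every other code is an assumption. `pygal_codes` stands for the set of codes the Pygal world map understands and is passed in as a hand-written sample rather than imported. The function names are hypothetical. Returning ECDC GeoIDs, so that the list can be fed to get_country_data_by_geoid_list, is also an assumption.

```python
# Exceptions named in the text; every other code is assumed to be the
# lower-cased ECDC GeoID.
ECDC_TO_PYGAL = {"UK": "gb", "EL": "gr"}


def ecdc_to_pygal(geo_id):
    """Translate an ECDC GeoID such as 'DE' or 'UK' into a Pygal code ('de', 'gb')."""
    return ECDC_TO_PYGAL.get(geo_id, geo_id.lower())


def pygal_compatible_geoids(ecdc_geoids, pygal_codes):
    """Keep the ECDC GeoIDs whose Pygal code appears in pygal_codes (order preserved)."""
    known = set(pygal_codes)
    return [geo_id for geo_id in ecdc_geoids if ecdc_to_pygal(geo_id) in known]


# pygal_codes is a small hand-written sample here, not Pygal's real code list
print(pygal_compatible_geoids(["DE", "UK", "EL", "XK"], {"de", "gb", "gr"}))
# ['DE', 'UK', 'EL']
```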
def get_pygal_[continent]_geoid_list():
def get_pygal_[continent]_geoid_string_list():
The first method returns a list of the country codes, and the second returns a string with all codes comma-separated. The placeholder [continent] can be one of the following:
- european
- american
- asian
- african
- oceania
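The relation between the list and string variants can be shown with a sketch. The documented methods take no arguments. This sketch instead derives the codes from the Continent column of a DataFrame. The function names, and the mapping from the [continent] placeholder to column values such as "Europe", are assumptions for illustration. The Pygal intersection is left out.

```python
import pandas as pd

# Assumed mapping from the method-name placeholder to values of the Continent column.
CONTINENT_LABELS = {
    "european": "Europe",
    "american": "America",
    "asian": "Asia",
    "african": "Africa",
    "oceania": "Oceania",
}


def continent_geoid_list(df, continent):
    """List form: unique GeoIDs of the given continent, in order of appearance."""
    label = CONTINENT_LABELS[continent]
    return df.loc[df["Continent"] == label, "GeoID"].drop_duplicates().tolist()


def continent_geoid_string_list(df, continent):
    """String form: the same codes joined as 'DE, FR'."""
    return ", ".join(continent_geoid_list(df, continent))
```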