The CovidCases analysis class version 5
CovidCases class documentation
This abstract class acts as a parent class for different sub-classes. While the sub-classes are responsible for getting the data from different sources as a Pandas `DataFrame`, this class provides functions to process the data.
These are the methods provided by the class:
def __init__(self, DataFrame):
The constructor of the class just takes the Pandas `DataFrame` created by a sub-class. Each row of the `DataFrame` contains the data for a specific date; together the rows form a time series with the latest date at the top (row 0). The columns have to include the following mandatory attributes and may include additional private columns if required:
Column | Description |
---|---|
GeoName | The name of the country, county or city |
GeoID | The GeoID of the country. Refer to this post to get a list of GeoIDs and country names. |
Population | The population of the country, county or city based on 2019 data. |
Continent | The continent of the country. In case of a city it may be the county. In case of a county it may be a federal state or region. In general it’s a grouping in a level above the meaning of the GeoName - GeoID combination. |
DailyCases | The daily number of confirmed cases. |
DailyDeaths | The daily number of deaths of confirmed cases |
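
As an illustration of the expected input, here is a minimal sketch of a `DataFrame` that carries the mandatory columns, with the latest date in row 0. The `Date` column name, the helper `missing_mandatory_columns` and all values are assumptions made for this example; they are not taken from the class itself.

```python
import pandas as pd

# Mandatory columns as listed in the table above.
MANDATORY_COLUMNS = [
    "GeoName", "GeoID", "Population", "Continent", "DailyCases", "DailyDeaths",
]


def missing_mandatory_columns(df):
    """Hypothetical helper: return the mandatory columns that df lacks."""
    return [col for col in MANDATORY_COLUMNS if col not in df.columns]


# Illustrative values only, not real data. 'Date' is an assumed column name.
example_df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-03", "2020-03-02", "2020-03-01"]),
    "GeoName": ["Examplia"] * 3,
    "GeoID": ["EX"] * 3,
    "Population": [1_000_000] * 3,
    "Continent": ["Europe"] * 3,
    "DailyCases": [30, 20, 10],   # latest date first (row 0)
    "DailyDeaths": [1, 0, 0],
})

print(missing_mandatory_columns(example_df))
```

A sub-class could run a check like this before handing its `DataFrame` to the constructor.
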
Based on the given columns the class will generate the following columns:
Column | Description |
---|---|
Cases | The overall number of confirmed infections (here called cases) since December 31st, 2019 as published by the data source. |
Deaths | The overall number of deaths of confirmed cases. |
PercentDeaths | The percentage of deaths among the confirmed cases. This is also called the Case-Fatality-Rate (CFR), which is an estimation of the Infection-Fatality-Rate (IFR), which also includes unconfirmed (hidden or dark) infections. |
DoublingTime | The time in days after which the number of Cases has doubled. |
CasesPerMillionPopulation | The number of Cases divided by the population in million |
DeathsPerMillionPopulation | The number of Deaths divided by the population in million |
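
The table above can be read as a set of simple column formulas. The following is a minimal sketch, not the class's actual implementation: it assumes a `DataFrame` for a single GeoID with the latest date in row 0, and the function name `add_derived_columns` is hypothetical. `DoublingTime` is left out because the documentation does not state how it is computed.

```python
import pandas as pd


def add_derived_columns(df):
    """Sketch: derive cumulative and per-million columns for one GeoID."""
    out = df.copy()
    # Rows are newest first, so accumulate in chronological order.
    # The assignment aligns on the index, restoring the original row order.
    out["Cases"] = out["DailyCases"].iloc[::-1].cumsum()
    out["Deaths"] = out["DailyDeaths"].iloc[::-1].cumsum()
    # Case-Fatality-Rate in percent.
    out["PercentDeaths"] = 100.0 * out["Deaths"] / out["Cases"]
    population_in_million = out["Population"] / 1_000_000
    out["CasesPerMillionPopulation"] = out["Cases"] / population_in_million
    out["DeathsPerMillionPopulation"] = out["Deaths"] / population_in_million
    return out


# Illustrative values only, not real data.
sample = pd.DataFrame({
    "Population": [2_000_000] * 3,
    "DailyCases": [30, 20, 10],
    "DailyDeaths": [1, 0, 0],
})
derived = add_derived_columns(sample)
print(derived[["Cases", "Deaths", "CasesPerMillionPopulation"]])
```
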
After calling `add_r0`, `add_incidence_7day_per_100Kpopulation` or `add_lowpass_filter_for_attribute` (refer to the function definitions below) you will notice additional attributes such as:
Column | Description |
---|---|
R | An estimation of the reproduction number R0. The attribute should finally be low-pass filtered with a kernel size of 7. |
Incidence7DayPer100Kpopulation | The accumulated 7-day incidence, that is, the sum of the daily cases of the last 7 days divided by the population in units of 100,000 people. |
DailyCases7 | After calling add_lowpass_filter_for_attribute with the attribute name DailyCases and a filter size of 7 you will get this new attribute, which represents the average number of DailyCases of the last 7 days. You can filter any of the attributes listed above with any filter size. |
def get_country_data_by_geoid_list(self, geoIDs, lastNdays=0, sinceNcases=0):
Returns a Pandas `DataFrame` for a list of strings containing the GeoIDs of countries, such as `['DE', 'UK']`. Here you will find a list of GeoIDs and countries. Optional parameters are:
- `lastNdays`: returns just the data of the last n days.
- `sinceNcases`: returns just the data since the nth case has been exceeded per country.
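
To make the two optional parameters concrete, here is a rough sketch of how such filters could be expressed in Pandas. It is an assumption about the behavior, not the class's code: `filter_rows` is a hypothetical name, the rows are assumed to be ordered newest first within each GeoID, and "exceeded" is read as `Cases > sinceNcases`.

```python
import pandas as pd


def filter_rows(df, lastNdays=0, sinceNcases=0):
    """Sketch of the two optional filters; 0 means 'no filtering'."""
    out = df
    if sinceNcases > 0:
        # Keep only rows after the cumulative case count passed the threshold.
        out = out[out["Cases"] > sinceNcases]
    if lastNdays > 0:
        # Rows are newest first, so the first n rows per GeoID are the last n days.
        out = out.groupby("GeoID", sort=False).head(lastNdays)
    return out


# Illustrative values only, not real data.
sample = pd.DataFrame({
    "GeoID": ["DE", "DE", "DE", "FR", "FR", "FR"],
    "Cases": [60, 30, 10, 50, 20, 5],
})
print(filter_rows(sample, lastNdays=2))
print(filter_rows(sample, sinceNcases=25))
```
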
def get__data_by_geoid_string_list(self, geoIDstringList, lastNdays=0, sinceNcases=0):
Exactly the same as the function above, but this time the GeoIDs are given as a comma-separated string such as `"DE, UK"`.
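
Converting such a comma-separated string into a list is straightforward. The helper below is a hypothetical sketch of that step, not the class's own parsing code:

```python
def split_geoid_string(geo_id_string):
    """Sketch: turn 'DE, UK' into ['DE', 'UK'], ignoring blanks and empty parts."""
    return [part.strip() for part in geo_id_string.split(",") if part.strip()]


print(split_geoid_string("DE, UK"))
```
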
def get_all_data(self, lastNdays=0, sinceNcases=0):
The function works like the two functions above, but this time it returns a `DataFrame` for all countries in the csv. Note that it might take some time before the function returns.
def save_df_to_csv(self, df, filename):
Saves the given `DataFrame` `df` to a csv file. The file will contain all columns of the `DataFrame`, including those that have been added by the functions below.
def add_r0(self, df):
Adds an attribute to the given `DataFrame` `df` for each country that is an estimation of the reproduction number R0. Here the number is called `R`. The returned `DataFrame` will contain low-pass filtered data with a kernel size of 7. If the attribute already exists in the df, the function will return the given df.
def add_incidence_7day_per_100Kpopulation(self, df):
Adds an attribute to the df for each country that represents the accumulated 7-day incidence, that is, the sum of the daily cases of the last 7 days divided by the population in units of 100,000 people. If the attribute already exists, the function will return the given df.
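
The incidence definition can be sketched as a rolling sum. This is an illustration under stated assumptions, not the class's implementation: the function name `add_incidence_sketch` is hypothetical, the `DataFrame` holds a single GeoID with the latest date in row 0, and rows with fewer than 7 days of history are left as NaN.

```python
import pandas as pd


def add_incidence_sketch(df):
    """Sketch: sum of the last 7 days of DailyCases per 100,000 people."""
    name = "Incidence7DayPer100Kpopulation"
    if name in df.columns:
        return df  # attribute already exists, return the given df
    # Rows are newest first; roll over the series in chronological order.
    chronological = df["DailyCases"].iloc[::-1]
    weekly_sum = chronological.rolling(window=7, min_periods=7).sum()
    out = df.copy()
    # reindex() restores the original (newest first) row order.
    out[name] = weekly_sum.reindex(df.index) / (df["Population"] / 100_000)
    return out


# Illustrative values only, not real data.
sample = pd.DataFrame({
    "Population": [100_000] * 8,
    "DailyCases": [8, 7, 6, 5, 4, 3, 2, 1],
})
incidence = add_incidence_sketch(sample)
print(incidence["Incidence7DayPer100Kpopulation"])
```
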
def add_lowpass_filter_for_attribute(self, df, attribute, n):
Adds an attribute to the given `DataFrame` `df` for each country that is the low-pass filtered data of the given `attribute` (attribute name as a string). The width of the low-pass is given by the number `n`. The name of the newly created attribute is the given name with a trailing number n. E.g. `Cases` with `n = 7` will add a new attribute named `Cases7`. If the attribute already exists, the function will return the given `DataFrame` `df`.
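
The naming rule and the early return can be illustrated with a simple moving average. This is only a sketch: the class may use a different filter kernel, the name `add_lowpass_sketch` is hypothetical, and the `DataFrame` is assumed to hold a single GeoID with the latest date in row 0.

```python
import pandas as pd


def add_lowpass_sketch(df, attribute, n):
    """Sketch: add '<attribute><n>' as the mean over the last n days."""
    new_name = attribute + str(n)  # e.g. 'Cases' and 7 give 'Cases7'
    if new_name in df.columns:
        return df  # attribute already exists, return the given df
    # Rows are newest first; average over the series in chronological order.
    chronological = df[attribute].iloc[::-1]
    smoothed = chronological.rolling(window=n, min_periods=1).mean()
    out = df.copy()
    out[new_name] = smoothed.reindex(df.index)
    return out


# Illustrative values only, not real data.
sample = pd.DataFrame({"DailyCases": [30, 20, 10]})
filtered = add_lowpass_sketch(sample, "DailyCases", 2)
print(filtered)
```

With `min_periods=1` the earliest rows are averaged over fewer than `n` values instead of being set to NaN; that is a design choice of this sketch.
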
@abstractmethod
def get_available_GeoID_list(self):
Returns a `DataFrame` having just two columns, `GeoID` and `GeoName`. You may want to store the returned `DataFrame` as a csv file. The function has to be implemented by all sub-classes.
@abstractmethod
def get_data_source_info(self):
Returns a `DataFrame` containing information about the data source. The `DataFrame` holds 3 columns:
- `InfoFullName`: The full name of the data source
- `InfoShortName`: A short name for the data source
- `InfoLink`: The link to get the data

The function has to be implemented by all sub-classes.
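
A sub-class could build the returned `DataFrame` roughly as follows. The function name and all three values are placeholders invented for this sketch; they do not describe a real data source.

```python
import pandas as pd


def get_data_source_info_sketch():
    """Sketch: one-row DataFrame with the three documented info columns."""
    return pd.DataFrame({
        "InfoFullName": ["Example Health Agency"],      # placeholder value
        "InfoShortName": ["EHA"],                       # placeholder value
        "InfoLink": ["https://example.org/data.csv"],   # placeholder value
    })


info = get_data_source_info_sketch()
print(info)
```
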
@abstractmethod
def review_geoid_list(self, geoIDs):
Returns a corrected version of the given geoID list to ensure that mismatches like UK versus GB are corrected by the sub-class. For instance, if the given list contains `['DE', 'UK']` the function will return `['DE', 'GB']`, replacing the wrong UK with the ISO 3166-1 alpha-2 conforming GB.
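
One simple way a sub-class might implement this is a lookup table. The sketch below uses a hypothetical function name and contains only the UK to GB correction mentioned above; any further corrections depend on the data source.

```python
def review_geoid_list_sketch(geoIDs):
    """Sketch: replace known mismatches, leave all other GeoIDs untouched."""
    corrections = {"UK": "GB"}  # only the example from the text above
    return [corrections.get(geo_id, geo_id) for geo_id in geoIDs]


print(review_geoid_list_sketch(["DE", "UK"]))
```
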
CovidCases sub-classes documentation
Refer to the CovidCases sub-classes documentation for details about the different sub-classes as these might contain additional features or attributes.