Warning: This is just me trying to learn Altair + trying to answer my own questions.

1. Data

First up, loading datasets

Pakistan Data

Code for Pakistan has built a public app and keeps the underlying data updated in a Google Sheet.

import pandas as pd

def get_pk_data():
    pk_url = "https://docs.google.com/spreadsheets/d/1ljt1URrDZRqTK0qke2yV3lD24CqrfnemhLKClMBYYrQ/export?format=csv&id=1ljt1URrDZRqTK0qke2yV3lD24CqrfnemhLKClMBYYrQ&gid=357374787"
    pk = pd.read_csv(pk_url)
    # Dates in the sheet are day-month-2-digit-year; parse them explicitly
    pk["Date"] = pd.to_datetime(pk["Date"], format="%d-%m-%y")
    return pk

pk = get_pk_data()
pk.head(2)
   Date        Province   Suspected_24  Suspected_Cum  Tested_24  Tested_Cum  Confirmed_24  Confirmed_Cum  Admitted_24  Admitted_Cum  Discharged_24  Discharged_Cum  Expired_24  Expired_Cum
0  2020-04-03  Islamabad  398.0         2395           398        2395        6             68             3            18            0              3               0           0
1  2020-04-03  Punjab     436.0         5522           12189      15134       75            920            72           456           0              6               0           11
pk.columns
Index(['Date', 'Province', 'Suspected_24', 'Suspected_Cum', 'Tested_24',
       'Tested_Cum', 'Confirmed_24', 'Confirmed_Cum', 'Admitted_24',
       'Admitted_Cum', 'Discharged_24', 'Discharged_Cum', 'Expired_24',
       'Expired_Cum'],
      dtype='object')
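The columns come in pairs: a `_24` column (presumably the count for the last 24 hours) and a `_Cum` running total. If that reading is right, each `_24` value should match the day-over-day change in the matching `_Cum` column within a province, which makes a handy sanity check on the sheet. A minimal sketch; the helper name `daily_from_cumulative` is hypothetical:

```python
import pandas as pd

def daily_from_cumulative(df: pd.DataFrame, col: str) -> pd.Series:
    """Recompute a per-day series from a cumulative column, province by province.

    Hypothetical helper: assumes one row per (Date, Province).
    """
    ordered = df.sort_values(["Province", "Date"])
    # diff() runs within each province; each province's first day has no
    # previous value, so it comes back as NaN
    return ordered.groupby("Province")[col].diff()
```

Comparing `pk["Confirmed_24"]` with `daily_from_cumulative(pk, "Confirmed_Cum")` (the result keeps `pk`'s index, so the two line up) then flags any days where the sheet's daily and cumulative figures disagree.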
import altair as alt

cases = alt.Chart(pk).mark_line(point=True).encode(
    x="Date", y="Confirmed_Cum", color="Province")

tested = alt.Chart(pk).mark_line(point=True).encode(
    x="Date", y="Tested_Cum", color="Province")

cases | tested
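The sheet appears to have one row per province per day and no national row, so Pakistan-wide totals are a per-date sum over provinces. A small sketch, with `national_totals` as a hypothetical helper:

```python
import pandas as pd

def national_totals(pk: pd.DataFrame, cols=("Confirmed_Cum", "Tested_Cum")) -> pd.DataFrame:
    """Sum the province rows for each date (hypothetical helper).

    Assumes each (Date, Province) pair appears once and there is no 'total' row.
    """
    return pk.groupby("Date")[list(cols)].sum().reset_index()
```

The result plots the same way as the province charts, e.g. `alt.Chart(national_totals(pk)).mark_line(point=True).encode(x="Date", y="Confirmed_Cum")`.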

Global Data

Loading data from the Johns Hopkins data repo, specifically the time series data.

This has time series data for deaths, recovered and confirmed cases.
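The cell that builds `df` isn't shown here. One way to get a long-format frame with the `country`, `date`, `type` and `cases` columns that the `df.query(...)` call expects is sketched below. It assumes the CSSE global time series CSVs on GitHub, which are wide (one column per date, with `Province/State`, `Country/Region`, `Lat` and `Long` up front); the `BASE` URL, the file names and the helper names are assumptions, not taken from this notebook.

```python
import pandas as pd

# Assumed location of the JHU CSSE global time series files (one per type).
BASE = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
        "csse_covid_19_data/csse_covid_19_time_series/")

def tidy_jhu(wide: pd.DataFrame, kind: str) -> pd.DataFrame:
    """Reshape one wide JHU table into long format (hypothetical helper)."""
    # Sum provinces/states so each country has a single row
    by_country = (wide.drop(columns=["Province/State", "Lat", "Long"])
                      .groupby("Country/Region").sum())
    long = by_country.reset_index().melt(
        id_vars="Country/Region", var_name="date", value_name="cases")
    long = long.rename(columns={"Country/Region": "country"})
    # Date headers look like 3/31/20 (month/day/2-digit year)
    long["date"] = pd.to_datetime(long["date"], format="%m/%d/%y")
    long["type"] = kind
    return long

def load_jhu() -> pd.DataFrame:
    """Download and stack the three assumed files (needs network access)."""
    frames = []
    for kind in ["confirmed", "deaths", "recovered"]:
        wide = pd.read_csv(f"{BASE}time_series_covid19_{kind}_global.csv")
        frames.append(tidy_jhu(wide, kind))
    return pd.concat(frames, ignore_index=True)
```

With this sketch, `df = load_jhu()` would give the lowercase `type` values (`'deaths'` and so on) that the query filters on.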

countries = ["Pakistan", "India", "Australia"]
d = df.query("country in @countries and type == 'deaths'")

alt.Chart(d).mark_line(point=True).encode(
    x="date", y="cases", color="country")

Australia Data only

From the Guardian, which compiles the data from official state and territory releases and websites into a Google Sheet.

#collapse
# google sheets and json
# https://www.theguardian.com/news/datablog/ng-interactive/2020/mar/23/how-many-cases-of-coronavirus-are-there-in-australia-live-statistics

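import pandas as pd
import requests

aus_json_data = "https://interactive.guim.co.uk/docsdata/1q5gdePANXci8enuiS4oHUJxcxC13d6bjMRSicakychE.json"

r = requests.get(aus_json_data)
r.raise_for_status()  # stop here if the download failed
aus = r.json()
# aus is a dict with one key, 'sheets', mapping each sheet name to a list of row records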
print(aus['sheets']['about'][0]['about'])
print(aus['sheets'].keys())

aus_totals = pd.DataFrame.from_records(aus['sheets']['latest totals'])
aus_updates = pd.DataFrame.from_records(aus['sheets']["updates"])
aus_totals
This data has been compiled by Guardian Australia from official state and territory media releases and websites. Some death dates and figures are from media reports. We assign cases to the date on which they were reported by the health department, and deaths are assigned to the date they occured. Extended data on testing and demographics varies between each state and territory so may not always be available. Please contact nick.evershed@theguardian.com if you spot an error in the data or to make a suggestion. This data is released under a Attribution 3.0 Australia (CC BY 3.0 AU) license, which means it is ok to re-use but please provide attribution and a link to Guardian Australia
dict_keys(['updates', 'latest totals', 'locations', 'age distribution', 'sources', 'about', 'data validation'])
   State or territory  Long name                     Confirmed cases (cumulative)  Deaths  Tests conducted  Tests per million  Percent positive  Last updated
0  NSW                 New South Wales               2182                          9       105543           13001              2.1               2020-03-31
1  VIC                 Victoria                      968                           4       47000            7089               2.1               2020-04-01
2  QLD                 Queensland                    781                           2       38860            7597               2.0               2020-04-01
3  SA                  South Australia               337                                   26000            14802              1.3               2020-03-31
4  ACT                 Australian Capital Territory  84                            1       4059             9481               2.1               2020-04-01
5  NT                  Northern Territory            16                                                                                          2020-03-31
6  TAS                 Tasmania                      69                            2       1779             3322               3.9               2020-04-01
7  WA                  Western Australia             392                           2       14188            5393               2.8               2020-04-01
8  National            National                      4829                          20      237429           9324               2.0               2020-04-01
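The blank SA and NT cells suggest the JSON delivers every value as text, with empty strings for missing figures. Assuming that, converting the count columns to numbers before charting them as `:Q` turns the blanks into proper missing values. A sketch; `to_numeric_cols` is a hypothetical helper and the column list is taken from the table above:

```python
import pandas as pd

NUMERIC_COLS = ["Confirmed cases (cumulative)", "Deaths", "Tests conducted",
                "Tests per million", "Percent positive"]

def to_numeric_cols(df: pd.DataFrame, cols=NUMERIC_COLS) -> pd.DataFrame:
    """Convert sheet text to numbers; '' and other non-numbers become NaN."""
    out = df.copy()
    for col in cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out
```

Also worth noting: the `National` row is the sum of the states and territories (2182 + 968 + 781 + 337 + 84 + 16 + 69 + 392 = 4829 cases), so filtering it out, e.g. `aus_totals[aus_totals["State or territory"] != "National"]`, keeps it from dwarfing the other bars in the charts.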
aus_updates.columns
Index(['State', 'Date', 'Time', 'Cumulative case count', 'Cumulative deaths',
       'Tests conducted (negative)', 'Tests conducted (total)',
       'Intensive care (count)', 'Hospitalisations (count)',
       'Recovered (cumulative)', 'Update Source', 'Under 60', 'Over 60',
       'Community', 'Community - no known source', 'Travel-related',
       'Under investigation', 'Notes'],
      dtype='object')
aus_updates.sort_values(by="Date", inplace=True)
aus_updates.head()
State Date Time Cumulative case count Cumulative deaths Tests conducted (negative) Tests conducted (total) Intensive care (count) Hospitalisations (count) Recovered (cumulative) Update Source Under 60 Over 60 Community Community - no known source Travel-related Under investigation Notes
15 VIC 01/02/2020 4 162 Victoria DHHS
14 NSW 01/02/2020 4 100 NSW Health
13 SA 01/02/2020 0 25 SA Health website
53 NSW 01/03/2020 5 NSW Health media release 1 1 4 of 5 cases discharged
56 WA 01/03/2020 1 594 594 WA Health
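`Date` in the updates sheet is text, and it looks like day/month/year (Australian order). `sort_values` on text compares character by character, so 1 March (`01/03/2020`) sorts ahead of 2 February (`02/02/2020`), which is likely why the head above jumps straight from `01/02/2020` to `01/03/2020`. Assuming the dd/mm/yyyy reading, parsing before sorting gives chronological order:

```python
import pandas as pd

# Example strings in the assumed dd/mm/yyyy layout
dates = pd.Series(["01/03/2020", "02/02/2020", "01/02/2020"])

# Sorting raw strings is lexicographic, not chronological
print(dates.sort_values().tolist())  # ['01/02/2020', '01/03/2020', '02/02/2020']

# Parsing first makes the sort follow the calendar
parsed = pd.to_datetime(dates, format="%d/%m/%Y")
print(parsed.sort_values().dt.strftime("%Y-%m-%d").tolist())  # ['2020-02-01', '2020-02-02', '2020-03-01']
```

Applied to the notebook, that would mean running `aus_updates["Date"] = pd.to_datetime(aus_updates["Date"], format="%d/%m/%Y")` before the `sort_values` call.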

#collapse
base = alt.Chart(aus_totals)

tests = base.mark_bar().encode(
    x="State or territory",
    y="Tests conducted:Q",
    color="State or territory"
).properties(
    title="Tests Conducted"
)

cases = base.mark_bar().encode(
    x="State or territory",
    y="Confirmed cases (cumulative):Q",
    color="State or territory"
).properties(
    title="Covid cases"
)

tests | cases

2. SIR Model

pk
     Date        Province            Suspected_24  Suspected_Cum  Tested_24  Tested_Cum  Confirmed_24  Confirmed_Cum  Admitted_24  Admitted_Cum  Discharged_24  Discharged_Cum  Expired_24  Expired_Cum
0    2020-03-31  Islamabad           195.0         1395           159        1469        7             58             3            16            0              3               0           0
1    2020-03-31  Punjab              154.0         3770           -992       2500        59            652            50           303           0              5               3           9
2    2020-03-31  Sindh               383.0         6328           383        6328        119           627            9            315           26             42              2           7
3    2020-03-31  Khyber Pakhtunkhwa  312.0         1334           188        1512        26            221            10           70            0              2               1           6
4    2020-03-31  Balochistan         130.0         1825           82         1792        9             153            6            139           0              2               0           1
...  ...         ...                 ...           ...            ...        ...         ...           ...            ...          ...           ...            ...             ...         ...
149  2020-03-10  Sindh               8.0           83             116        116         13            13             12           12            1              1               0           0
150  2020-03-10  Khyber Pakhtunkhwa  0.0           20             26         26          0             0              0            0             0              0               0           0
151  2020-03-10  Balochistan         0.0           15             14         14          0             0              0            0             0              0               0           0
152  2020-03-10  Azad Kashmir        0.0           3              4          4           0             0              0            0             0              0               0           0
153  2020-03-10  Gilgit-Baltistan    1.0           10             22         22          1             1              1            1             0              0               0           0

154 rows × 14 columns
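The SIR model splits a population of size N into susceptible (S), infected (I) and recovered/removed (R) groups, with dS/dt = -βSI/N, dI/dt = βSI/N - γI and dR/dt = γI. As a starting point before fitting anything to `pk`, here is a minimal forward-Euler integration of those equations. The function name `sir` and the β, γ and population values in the usage line are placeholders for illustration, not estimates for Pakistan.

```python
import numpy as np

def sir(beta, gamma, s0, i0, r0, days, steps_per_day=10):
    """Integrate the SIR equations with forward Euler; returns daily (S, I, R) rows."""
    n = s0 + i0 + r0          # total population, conserved by the model
    dt = 1.0 / steps_per_day  # several small steps per day keep Euler stable
    s, i, r = float(s0), float(i0), float(r0)
    out = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n * dt  # S -> I flow over this step
            new_rec = gamma * i * dt         # I -> R flow over this step
            s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        out.append((s, i, r))
    return np.array(out)

# Placeholder parameters: beta/gamma = 3, i.e. R0 = 3 in a population of 1000
traj = sir(beta=0.3, gamma=0.1, s0=999, i0=1, r0=0, days=100)
```

Forward Euler is the simplest choice here; `scipy.integrate.solve_ivp` would be the more accurate option when it comes to actually fitting β and γ to the confirmed-case curves.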