Add subnational data for Vietnam #413

Merged
merged 23 commits into from
Sep 27, 2021
Changes from 4 commits
8996977
Add priliminary support for Vietnam subnational data
Aug 28, 2021
4f57d0f
Add empty locations to Unknown
Sep 2, 2021
a66539f
Update with cases_death from 5F team
Sep 12, 2021
0f0cafb
Change data source for Vietnam
Sep 13, 2021
abcd78a
Add priliminary support for Vietnam subnational data
biocyberman Aug 28, 2021
cd25e34
Add empty locations to Unknown
biocyberman Sep 2, 2021
7edc95c
Update with cases_death from 5F team
biocyberman Sep 12, 2021
7ca169a
Change data source for Vietnam
biocyberman Sep 13, 2021
29070dd
Refactor code for PR #413
biocyberman Sep 15, 2021
ce5de11
Merge branch 'vietnam' of https://github.com/biocyberman/covidregiona…
RichardMN Sep 16, 2021
b54bd03
Reformatting of code
RichardMN Sep 16, 2021
abf4cdd
Initial implementation of a generic json_reader function and download…
RichardMN Sep 16, 2021
7bd4fb0
Using JSON reader to download Vietnam code
RichardMN Sep 16, 2021
3aeddbe
Closer to a tidy version of cleaning Vietnam data
RichardMN Sep 17, 2021
5681a2c
Fixing final glitches in clean_common
RichardMN Sep 17, 2021
5f7e82e
Merge branch 'json-reader' into pr/413
RichardMN Sep 19, 2021
f0cec88
Merge branch 'json-reader-vietnam' into pr/413
RichardMN Sep 19, 2021
2c20067
JSON-reading implementation of Vietnam
RichardMN Sep 19, 2021
26cd8ad
Remove prefixes in vietnam_codes.R
RichardMN Sep 19, 2021
9c35694
lint cleaning
RichardMN Sep 19, 2021
4b26675
Update DESCRIPTION
seabbs Sep 20, 2021
b3af1a6
fix merge issues
seabbs Sep 27, 2021
d1232f1
update description, news and run tests
seabbs Sep 27, 2021
48 changes: 48 additions & 0 deletions .github/workflows/Vietnam.yaml
@@ -0,0 +1,48 @@
on:
  schedule:
    - cron: '36 12 * * *'
  workflow_dispatch:

name: Vietnam

jobs:
  Vietnam:
    runs-on: macOS-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v2

      - uses: r-lib/actions/setup-r@v1

      - name: Query dependencies
        run: |
          install.packages('remotes')
          saveRDS(remotes::dev_package_deps(dependencies = TRUE), ".github/depends.Rds", version = 2)
          writeLines(sprintf("R-%i.%i", getRversion()$major, getRversion()$minor), ".github/R-version")
        shell: Rscript {0}

      - name: Cache R packages
        uses: actions/cache@v2
        with:
          path: ${{ env.R_LIBS_USER }}
          key: ${{ runner.os }}-${{ hashFiles('.github/R-version') }}-1-${{ hashFiles('.github/depends.Rds') }}
          restore-keys: ${{ runner.os }}-${{ hashFiles('.github/R-version') }}-1-

      - name: Install dependencies
        run: |
          install.packages(c("remotes"))
          remotes::install_deps(dependencies = TRUE)
          install.packages("devtools")
        shell: Rscript {0}

      - name: Install package
        run: R CMD INSTALL .

      - name: Test dataset
        run: |
          options("testDownload" = TRUE)
          options("testSource" = "Vietnam")
          devtools::load_all()
          testthat::test_file("tests/testthat/test-regional-datasets.R", reporter = c("summary", "fail"))
        shell: Rscript {0}
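
The "Query dependencies" step writes a short R version string to `.github/R-version`, and the hash of that file becomes part of the cache key. A minimal sketch of the string it produces, runnable in any R session (the variable name `r_version` is only for illustration and the value depends on the R installation):

```r
# Build the same "R-<major>.<minor>" string that the workflow writes to
# .github/R-version. The exact value depends on the R version running this.
r_version <- sprintf("R-%i.%i", getRversion()$major, getRversion()$minor)
print(r_version)
```

Because only the major and minor components are included, a patch-level R update would leave this part of the cache key unchanged.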
1 change: 1 addition & 0 deletions NAMESPACE
@@ -24,6 +24,7 @@ export(SouthAfrica)
export(Switzerland)
export(UK)
export(USA)
export(Vietnam)
export(WHO)
export(expect_clean_cols)
export(expect_columns_contain_data)
124 changes: 124 additions & 0 deletions R/Vietnam.R
@@ -0,0 +1,124 @@
#' Vietnam Class for downloading, cleaning and processing
#' notification data
#'
#' @description Information for downloading, cleaning
#' and processing covid-19 region data for Vietnam.
#'
# nolint start
#' @source \url{https://github.com/biocyberman/covidregionaldata/}
# nolint end
#' @export
#' @concept dataset
#' @family subnational
#' @examples
#' \dontrun{
#' region <- Vietnam$new(verbose = TRUE, steps = TRUE, get = TRUE)
#' region$return()
#' }
Vietnam <- R6::R6Class("Vietnam",
inherit = DataClass,
public = list(

# Core Attributes (amend each parameter for country specific information)
#' @field origin name of country to fetch data for
origin = "Vietnam",
#' @field supported_levels List of supported levels.
supported_levels = list("1"),
#' @field supported_region_names List of region names in order of level.
supported_region_names = list("1" = "region"),
#' @field supported_region_codes List of region codes in order of level.
supported_region_codes = list("1" = "iso_3166_2"),
#' @field common_data_urls List of named links to raw data.
# nolint start
common_data_urls = list(
"case_by_time" = 'https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=case_by_time',
"death_by_time" = 'https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=death_by_time',
"recovered_by_time" = 'https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=recovered_by_time'

),
# nolint end
#' @field source_data_cols existing columns within the raw data
source_data_cols = c(
"cases_total", "deaths_total", "recovered_total"
),
#' @field source_text Plain text description of the source of the data
source_text = "Public COVID-19 data curated by 5F team",
#' @field source_url Website address for explanation/introduction of the
#' data
source_url = "https://covid.ncsc.gov.vn", # nolint

#' @description Set up a table of region codes for clean data
#' @importFrom tibble tibble
set_region_codes = function(){
self$codes_lookup$`1` <- covidregionaldata::vietnam_codes
},

#' @description Provincial Level Data
#' cleaning
#' @param ... pass additional arguments
#'
#' @importFrom dplyr filter select mutate rename left_join
#' @importFrom tidyr replace_na drop_na
#' @importFrom lubridate dmy
#' @importFrom jsonlite fromJSON
clean_common = function() {
Sys.setenv("VROOM_CONNECTION_SIZE" = 131072*4) # Fix VROOM error
provines_url = 'https://covid.ncsc.gov.vn/api/v3/covid/provinces'
bundles = names(self$data$raw)
provines_data = jsonlite::fromJSON(provines_url)

get_bundles_data = function(bundles){
bundles_data = list()
for (bundle in bundles){
url = paste0('https://covid.ncsc.gov.vn/api/v3/covid/provinces?filter_type=', bundle)
data = jsonlite::fromJSON(url)
bundles_data = c(bundles_data, setNames(list(data), bundle))
}
bundles_data
}

bundles_data = get_bundles_data(bundles)

get_province = function(id, data){
row_dat = provines_data[provines_data$id == id, ]
death_by_time= do.call(cbind, data$death_by_time[id])
case_by_time=do.call(cbind, data$case_by_time[id])
recovered_by_time=do.call(cbind, data$recovered_by_time[id])
# Dates of the case and death series must agree before they are combined
if (!identical(row.names(case_by_time), row.names(death_by_time))) {
stop("Dates on case_by_time and death_by_time do not match!")
}
df = dplyr::tibble(date= lubridate::dmy(row.names(case_by_time)),
id = row_dat$id,
name = row_dat$name,
case_by_time= case_by_time,
death_by_time= death_by_time,
recovered_by_time= recovered_by_time)
df
}

df = do.call(rbind, lapply(provines_data$id, function(id){get_province(id, bundles_data)}))
names(df) <- c("date", "id", "region_name", "cases_total", "deaths_total", "recovered_total")

self$data$clean <- df %>%
select( date, region_name, cases_total, deaths_total, recovered_total) %>%
mutate(cases_total = as.numeric(cases_total),
deaths_total = as.numeric(deaths_total),
recovered_total = as.numeric(recovered_total),
region_name = stringr::str_replace_all(region_name, 'TP HCM', 'Hochiminh'),
) %>%
tidyr::drop_na(date, region_name) %>%
rename(level_1_region = region_name) %>%
mutate(
level_1_region = stringi::stri_trans_general(level_1_region, "latin-ascii"),
level_1_region = stringi::stri_trim_both(level_1_region),
level_1_region = stringr::str_replace_all(level_1_region, '\\(.*\\)|-| ', ''),
level_1_region = stringr::str_to_title(level_1_region),
level_1_region = tidyr::replace_na(level_1_region, "Unknown")
) %>%
left_join(
self$codes_lookup$`1`,
by = c("level_1_region" = "level_1_region")
)
}
)
)
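
The per-province assembly in `clean_common` can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the `bundles` list, its province key `"1"`, the dates and the counts are made-up stand-ins for the JSON returned by the API, and `build_province` is a hypothetical helper. It checks the dates of the case series against those of the death series before combining them.

```r
# Hypothetical stand-in for the parsed API responses: each bundle maps a
# province id to a named list of cumulative counts keyed by day/month/year.
bundles <- list(
  case_by_time      = list("1" = list("27/08/2021" = 10, "28/08/2021" = 15)),
  death_by_time     = list("1" = list("27/08/2021" = 1,  "28/08/2021" = 2)),
  recovered_by_time = list("1" = list("27/08/2021" = 4,  "28/08/2021" = 6))
)

# Hypothetical helper: build one province's rows from the three bundles.
build_province <- function(id, name, bundles) {
  key <- as.character(id)
  cases     <- unlist(bundles$case_by_time[[key]])
  deaths    <- unlist(bundles$death_by_time[[key]])
  recovered <- unlist(bundles$recovered_by_time[[key]])
  # Compare the dates of two different series, never a series with itself.
  if (!identical(names(cases), names(deaths))) {
    stop("Dates on case_by_time and death_by_time do not match!")
  }
  data.frame(
    date            = as.Date(names(cases), format = "%d/%m/%Y"),
    region_name     = name,
    cases_total     = as.numeric(cases),
    deaths_total    = as.numeric(deaths),
    recovered_total = as.numeric(recovered),
    stringsAsFactors = FALSE
  )
}

province <- build_province(1, "Ha Noi", bundles)
print(province)
```

Looking provinces up by name (`[[key]]`) rather than by position is a design choice of this sketch; it avoids relying on the ids being consecutive and in order.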
6 changes: 6 additions & 0 deletions R/datasets.R
@@ -33,6 +33,12 @@
#' @return A tibble of region codes and related information.
"france_codes"

#' Region Codes for Vietnam Dataset.
#'
#' @description The region codes for Viet Nam
#' @return A tibble of region codes and related information.
"vietnam_codes"

#' Region Codes for JHU Dataset. Taken from the region codes provided as
#' part of the WHO dataset.
#'
41 changes: 19 additions & 22 deletions README.md
@@ -80,7 +80,7 @@ the temporary directory by default),

``` r
start_using_memoise()
-#> Using a cache at: /tmp/RtmprTOAdV
+#> Using a cache at: /tmp/RtmpPgZXiv
```

To stop using `memoise` use,
@@ -105,7 +105,7 @@ the Google COVID-19 open data project), use:
``` r
nots <- get_national_data()
#> Downloading data from https://covid19.who.int/WHO-COVID-19-global-data.csv
-#> Rows: 132483 Columns: 8
+#> Rows: 142911 Columns: 8
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): Country_code, Country, WHO_region
@@ -117,20 +117,20 @@ nots <- get_national_data()
#> Cleaning data
#> Processing data
nots
-#> # A tibble: 132,483 x 15
-#> date un_region who_region country iso_code cases_new cases_total
-#> <date> <chr> <chr> <chr> <chr> <dbl> <dbl>
-#> 1 2020-01-03 Asia EMRO Afghanistan AF 0 0
-#> 2 2020-01-03 Europe EURO Albania AL 0 0
-#> 3 2020-01-03 Africa AFRO Algeria DZ 0 0
-#> 4 2020-01-03 Oceania WPRO American Samoa AS 0 0
-#> 5 2020-01-03 Europe EURO Andorra AD 0 0
-#> 6 2020-01-03 Africa AFRO Angola AO 0 0
-#> 7 2020-01-03 Americas AMRO Anguilla AI 0 0
-#> 8 2020-01-03 Americas AMRO Antigua & Bar… AG 0 0
-#> 9 2020-01-03 Americas AMRO Argentina AR 0 0
-#> 10 2020-01-03 Asia EURO Armenia AM 0 0
-#> # … with 132,473 more rows, and 8 more variables: deaths_new <dbl>,
+#> # A tibble: 142,911 × 15
+#> date un_region who_region country iso_code cases_new cases_total
+#> <date> <chr> <chr> <chr> <chr> <dbl> <dbl>
+#> 1 2020-01-03 Asia EMRO Afghanistan AF 0 0
+#> 2 2020-01-03 Europe EURO Albania AL 0 0
+#> 3 2020-01-03 Africa AFRO Algeria DZ 0 0
+#> 4 2020-01-03 Oceania WPRO American Samoa AS 0 0
+#> 5 2020-01-03 Europe EURO Andorra AD 0 0
+#> 6 2020-01-03 Africa AFRO Angola AO 0 0
+#> 7 2020-01-03 Americas AMRO Anguilla AI 0 0
+#> 8 2020-01-03 Americas AMRO Antigua & Barbuda AG 0 0
+#> 9 2020-01-03 Americas AMRO Argentina AR 0 0
+#> 10 2020-01-03 Asia EURO Armenia AM 0 0
+#> # … with 142,901 more rows, and 8 more variables: deaths_new <dbl>,
#> # deaths_total <dbl>, recovered_new <dbl>, recovered_total <dbl>,
#> # hosp_new <dbl>, hosp_total <dbl>, tested_new <dbl>, tested_total <dbl>
```
@@ -171,7 +171,7 @@ for example by level 1 region in the UK, use:
``` r
uk_nots <- get_regional_data(country = "UK", verbose = FALSE)
uk_nots
-#> # A tibble: 6,916 x 26
+#> # A tibble: 7,501 × 26
#> date region region_code cases_new cases_total deaths_new deaths_total
#> <date> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 2020-01-30 East Mi… E12000004 NA NA NA NA
@@ -184,16 +184,13 @@ uk_nots
#> 8 2020-01-30 Scotland S92000003 NA NA NA NA
#> 9 2020-01-30 South E… E12000008 NA NA NA NA
#> 10 2020-01-30 South W… E12000009 NA NA NA NA
-#> # … with 6,906 more rows, and 19 more variables: recovered_new <dbl>,
+#> # … with 7,491 more rows, and 19 more variables: recovered_new <dbl>,
#> # recovered_total <dbl>, hosp_new <dbl>, hosp_total <dbl>, tested_new <dbl>,
#> # tested_total <dbl>, areaType <chr>, cumCasesByPublishDate <dbl>,
#> # cumCasesBySpecimenDate <dbl>, newCasesByPublishDate <dbl>,
#> # newCasesBySpecimenDate <dbl>, cumDeaths28DaysByDeathDate <dbl>,
#> # cumDeaths28DaysByPublishDate <dbl>, newDeaths28DaysByDeathDate <dbl>,
-#> # newDeaths28DaysByPublishDate <dbl>, newPillarFourTestsByPublishDate <lgl>,
-#> # newPillarOneTestsByPublishDate <dbl>,
-#> # newPillarThreeTestsByPublishDate <dbl>,
-#> # newPillarTwoTestsByPublishDate <dbl>
+#> # newDeaths28DaysByPublishDate <dbl>, …
```

Now we have the data we can create plots, for example the time-series of
33 changes: 33 additions & 0 deletions data-raw/vietnam_codes.R
@@ -0,0 +1,33 @@
# Set vietnam region codes
#
# Level 1 codes: ISO-3166-2
# Source: https://en.wikipedia.org/wiki/ISO_3166-2:VN
#
library(rvest)
library(stringi)
library(stringr)
library(dplyr)
library(tibble)

# Level 1 -----------------------------------------------------------------
# Get ISO codes
vn_iso <- "https://en.wikipedia.org/wiki/ISO_3166-2:VN"

level_1_region_df <- read_html(vn_iso) %>%
html_element(css="table.wikitable:nth-child(11)") %>%
html_table()

vietnam_codes <- data.frame(
level_1_region_code = level_1_region_df$Code,
level_1_region = level_1_region_df$`Subdivision name (vi)`,
stringsAsFactors = FALSE
) %>%
mutate(
level_1_region = stringi::stri_trans_general(level_1_region, "latin-ascii"),
level_1_region = stringi::stri_trim_both(level_1_region),
level_1_region = stringr::str_replace_all(level_1_region, '\\(.*\\)|-| ', ''),
level_1_region = stringr::str_to_title(level_1_region)
)

# update package region_codes
usethis::use_data(vietnam_codes, overwrite = TRUE)
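
The join in `R/Vietnam.R` only works if the lookup table and the cleaned data normalise province names in the same way. A minimal base R sketch of that idea follows. `normalise_region` is a hypothetical helper, the example names are illustrative rather than taken from the API or from Wikipedia, and the transliteration step (`stringi::stri_trans_general`) is left out, so the inputs are assumed to be ASCII already. The casing step approximates `stringr::str_to_title` for the single-token strings that remain after spaces are removed.

```r
# Hypothetical helper mirroring the trimming, regex and casing steps that are
# applied on both sides of the join (transliteration is omitted here).
normalise_region <- function(x) {
  x <- trimws(x)
  x <- gsub("\\(.*\\)|-| ", "", x)  # drop parentheticals, hyphens and spaces
  paste0(toupper(substring(x, 1, 1)), tolower(substring(x, 2)))
}

# Illustrative spellings of the same provinces from two hypothetical sources.
lookup_names <- c("Ha Noi", "Ba Ria - Vung Tau", "Ho Chi Minh")
api_names    <- c("Ha Noi", "Ba Ria Vung Tau", "Hochiminh")

print(normalise_region(lookup_names))
print(normalise_region(api_names))
```

Both calls yield the same keys, which is what lets `left_join` match rows by `level_1_region` despite differences in spacing and hyphenation.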
Binary file modified data/all_country_data.rda
Binary file not shown.
Binary file added data/vietnam_codes.rda
Binary file not shown.
3 changes: 2 additions & 1 deletion man/Belgium.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/Brazil.Rd
3 changes: 2 additions & 1 deletion man/Canada.Rd
3 changes: 2 additions & 1 deletion man/Colombia.Rd
3 changes: 2 additions & 1 deletion man/Covid19DataHub.Rd
3 changes: 2 additions & 1 deletion man/Cuba.Rd
3 changes: 2 additions & 1 deletion man/France.Rd
3 changes: 2 additions & 1 deletion man/Germany.Rd
3 changes: 2 additions & 1 deletion man/Google.Rd