Add data import from data.statistik.gv.at #11
Thank you very much for this contribution! As discussed behind our firewall, I have already tested this and the new I'll do some fine-tuning in the next couple of days and hopefully, I'll be able to address these points
@GregorDeCillia the issue with incorrect timings is already fixed.
@bernhard-da thanks. I think the other TODOs should also be pretty straightforward to complete.
in `$meta$measures$fun`
- re-implement d3e8ef0 - allow years up to 2150
until now, all fields were shown if no argument was provided
the $extras$attribute_description field in the meta-jsons of open.data can now be read correctly for all 273 "valid" datasets

valid datasets are datasets that satisfy
* json$resources[[1]] == "csv"
* length(json$resources) > 1

The returned descriptions are not always consistent with the contents of ${opendata_id}_HEADER.csv because the labels might be slightly different. However, the codes always match and use the same ordering
read_delim throws an error if the url for the csv is passed as a character. Due to differences in R<4.0 and the newest version, this caused a bug in od_table()
- status code 200 never occurs
- check if raw object is empty rather than parsing it
- cache csv files in ~/.STATcubeR_cache
- only download data again if last_modified in json is newer than cache
- avoid nested loops for parsing
- exclude unnecessary columns in meta$measures
- make sure $data uses factors for classification variables
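The cache rule from the list above (download again only if `last_modified` is newer than the cached file) can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper `needs_download()` is hypothetical, `last_modified` is assumed to be already parsed into a `POSIXct` value, and a temporary directory stands in for `~/.STATcubeR_cache`.

```r
# Hypothetical helper: TRUE if the cached file is missing or older than
# the `last_modified` timestamp reported in the dataset's json metadata.
needs_download <- function(cache_file, last_modified) {
  if (!file.exists(cache_file)) {
    return(TRUE)
  }
  last_modified > file.mtime(cache_file)
}

# A temporary directory stands in for ~/.STATcubeR_cache (assumption)
cache_dir <- tempfile("cache_")
dir.create(cache_dir)
cache_file <- file.path(cache_dir, "example.csv")

no_cache_yet <- needs_download(cache_file, Sys.time())
writeLines("a;b", cache_file)
cache_is_fresh <- !needs_download(cache_file, Sys.time() - 3600)
cache_is_stale <- needs_download(cache_file, Sys.time() + 3600)
```

Comparing against the file's modification time avoids storing a separate timestamp, at the cost of relying on the file system clock.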
I just made some major changes to the import logic in 1179486. The returned object now only contains three columns in

Now the question is how to deal with aggregation of data. The way I see it, there are two types of datasets
My suggestion: Since it is basically impossible to automatically detect total codes, I'd suggest

x <- od_table("OGD_veste309_Veste309_1")
## define total codes
x$total_codes(
`C-A11-0` = "A11-1",
`C-STAATS-0` = "STAATS-9",
`C-VEBDL-0` = "VEBDL-10",
`C-BESCHV-0` = "BESCHV-1"
)
## `C-STAATS-0` and `C-BESCHV-0` are aggregated using total codes
## For `C-A11-0` and `C-VEBDL-0` total codes are excluded to make the result tidy
x$tabulate("C-A11-0", "C-VEBDL-0", "F-VESTE_AM")
#> # A tibble: 18 x 3
#> Sex `Region (NUTS2)` `Arithmetic mean`
#> <fct> <fct> <int>
#> 1 Male AT11 Burgenland 17
#> 2 Male AT12 Lower Austria 18
#> 3 Male AT13 Vienna 21
#> 4 Male AT21 Carinthia 18
#> 5 Male AT22 Styria 18
#> 6 Male AT31 Upper Austria 20
#> 7 Male AT32 Salzburg 19
#> 8 Male AT33 Tyrol 19
#> 9 Male AT34 Vorarlberg 20
#> 10 Female AT11 Burgenland 15
#> 11 Female AT12 Lower Austria 14
#> 12 Female AT13 Vienna 17
#> 13 Female AT21 Carinthia 14
#> 14 Female AT22 Styria 15
#> 15 Female AT31 Upper Austria 15
#> 16 Female AT32 Salzburg 15
#> 17 Female AT33 Tyrol 15
#> 18 Female AT34 Vorarlberg 15

What do you think @bernhard-da?

Another issue with aggregation is hierarchical fields which include partial sums. In some cases, this can be detected in
this affects two datasets, which are both problematic with the current import logic.
- `OGDEXT_BINNENWAND_1` has no variable codes in the first line of `OGDEXT_BINNENWAND_1.csv`
- `OGDEXT_VORNAMEN_1` contains fields of type <chr> like `F-VORNAME`
add column `label_en` to
- `$meta$database`
- `$meta$measures`
- `$meta$fields`
- `$field(i)`
this option affects
- the printing method
- the labeling of $data

TODO: language can be switched with {object}$language <- {new_language}, however, `$data` is not updated in this case
restructure sc_table to make it compatible with the base class sc_data. In the new version, the main part of tabulate() is executed in the base class which provides more flexibility.
- mixtures of fields with totals and without totals are allowed
- parameters can be specified as codes or as labels
- parameter raw can be used to return codes instead of labels

The sc_table class now also inherits total_codes() and a more flexible implementation of field()

$data and $meta are now parsed eagerly which means that the slots are calculated at construction time rather than the first access

there was a regression regarding the annotations parameter. Annotations are stored as attributes in $data_raw and dropped during aggregation. They currently don't have priority and will be re-implemented at some point
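The mixed handling of fields with and without total codes can be illustrated with a small base R sketch. It is not the code from the base class: the function `tabulate_sketch()`, the toy data and all codes are made up, and the logic is only an assumption about the general idea (use total-code rows where a total is defined, sum otherwise, and drop totals from displayed fields).

```r
# Toy data: `sex` has a total code ("SEX-0"), `region` has none.
# All names and codes are made up for illustration.
data <- data.frame(
  sex    = c("SEX-0", "SEX-1", "SEX-2", "SEX-0", "SEX-1", "SEX-2"),
  region = c("R-1", "R-1", "R-1", "R-2", "R-2", "R-2"),
  value  = c(30, 10, 20, 70, 30, 40),
  stringsAsFactors = FALSE
)
total_codes <- list(sex = "SEX-0", region = NA)

tabulate_sketch <- function(data, fields, measure, total_codes) {
  # fields that are aggregated away
  for (field in setdiff(names(total_codes), fields)) {
    total <- total_codes[[field]]
    if (!is.na(total)) {
      # aggregate via total code: keep only the rows holding the total
      data <- data[data[[field]] == total, , drop = FALSE]
    }
    # fields without a total code are aggregated by the sum below
  }
  # fields that are displayed
  for (field in fields) {
    total <- total_codes[[field]]
    if (!is.na(total)) {
      # drop totals from displayed fields to keep the result tidy
      data <- data[data[[field]] != total, , drop = FALSE]
    }
  }
  aggregate(data[measure], by = data[fields], FUN = sum)
}

by_region <- tabulate_sketch(data, "region", "value", total_codes)
by_sex <- tabulate_sketch(data, "sex", "value", total_codes)
```

In `by_region` the values come from the `SEX-0` rows, while in `by_sex` they are sums over the regions, so both aggregation modes appear in the same call pattern.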
if the tibble package is attached, the data will be printed with this class. These changes do not cause a formal dependency on tibble because they only reroute S3 dispatch if the `print.tbl` generic is in the current search path.
register s3 methods if pillar is available with registerS3method(). Alternatively, vctrs::s3_register() could be used

Also, export the print method for sc_tibble_meta() and avoid warnings in devtools::check() due to param inconsistency because of missing ellipsis

use \uxxxx escape to print the ellipsis in the footer of print.sc_tibble_meta
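Conditional registration with `registerS3method()` could look roughly like the sketch below. The helper `register_optional_method()` and the class `demo_class` are hypothetical; in a package, such a call would typically sit in `.onLoad()`. To keep the sketch runnable without pillar installed, the demonstration uses the `head` generic from `utils` instead of a pillar generic.

```r
# Register an S3 method only if the package that owns the generic is installed.
# Assumption: in a package this would be called from .onLoad().
register_optional_method <- function(pkg, generic, class, method) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))
  }
  registerS3method(generic, class, method, envir = asNamespace(pkg))
  invisible(TRUE)
}

# Demonstration with a generic from `utils`, which ships with R
head_demo <- function(x, ...) "custom head"
registered <- register_optional_method("utils", "head", "demo_class", head_demo)
result <- head(structure(1:3, class = "demo_class"))

# A package that is not installed is skipped silently
skipped <- register_optional_method("notARealPackage123", "head", "demo_class", head_demo)
```

Because the optional package is only touched at load time and only if it is installed, it can stay out of `Imports`.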
since this class is not exported and there are no factory methods, use the classname as the name of the R6 class generator object

the field $data_raw was renamed to $data and the previous $data is now only available via $tabulate()
* use R6/roxygen2 to document the whole class
* move request time to $meta$source
* rename $raw -> $json
* advanced printing
* R6/roxygen2 documentation
* link ?sc_table_class from ?sc_table
od_tabulate() now only matches for labels in the current language. If the language of a table is set to "en", German labels cannot be used. Update examples accordingly
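The language-dependent matching can be pictured with the following base R sketch. It is an illustration only: the helper `match_key()`, the label table and the codes are made up, and the assumption that codes match regardless of language is taken from the earlier note that parameters can be given as codes or as labels.

```r
# Hypothetical label table for one field; codes and labels are made up.
labels <- data.frame(
  code     = c("SEX-1", "SEX-2"),
  label_de = c("Mann", "Frau"),
  label_en = c("male", "female"),
  stringsAsFactors = FALSE
)

# Resolve a user-supplied key to a code. Codes always match, labels only
# match in the currently active language.
match_key <- function(key, labels, language = c("en", "de")) {
  language <- match.arg(language)
  candidates <- labels[[paste0("label_", language)]]
  idx <- match(key, c(labels$code, candidates))
  if (is.na(idx)) {
    stop("no match for '", key, "' in language '", language, "'")
  }
  # map positions in the combined vector back to row numbers
  labels$code[(idx - 1) %% nrow(labels) + 1]
}

code_from_label <- match_key("male", labels, "en")
code_from_code <- match_key("SEX-2", labels, "en")
wrong_language <- try(match_key("Frau", labels, "en"), silent = TRUE)
```

Here `wrong_language` holds an error object because the German label is looked up while the language is set to "en".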
Breaking changes
A new version of the pkgdown site is available at https://statistikat.github.io/STATcubeR/dev/
- This now includes a roxygen2 documentation of the three main R6 classes
- The class documentation is supplemental to the documentation of the constructor methods
- The index sites were updated

Updates for all pkgdown related source files will be added in a separate branch (#13) because I cannot test the REST API documentation in my current development environment and need to transfer it to another server. The pkgdown manual is still ahead of the VCS in some regard. For example:
when developing the R6 documentation, it was briefly tested how the man pages would look if the od_table class was directly exported as od_Table. These man entries mistakenly use this invalid syntax
not related to od_table, but the old version caused errors because of positional arguments in R/table_custom.R#L27
the annotation parameter is not working properly with the introduction of sc_data which allows tabulate() to operate via sums or via total codes. Use if(FALSE) to skip this example for now

in the long run, it will have to be decided how annotations should be aggregated in $tabulate()
previously, this function expected sc_table objects and now it operates with the base class. This required some rerouting of the different implementations and merging of certain man-pages

the od_tabulate() function was removed from the NAMESPACE because it would just be an alias for sc_tabulate() at this point

the fact that the annotations param is broken is now part of the class documentation of sc_table_class
work-in-progress: add od-functionality