Adding data guide. #268

seabbs · 2021-04-13T18:39:05Z

The adding data guide needs to be expanded. Current version is here: https://github.com/epiforecasts/covidregionaldata/blob/master/.github/ADDING-DATA.md

It should include:

How to download data
How to clean data when there is only one level
How to clean data with multiple levels
How to incrementally implement the prototype class
How to test the prototype class
A style guide on documentation and method use (i.e only use self in top level method calls where possible).

The text was updated successfully, but these errors were encountered:

RichardMN · 2021-04-14T09:49:41Z

I'll put my hand up for some of this. I think I can do:

How to download data
How to clean data when there is only one level
How to clean data with multiple levels
How to incrementally implement the prototype class

Thinking about this, it almost feels something which might draft better with a wiki approach rather than concurrent PRs on a branch.

One suggestion which I think I would make would be to work through a flowchart/decision tree of the data available and then model the new code not on R/Country Template.R but on one of the existing classes. (This is essentially what I did for most of them.)

Q1: Do you have data only for level 1?
- Yes
  - Model based on Cuba or Canada or SouthAfrica or Italy or India [or ...]
- No
  - Q2: Is your data available is only for level 2 but then aggregated to get level 1 data?
    - Model on Lithuania or Brazil or Germany
  - Q3: Is your data available as separate sources for levels 1 and 2?
    - Model on USA, look at France
  - Q4: Is some data available only at certain levels?
    - Model on Belgium (see also France)
  - Q5: Are you doing something different from all the others?
    - Look at UK, but this probably isn't what you want to do.

This approach is more likely to give people a framework with the parts they need already in than providing a very generic framework which has many methods in place which may not be necessary for most use cases.

It may be useful to fill out the taxonomy above for general reference too.

What I don't think I can do apart from asking more questions is advise on testing or on debugging (two different things). Debugging the R6 code has been a learning experience, I'm not sure I'm doing it the best way possible but perhaps someone else may have some tips we could share.

seabbs · 2021-04-14T09:55:31Z

That would be amazing Richard.

I really like the flow chart idea that seems like a much nicer way of handling it.

We could use GitHub wikis for both this and contributing guide (as suggested by @kathsherratt)? That might make changing them easier and more natural.

Totally agree on the testing/debugging issues. I also don't think I have found the optimal work flow. I think perhaps if we all chip on with suggestions we can agree some good working practices?

seabbs · 2021-04-14T10:47:18Z

My personal workflow has been to implement each method in turn, load the class and then use n <- Class$new(); debug(n$download) to drop into debug in the method I am working on.

RichardMN · 2021-04-14T15:02:13Z

Good - I am starting with a wiki formulation in https://github.com/epiforecasts/covidregionaldata/wiki/Adding-Data-(wiki-draft)

I've drawn in from @seabbs blob and will also pull in some of the text I'd written for a vignette on this topic for 0.8.3

Currently I'm littering it with headers as a way of outlining - please feel free to also edit on this text, including adding in comments/questions/suggestions. I would propose to draft on the wiki for a few days and then transfer into a PR (I think it's useful to have this documentation in the package) but we may choose to leave it on the wiki.

seabbs · 2021-04-15T12:07:04Z

Just skimmed through where this is at. Looks really great currently. Really like the Q and A design. I think I am quite in favour in leaving it in the wiki (so that is a 100% course change from me) but very open to alternative views on that.

RichardMN · 2021-04-15T16:16:04Z

Thanks for the kind comments.

Can someone who is familiar with the ECDC / WHO / JRC code sketch out or write a version of what could be said about doing this sort of a system? I've looked a bit at these Classes but really am not familiar and don't know what counts as a good example and what counts as an edge case which is functional but to be avoided.

I think I know what I want to write more in the cleaning single levels and cleaning multiple levels, I think I've got a sense of what needs to go in the incremental and testing sections.

I have not written about only using self in top-level functions - I think I would include the clean_level_? functions in that category. At least some of my code uses self$clean to pass intermediate forms from one of these to the other. I don't know what arguments there are for passing things on the stack as opposed to them being referred to in the object they're on.

One other thing I might suggest is that if people know of open data sources which we don't yet include, we could start a "wish list" of these which could be added. It could be faster than me going looking for new data sources, but would have to come with the clear statement that listing something doesn't mean it's automatically, or even ever, going to be added. (There may be complications or limitations which aren't apparent, or which the nominator doesn't consider important.)

seabbs · 2021-04-19T13:06:12Z

@joseph-palmer as you are handling adding the JHU and google data sketching out some docs for these kind of classes might be a good intro?

Maybe we should leave documenting the self usage for a bit as I agree I am not totally clear what the best approach is.

I thought we might wishlist datasets as a discussion section? Agree we should post to this and definitely stress that it is very much not a given anything will be implemented based on it being listed..

https://github.com/epiforecasts/covidregionaldata/discussions/categories/new-datasets

joseph-palmer · 2021-04-20T12:04:16Z

Iv'e put a page up https://github.com/epiforecasts/covidregionaldata/wiki/Adding-a-national-data-source and will keep going over it to make sure its clear. I will update as I add in JHU and Google data as I'm sure that will raise steps I've missed. Could even do a walk though of creating it which might be nice.

seabbs · 2021-04-22T17:01:49Z

I'll look at this again at more detail today/tomorrow but are we happy with how the package links to these docs at the moment?

RichardMN · 2021-04-22T18:15:59Z

I think it’s fine.

I just looked and I cannot find how the package links to these but I think that’s probably ok. I have to imagine that anyone who wants to add a data set will at least visit the GitHub page and may find their way to the wiki.

i note (and this is something which I guess may go into the pkgdown.yml) a typo in the following line:

Clases that define individual datasets and contain all the methods needed to download, process, and clean them.

classes.

Let’s also put in a sentence there saying that individual data classes may have params which are documented in their data class but can be passed to get_regional_data et al.

joseph-palmer · 2021-04-30T17:47:40Z

This looks like its finished so closing

seabbs added the documentation Improvements or additions to documentation label Apr 13, 2021

This was referenced Apr 13, 2021

use of self inside class methods #201

Closed

Documentation for adding new data: explain interactive implementation #169

Closed

SMG -> vignettes #132

Closed

joseph-palmer closed this as completed Apr 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding data guide. #268

Adding data guide. #268

seabbs commented Apr 13, 2021 •

edited

Loading

RichardMN commented Apr 14, 2021

seabbs commented Apr 14, 2021

seabbs commented Apr 14, 2021

RichardMN commented Apr 14, 2021 •

edited

Loading

seabbs commented Apr 15, 2021

RichardMN commented Apr 15, 2021

seabbs commented Apr 19, 2021

joseph-palmer commented Apr 20, 2021

seabbs commented Apr 22, 2021

RichardMN commented Apr 22, 2021 •

edited

Loading

joseph-palmer commented Apr 30, 2021

Adding data guide. #268

Adding data guide. #268

Comments

seabbs commented Apr 13, 2021 • edited Loading

RichardMN commented Apr 14, 2021

seabbs commented Apr 14, 2021

seabbs commented Apr 14, 2021

RichardMN commented Apr 14, 2021 • edited Loading

seabbs commented Apr 15, 2021

RichardMN commented Apr 15, 2021

seabbs commented Apr 19, 2021

joseph-palmer commented Apr 20, 2021

seabbs commented Apr 22, 2021

RichardMN commented Apr 22, 2021 • edited Loading

joseph-palmer commented Apr 30, 2021

seabbs commented Apr 13, 2021 •

edited

Loading

RichardMN commented Apr 14, 2021 •

edited

Loading

RichardMN commented Apr 22, 2021 •

edited

Loading