Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve consistency with gtfstio #48

Closed
mpadge opened this issue Jan 31, 2022 · 9 comments
Closed

Improve consistency with gtfstio #48

mpadge opened this issue Jan 31, 2022 · 9 comments

Comments

@mpadge
Copy link
Contributor

mpadge commented Jan 31, 2022

@dhersz I find the following behaviour somewhat confusing:

f <- "/<a>/<gtfs>/<feed>.zip"
gtfs0 <- gtfsio::import_gtfs (f)
f <- gtfstools::frequencies_to_stop_times (gtfs0)
#> Error in gtfstools::frequencies_to_stop_times(gtfs0): Assertion on 'gtfs' failed: Must inherit from class 'dt_gtfs', but has classes 'gtfs','list'.

gtfs1 <- gtfstools::read_gtfs (f)
f <- gtfstools::frequencies_to_stop_times (gtfs1)
#> Error: The following columns in the GTFS object 'frequencies' element do not inherit from the required classes:
#>   - 'headway_secs': requires integer, but inherits from character

Created on 2022-01-31 by the reprex package (v2.0.1.9000)

I would expect the first to be handled internally, and not to error like that. I suspect the best - and definitely least confusing - way to achieve that would be to entirely ditch the gtftools::read_gtfs function, so all packages can always rely on gtfsio::import_gtfs(). You could then just have one additional wrapper function at the start of all gtfstools functions which calls the additional new_gtfs and convert_from_standard functions. (If that slowed things down, you easily just memoise the calls so conversion would be instantaneous.) Having read_gtfs here is really quite confusing for me, and I considered myself "internal" to this whole ecosystem - that must mean we should presume it is even more confusing for any general users.

The second example appears to be a small 🐛 here. Sorry for not submitting exactly reproducible code, but it's hard these days with transit.feeds in private s3 buckets. That's the feed referred to in r-transit/gtfsio#25 - an older version of the Madrid metro system.

@dhersz
Copy link
Member

dhersz commented Jan 31, 2022

Hi @mpadge, thanks for opening this issue.

I agree that having gtfsio::import_gtfs() and gtfstools::read_gtfs() can be quite confusing, especially if you take into account that each one of our packages has a GTFS reading function that is fundamentally a wrapper to gtfsio's (e.g. gtfs2gps::read_gtfs() and tidytransit::read_gtfs() as well). Converting from the standard in every function could potentially be quite slow, especially in gtfs2gps and tidytransit cases, that convert the time columns to ITime and hms, respectively, but I like the memoise approach. Perhaps we should move this discussion to the gtfsio repo to hear the opinions of others as well.

The second error also seems to be related to gtfsio. It's hard to tell without the GTFS, but I suspect that gtfsio is not finding the correct specification to headway_secs column because it should look for the column type inside the frequencies element of the GTFS standards list, but it's actually looking for the <dir_name>/frequencies element (because the frequencies table is nested inside a directory). When it doesn't find the column type, the function assumes it's an unofficial field and defaults to reading it as character. It should be an easy fix to consider only the last part of a path when looking for the columns specification.

@mpadge
Copy link
Contributor Author

mpadge commented Jan 31, 2022

The slowness will be absolutely no issue if everything is memoised. I use that approach in a heap of my packages, most notably in dodgr which constantly passes and re-modifies really huge rectangular objects. Functions which would take minutes just for equivalent post-processing steps of otherwise standard input become instantaneous once memoised. I would suggest it is definitely the solution here, but note that when i say here i mean here, and not gtfsio. That is not the place to implement the memoisation, because what you want to memoise are the subsequent steps overlaid in each package on top of the basic gtfsio::import_gtfs() routine. That should ultimately be the only routine used by all of our packages.

It's an interesting case in terms of GitHub structure and functionality, because we need some kind of master issue which equally affects all separate repos, but that's not currently possible. Don't know the best solution for where and how to continue discussion in a coordinated way?

As for the bug - I can easily reproduce it locally, so will do so, debug, and hopefully submit a PR. Thanks!

@randl068
Copy link

Sorry if this has been answered, but is there now a work-around for the character as integer problem with the frequencies to travel time function?

@dhersz
Copy link
Member

dhersz commented Feb 15, 2022

Not yet @randl068. If you're coming across this issue it's probably because you also have a "nested GTFS". If that's the case, you could try unnesting it manually and then using gtfstools to read it as usual.

@dhersz
Copy link
Member

dhersz commented Feb 15, 2022

If that's not the cause of the issue here (i.e. you don't have a nested GTFS) then I'll need to know more about the feed you're using and in which context you're trying to using the function to understand what might be the problem.

@randl068
Copy link

Could you explain what it would mean to manually unnest the GTFS?

@dhersz
Copy link
Member

dhersz commented Feb 15, 2022

I'm assuming you have a zip file that contains a directory, inside which there are the GTFS tables. Is that correct?

If so, I'd suggest moving the GTFS tables outside the directory, and then removing the directory altogether.

If you want to do it with R, I suspect the best way would be to unzip the file .zip to a temporary folder, move the tables inside the directory to the top-level temporary folder, delete the now-empty directory and then zipping the contents of the temporary folder.

@randl068
Copy link

Wonderful, thank you!

@dhersz
Copy link
Member

dhersz commented Jun 13, 2022

Closing this issue in favor of r-transit/gtfsio#29.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants