API for a high level version of the datasets #159

Open
cuducos opened this issue Nov 7, 2017 · 2 comments
Comments

@cuducos
Collaborator

cuducos commented Nov 7, 2017

What is the problem?

Dealing with the CSV files generated by the toolbox is not trivial: before calling pd.read_csv we need to define a lot of dtypes. In Jarbas we spent a bunch of lines of code deserializing data (converting strings to date objects, integers and floats).

How can this be addressed?

@turicas and I talked today and he suggested that the toolbox could offer an API not only to generate a CSV version of our datasets, but also a high-level iterator over them. Something like:

from serenata_toolbox.federal_senate import Reader


for row in Reader('path_to.csv'):
    print(row)

Each row would then be an object whose values have proper types (int, Decimal, date, etc.).
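
To make the idea concrete, here is a minimal sketch of what such a reader could look like using only the standard library. It is not the toolbox's actual API: the column names (`year`, `date`, `net_value`), the `CONVERTERS` mapping, the `parse_date` helper, the ISO date format and the choice to turn empty cells into `None` are all assumptions made for illustration.

```python
import csv
from datetime import date, datetime
from decimal import Decimal


def parse_date(value):
    """Parse a YYYY-MM-DD string into a date object (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%d").date()


# Hypothetical column-to-converter mapping; real column names and
# formats would come from the dataset being read.
CONVERTERS = {
    "year": int,
    "date": parse_date,
    "net_value": Decimal,
}


class Reader:
    """Iterate over a CSV file, yielding one dict of typed values per row."""

    def __init__(self, path, converters=CONVERTERS):
        self.path = path
        self.converters = converters

    def __iter__(self):
        with open(self.path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                yield {key: self._convert(key, value) for key, value in row.items()}

    def _convert(self, key, value):
        if value == "":
            return None  # assumption: an empty cell means missing data
        # Columns without a registered converter stay as strings.
        return self.converters.get(key, str)(value)
```

With a mapping like this, `for row in Reader('path_to.csv')` would yield dictionaries in which `row['net_value']` is a `Decimal` and `row['date']` is a `date`, so callers such as Jarbas would not need their own deserialization code.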

Who could help with this issue?
@turicas ; )

@turicas
Contributor

turicas commented Dec 5, 2017

I'm implementing this in the following branch: https://github.com/turicas/serenata-toolbox/tree/feature/dataset-reader

@turicas
Contributor

turicas commented Jun 6, 2018

All the datasets in Brasil.IO will use the datapackage specification (for more info, see this milestone). I think it could also be the default way to access data in Serenata: there are libraries that deal with it automatically, so we wouldn't need to write converters, just the datapackage spec. What do you think?
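
As a rough illustration of why a descriptor can replace hand-written converters, the sketch below declares field types in a datapackage-style schema and derives the casting from it. The descriptor content (dataset name, resource name, path, field names) is hypothetical, and `cast_rows` is a hand-rolled stand-in for what the existing datapackage libraries do; it ignores missing values, custom formats and constraints, which those libraries handle.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Hypothetical descriptor; the field names and types are assumptions
# made for this example, not the schema of a real Serenata dataset.
DESCRIPTOR = {
    "name": "example-dataset",
    "resources": [
        {
            "name": "expenses",
            "path": "expenses.csv",
            "schema": {
                "fields": [
                    {"name": "year", "type": "integer"},
                    {"name": "date", "type": "date"},
                    {"name": "net_value", "type": "number"},
                    {"name": "supplier", "type": "string"},
                ]
            },
        }
    ],
}

# Simplified mapping from schema type names to Python callables.
CASTERS = {
    "integer": int,
    "number": Decimal,
    "date": date.fromisoformat,  # assumes YYYY-MM-DD values
    "string": str,
}


def cast_rows(lines, schema):
    """Yield typed dicts from CSV lines, driven only by the schema."""
    casters = {field["name"]: CASTERS[field["type"]] for field in schema["fields"]}
    for row in csv.DictReader(lines):
        yield {name: casters[name](value) for name, value in row.items()}


sample = io.StringIO("year,date,net_value,supplier\n2018,2018-06-06,99.90,ACME\n")
schema = DESCRIPTOR["resources"][0]["schema"]
for row in cast_rows(sample, schema):
    print(row)
```

The point of the design is that adding a new dataset would only require writing its descriptor; no per-dataset conversion code is needed.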
