API for a high level version of the datasets #159

cuducos · 2017-11-07T18:28:07Z

What is the problem?

Dealing with the CSVs generated by the toolbox is not trivial: before calling pd.read_csv we need to define a lot of dtypes, and in Jarbas we spent a bunch of lines of code deserializing data (converting strings to date objects, integers and floats).

How can this be addressed?

@turicas and I talked today and he suggested that the toolbox could offer an API not only to generate a CSV version of our datasets, but also a high level iterator for them. Something like:

from serenata_toolbox.federal_senate import Reader


for row in Reader('path_to.csv'):
    print(row)

And each row in the output would be an object with proper types (int, Decimal, date, etc.).
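To make the idea concrete, here is a minimal sketch of what such an iterator could look like using only the standard library. It is not the toolbox's implementation: the column names (`year`, `net_value`, `issue_date`, `supplier`), the `CONVERTERS` mapping and the choice to turn empty cells into `None` are all assumptions made for illustration.

```python
import csv
import tempfile
from datetime import datetime
from decimal import Decimal

# Hypothetical columns and converters; the real datasets may use other names.
CONVERTERS = {
    "year": int,
    "net_value": Decimal,
    "issue_date": lambda value: datetime.strptime(value, "%Y-%m-%d").date(),
}


class Reader:
    """Iterate over a CSV file, yielding one dict of typed values per row."""

    def __init__(self, path, converters=CONVERTERS):
        self.path = path
        self.converters = converters

    def __iter__(self):
        with open(self.path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                yield {key: self._convert(key, value) for key, value in row.items()}

    def _convert(self, key, value):
        # Assumption: empty cells become None; unknown columns stay as str.
        if value == "":
            return None
        return self.converters.get(key, str)(value)


# Write a tiny sample file so the sketch can be run as-is.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
) as tmp:
    tmp.write("year,net_value,issue_date,supplier\n")
    tmp.write("2017,123.45,2017-11-07,ACME\n")
    tmp.write("2016,,2016-01-31,\n")

rows = list(Reader(tmp.name))
for row in rows:
    print(row)
```

Keeping the converters in a plain mapping means each dataset could ship its own mapping while sharing the same iterator.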

Who could help with this issue?
@turicas ; )

turicas · 2017-12-05T20:31:44Z

I'm implementing this on: https://github.com/turicas/serenata-toolbox/tree/feature/dataset-reader

turicas · 2018-06-06T20:57:38Z

All the datasets in Brasil.IO will use the datapackage specification (for more info, see this milestone), and I think it could also be the default way to access data in Serenata: there are libraries that deal with it automatically, so we would not need to create converters, just write the datapackage spec. What do you think?
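As a rough illustration of why a datapackage descriptor can replace hand-written converters, the sketch below casts CSV rows from a schema that declares a type for each field. It uses only the standard library and covers four field types; the descriptor contents, the resource name `reimbursements` and the helper `read_resource` are hypothetical, and a real project would more likely rely on one of the existing datapackage libraries mentioned above.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# Hypothetical descriptor, shaped like a datapackage.json with one resource.
DESCRIPTOR = {
    "name": "example",
    "resources": [
        {
            "name": "reimbursements",
            "schema": {
                "fields": [
                    {"name": "year", "type": "integer"},
                    {"name": "net_value", "type": "number"},
                    {"name": "issue_date", "type": "date"},
                    {"name": "supplier", "type": "string"},
                ]
            },
        }
    ],
}

# Only a subset of the schema types, enough for this sketch.
CASTERS = {
    "integer": int,
    "number": Decimal,
    "date": lambda value: datetime.strptime(value, "%Y-%m-%d").date(),
    "string": str,
}


def read_resource(descriptor, resource_name, handle):
    """Yield typed rows from an open CSV handle, driven by the schema."""
    resource = next(
        item for item in descriptor["resources"] if item["name"] == resource_name
    )
    casters = {
        field["name"]: CASTERS[field["type"]]
        for field in resource["schema"]["fields"]
    }
    for row in csv.DictReader(handle):
        # Assumption: empty cells are treated as missing values (None).
        yield {
            name: (casters[name](value) if value != "" else None)
            for name, value in row.items()
        }


sample = io.StringIO(
    "year,net_value,issue_date,supplier\n"
    "2018,10.50,2018-06-06,ACME\n"
)
typed_rows = list(read_resource(DESCRIPTOR, "reimbursements", sample))
print(typed_rows)
```

The type information lives in data rather than in code, so adding a dataset would mean writing a descriptor instead of a new converter module.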

cuducos added the enhancement label Nov 7, 2017
