Add DataTables support #86

Closed
nalimilan opened this issue Feb 19, 2017 · 14 comments

@nalimilan

As you may know, the master branch of DataFrames has been split out into a new DataTables package to ease the transition. Currently, building the docs for the new package fails because only DataFrame is supported (see this log). Do you think you could add support for the new package? It shouldn't be hard, since it's exactly the same as the other one, which was supported.

@davidanthoff
Member

Yes, I'll definitely tackle this eventually. I don't have much time this semester, though, so I'm not sure when exactly. One big question in terms of timing is when julia 0.6 will come out. I'm almost tempted to hold the next Query release until julia 0.6, and then reuse the whole . stuff for lifting...

@davidanthoff added this to the Backlog milestone Feb 20, 2017

@nalimilan
Author

As a quick solution, wouldn't copying the current DataFrames code work?

@davidanthoff
Member

No, the current DataFrames code is based on the DataArrays version of DataFrames.

#62 has some old work to make things work with the NullableArrays version of DataFrames, but I'm not sure it is working right now and it certainly won't integrate easily on master because of changes on master since. It might be feasible to do a patch release based on #62 that ignores the recent stuff on master, but that would also be a fair bit of extra work, and I'm hesitant to use the little time I have right now for that...

@nalimilan
Author

Hmm. I'm confused then. How did the Documenter build on the Nullable-based DataFrames pass a few months ago? Cf. this file, which is included in a manual about the Nullable-based DataFrames. When the manual chapter about Query.jl was added, DataFrames master was based on Nullable (JuliaData/DataFrames.jl#1105).

@davidanthoff
Member

Hmm, now that is an excellent question :) I think a previous version of Query worked with DataFrames no matter whether one used NullableArray or DataArray, and maybe that was still the latest when we built these docs. It might even be that the last tagged version works with both. But that is no longer the case on master.

@nalimilan
Author

OK. So looks like we have two choices: copy the DataFrames code from the current tagged version to adapt it to DataArrays, and tag a new version based on this; or temporarily remove the section of the manual about Query.jl.

@davidanthoff
Member

I have a little time now, I'll see what I can do.

@davidanthoff modified the milestones: v0.3.1, Backlog Feb 21, 2017

@davidanthoff
Member

Alright, #88 has support for DataTables. It is bare-bones, i.e. no documentation, no tests and no examples, but that is all I have time for right now.

My plan would be to release a v0.3.1 with these changes, and then later integrate all of that stuff back into master, with tests, doc etc.

@nalimilan Could you try #88 and let me know whether that works for you? Once I get an ok from you, I'll merge and tag a release.

@davidanthoff
Member

Oh, and I should add, it does support DataTable both as a source and a sink, i.e. you can query DataTables, and also do @collect DataTable. And it of course also mixes nicely with DataFrames, i.e. you can query a DataTable, collect in a DataFrame or the other way around.

@davidanthoff
Member

@nalimilan Actually, if you could just try the release-0.3 branch, I merged things already into that one.

@nalimilan
Author

Great! I've just tested it on Travis, and it seems to work (though I couldn't see the result, only that it succeeded).

FWIW, we are trying to move abstractions to AbstractTables, so hopefully at some point it won't be necessary to keep specific support for both packages.

@davidanthoff
Member

Excellent, thanks! Should be tagged soon, see JuliaLang/METADATA.jl#8070.

I see two AbstractTables on github, this and this, but both seem not very active. Is there some other place where work on AbstractTables occurs? Query already in some way defines an interface for tabular data (iterators of named tuples), so at least from Query's point of view I'm not sure we would need another layer of abstraction. But I might misunderstand the role of AbstractTables, so a pointer to the latest thinking would be great.
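
To make the "iterators of named tuples" idea concrete, here is a minimal sketch in Python (used purely for illustration; Query itself is a Julia package and this is not its code). The names `ColumnTable`, `from_rows` and `where` are hypothetical. The point is only that a source exposes its rows as named tuples, a query step transforms that row stream, and a sink rebuilds a table from whatever rows arrive.

```python
from collections import namedtuple


class ColumnTable:
    """Hypothetical column-oriented table: a dict of equal-length lists."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __iter__(self):
        # Source side: expose the table as an iterator of named tuples.
        Row = namedtuple("Row", list(self.columns))
        for values in zip(*self.columns.values()):
            yield Row(*values)

    @classmethod
    def from_rows(cls, rows):
        # Sink side: rebuild columns from any iterator of named tuples.
        columns = {}
        for row in rows:
            for name, value in row._asdict().items():
                columns.setdefault(name, []).append(value)
        return cls(columns)


def where(rows, predicate):
    """A query step that only relies on the row-iterator interface."""
    return (row for row in rows if predicate(row))


people = ColumnTable({"name": ["Ann", "Bob", "Cy"], "age": [34, 19, 52]})
adults = ColumnTable.from_rows(where(people, lambda r: r.age >= 21))
print(adults.columns)
```

Because `where` never looks at how the source stores its data, any other type that yields named tuples could be queried the same way and collected into any sink that accepts them.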

@nalimilan
Author

Development has not been active over the last few months, but we hope to move some useful abstractions there. It would live under JuliaData. Ideally it would provide convenience interfaces to work either with columns or with named tuples. The latter isn't really fleshed out yet, but it could be similar to what Query does. The advantage would be that new tabular data types would only need to implement a few methods, and they would work with all packages, be it modeling packages or data management packages like Query.

@davidanthoff
Member

Yeah, we should coordinate. Query implicitly defines such an interface already, and with julia 0.6 I'll be able to use Holy traits for the whole dispatch story on that front. Essentially there will be a trait IsIterableTable that one can dispatch on, and then there is a well-defined interface for consuming that.
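
As a rough analogy for the trait idea, the sketch below shows the pattern in Python (for illustration only; Holy traits are a Julia idiom and this is not Query's implementation). A trait function maps a value to a marker type, and the consuming function dispatches on that marker rather than on the value's place in a type hierarchy. Apart from the trait name `IsIterableTable`, every name here (`NotIterableTable`, `declare_iterable_table`, `table_trait`, `row_count`, `ListTable`) is an assumption made for the example.

```python
from functools import singledispatch


class IsIterableTable:
    """Marker: the type has opted in to the iterable-table interface."""


class NotIterableTable:
    """Marker: the default for every type that has not opted in."""


_TRAITS = {}  # registry mapping a class to its trait marker


def declare_iterable_table(cls):
    """Class decorator: opt a type in without touching its base classes."""
    _TRAITS[cls] = IsIterableTable
    return cls


def table_trait(obj):
    """Trait function: return the marker instance for obj's type."""
    for cls in type(obj).__mro__:
        if cls in _TRAITS:
            return _TRAITS[cls]()
    return NotIterableTable()


def row_count(source):
    # Public entry point: compute the trait, then dispatch on it.
    return _row_count(table_trait(source), source)


@singledispatch
def _row_count(trait, source):
    # Fallback, reached for NotIterableTable.
    raise TypeError(f"{type(source).__name__} is not an iterable table")


@_row_count.register(IsIterableTable)
def _(trait, source):
    return sum(1 for _ in source)


@declare_iterable_table
class ListTable:
    """Hypothetical table type that stores its rows in a list."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __iter__(self):
        return iter(self.rows)


print(row_count(ListTable([{"a": 1}, {"a": 2}])))
```

A plain `list` is iterable but has not been declared a table, so `row_count([1, 2])` raises `TypeError`; opting in is explicit and independent of inheritance, which is what makes the trait approach attractive for unrelated table types.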
