Add documentation for Query.jl #1105

davidanthoff · 2016-10-15T05:35:48Z

@nalimilan Here is a first go at a section for the doc that deals with Query that you suggested. My idea was to give a hint of how things work, and then point folks to the Query documentation.

nalimilan

Thanks, that's really useful! I'd rather create a "Querying frameworks" page where both Query.jl and StructuredQueries.jl will be presented in separate sections. Otherwise, users might not immediately see from the "Query package" title why they should be interested in this.

nalimilan · 2016-10-17T07:47:00Z

docs/src/man/query_package.md

@@ -0,0 +1,69 @@
+# Query package
+
+The [Query.jl](https://github.com/davidanthoff/Query.jl) package provides advanced data manipulation capabilities for `DataFrames` (and many other data structures). This section provides a short introduction to the package, the [Query.jl](https://github.com/davidanthoff/Query.jl) [documentation](http://www.david-anthoff.com/Query.jl/stable/) has a more comprehensive documentation of the package.


I would make this a single link to the docs, as there's already a link to the package project above (same remark for the last paragraph of the page).

nalimilan · 2016-10-17T07:47:41Z

docs/src/man/query_package.md

+Pkg.add("Query")
+```
+
+A query is started with the `@from` macro and consists of a series of query commands. Query provides commands that can filter, project, join, group, flatten and group data from a `DataFrame`. A query returns either an iterator of the transformed data, or one can materialize the results of a query into a variety of data structures, including a new `DataFrame`.


Query -> Query.jl? Else it's going to be quite confusing with the term "query".

Syntax is a bit weird in the second part of the sentence: it doesn't match well with "returns either".

nalimilan · 2016-10-17T07:49:16Z

docs/src/man/query_package.md

+```julia
+using DataFrames, Query
+
+df = DataFrame(name=["John", "Sally", "Roger"], age=[54., 34., 79.], children=[0, 2, 4])


Why not integer ages?

Because age is a continuous variable. Doesn't really matter, right?

No, not really.

nalimilan · 2016-10-17T07:50:16Z

docs/src/man/query_package.md

+
+df = DataFrame(name=["John", "Sally", "Roger"], age=[54., 34., 79.], children=[0, 2, 4])
+
+x = @from i in df begin


Do we have a way of printing the results too? I think a doctest should do it. That would make it much easier to follow, without running the code in parallel.

I'll add the output for now. Doctests would be great, but the dependency management for doctests is not great at this point, i.e. you would have to manually make sure Query.jl is installed. Maybe it would be easier to wait until Documenter.jl has a better story for managing dependencies and then convert things to doctests?

You mean, it will need to be installed in order to build the docs? That doesn't sound like too much of a requirement. I'm not sure how we can ensure that building the online docs works, though, but it would probably be possible.

@MichaelHatherly?

Add it to the travis script in the same place as Documenter is installed?

We'll get some kind of docs/REQUIRE eventually, but it'll need to be in Base to be of much use to anyone. I'll get around to implementing it at some point...

nalimilan · 2016-10-17T07:51:22Z

docs/src/man/query_package.md

+end
+```
+
+The `@where` command in this query will filter the source data by applying the filter condition `i.age > 40`. This filters out any rows in which the `age` column is not larger than 40. The `@select` command then projects the columns of the source data onto a new column structure. The example here applies three specific modifications: 1) it only keeps a subset of the columns in the source `DataFrame`, i.e. the `age` column will not be part of the transformed data; 2) it changes the order of the two columns that are selected; and 3) it renames one of the columns that is selected from `children` to `number_of_children`. The `@collect` statement determines the data structure that the query returns. In this example the results are returned as a `DataFrame`.


Maybe explain what i is? Same for {}.

nalimilan · 2016-10-17T07:52:46Z

docs/src/man/query_package.md

+A query without a `@collect` statement returns a standard julia iterator that can be used with any normal julia language construct that can deal with iterators. Here are some examples:
+
+```julia
+using DataFrames, Query


I wouldn't repeat this line, nor the one below (same for next example).

nalimilan · 2016-10-17T07:54:47Z

docs/src/man/query_package.md

+end
+```
+
+A query that ends with a `@collect` statement without a specific type will materialize the query results into an array. Note also the difference in the `@select` statement: The previous queries all used the `{}` syntax in the `@select` statement to return results that had a column structure. The last query instead just selects a single value from each row in the `@select` statement.


I first understood "to return results that had a column structure" to mean "return results whose structure was columnar" instead of "return results in a columnar form". Can you make this more explicit? Maybe "return results in a tabular form" or something like that?

davidanthoff · 2016-10-17T17:28:34Z

@nalimilan Thanks for the great feedback! I think I addressed all your points.

When I run make.jl locally, the docs don't seem to pick up the stylesheet, so I have a hard time checking the final look here. I don't understand why that is happening because I use pretty much the same setup for the Query.jl documentation, and there everything works great locally...

nalimilan · 2016-10-17T19:32:17Z

docs/src/man/querying_frameworks.md

+│ 2   │ 4                  │ Roger   │
+```
+
+The query starts with the ``@from`` macro. The first argument ``i`` is the name of the range variable that will be used to refer to an individual row in later query commands. The next argument ``df`` is the data source that one wants to query. The `@where` command in this query will filter the source data by applying the filter condition `i.age > 40`. This filters out any rows in which the `age` column is not larger than 40. The `@select` command then projects the columns of the source data onto a new column structure. The example here applies three specific modifications: 1) it only keeps a subset of the columns in the source `DataFrame`, i.e. the `age` column will not be part of the transformed data; 2) it changes the order of the two columns that are selected; and 3) it renames one of the columns that is selected from `children` to `number_of_children`. The example query uses the ``{}`` syntax to achieve this. A ``{}`` in a Query.jl expression instantiates a new [NamedTuple](https://github.com/blackrock/NamedTuples.jl), i.e. it is a shortcut for writing ``@NT(number_of_children=>i.children, name=>i.name)``. The `@collect` statement determines the data structure that the query returns. In this example the results are returned as a `DataFrame`.


Should use single backquotes everywhere, right?

IIRC single backticks are for code and double are for LaTeX math.

Yup, single backticks for these ones.

Argh, sorry, I was still in github markdown mode. Fixed with the next push.

nalimilan · 2016-10-17T19:35:01Z

When I run make.jl locally, the docs don't seem to pick up the stylesheet, so I have a hard time checking the final look here. I don't understand why that is happening because I use pretty much the same setup for the Query.jl documentation, and there everything works great locally...

Sorry, no idea. I'm not familiar with Documenter.jl yet.

Not sure if that's expected, but on Julia 0.6, I get this error if I don't load NamedTuples in addition to Query:

julia> q1 = @from i in df begin
           @where i.age > 40
           @select {number_of_children=i.children, i.name}
           @collect DataFrame
       end

ERROR: UndefVarError: @NT not defined
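
A minimal sketch of the workaround implied by this report, assuming the tagged Query.jl release of the time and that the NamedTuples.jl package is installed. Loading `NamedTuples` alongside `Query` brings `@NT` into scope, which the `{}` syntax expands to:

```julia
# Assumption: the Query.jl release discussed in this thread, where `{}`
# expands to a call to the `@NT` macro from NamedTuples.jl.
using DataFrames, Query, NamedTuples

df = DataFrame(name=["John", "Sally", "Roger"], age=[54., 34., 79.], children=[0, 2, 4])

# Same query as in the report above; with NamedTuples loaded, `@NT` resolves.
q1 = @from i in df begin
    @where i.age > 40
    @select {number_of_children=i.children, i.name}
    @collect DataFrame
end
```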

MichaelHatherly · 2016-10-17T19:42:42Z

When I run make.jl locally, the docs don't seem to pick up the stylesheet, so I have a hard time checking the final look here.

I'll have a look; I haven't come across missing stylesheet problems before.

MichaelHatherly · 2016-10-17T19:50:52Z

docs/src/man/querying_frameworks.md

+     @where i.age > 40
+     @select {number_of_children=i.children, i.name}
+end
+````


One too many backticks here.

MichaelHatherly · 2016-10-17T19:51:11Z

docs/src/man/querying_frameworks.md

+
+Or one can use a comprehension to extract the name of a subset of rows:
+
+````julia


One too many backticks.

MichaelHatherly · 2016-10-17T19:54:42Z

@davidanthoff the stylesheets are loading alright for me on a local build of this branch. What are your versions of Documenter and Julia? Which web browser are you viewing the docs in?

davidanthoff · 2016-10-17T20:50:08Z

I get this error if I don't load NamedTuples in addition to Query

I need to tag a new version, things should work on Query.jl master.

davidanthoff · 2016-10-17T20:56:21Z

What are your versions of Documenter and Julia? Which web browser are you viewing the docs in?

Julia is 0.5.0 on Windows, Documenter is 0.6.0. I narrowed the problem down: the link to "Querying frameworks" in the navigation area on the left is file:///C:/Users/anthoff/.julia/v0.5/DataFrames/docs/build/man//querying_frameworks.html. Note that there are two / between man and querying_frameworks.html. When I click that link, it shows the content of that page, but the stylesheets are messed up. If I then manually remove the second /, everything looks correct.

So I think all that needs to be done is to make sure the URLs never contain two consecutive / characters.

nalimilan · 2016-10-18T07:24:16Z

Can you change examples into doctests then?

davidanthoff · 2016-10-18T16:44:50Z

Can you change examples into doctests then?

Done.

nalimilan · 2016-10-18T18:33:57Z

docs/src/man/querying_frameworks.md

+     @collect DataFrame
+end
+
+println(q1)


Am I right that you shouldn't need println?

Yep, removed in the next push.

nalimilan · 2016-10-18T18:35:49Z

docs/src/man/querying_frameworks.md

+
+```@meta
+DocTestSetup = quote
+    using DataFrames, Query


There's no way to avoid repeating the code block from above? There's always the risk of them not being in sync.

I don't think so. Each jldoctest block is run independently and no state is preserved between these code blocks. I guess the only alternative would be to repeat the query itself in the jldoctest, and only have the using and DataFrame construction code in the @meta section.

nalimilan · 2016-10-18T18:36:22Z

docs/src/man/querying_frameworks.md

+│ 2   │ 4                  │ Roger │
+```
+
+```@meta


Why is this necessary (and below too)?

@meta blocks are like global state, i.e. they are run before every jldoctest block that follows the @meta block. I think "clearing" that out after it has been used is safer, otherwise if someone adds another jldoctest later in the documentation, it might accidentally pick up this init code from a previous jldoctest block.

Really it would be nice if one could specify an init code block for a given jldoctest that is only run before that code block. But I don't think that can be done right now...

Essentially I'm just following the guidance at the bottom of the page here.
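
For readers following along, a minimal sketch of the set-then-clear pattern described above, as it might appear in a manual page. The `size(df)` doctest is only an illustration and is not taken from this PR; the `DataFrame` construction is the one used in the examples under review:

````markdown
```@meta
DocTestSetup = quote
    using DataFrames, Query
    df = DataFrame(name=["John", "Sally", "Roger"], age=[54., 34., 79.], children=[0, 2, 4])
end
```

```jldoctest
julia> size(df)
(3, 3)
```

```@meta
DocTestSetup = nothing
```
````

The final `@meta` block resets the setup code, so a `jldoctest` block added later on the page does not silently pick up `df`.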

OK.

@MichaelHatherly Do you think it would be possible to have a way of specifying that a series of doctests goes together and should share state? Looks like it could also be useful to allow setting a DocTestSetup block to be used only by the following doctests, but not the others.

Yes, we probably could. There's already precedent for having "named" groups with the @repl and @example blocks, so we could probably do the same thing with doctests.

FWIW, named @example blocks might be a better fit for this section of docs anyway.
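
A rough sketch of what named `@example` blocks could look like for this section. The label `query` is a hypothetical name chosen for illustration; blocks that share a label are evaluated in the same module, so the second block can see `df` from the first:

````markdown
```@example query
using DataFrames, Query

df = DataFrame(name=["John", "Sally", "Roger"], age=[54., 34., 79.], children=[0, 2, 4])
```

```@example query
q1 = @from i in df begin
    @where i.age > 40
    @select {number_of_children=i.children, i.name}
    @collect DataFrame
end
```
````

Unlike `jldoctest` blocks, the output is generated at build time rather than compared against an expected result written in the page.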

Ah, that's exactly what I have in mind. @davidanthoff Do you feel like changing the examples to use the same @example name?

nalimilan · 2016-10-18T19:57:35Z

There are conflicts with master.

ararslan · 2016-10-18T21:22:47Z

I probably missed a conversation somewhere about it, but why are we running Query doctests from DataFrames?

davidanthoff · 2016-10-18T23:17:49Z

@ararslan @nalimilan suggested that we add a section about query frameworks to the documentation, and also that I convert the examples into doc tests.

nalimilan · 2016-10-19T07:20:04Z

The idea is that if we don't ensure examples work and give the expected result, they will stop working at some point (witness the recent issue with constructor examples).

ararslan

Okay, LGTM then.

nalimilan · 2016-11-01T10:32:38Z

@davidanthoff Can you just make the move to @example so that we can merge this? Thanks!

davidanthoff · 2016-11-08T22:55:58Z

I just tried to implement the @example thing, but at least on my system it doesn't splice the results from the last line of such a block into the output. I'm on Windows trying this with Documenter v0.7.1.

Maybe we should merge this PR at this point, given that it works, and then we can still change this at a later point?

nalimilan · 2016-11-09T08:56:09Z

I just tried to implement the @example thing, but at least on my system it doesn't splice the results from the last line of such a block into the output. I'm on Windows trying this with Documenter v0.7.1.

@MichaelHatherly Any suggestions?

Maybe we should merge this PR at this point, given that it works, and then we can still change this at a later point?

We can certainly do that if @example doesn't work, but I'd rather do it before merging if that's not too hard.

MichaelHatherly · 2016-11-09T16:40:13Z

With the following diff everything displays as it should on Linux/Windows Julia 0.5: https://gist.github.com/MichaelHatherly/48f57ea5a60e0334e1cfc77feb68b4f4.

davidanthoff · 2016-11-09T18:30:59Z

Thanks, now it seems to work, I must have missed something yesterday.

The DataFrame display formatting is pretty messed up. I think there is actually a bug both in the DataFrame show method and in Documenter here.

Also is there some way to have some text between the code example block and the output? Right now there is no visual indication that one is code and the other one is output, which looks strange to me.

ararslan · 2016-11-09T18:47:43Z

Just as an aside, it's best to avoid using @ in commit messages, as it always pings the GitHub user with that name. Common replacements would be "example macro" or "at-example."

nalimilan · 2016-11-10T10:11:24Z

Is there no way to check the output of @example like with doctests?

davidanthoff · 2016-11-10T22:30:49Z

I don't think so.

nalimilan · 2016-11-10T22:38:09Z

@MichaelHatherly?

MichaelHatherly · 2016-11-12T07:15:09Z

(Sorry, been busy with non-Julia things.) There's no way to check @example output other than using a jldoctest block instead — I've not come up with any kind of satisfactory middle ground between the two.

The DataFrame display formatting is pretty messed up. I think there is actually a bug both in the DataFrame show method and in Documenter here.

Is this the character width for the unicode chars that form the table frame? If so, I don't think much can be done in Documenter aside from changing the font again to one that properly supports those characters.

Also is there some way to have some text between the code example block and the output? Right now there is no visual indication that one is code and the other one is output, which looks strange to me.

Probably can do. Proposals/PRs for style changes to the output would definitely be welcomed.

nalimilan · 2016-11-12T14:16:26Z

(Sorry, been busy with non-Julia things.) There's no way to check @example output other than using a jldoctest block instead — I've not come up with any kind of satisfactory middle ground between the two.

OK, though that's kind of unfortunate. Any plans to change this in the future? I don't like encouraging people to write untested examples...

nalimilan · 2016-11-12T14:16:57Z

Thanks @davidanthoff!

davidanthoff · 2016-11-13T00:34:54Z

Hm, when I look at this here now it doesn't show any of the example output...

MichaelHatherly · 2016-11-13T07:23:07Z

I don't like encouraging people to write untested examples...

@example will still print a warning when any errors are encountered, which can be turned into an error by calling makedocs with the strict = true keyword. Not quite as thorough as a unit test of course, kind of more like a Jupyter notebook cell really.
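
As a sketch only, assuming a Documenter release contemporary with this thread: the keyword mentioned above would be passed in `docs/make.jl` roughly as follows. The `modules` argument is an assumption about how the DataFrames build is configured, and later Documenter releases changed how strictness is controlled, so check the version in use.

```julia
# docs/make.jl (sketch; the surrounding arguments are assumptions)
using Documenter, DataFrames

# With strict = true, warnings raised while building the docs (including
# errors inside @example blocks) make the build fail instead of passing.
makedocs(modules = [DataFrames], strict = true)
```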

Hm, when I look at this here now it doesn't show any of the example output...

Package dependencies need fixing, DataArrays isn't being installed: https://travis-ci.org/JuliaStats/DataFrames.jl/jobs/175315815#L277

nalimilan · 2016-11-13T11:31:44Z

Why not make the strict=true behavior the default? I don't see the use case for building docs with errors in them.

MichaelHatherly · 2016-11-14T16:25:53Z

Why not make the strict=true behavior the default?

That would be quite a bad breaking change. strict=true will error out when anything is slightly wrong, such as not including all docstrings in the generated docs — for example, building DataFrames docs lists quite a large number that haven't been included in @docs blocks.

I don't see the use case for building docs with errors in them.

I agree with that, but it does need a gradual deprecation period to not cause too much trouble for all the packages that are currently ignoring those warnings. Since Documenter is currently going through the ssh-key deprecation I'd also prefer to only do one major deprecation at a time.

davidanthoff · 2016-11-14T17:17:41Z

For what it's worth, I think the doctests should actually run as part of the normal test suite, i.e. one should be able to add a line like Documenter.test("DataFrames") to the runtests.jl file in DataFrames, so that the doctests actually run if someone does Pkg.test("DataFrames"). See JuliaDocs/Documenter.jl#198.
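
`Documenter.test` above is a proposal, not an existing function. Later Documenter releases do provide a `doctest` function that serves this purpose; a minimal sketch, assuming such a release is installed (it was not available at the time of this thread):

```julia
# test/runtests.jl (sketch, assuming a later Documenter release that exports `doctest`)
using Documenter, DataFrames

# Runs the package's doctests as part of the normal test suite,
# so they are exercised by Pkg.test("DataFrames").
doctest(DataFrames)
```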

davidanthoff · 2016-11-14T17:19:28Z

For the failure itself: Query currently targets the tagged versions of DataFrames. So I guess this PR here really should go into a release branch for the 0.8.x series of DataFrames... Is there some way to publish new doc versions as stable before the current DataFrames master gets tagged?

nalimilan · 2016-11-14T17:22:07Z

Just make a PR against release-0.8, and I'll add a tag so that it appears in the online docs.

Add documentation for Query.jl

7c676a2

nalimilan reviewed Oct 17, 2016

Update querying framework docs

fb02f34

nalimilan reviewed Oct 17, 2016

MichaelHatherly reviewed Oct 17, 2016

Fix backtricks in querying framework documentation

c27b1e2

MichaelHatherly mentioned this pull request Oct 18, 2016

Fix hrefs on Windows JuliaDocs/Documenter.jl#330

Merged

Add doctests to Query.jl documentation

8e9e5ea

nalimilan reviewed Oct 18, 2016

davidanthoff added 2 commits October 18, 2016 12:48

Remove unnecessary code from query doctest

ecea150

Merge branch 'master' into query-doc

e7dee76

Merge branch 'master' into query-doc

f0c8415

nalimilan approved these changes Oct 18, 2016

ararslan approved these changes Oct 19, 2016

Use @example block for Query examples

647b57e

nalimilan merged commit 4c47ed3 into JuliaData:master Nov 12, 2016

davidanthoff deleted the query-doc branch November 13, 2016 00:33

davidanthoff mentioned this pull request Nov 14, 2016

Query docs backport #1126

Merged

nalimilan mentioned this pull request Jan 25, 2017

Add Query.jl section to the manual #1083

Closed

nalimilan mentioned this pull request Feb 21, 2017

Add DataTables support queryverse/Query.jl#86

Closed

nalimilan pushed a commit that referenced this pull request Jul 8, 2017

Add documentation for Query.jl (#1105)

a5fd323

nalimilan pushed a commit that referenced this pull request Jul 8, 2017

Add documentation for Query.jl (#1105)

d20250d

rofinn pushed a commit that referenced this pull request Aug 17, 2017

Add documentation for Query.jl (#1105)

79e85cc

nalimilan pushed a commit that referenced this pull request Aug 25, 2017

Add documentation for Query.jl (#1105)

f68f72b

quinnj pushed a commit that referenced this pull request Sep 2, 2017

Add documentation for Query.jl (#1105)

5e8d761
