Support exporting to JSON Table Schema #14386

rgbkrk · 2016-10-10T14:03:41Z

For Jupyter based frontends, we would love to see a common tabular format in JSON that we can render (in addition to or in lieu of the current HTML). This would provide us the flexibility to style and format according to data type, as well as have better hooks for theming of tabular data on frontends. Everyone has an opinion, let's give them flexibility to apply it.

It's important to us to support a common JSON format so that R, Julia, and other languages can also display their DataFrames with similar formatting and styling out of the box.

The best one I've seen so far, with a great amount of discussion and collaboration, is the JSON Table Schema.

Update: In order to include both data + schema, we're using data resource which has media type application/vnd.dataresource+json.

/cc @captainsafia @ellisonbg @jreback @TomAugspurger

jreback · 2016-10-10T14:15:15Z

xref #9146, #9166

TomAugspurger · 2016-10-10T14:16:58Z

I'll dig into the schema later, but just to make sure: the basic idea is for pandas to publish multiple outputs (application/html, application/json) wherever we publish just the HTML right now?
More concretely, what changes do we need to make to Series / DataFrames / Indexes to support this? IIRC there isn't a _repr_json_ equivalent of _repr_html_.

rgbkrk · 2016-10-10T14:19:55Z

Interesting - I just noticed they wrote a wrapper for pandas: https://github.com/frictionlessdata/jsontableschema-pandas-py

On the JupyterLab, notebook, and nteract side, we'd have https://github.com/frictionlessdata/jsontableschema-js to lean on.

rgbkrk · 2016-10-10T14:22:19Z

the basic idea is for pandas to publish multiple outputs (application/html, application/json) wherever we publish just the HTML right now?

Yes. The media type (mime type in Jupyter parlance) would be something like application/vnd.table-schema.v1+json.

rgbkrk · 2016-10-10T14:39:59Z

While there's not a repr for arbitrary media types in IPython (we can evolve that as a result of this discussion), there is a way to display raw messages with IPython.display.display:

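import IPython.display

# `releases` is any JSON-serializable object; raw=True publishes the mime-keyed dict as-is
IPython.display.display({
    'application/json': releases
}, raw=True)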

Which shows up in nteract as:

pwalsh · 2016-10-10T15:10:02Z

Hi. I'm one of the authors of JSON Table Schema, and also part of the team working on reference implementations for this and the related family of specs. The JavaScript implementation is just a little behind the Python one, and probably also of relevance here.

Happy to help.

edit: added link to the JavaScript implementation, in addition to the Python one previously linked.

rgbkrk · 2016-10-10T15:11:35Z

By the way, on the nteract and jupyterlab side, it's pretty easy for us to iterate with new renderers and media types.

TomAugspurger · 2016-10-10T21:07:59Z

I don't really see a reason not to add this in pandas; the additional code shouldn't be too much of a burden.

Would clients expect to receive the entire DataFrame, and do their own truncation? I worry a bit about the overhead of publishing huge DataFrames. I would say follow the options in pd.options.display.max_rows, etc. and only ship over some of the DataFrame (but need some way of saying that there's more...)
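A minimal sketch of the truncation idea above, using plain Python row dicts as a stand-in for a DataFrame. The `truncated` key is a hypothetical way of "saying that there's more" and is not part of any spec discussed in this thread; the default of 60 mirrors the pandas default for `display.max_rows`.

```python
def truncate_records(records, max_rows=60):
    """Return (rows_to_send, truncated_flag) for a list of row dicts.

    max_rows=None means "no limit", in which case every row is sent.
    """
    if max_rows is None or len(records) <= max_rows:
        return list(records), False
    # Only the first max_rows rows are kept in this sketch.
    return list(records[:max_rows]), True


rows = [{"a": i} for i in range(100)]
sent, truncated = truncate_records(rows, max_rows=5)

# "truncated" is a made-up key used here only for illustration.
payload = {"data": sent, "truncated": truncated}
```

A frontend could use such a flag to show a "more rows not displayed" hint instead of silently rendering a partial table.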

A few things directly related to the spec that pandas might have trouble with:

field descriptors: in principle _metadata should carry this, but IIRC we don't have a good story on propagating that through operations, so it's liable to be dropped
field types: shouldn't have any problems here
primary key: Typically this would be the (multi)Index, but we don't require uniqueness on that.
field names: Somewhat rare, but we can have MultiIndexes in the columns, so we could have "multiple rows" of field names; These can be collapsed down to tuples.
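To illustrate the field-names point, here is a small sketch with column labels written as tuples, as a MultiIndex in the columns would produce. The column names and the `"number"` type are invented for the example. Note that JSON has no tuple type, so tuple names come back as arrays after a round trip; whether a consumer accepts list-valued names is not settled in this thread, so a joined-string alternative is shown as well.

```python
import json

# Hypothetical two-level column labels.
columns = [("price", "open"), ("price", "close"), ("volume", "total")]

# Option 1: keep each tuple as the field name.
fields = [{"name": col, "type": "number"} for col in columns]
decoded = json.loads(json.dumps({"fields": fields}))

# Option 2 (an assumption, not from the thread): join levels into one string.
flat_names = [".".join(col) for col in columns]
```

After decoding, `decoded["fields"][0]["name"]` is the list `["price", "open"]`, not a tuple.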

pwalsh · 2016-10-11T07:15:52Z

We are very happy to make any changes needed to https://github.com/frictionlessdata/jsontableschema-pandas-py in order to support this smoothly, and especially in reference to things like streaming data out of a DataFrame, or limiting the rows from a frame for preview, and so forth.

TomAugspurger · 2016-12-12T22:56:27Z

Started on this here: master...TomAugspurger:json-schema. Very early: one test, no docs :D

Some design things I'd like to nail down before submitting a PR:

The actual message published to the jupyter channel will be

{'schema': schema, 'data': data}

where schema is a valid JSON table schema and data is like pd.DataFrame.to_json(orient='records')

{
  "data": "[{\"a\":3,\"b\":3},{\"a\":0,\"b\":2},{\"a\":1,\"b\":1},{\"a\":3,\"b\":0},{\"a\":3,\"b\":1}]",
  "schema": {
    "fields": [
      {
        "type": "integer",
        "name": "a"
      },
      {
        "type": "integer",
        "name": "b"
      }
    ]
  }
}

Does that sound right?
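For readers who want to see the shape end to end, here is a rough sketch that builds a `{'schema': ..., 'data': ...}` message from plain Python row dicts. It is not the code in the linked branch: `schema_type` and `build_message` are hypothetical helpers, types are guessed from the first row only, and `data` is kept as a list of row objects rather than a pre-encoded string.

```python
import json


def schema_type(value):
    """Map a Python value to a Table Schema type name (illustrative only)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    return "any"


def build_message(records):
    """Build {'schema': ..., 'data': ...} from a list of row dicts."""
    first = records[0] if records else {}
    fields = [{"name": name, "type": schema_type(value)}
              for name, value in first.items()]
    return {"schema": {"fields": fields}, "data": records}


records = [{"a": 3, "b": 3}, {"a": 0, "b": 2}]
message = build_message(records)
print(json.dumps(message, indent=2))
```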

Truncation: I think we'll follow pd.options.display.max_rows and only send that many rows; will need to think about what to do if people have set their display.large_repr to info...
Name: I've called it _repr_json_ for now, thoughts on what that should be? IIUC this won't be special like _repr_html_ is and called automatically. We'll have to publish this ourselves, and we can choose the name?
@jreback all this stuff I'm doing here, do we already have a simpler way of going from type to a "base" type? I don't want to have to worry about int16 vs int32, etc.
Speaking of types, pandas doesn't have a string type, so right now we send those over as "any". :( Do we want to do a bit of inference to maybe send those as strings, or leave that to the client? pandas 2 will have a string type, but that'll be a bit.
Indexes: When should we send them?
1. Always
2. When any (or all) of the levels are named
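On the object-dtype question in the list above, "a bit of inference" could look roughly like the following. This is a sketch over plain Python values, not pandas code: `infer_object_type` is a hypothetical helper, and treating `None` and NaN as missing is an assumption.

```python
def is_missing(value):
    """Treat None and float NaN as missing (NaN is the only value != itself)."""
    return value is None or (isinstance(value, float) and value != value)


def infer_object_type(values):
    """Return "string" if every non-missing value is a str, else "any"."""
    present = [v for v in values if not is_missing(v)]
    if present and all(isinstance(v, str) for v in present):
        return "string"
    return "any"
```

The cost is a full pass over the column, which is one argument for leaving the decision to the client.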

jreback · 2016-12-12T23:37:51Z

@TomAugspurger

don't put this in core/generic.py (the actual table creation), instead pandas.formats.json might be appropriate (but make it clear this is an export only format).

so we already have all of the accessors, you can simply use your translation function.

In [5]: from pandas.types.common import is_integer_dtype, is_timedelta64_dtype, is_string_dtype

In [6]: is_integer_dtype(np.float)
Out[6]: False

In [7]: is_integer_dtype(np.integer)
Out[7]: True

In [8]: is_integer_dtype(np.dtype('m8[ns]'))
Out[8]: False

In [9]: is_timedelta64_dtype(np.dtype('m8[ns]'))
Out[9]: True

In [10]: is_string_dtype(np.dtype('O'))
Out[10]: True

In [11]: is_string_dtype(pandas.types.dtypes.CategoricalDtype())
Out[11]: True
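The session above reflects the pandas and NumPy versions of the time; the same kind of predicates are exposed under `pandas.api.types` in later pandas releases. As a dependency-free illustration of collapsing specific dtypes to a "base" type, the sketch below works on dtype names as strings (for example `str(np.dtype('int16'))` gives `'int16'`). The function name and the choice of `"any"` as the fallback are assumptions for illustration.

```python
def base_type(dtype_name):
    """Collapse a dtype name such as 'int16' to a Table Schema type name."""
    name = str(dtype_name).lower()
    if name.startswith(("int", "uint")):
        return "integer"
    if name.startswith("float"):
        return "number"
    if name.startswith("bool"):
        return "boolean"
    if name.startswith("datetime64"):
        return "datetime"
    if name.startswith("timedelta64"):
        return "duration"
    # object, category, and anything unrecognized fall through here.
    return "any"


schema_types = {name: base_type(name)
                for name in ["int16", "int32", "float64", "timedelta64[ns]", "object"]}
```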

rgbkrk · 2016-12-13T01:33:56Z

Does the data field have to be double encoded? We can handle raw JSON across the jupyter messaging spec.
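To make the double-encoding concern concrete, this standard-library sketch compares a payload whose `data` is a pre-encoded JSON string with one whose `data` is raw JSON. The variable names are illustrative only.

```python
import json

rows = [{"a": 3, "b": 3}, {"a": 0, "b": 2}]

# Double encoded: "data" is itself a JSON string inside the JSON message.
double_encoded = json.dumps({"data": json.dumps(rows)})

# Raw: "data" is an ordinary JSON array.
raw = json.dumps({"data": rows})

# The consumer of the double-encoded form has to decode twice.
inner = json.loads(double_encoded)["data"]
rows_from_double = json.loads(inner)

# The consumer of the raw form decodes once.
rows_from_raw = json.loads(raw)["data"]
```

With the raw form a frontend can hand `data` straight to a renderer, without a second parse step.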

Name: I've called it _repr_json_ for now, thoughts on what that should be? IIUC this won't be special like _repr_html_ is and called automatically. We'll have to publish this ourselves, and we can choose the name?

_repr_json_ will tell the frontend to render application/json which in nteract and in the soon to be released notebook provides a tree view of a JSON structure:

I'd like to see this table get published with a custom mimetype. To demonstrate, I took the liberty of taking parts of your function, a fake mimetype (not sure what the official is), and creating a little React component (style would get better after):

The mimetype I used is application/vnd.tableschema.v1+json and I published it via IPython.display rather than a repr function since we don't have a precedent for this table type yet.

/cc @minrk @takluyver

pwalsh · 2016-12-13T06:10:48Z

Hi @rgbkrk

Addressing some points above and raised in our Gitter channel

(I'm one of the authors of JSON Table Schema and related specs)

Mime types: See my notes in here. I'm working on this right now (meaning, making the submission for the new mime types today). We'll be submitting application/tableschema+json
jsontableschema-js is npm installable, has feature parity with jsontableschema-py
Just FYI, I'm currently on a sprint to close a range of issues and publish v1 of all our specs before end of year, and IETF RFC submissions follow immediately. There are other aspects there that are relevant here (e.g.: "Tabular Data Resource" specification), but I can go over them with you (if you like) after we release v1

takluyver · 2016-12-13T11:52:20Z

I've called it _repr_json_ for now, thoughts on what that should be? IIUC this won't be special like _repr_html_ is and called automatically.

We do actually look for _repr_json_:

https://github.com/ipython/ipython/blob/5.1.0/IPython/core/formatters.py#L782

minrk · 2016-12-14T14:01:27Z

We currently only support single method name:mime-type mapping. This doesn't extend to custom mime-types, though the protocol allows it. I've been planning to add a _repr_mime_, where the method returns the mime-keyed dict(s), but haven't gotten to it. I thought I opened an issue for it years ago, but maybe only in my brain. I just opened ipython/ipython#10090 for this.

rgbkrk · 2016-12-14T18:47:37Z

I did open a similarly worded issue in ipython/ipython#10058. 😉 Either way, I would love to have the ability to return mime bundles for a repr.
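For context on what "return mime bundles for a repr" could look like: later IPython releases document a `_repr_mimebundle_(include=None, exclude=None)` hook that returns a dict keyed by mime type. The sketch below shows a plain class using that shape. The `Table` class is hypothetical, and the mime type is the placeholder from rgbkrk's demo above, not an official registration.

```python
class Table:
    """Toy container that offers a mime bundle for rich display."""

    # Placeholder mime type taken from the demo in this thread.
    MIME = "application/vnd.tableschema.v1+json"

    def __init__(self, schema, data):
        self.schema = schema
        self.data = data

    def __repr__(self):
        return "Table(rows={})".format(len(self.data))

    def _repr_mimebundle_(self, include=None, exclude=None):
        # One entry per mime type; the frontend picks the richest it can render.
        return {
            self.MIME: {"schema": self.schema, "data": self.data},
            "text/plain": repr(self),
        }


table = Table({"fields": [{"name": "a", "type": "integer"}]}, [{"a": 1}, {"a": 2}])
bundle = table._repr_mimebundle_()
```

Including a `text/plain` entry gives frontends that do not know the custom mime type something to fall back on.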

Lays the groundwork for pandas-dev#14386 This handles the schema part of the request there. We'll still need to do the work to publish the data to the frontend, but that can be done as a followup.

Lays the groundwork for pandas-dev#14386 This handles the schema part of the request there. We'll still need to do the work to publish the data to the frontend, but that can be done as a followup. DOC: More notes in prose docs Move files use isoformat updates Moved to to_json json_table no config refactor with classes Added duration tests more timedelta Change default orient Series test fixup docs JSON Table -> Table doc Change to table orient added version Handle Categorical Many more tests

jreback · 2017-02-06T20:49:04Z

@TomAugspurger I think this will close #9166 if you make build_table_schema accessible, e.g.

pandas.io.json.table.build_schema , certainly not publicly broadcast, but accessible

Lays the groundwork for #14386 This handles the schema part of the request there. We'll still need to do the work to publish the data to the frontend, but that can be done as a followup. Added publish to dataframe repr

TomAugspurger · 2017-04-15T16:20:30Z

Closed by #14904

jreback added the API Design, IO JSON (read_json, to_json, json_normalize), Compat (pandas objects compatibility with NumPy or Python functions), and Needs Discussion (requires discussion from core team before further action) labels · Oct 10, 2016

rgbkrk mentioned this issue Oct 10, 2016

Media type for JSON Table Schema frictionlessdata/schemas#63

Open

rgbkrk mentioned this issue Oct 19, 2016

Improving Jupyter for Spark jupyter/jupyter#212

Closed

jorisvandenbossche mentioned this issue Nov 2, 2016

Defining a standard way to return DataFrame/Table jupyterlab/jupyterlab#1194

Open

rgbkrk mentioned this issue Dec 13, 2016

[WIP] feat: prototype a table component + mimetype for tabular data nteract/nteract#1280

Closed

TomAugspurger mentioned this issue Dec 17, 2016

ENH: Added to_json_schema #14904

Merged

TomAugspurger closed this as completed Apr 15, 2017

jorisvandenbossche added this to the 0.20.0 milestone Apr 16, 2017

jorisvandenbossche added the Enhancement label and removed the Needs Discussion, API Design, and Compat labels · Apr 16, 2017
