Comment block #2

hubgit · 2015-09-04T08:07:18Z

It's common (though not specified) for lines beginning with # at the start of a CSV file to be treated as comments (with a "citation needed" caveat - I'm not sure which parsers, if any, do this by default).

If each line of the YAML header were to start with a '#', might this be more compatible with parsers that aren't expecting a YAML block?
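
To make the idea concrete, here is a minimal sketch in Python of how a reader could separate a commented-out header from the data. The function name `split_commented_header`, the sample file, and the choice to strip one space after the '#' are all assumptions for illustration; none of this is part of any CSVY spec. Parsing the recovered header text as YAML is left out because it would need a third-party library.

```python
import csv
import io


def split_commented_header(text, comment_char="#"):
    """Split text into (header_text, rows).

    Leading lines that start with comment_char are treated as the header.
    The prefix and one optional following space are stripped from each line.
    (Hypothetical helper, not part of any CSVY library.)
    """
    lines = text.splitlines()
    header_lines = []
    index = 0
    while index < len(lines) and lines[index].startswith(comment_char):
        stripped = lines[index][len(comment_char):]
        if stripped.startswith(" "):
            stripped = stripped[1:]
        header_lines.append(stripped)
        index += 1
    # Everything after the leading comment block is ordinary CSV.
    rows = list(csv.reader(io.StringIO("\n".join(lines[index:]))))
    return "\n".join(header_lines), rows


# Made-up sample: a YAML block with every line prefixed by "# ".
sample = """\
# ---
# name: example
# fields:
#   - name: x
#   - name: y
# ---
x,y
1,2
3,4
"""

header, rows = split_commented_header(sample)
```

A parser that already skips '#' lines would see only the three CSV rows, while a CSVY-aware reader could hand `header` to a YAML parser.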

mfenner · 2015-09-04T08:11:05Z

Yes, we had the same discussion somewhere else. Something I want to do (and Noam Ross has done some of this on Twitter) is a small table that shows support for comment lines and for skipping header lines in the common parsers. The read.csv command in R supports comments automatically; Excel (at least on the Mac) does not. I'm not sure what happens if you put comments in a YAML header.

hubgit · 2015-09-04T08:16:29Z

I wrote up a summary of CSV parsers a little while ago - quite a few of them have a parameter for a character that should be treated as the start of a comment line.

mfenner · 2015-09-04T09:20:05Z

Great! I've started a table at https://github.com/csvy/csvy.github.io/blob/master/index.md summarizing the support for skipping lines and comment lines, based on your summary.

hadley · 2017-12-29T21:45:22Z

I think it would be a good idea to change the example on the homepage to comment out the yaml block, if not formally make that part of the spec.

jrovegno · 2018-01-30T20:54:03Z

@hadley I've been thinking about this, and it seems there is no real interest in adopting this definition:

CSVY is a simple container for a Tabular Data Package, where the metadata and schema are translated from JSON to YAML and placed in the YAML front matter of the file. After the YAML front matter comes the data part, stored using the CSV Dialect Description Format. It is possible to put multiple data resources in one file, separated by the YAML header delimiter.

@leeper's rio project uses the csvy library for R to read one resource from one file, but a Tabular Data Package can contain multiple resources.
Also, comment out the YAML front matter, instead of putting the metadata in a separate file, to stay compatible with older parsers.

How can we reach a consensus on the specification?

hadley · 2018-01-30T22:10:57Z

I think it's worth formally allowing two specifications:

1. YAML metadata in a # comment block. This is compatible with most CSV parsers, and will never be worse than YAML metadata that is not commented out.
2. YAML metadata in a separate file that has the same name (apart from extension) as the CSV file. This is useful for adding metadata to existing files which you don't otherwise want to modify.

I don't see the advantage of supporting multiple yaml headers - I don't think it's necessary to provide the metadata for multiple resources embedded in a single resource. In other words, the equivalent to a "tabular data package" would simply be a directory of csvy files.
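
The second option (a separate metadata file with the same name apart from its extension) could be resolved roughly as follows. This is only a sketch: the function name `find_sidecar_metadata` and the `.yaml` / `.yml` extensions are assumptions, since the thread does not say which extension the sidecar file would use.

```python
from pathlib import Path


def find_sidecar_metadata(csv_path, extensions=(".yaml", ".yml")):
    """Return the path of a metadata file next to csv_path, or None.

    The sidecar shares the CSV file's name apart from the extension.
    The candidate extensions are an assumption, not taken from any spec.
    """
    csv_path = Path(csv_path)
    for ext in extensions:
        candidate = csv_path.with_suffix(ext)
        if candidate.is_file():
            return candidate
    return None
```

Under this scheme a "tabular data package" is just a directory: a reader would list the CSV files and call the lookup on each one, falling back to a commented header when no sidecar exists.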
