Comment block #2

Open
hubgit opened this issue Sep 4, 2015 · 6 comments

Comments

@hubgit

hubgit commented Sep 4, 2015

It's common (though not part of any spec) for lines beginning with # at the start of a CSV file to be treated as comments (with a "citation needed" caveat: I'm not sure which parsers, if any, do this by default).

If each line of the YAML header started with '#', wouldn't the file be more compatible with parsers that aren't expecting a YAML block?
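For illustration, here is a minimal sketch of producing such a file with Python's standard `csv` module. It assumes the `#` is placed directly before each YAML line (including the `---` delimiters) so YAML indentation is preserved; the function name `write_commented_csvy` and the sample metadata are hypothetical, not part of any spec.

```python
import csv
import io


def write_commented_csvy(yaml_header: str, rows: list) -> str:
    """Return CSVY text whose YAML header lines each start with '#'.

    Assumption (hypothetical convention): '#' goes directly before each YAML
    line, including the '---' delimiters, so indentation survives unchanged.
    """
    out = io.StringIO()
    for line in yaml_header.strip("\n").splitlines():
        out.write("#" + line + "\n")
    # A plain csv.writer emits the data part; "\n" endings match the header lines.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


header = """---
name: my-dataset
fields:
  - name: var1
  - name: var2
---"""
text = write_commented_csvy(header, [["var1", "var2"], ["A", "1"]])
print(text)
```

A parser that treats `#` lines as comments sees only `var1,var2` and `A,1`; a CSVY-aware reader can strip the leading `#` and hand the rest to a YAML parser.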

@mfenner

mfenner commented Sep 4, 2015

Yes, we had the same discussion somewhere else. Something I want to do (and Noam Ross has done some of this on Twitter) is build a small table showing which common parsers support comment lines and skipping header lines. R's read.csv supports comment lines through its comment.char argument (read.table skips # lines by default); Excel (at least on the Mac) does not. I'm not sure what happens if you put comments inside a YAML header.
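Parsers that only offer a "skip N lines" option need to know the length of an uncommented YAML header in advance. A minimal sketch of computing that count, assuming the header is delimited by `---` lines (the function name `count_front_matter_lines` is hypothetical):

```python
import csv


def count_front_matter_lines(lines: list) -> int:
    """Number of leading lines taken by a '---'-delimited YAML block (0 if none).

    Assumption: the block opens with '---' on the first line and closes with
    the next '---' (or YAML's '...' document-end marker).
    """
    if not lines or lines[0].strip() != "---":
        return 0
    for i in range(1, len(lines)):
        if lines[i].strip() in ("---", "..."):
            return i + 1  # include the closing delimiter itself
    raise ValueError("unterminated YAML front matter")


text = """---
name: my-dataset
---
var1,var2
A,1
"""
lines = text.splitlines(keepends=True)
skip = count_front_matter_lines(lines)
# Equivalent to passing skip=3 to a parser that only knows how to skip lines.
rows = list(csv.reader(lines[skip:]))
print(skip, rows)
```

The drawback is that the count must be computed per file, whereas a comment character works without inspecting the header first.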

@hubgit
Author

hubgit commented Sep 4, 2015

I wrote up a summary of CSV parsers a little while ago: quite a few of them have a parameter for a character that marks the start of a comment line.
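Python's built-in `csv` module has no such parameter, but the behaviour can be approximated by filtering lines before parsing. A minimal sketch, assuming a comment is any line whose first character is the comment character (both function names are hypothetical):

```python
import csv


def read_csv_with_comments(lines, comment_char="#"):
    """Parse CSV rows, skipping every line that starts with comment_char.

    Caveat: a data row whose first field happens to begin with '#' is
    dropped too, which is the usual trade-off of comment-line options.
    """
    data_lines = (line for line in lines if not line.startswith(comment_char))
    return list(csv.reader(data_lines))


def extract_comment_header(lines, comment_char="#"):
    """Return the leading comment block with the comment character removed."""
    header = []
    for line in lines:
        if not line.startswith(comment_char):
            break
        header.append(line[len(comment_char):])
    return "".join(header)


text = "#---\n#name: my-dataset\n#---\nvar1,var2\nA,1\n"
lines = text.splitlines(keepends=True)
rows = read_csv_with_comments(lines)
header = extract_comment_header(lines)
print(rows)
print(header)
```

A plain parser only needs the first function; a CSVY-aware reader would also use the second and pass the recovered text to a YAML library.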

@mfenner

mfenner commented Sep 4, 2015

Great! I've started a table at https://github.com/csvy/csvy.github.io/blob/master/index.md summarizing the support for skipping lines and comment lines, based on your summary.

@hadley
Contributor

hadley commented Dec 29, 2017

I think it would be a good idea to change the example on the homepage to comment out the yaml block, if not formally make that part of the spec.

@jrovegno
Member

@hadley I've been thinking about it, and I've noticed there is no real interest in meeting this:

CSVY is a simple container for a Tabular Data Package: the metadata and schema are translated from JSON to YAML and placed in the YAML front matter of the file, and the data part that follows the front matter is stored using the CSV Dialect Description Format. It's possible to put multiple data resources in one file, separated by the YAML header delimiter.
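To make the multi-resource idea concrete, here is a minimal sketch of splitting such a file into (header, rows) pairs. It assumes each resource opens with a `---` line, its YAML header closes at the next `---`, and its CSV data runs until the following `---`; this layout and the name `split_resources` are assumptions for illustration, not a settled part of the spec.

```python
import csv


def split_resources(text: str) -> list:
    """Split a multi-resource CSVY file into (yaml_header, rows) pairs.

    Assumed layout: '---' opens a header, the next '---' closes it, and the
    CSV data that follows runs until the next '---' (or end of file).
    """
    resources = []
    header, data = [], []
    state = "start"  # start -> header -> data -> header -> ...
    for line in text.splitlines(keepends=True):
        if line.strip() == "---":
            if state == "data":
                # A new header begins: flush the finished resource.
                resources.append(("".join(header), [r for r in csv.reader(data) if r]))
                header, data = [], []
                state = "header"
            elif state == "header":
                state = "data"
            else:
                state = "header"
            continue
        if state == "header":
            header.append(line)
        elif state == "data":
            data.append(line)
    if state == "header":
        raise ValueError("unterminated YAML header")
    if state == "data":
        resources.append(("".join(header), [r for r in csv.reader(data) if r]))
    return resources


text = "---\nname: a\n---\nx,y\n1,2\n---\nname: b\n---\nz\n3\n"
resources = split_resources(text)
print(resources)
```

Note that this layout relies on a bare `---` line never appearing in the data, which is one reason a one-resource-per-file convention is simpler.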

@leeper's rio project uses the csvy package for R to read one resource from one file, but a Tabular Data Package can contain multiple resources.
Comments are also used in the YAML front matter, instead of putting the metadata in a separate file, to stay compatible with older parsers.

How can we reach a consensus on the specification?

@hadley
Contributor

hadley commented Jan 30, 2018

I think it's worth formally allowing two specifications:

  • yaml metadata in # comment block. This is compatible with most csv parsers, and will never be worse than yaml metadata that is not commented out

  • yaml metadata in a separate file that has the same name (apart from extension) as the csv file. This is useful for adding metadata to existing files which you don't otherwise want to modify
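A reader supporting both options might look for the sidecar file first and fall back to the comment block. A minimal sketch, where the `.yaml`/`.yml` extensions, the precedence order (sidecar wins), and the name `load_metadata_text` are all assumptions:

```python
import tempfile
from pathlib import Path
from typing import Optional


def load_metadata_text(csv_path: str, comment_char: str = "#") -> Optional[str]:
    """Return raw YAML metadata for a CSV file, or None if there is none.

    Assumed precedence: a sidecar file with the same stem and a '.yaml' or
    '.yml' extension wins; otherwise use the leading '#' comment block.
    """
    path = Path(csv_path)
    for ext in (".yaml", ".yml"):
        sidecar = path.with_suffix(ext)
        if sidecar.exists():
            return sidecar.read_text(encoding="utf-8")
    header = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.startswith(comment_char):
                break
            header.append(line[len(comment_char):])
    return "".join(header) or None


with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "data.csv"
    csv_file.write_text("#name: my-dataset\nvar1,var2\nA,1\n", encoding="utf-8")
    from_comments = load_metadata_text(str(csv_file))  # no sidecar yet
    (Path(tmp) / "data.yaml").write_text("name: sidecar\n", encoding="utf-8")
    from_sidecar = load_metadata_text(str(csv_file))  # sidecar now takes priority
print(from_comments, from_sidecar)
```

Whether a sidecar should override, merge with, or conflict with an embedded header is left open in the proposal; the override here is just one choice.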

I don't see the advantage of supporting multiple yaml headers - I don't think it's necessary to provide the metadata for multiple resources embedded in a single resource. In other words, the equivalent to a "tabular data package" would simply be a directory of csvy files.


4 participants