This page describes the specification of YAML frontmatter for the CSV file format. The main goals of the format are extreme simplicity and readability.
For human data curators, the technological gap in moving from no data, to CSV, to metadata plus CSV, to semi-structured data is too large. A simple file format for adding metadata to existing datasets is needed. JSON is cryptic for humans, but YAML can do the job because it is easily read by both humans and software.
There are important initiatives, such as Tabular Data Resource, which plans to use JSON plus CSV, but most are meant to be published and read by machines.
CSVY is a simple container for a Tabular Data Resource: the metadata and schema are translated from JSON to YAML and placed in the YAML frontmatter part of the file. The data part follows the frontmatter and is stored using the CSV Dialect Description Format.
A YAML metadata block is a valid YAML object, delimited by a line of three hyphens (---) at the top and a line of three hyphens (---) or three dots (...) at the bottom.
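The delimiter rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `split_csvy` is hypothetical, it uses only the standard library, and it returns the frontmatter as raw text (parsing it would need a YAML library).

```python
def split_csvy(text):
    """Split CSVY text into (frontmatter, csv_body).

    Returns ("", text) when the text does not start with a '---' line.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i, line in enumerate(lines[1:], start=1):
        # The block closes at a line of three hyphens or three dots.
        if line.strip() in ("---", "..."):
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    raise ValueError("frontmatter opened with '---' but was never closed")


sample = "---\nname: my-dataset\n---\nvar1,var2\nA,1\n"
frontmatter, body = split_csvy(sample)
print(frontmatter)
print(body)
```

Files without a leading `---` line are returned unchanged as the body, so plain CSV input passes through untouched.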
Use the Table Schema to describe the fields. Note that the CSVY format is designed to store only one dataset per file.
---
profile: tabular-data-resource
name: my-dataset
path: https://raw.githubusercontent.com/csvy/csvy.github.io/master/examples/example.csvy
title: Example file of csvy
description: Show a csvy sample file.
format: csvy
mediatype: text/vnd.yaml
encoding: utf-8
schema:
  fields:
  - name: var1
    type: string
  - name: var2
    type: integer
  - name: var3
    type: number
dialect:
  csvddfVersion: 1.0
  delimiter: ","
  doubleQuote: false
  lineTerminator: "\r\n"
  quoteChar: "\""
  skipInitialSpace: true
  header: true
sources:
- title: The csvy specifications
  path: http://csvy.org/
  email: ''
licenses:
- name: CC-BY-4.0
  title: Creative Commons Attribution 4.0
  path: https://creativecommons.org/licenses/by/4.0/
---
var1,var2,var3
A,1,2.0
B,3,4.3
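To show how the schema and dialect in the example could drive a reader, here is a hedged Python sketch. It assumes the frontmatter has already been parsed into plain dictionaries by a YAML library; the `CASTS` mapping and the function name `read_typed_rows` are hypothetical, and only the three field types used in the example are handled.

```python
import csv
import io

# Hypothetical mapping from Table Schema type names to Python casts;
# only the three types used in the example are covered.
CASTS = {"string": str, "integer": int, "number": float}


def read_typed_rows(csv_text, fields, dialect):
    """Read the CSV data part and cast each column per the schema fields."""
    reader = csv.DictReader(
        io.StringIO(csv_text, newline=""),
        delimiter=dialect["delimiter"],
        quotechar=dialect["quoteChar"],
        doublequote=dialect["doubleQuote"],
        skipinitialspace=dialect["skipInitialSpace"],
    )
    casts = {field["name"]: CASTS[field["type"]] for field in fields}
    return [{key: casts[key](value) for key, value in row.items()}
            for row in reader]


# Values copied by hand from the example frontmatter.
fields = [
    {"name": "var1", "type": "string"},
    {"name": "var2", "type": "integer"},
    {"name": "var3", "type": "number"},
]
dialect = {"delimiter": ",", "quoteChar": '"',
           "doubleQuote": False, "skipInitialSpace": True}

rows = read_typed_rows("var1,var2,var3\nA,1,2.0\nB,3,4.3\n", fields, dialect)
print(rows)
```

A fuller reader would also honor `header` and `lineTerminator` and cover the remaining Table Schema types.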
- R: csvy, using read_csvy() and write_csvy()
- R: rio, using import() and export() (support provided by the csvy package)
- Ruby: csvreader, using Csv.parser.meta to get the parsed metadata block as a hash (or nil if there is none)
For backward compatibility, you can always keep a separate data.yml metadata file alongside your data.csv. Once proper implementations exist, combining the two into a single-file container, data.csvy, will not be a problem at all.
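Combining the two files is mostly concatenation, as the following Python sketch suggests. The function names are hypothetical, and it assumes the data.yml file contains a single YAML mapping without its own `---` or `...` document markers.

```python
from pathlib import Path


def merge_to_csvy(yaml_text, csv_text):
    """Wrap YAML metadata in '---' delimiters and append the CSV data."""
    if not yaml_text.endswith("\n"):
        yaml_text += "\n"
    return "---\n" + yaml_text + "---\n" + csv_text


def merge_files(yml_path, csv_path, csvy_path):
    """Build a .csvy file from a metadata file and a data file."""
    yaml_text = Path(yml_path).read_text(encoding="utf-8")
    csv_text = Path(csv_path).read_text(encoding="utf-8")
    Path(csvy_path).write_text(merge_to_csvy(yaml_text, csv_text),
                               encoding="utf-8")


print(merge_to_csvy("name: my-dataset", "var1,var2\nA,1\n"))
```

Calling `merge_files("data.yml", "data.csv", "data.csvy")` would then produce the single-file container while leaving the original pair in place.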
The table below summarizes parser support for skipping multiple lines in the header (which would contain the YAML) and for comment lines (lines starting with #). It is based on CSV Parser Notes by @hubgit.
Language | Parser | Skip lines | Comment lines | Comments |
---|---|---|---|---|
Excel (Mac) | | yes | no | |
Python | pandas.read_csv | yes | yes | |
R | read.table | yes | yes | |
Ruby | csv.read | no | yes | skip lines via regex |
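The two strategies in the table can be emulated with Python's standard `csv` module, as a rough sketch. The helper names are hypothetical, and the comment-line variant assumes the metadata lines have each been prefixed with `#`, which is an illustration rather than something this specification requires.

```python
import csv
import itertools


def frontmatter_line_count(lines):
    """Number of leading lines (delimiters included) used by the frontmatter."""
    if not lines or lines[0].strip() != "---":
        return 0
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() in ("---", "..."):
            return i + 1
    raise ValueError("unterminated frontmatter")


def read_skipping_lines(lines):
    """Skip-lines strategy: drop the frontmatter, parse the rest as CSV."""
    n = frontmatter_line_count(lines)
    return list(csv.reader(itertools.islice(lines, n, None)))


def read_ignoring_comments(lines, marker="#"):
    """Comment-lines strategy: drop every line that starts with the marker."""
    return list(csv.reader(line for line in lines
                           if not line.startswith(marker)))


csvy_lines = ["---\n", "name: my-dataset\n", "---\n", "var1,var2\n", "A,1\n"]
commented_lines = ["# name: my-dataset\n", "var1,var2\n", "A,1\n"]
print(read_skipping_lines(csvy_lines))
print(read_ignoring_comments(commented_lines))
```

In pandas, the corresponding `read_csv` parameters are `skiprows` and `comment`, so the count returned by `frontmatter_line_count` could be passed as `skiprows`.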
- CSVW (CSV on the Web)
- Tabular Data Package
- ARFF (Attribute-Relation File Format)
Use GitHub Issues.
- RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files
- Model for Tabular Data and Metadata on the Web
- Front Matter
- Data Package
- Tabular Data Packages
- Table Schema
- Okfn Tools
- JSON to YAML converter
- Codebeautify YAML to (JSON/XML/CSV)
- Is there a yaml front matter standard validator?
- Add YAML front matter block support for pandas.io.parsers.read_csv
- Allow custom metadata to be attached to panel/df/series
- Using YAML frontmatter with CSV
- Pandoc Yaml Metadata Block