Welcome to CSVY.

This page describes the specification for YAML front matter in CSV files. The main goals of the format are extreme simplicity and readability.

For people who curate data by hand, the technological gap between no data, plain CSV, CSV with metadata, and semi-structured data is too large. A simple file format for adding metadata to existing datasets is needed. JSON is hard for humans to read, but YAML can do the job because it is easily read by both humans and software.

Based on Tabular Data Resource

There are important initiatives, such as Tabular Data Resource, which plans to use JSON plus CSV, but most are meant to be published and read by machines.

CSVY is a simple container for a Tabular Data Resource. The metadata and schema are translated from JSON to YAML and placed in the YAML front matter of the file. The data part follows the front matter and is stored as CSV, described using the CSV Dialect Description Format.

YAML Header delimiter

A YAML metadata block is a valid YAML object, delimited by a line of three hyphens --- at the top and a line of three hyphens --- or three dots ... at the bottom.
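
The delimiter rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `split_csvy` is hypothetical, and the sketch assumes the delimiter lines start in the first column and that the header itself contains no line consisting only of `---` or `...`. The returned header text could then be handed to a YAML library of your choice.

```python
def split_csvy(text):
    """Split CSVY text into (yaml_header, csv_body).

    Returns (None, text) when the text does not start with a '---' line.
    """
    lines = text.splitlines(keepends=True)
    # The block must open with a line of three hyphens.
    if not lines or lines[0].rstrip() != "---":
        return None, text
    # The block closes at the next '---' or '...' line.
    for i, line in enumerate(lines[1:], start=1):
        if line.rstrip() in ("---", "..."):
            header = "".join(lines[1:i])
            body = "".join(lines[i + 1:])
            return header, body
    raise ValueError("unterminated YAML header")


sample = (
    "---\n"
    "name: my-dataset\n"
    "format: csvy\n"
    "---\n"
    "var1,var2,var3\n"
    "A,1,2.0\n"
    "B,3,4.3\n"
)
header, body = split_csvy(sample)
print(header)
print(body)
```

A file without a leading `---` line is returned unchanged as the body, so plain CSV passes through this sketch untouched.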

Defining the Table Schema

Describe the columns with a Table Schema. Note that the CSVY format is designed to store only one dataset per file.

---
profile: tabular-data-resource
name: my-dataset
path: https://raw.githubusercontent.com/csvy/csvy.github.io/master/examples/example.csvy
title: Example file of csvy 
description: Show a csvy sample file.
format: csvy
mediatype: text/vnd.yaml
encoding: utf-8
schema:
  fields:
  - name: var1
    type: string
  - name: var2
    type: integer
  - name: var3
    type: number
dialect:
  csvddfVersion: 1.0
  delimiter: ","
  doubleQuote: false
  lineTerminator: "\r\n"
  quoteChar: "\""
  skipInitialSpace: true
  header: true
sources:
- title: The csvy specifications
  path: http://csvy.org/
  email: ''
licenses:
- name: CC-BY-4.0
  title: Creative Commons Attribution 4.0
  path: https://creativecommons.org/licenses/by/4.0/
---
var1,var2,var3
A,1,2.0
B,3,4.3
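
As a rough illustration of how a reader might use the `schema` and `dialect` entries from the example, the sketch below casts each CSV value according to its declared field type. The `meta` dictionary is written out by hand and stands in for whatever a YAML parser would return for the header. The names `read_rows` and `CASTS` are hypothetical, and only the three types that appear in the example are handled. The mapping of dialect keys onto Python's `csv.reader` options is an assumption made for this sketch.

```python
import csv
import io

# Hand-written stand-in for the parsed YAML header (subset of the example).
meta = {
    "schema": {"fields": [
        {"name": "var1", "type": "string"},
        {"name": "var2", "type": "integer"},
        {"name": "var3", "type": "number"},
    ]},
    "dialect": {"delimiter": ",", "doubleQuote": False, "quoteChar": '"',
                "skipInitialSpace": True, "header": True},
}
csv_body = "var1,var2,var3\nA,1,2.0\nB,3,4.3\n"

# Only the types used in the example; anything else is kept as text.
CASTS = {"string": str, "integer": int, "number": float}


def read_rows(csv_text, meta):
    """Return a list of dicts with values cast per the schema field types."""
    d = meta.get("dialect", {})
    reader = csv.reader(
        io.StringIO(csv_text, newline=""),
        delimiter=d.get("delimiter", ","),
        quotechar=d.get("quoteChar", '"'),
        doublequote=d.get("doubleQuote", True),
        skipinitialspace=d.get("skipInitialSpace", False),
    )
    fields = meta["schema"]["fields"]
    if d.get("header", True):
        next(reader, None)  # skip the header row; names come from the schema
    return [
        {f["name"]: CASTS.get(f["type"], str)(value)
         for f, value in zip(fields, record)}
        for record in reader
    ]


rows = read_rows(csv_body, meta)
print(rows)
```

Taking column names from the schema rather than from the CSV header row is one possible design choice. A stricter reader might instead check that the two agree.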

Libraries supporting CSVY

  • R: csvy using read_csvy() and write_csvy()
  • R: rio using import() and export() (support provided by the csvy package)
  • Ruby: csvreader using Csv.parser.meta to get the parsed metadata block as a hash (or nil if there is none).

Backwards Compatibility

For backward compatibility, you can always add a data.yml metadata file alongside your data.csv. Once proper implementations exist, moving to a single-file container, data.csvy, will not be a problem at all.

The following table lists parser support for skipping multiple lines at the top of the file (which would contain the YAML) and for comment lines (lines starting with #). It is based on CSV Parser Notes by @hubgit.

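| Language | Parser          | Skip lines | Comment lines | Comments               |
|----------|-----------------|------------|---------------|------------------------|
| Excel    | Mac             | yes        | no            |                        |
| Python   | pandas.read_csv | yes        | yes           |                        |
| R        | read.table      | yes        | yes           |                        |
| Ruby     | csv.read        | no         | yes           | skip lines via regex   |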
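
Where a parser lacks one of these features, both can be emulated before the CSV reader sees the data. The following is a minimal sketch using only the Python standard library. The helper name `read_skipping` and its parameters are hypothetical, and the number of lines to skip is assumed to be known in advance.

```python
import csv
import itertools


def read_skipping(lines, skip=0, comment="#"):
    """Parse CSV lines after dropping the first `skip` lines and comment lines."""
    kept = (
        line
        for line in itertools.islice(lines, skip, None)  # drop the YAML header lines
        if not line.startswith(comment)                   # drop comment lines
    )
    return list(csv.reader(kept))


text = (
    "---\n"
    "name: my-dataset\n"
    "---\n"
    "# a comment line\n"
    "var1,var2,var3\n"
    "A,1,2.0\n"
)
records = read_skipping(text.splitlines(), skip=3)
print(records)
```

Note that this simple filter would also drop a data row whose first field begins with `#`, so it is only suitable when the data cannot contain such rows.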

Related Projects

Authors and Contributors

Support or Contact

Use GitHub Issues.