
HXL schemas

David Megginson edited this page Apr 23, 2018 · 17 revisions

The Validation page validates a HXL dataset against a simple, spreadsheet-style HXL schema. This article describes the schema format.

Schema hashtags

The schema is itself a HXL dataset, using the following hashtags:

| Schema tag | Required | Description | Example |
| --- | --- | --- | --- |
| #valid_tag | yes | A tag pattern (see Tag patterns) for the hashtag being described, including the "#" character. | #sector |
| #valid_required | no | Without the +min or +max attributes, a truthy value (like "1") means simply that the value is required. | 1 |
| #valid_required+min | no | The minimum number of times a non-empty value for the tag must appear in each row of the dataset. Defaults to no minimum. | 1 |
| #valid_required+max | no | The maximum number of times a non-empty value for the tag may appear in each row of the dataset. Defaults to no maximum. | 5 |
| #valid_unique | no | Require individual values in all matching columns to be unique throughout the document. | true |
| #valid_unique+key | no | Define a comma-separated list of tag patterns that determines whether two rows match, and report any duplicate rows using a compound key made up of matching values from the row. | #org,#adm1+code,#sector |
| #valid_correlation | no | Define a comma-separated list of tag patterns whose values should always be the same for any given value of #valid_tag. (Note: this is not reciprocal. If #adm1 and #adm2 should always have the same values for any value of #adm3, it doesn't necessarily follow that #adm3 needs to have the same value for any combination of values for #adm1 and #adm2.) | #adm1,#adm2 (for #adm3) |
| #valid_datatype | no | The type of data expected in the column under the HXL tag. Currently allowed values are "text", "number", "url", "email", and "phone" ("date" coming soon). Defaults to no type checking. | number |
| #valid_datatype+consistent | no | Test for consistent datatypes in a column. Test first for dates (only if tagged #date), then for numbers, then for strings. Ignores empty values. | true |
| #valid_value+whitespace | no | When set to a truthy value, reports an error for any irregular whitespace (allows no leading or trailing space, and only single internal space characters). | true |
| #valid_value+min | no | The minimum value allowed when #valid_datatype is "number". Defaults to no minimum value. Ignored for non-numeric datatypes. | 100 |
| #valid_value+max | no | The maximum value allowed when #valid_datatype is "number". Defaults to no maximum value. Ignored for non-numeric datatypes. | 10000 |
| #valid_value+regex | no | A regular expression pattern that the value must match. | ^([0-9])(,[0-9])*$ |
| #valid_value+list | no | A list of allowed values, separated by "\|". | female\|male |
| #valid_value+case | no | A truthy value like "1" if matches for patterns and enumerations should be case-insensitive. | 0 |
| #valid_value+url | no | The URL of a HXL dataset containing allowed values (possibly thousands of them). Use together with #valid_value+target_tag for the hashtag of the column containing the values. | http://example.org/codes/p-codes.hxl |
| #valid_value+target_tag | no | When used together with #valid_value+url, a tag pattern (see Tag patterns) for the column containing the allowed values in the external HXL dataset. | #adm1+code |
| #valid_severity | no | The severity of the error, for user feedback. Allowed values are "info", "warning", or "error" (the default). | warning |
| #description | no | A human-readable description of the error, to provide user feedback. | It is a good idea to include at least one #sector column in a 3W. |
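
To make the value-level rules more concrete, here is a minimal Python sketch of how a single non-empty cell might be checked against one schema row. It is not the validator's actual implementation: the function names `is_truthy` and `check_value`, the set of strings treated as truthy, the short problem labels, and the choice of `re.search` for regex matching are all assumptions made for illustration. The handling of `#valid_value+case` follows the wording in the table above.

```python
import re

# Assumption: the page only says "truthy"; this set is illustrative.
TRUTHY = {"1", "true", "yes", "y"}


def is_truthy(text):
    """Return True if a schema cell holds a value we treat as truthy."""
    return text.strip().lower() in TRUTHY


def check_value(value, rule):
    """Return a list of problem labels for one non-empty cell value.

    `rule` is one schema row as a dict keyed by schema hashtag
    (hypothetical representation, not the validator's own).
    """
    problems = []
    ignore_case = is_truthy(rule.get("#valid_value+case", ""))

    if is_truthy(rule.get("#valid_value+whitespace", "")):
        # Collapse whitespace runs and trim; any difference is irregular.
        if value != " ".join(value.split()):
            problems.append("whitespace")

    if rule.get("#valid_datatype") == "number":
        try:
            number = float(value)
        except ValueError:
            problems.append("datatype")
        else:
            # +min and +max only apply to the "number" datatype.
            low = rule.get("#valid_value+min", "")
            high = rule.get("#valid_value+max", "")
            if low and number < float(low):
                problems.append("min")
            if high and number > float(high):
                problems.append("max")

    pattern = rule.get("#valid_value+regex", "")
    flags = re.IGNORECASE if ignore_case else 0
    if pattern and not re.search(pattern, value, flags):
        problems.append("regex")

    allowed = rule.get("#valid_value+list", "")
    if allowed:
        fold = str.casefold if ignore_case else str
        choices = {fold(item.strip()) for item in allowed.split("|")}
        if fold(value.strip()) not in choices:
            problems.append("list")

    return problems
```

For example, `check_value("50", {"#valid_datatype": "number", "#valid_value+min": "100"})` reports a `"min"` problem, while a rule with no `#valid_value` or `#valid_datatype` cells filled in reports nothing.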

Sample schema

The generic core HXL schema is available on HDX at https://data.humdata.org/dataset/hxl-core-schemas

Here is a simple sample schema:

| #valid_tag | #valid_severity | #valid_required+min | #valid_required+max | #valid_datatype | #valid_value+list | #description |
| --- | --- | --- | --- | --- | --- | --- |
| #org | error | 1 | | text | | You must provide the name of the organisation doing the work. |
| #sector | error | 1 | 1 | text | WASH\|Health\|Education\|CCCM\|Protection | You must provide the primary cluster for the activity. |
| #subsector | info | | | text | | Adding a subsector allows better aid coordination. |
| #country | error | 1 | 1 | text | Guinea\|Liberia\|Sierra Leone | You must specify the country where the work is taking place. |
| #adm1 | warning | 1 | | text | | We strongly encourage specifying the administrative subdivision as well as the country. |
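
As a rough illustration of how the `#valid_required+min` and `#valid_required+max` columns in this sample could be applied, the Python sketch below stores the schema as CSV and counts non-empty values per hashtag in one data row. Several things here are assumptions rather than part of the schema format: the CSV layout (including which cells are blank for `#org`, `#subsector`, and `#adm1`), the function names `load_schema` and `check_required`, and the simplification that a tag pattern matches only an identical hashtag (real tag patterns are more flexible).

```python
import csv
import io

# Assumption: the sample schema above, written out as CSV with blank
# cells wherever no value is given.
SCHEMA_CSV = """\
#valid_tag,#valid_severity,#valid_required+min,#valid_required+max,#valid_datatype,#valid_value+list,#description
#org,error,1,,text,,You must provide the name of the organisation doing the work.
#sector,error,1,1,text,WASH|Health|Education|CCCM|Protection,You must provide the primary cluster for the activity.
#subsector,info,,,text,,Adding a subsector allows better aid coordination.
#country,error,1,1,text,Guinea|Liberia|Sierra Leone,You must specify the country where the work is taking place.
#adm1,warning,1,,text,,We strongly encourage specifying the administrative subdivision as well as the country.
"""


def load_schema(text):
    """Parse the schema CSV into a list of dicts keyed by schema hashtag."""
    return list(csv.DictReader(io.StringIO(text)))


def check_required(columns, row, rules):
    """Check one data row against the +min/+max occurrence rules.

    `columns` holds the hashtag for each column and `row` the cell
    values. Returns (severity, tag, description) for each failed rule.
    """
    messages = []
    for rule in rules:
        tag = rule["#valid_tag"]
        # Simplification: exact hashtag match instead of tag patterns.
        count = sum(
            1 for col, cell in zip(columns, row) if col == tag and cell.strip()
        )
        low = rule["#valid_required+min"]
        high = rule["#valid_required+max"]
        too_few = bool(low) and count < int(low)
        too_many = bool(high) and count > int(high)
        if too_few or too_many:
            severity = rule["#valid_severity"] or "error"  # "error" is the default
            messages.append((severity, tag, rule["#description"]))
    return messages


# A row with two #sector values and an empty #adm1 cell.
columns = ["#org", "#sector", "#sector", "#country", "#adm1"]
row = ["UNICEF", "WASH", "Health", "Guinea", ""]
for severity, tag, description in check_required(columns, row, load_schema(SCHEMA_CSV)):
    print(f"{severity}: {tag}: {description}")
```

With this row, the sketch flags `#sector` (two values where at most one is allowed) as an error and `#adm1` (no value where at least one is expected) as a warning; `#subsector` has no occurrence limits, so its absence is not reported.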