The format values entered in the format overlay depend on the attribute types defined in the Capture Base. Specifically:
- Text/Numeric: Regular Expression (Regex)
- Binary: MIME type registered with the Internet Assigned Numbers Authority
- DateTime: date and time representation as defined by ISO 8601
A regular expression, or "regex" for short, is a powerful way to search and filter strings. You can build a search pattern from character literals, operators, and constructs to match specific types of characters in a string. OCA uses the Rust regex flavour; its full documentation can be found here.
Character classes and ranges appear inside square brackets (`[...]`). A character class matches a single character, using the following patterns:

- `[abc]` matches the character `a`, `b`, or `c`.
- `[^abc]` matches any character that is not `a`, `b`, or `c`.
- `[a-c]` matches any character in the range `a`-`c`, that is, `a`, `b`, or `c`.
- `[a-zA-Z]` matches any character in the range `a`-`z` or `A`-`Z`, that is, any uppercase or lowercase letter.
- `[0-9]` matches any single digit.
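The character classes above can be tried out directly. The sketch below uses Python's standard `re` module as a stand-in for the Rust regex flavour that OCA uses; for simple bracket expressions like these, the two behave the same. The helper name `single_char_matches` is illustrative only.

```python
import re


def single_char_matches(char_class, samples):
    """Return the samples that consist of exactly one character in char_class."""
    # re.fullmatch must consume the whole sample, so a two-character
    # sample such as "ab" can never match a single character class.
    return [s for s in samples if re.fullmatch(char_class, s)]


samples = ["a", "d", "Q", "7", "ab"]
for char_class in ["[abc]", "[^abc]", "[a-c]", "[a-zA-Z]", "[0-9]"]:
    print(f"{char_class:10} -> {single_char_matches(char_class, samples)}")
```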
Alternatives and repetitions are written as follows:

- `abc|xyz` matches `abc` or `xyz`.
- `a*` matches `a` repeated zero or more times.
- `a+` matches `a` repeated one or more times.
- `a?` matches `a` zero times or once.
- `a{n,m}` matches `a` repeated at least `n` and at most `m` times.
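A similar sketch, again using Python's `re` as a stand-in for the Rust flavour, shows which inputs each alternation or quantifier accepts. It uses `re.fullmatch`, so the pattern has to account for the whole input, and `a{2,3}` serves as a concrete instance of `a{n,m}`.

```python
import re

# Inputs to try for each pattern.
cases = {
    "abc|xyz": ["abc", "xyz", "abz"],
    "a*": ["", "a", "aaaa"],
    "a+": ["", "a", "aaaa"],
    "a?": ["", "a", "aa"],
    "a{2,3}": ["a", "aa", "aaa", "aaaa"],  # a{n,m} with n = 2, m = 3
}

# Keep only the inputs that the pattern matches in full.
# (A match object is always truthy, even for an empty match.)
results = {
    pattern: [text for text in texts if re.fullmatch(pattern, text)]
    for pattern, texts in cases.items()
}

for pattern, accepted in results.items():
    print(f"{pattern:8} accepts {accepted}")
```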
Note that regular expressions allow partial matching by default. For example, `abc` matches any string that contains `abc`, such as `abc`, `abcde`, or `&0xyzabcdef-()`.
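Partial matching can be observed with Python's `re.search`, which looks for the pattern anywhere in the string (Rust's `Regex::is_match` likewise reports a match anywhere in the haystack). The helper `first_match_index` is a hypothetical name used for illustration.

```python
import re


def first_match_index(pattern, text):
    """Index where pattern first matches inside text, or None."""
    # re.search scans the whole string, so an unanchored pattern is
    # satisfied by any string that merely contains it.
    found = re.search(pattern, text)
    return found.start() if found else None


for text in ["abc", "abcde", "&0xyzabcdef-()", "ab-c"]:
    print(repr(text), "->", first_match_index("abc", text))
```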
The following anchors restrict where a match can occur:

- `^` matches the beginning of the string; `^abc` only matches strings that begin with `abc`.
- `$` matches the end of the string; `abc$` only matches strings that end with `abc`.
- `^abc$` only matches the exact string `abc`.
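The effect of the anchors can be checked the same way. One caveat of using Python as a stand-in: its `$` also matches just before a trailing newline, whereas Rust's `$` (without the multi-line flag) only matches at the very end of the string. The sample strings below contain no newlines, so the results agree in both flavours.

```python
import re

patterns = ["abc", "^abc", "abc$", "^abc$"]
texts = ["abc", "abcde", "xyzabc", "xabcx"]

# For each pattern, keep the texts in which re.search finds a match anywhere.
accepted = {p: [t for t in texts if re.search(p, t)] for p in patterns}

for pattern, matched in accepted.items():
    print(f"{pattern:6} -> {matched}")
```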
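Other special characters and shorthand classes include:

- `.` matches any single character except a newline.
- `\` makes the following special character match literally; for example, `\*` matches a literal `*`.
- `\d` matches any single digit. For ASCII text, it is equivalent to `[0-9]`.
- `\w` matches any letter, digit, or underscore character.
- `\s` matches any whitespace character, such as a space or a tab.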
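The sketch below exercises these escapes and shorthand classes with `re.fullmatch`, with Python again standing in for the Rust flavour. Raw strings (`r"..."`) keep Python from interpreting the backslashes itself.

```python
import re

# (pattern, text) pairs, each checked against the whole text.
checks = [
    (r"a.c", "a-c"),            # '.' matches any single character...
    (r"a.c", "a\nc"),           # ...except a newline (by default)
    (r"2\*3", "2*3"),           # '\*' is a literal asterisk
    (r"\d\d", "42"),            # two digits
    (r"\w+", "snake_case_1"),   # letters, digits, and underscores
    (r"\w+", "kebab-case"),     # '-' is not a word character
    (r"a\sb", "a\tb"),          # \s also matches a tab
]

results = [bool(re.fullmatch(p, t)) for p, t in checks]
for (p, t), ok in zip(checks, results):
    print(f"{p:8} {t!r:16} {ok}")
```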
Target Strings | Regex Pattern |
---|---|
codes containing any capital letter | `[A-Z]` |
codes with only capital letters | `^[A-Z]*$` |
codes with only capital letters, or only lowercase letters | `^([A-Z]*\|[a-z]*)$` |
10-character codes, with uppercase and lowercase letters only | `^[A-Za-z]{10}$` |
5-10 character codes, with uppercase and lowercase letters only | `^[A-Za-z]{5,10}$` |
messages, 250 characters max | `^.{0,250}$` |
Canadian postal codes (`A1A 1A1`) | `^[A-Z][0-9][A-Z]\s[0-9][A-Z][0-9]$` |
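Patterns from the table can be wired into a small validator. In this sketch, `CODE_PATTERNS` and `is_valid` are hypothetical names, not part of any OCA library, and Python's `re.search` stands in for the Rust regex engine. Because every pattern except the first is anchored with `^` and `$`, those patterns have to match the whole value.

```python
import re

# Hypothetical mapping from a description to a pattern from the table.
CODE_PATTERNS = {
    "contains a capital letter": r"[A-Z]",
    "only capital letters": r"^[A-Z]*$",
    "only capital or only lowercase letters": r"^([A-Z]*|[a-z]*)$",
    "10 letters": r"^[A-Za-z]{10}$",
    "5-10 letters": r"^[A-Za-z]{5,10}$",
    "message, 250 characters max": r"^.{0,250}$",
    "Canadian postal code": r"^[A-Z][0-9][A-Z]\s[0-9][A-Z][0-9]$",
}


def is_valid(kind, value):
    """Return True if value satisfies the table pattern named by kind."""
    return re.search(CODE_PATTERNS[kind], value) is not None


print(is_valid("Canadian postal code", "K1A 0B1"))                   # True
print(is_valid("Canadian postal code", "K1A0B1"))                    # False: no space
print(is_valid("only capital or only lowercase letters", "abcDEF"))  # False: mixed case
print(is_valid("5-10 letters", "abcd"))                              # False: too short
print(is_valid("contains a capital letter", "abcDef"))               # True: partial match
```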
In OCA, we also use regex for numeric attributes. However, regex does not understand numbers; it only matches them as characters. For example, an integer from 10 to 30 is matched by `^([1-2][0-9]|30)$`, which reads as "the character `1` or `2` followed by any digit, or the characters `30`". As a result, complicated numeric conditions can be tricky to express.
Target Strings | Regex Pattern |
---|---|
any string that starts with a digit from 0 to 5 | `^[0-5]` |
integer numbers between 1 and 50 | `^([1-9]\|[1-4][0-9]\|50)$` |
integer numbers between -50 and 50 | `^-?([0-9]\|[1-4][0-9]\|50)$` |
any integer or decimal number, optionally beginning with `+` or `-` | `^[-+]?\d*\.?\d+$` |
decimal numbers between 0 and 1, inclusive | `^\+?((0?\.\d+)\|0\|(1(\.0+)?))$` |
decimal numbers between -90 and 90, inclusive | `^[-+]?(90(\.0+)?\|[1-8]?\d(\.\d+)?)$` |
decimal numbers between -180 and 180, inclusive | `^[-+]?(180(\.0+)?\|((1[0-7]\d)\|([1-9]?\d))(\.\d+)?)$` |
latitude and longitude (the two patterns above combined, separated by a comma and optional whitespace); see the visualization below | `^[-+]?(90(\.0+)?\|[1-8]?\d(\.\d+)?),\s*[-+]?(180(\.0+)?\|((1[0-7]\d)\|([1-9]?\d))(\.\d+)?)$` |
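A brute-force cross-check is a practical way to gain confidence in range patterns like these. The sketch below generates a few thousand candidate strings and compares the latitude and longitude patterns against a plain numeric comparison using Python's `float`. The patterns used here require at least one digit before the optional fractional part (`[1-8]?\d` rather than `[1-8]?\d?`), so an empty value or a lone sign is rejected. A clean run is evidence, not proof, that the patterns match their description.

```python
import re

# At least one digit is required before the optional fractional part,
# so "" or a lone sign does not match.
LATITUDE = r"^[-+]?(90(\.0+)?|[1-8]?\d(\.\d+)?)$"
LONGITUDE = r"^[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)$"
# Combined pattern: drop LATITUDE's "$" and LONGITUDE's "^", join with ",\s*".
LAT_LONG = LATITUDE[:-1] + r",\s*" + LONGITUDE[1:]


def regex_ok(pattern, text):
    return re.search(pattern, text) is not None


def numeric_ok(text, limit):
    # Reference check: parse as a float and compare against [-limit, limit].
    return abs(float(text)) <= limit


# Candidates such as "45", "-89.5", "+90.0", "179.25", "181" ...
candidates = [
    f"{sign}{whole}{frac}"
    for sign in ("", "-", "+")
    for whole in range(200)
    for frac in ("", ".0", ".5", ".25")
]

mismatches = [
    c for c in candidates
    if regex_ok(LATITUDE, c) != numeric_ok(c, 90)
    or regex_ok(LONGITUDE, c) != numeric_ok(c, 180)
]

print(len(candidates), "candidates,", len(mismatches), "mismatches")
print(regex_ok(LAT_LONG, "45.5, -73.6"))                 # True
print(regex_ok(LATITUDE, ""), regex_ok(LATITUDE, "-"))  # False False
```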
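A useful website for testing regular expressions is Regular Expressions 101. You can enter any regex and a series of test strings, and any matches found will be highlighted. If they are not enabled by default, remember to turn on the `g` (global) and `m` (multi-line) flags: `g` reports every match instead of only the first, and `m` makes `^` and `$` match at the start and end of each line, so each test string on its own line is checked separately.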
A MIME type, defined by IETF RFC 6838, indicates the format of data, most commonly the type of a file.
All MIME types follow a basic template of two parts, separated by a single slash (`/`):

`type/subtype`

You can find a complete list of MIME types here. The following are some frequently used MIME types.
Image | Video | Audio | Application | Text |
---|---|---|---|---|
image/png | video/mp4 | audio/mpeg | application/pdf | text/csv |
image/jpeg | video/raw | audio/ogg | | text/xml |
image/tiff | | | | text/markdown |
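A MIME type value can be checked for the `type/subtype` shape before it is compared against the registered list. The sketch below approximates the restricted-name rule from RFC 6838, section 4.2 (a name starts with a letter or digit and continues with up to 126 letters, digits, or `! # $ & - ^ _ . +`). `split_mime_type` is a hypothetical helper; it only checks the shape, so confirming that a type is actually registered would still need the IANA list.

```python
import re

# Approximation of RFC 6838's restricted-name: a letter or digit, then up to
# 126 letters, digits, or one of ! # $ & - ^ _ . +
RESTRICTED_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&\-^_.+]{0,126}"
MIME_TYPE = re.compile(rf"({RESTRICTED_NAME})/({RESTRICTED_NAME})")


def split_mime_type(value):
    """Return (type, subtype) for a well-formed MIME type, or None."""
    m = MIME_TYPE.fullmatch(value)  # the whole value must be type/subtype
    if m is None:
        return None
    # Type and subtype names are case-insensitive, so normalise to lower case.
    return m.group(1).lower(), m.group(2).lower()


for value in ["image/png", "text/csv", "application/pdf", "Text/Markdown", "png", "image/"]:
    print(value, "->", split_mime_type(value))
```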
ISO 8601 specifies an international format for date and time data. A summary of the standard by Markus Kuhn can be found here.
Under ISO 8601, you can use the following representations:

- `YYYY` for years, `MM` for months (two digits, `01` through `12`), and `DD` for days, separated by a single dash (`-`) or nothing.
- `Www`, the literal `W` followed by the two-digit week number `ww`, can be used after the year instead. An optional following `D` represents the weekday number, from 1 through 7, beginning with Monday. The parts are separated by a single dash (`-`) or nothing.
- `DDD`, the ordinal date, can be used after the year instead. It is the three-digit number of the day within the year, from `001` through `365` (or `366` in leap years).
- `hh` for hours, `mm` for minutes, and `ss` (or `ss.sss` for a given number of decimal places) for seconds. The time is preceded by a literal `T`, and its parts are separated by a single colon (`:`) or nothing.
- `Z` for time in UTC, or `±hh:mm`, `±hhmm`, or `±hh` for other time zones, placed after the time.
- `PnYnMnDTnHnMnS`, `PnW`, or `P<date>T<time>`, where all capital letters are literals and each `n` is a number, can be used to represent durations.
- `<start>/<end>`, `<start>/<duration>`, `<duration>/<end>`, or `<duration>` can be used to represent time intervals.
- `Rn/<interval>` or `R/<interval>`, where `n` is the number of repetitions, can be used to represent repeated intervals.
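Python's `datetime` module can produce several of these representations for the same day, which can help when writing example values for a format overlay. This sketch shows 3 February 2001 as a calendar date (extended and basic formats), a week date, an ordinal date, and a UTC date-time; the variable names are illustrative.

```python
from datetime import date, datetime, timezone

d = date(2001, 2, 3)

# Calendar date, extended (with dashes) and basic (no separators) formats.
extended = d.isoformat()        # "2001-02-03"
basic = d.strftime("%Y%m%d")    # "20010203"

# Week date: ISO year, ISO week number, weekday (1 = Monday ... 7 = Sunday).
iso_year, week, weekday = d.isocalendar()
week_date = f"{iso_year}-W{week:02d}-{weekday}"    # "2001-W05-6"

# Ordinal date: three-digit day of the year.
ordinal = f"{d.year}-{d.timetuple().tm_yday:03d}"  # "2001-034"

# Date and time in UTC: isoformat() writes the offset as "+00:00",
# which is swapped for the "Z" designator here.
moment = datetime(2001, 2, 3, 4, 0, 0, tzinfo=timezone.utc)
utc = moment.isoformat().replace("+00:00", "Z")    # "2001-02-03T04:00:00Z"

print(extended, basic, week_date, ordinal, utc)
```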
The following are some ISO 8601 DateTime examples.
Type | ISO 8601 Format | Example of an Allowed DateTime |
---|---|---|
date (year, month, and day) | `YYYY-MM-DD` | `2001-02-03` |
date (year and month) | `YYYY-MM` | `2001-02` |
date (year, month, and day), basic format | `YYYYMMDD` | `20010203` |
date (year, month, and day) and time | `YYYY-MM-DDThh:mm:ss.sss` | `2001-02-03T04:00:00` |
date (year, month, and day) and time, in UTC | `YYYY-MM-DDThh:mm:ss.sssZ` | `2001-02-03T04:00:00Z` |
time, with time zone offset (in hours) | `Thh:mm:ss.sss±hh` | `T04:00:00-05` |
durations (in years, months, days, and hours) | `PnYnMnDTnH` | `P1Y2M3DT4H` |
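Python's `datetime` module does not parse ISO 8601 durations, but the `PnYnMnDTnHnMnS` form can be matched with a regex. The sketch below is a hypothetical helper, `parse_duration`, that covers only that form (not `PnW` or `P<date>T<time>`) and accepts a decimal fraction only on seconds, which is narrower than the full standard.

```python
import re

# Hypothetical pattern for PnYnMnDTnHnMnS. Every component is optional;
# "M" before the "T" means months, "M" after it means minutes.
DURATION = re.compile(
    r"P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_duration(text):
    """Return a dict of the components present in text, or None if invalid."""
    m = DURATION.fullmatch(text)
    # Reject "P" and "PT" on their own: at least one component is required,
    # and a "T" must be followed by at least one time component.
    if m is None or not any(m.groupdict().values()) or text.endswith("T"):
        return None
    return {name: value for name, value in m.groupdict().items() if value is not None}


print(parse_duration("P1Y2M3DT4H"))  # {'years': '1', 'months': '2', 'days': '3', 'hours': '4'}
print(parse_duration("PT1M"))        # {'minutes': '1'}
```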