Commit
Issue 84: whitespace and normalised strings
michmech committed Apr 8, 2024
1 parent 75f5481 commit 445474b
Showing 32 changed files with 236 additions and 160 deletions.
18 changes: 9 additions & 9 deletions dmlex-v1.0/specification/core/objectTypes/definition.xml
@@ -24,18 +24,18 @@
<title>Properties</title>
<listitem>
<para><literal>text</literal>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm>. Non-empty string. A statement, in the same
language as the headword, that describes and/or explains the meaning of a sense. In DMLex,
the term definition encompasses not only formal definitions, but also less formal
explanations.</para>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm>. Normalised
string. A statement, in the same language as the headword, that describes and/or explains
the meaning of a sense. In DMLex, the term definition encompasses not only formal
definitions, but also less formal explanations.</para>
</listitem>
<listitem>
<para><literal>definitionType</literal>
<glossterm>optional</glossterm> (zero or one). If a sense contains multiple definitions,
indicates the difference between them, for example that they are intended for different
audiences. The <code><olink targetptr="values_definitionTypeTag">definitionTypeTag</olink></code> object type can be used
to constrain and/or explain the definition types that occur in the lexicographic
resource.</para>
<glossterm>optional</glossterm> (zero or one). Normalised string. If a sense contains
multiple definitions, indicates the difference between them, for example that they are
intended for different audiences. The <code><olink targetptr="values_definitionTypeTag"
>definitionTypeTag</olink></code> object type can be used to constrain and/or explain
the definition types that occur in the lexicographic resource.</para>
</listitem>
<listitem>
<para><literal>listingOrder</literal>
7 changes: 5 additions & 2 deletions dmlex-v1.0/specification/core/objectTypes/entry.xml
@@ -24,12 +24,15 @@
<title>Properties</title>
<listitem>
<para><literal>headword</literal>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm> (in combination with other unique properties if present). Non-empty string. The entry's
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm> (in
combination with other unique properties if present). Normalised string. The entry's
headword.</para>
</listitem>
<listitem>
<para><literal>homographNumber</literal>
<glossterm>optional</glossterm> (zero or one) and <glossterm>unique</glossterm> (in combination with other unique properties if present). The entry's homograph number, as a guide to distinguish entries with the same headword.</para>
<glossterm>optional</glossterm> (zero or one) and <glossterm>unique</glossterm> (in
combination with other unique properties if present). Number. The entry's homograph number,
as a guide to distinguish entries with the same headword.</para>
</listitem>
<listitem>
<para><literal><olink targetptr="core_partOfSpeech">partOfSpeech</olink></literal>
23 changes: 13 additions & 10 deletions dmlex-v1.0/specification/core/objectTypes/example.xml
@@ -25,20 +25,22 @@
<title>Properties</title>
<listitem>
<para><literal>text</literal>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm>. Non-empty string. The example itself.</para>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm>. Normalised
string. The example itself.</para>
</listitem>
<listitem>
<para><literal>sourceIdentity</literal>
<glossterm>optional</glossterm> (zero or one). An abbreviation, a code or some other string
of text which identifies the source. The <code><olink targetptr="values_sourceIdentityTag"
>sourceIdentityTag</olink></code> object type can be used to explain the meaning of the source
identifiers, to constrain which source identifiers are allowed to occur in the lexicographic
resource, and to map them onto external inventories and ontologies.</para>
<glossterm>optional</glossterm> (zero or one). Normalised string. An abbreviation, a code or
some other string of text which identifies the source. The <code><olink
targetptr="values_sourceIdentityTag">sourceIdentityTag</olink></code> object type can be
used to explain the meaning of the source identifiers, to constrain which source identifiers
are allowed to occur in the lexicographic resource, and to map them onto external
inventories and ontologies.</para>
</listitem>
<listitem>
<para><literal>sourceElaboration</literal>
<glossterm>optional</glossterm> (zero or one). Non-empty string. A free-form statement about
the source of the example. If <code>sourceIdentity</code> is present, then
<glossterm>optional</glossterm> (zero or one). Normalised string. A free-form statement
about the source of the example. If <code>sourceIdentity</code> is present, then
<code>sourceElaboration</code> can be used for information about where in the source the
example can be found: page number, chapter and so on. If <code>sourceIdentity</code> is
absent then <code>sourceElaboration</code> can be used to fully name the source.</para>
@@ -49,8 +51,9 @@
</listitem>
<listitem>
<para><literal>soundFile</literal>
<glossterm>optional</glossterm> (zero or one). A pointer to a file, such as a filename or a URI, containing a sound recording of the
example.</para>
<glossterm>optional</glossterm> (zero or one). An <emphasis>Internationalized Resource
Identifier</emphasis> (<link linkend="bib_rfc3987">IRI</link>) pointing to a file which
contains a sound recording of the example.</para>
</listitem>
<listitem>
<para><literal>listingOrder</literal>
20 changes: 11 additions & 9 deletions dmlex-v1.0/specification/core/objectTypes/inflectedForm.xml
@@ -25,18 +25,20 @@
<title>Properties</title>
<listitem>
<para><literal>text</literal>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm> (in combination with other unique properties if present). Non-empty string. The text of the inflected
form.</para>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm> (in
combination with other unique properties if present). Normalised string. The text of the
inflected form.</para>
</listitem>
<listitem>
<para><literal>tag</literal>
<glossterm>optional</glossterm> (zero or one) and <glossterm>unique</glossterm> (in combination with other unique properties if present). Non-empty string. An abbreviation, a code or
some other string of text which identifies the inflected form, for example <code>pl</code>
for plural, <code>gs</code> for genitive singular, <code>com</code> for comparative. The
<code><olink targetptr="values_inflectedFormTag">inflectedFormTag</olink></code> object
type can be used to explain the meaning of the inflection tags, to constrain which
inflection tags are allowed to occur in the lexicographic resource, and to map them onto
external inventories and ontologies.</para>
<glossterm>optional</glossterm> (zero or one) and <glossterm>unique</glossterm> (in
combination with other unique properties if present). Normalised string. An abbreviation, a
code or some other string of text which identifies the inflected form, for example
<code>pl</code> for plural, <code>gs</code> for genitive singular, <code>com</code> for
comparative. The <code><olink targetptr="values_inflectedFormTag"
>inflectedFormTag</olink></code> object type can be used to explain the meaning of the
inflection tags, to constrain which inflection tags are allowed to occur in the
lexicographic resource, and to map them onto external inventories and ontologies.</para>
</listitem>
<listitem>
<para><literal><olink targetptr="core_label">label</olink></literal>
13 changes: 7 additions & 6 deletions dmlex-v1.0/specification/core/objectTypes/label.xml
@@ -38,12 +38,13 @@
<title>Properties</title>
<listitem>
<para><literal>tag</literal>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm>. Non-empty string. An abbreviation, a code or
some other string of text which identifies the label, for example <code>neo</code> for
neologism, <code>colloq</code> for colloquial, <code>polit</code> for politics. The
<code><olink targetptr="values_labelTag">labelTag</olink></code> object type can be used to explain
the meaning of the labels, to constrain which labels are allowed to occur in the
lexicographic resource, and to map them onto external inventories and ontologies.</para>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm>. Normalised
string. An abbreviation, a code or some other string of text which identifies the label, for
example <code>neo</code> for neologism, <code>colloq</code> for colloquial,
<code>polit</code> for politics. The <code><olink targetptr="values_labelTag"
>labelTag</olink></code> object type can be used to explain the meaning of the labels,
to constrain which labels are allowed to occur in the lexicographic resource, and to map
them onto external inventories and ontologies.</para>
</listitem>
<listitem>
<para><literal>listingOrder</literal>
@@ -23,7 +23,7 @@
<title>Properties</title>
<listitem>
<para><literal>title</literal>
<glossterm>optional</glossterm> (zero or one). Non-empty string. A human-readable title of
<glossterm>optional</glossterm> (zero or one). Normalised string. A human-readable title of
the lexicographic resource.</para>
</listitem>
<listitem>
14 changes: 7 additions & 7 deletions dmlex-v1.0/specification/core/objectTypes/partOfSpeech.xml
@@ -24,13 +24,13 @@
<title>Properties</title>
<listitem>
<para><literal>tag</literal>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm>. Non-empty string. An abbreviation, a code or
some other string of text which identifies the part-of-speech label, for example
<code>n</code> for noun, <code>v</code> for verb, <code>adj</code> for adjective. The
<code><olink targetptr="values_partOfSpeechTag">partOfSpeechTag</olink></code> object type can be used to explain
the meaning of the part-of-speech tags, to constrain which part-of-speech tags are allowed
to occur in the lexicographic resource, and to map them onto external inventories and
ontologies.</para>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm>. Normalised
string. An abbreviation, a code or some other string of text which identifies the
part-of-speech label, for example <code>n</code> for noun, <code>v</code> for verb,
<code>adj</code> for adjective. The <code><olink targetptr="values_partOfSpeechTag"
>partOfSpeechTag</olink></code> object type can be used to explain the meaning of the
part-of-speech tags, to constrain which part-of-speech tags are allowed to occur in the
lexicographic resource, and to map them onto external inventories and ontologies.</para>
</listitem>
<listitem>
<para><literal>listingOrder</literal>
3 changes: 2 additions & 1 deletion dmlex-v1.0/specification/core/objectTypes/pronunciation.xml
@@ -36,7 +36,8 @@
</listitem>
<listitem>
<para><literal><olink targetptr="core_transcription">transcription</olink></literal>
<glossterm>optional</glossterm> (zero or more) and <glossterm>unique</glossterm>.</para>
<glossterm>optional</glossterm> (zero or more) and <glossterm>unique</glossterm>.
Normalised string.</para>
</listitem>
</itemizedlist>
</listitem>
9 changes: 5 additions & 4 deletions dmlex-v1.0/specification/core/objectTypes/sense.xml
@@ -29,10 +29,11 @@
</listitem>
<listitem>
<para><literal>indicator</literal>
<glossterm>optional</glossterm> (zero or one) and <glossterm>unique</glossterm>. A short statement, in the same language as
the headword, that gives an indication of the meaning of a sense and permits its
differentiation from other senses in the entry. Indicators are sometimes used in
dictionaries instead of or in addition to definitions.</para>
<glossterm>optional</glossterm> (zero or one) and <glossterm>unique</glossterm>. Normalised
string. A short statement, in the same language as the headword, that gives an indication of
the meaning of a sense and permits its differentiation from other senses in the entry.
Indicators are sometimes used in dictionaries instead of or in addition to
definitions.</para>
</listitem>
<listitem>
<para><literal><olink targetptr="core_label">label</olink></literal>
4 changes: 2 additions & 2 deletions dmlex-v1.0/specification/core/objectTypes/transcription.xml
@@ -24,8 +24,8 @@
<title>Properties</title>
<listitem>
<para><literal>text</literal>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm>. Non-empty string. The actual
transcription.</para>
<glossterm>required</glossterm> (exactly one) and <glossterm>unique</glossterm>. Normalised
string. The actual transcription.</para>
</listitem>
<listitem>
<para><literal>scheme</literal>
36 changes: 28 additions & 8 deletions dmlex-v1.0/specification/dmlex.xml
@@ -341,21 +341,41 @@
including all the modules. An object type defined in one module is guaranteed not to
name-conflict with another object type in another module.</para>
</section>
<section>
<section id="modelProperties">
<title>Properties</title>
<para>For every object type, DMLex defines which properties it can have, whether the properties
are required or optional, what their arities are (for example “zero or more”), and what
kinds of values they can contain.</para>
<para>There are two kinds of properties: those that contain literal values and those that
contain objects.</para>
<para>Some properties are defined to contain literal values such as strings and numbers. The
following types of literal values are used in DMLex: <simplelist>
<member>non-empty string,</member>
<member>number,</member>
<member>unique identifier</member>
<member>and reference to a unique identifier</member>
</simplelist>
</para>
following types of literal values are used in DMLex:</para>
<itemizedlist>
<listitem>
<para>normalised string: a non-empty string that contains no line breaks, does not start or end with whitespace, and contains no run of ASCII whitespace longer than a single space.</para>
</listitem>
<listitem>
<para>number: including negative numbers and floating-point numbers</para>
</listitem>
<listitem>
<para>non-negative integer number</para>
</listitem>
<listitem>
<para>boolean: a true/false value</para>
</listitem>
<listitem>
<para>URI</para>
</listitem>
<listitem>
<para>IETF language code</para>
</listitem>
<listitem>
<para><link linkend="bib_rfc3987">IRI</link>: an Internationalized Resource Identifier</para>
</listitem>
<listitem>
<para>closed list of possible values</para>
</listitem>
</itemizedlist>
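The normalised string rule in the list above can be illustrated with a short check. This is only a sketch and not part of the specification: the function names `is_normalised` and `normalise` are hypothetical, "ASCII whitespace" is assumed to mean tab, line feed, form feed, carriage return and space, and the rule is read strictly, so that a lone tab between words is also rejected.

```python
import re

# Assumption: "ASCII whitespace" means tab, line feed, form feed,
# carriage return and space.
ASCII_WS_RUN = re.compile(r"[\t\n\f\r ]+")


def is_normalised(value: str) -> bool:
    """Check the four conditions of a normalised string (strict reading)."""
    if value == "":
        return False  # must be non-empty
    if "\n" in value or "\r" in value:
        return False  # must not contain line breaks
    if value[0].isspace() or value[-1].isspace():
        return False  # must not start or end with whitespace
    # Every run of ASCII whitespace must be exactly one space character.
    return all(run == " " for run in ASCII_WS_RUN.findall(value))


def normalise(value: str) -> str:
    """Collapse ASCII whitespace runs to one space and trim both ends.

    The result can still be empty, so callers should check it afterwards.
    """
    return ASCII_WS_RUN.sub(" ", value).strip()
```

A consumer might call `normalise` on incoming values such as `headword` or `text` and then reject the object if `is_normalised` still returns false, which happens only when the value is empty after trimming.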
<para>Some properties are defined to contain objects (of types defined in DMLex), for
example an object of type <code>entry</code> can contain objects of type
<code>sense</code>. In such cases, the name of the property is the same as the name of
@@ -39,12 +39,12 @@
</listitem>
<listitem>
<para><literal>lemma</literal>
<glossterm>optional</glossterm> (zero or one) and <glossterm>unique</glossterm>. Non-empty string. The lemmatized form of the
collocate. An application can use it to provide a clickable link for the user to search for
the lemma in the rest of the lexicographic resource or on the web. (If you want to link the
collocate explicitly to a specific entry or to a specific sense in your lexicographic
resource, or even in an external lexicographic resource, you can use the Linking Module for
that.)</para>
<glossterm>optional</glossterm> (zero or one) and <glossterm>unique</glossterm>. Normalised
string. The lemmatized form of the collocate. An application can use it to provide a
clickable link for the user to search for the lemma in the rest of the lexicographic
resource or on the web. (If you want to link the collocate explicitly to a specific entry or
to a specific sense in your lexicographic resource, or even in an external lexicographic
resource, you can use the Linking Module for that.)</para>
</listitem>
<listitem>
<para><literal><olink targetptr="annotation_label">label</olink></literal>
@@ -24,9 +24,10 @@
<itemizedlist>
<listitem>
<para><literal>description</literal>
<glossterm>optional</glossterm> (zero or one) and <glossterm>unique</glossterm>. A plain-text form of the etymology, which may
contain notes about the etymology. This may be used instead of or alongside a structured list of
origin and etymon objects.</para>
<glossterm>optional</glossterm> (zero or one) and <glossterm>unique</glossterm>.
Normalised string. A plain-text form of the etymology, which may contain notes about the
etymology. This may be used instead of or alongside a structured list of origin and
etymon objects.</para>
</listitem>
<listitem>
<para><literal><olink targetptr="etymology_etymon">etymon</olink></literal>