Home


Quentin Reul edited this page Nov 18, 2016 · 37 revisions

Welcome to the draft-material wiki! Link the pages you create from here.

1. Introduction

1.1 Purpose of the page

This page lists use cases and the technologies that implement them. All use cases involve working with XML, RDF and JSON. The current focus is on XML and RDF, but this may change in the future.

1.2 Structure of the page

There are two main subsections: use cases and technology solutions. Both are subdivided further. The use case section is divided by type of use case. The technology solutions section groups solutions by aspect, for example: a proposed extension of the RDF or XML technology stack, a best practice, a tool, etc. The aim of this structure is to detect gaps that may benefit from further standardization, best practice description or tool development.

2. Use Cases

2.1 Data Enrichment in Digital Publishing

Owner: Christian, Quentin, Rob

What:

XML, JSON & Linked Data Challenge:

2.2 Data in Public Administration

Owner: Gerard

What:

XML, JSON & Linked Data Challenge:

2.3 Mapping of XML Dictionary Data into RDF

Owner: Timea

What: See https://lists.w3.org/Archives/Public/public-rax/2016Jul/0010.html And also: https://ldl4.com/2016/09/26/what-ive-learned-while-triplifying-a-real-dictionary/

XML, JSON & Linked Data Challenge:

2.4 Mapping of XML Book Content (z-bible) into RDF

Owner: Bernát

What:

XML, JSON & Linked Data Challenge:

2.5 Mapping of XML EAD into RDF

Owner: Gerard

What: See https://lists.w3.org/Archives/Public/public-rax/2016Jul/0025.html

XML, JSON & Linked Data Challenge:

2.6 Enrichment in Localization and Translation

Owner: Phil

What:

XML, JSON & Linked Data Challenge:

2.7 Data acquisition from job postings via GATE

Owner: Christoph

What: GATE Embedded pipeline annotates raw job posting texts, resulting in XML, from which RDF is to be extracted. Implemented using Krextor.

XML, JSON & Linked Data Challenge: The structure of the GATE XML output (empty element nodes used to select ranges of text and link them to structured annotations) is hard to process using declarative XPath-based approaches.
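To illustrate why this structure resists purely path-based selection, here is a minimal Python sketch. The sample document is a simplified, hypothetical fragment: the names `TextWithNodes`, `Node`, `Annotation`, `StartNode` and `EndNode` are modelled on GATE's XML serialization and should be checked against real output. The text of an annotation is not the content of any element. It has to be reassembled from the text that follows the empty marker nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample modelled on GATE-style XML output.
SAMPLE = """<Doc>
  <TextWithNodes><Node id="0"/>Senior<Node id="6"/> <Node id="7"/>Java developer<Node id="21"/> wanted</TextWithNodes>
  <AnnotationSet>
    <Annotation Type="Level" StartNode="0" EndNode="6"/>
    <Annotation Type="JobTitle" StartNode="7" EndNode="21"/>
  </AnnotationSet>
</Doc>"""


def annotated_spans(xml_text):
    """Return (annotation type, covered text) pairs."""
    root = ET.fromstring(xml_text)
    container = root.find("TextWithNodes")

    # Walk the empty marker nodes in document order and record the
    # character offset at which each one occurs in the plain text.
    offsets = {}
    pieces = [container.text or ""]
    position = len(pieces[0])
    for marker in container:
        offsets[marker.get("id")] = position
        tail = marker.tail or ""
        pieces.append(tail)
        position += len(tail)
    text = "".join(pieces)

    # Resolve each annotation's start and end markers to a text range.
    return [
        (a.get("Type"), text[offsets[a.get("StartNode")]:offsets[a.get("EndNode")]])
        for a in root.iter("Annotation")
    ]


print(annotated_spans(SAMPLE))
```

The two-pass shape (index the markers first, then resolve annotations) is what makes this awkward to express as a single declarative XPath match.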

2.8 AutomationML industry automation models integration

Owner: Christoph

What: Integrating multiple views on industry automation settings modelled using AutomationML, by converting from AutomationML to RDF, and further from RDF to a deductive database, where conflict resolution rules are applied (paper). Implemented using Krextor.

XML, JSON & Linked Data Challenge: One challenge so far is that in AutomationML certain elements and attributes can occur in many different contexts (i.e. parent elements).

3. Solutions

3.1.1 General Aspects of RDF/XML Interoperability

Owner: Quentin

At Wolters Kluwer (WK, http://wolterskluwer.com/), we have developed a lingua franca (in the form of an ontology) to express descriptive and structural metadata about our content (see Standardizing Legal Content with OWL and RDF). The original goal of the lingua franca was to implement a generic "semantic content integration" channel to deliver content stored in local Content Management Systems to the WK global publishing platform. As part of this semantic content integration channel, we developed a validation and conversion tool that leveraged both XHTML and RDF to express information about content to be merged and transformed into the XML format supported by the publishing platform.

As RDF includes the RDF/XML serialization, we first attempted to use XSLT to handle the conversion. However, a set of RDF triples expresses a directed, labeled graph that can be rendered in XML in different ways. For instance, a resource can be represented either by an rdf:Description element or by an element named after its type (e.g. skos:Concept). This is because the RDF/XML syntax is schemaless. As such, transformation between RDF and XML using XSLT is not optimal, and the stylesheets need to be tailored to each flavor of RDF/XML, which is costly. One solution that we have used is to convert the RDF input (in any supported syntax) into a canonical XML form prior to applying XSLT. The approach is supplemented by a library of Java functions that can be called from the stylesheets to process the graph. This approach has enabled us to migrate data across systems in several projects. However, we have seen performance issues in the transformation as well as an increase in complexity. In other words, the approach is difficult to maintain as transformation requirements change.
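The two renderings mentioned above can be made concrete with a small Python sketch. It is not the WK tool. The sample data, the `example.org` URI and the `triples` helper are assumptions for illustration, and the helper handles only these two flavors of RDF/XML. It shows that two documents with different element structure carry the same graph, which is why a stylesheet written against one flavor fails on the other unless the input is normalized first.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Flavor A: generic rdf:Description with an explicit rdf:type property.
FLAVOR_A = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <rdf:Description rdf:about="http://example.org/c1">
    <rdf:type rdf:resource="{SKOS}Concept"/>
    <skos:prefLabel>Tax law</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""

# Flavor B: the type is used as the element name.
FLAVOR_B = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://example.org/c1">
    <skos:prefLabel>Tax law</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' form into a plain URI."""
    return tag[1:].replace("}", "")


def triples(rdfxml):
    """Extract triples from the two flavors above (not a full RDF/XML parser).

    Literals and resource URIs are both returned as plain strings.
    """
    result = set()
    for node in ET.fromstring(rdfxml):
        subject = node.get(f"{{{RDF}}}about")
        if node.tag != f"{{{RDF}}}Description":
            # A typed node element implies an rdf:type triple.
            result.add((subject, RDF + "type", expand(node.tag)))
        for prop in node:
            obj = prop.get(f"{{{RDF}}}resource") or (prop.text or "")
            result.add((subject, expand(prop.tag), obj))
    return result


# Sorting the triples gives a stable order from which one fixed XML
# layout could be written before any stylesheet is applied.
for triple in sorted(triples(FLAVOR_B)):
    print(triple)
```

In practice an RDF library would do the parsing. The point of the sketch is only that normalization happens at the graph level, not at the XML level.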

Based on our experience, we have observed that XML is great for representing content, whereas RDF is great for representing meta-information (e.g. title, date, etc.) about the content. The key step is to assign URIs to the abstract content and its physical encoding (e.g. XML format). A few standards (e.g. RDFa) have already been defined to integrate meta-information in HTML.

3.1.2 Classification of Solutions

Owner: Felix and everybody

What: See https://lists.w3.org/Archives/Public/public-rax/2016Jul/0014.html

Solutions can be classified with regards to input and output.

(1) Going from RDF > XML (sometimes called “lowering”)
(2) Going from XML > RDF (sometimes called “lifting”)
(3) Doing round tripping (i.e. both (1) and (2))
(4) Embedding RDF in XML (already feasible with technologies like RDFa or JSON-LD)
(5) Embedding XML in RDF (e.g. rdf:XMLLiteral; works well when the RDF is serialized in XML, e.g., using RDF/XML)
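As a rough illustration of the first three categories, the following Python sketch lifts a tiny XML document into triples and lowers the triples back into XML. Everything in it is hypothetical: the `books` format, the `http://example.org/` namespace and the one-triple-per-child-element rule are assumptions chosen to keep the example short.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # hypothetical namespace


def lift(xml_text):
    """XML > RDF: emit one triple per child element of each <book>."""
    triples = []
    for book in ET.fromstring(xml_text).iter("book"):
        subject = EX + "book/" + book.get("id")
        for child in book:
            triples.append((subject, EX + child.tag, child.text))
    return triples


def lower(triples):
    """RDF > XML: group triples by subject and rebuild <book> elements."""
    root = ET.Element("books")
    books = {}
    for subject, predicate, obj in triples:
        book = books.get(subject)
        if book is None:
            book_id = subject.rsplit("/", 1)[1]
            book = books[subject] = ET.SubElement(root, "book", id=book_id)
        ET.SubElement(book, predicate[len(EX):]).text = obj
    return ET.tostring(root, encoding="unicode")


SOURCE = '<books><book id="b1"><title>Genesis</title><lang>hu</lang></book></books>'

triples = lift(SOURCE)
print(triples)
print(lower(triples) == SOURCE)
```

The round trip only succeeds here because the format is flat and the triples are kept in document order. Mixed content, attributes and element order are exactly the points where real round tripping gets harder.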

3.2 Best Practices

3.2.1 a) Conversion from XML to RDF on the Schema Level, without Round Tripping

Owner: Timea

What:

Example:

3.2.1 b) Conversion from XML to RDF on the Schema Level, without Round Tripping

Owner: Jean-Pierre Evain

What: See https://lists.w3.org/Archives/Public/public-rax/2016Jul/0015.html

Example: Sport metadata and audiovisual content in its semantic context

Sport metadata is delivered in live data streams that evolve over time (e.g. results). Different providers use different data formats such as CSV, XML or PDF. In the case of XML, different schemas are in use.

Semantic technology allows linking resources and associated metadata to contextual sport data (events, locations, athletes, results, etc.) and also to external linked resources. The first task consists of converting ingested data into RDF. The most convenient route is to convert the data to XML (if it is not natively XML) and then to RDF.

Sport applications use a sport ontology, which is not a structural conversion from XSD to RDF. The ontology is a model derived from the XML data (and associated schema). XML to RDF is therefore the transformation of XML instances into RDF individuals.
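A minimal Python sketch of this idea follows. The two provider formats, the mapping tables and the `http://example.org/sport#` ontology namespace are all invented for illustration. The sketch shows that when instances are mapped to a shared ontology rather than converted structurally, feeds that use different schemas yield the same individuals.

```python
import xml.etree.ElementTree as ET

ONT = "http://example.org/sport#"  # hypothetical ontology namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical per-provider mappings:
# (record element, id attribute, {source field: ontology property})
MAPPINGS = {
    "providerA": ("athlete", "id", {"name": "name", "country": "nationality"}),
    "providerB": ("competitor", "code", {"fullName": "name", "nation": "nationality"}),
}


def to_individuals(provider, xml_text):
    """Map one provider's XML instances to individuals of the shared ontology."""
    record_tag, id_attr, fields = MAPPINGS[provider]
    triples = set()
    for record in ET.fromstring(xml_text).iter(record_tag):
        subject = ONT + "athlete/" + record.get(id_attr)
        triples.add((subject, RDF_TYPE, ONT + "Athlete"))
        for child in record:
            if child.tag in fields:  # fields without a mapping are ignored
                triples.add((subject, ONT + fields[child.tag], child.text))
    return triples


FEED_A = '<feed><athlete id="42"><name>Ana Ruiz</name><country>ESP</country></athlete></feed>'
FEED_B = '<data><competitor code="42"><fullName>Ana Ruiz</fullName><nation>ESP</nation></competitor></data>'

print(to_individuals("providerA", FEED_A) == to_individuals("providerB", FEED_B))
```

A structural XSD-to-RDF conversion would instead produce one vocabulary per provider schema and leave the alignment to a later step.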

Equally, linked data is being mapped to the ontology.

RDF to XML is not required. Instead, the results of SPARQL queries are exported as Java beans or JSON, depending on the application and development platform.

3.2.2 Preparing RDF/XML for Conversion with XSLT

Owner: Martynas

What: See https://lists.w3.org/Archives/Public/public-rax/2016Jul/0012.html

Example:

3.2.3 Roundtripping: Converting XML to RDF and back again

Owner: Felix et al.

What: See http://archive.xmlprague.cz/2016/files/xmlprague-2016-proceedings.pdf#page=133

By roundtripping we mean that XML is converted to RDF, some processing is done in RDF, and the output is embedded into the original XML again. The paper referenced above describes several approaches to realize roundtripping:

Embed Linked Data into XML via Structured Markup, e.g. RDFa
Anchor Linked Data in XML Attributes
Embed Linked Data in Metadata Sections of XML Files
Anchor Linked Data via Annotations in XML Content

See the paper for details. The approaches do not need an extension of the RDF or XML technology stack, but tooling. A demo implementation is available at http://api-dev.freme-project.eu/doc/freme-showcase/xml-to-rdf.html
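As a rough sketch of the "anchor in XML attributes" approach, the Python fragment below writes entity URIs back into the original XML. It is not taken from the paper or the demo: the `entity-ref` attribute name, the `LINKS` table (standing in for the result of the RDF processing step) and the exact-match lookup are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical result of the RDF processing step: surface form -> entity URI.
LINKS = {"Dublin": "http://dbpedia.org/resource/Dublin"}


def anchor(xml_text, links, attr="entity-ref"):
    """Attach a URI to every element whose text exactly matches a known entity."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        uri = links.get((element.text or "").strip())
        if uri:
            element.set(attr, uri)
    return ET.tostring(root, encoding="unicode")


SOURCE = "<p>Welcome to <place>Dublin</place>!</p>"
print(anchor(SOURCE, LINKS))
```

The original element structure and text are left untouched. Only an attribute is added, which is why this approach needs tooling rather than a change to either technology stack.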

3.3 Existing Tools

3.3.1 XML > RDF Conversion Tooling: Krextor (XSLT-based)

Owner: Christoph

What: Krextor (homepage, old but still helpful paper) is a library of high-level XSLT templates and functions for XML→RDF conversion. Krextor enables the specification of mappings from XML-based formats to RDF at levels ranging from a declarative “schema to ontology” mapping (sufficient for many practical situations) to low-level XSLT (for full power). Krextor is not schema-aware; the mapping author is expected to know the schema of the XML input and the ontology of the RDF output and has to write the mapping manually.

Advantages over from-scratch one-off XSLT implementations include:

Krextor employs a high-level abstraction of RDF. Instead of just generating RDF/XML output (which is what most from-scratch one-off XSLTs for XML→RDF do), its basic actions are creating resources and adding properties to them. The rules for generating URIs are specified independently from the rules for mapping XML elements/attributes to RDF vocabulary terms.
Templates for many common tasks (e.g. generating URIs from ID/name attributes) are part of Krextor's library.
Convenient Java and command line interfaces are available.
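The separation between URI generation and vocabulary mapping described in the first point can be sketched in Python. This is not Krextor's API (Krextor itself is written in XSLT). The function names, the `VOCAB` table and the sample document are hypothetical and only illustrate the design idea.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


# Rule set 1: how subjects get their URIs (two interchangeable policies).
def uri_from_id(element, base):
    return base + element.get("id")


def uri_from_name(element, base):
    return base + element.get("name").lower().replace(" ", "-")


# Rule set 2: which XML elements map to which RDF vocabulary terms.
VOCAB = {"section": EX + "Section", "title": EX + "title"}


def extract(xml_text, make_uri, base=EX + "doc/"):
    """Create a resource per <section> and add properties to it."""
    triples = []
    for element in ET.fromstring(xml_text).iter("section"):
        subject = make_uri(element, base)
        triples.append((subject, RDF_TYPE, VOCAB["section"]))
        for child in element:
            if child.tag in VOCAB:
                triples.append((subject, VOCAB[child.tag], child.text))
    return triples


SOURCE = '<doc><section id="s1" name="Getting Started"><title>Intro</title></section></doc>'

print(extract(SOURCE, uri_from_id))
print(extract(SOURCE, uri_from_name))
```

Swapping the URI policy changes only the subjects. The predicates and objects produced by the vocabulary mapping stay the same.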

Krextor's most serious shortcomings are:

It is hard to specify your own XML→RDF extraction rules without a strong XSLT background. This is because when things go wrong you will receive low-level error messages from the XSLT processor.
Krextor is not optimized for performance (but for expressiveness of mappings and ease of implementing new mappings).
Krextor has only been tested with the Saxon XSLT processor, as Krextor's main developer (Christoph) is not aware of any other free processor for XSLT ≥ 2.0. Also, as the current free version of Saxon does not support full XSLT 3.0, Krextor is still limited to XSLT 2.0.

Example: see https://github.com/EIS-Bonn/krextor/wiki/Documentation

3.4 Extensions of the RDF and / or XML Technology Stack

3.4.1 RDF Data Shapes and XML > RDF Conversion

Owner: Jose

What: See https://lists.w3.org/Archives/Public/public-rax/2016Jul/0013.html

Example:

3.4.2 XSPARQL

Owner: Axel?

What: Round-tripping. XSPARQL combines XQuery and SPARQL. It is particularly suited to queries over XML, RDF, or both at the same time, rather than to transforming whole RDF graphs or whole XML documents (which requires more effort than, say, in XSLT, because of the absence of implicit recursion).

(The last sentence reflects Christoph's view; please feel free to improve)

Example:

3.4.3 RDF/XML extension for Quads

Owner: Martynas

What: See https://lists.w3.org/Archives/Public/semantic-web/2016Jun/0022.html

Example:

3.4.4 Create XML Documents from RDF Data

Owner: Bernát

What: See https://lists.w3.org/Archives/Public/public-rax/2016Jul/0023.html

Example:

3.4.5 XML that does not let RDF introduce Namespaces

Owner: Liam

What: See https://lists.w3.org/Archives/Public/public-rax/2016Jul/0018.html

Example:

3.4.6 STTL for RDF to XML

STTL (paper, specification) is an approach that combines SPARQL and XML templates. It is implemented in the Corese library.

Old Material (Please do not edit): Collection of known solutions & ideas

If you want to use material from below please integrate it where it fits in the above structure.

XML to RDF:

Paper: http://archive.xmlprague.cz/2016/files/xmlprague-2016-proceedings.pdf#page=133

Other ideas:

Maestro edition of TopBraid Composer
MarkLogic