Minutes

Administrivia

Roll call: see attendee list and remote attendees

Agenda and notes

Welcome and introductions / agenda review

Round table, introductions
- Users and implementors both
- Some reports of success, some of failure
- Learning curve is steep; documentation is lacking

Summary of new action items

~~ACTION: Norm to rename the “archived-specification” Github repository~~
~~ACTION: Norm to republish the last XML drafts as “1.1”~~
~~ACTION: Norm to set up xproc.org with Travis CI~~
ACTION: Norm to create standard library for exproc steps
ACTION: Norm to move xprocbook.com to github
ACTION: Ari/Nic weigh in on the current community group
~~ACTION: Norm to turn the xproc repo into an org~~
ACTION: Norm, in November, to revisit the prospect of a workshop
ACTION: Norm to convert pain points into Github issues
ACTION: Henry to review issues and find relevant working group minutes

Where are we, where are we going?

Norm waffles on about the course of the WG.
The W3C WG has closed down
There is a W3C Community Group but it has not seen a lot of activity (yet?)
There is a WG note from July 2016, but it’s the December 2015 draft, before the WG shifted to thinking of an entirely new syntax
Hypothesis: this group of people would be most interested in an XProc V.next that was still XML but fixed the most egregious problems.
- Some support for this hypothesis
- If XProc 1.0 didn’t exist, I’d prefer the direction of XProc 2.0.
  - Some support for the non-XML syntax
  - Given that 1.0 does exist, and the XML syntax exists, incremental improvements seem best.
- A lot of support for the XML syntax; can be processed by XML tools
Norm mutters about the size of the XML community vs the broader developer community
- The major use of XProc today is publishing toolchains: if you’re going to focus on a problem, that’s a problem to focus on
  - The trajectory of that problem is XML today; unclear what it will be in the future
  - Note: W3C and IDPF have agreed to merge to work jointly on EPUB
  - Thinking about XProc 1.1 as servicing that community would serve as a good focus
- Doing 2.0 is a very large risk; no guarantee of a community. The 1.1 effort has a much nearer term community.
- ACTION: Norm to rename Github repository back to “specification” (from archived-specification) so all of the issue links work again. #facepalm
- ACTION: Norm will take the last XML draft from the xproc20 branch and use it as the basis of a new xproc11 branch.
  - Unify the small-fixes, variables-everywhere, and flowchart drafts
  - ETA: October 2016
- Group gazes at http://xproc.github.io/archived-specification/langspec/xproc20/head/xproc20/

Implementor reports, V.next initiatives (Where’s the pain?)

Many of the issues below were discussed by the WG. See http://www.w3.org/XML/XProc/2015/06/10-12-minutes

See also: the June 2015 web archive; that’s where the substantive discussion occurred.

Poor support for text/binary documents
- JSON, CSV, JPG, ZIP, etc.
- Support on p:document, for example
- Extracting binaries from ZIP files
Variables/parameters as nodes
- With types (as=)
- Variables anywhere
Attribute value templates
- Can href on p:document be an AVT?
Text value templates?
Being able to import XSLT/XQuery functions
Handling base URIs of documents
- Often have to manually set the base URIs
Output port for validation results
- Already supported by p:validate-with-schematron
- XML Schema/RELAX NG “reports” of validation
- NVDL?
- Unified reporting language for validation errors
  - SVRL for all validation steps?
- Jing doesn’t report URI; we should consider making the error locations standard somehow
Sequence of steps with multiple input and output ports cannot be bound with default readable port
- Perhaps content-type specific ports?
Can we support JSON more directly/interoperably
cx:depends extension should be standardized
cx:message extension should be standardized
cx:eval extension should be standardized
Random syntactic simplifications
- <p:input port="source" href="test.xml"/>
- <p:input port="source" step="fred" port="secondary"/>
- Allow step and port attributes on steps?
Make exproc steps standard?
Solve interop issue for exproc steps
Extend p:unzip so that it can write to disk
Document properties/metadata co-pipe
Can we get serialization information about secondary outputs from XSLT
- Missing Saxonica API?
- Provide option to have secondary outputs actually be written to disk
Better/more valuable debugging output
- Extend error vocabulary?
p:log improvements
- href as AVT?
- Create paths if they don’t exist
Relax the “well-formed XML” requirement to “any XDM”
- Allow “any” XDM node(s) to flow through the pipeline
- Only check for “well-formed XML” at serialization
Documentation issues
- Tutorials
- Examples
- Spin the wiki back up on github
- xprocbook.com
- “The spec is conceptually difficult to understand”
  - Volunteers accepted
- Point xproc.org at something managed on github
  - ACTION: Norm to set up xproc.org with Travis CI from a repository on Github
Could we have “instructions” that aren’t steps?
- For x in 1 to 10?
- Choose?
- If?
Documents and sequences of documents flow through the pipeline; could we support the notion of “sets” (nodes+names) flowing through the pipeline?
- Sets of documents are used in some publishing workflows
- Output of a pipeline that is an entire website
A “resource manager”
- Should there be some named local storage, within the context of a pipeline (or set of pipeline) executions?
- Probably addressed by URI, among other things
Define p:collection() for the inputs to steps
- So that you can operate on a group of inputs
XProc 1.1 can be based on XDM (XPath 2.0, 3.x?)
Replace parameter inputs with maps
Can p:for-each/p:iteration-source and p:viewport/p:viewport-source be simplified
- Remove redundancy?

      <p:for-each>
        <p:iteration-source select="//section"/>
        <p:variable name="class" select="/section/@class"/>

      <p:for-each>
        <p:iteration-source select="//section" as="element()"/>
        <p:variable name="class" select="@class"/>

How important is backwards compatibility?
What about XQuery/XSLT 3.x?
What about XSLT 3.x streaming?
What is the use case for binary?
- Adding binary files to ZIPs?
Standard format for pipeline documentation
- Like JavaDoc
- Possible community project?
Standard way of performing unit tests on pipelines
- Make the XProc test suite runner more stable and useful?
Collect additional tests
- Create new test suite for 1.1
ACTION: Jim to review the old XProc issues list and make sure that any stragglers get caught

XProc interop

See XML London paper re: interop issues
There’s a Calabash->Morgana pipeline
Different interpretations of the specs
- Does p:store create the folder structure?
ACTION: Norm to create standard library for exproc steps
- Does XML Calabash do something magic here?

Backwards compatibility

Compatibility is a goal, but not a requirement
Backwards incompatibilities must be clearly documented

Thinking about semantics: a dataflow VM / thoughts about streaming

Henry’s semantics work was about the 2.0 work. To the extent that we’ve refocused on 1.1, it’s less relevant
Henry presents background on Markup Technology pipelines
Henry writes on the blackboard (yes, an actual blackboard!)
- Dimensions along which the components of pipelines vary
  - What flows?
    - Markup Technology: Infoset items, documents, sequences, sets
  - How do they flow?
    - What does a step look like?
      - A step is an operation on a pipeline
    - What does a step implementation look like?
      - There may be a variety of implementations depending on the other dimensions
      - Do you pull input or do you provide handlers?
      - Do you want to push or be called?
  - You start with a sequence of steps
    - Look at the set of implementations you have of those steps
    - “Compile” down to the set of implementations with appropriate shims
    - Compiling establishes where the thread boundaries have to be
ACTION: Henry to add his examples here.
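
Independently of Henry's examples, the pull/push distinction noted above can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `pull_upper`, `PushPrefix`, and `pull_to_push` are hypothetical and do not come from Markup Technology or any XProc implementation. It shows one pull-style step, one push-style step, and a small shim that lets them be chained.

```python
from typing import Callable, Iterable, Iterator, List


def pull_upper(source: Iterable[str]) -> Iterator[str]:
    """Pull style (hypothetical step): the step asks upstream for each input."""
    for doc in source:
        yield doc.upper()


class PushPrefix:
    """Push style (hypothetical step): the step is handed each input
    and calls a downstream handler with its result."""

    def __init__(self, prefix: str, downstream: Callable[[str], None]) -> None:
        self.prefix = prefix
        self.downstream = downstream

    def send(self, doc: str) -> None:
        self.downstream(self.prefix + doc)


def pull_to_push(source: Iterable[str], handler: Callable[[str], None]) -> None:
    """Shim: drain a pull-style source and feed each item to a push-style handler."""
    for doc in source:
        handler(doc)


results: List[str] = []
sink = PushPrefix("doc:", results.append)
pull_to_push(pull_upper(["a", "b"]), sink.send)
print(results)
```

This direction of shim runs in a single thread. Going the other way (exposing a push-style producer as something that can be pulled from) typically needs a buffer or a separate thread, which is one possible reading of the note that compiling establishes where the thread boundaries have to be.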

Lowering the barrier to entry

How to learn XProc? Cry. Or pray.
Update and finish xprocbook.com?
- ACTION: Norm to move it to github, per xproc.org
Roger Costello’s tutorial
Examples on the xproc-dev mailing list
Data<2>Type tutorial is a good reference
Talk to Oxygen about adding support for Morgana XProc.
- Common “shim” for XML Calabash and Morgana?
How can we increase the diversity in the XProc community?

Building a community / Keeping up the momentum: AKA “Then What?”

How do we translate excitement here into longer-term progress on language specifications?
The current W3C Community Group is very clearly focused on the 2.0 work: Data Pipelining Use Cases
- Rename?
- Start another one?
- ACTION: Ari/Nic weigh in on the current community group
We have a github repo for specs/notes/impl/etc.
- ACTION: Norm to turn the xproc repo into an org
We have a wiki on the github repo
Another workshop?
- General sense of “yes”
- Spec hackathon?
- Co-located with XML Prague?
  - Proposal: Two days, in Leipzig 7/8 Feb 2017
  - ACTION: Norm to raise this again in late November
    - Publish actions and agendas
- Could we have (semi-)regular meetings, perhaps hangouts?
  - Schedule when we have results/actions to discuss?
- Continue to use xproc-dev mailing list for communications
- ACTION: Norm to convert pain points into Github issues
  - Are “github projects” relevant?
  - Gitter for xproc11 repo?
  - ACTION: All to categorize issues as “addressed” in the spec; add links as appropriate.
    - ETA mid-November 2016
  - ACTION: Henry to review issues and find relevant working group minutes
- Self-hosted “slack” for conversations
  - ACTION: Gerrit to send link

Step wish list

ODBC/SQL step
Working with RDF
p:validate-rdf-schema
p:validate-json-schema
Long running pipeline that listens to a port?
Interaction with the user on the command-line?
p:until (repeat until no change)
p:wait-for-change (to the result of an HTTP GET, for example)
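
The "repeat until no change" behaviour wished for in p:until can be sketched as a fixed-point loop. This is a minimal sketch in Python, not a proposed step definition: `until_unchanged`, `max_rounds`, and `collapse_double_spaces` are hypothetical names, and comparing documents with `==` is an assumption about what "no change" would mean.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def until_unchanged(step: Callable[[T], T], doc: T, max_rounds: int = 100) -> T:
    """Apply `step` repeatedly until its output equals its input.

    `max_rounds` is an assumed safety limit so that a step which never
    stabilizes cannot loop forever.
    """
    for _ in range(max_rounds):
        result = step(doc)
        if result == doc:
            return result
        doc = result
    raise RuntimeError("no fixed point reached within max_rounds")


def collapse_double_spaces(text: str) -> str:
    """Toy step: one pass that replaces each pair of spaces with a single space."""
    return text.replace("  ", " ")


print(until_unchanged(collapse_double_spaces, "a    b"))
```

A real step would have to decide what equality of documents means (for example, deep equality of nodes) and whether a round limit is part of the step's options.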

Review of open action items

Which actions need to be done before another meeting?

New spec draft (ETA Oct)
Issues created and categorized (ETA Nov)

Technical discussion

Henry reviews some “code phrases” from the working group.

“Metadata pipe” (document properties)
- Everyone’s favorite problem with character entities: they disappear after you read the document.
  - How do you get &uuml; back in the output?
- The metadata pipe is a way of passing along information, for example, that you’d like entities.
- Clearly generalizes to serialization options for p:xslt outputs
- The media type also needs to be carried along
- So does the base URI
- This is all out-of-band information that you’d like to travel with the document.
- Pipes should be two part: the data pipe and the metadata pipe.
- In XProc 1.0, there are lots of things that don’t change if you don’t touch them (the step definitions tell you what is and isn’t changed).
- It’s a little harder to see how that extends to the metadata: if you go through an XSLT step, then the serialization options may have changed.
- Things that you do that change the base URI may not obviously change the base URI in the metadata pipe.
- “Getting this right will take some work.”
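
The two-part pipe idea can be illustrated with a small Python sketch. It is a sketch under assumptions, not the working group's design: the `Item` type, the property names (`base-uri`, `content-type`, `prefer-entities`, `serialization-method`), and both step functions are hypothetical. It shows metadata travelling unchanged through a step that does not touch it, and a step that deliberately changes one property.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class Item:
    """One thing flowing through a pipe: the data plus its out-of-band metadata."""
    data: str
    meta: Dict[str, str] = field(default_factory=dict)


def passthrough_step(transform: Callable[[str], str]) -> Callable[[Item], Item]:
    """Wrap a data-only transform so that metadata it does not touch is carried along."""
    def step(item: Item) -> Item:
        return replace(item, data=transform(item.data))
    return step


def xslt_like_step(item: Item) -> Item:
    """Stand-in for a step that changes metadata as well as data (hypothetical)."""
    meta = dict(item.meta)
    meta["serialization-method"] = "html"  # assumed property name
    return Item(data="<html>" + item.data + "</html>", meta=meta)


doc = Item(
    "  <p>hello</p>  ",
    {
        "base-uri": "file:///in/doc.xml",
        "content-type": "application/xml",
        "prefer-entities": "true",
    },
)
trimmed = passthrough_step(str.strip)(doc)
styled = xslt_like_step(trimmed)
print(styled.data, styled.meta)
```

The hard cases raised in the discussion are exactly the ones this sketch glosses over: a step that changes the base URI in the data would also have to update `base-uri` in the metadata, and nothing here enforces that.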

Any other business?

Thanks to Steven Pemberton, CWI, and W3C.nl for hosting us
