Minutes

Norman Walsh edited this page Oct 4, 2016 · 11 revisions

Administrivia

Agenda and notes

Welcome and introductions / agenda review

  • Round table, introductions
    • Users and implementors both
    • Some reports of success, some of failure
    • Learning curve is steep; documentation is lacking

Summary of new action items

  • ACTION: Norm to rename the “archived-specification” Github repository
  • ACTION: Norm to republish the last XML drafts as “1.1”
  • ACTION: Norm to set up xproc.org with Travis CI
  • ACTION: Norm to create standard library for exproc steps
  • ACTION: Norm to move xprocbook.com to github
  • ACTION: Ari/Nic weigh in on the current community group
  • ACTION: Norm to turn the xproc repo into an org
  • ACTION: Norm, in November, to revisit the prospect of a workshop
  • ACTION: Norm to convert pain points into Github issues
  • ACTION: Henry to review issues and find relevant working group minutes

Where are we, where are we going?

  • Norm waffles on about the course of the WG.
  • The W3C WG has closed down
  • There is a W3C Community Group but it has not seen a lot of activity (yet?)
  • There is a WG note from July 2016, but it’s the December 2015 draft, before the WG shifted to thinking of an entirely new syntax
  • Hypothesis: this group of people would be most interested in an XProc V.next that was still XML but fixed the most egregious problems.
    • Some support for this hypothesis
    • If XProc 1.0 didn’t exist, I’d prefer the direction of XProc 2.0.
      • Some support for the non-XML syntax
      • Given that 1.0 does exist, and the XML syntax exists, incremental improvements seem best.
    • A lot of support for the XML syntax; can be processed by XML tools
  • Norm mutters about the size of the XML community vs the broader developer community
    • The major use of XProc today is publishing toolchains: if you’re going to focus on a problem, that’s a problem to focus on
      • The trajectory of that problem is XML today; unclear what it will be in the future
      • Note: W3C and IDPF have agreed to merge to work jointly on EPUB
      • Thinking about XProc 1.1 as servicing that community would serve as a good focus
    • Doing 2.0 is a very large risk; no guarantee of a community. The 1.1 effort has a much nearer term community.
    • ACTION: Norm to rename Github repository back to “specification” (from archived-specification) so all of the issue links work again. #facepalm
    • ACTION: Norm will take the last XML draft from the xproc20 branch and use it as the basis of a new xproc11 branch.
      • Unify the small-fixes, variables-everywhere, and flowchart drafts
      • ETA: October 2016
    • Group gazes at http://xproc.github.io/archived-specification/langspec/xproc20/head/xproc20/

Implementor reports, V.next initiatives (Where’s the pain?)

Many of the issues below were discussed by the WG. See http://www.w3.org/XML/XProc/2015/06/10-12-minutes

See also: the June 2015 web archive; that’s where the substantive discussion occurred.

  • Poor support for text/binary documents
    • JSON, CSV, JPG, ZIP, etc.
    • Support on p:document, for example
    • Extracting binaries from ZIP files
  • Variables/parameters as nodes
    • With types (as=)
    • Variables anywhere
  • Attribute value templates
    • Can href on p:document be an AVT?
  • Text value templates?
  • Being able to import XSLT/XQuery functions
  • Handling base URIs of documents
    • Often have to manually set the base URIs
  • Output port for validation results
    • Already supported by p:validate-with-schematron
    • XML Schema/RELAX NG “reports” of validation
    • NVDL?
    • Unified reporting language for validation errors
      • SVRL for all validation steps?
    • Jing doesn’t report the URI; we should consider making the error locations standard somehow
  • Sequence of steps with multiple input and output ports cannot be bound with default readable port
    • Perhaps content-type specific ports?
  • Can we support JSON more directly/interoperably
  • cx:depends extension should be standardized
  • cx:message extension should be standardized
  • cx:eval extension should be standardized
  • Random syntactic simplifications
    • <p:input port="source" href="test.xml"/>
    • <p:input port="source" step="fred" port="secondary"/>
    • Allow step and port attributes on steps?
  • Make exproc steps standard?
  • Solve interop issue for exproc steps
  • Extend p:unzip so that it can write to disk
  • Document properties/metadata co-pipe
  • Can we get serialization information about secondary outputs from XSLT
    • Missing Saxonica API?
    • Provide option to have secondary outputs actually be written to disk
  • Better/more valuable debugging output
    • Extend error vocabulary?
  • p:log improvements
    • href as AVT?
    • Create paths if they don’t exist
  • Relax the “well-formed XML” requirement to “any XDM”
    • Allow “any” XDM node(s) to flow through the pipeline
    • Only check for “well-formed XML” at serialization
  • Documentation issues
    • Tutorials
    • Examples
    • Spin the wiki back up on github
    • xprocbook.com
    • “The spec is conceptually difficult to understand”
      • Volunteers accepted
    • Point xproc.org at something managed on github
      • ACTION: Norm to set up xproc.org with Travis CI from a repository on Github
  • Could we have “instructions” that aren’t steps?
    • For x in 1 to 10?
    • Choose?
    • If?
  • Documents and sequences of documents flow through the pipeline; could we support the notion of “sets” (nodes+names) flowing through the pipeline?
    • Sets of documents are used in some publishing workflows
    • Output of a pipeline that is an entire website
  • A “resource manager”
    • Should there be some named local storage, within the context of a pipeline execution (or a set of pipeline executions)?
    • Probably addressed by URI, among other things
  • Define p:collection() for the inputs to steps
    • So that you can operate on a group of inputs
  • XProc 1.1 can be based on XDM (XPath 2.0, 3.x?)
  • Replace parameter inputs with maps
  • Can p:for-each/p:iteration-source and p:viewport/p:viewport-source be simplified
    • Remove redundancy?
      <p:for-each>
        <p:iteration-source select="//section"/>
        <p:variable name="class" select="/section/@class"/>
      </p:for-each>

      <p:for-each>
        <p:iteration-source select="//section" as="element()"/>
        <p:variable name="class" select="@class"/>
      </p:for-each>
  • How important is backwards compatibility?
  • What about XQuery/XSLT 3.x?
  • What about XSLT 3.x streaming?
  • What is the use case for binary?
    • Adding binary files to ZIPs?
  • Standard format for pipeline documentation
    • Like JavaDoc
    • Possible community project?
  • Standard way of performing unit tests on pipelines
    • Make the XProc test suite runner more stable and useful?
  • Collect additional tests
    • Create new test suite for 1.1
  • ACTION: Jim to review the old XProc issues list and make sure that any stragglers get caught
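
As an illustration of the attribute value template item in the list above (for example, an AVT in href on p:document or p:log), here is a minimal Python sketch of how such a value might be expanded. It is not taken from any XProc draft or implementation: the function name expand_avt is hypothetical, and the sketch assumes a deliberately reduced template syntax in which only simple variable references such as {$name} are recognized, with {{ and }} as escapes for literal braces. A real AVT would evaluate arbitrary XPath expressions inside the braces.

```python
import re

# Matches an escaped brace pair, or a {$variable} reference (assumed reduced syntax).
_AVT_TOKEN = re.compile(r"\{\{|\}\}|\{\$([A-Za-z_][\w.-]*)\}")


def expand_avt(template, variables):
    """Expand {$name} references in template using the variables mapping.

    Hypothetical helper: real AVTs allow full XPath expressions, not just
    variable references.
    """
    def substitute(match):
        token = match.group(0)
        if token == "{{":
            return "{"          # escaped opening brace
        if token == "}}":
            return "}"          # escaped closing brace
        return str(variables[match.group(1)])  # KeyError if the variable is unbound

    return _AVT_TOKEN.sub(substitute, template)


href = expand_avt("out/{$name}.xml", {"name": "chapter1"})
literal = expand_avt("{{literal}}", {})
```

With the bindings shown, href becomes "out/chapter1.xml" and the doubled braces collapse to single literal braces.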

XProc interop

  • See XML London paper re: interop issues
  • There’s a Calabash->Morgana pipeline
  • Different interpretations of the specs
    • Does p:store create the folder structure?
  • ACTION: Norm to create standard library for exproc steps
    • Does XML Calabash do something magic here?

Backwards compatibility

  • Compatibility is a goal, but not a requirement
  • Backwards incompatibilities must be clearly documented

Thinking about semantics: a dataflow VM / thoughts about streaming

  • Henry’s semantics work was about the 2.0 work. To the extent that we’ve refocused on 1.1, it’s less relevant
  • Henry presents background on Markup Technology pipelines
  • Henry writes on the blackboard (yes, an actual blackboard!)
    • Dimensions along which the components of pipelines vary
      • What flows?
        • Markup Technology: Infoset items, documents, sequences, sets
      • How do they flow?
        • What does a step look like?
          • A step is an operation on a pipeline
        • What does a step implementation look like?
          • There may be a variety of implementations depending on the other dimensions
          • Do you pull input or do you provide handlers?
          • Do you want to push or be called?
      • You start with a sequence of steps
        • Look at the set of implementations you have of those steps
        • “Compile” down to the set of implementations with appropriate shims
        • Compiling establishes where the thread boundaries have to be
  • ACTION: Henry to add his examples here.
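
Until Henry's examples are added, the following Python sketch may help picture the "pull input or provide handlers" dimension and the idea of compiling a sequence of steps down to implementations joined by shims. It is an illustration under assumptions, not a record of what was on the blackboard: upper_pull, ExclaimPush, and pull_into_push are hypothetical names, and plain strings stand in for whatever actually flows through a pipeline.

```python
from typing import Callable, Iterable, Iterator, List


def upper_pull(source: Iterable[str]) -> Iterator[str]:
    """Pull style: the step asks its input for the next item when it wants one."""
    for item in source:
        yield item.upper()


class ExclaimPush:
    """Push style: the step is called with each item and forwards its result
    to a downstream handler."""

    def __init__(self, downstream: Callable[[str], None]) -> None:
        self.downstream = downstream

    def handle(self, item: str) -> None:
        self.downstream(item + "!")


def pull_into_push(source: Iterable[str],
                   make_step: Callable[[Callable[[str], None]], ExclaimPush]) -> List[str]:
    """Shim: drain a pull-style source into a push-style step and collect
    whatever the step pushes downstream."""
    collected: List[str] = []
    step = make_step(collected.append)
    for item in source:
        step.handle(item)
    return collected


# "Compiling" the two-step sequence: a pull step feeding a push step via the shim.
result = pull_into_push(upper_pull(["a", "b"]), ExclaimPush)
```

In this sketch the shim runs both steps in one thread. A shim that had to connect two push-style or two pull-style implementations with incompatible control flow is where a thread boundary (or a buffer) would be needed.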

Lowering the barrier to entry

  • How to learn XProc? Cry. Or pray.
  • Update and finish xprocbook.com?
    • ACTION: Norm to move it to github, per xproc.org
  • Roger Costello’s tutorial
  • Examples on the xproc-dev mailing list
  • Data<2>Type tutorial is a good reference
  • Talk to Oxygen about adding support for Morgana XProc.
    • Common “shim” for XML Calabash and Morgana?
  • How can we increase the diversity in the XProc community?

Building a community / Keeping up the momentum: AKA “Then What?”

  • How do we translate excitement here into longer-term progress on language specifications?
  • The current W3C Community Group is very clearly focused on the 2.0 work: Data Pipelining Use Cases
    • Rename?
    • Start another one?
    • ACTION: Ari/Nic weigh in on the current community group
  • We have a github repo for specs/notes/impl/etc.
    • ACTION: Norm to turn the xproc repo into an org
  • We have a wiki on the github repo
  • Another workshop?
    • General sense of “yes”
    • Spec hackathon?
    • Co-located with XML Prague?
      • Proposal: Two days, in Leipzig 7/8 Feb 2017
      • ACTION: Norm to raise this again in late November
        • Publish actions and agendas
    • Could we have (semi-)regular meetings, perhaps hangouts?
      • Schedule when we have results/actions to discuss?
    • Continue to use xproc-dev mailing list for communications
    • ACTION: Norm to convert pain points into Github issues
      • Are “github projects” relevant?
      • Gitter for xproc11 repo?
      • ACTION: All to categorize issues as “addressed” in the spec; add links as appropriate.
        • ETA mid-November 2016
      • ACTION: Henry to review issues and find relevant working group minutes
    • Self-hosted “slack” for conversations
      • ACTION: Gerrit to send link

Step wish list

  • ODBC/SQL step
  • Working with RDF
  • p:validate-rdf-schema
  • p:validate-json-schema
  • Long running pipeline that listens to a port?
  • Interaction with the user on the command-line?
  • p:until (repeat until no change)
  • p:wait-for-change (to the result of an HTTP GET, for example)
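
The "repeat until no change" behaviour wished for in p:until can be pictured as a fixed-point loop. The Python sketch below is only an illustration of that idea: until_unchanged and collapse_double_spaces are hypothetical names, strings stand in for documents, and the max_iterations guard is an assumption (the minutes do not say how non-terminating loops should be handled).

```python
def until_unchanged(step, document, max_iterations=100):
    """Apply step repeatedly until its output equals its input.

    max_iterations is an assumed safety limit, not something the
    wish list specifies.
    """
    for _ in range(max_iterations):
        result = step(document)
        if result == document:
            return result       # fixed point reached: nothing changed
        document = result
    raise RuntimeError("no fixed point reached within max_iterations")


def collapse_double_spaces(text):
    """Example step: replace each pair of spaces with a single space."""
    return text.replace("  ", " ")


normalized = until_unchanged(collapse_double_spaces, "a    b")
```

A single pass of the example step leaves two spaces behind, so the loop needs more than one iteration before the text stops changing.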

Review of open action items

Which actions need to be done before another meeting?

  • New spec draft (ETA Oct)
  • Issues created and categorized (ETA Nov)

Technical discussion

Henry reviews some “code phrases” from the working group.

  • “Metadata pipe” (document properties)
    • Everyone’s favorite problem with character entities: they disappear after you read the document.
      • How do you get &uuml; back in the output?
    • The metadata pipe is a way of passing along information, for example, that you’d like entities.
    • Clearly generalizes to serialization options for p:xslt outputs
    • The media type also needs to be carried along
    • So does the base URI
    • This is all out-of-band information that you’d like to travel with the document.
    • Pipes should be two part: the data pipe and the metadata pipe.
    • In XProc 1.0, there are lots of things that don’t change if you don’t touch them (the step definitions tell you what is and isn’t changed).
    • It’s a little harder to see how that extends to the metadata: if you go through an XSLT step, then the serialization options may have changed.
    • Things that you do that change the base URI may not obviously change the base URI in the metadata pipe.
    • “Getting this right will take some work.”
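
One way to picture the two-part pipe is a document value that carries its out-of-band properties alongside its content. The Python sketch below is an assumption-laden illustration, not a proposed design: the Document class, the step functions, and the property names ("base-uri", "content-type", "prefer-entities") are all hypothetical. It shows the easy case (a step that leaves the metadata untouched) and the case where a step must explicitly update the metadata so that it does not drift out of sync with the data.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class Document:
    """Data pipe (content) plus metadata pipe (properties) travelling together."""
    content: str
    properties: Dict[str, str] = field(default_factory=dict)


def transform_step(doc: Document) -> Document:
    """Changes the content only; the metadata is carried along unchanged."""
    return replace(doc, content=doc.content.upper())


def set_base_uri_step(doc: Document, new_base_uri: str) -> Document:
    """Changes the base URI, so it must also update the metadata explicitly."""
    properties = dict(doc.properties)   # copy, so the input document is not modified
    properties["base-uri"] = new_base_uri
    return replace(doc, properties=properties)


source = Document(
    "<p>hello</p>",
    {
        "base-uri": "file:///in.xml",          # hypothetical property names
        "content-type": "application/xml",
        "prefer-entities": "true",
    },
)
transformed = transform_step(source)
moved = set_base_uri_step(transformed, "file:///out.xml")
```

The hard questions raised above are exactly the ones this sketch dodges: which steps are obliged to update which properties, and what happens to properties such as serialization options when a step like XSLT produces new documents.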

Any other business?

  • Thanks to Steven Pemberton, CWI, and W3C.nl for hosting us

Meeting adjourned