
Protocol design #20

Closed
zcorpan opened this issue Apr 26, 2022 · 9 comments

Comments

@zcorpan
Member

zcorpan commented Apr 26, 2022

In yesterday's meeting (see minutes) we discussed the initial draft (preview link) and the direction for the protocol. My proposal is to reuse concepts and design from WebDriver BiDi as a starting point. (BiDi is short for bidirectional.)

There was no opposition to this in the call, but there was a request to document it in an issue and to ask for input from parties not present in the call; hence this issue.

Here are a few points to summarize the proposal:

  • Use a WebSocket connection as the communication channel between the local end (the client, the testing API) and the remote end (the server, driving and listening to the screen reader).
  • Messages are JSON encoded.
  • The structure of the JSON messages is specified in CDDL (but could be something else, e.g. OpenAPI or English prose).
  • No need for HTTP messaging support. WebDriver BiDi supports BiDi-only sessions, which would be our only supported mechanism.
  • Commands are grouped into modules. The modules could be: Sessions, Settings, Actions.
  • For commands, the local end sends a message and assigns a command id (also encoded in the message). The remote end sends a message back with the result and the command id, so the local end knows which command the message applies to.
  • The remote end can send events (see events in WebDriver BiDi) at any time; events are not tied to a command.
  • The security considerations should be tracked throughout as per the roadmap.
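The command/response/event flow in the list above can be sketched from the local end's side. This is a minimal Python sketch, not a proposed API: the `LocalEnd` class, the message members (`id`, `method`, `params`, `result`), and the method names `session.new`, `settings.get`, and `actions.speech` are all hypothetical, loosely modeled on WebDriver BiDi, and a plain list stands in for the WebSocket connection.

```python
import itertools
import json
from typing import Any, Callable, Optional


class LocalEnd:
    """Hypothetical local end that sends commands and routes incoming messages.

    Assumed message shapes (loosely modeled on WebDriver BiDi, not normative):
      command:  {"id": <uint>, "method": <str>, "params": {...}}
      response: {"id": <uint>, "result": {...}}
      event:    {"method": <str>, "params": {...}}   (no "id" member)
    """

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send                   # e.g. a WebSocket's send function
        self._ids = itertools.count(1)      # fresh id for every command
        self.pending: dict[int, str] = {}   # id -> method, still awaiting a response
        self.results: dict[int, Any] = {}   # id -> result, once answered
        self.events: list[dict] = []        # events, in the order received

    def command(self, method: str, params: Optional[dict] = None) -> int:
        """Encode a command as JSON, send it, and return its id."""
        cmd_id = next(self._ids)
        self.pending[cmd_id] = method
        self._send(json.dumps({"id": cmd_id, "method": method, "params": params or {}}))
        return cmd_id

    def receive(self, raw: str) -> None:
        """Handle one JSON message from the remote end."""
        msg = json.loads(raw)
        if "id" in msg:
            # A response: its id says which pending command it answers.
            if msg["id"] not in self.pending:
                raise ValueError(f"response for unknown command id {msg['id']}")
            del self.pending[msg["id"]]
            self.results[msg["id"]] = msg.get("result")
        else:
            # No id: an event the remote end sent on its own initiative.
            self.events.append(msg)


# Usage with a list standing in for the WebSocket; method names are hypothetical.
sent: list = []
client = LocalEnd(sent.append)
first = client.command("session.new")
second = client.command("settings.get", {"names": ["speechRate"]})
client.receive('{"method": "actions.speech", "params": {"text": "Link, Home"}}')
client.receive(json.dumps({"id": second, "result": {"speechRate": 50}}))
```

Because each response carries its command id, responses can arrive in any order (here the second command is answered first while the first is still pending), and any message without an id is treated as an event.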

The protocol is only one part of the overall system we envision. A client API (similar to Selenium or Playwright) is needed, as is a design for the remote end. These could be handled in a design document; they are not in scope for the protocol spec.

We'll continue to iterate on the draft in this direction, but welcome feedback!

cc @mfairchild365 @cookiecrook @jscholes @aleventhal @ggordon-vispero @feerrenrut

@ggordon-vispero

This looks really good to me, both the issue description and the specification document.

@jkva
Collaborator

jkva commented May 17, 2022

Hi all,

Since the proposed service architecture explicitly mentions not extending WebDriver or a particular WebDriver service, and WebDriver is primarily used as architectural inspiration, I would propose using an alternative to CDDL, namely JSON Schema.

The current proposal explicitly mentions the messages being JSON encoded, which I take to mean that no binary encoding such as CBOR will be used. CDDL is primarily a data definition language for CBOR, although it can be used for JSON as well. CBOR has wide support across languages, yet fewer CDDL libraries are available, and some seem not to have been maintained in a while.

As the service protocol itself looks like it will consist of a limited set of commands and queries, writing the specification in JSON Schema, against which messages can be validated, would likely make implementation easier: JSON Schema is widely implemented across languages, and the protocol already specifies the use of JSON.

So this comment is mainly about adoption and the availability of CDDL validation libraries. While writing a CDDL validator does not seem terribly complex, using "plain" JSON and JSON Schema might ease service implementation.

The only downside I can see is that JSON Schema is an IETF Draft, while CDDL is an IETF Proposed Standard.

Let me know if I've overlooked something that was perhaps previously discussed; I was not part of that meeting. Thanks!

cc @jscholes

@zcorpan
Member Author

zcorpan commented May 31, 2022

I've now looked at JSON Schema and CDDL.

Here are the things WebDriver BiDi uses. I assume our needs are unlikely to exceed those of WebDriver BiDi.

  • Types
    • any
    • integer
    • number (integer or float)
    • string
    • boolean
    • null
  • Features
    • type choice
    • allow only a specific value
    • map (or object)
    • array
    • occurrence: optional, zero or more
    • definitions (specify and reference custom types)

From what I can tell, these are covered by both CDDL and JSON Schema.
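As a rough check that the feature list above is small, here is a minimal Python sketch of a validator for just the corresponding JSON Schema keywords (`type`, `const`, `oneOf`, `items`, `properties`, `required`, `additionalProperties: false`, `minimum`, and local `$ref`s into `definitions`). It is a simplified illustration, not a conforming JSON Schema implementation (for example, because `True == 1` in Python, it does not distinguish `true` from `1` in `const`), and the `Command` schema and its method names are hypothetical.

```python
from typing import Any

# Type checks for the "type" keyword. Python's bool is a subclass of int,
# so booleans are excluded from "integer" and "number" explicitly.
TYPE_CHECKS = {
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
}


def is_valid(value: Any, schema: dict, root: dict) -> bool:
    """Check value against a small JSON Schema subset (simplified, not conforming)."""
    if "$ref" in schema:  # definitions: only local "#/definitions/<Name>" refs
        name = schema["$ref"].removeprefix("#/definitions/")
        return is_valid(value, root["definitions"][name], root)
    if "type" in schema and not TYPE_CHECKS[schema["type"]](value):
        return False
    if "const" in schema and value != schema["const"]:  # only a specific value
        return False
    if "minimum" in schema and TYPE_CHECKS["number"](value) and value < schema["minimum"]:
        return False
    if "oneOf" in schema:  # type choice: exactly one alternative must match
        if sum(is_valid(value, s, root) for s in schema["oneOf"]) != 1:
            return False
    if isinstance(value, list) and "items" in schema:  # array, zero or more items
        if not all(is_valid(item, schema["items"], root) for item in value):
            return False
    if isinstance(value, dict):  # map (object)
        if any(key not in value for key in schema.get("required", [])):
            return False  # members not listed in "required" are optional
        props = schema.get("properties", {})
        for key, sub in props.items():
            if key in value and not is_valid(value[key], sub, root):
                return False
        if schema.get("additionalProperties", True) is False and set(value) - set(props):
            return False
    return True  # an empty schema {} accepts anything ("any")


# Hypothetical Command schema built from the features above.
SCHEMA = {
    "$ref": "#/definitions/Command",
    "definitions": {
        "Command": {
            "type": "object",
            "required": ["id", "method", "params"],
            "properties": {
                "id": {"type": "integer", "minimum": 0},
                "method": {"oneOf": [{"const": "session.new"}, {"const": "settings.get"}]},
                "params": {"$ref": "#/definitions/Params"},
            },
        },
        "Params": {"type": "object"},
    },
}
```

For example, `is_valid({"id": 1, "method": "session.new", "params": {}}, SCHEMA, SCHEMA)` accepts the message, while a negative `id` or an unlisted method is rejected.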

I considered implementing a translation from CDDL to JSON Schema as part of the spec generation (so that both would be available). But I did not find any off-the-shelf tools that do this, implementing it seems like too much of a sidetrack, and it might mean that contributors need to understand both CDDL and JSON Schema. So I think we shouldn't do this.

When looking for specifications that use JSON Schema, I searched in w3c/webref and found https://w3c.github.io/manifest/ and https://w3c.github.io/miniapp-manifest/, which normatively define the JSON structure in terms of the Infra Standard and also provide a non-normative JSON Schema in an appendix. I think this is an option worth considering.

Is there a preference between Infra Standard based specification (similar to Web Application Manifest) and normative JSON Schema?

@mzgoddard

@zcorpan Reading your comment, it seems to me that we should restate that we want to specify a normative description of the protocol. In addition, we will produce non-normative content. A complete schema based on the specification would be one such non-normative appendix.

BiDi's specification uses CDDL in normative sections, but software is not built by reading those normative sections directly. Software might instead read a non-normative section containing a complete CDDL file or a complete JSON Schema file.

Given that, I don't think JSON Schema lends itself to use in specification normative sections. JSON Schema does lend itself for non-normative sections.

Does that seem about right?

@zcorpan
Member Author

zcorpan commented Jun 1, 2022

Yeah, although it would be possible to use a schema language to specify some of the normative requirements.

I think a benefit of using English prose and the Infra Standard is that readers don't have to learn the syntax and rules of a schema language to understand the requirements.

@s3ththompson
Member

I'd like to try to summarize the conversation in the thread so far.

We are discussing the best format to specify the structure of the JSON messages in the protocol. The two best options so far are CDDL and JSON Schema.

In order to compare apples to apples, @mzgoddard translated the Remote End Definition from WebDriver BiDi as an example in both formats:

CDDL

Command = {
  id: uint,
  CommandData,
  *text => any,
}

CommandData = (
  SessionCommand //
  BrowsingContextCommand
)

EmptyParams = { *text => any }

  • Pros: CDDL is more compact and human-readable (and therefore easier to author by hand), especially when referencing custom types. It is also used by other W3C specifications, including WebDriver BiDi, our precedent.
  • Cons: CDDL has a smaller ecosystem of tooling than JSON Schema, with parsers in Rust and Node.js (maintained by Sauce Labs). WebDriver BiDi recently switched to the Rust parser because it was better maintained and allowed validation.

JSON Schema

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$ref": "#/definitions/Command",
  "definitions": {
    "Command": {
      "type": "object",
      "properties": { "id": { "type": "integer", "minimum": 0 } },
      "required": ["id"],
      "allOf": [{ "$ref": "#/definitions/CommandData" }],
      "additionalProperties": true
    },
    "CommandData": {
      "oneOf": [
        { "$ref": "#/definitions/SessionCommand" },
        { "$ref": "#/definitions/BrowsingContextCommand" }
      ]
    },
    "EmptyParams": { "type": "object", "additionalProperties": true }
  }
}
  • Pros: JSON Schema enjoys broad tooling support across all major programming languages. Parsers likely exist in all programming languages used by ATs, including Python and C/C++.
  • Cons: JSON Schema is more verbose and less human-readable (and therefore harder to write and review by hand), since referencing custom types or patterns often requires wrapping information in an extra object. It is used by the W3C Web Application Manifest and MiniApp Manifest specifications, but only as a non-normative appendix (the Web Application Manifest JSON Schema is unofficial and maintained by a third party).

Other Thoughts

@zcorpan brought up the Infra Standard (used by the Web Application Manifest specification) as a way to describe message schemas in natural language, which requires neither CDDL nor JSON Schema. I don't think this is necessarily preferable for anyone, but it is an alternative.

Finally, @mzgoddard made the distinction above between the normative and non-normative parts of the spec. The specification needs to unambiguously describe the shape of the JSON messages that are part of the AT Automation protocol. This is the normative part. However, these descriptions (in any schema language) are just code snippets, supported by unambiguous natural language that describes when, how, and why certain messages are sent and received. The schema snippets alone are not enough to programmatically generate a valid implementation (although they can be used to programmatically validate individual messages).

If I understand correctly, @mzgoddard is saying that if we want a machine-readable spec, we should really be discussing writing and maintaining a non-normative appendix, which would be a complete, self-contained file that contains all of the code necessary to describe the full protocol. This is how other specifications handle the separate use-cases of unambiguous, human-readable spec (the normative parts) and parseable machine-readable code for implementation (a non-normative appendix). I think this distinction is useful to keep in mind. (In fact, the WebDriver BiDi group opened an issue about creating an index of all of the CDDL snippets in the appendix.)
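The idea of collecting every normative snippet into one machine-readable appendix can be sketched with Python's standard `html.parser`. This assumes, for illustration only, that each normative CDDL snippet is marked up as a `<pre>` element whose class list includes `cddl`; the `CddlCollector` class, the `build_index` function, and the sample markup are hypothetical, and a real spec build tool may handle this differently.

```python
from html.parser import HTMLParser


class CddlCollector(HTMLParser):
    """Collect the text of every <pre> element whose class list includes "cddl"."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&gt;" arrives as ">"
        self.snippets: list = []
        self._inside = False
        self._buf: list = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "pre" and "cddl" in classes:
            self._inside = True
            self._buf = []

    def handle_data(self, data):
        if self._inside:
            self._buf.append(data)  # text inside nested links is kept, markup dropped

    def handle_endtag(self, tag):
        if tag == "pre" and self._inside:
            self.snippets.append("".join(self._buf).strip())
            self._inside = False


def build_index(spec_html: str) -> str:
    """Concatenate all CDDL snippets into one file-like string."""
    parser = CddlCollector()
    parser.feed(spec_html)
    parser.close()
    return "\n\n".join(parser.snippets) + "\n"


# Hypothetical spec markup: two CDDL snippets and one unrelated <pre>.
SPEC = """
<p>Normative prose.</p>
<pre class="cddl remote-cddl">
Command = {
  id: uint,
  CommandData,
  *text =&gt; any,
}
</pre>
<pre class="idl">not CDDL, so not collected</pre>
<pre class="cddl">EmptyParams = { *text =&gt; any }</pre>
"""

index = build_index(SPEC)
```

Generating the appendix from the normative snippets, rather than maintaining it by hand, would keep a single source of truth for the message definitions.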

@zcorpan
Member Author

zcorpan commented Jun 7, 2022

Thank you, @s3ththompson.

I think we should use CDDL for the normative part because it's easier to read and understand for humans (maybe even more so for AT users). Implementations can still use JSON Schema instead, if they so choose, or even write code to validate messages without any schema.

We can also still include a JSON Schema in a non-normative appendix. The conversion can be done manually.
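The option of validating messages without any schema can be sketched as a hand-written check of a decoded command. This minimal Python sketch assumes, as in WebDriver BiDi, that `CommandData` contributes `method` and `params` members; the `check_command` function and the method names in `KNOWN_METHODS` are hypothetical.

```python
from typing import Any

# Hypothetical set of method names; a real protocol would enumerate its own.
KNOWN_METHODS = {"session.new", "settings.get"}


def check_command(msg: Any) -> list:
    """Return the problems found in a decoded command; [] means it looks valid.

    Hand-written equivalent of an assumed CDDL rule such as:
      Command = { id: uint, method: text, params: { *text => any }, *text => any }
    """
    if not isinstance(msg, dict):
        return ["command must be a JSON object"]
    errors = []
    cmd_id = msg.get("id")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(cmd_id, bool) or not isinstance(cmd_id, int) or cmd_id < 0:
        errors.append("id must be an unsigned integer")
    method = msg.get("method")
    if not isinstance(method, str) or method not in KNOWN_METHODS:
        errors.append(f"unknown method: {method!r}")
    if not isinstance(msg.get("params"), dict):
        errors.append("params must be a JSON object")
    # Other members are accepted, matching the "*text => any" extension point.
    return errors
```

Returning a list of problems rather than a boolean is one possible design choice; it could let a remote end report every problem with a message at once.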

@lolaodelola

@jugglinmike This seems like it might be an outdated conversation considering where we are in the protocol design and spec writing process, is it ok to close this issue?

@jugglinmike
Contributor

@lolaodelola Indeed it is. Thanks!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants