Machine readable definitions #21

Closed
foolip opened this issue May 29, 2020 · 31 comments

@foolip
Member

foolip commented May 29, 2020

This is the BiDi sibling of issue w3c/webdriver#1510, see that issue's description for the full background.

The solution for REST and BiDi likely won't be the same, and we might do one without the other.

For BiDi specifically, @bwalderman has already put together an openrpc.json proposal.

@foolip
Member Author

foolip commented May 29, 2020

@bwalderman was openrpc.json assembled by hand? How about the API Reference, is that generated from openrpc.json?

@jgraham
Member

jgraham commented May 29, 2020

All of these API definition formats seem to use JSON Schema for the actual definitions. I'm not convinced that we really care about the value add of the additional layers on top of that; from a skim it looks like the additional features are about service discovery and licensing, which I don't think we particularly care about. In particular I see the following as use cases for machine-readable definitions in the spec:

  • Reduce the spec-text boilerplate describing (de)serialization of messages
  • Make more of the spec constraints machine verifiable (e.g. ability to cross check that all errors are one of the accepted codes)
  • Give browser authors and client authors definitions they can use directly to implement the (de)serialization of messages
  • Provide better documentation of the expected message format compared to having to reverse engineer the browser steps

I see the following as non-goals:

  • Allowing generic RPC clients to connect to WebDriver endpoints and navigate them without specific understanding of the protocol semantics

So I don't think we want endpoints that produce schema documents to allow clients to introspect the API or anything; in practice all the WebDriver and CDP clients are providing significant value-add over the mechanical conversion of protocol endpoints into code, and in any case updates to the spec will be accompanied by updates to the published schema, so we don't also need to allow introspection.

Given that, I think we should just write JSON Schema directly and not try to adopt any of the higher-layer stuff like [Async|Open]API, which afaict is mostly addressing needs we don't have.
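
To make the "machine verifiable" use case concrete, here is a minimal TypeScript sketch of cross-checking error codes against one accepted list. Everything in it is an assumption for illustration: the three codes are borrowed from the classic WebDriver spec, and the `ErrorMessage` shape and the `isErrorCode` / `parseErrorMessage` helpers are hypothetical names, not something proposed in this thread.

```typescript
// Hypothetical subset of accepted error codes. The strings follow the classic
// WebDriver spec, but the list here is illustrative only.
const ERROR_CODES = ["invalid argument", "no such window", "unknown command"] as const;

// The union type is derived from the list, so the two cannot drift apart.
type ErrorCode = (typeof ERROR_CODES)[number];

// Assumed wire shape of an error message (not defined anywhere in this thread).
interface ErrorMessage {
  id: number;
  error: ErrorCode;
  message: string;
}

// Runtime check for untyped input, such as the result of JSON.parse.
function isErrorCode(value: unknown): value is ErrorCode {
  return typeof value === "string" && (ERROR_CODES as readonly string[]).indexOf(value) !== -1;
}

// Parses a raw message and rejects any error code that is not in the list.
function parseErrorMessage(raw: string): ErrorMessage {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new TypeError("error message must be an object");
  }
  const { id, error, message } = data as Record<string, unknown>;
  if (typeof id !== "number" || Math.floor(id) !== id) {
    throw new TypeError("id must be an integer");
  }
  if (!isErrorCode(error)) {
    throw new TypeError("unknown error code: " + String(error));
  }
  if (typeof message !== "string") {
    throw new TypeError("message must be a string");
  }
  return { id, error, message };
}
```

Because `ErrorCode` is derived from `ERROR_CODES`, adding a code in one place updates both the static type and the runtime check.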

@bwalderman
Contributor

@foolip yes, openrpc.json was hand-written and the API reference was generated from it using https://github.com/open-rpc/schema-utils-js and some HTML templates.

@foolip
Member Author

foolip commented May 29, 2020

I see. I guess it’s not worth the effort now to put that build step into CI, but if we have a schema file later that’d make sense.

@christian-bromann
Member

christian-bromann commented Jun 2, 2020

It seems that the tooling for the OpenRPC spec is quite limited compared to the OpenAPI tools out there. It doesn't seem to be that difficult either to put something together that can:

  • resolve the spec: create one large openrpc.json file based on many Yaml files
  • lint the spec: re-using validateOpenRPCDocument
  • generate an html file

While I don't think it makes sense to have spec text in the OpenRPC document, it could be valuable to have parts of the Bikeshed document generated from the OpenRPC doc.

@jgraham
Member

jgraham commented Jun 2, 2020

I've been looking at this some more. Even JSON schema seems like it's focused on something that's not quite perfect for the descriptive part of our needs (although I certainly think we are going to want to be able to generate JSON schema since that's probably the best tooling here). In particular it has quite a low-level focus on matching the on-the-wire representation of types.

It seems like CDP is using some custom pdl format that's more like a high-level description of the various commands and types, and using that to generate at least JSON, TypeScript definitions, and Go bindings. That being some bespoke format is obviously troubling, but it at least looks like it solves some of the problems we have.

For concreteness, let's assume we have a message format like

{
  id: <Integer>,
  method: <CommandName>,
  params: <CommandParams>
}

And we have some example command like one to enable a set of events for a specific set of browsing contexts

{
  id: <Integer>,
  method: "enable",
  params: {
    events: Array<EventName>,
    contexts: Optional<Array<ContextId>>
  }
}

Then we should ideally be able to express the following properties:

  • CommandName is a string that corresponds to a known command name
  • CommandParams is an object which is represented by a type/schema according to the value of CommandName.
  • "enable" is a valid value of CommandName, corresponding to a CommandParams type with the events and contexts key.
  • EventName is a string corresponding to a known name of an event
  • ContextId is a typedef for an integer that represents a browsing context id

This should scale to hundreds of commands, events and types without significantly violating DRY (e.g. by having to keep a list of strings that are valid command names separate from the list of commands themselves).
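
As one possible illustration of these properties (not a proposal made in this thread), the TypeScript sketch below keeps a single map from command name to parameter type and derives everything else from it. `CommandParamsMap`, `CommandMessage`, `summarize`, and the second `navigate` command are hypothetical names added for the example.

```typescript
// Aliases from the list above. ContextId is an integer typedef there;
// TypeScript has no integer type, so number is used.
type ContextId = number;
type EventName = string;

// Single source of truth: each key is a command name, each value its params type.
interface CommandParamsMap {
  enable: { events: EventName[]; contexts?: ContextId[] };
  // "navigate" is an invented second command to show how the map grows.
  navigate: { context: ContextId; url: string };
}

// CommandName is derived from the map, so no separate list of names is kept.
type CommandName = keyof CommandParamsMap;

// A message whose params type depends on the value of `method`.
type CommandMessage = {
  [M in CommandName]: { id: number; method: M; params: CommandParamsMap[M] };
}[CommandName];

// Narrowing on `method` gives correctly typed access to `params`.
function summarize(msg: CommandMessage): string {
  switch (msg.method) {
    case "enable":
      return "enable " + msg.params.events.join(",");
    case "navigate":
      return "navigate to " + msg.params.url;
  }
}
```

Adding a command means adding one entry to `CommandParamsMap`; the set of valid names and the message union follow from it, which is the DRY property asked for above.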

@christian-bromann
Member

It seems to me that OpenRPC can fulfil these requirements. Given the example above, an OpenRPC representation could look like this:

{
    "name": "Network.enable",
    "tags": [
        { "$ref": "#/components/tags/Command" },
        { "$ref": "#/components/tags/Network" }
    ],
    "summary": "Enable notifications for an event.",
    "paramStructure": "by-name",
    "params": [
        {
            "name": "events",
            "summary": "The name of the event to subscribe to. See Events for a full listing.",
            "required": true,
            "schema": {
                "type": "array",
                "items": { "$ref": "#/components/schemas/EventName" }
            }
        },
        {
            "name": "contexts",
            "summary": "A list of context ids to connect the events to",
            "required": false,
            "schema": {
                "type": "array",
                "items": { "$ref": "#/components/schemas/ContextId" }
            }
        }
    ],
    "result": { "$ref": "#/components/contentDescriptors/NullResult" }
}

Note that the method name would always be something like "<domain>.<method>", tagged with Command and the domain it belongs to, while events could be tagged with Event and their domain.
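
A small TypeScript sketch of how a client or doc generator might rely on that naming convention. The helper names `parseQualifiedName` and `groupByDomain` are hypothetical, and the "exactly one dot" rule is an assumption about the convention rather than something stated above.

```typescript
interface QualifiedName {
  domain: string;
  method: string;
}

// Splits "<domain>.<method>" and rejects anything that does not fit that shape.
function parseQualifiedName(name: string): QualifiedName {
  const dot = name.indexOf(".");
  // Assumed rule: exactly one separator with non-empty text on both sides.
  if (dot <= 0 || dot === name.length - 1 || name.indexOf(".", dot + 1) !== -1) {
    throw new Error("expected <domain>.<method>, got: " + name);
  }
  return { domain: name.slice(0, dot), method: name.slice(dot + 1) };
}

// Groups method names by domain, e.g. for building a per-domain reference page.
function groupByDomain(names: string[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const name of names) {
    const { domain, method } = parseQualifiedName(name);
    (groups[domain] = groups[domain] || []).push(method);
  }
  return groups;
}
```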

@foolip
Member Author

foolip commented Jun 4, 2020

For the purpose of making it easy to define commands/responses/errors/notifications I'd like to make a concrete proposal for a syntax similar to Web IDL, which is already familiar to many spec authors. Illustrated with a bunch of random examples:

domain Page {
  // like https://w3c.github.io/webdriver/#navigate-to
  command navigate {
    // the command parameters:
    required string url;
    optional string referrer;
  }; // no response parameters

  // events caused by but not a response to navigate:
  event navigationStart {
    timestamp startTime;
  };
  event navigationEnd {
    // just making things up...
    timestamp startTime;
    timestamp endTime;
  };

  // an event for when modal dialogs are opened
  enum ModalDialogType { "alert", "confirm", "prompt" };
  event modalDialogOpen {
    ModalDialogType type;
    string message;
  };

  // like https://w3c.github.io/webdriver/#accept-alert
  command acceptModalDialog {
    string promptText; // for "prompt" only
  };

  // like https://w3c.github.io/webdriver/#dismiss-alert
  command dismissModalDialog {};

  // like https://w3c.github.io/webdriver/#print-page
  enum PrintOrientation { "portrait", "landscape" };
  command printToPDF {
    optional PrintOrientation orientation = "portrait";
    // lots more
  } => {
    // this is a response parameter:
    bytes pdfData;
  };
};

I write this down not because I think it's urgent that we do something like this, but because #26 brought it to mind. A few observations:

  • This doesn't give a way to specify the error data/parameters
  • This doesn't provide all the information needed to produce an OpenRPC file

@jgraham
Member

jgraham commented Jun 23, 2020

CDDL is another IDL variant we could use here. It has the advantage that there's an RFC to point at and some existing tooling. It's also way easier to read/write by hand than JSON Schema. It's definitely not perfect, but might be better than inventing something entirely new.

@jgraham
Member

jgraham commented Jul 1, 2020

In direct response to @foolip's suggestion, I am wary of inventing something entirely new. We don't want to be sidetracked into specifying a schema format rather than specifying an actual protocol :) That said, the more I think about it the more opposed I am to writing JSON Schema directly; I think the format is too verbose and ugly, and doesn't really have the primitives we want, in the sense that it's very focused on on-the-wire values and doesn't provide the formalism for describing things as types.

One idea I had today is to define things as TypeScript interfaces. That has the advantage that there are several TypeScript-to-JSON-schema tools available, and it's also familiar to many web devs. The main problem is that afaik there isn't a standard to point at for the syntax, so we might have to handwave a bit. I would certainly expect us to have (generated) JSON Schema as an appendix, since that seems like it's going to be most useful for implementors.

@foolip
Member Author

foolip commented Jul 1, 2020

Taking a look at https://www.typescriptlang.org/docs/handbook/interfaces.html, that seems like a reasonable fit for describing things that are JSON objects on the wire, so the parameters primarily. I suspect what we'll run into very soon is that we need more types and perhaps subsets of existing types, but perhaps that's all supported in TypeScript.

@jgraham, with this approach, how do you see the name of the command itself and the domain (if we have those) being represented? Namespaces and functions, perhaps?

@foolip
Member Author

foolip commented Jul 1, 2020

I suggested in our meeting just now to first try to pin down the "model" of what our machine readable definitions are expressing. If we agree on that, the rest will "just" be syntax, which does matter for spec authoring ergonomics, but many alternatives that aren't too verbose could work.

With that said, I think the (nested) model is roughly:

  • Domains, which have:
    • A name
    • Commands, which have:
      • A name
      • Parameters
      • Return type (not strictly needed for the formalism to be useful for spec authors)
      • Error types (maybe, it's debatable whether it's useful to have this)
    • Events, which have:
      • A name
      • Parameters

An individual parameter is defined by its name, its type and its optionality.

Is this missing anything? Other than the return and error types, is anything else possible to cut?
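
For reference, the nested model above could be written down as plain data types. This is only a sketch: the `*Def` names, the string-valued `type` field, and the two helper functions are hypothetical, and the sample data is loosely based on the Page domain from the earlier Web IDL-like example.

```typescript
// One parameter: a name, a type, and whether it may be omitted.
interface ParameterDef {
  name: string;
  type: string; // name of a type defined elsewhere, e.g. "string"
  optional: boolean;
}

interface CommandDef {
  name: string;
  parameters: ParameterDef[];
  returns?: ParameterDef[]; // optional here, since the return type might be cut
  errors?: string[];        // optional here, since error types are debatable
}

interface EventDef {
  name: string;
  parameters: ParameterDef[];
}

interface DomainDef {
  name: string;
  commands: CommandDef[];
  events: EventDef[];
}

// Sample data loosely based on the Page domain sketched earlier in the thread.
const page: DomainDef = {
  name: "Page",
  commands: [
    {
      name: "navigate",
      parameters: [
        { name: "url", type: "string", optional: false },
        { name: "referrer", type: "string", optional: true },
      ],
    },
  ],
  events: [
    { name: "navigationStart", parameters: [{ name: "startTime", type: "timestamp", optional: false }] },
  ],
};

// Derives every "<domain>.<name>" string from the model (commands first, then events).
function qualifiedNames(domain: DomainDef): string[] {
  const commands = domain.commands.map((c) => domain.name + "." + c.name);
  const events = domain.events.map((e) => domain.name + "." + e.name);
  return commands.concat(events);
}

// Names of the parameters a caller must supply for a command.
function requiredParameters(command: CommandDef): string[] {
  return command.parameters.filter((p) => !p.optional).map((p) => p.name);
}
```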

@bwalderman
Contributor

@foolip that model looks good. I would say return types are important for us to be able to write tests.

@jgraham
Member

jgraham commented Jul 1, 2020

I think types, and the ability to define things as types, are important. For example we might have something like

type EventSelector = String;

enum TargetId {
    BrowsingContextId,
    RealmId
}

Enable extends Command {
    name: "enable",
    targets: Array<TargetId>,
    events: Array<EventSelector>
}

and then prose to define that e.g. EventSelector is a pattern that matches event names or something.

@jgraham
Member

jgraham commented Jul 1, 2020

For comparison what CDP uses looks pretty good, but it is custom and very tied to CDP concepts: https://github.com/ChromeDevTools/devtools-protocol/blob/master/pdl/browser_protocol.pdl

@foolip
Member Author

foolip commented Jul 1, 2020

From #21 (comment) it's clear I forgot one thing, which we discussed in today's meeting. Commands should probably list the targets they can be sent to, and events should list the targets they can be emitted from.

@foolip
Member Author

foolip commented Jul 1, 2020

A non-trivial amount of complexity, I expect, will be in defining types for parameter/return/error types. This is somewhat connected to #16, but I think at the very least we'll need:

  • strings
  • numbers (likely both float and int)
  • booleans
  • enums (likely with string values)
  • sequences (likely parameterized like Web IDL's sequence<T>)
  • dictionaries or interfaces, something to define objects

@jgraham also mentioned union types. An example of where we might end up using that would be helpful. I can't tell if CDP has that, but I'm not sure what keyword to search for :)
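
To get a feel for how small that set of types is, here is a hedged TypeScript sketch of a descriptor for each kind in the list, plus unions, with a matching validator. `TypeDesc`, `matches`, and `enableParams` are hypothetical names; the union example (a context given either as an integer id or as a string) is invented purely to show the shape of such a type.

```typescript
// Hypothetical descriptor for each kind of type in the list above, plus unions.
type TypeDesc =
  | { kind: "string" }
  | { kind: "number"; integer?: boolean }
  | { kind: "boolean" }
  | { kind: "enum"; values: string[] }
  | { kind: "sequence"; item: TypeDesc }
  | { kind: "dictionary"; fields: { [name: string]: { type: TypeDesc; optional?: boolean } } }
  | { kind: "union"; options: TypeDesc[] };

// Returns true when a decoded JSON value conforms to the descriptor.
function matches(value: unknown, desc: TypeDesc): boolean {
  switch (desc.kind) {
    case "string":
      return typeof value === "string";
    case "number":
      return typeof value === "number" && isFinite(value) &&
        (!desc.integer || Math.floor(value) === value);
    case "boolean":
      return typeof value === "boolean";
    case "enum":
      return typeof value === "string" && desc.values.indexOf(value) !== -1;
    case "sequence": {
      const item = desc.item;
      return Array.isArray(value) && value.every((v) => matches(v, item));
    }
    case "dictionary": {
      if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
      const obj = value as Record<string, unknown>;
      const fields = desc.fields;
      // Every declared field must be present (unless optional) and well typed.
      return Object.keys(fields).every((name) => {
        const field = fields[name];
        if (!(name in obj)) return field.optional === true;
        return matches(obj[name], field.type);
      });
    }
    case "union": {
      const options = desc.options;
      return options.some((option) => matches(value, option));
    }
  }
}

// Invented example: events are strings, contexts are integer ids or strings.
const enableParams: TypeDesc = {
  kind: "dictionary",
  fields: {
    events: { type: { kind: "sequence", item: { kind: "string" } } },
    contexts: {
      type: {
        kind: "sequence",
        item: { kind: "union", options: [{ kind: "number", integer: true }, { kind: "string" }] },
      },
      optional: true,
    },
  },
};
```

Note that this validator ignores unknown extra keys on dictionaries; whether those should be an error is a separate design question.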

@bwalderman
Contributor

bwalderman commented Jul 1, 2020

I'm thinking some more about how the machine readable definitions will be integrated into the spec prose. I'm assuming there will still be ordinary spec text describing each command's behavior, so it would make sense to keep the machine readable type definitions for a command near its spec text. At a minimum each command spec would need a machine readable definition for its parameter type (an object), and its return type (also an object). Events would just need a parameter type. Common types that are used in more than one place (e.g. browsing context or realm IDs) could be defined in a separate section.

As a concrete example:

Navigate To

The command causes a browsing context to navigate to a new location.

Parameters

interface NavigateParams {
    url: string
}

Returns

null

Remote End Steps

... Remote end steps go here...

The remote end steps assume the existence of a parameters variable that has already been deserialized and validated as a NavigateParams object by the command processing algorithm. The remote end steps return a value and the command processing algorithm is responsible for validating this object matches the command's stated return value type (in this case, null) and serializing that value to send back over the wire.
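
The division of labour described here can be sketched in TypeScript as follows. This is an illustration under assumptions, not spec text: `CommandSpec`, `processCommand`, the `navigations` array standing in for browser state, and the response shape with its error strings are all hypothetical.

```typescript
// Parameter type from the Navigate To example above.
interface NavigateParams {
  url: string;
}

// Hypothetical bundle of what the spec would define for one command.
interface CommandSpec<P, R> {
  isParams(value: unknown): value is P;
  isResult(value: unknown): value is R;
  remoteEndSteps(params: P): unknown;
}

// Stand-in for browser state, so the sketch has an observable effect.
const navigations: string[] = [];

const navigateTo: CommandSpec<NavigateParams, null> = {
  isParams(value: unknown): value is NavigateParams {
    return typeof value === "object" && value !== null &&
      typeof (value as { url?: unknown }).url === "string";
  },
  isResult(value: unknown): value is null {
    return value === null;
  },
  remoteEndSteps(params: NavigateParams): unknown {
    navigations.push(params.url); // a real remote end would navigate here
    return null;
  },
};

// Generic command processing: validate params, run the steps, validate the
// result, serialize. The response shape and error strings are assumptions.
function processCommand<P, R>(spec: CommandSpec<P, R>, id: number, params: unknown): string {
  if (!spec.isParams(params)) {
    return JSON.stringify({ id: id, error: "invalid argument" });
  }
  const result = spec.remoteEndSteps(params);
  if (!spec.isResult(result)) {
    return JSON.stringify({ id: id, error: "unknown error" });
  }
  return JSON.stringify({ id: id, result: result });
}
```

The point of the split is that `remoteEndSteps` only ever sees an already validated `NavigateParams`, so the per-command prose does not need to repeat any type checks.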

@foolip
Member Author

foolip commented Jul 1, 2020

@bwalderman something along those lines is precisely what I've been envisioning, where the machine readable bits can be split into many small code blocks, and one only needs to define remote end steps which can use the parameters with the correct types directly.

@jgraham
Member

jgraham commented Jul 1, 2020

I also agree that's how we want the spec to look in the end.

@bwalderman
Contributor

TypeScript and Web IDL seem like the best options since they are both easy for humans to read/write and have readily available tooling. Also, both have expressive enough type systems to cover our scenario and look more or less the same for the subset of functionality we'll likely be using.

I'm leaning towards Web IDL. The standard provides some useful algorithms such as default steps for converting an IDL value to JSON and [checking if an object implements an interface]. These will come in handy for specifying how commands/events are serialized/deserialized over the wire.

@jgraham
Member

jgraham commented Jul 2, 2020

I don't see how Web IDL as such would work. There's a big assumption in Web IDL that you're making DOM APIs and a lot of the tooling around the platform assumes that too. We could do something WebIDLish, but it's not going to be exactly the same.

Regarding sum types, if I was modelling this in a language supporting that I might start from

enum Message {
    CommandMsg(Command),
    ResponseMsg(Response),
    EventMsg(Event),
    ErrorMsg(Error)
}

struct Command {
    id: uint,
    data: CommandData
}

enum CommandData {
    Enable(EnableCommand),
    Navigate(NavigateCommand),
    […]
}

typedef RealmId = String
typedef ContextId = String

enum TargetId {
    Context(ContextId),
    Realm(RealmId)
}

struct EnableCommand {
    targets: Array<TargetId>,
    commands: Array<String>
}

struct NavigateCommand {
    context: ContextId,
    url: String
}

It's not the only way to do it of course, but being able to say things like "a target id is either a realm id or a context id, serialized in a way that allows discriminating the two" seems useful.
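
For comparison, the same "either a realm id or a context id" idea can be expressed in TypeScript as a discriminated union. This is a sketch only: the `type` tag and the field names used on the wire are assumptions, as are the helper names `parseTargetId` and `describeTarget`.

```typescript
type ContextId = string;
type RealmId = string;

// Both ids are strings, so an explicit tag is needed to tell them apart on the wire.
type TargetId =
  | { type: "context"; context: ContextId }
  | { type: "realm"; realm: RealmId };

// Deserializes a decoded JSON value into a TargetId, or throws.
function parseTargetId(value: unknown): TargetId {
  if (typeof value !== "object" || value === null) {
    throw new TypeError("target id must be an object");
  }
  const { type: tag, context, realm } = value as Record<string, unknown>;
  if (tag === "context" && typeof context === "string") {
    return { type: "context", context: context };
  }
  if (tag === "realm" && typeof realm === "string") {
    return { type: "realm", realm: realm };
  }
  throw new TypeError("unrecognised target id");
}

// Switching on the tag narrows the type, so each branch sees only its own field.
function describeTarget(target: TargetId): string {
  switch (target.type) {
    case "context":
      return "browsing context " + target.context;
    case "realm":
      return "realm " + target.realm;
  }
}
```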

@bwalderman
Contributor

Web IDL is theoretically language-agnostic. In practice, half the spec is dedicated to the ECMAScript binding and there are no other bindings mentioned, so yeah I agree there's a big assumption that this is for DOM APIs today. However, we might be able to add a "WebDriver" binding to that specification and fill in any gaps we need.

While TypeScript interfaces are more than suitable for our needs, I'm not sure how we'd make use of it without some "handwaving" as you pointed out. From what I can tell, in the TypeScript specification, there's no straightforward algorithm we can point to that says "steps for checking if an object implements an interface". We also won't have as many (any??) options to change that spec if needed because at the end of the day, it's a programming language and not an interface definition language. We're not their target audience.

Having said that, I'm not opposed to using TypeScript as our IDL if we can avoid relying too heavily on the TypeScript spec, and avoid having to (re-)?invent complex algorithms for validating an object against a TypeScript interface. In other words, if we can simply say things like "if params does not implement TypeScript interface X return an error and terminate these steps". Another thing to keep in mind is that TypeScript is in active development so being explicit about which version we'd be using is important.

@bwalderman
Contributor

@jgraham, since the example language above doesn't correspond exactly to the wire representation (i.e. JSON), do you expect it would be accompanied by spec text explaining how to (de)serialize it?

@jgraham
Member

jgraham commented Jul 22, 2020

For some context on #44 I chatted with @bwalderman about formats and we came to the conclusion that although there's nothing perfect available, CDDL is probably the best available option. WebIDL doesn't really match the use case of defining a wire protocol. JSON schema is pretty verbose to write, and could be problematic if we ever have a binary form in the future. Doing something custom like PDL or something that looks like TypeScript would probably give the best outcome, but in practice the amount of work required to specify the syntax is itself going to be large. CDDL gives us a fairly compact representation that's already seen usage in W3C specs defining protocols, and some degree of future compatibility if we ever add a CBOR transport. It's not perfect, but it doesn't seem worth blocking for longer on deciding something here.

@foolip
Member Author

foolip commented Aug 13, 2020

Thanks for picking something pragmatic and getting it done!

I think we could close this, or keep it open to track a few final bits, namely markup conventions that would make it possible to get the domain and command name and to group the parameter and return type definitions.

Also, I'm curious if you found good JS or Python libraries for parsing CDDL while working on this?

@bwalderman
Contributor

For libraries, the RFC mentions a Ruby gem. The source code for that is here: https://github.com/cabo/cddlc. There's also a Rust library, which seems to be more actively maintained and documented, at https://github.com/anweiss/cddl. I didn't find any native JS or Python libraries.

@jgraham
Member

jgraham commented Aug 13, 2020

I've been using the Rust library locally and it seems reasonable (there's a CLI to validate that specific JSON matches the proposed schema, which was useful for debugging). I could imagine writing Python bindings for it if we want to use it from Bikeshed or similar (related: I have some changes in the works for how we structure the schema so the spec ends up with something that could be extracted into a complete schema for endpoints to use, but things are being held up right now so no PR yet).

@anweiss

anweiss commented Oct 19, 2020

Hey all! I'm new to the WebDriver BiDi effort, but happy to help on the CDDL front. I'm the maintainer of https://github.com/anweiss/cddl, so let me know if there's anything I can do to improve the library and tooling for your use case.

@gsnedders
Member

What if anything is still needed here?

@jgraham
Member

jgraham commented Apr 13, 2021

I think we can close this and open more specific issues for the remaining problems.

@jgraham jgraham closed this as completed Apr 13, 2021