diff --git a/README.md b/README.md new file mode 100644 index 00000000..dca278df --- /dev/null +++ b/README.md @@ -0,0 +1,225 @@ +# Explainer: Trusted Types for DOM Manipulation + +## The Problem + +As described in Christoph Kern's "[Securing the Tangled Web](https://research.google.com/pubs/pub42934.html)", +Google has been fairly successful at combating DOM-based XSS attacks by relying on a set of +[typed objects](https://github.com/google/safe-html-types/blob/master/doc/index.md) instead of +strings to represent HTML snippets, URLs, etc. Compile-time analysis ensures that only these +types can be assigned to various DOM APIs that can be used as DOM-based XSS sinks (`el.innerHTML`, +`location.href`, and so on). These types do not mitigate XSS in themselves, but instead aim for a +state where security reviewers don't need to deeply understand and review each and every usage of +a given sink, but can instead focus their efforts on the code that generates the typed objects. As +long as these "trusted" types are always generated by safe templating libraries, sanitizers, +constants, and so on, developers can have a high degree of confidence that the risk of DOM-based +XSS remains low. + +Google's internal implementation has a number of bells and whistles (and makes a number of +assumptions about requirements) that probably aren't suitable for the world at large. It would be +interesting to explore how we might extract some more generic version of this concept from those +internal tools in order to bring this kind of functionality to the web in a generic fashion. +For example, different applications might have different opinions about what makes a particular +HTML snippet "safe", but regardless of the definition, it seems clear that the browser is +well-positioned to enforce type constraints dynamically at runtime. That would be a substantial +improvement over the tight coupling between the type system and the compiler that the internal +approach relies on.
+ +## A Possible Approach + +While we could jam all sorts of sanitization functionality into such a system, it seems reasonable +to start small until we know how existing templating systems and sanitizers will layer any +primitives we introduce into their existing systems. The following three-pronged approach seems +compelling as a first step: + +1. Introduce a number of types that correspond to the XSS sinks we wish to protect. For example, + we could define a `TrustedHTML` object that would automatically escape interesting characters, + making it suitable for injection via `innerHTML`. + + These types should be pretty minimal in nature, making them polyfillable in browsers that don't + support them natively. + +2. Enumerate all the XSS sinks we wish to protect, and overload each of them with a variant that + accepts a safe type. For example, `Element.innerHTML`'s setter could accept `(DOMString or TrustedHTML)`, + and we could overload `document.write(DOMString)` with `document.write(TrustedHTML)`. + + As above, this mechanism should be polyfillable; the polyfilled types could define stringifiers + which would enable them to be automatically cast into strings when called on existing setters. + +3. Introduce a mechanism for disabling the raw string version of each of the sinks identified + above. For example, something like a theoretical `Content-Security-Policy: require-trusted-types` + header could cause the `innerHTML` setter to throw a `TypeError` if a raw string was passed in. + + This is a little more difficult to polyfill, but should be possible for many (all?) setters and + methods that aren't marked as [`[Unforgeable]`](https://heycam.github.io/webidl/#Unforgeable). + +### Trusted Types + +* **TrustedHTML**: This type would be used to represent a trusted snippet that could be passed + into an HTML context. 
+ + ``` + interface TrustedHTML { + static TrustedHTML escape(DOMString html); + static TrustedHTML unsafelyCreate(DOMString html); + + stringifier; + }; + ``` + + * The static `escape` method would produce a `TrustedHTML` object that neutered the string + provided by entity-encoding all instances of `&`, `<`, `>`, `"`, and `'`. + + * The static `unsafelyCreate` method would produce a `TrustedHTML` object that accepted the + provided string as-is. + +* **TrustedURL**: This type would be used to represent a trusted URL that could be used to load + resources or navigate a frame. + + ``` + interface TrustedURL { + static TrustedURL sanitize(DOMString url); + static TrustedURL unsafelyCreate(DOMString url); + + stringifier; + }; + ``` + + * The static `sanitize` method would produce a `TrustedURL` object that would resolve the + given string against the document's base URL, ensure that the result was a valid URL, and + ensure that it had an `http` or `https` scheme (blocking things like `javascript:` or external + protocol handlers). Strings that didn't make the cut would be replaced with `about:invalid`. + + * The static `unsafelyCreate` method would produce a `TrustedURL` object that accepted the + provided string as-is, producing a URL by resolving the given string against the document's + base URL.
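To make the two interfaces above more concrete, here is a minimal polyfill-style sketch in JavaScript. It is one possible reading of the descriptions, not a definitive implementation. The `baseURL` parameter is an assumption: it stands in for the document's base URL so that the sketch can run outside a browser. The `ENTITIES` table and the choice of `&#39;` for `'` are likewise illustrative.

```javascript
// Hypothetical polyfill sketch of TrustedHTML and TrustedURL.
// Assumption: `baseURL` replaces "the document's base URL" from the explainer.

// Entity encodings for the five characters named in the `escape` description.
const ENTITIES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

class TrustedHTML {
  #value;
  constructor(value) {
    this.#value = String(value);
  }
  // Entity-encode every `&`, `<`, `>`, `"`, and `'` in the input.
  static escape(html) {
    return new TrustedHTML(String(html).replace(/[&<>"']/g, (ch) => ENTITIES[ch]));
  }
  // Accept the string as-is; the caller vouches for it.
  static unsafelyCreate(html) {
    return new TrustedHTML(html);
  }
  // Plays the role of the IDL `stringifier`, so existing string setters still work.
  toString() {
    return this.#value;
  }
}

class TrustedURL {
  #value;
  constructor(value) {
    this.#value = String(value);
  }
  // Resolve against the base URL; anything unparsable or not http(s)
  // becomes `about:invalid`.
  static sanitize(url, baseURL) {
    let parsed;
    try {
      parsed = new URL(url, baseURL);
    } catch {
      return new TrustedURL("about:invalid");
    }
    const allowed = parsed.protocol === "http:" || parsed.protocol === "https:";
    return new TrustedURL(allowed ? parsed.href : "about:invalid");
  }
  // Resolve against the base URL without any scheme check.
  // In this sketch an unparsable input throws a TypeError from `new URL`.
  static unsafelyCreate(url, baseURL) {
    return new TrustedURL(new URL(url, baseURL).href);
  }
  toString() {
    return this.#value;
  }
}
```

Because both classes define `toString()`, an assignment such as `el.innerHTML = TrustedHTML.escape(userInput)` would be cast to a string by today's setters, which is the polyfill path described in step 2 of the approach. Note that the constructors here are public, so this sketch does not by itself prevent code from minting "trusted" values directly.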
+ +* **TrustedTODO**: TODO(koto@) + +### DOM Sinks + +* **HTML Contexts**: Given something like `typedef (DOMString or TrustedHTML) HTMLString`, we'd + poke at a number of methods and attribute setters to accept the new type: + + ``` + partial interface Element { + attribute HTMLString innerHTML; + attribute HTMLString outerHTML; + void insertAdjacentHTML(DOMString position, HTMLString text); + }; + ``` + + ``` + partial interface Document { + void write(HTMLString text); + void writeln(HTMLString text); + }; + ``` + + ``` + partial interface DOMParser { + Document parseFromString(HTMLString str, SupportedType type); + }; + ``` + + ``` + partial interface Range { + DocumentFragment createContextualFragment(HTMLString fragment); + }; + ``` + + ``` + partial interface HTMLIFrameElement { + attribute HTMLString srcdoc; + }; + ``` + +* **URL Contexts**: Given something like `typedef (USVString or TrustedURL) URLString`, we'd poke + at a number of methods and attribute setters to accept the new type: + + ``` + partial interface Location { + stringifier attribute URLString href; + void assign(URLString url); + void replace(URLString url); + + // (These aren't `URLString`, but they should be something) + attribute DOMString pathname; + attribute DOMString search; + }; + ``` + + ``` + // A few element types go here: `HTMLBaseElement`, `HTMLLinkElement`, and + // `HTMLHyperlinkElementUtils`, from a quick skim through HTML. + partial interface HTMLXXXElement : HTMLElement { + attribute URLString href; + }; + ``` + + ``` + // A few element types go here: `HTMLSourceElement`, `HTMLImageElement`, + // `HTMLIFrameElement`, `HTMLEmbedElement`, `HTMLTrackElement`, + // `HTMLMediaElement`, `HTMLInputElement`, `HTMLScriptElement`, and `HTMLFrameElement`, + // from a quick skim through HTML. + // + // The same applies to their SVG variants.
+ partial interface HTMLXXXElement : HTMLElement { + attribute URLString src; + attribute URLString srcset; // Only `HTMLSourceElement` and `HTMLImageElement` + }; + ``` + + ``` + partial interface HTMLObjectElement : HTMLElement { + attribute URLString data; + attribute URLString codeBase; + }; + ``` + + ``` + partial interface Document { + attribute URLString location; + }; + ``` + + ``` + partial interface Window { + attribute URLString location; + void open(URLString location); + }; + ``` + + ``` + partial interface WorkerGlobalScope { + void importScripts(URLString... urls); + }; + ``` + +* **JavaScript Contexts**: Replace `DOMString` in the following with something + reasonable. + + ``` + partial interface Window { + void eval(DOMString code); + void setTimeout(DOMString code, long timeout); + void setInterval(DOMString code, long timeout); + }; + ``` + + ``` + partial interface HTMLScriptElement : HTMLElement { + attribute DOMString innerText; + attribute DOMString text; + attribute DOMString textContent; + }; + ``` + +## Open Questions + +1. Sebastian doesn't like `Content-Security-Policy`, so maybe we should spell the flag in #3 above + differently. He proposed `Disable-Unsafe-APIs: True`. + +2. Artur and Koto suggest that we'll need something more granular than the global flag, however + we spell it, in order to deal with piecemeal migrations. + +3. Define more types. + +4. Document more sinks.
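However the flag in #3 ends up being spelled, the polyfill side of it could plausibly look something like the sketch below: wrap a sink's setter so that it throws a `TypeError` unless it receives the matching trusted type. Everything named here is hypothetical. `requireTrustedType` is an illustrative helper rather than a proposed API, `FakeElement` is a stand-in for `Element` so the sketch runs outside a browser, and the `TrustedHTML` class is a stripped-down placeholder for the real type.

```javascript
// Minimal placeholder for the TrustedHTML type (hypothetical polyfill shape).
class TrustedHTML {
  #value;
  constructor(value) {
    this.#value = String(value);
  }
  static unsafelyCreate(html) {
    return new TrustedHTML(html);
  }
  toString() {
    return this.#value;
  }
}

// Hypothetical helper: replace the accessor `name` on `proto` with one whose
// setter rejects anything that is not an instance of `TrustedType`.
function requireTrustedType(proto, name, TrustedType) {
  const desc = Object.getOwnPropertyDescriptor(proto, name);
  // Non-configurable accessors cannot be wrapped this way.
  if (!desc || !desc.set || !desc.configurable) {
    throw new Error(`cannot guard ${name}`);
  }
  Object.defineProperty(proto, name, {
    ...desc,
    set(value) {
      if (!(value instanceof TrustedType)) {
        throw new TypeError(`${name} requires a ${TrustedType.name}`);
      }
      // Hand the original setter a plain string, as it expects today.
      desc.set.call(this, String(value));
    },
  });
}

// Stand-in for Element with an `innerHTML` accessor on its prototype.
class FakeElement {
  #html = "";
  get innerHTML() {
    return this.#html;
  }
  set innerHTML(value) {
    this.#html = String(value);
  }
}

// Enforce trusted types on this one sink only.
requireTrustedType(FakeElement.prototype, "innerHTML", TrustedHTML);
```

After the last line, `el.innerHTML = TrustedHTML.unsafelyCreate("<b>ok</b>")` succeeds while `el.innerHTML = "<b>raw</b>"` throws. Because the helper is applied per sink, a piecemeal migration could turn enforcement on one sink at a time. The `configurable` check is where `[Unforgeable]` members would fall out of reach of this approach.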