diff --git a/.github/workflows/words-to-ignore.txt b/.github/workflows/words-to-ignore.txt index c971691..17513f5 100644 --- a/.github/workflows/words-to-ignore.txt +++ b/.github/workflows/words-to-ignore.txt @@ -1,37 +1,71 @@ ABI +ACM +Agoric Antigoals +AquaVM +Atomics +Auth Autocodec BYOL Bacalhau +Berkley BucketVM +CHa CIDs CLA CapTP +Config +DNS +DSLs +Ericsson FHE FS FVM Filecoin +Frans HydroLogic +IAM IPC IPFS IPFS-FAN IPLD IPLI +IPNS IPVM JIT +JSON +Kaashoek +Lampson +Lemmer-Webber +Lifecycle Linearizability MERCHANTABILITY Memoized Memoizing OCAP +OCaml +OCapN +Perlis +Plotkin PoPs Pre-Draft Prenegotiated README +Requestor +Riise Roadmap SPDX-License-Identifier SPKI +STM +Saltzer +SemVer +Spiritely +TTL +Transactionality +UC UCAN +UI +URI VM WASI Wasm @@ -39,6 +73,7 @@ Wasm-on-IPFS WebAssembly Zelenka acceptor +acyclic behaviours codec codecs @@ -47,11 +82,16 @@ cron dataflow de decrypt +decrypted defunctionalization +dereference +dereferencing +effectful effectfulness enqueuing expede facto +hardcoded idempotence individuals' inspectable @@ -60,25 +100,38 @@ md memoization merchantability micropayment +microservice middleware modelled +namespace no_good_woman non-effectful non-sublicensable +nontermination ocap others' +patentable pipelining +pre-resolved +preimage prenegotiated +repos +requestor runtimes +serverless sexualized signalling socio-economic spiral_calendar +struct subjobs +subtype tradeoff transferee +trustless ucan-chan untrusted +v0 wasm-ipfs woman_scientist world_map diff --git a/README.md b/README.md index e772086..6db99ac 100644 --- a/README.md +++ b/README.md @@ -1,217 +1,309 @@ -# IPVM +# Interplanetary Virtual Machine (IPVM) Spec v0.1.0 -This document currently describes the high-level project goals. +## Editors -# πŸ€Ήβ€β™€οΈ Separate Projects +* [Brooklyn Zelenka](https://github.com/expede), [Fission](https://fission.codes) -1. Autocodec -2. wasm-ipfs -3. IPVM +## Authors -These are related, but separate. The relatedness comes from the fact that if we have one Wasm engine in IPFS, then the other components can rely on it too. +* [Blaine Cook](https://github.com/blaine), [Fission](https://fission.codes) +* [Zeeshan Lakhani](https://github.com/zeeshanlakhani), [Fission](https://fission.codes) +* [Brooklyn Zelenka](https://github.com/expede), [Fission](https://fission.codes) + +## Language -# :no_good_woman: Antigoals: What An IPVM Is Not +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119](https://datatracker.ietf.org/doc/html/rfc2119). -* A replacement of IPFS internals β€” that's wasm-ipfs -* A language for distributed applications -* A WASI interface -* A new blockchain +## Depends On -# What Is An IPVM? +* [Multiformats](https://multiformats.io) +* [UCAN Capabilities](https://github.com/ucan-wg/spec) +* [UCAN Invocation](https://github.com/ucan-wg/invocation) -To date, IPFS has been a project focused on data. IPVM is the codename for an attempt to bring execution to IPFS. To this end, IPVM aims to be the easiest, fastest, most secure, and open way to run Wasm anywhere. +## Subspecs -Another way to describe IPVM would be "an open, decentralized, and local-first competitor to AWS Lambda". 
+* Description Formats + * [Workflow](./workflow/README.md) + * [Task](./workflow/README.md#4-task-configuration) + * Host-Managed Effects + * Capabilty Model + * SPKI + * OCap +* Runtime + * Distributed Scheduler + * Planner + * Execution +* Lifecycle + * Request + * Negotiation + * Verification + * Payment Channel +* Wasm ΞΌKernel + * IPFS + * Atomics and STM + * Actor +* First-Class Effects + * Randomness + * HTTP + * FVM -The project leverages Wasm, content addressing, SPKI, and object capabilities to liberate computation from depending on specific prenegotiated services like the big cloud providers. Execution should scale flexibly from on-device by default all the way up to edge PoPs and data centers. +# 0 Abstract -## πŸ“‹ High-Level Attributes +IPVM brings content addressing to computation, via a distributed runtime built on top of IPFS. It leverages the deterministic subset of WebAssembly plus a ukernel that supports IPFS content resolution, memoization, and adaptive optomization. -* Wasm-on-IPFS -* Declarative invocation (not an ABI, see section below) -* Captured results, re-ingested to IPFS -* A distributed scheduler - * Single vs cron, local vs remote vs either, etc -* Matchmaking - * Poster/acceptor - * Push-to-remote -* (Global) memoization table & adaptive optimization -* Managed effects (via an IPVM runtime) -* Mobile (ambient) computing, compute-to-data, and data-to-compute -* "The HTTP of Compute" -* Stretch: autonomous agents +# 1 Motivation -## :woman_scientist: Research Questions +IPVM provides a deterministic-by-default, content addressed execution environment. Computation MAY be run locally or remotely. While local operation has zero latency, there are many cases where remote exection is desirable: access to large data, faster processors, trusted execution environments, or access to specialized hardware, among others. -### Pragmatics +> Because he was talking (mainly) to a set of platform folks he admonished us to think about how we can build platforms that lead developers to write great, high performance code such that developers just fall into doing the β€œright thing”. Rico called this the Pit of Success. +> +> β€” Brad Abrams, [The Pit of Success](https://learn.microsoft.com/en-us/archive/blogs/brada/the-pit-of-success) -Projects like IPVM live and die by balancing power with ease of use. What are the easiest models for users to interact with the system? Clearly BYOL (bring-you-own-language) is a major advantage of Wasm. Are the specific Wasm runtimes that are easier to work with than others? +## 1.1 Minimizing Complexity -The current hypothesis is "convention over configuration" for defining attributes such as when effects should be run, if subjobs are run concurrently, and so on. One direction is to ask developers to configure a small-but-consistent input and output data structure (e.g. as you often see in middleware). This would open up dataflow analysis and (likely pessimistic) call-by-need semantics ("forcing the effect thunks"), removing the need for complex configuration and strategies from the developer, unless they want to overwrite them. +> Every application has an inherent amount of irreducible complexity. The only question is: Who will have to deal with it β€” the user, the application developer, or the platform developer? +> -- [Tesler's Law](https://en.wikipedia.org/wiki/Law_of_conservation_of_complexity) -### Cost semantics +With "jobs" as the unit of execution, programmers gain flexible cache granularity, parallelism, and ___. 
-* Can it run locally vs incur network to kick over to a more powerful machine? -* To what degree is it possible to hide this complexity from the programmer? +Configuration DSLs like IPVM jobs can become very complex. By their nature, jobs specs are responsible for describing as many -### Trust & Verification Model +By having to account for a huge number of possible cases, the burden is placed on the programmer in exchange for a high degree of control. Sensible defaults, [convention over configuration](https://en.wikipedia.org/wiki/Convention_over_configuration), and scoped settingshelp aleviate this problem. -* Who can decrypt this data? -* How do I know that this is the right answer? -* Who are bad actors in the system? -Are there opportunities for correct-by-construction code? +Partial failure in a deterministic system is simplified by using transactional semantics for the job as a whole. The difficult case lies with any effects that destructively update the real world. -### Matchmaking & Discovery +# Stack Diagrram -* How quickly can a posted job be executed -* Who is equipped to run specific functionality or effects? -* How do we prevent wasted duplicate computation? -* What does a push model look like in this model (e.g. to a specific provider) -### Effect Systems +## 1.2 Humane Design -* "Don't accidentally send the Tweet twice" -* What's the best way to fit effects into a pure invocation model? -* How can we check that it actually ran? -* Should continuations be modelled as delimited effects? - * If so, perhaps the runtime should have low-ish level `shift` & `reset` calls -* How hard should we lean into defunctionalization? -* Should we support message passing and/or IPC? +> People are part of the system. The design should match the user's experience, expectations, and mental models. +> +> β€” Jerome Saltzer & M. Frans Kaashoek, Principles of Computer System Design -### Cooperative Optimization +> 8. A programming language is low level when its programs require attention to the irrelevant. +> +> β€”Β Alan Perlis, Epigrams on Programming -* Do typical compiler and VM techniques like JIT and memoization apply? - * Can we share intermediate results of a computation and store them in a global substrate that anyone can participate in? +While higher-level interfaces over IPVM Workflows MAY be used, ultimately configuration is the UI at this level of abstraction. The core use cases are moving workflows and tasks between machines, logging, and execution. IPVM Workflows aim to provide a computational model with a clear contract ("few if any surprises") for the programmer, while limiting verbosity. IPVM workflows follow the [convention over configuration](https://en.wikipedia.org/wiki/Convention_over_configuration) philosophy with defaults and cascading configuration. -## πŸ“ƒ Declarative Invocation +## 1.3 Security Considerations -The current hypothesis is that invocations can be configured as a declarative description containing at least: +> A program can create a controlled environment within which another, possible untrustworthy program, can be run safely [but] may leak, i.e., transmit [...] the input data which the customer gives it [...] We will call the problem of constraining a service [from leaking sensitive data] the confinement problem. +> +> Butler W. Lampson, A Note on the Confinement Problem, Communications of the ACM -* The CID of the Wasm blob to execute -* The CIDs of the argument -* Any configuration that overrides defaults - * Max gas - * When to run (i.e. 
cron) - * Associated UCAN or CapTP - * ...and so on... +IPVM runs in trustless ("mutually suspicious") environments. Conceivably either a workflow proposer or service provider could be mallicious. To limit ___. -![](./assets/dag-invocation.png) +Working with encrypted data and application secrets (section X.Y) is common practice for many workflows. IPVM treats these as effects and affinities. As it is intended to operate on a public network, secrets MUST NOT be hardcoded into an IPVM Workflow. Any task that involves a dereferenced secret or decrypted data β€”Β including its downstream consumers β€” MUST be marked as secret and not distributed. -## 🧾 Captured Session / Receipts +While it is tempting to push authorization concerns to a serapate layer, this has historically lead systems to be built on fundamentally insecure primitives. As such, IPVM Workflows include security considerations directly. It is not possible to control the security model of external effects, but it is possible to secure the inbound boundary to IPVM. -![](./assets/dag-results.png) +Pure computation is always allowed as long as it terminates in a fixed number of steps. An executor -The output may include instructions to run further computation (e.g. continuations or other effects). Represented in the diagram below as a dashed line, sending email or enqueuing a new job are handled by the IPVM runtime. [NB: The exact mechanism is not settled, the exact mechanisms are all subject to change] +Shared-nothing architecture. Even if shared memory is used, it MUST be controlled externally via the effect system (i.e. an outside agent). -![](./assets/dag-effect.png) +# 2 Effect System -## :spiral_calendar: Scheduler +The core restrictions enforced by the design of IPVM Workflows are: -IPVM needs a way of scheduling computation, signalling made matches, returning control to the job queue, and starting continuations? Linearizability is possibly required in the general case; what's the easiest way to signal weaker consistency? How does the scheduler handle failure of nodes, network partitions, etc? +1. Execution MUST terminate in finite time +2. Workflow tasks MUST form a partial order +3. Effects MUST be decalared ahead of time and controlled by the IPVM host -What are the correct default behaviours? Should IPVM computation always operate by (concurrent) graph reduction, or do we need to specify evaluation (and restart) strategies a la Erlang? +While effects MUST be declared up front, they MAY also be emitted as output from pure computation (see the core spec for more). This provides a "legal" escape hatch for building higher-level abstraction that incorporate effects. -## :handshake: Trust Model -UCAN & OCAP/CapTP -Execution Metering +## 2.1 Pure Functions -* https://github.com/ucan-wg/spec -* https://spritelyproject.org/news/what-is-captp.html +## 2.2 Nondestructive Effects -IPVM will often (not always!) execute on remote machines controlled by untrusted third parties. This is potentially precarious for all involved. Some trust is required between participants in all cases. +## 2.3 Destructive Effects -In offline scenarios, such trust may be provided via SPKI. In live systems, ocap (likely CapTP) should be preferred. +# 3 Job Anatomy -### Open Questions +An IPVM job MUST be composed of the following parts: -* Is gas granted directly by capability? (i.e. is gas first-class or an effect?) -* Can SPKI and CapTP interoperate directly? 
+* Header +* Jobs +* Signature -## :dollar: Payments +## 3.1 Header -Computation is a scarce resource. IPVM is not anti-money; while altruistic computation is _highly encouraged,_ charging for computation is going to quickly become a de facto requirement. The current hypothesis is to bake micropayment capabilities directly into the platform to avoid the immediate capture by prenegotiated providers. +## 3.2 Jobs -IPVM aims to not have an "IPVM token" or similar. Prenegotiated providers paid in fiat and metered in credits SHOULD be supported, as should a "spot market" of compute resources on an ad hoc basis. To maximize user choice, this system should be kept _out or on the fringes_ of the IPVM kernel as much as possible. IPVM MUST still allow for running compute yourself, or pushing compute to machines that you or a friendly agent controls "for free". +The `jobs` field MUST describe a series of jobs that are expected to run in the session. Jobs MUST be one of the following: -* State channels - * ucan-chan (ユーキャンけゃん!) -* Hierarchical consensus and/or Filecoin +1. A pure computation described by pure (content-addressed) inputs to a Wasm binary +2. A named effect with pure (content-addressed) inputs to be executed by the runtime +3. One of the above, with an input that is the result of a previous step -## :rocket: Managed Effects -Effects are the things that happen outside of pure functions: sending email, retrying a failed execution, reading from a database, playing a bell, firing the missiles. +### 3.2.1 Web Assembly Job -Managed effects are handled by the compiler, VM, or runtime. The current hypothesis contains two levels of effect: pure ("platform") effects and impure effects. For completeness, you could say that pure functions are "non-effectful" and also exist on this spectrum of effectfulness. +``` json +{ + "type": "wasm/1.0", + "with": "bafkreie53mk3duiynh5pzmhuzadaif6hpizod5wr6dt34canmxo7j7jfcu", + "input": [ + { "firstName": "Boris" }, + { "lastName": "Mann" } + ], + "maxGas": 4600, + "on": { + "error": [], + "success": [] + } +} +``` -![](./assets/stream-effects.jpeg) +### 3.2.2 Effect Job -Pure effects are ones that "merely" paper over pure functions with helpful abstractions (e.g. implicit state, error handling, continuations). Another way of thinking about them is that they stay "contained" in the system. We can roll back any of these operations, replay them, etc and aside from your CPU generating some extra heat, no one would be the wiser. +### 3.2.3 Pipelining -Impure effects alter the world itself. If I send an email, I'm unable to reverse that action. +Each job MUST be labelled with a string. This label MUST be treated as local to the enclosing workflow. Jobs MAY reference each other's output by label in the `from` field. In the case of multiple return values, the index of the output may be further selected with the `out` field. For exammple: -Pure effects are much more convenient to reason about, compose cleanly, and are inspectable. We want to capture as much of an impure effect as possible as pure descriptions. Returning "receipts" from an impure effect can turn a "request for effect" to a pure description of "...and it returned this specific result", which is a pure tuple. Treating it this way allows for idempotence on a stream of effects over time, capture and reuse of intermediate results, and so on. 
+```json +{ + "fullName": { + "type": "wasm/1.0", + "with": "bafkreie53mk3duiynh5pzmhuzadaif6hpizod5wr6dt34canmxo7j7jfcu", + "input": [ + { "firstName": "Boris" }, + { "lastName": "Mann" } + ] + }, + "count": { + "type": "wasm/1.0", + "with": "bafkreiegbnixdoqsohfz5oninnhpcpwsf7rg6ewnx2lvhp7p5axejrph64", + "input": [ + { "name": {"from": "fullName", "out": 0 } } + ] + } +} +``` -Impure effects are very powerful, but with great power comes great responsibility... and also often fewer levers for performance optimization. One example is that impure effects often need exactly-once semantics, which requires gaining a global lock on the job (individual effects may be run in parallel, but need to be the only execution for that specific effect). This requires consensus, sometimes even global distributed consensus, which is always slower than being able to work from a distributed queue. +The above is roughly equivalent to the (local) function call: -In general, the number of hosts that can provide a particular effect are smaller than the hosts that can compute pure functions. There may simply be a tradeoff for the programmer to say "yes I really need this effect, even though I'll have fewer options". +```js +fullName({firstName: "Boris", lastName: "Mann"})[0].count() +``` -### What About WASI? + -We will almost certainly need to enable WASI at some stage. This is much more complex, as service discovery becomes a larger, and much more nuanced problem as arbitrary effects can be very difficult to make safe and deterministic. For example, what if the executable fills your disk with garbage? +All resulting graphs MUST be acyclic. The parser MUST check for any cycles and fail immeditely. -Should the core IPVM runtime provide "blessed" effects that operate over the shared memory interface? For example, I see no problem with providing a source of randomness as an external effect because it's useful, and probably safe. It "just" needs to come from "outside" the computation. These can even be captured in the trace. -## :telescope: Sources of Inspiration -We can learn a lot from adjacent projects. Some of these include: -* WASI -* FVM -* Bacalhau -* BucketVM -* IPFS-FAN -* Bloom^L -* PACT/HydroLogic -* Nix -* Dialog -## :world_map: Roadmap -### A. Learning Phase -1. Bash-script store/load/run Wasm from IPFS -1. Memoization table - * Local - * Remote, incl demo "look how fast it is on a remote machine now" -1. Bash-script module pipelining, capturing intermediate results -1. Experimentation with ABI (C conventions?) -1. Pure effects (e.g. atomic FS or DB read/write) -1. Initial attempt to run adaptive optimization on common partial applications -### B. Specs +* Automatic (and deterministic) parallelism +* Dataflow / job graph +* Effects System +* Partial Failure & Transactionality +* Auth: SPKI & object capabilities -* IPLI & session receipts -* Scheduler & matchmaking -* Memoized result format & lookup -* Capability model - * Offline (SPKI) - * Online (ocap/CapTP) -* Verification mechanisms -* Compute on encrypted data (trusted & FHE) +* Wasm execution in depth +* Spec format IPLD + * Input addressing -# FAQ +## 2.2 Implicit Parallelism -## How Does This Differ From Autocodec and wasm-ipfs? +IPVM does not allow programmer control over parallelism. The resources available to the scheulder MAY be very different from run to run. -Autocodec, IPVM, and wasm-ipfs all involve Wasm and IPFS. They are distinct projects, though sharing modules and learning between them is a nice-to-have. 
Having a Wasm interpreter in every IPFS node makes the argument for all of these projects much easier. +The concurrency plan MUST be derived from the dataflow dependencies. -wasm-ipfs is the replacement of IPFS internals with IPFS, to help share high-quality components across implementations and platforms. -Autocodec is an attempt to replace in-built IPFS codecs with an ad hoc mechanism at read-time. The basic idea is "what if the codec executable was wrapped directly around the IPLD to interpret?" +# 3 Higher Abstractions -IPVM is a distributed execution engine, scheduler, service discovery layer / matchmaking, and memoization system. It is possibly the largest of the three projects. If wasm-ipfs is "IPFS _as_ Wasm", then IPVM is "Wasm _on_ IPFS" +At the lowest level, IPVM jobs only describe the loading of immutible data. -## Resources +* Actors +* Vats +* Map/reduce + +``` ipldsch +type Verification union { + | Oracle + | Consensus + | Optimistic + | ZKP +} representation keyed + +type Oracle union { + | Attestation "attestation" + | ThirdParty(DID) +} + +type Optimistic struct { + confirmations Integer + referee Referee +} + +type Referee enum { + | ZK(ZeroKnowledge) + | Trusted(DID) +} + +type Consensus struct { + agents [DID] +} + +type ZKP enum { + | Groth16 + | Nova + | Nova2 +} +``` + + +# 3 Acknowledgments + +* [Joe Armstrong](https://joearms.github.io/), Ericsson +* [Mark Miller](https://github.com/erights), Agoric +* [Peter Alvaro](https://github.com/palvaro), UC Santa Cruz +* [Joe Hellerstein](https://github.com/jhellerstein), UC Berkley +* [Juan Benet](https://github.com/jbenet/), Protocol Labs +* [Christine Lemmer-Webber](https://github.com/cwebber), Spiritely Institute +* [Quinn Wilton](https://github.com/QuinnWilton), Fission +* [Luke Marsden](https://github.com/lukemarsden), Protocol Labs +* [David Aronchick](https://www.davidaronchick.com/), Protocol Labs +* [Eric Myhre](https://github.com/warpfork), Protocol Labs +* [Irakli Gozalishvili](https://github.com/Gozala), DAG House +* [Hugo Dias](https://github.com/hugomrdias), DAG House +* [Mikeal Rogers](https://github.com/mikeal/), DAG House +* Steven Allen +* Melanie Riise +* Christine Lemmer-Webber +* Peter Alvaro +* Juan Benet + +# 4 Prior Art + +* [Docker Job Controller](https://kubernetes.io/docs/concepts/workloads/controllers/job/) +* BucketVM (UCAN Invocation) +* [WarpForge "Formula" v1](https://github.com/warpfork/warpforge/blob/master/examples/110-formula-usage/example-formula-exec.md) +* [Bacalhau Job Spec](https://github.com/filecoin-project/bacalhau/blob/8568239299b5881bc90e3d6be2c9aa06c0cb3936/pkg/model/job.go#L192) +Bloom +AquaVM +PACT/HydroLogic + +It is not possible to mention the separation of effects from computation without mentioning the algebraic effect lineage from Haskell, OCaml, and Eff. While the overall system looks quite different from the their type-level effects, this work owes a debt to at least Gordon Plotkin and John Power's work on [computational effects](https://homepages.inf.ed.ac.uk/gdp/publications/Overview.pdf), + +# FIXME STASH + +https://www.tweag.io/blog/2020-09-10-nix-cas/ + +https://www.ams.org/journals/tran/1936-039-03/S0002-9947-1936-1501858-0/S0002-9947-1936-1501858-0.pdf + +* confluence +* differential dataflow +* map/reduce +* actors & loops +* captp/ocapn +* Enqueuing new jobs in output +IPVM implements a capability model based on keys, linked certificates, and CapTP. 
Executor certificate negotiation MUST happen during negotiation, -https://www.youtube.com/watch?v=rzJWk1nlYvs diff --git a/meta/README.md b/meta/README.md new file mode 100644 index 0000000..e772086 --- /dev/null +++ b/meta/README.md @@ -0,0 +1,217 @@ +# IPVM + +This document currently describes the high-level project goals. + +# πŸ€Ήβ€β™€οΈ Separate Projects + +1. Autocodec +2. wasm-ipfs +3. IPVM + +These are related, but separate. The relatedness comes from the fact that if we have one Wasm engine in IPFS, then the other components can rely on it too. + +# :no_good_woman: Antigoals: What An IPVM Is Not + +* A replacement of IPFS internals β€” that's wasm-ipfs +* A language for distributed applications +* A WASI interface +* A new blockchain + +# What Is An IPVM? + +To date, IPFS has been a project focused on data. IPVM is the codename for an attempt to bring execution to IPFS. To this end, IPVM aims to be the easiest, fastest, most secure, and open way to run Wasm anywhere. + +Another way to describe IPVM would be "an open, decentralized, and local-first competitor to AWS Lambda". + +The project leverages Wasm, content addressing, SPKI, and object capabilities to liberate computation from depending on specific prenegotiated services like the big cloud providers. Execution should scale flexibly from on-device by default all the way up to edge PoPs and data centers. + +## πŸ“‹ High-Level Attributes + +* Wasm-on-IPFS +* Declarative invocation (not an ABI, see section below) +* Captured results, re-ingested to IPFS +* A distributed scheduler + * Single vs cron, local vs remote vs either, etc +* Matchmaking + * Poster/acceptor + * Push-to-remote +* (Global) memoization table & adaptive optimization +* Managed effects (via an IPVM runtime) +* Mobile (ambient) computing, compute-to-data, and data-to-compute +* "The HTTP of Compute" +* Stretch: autonomous agents + +## :woman_scientist: Research Questions + +### Pragmatics + +Projects like IPVM live and die by balancing power with ease of use. What are the easiest models for users to interact with the system? Clearly BYOL (bring-you-own-language) is a major advantage of Wasm. Are the specific Wasm runtimes that are easier to work with than others? + +The current hypothesis is "convention over configuration" for defining attributes such as when effects should be run, if subjobs are run concurrently, and so on. One direction is to ask developers to configure a small-but-consistent input and output data structure (e.g. as you often see in middleware). This would open up dataflow analysis and (likely pessimistic) call-by-need semantics ("forcing the effect thunks"), removing the need for complex configuration and strategies from the developer, unless they want to overwrite them. + +### Cost semantics + +* Can it run locally vs incur network to kick over to a more powerful machine? +* To what degree is it possible to hide this complexity from the programmer? + +### Trust & Verification Model + +* Who can decrypt this data? +* How do I know that this is the right answer? +* Who are bad actors in the system? +Are there opportunities for correct-by-construction code? + +### Matchmaking & Discovery + +* How quickly can a posted job be executed +* Who is equipped to run specific functionality or effects? +* How do we prevent wasted duplicate computation? +* What does a push model look like in this model (e.g. 
to a specific provider) + +### Effect Systems + +* "Don't accidentally send the Tweet twice" +* What's the best way to fit effects into a pure invocation model? +* How can we check that it actually ran? +* Should continuations be modelled as delimited effects? + * If so, perhaps the runtime should have low-ish level `shift` & `reset` calls +* How hard should we lean into defunctionalization? +* Should we support message passing and/or IPC? + +### Cooperative Optimization + +* Do typical compiler and VM techniques like JIT and memoization apply? + * Can we share intermediate results of a computation and store them in a global substrate that anyone can participate in? + +## πŸ“ƒ Declarative Invocation + +The current hypothesis is that invocations can be configured as a declarative description containing at least: + +* The CID of the Wasm blob to execute +* The CIDs of the argument +* Any configuration that overrides defaults + * Max gas + * When to run (i.e. cron) + * Associated UCAN or CapTP + * ...and so on... + +![](./assets/dag-invocation.png) + +## 🧾 Captured Session / Receipts + +![](./assets/dag-results.png) + +The output may include instructions to run further computation (e.g. continuations or other effects). Represented in the diagram below as a dashed line, sending email or enqueuing a new job are handled by the IPVM runtime. [NB: The exact mechanism is not settled, the exact mechanisms are all subject to change] + +![](./assets/dag-effect.png) + +## :spiral_calendar: Scheduler + +IPVM needs a way of scheduling computation, signalling made matches, returning control to the job queue, and starting continuations? Linearizability is possibly required in the general case; what's the easiest way to signal weaker consistency? How does the scheduler handle failure of nodes, network partitions, etc? + +What are the correct default behaviours? Should IPVM computation always operate by (concurrent) graph reduction, or do we need to specify evaluation (and restart) strategies a la Erlang? + +## :handshake: Trust Model + +UCAN & OCAP/CapTP +Execution Metering + +* https://github.com/ucan-wg/spec +* https://spritelyproject.org/news/what-is-captp.html + +IPVM will often (not always!) execute on remote machines controlled by untrusted third parties. This is potentially precarious for all involved. Some trust is required between participants in all cases. + +In offline scenarios, such trust may be provided via SPKI. In live systems, ocap (likely CapTP) should be preferred. + +### Open Questions + +* Is gas granted directly by capability? (i.e. is gas first-class or an effect?) +* Can SPKI and CapTP interoperate directly? + +## :dollar: Payments + +Computation is a scarce resource. IPVM is not anti-money; while altruistic computation is _highly encouraged,_ charging for computation is going to quickly become a de facto requirement. The current hypothesis is to bake micropayment capabilities directly into the platform to avoid the immediate capture by prenegotiated providers. + +IPVM aims to not have an "IPVM token" or similar. Prenegotiated providers paid in fiat and metered in credits SHOULD be supported, as should a "spot market" of compute resources on an ad hoc basis. To maximize user choice, this system should be kept _out or on the fringes_ of the IPVM kernel as much as possible. IPVM MUST still allow for running compute yourself, or pushing compute to machines that you or a friendly agent controls "for free". + +* State channels + * ucan-chan (ユーキャンけゃん!) 
+* Hierarchical consensus and/or Filecoin + +## :rocket: Managed Effects + +Effects are the things that happen outside of pure functions: sending email, retrying a failed execution, reading from a database, playing a bell, firing the missiles. + +Managed effects are handled by the compiler, VM, or runtime. The current hypothesis contains two levels of effect: pure ("platform") effects and impure effects. For completeness, you could say that pure functions are "non-effectful" and also exist on this spectrum of effectfulness. + +![](./assets/stream-effects.jpeg) + +Pure effects are ones that "merely" paper over pure functions with helpful abstractions (e.g. implicit state, error handling, continuations). Another way of thinking about them is that they stay "contained" in the system. We can roll back any of these operations, replay them, etc and aside from your CPU generating some extra heat, no one would be the wiser. + +Impure effects alter the world itself. If I send an email, I'm unable to reverse that action. + +Pure effects are much more convenient to reason about, compose cleanly, and are inspectable. We want to capture as much of an impure effect as possible as pure descriptions. Returning "receipts" from an impure effect can turn a "request for effect" to a pure description of "...and it returned this specific result", which is a pure tuple. Treating it this way allows for idempotence on a stream of effects over time, capture and reuse of intermediate results, and so on. + +Impure effects are very powerful, but with great power comes great responsibility... and also often fewer levers for performance optimization. One example is that impure effects often need exactly-once semantics, which requires gaining a global lock on the job (individual effects may be run in parallel, but need to be the only execution for that specific effect). This requires consensus, sometimes even global distributed consensus, which is always slower than being able to work from a distributed queue. + +In general, the number of hosts that can provide a particular effect are smaller than the hosts that can compute pure functions. There may simply be a tradeoff for the programmer to say "yes I really need this effect, even though I'll have fewer options". + +### What About WASI? + +We will almost certainly need to enable WASI at some stage. This is much more complex, as service discovery becomes a larger, and much more nuanced problem as arbitrary effects can be very difficult to make safe and deterministic. For example, what if the executable fills your disk with garbage? + +Should the core IPVM runtime provide "blessed" effects that operate over the shared memory interface? For example, I see no problem with providing a source of randomness as an external effect because it's useful, and probably safe. It "just" needs to come from "outside" the computation. These can even be captured in the trace. + +## :telescope: Sources of Inspiration + +We can learn a lot from adjacent projects. Some of these include: + +* WASI +* FVM +* Bacalhau +* BucketVM +* IPFS-FAN +* Bloom^L +* PACT/HydroLogic +* Nix +* Dialog + +## :world_map: Roadmap + +### A. Learning Phase + +1. Bash-script store/load/run Wasm from IPFS +1. Memoization table + * Local + * Remote, incl demo "look how fast it is on a remote machine now" +1. Bash-script module pipelining, capturing intermediate results +1. Experimentation with ABI (C conventions?) +1. Pure effects (e.g. atomic FS or DB read/write) +1. 
Initial attempt to run adaptive optimization on common partial applications + +### B. Specs + +* IPLI & session receipts +* Scheduler & matchmaking +* Memoized result format & lookup +* Capability model + * Offline (SPKI) + * Online (ocap/CapTP) +* Verification mechanisms +* Compute on encrypted data (trusted & FHE) + +# FAQ + +## How Does This Differ From Autocodec and wasm-ipfs? + +Autocodec, IPVM, and wasm-ipfs all involve Wasm and IPFS. They are distinct projects, though sharing modules and learning between them is a nice-to-have. Having a Wasm interpreter in every IPFS node makes the argument for all of these projects much easier. + +wasm-ipfs is the replacement of IPFS internals with IPFS, to help share high-quality components across implementations and platforms. + +Autocodec is an attempt to replace in-built IPFS codecs with an ad hoc mechanism at read-time. The basic idea is "what if the codec executable was wrapped directly around the IPLD to interpret?" + +IPVM is a distributed execution engine, scheduler, service discovery layer / matchmaking, and memoization system. It is possibly the largest of the three projects. If wasm-ipfs is "IPFS _as_ Wasm", then IPVM is "Wasm _on_ IPFS" + +## Resources + +https://www.youtube.com/watch?v=rzJWk1nlYvs diff --git a/workflow/README.md b/workflow/README.md new file mode 100644 index 0000000..d61dff1 --- /dev/null +++ b/workflow/README.md @@ -0,0 +1,506 @@ +# IPVM Workflow Specification v0.1.0 + +## Editors + +* [Brooklyn Zelenka](https://github.com/expede), [Fission](https://fission.codes) + +## Authors + +* [Brooklyn Zelenka](https://github.com/expede), [Fission](https://fission.codes) +* [Simon Worthington](https://github.com/simonwo), [Bacalhau Project](https://www.bacalhau.org/) + +## Language + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119](https://datatracker.ietf.org/doc/html/rfc2119). + +## Dependencies + +* [DAG-CBOR](https://ipld.io/specs/codecs/dag-cbor/spec/) +* [UCAN Invocation](https://github.com/ucan-wg/invocation/) +* [VarSig](https://github.com/ChainAgnostic/varsig/) + +# 0 Abstract + +An IPVM Workflow is a declarative cofiguration that extends a [UCAN Invocation](https://github.com/ucan-wg/invocation). A Workflow provides everything required to execute one or more tasks: defaults, tasks and their dependencies, authorization, metadata, signatures, and so on. + +# 1 Introduction + +> In late 1970 or early ’71 I approached IBM Canada’s Intellectual Property department to see if we could take out a patent on the basic idea [of dataflow]. Their recommendation, which I feel was prescient, was that this concept seemed to them more like a law of nature, which is not patentable. +> +> J. Paul Morrison, [Flow-Based Programming](https://jpaulm.github.io/fbp/book.html) + +The potential complexity of a fully distributed execution by untrusted peers is very high. IPVM Workflows reduce the number of possible states by forcing explicit handling of any dangerous effects. The IPVM Workflow spec is a declarative document that MAY be inspected, transmitted, logged, and negotiated. Unlike s systems like WASI, there is a strict separation of effects from pure data, an emphasis on verifiability, and [promise pipelining](http://erights.org/elib/distrib/pipeline.html). 
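+
+As a rough sketch of the shape this takes (the task labels, CID, and URL below are hypothetical placeholders; the exact fields are defined in the sections that follow), a Workflow declares tasks and wires their outputs together with promises rather than imperative calls:
+
+``` json
+{
+  "resize-avatar": {
+    "with": "ipfs://bafy...",   // hypothetical Wasm module CID
+    "do": "ipvm/run",
+    "inputs": {"func": "resize", "args": [{"width": 256}]}
+  },
+  "publish-avatar": {
+    "with": "https://example.com/avatars",
+    "do": "crud/update",
+    "inputs": {"value": {"ucan/promise": ["/", "resize-avatar"]}}
+  }
+}
+```
+
+In this sketch the second task consumes the first task's result by promise, so an executor can derive ordering and parallelism from the data dependencies alone.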
+ +IPVM Workflows MUST be suitable for the proposal of workflows and negotiation with providers on a discovery layer (ahead of credential delegation), execution on untrusted peer machines, and verification. Workflows SHOULD provide a sufficiently expressive base to build more complex models such as actors, event-driven systems, map-reduce, and so on. + +## 1.1 Design Philosophy + +While IPVM in aggregate is capable of executing arbitrary programs, individual IPVM Workflows are specified declaratively, and tasks workflows MUST be acyclic. Invocation in the declarative style liberates the programmer from worrying about explicit sequencing, parallelism, memoization, distribution, and nontermination in a trustless settings. Such constraints also grants the runtime control and flexibility to schedule tasks in an efficient and safe manner. + +These constraints impose specific practices. There is no first-class concept of persistent objects or loops. Loops, actors, vats, concurrent objects, and so on MAY be implemented on top of IPVM Workflows by enqueuing new workflows with the effect system (much like a [mailbox receive loop](https://www.erlang.org/doc/efficiency_guide/processes.html)). + +## 1.2 Foundational Authority + +IPVM workflows are built on top of cryptographic capabilities, providing a strong basis for distributed computation in trustless networks. This even provides a clear basis for crossing Web 2.0 and Web3 systems, other computation networks, local operation (in full or part). + +The IPVM Workflow spec extends on several other specs that have been developed to provide this basis: + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ β”‚ β”‚ +β”‚ Human Configuration: β”‚ β”‚ +β”‚ Defaults, Exception Handling, Comments, Tags β”‚ β”‚ +β”‚ (IPVM Workflow) β”‚ β”‚ +β”‚ β”‚ Multi-Request Pipelining β”‚ +β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ (UCAN Invocation) β”‚ +β”‚ β”‚ β”‚ +β”‚ IPVM Config, Verification Level, etc β”‚ β”‚ +β”‚ (IPVM Task) β”‚ β”‚ +β”‚ β”‚ β”‚ +β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ +β”‚ β”‚ +β”‚ Call Graph β”‚ +β”‚ (UCAN Invocation) β”‚ +β”‚ β”‚ +β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ +β”‚ β”‚ +β”‚ Authority β”‚ +β”‚ (UCAN Core) β”‚ +β”‚ β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +# 2 Envelope + +The outer wrapper of a workflow MUST contain the following fields: + +| Field | Type | Description | Required | +|-----------------|------------|-------------------------------------------------------------------------|----------| +| `ipvm/workflow` | `Workflow` | IPVM Workflow | Yes | +| `signature` | `VarSig` | [VarSig](https://github.com/ChainAgnostic/varsig/) of serialized fields | Yes | + +| Field | 
Type | Description | Required | Default |
+|------------|------------------------------|-------------------------------------------------------------------------|----------|---------|
+| `v`        | `"0.1.0"`                    | IPVM workflow version                                                   | Yes      |         |
+| `meta`     | `{String : Any}`             | User-defined object (tags, comments, etc)                               | No       | `{}`    |
+| `parent`   | `[&Workflow, Label] or Null` | The workflow & task label that initiated the current workflow (if any)  | No       | `Null`  |
+| `config`   | `Config`                     | Global configuration (e.g. timeout for the entire workflow)             | No       | `{}`    |
+| `defaults` | `Config`                     | Individual task config defaults                                         | No       | `{}`    |
+| `tasks`    | `UCAN.Invocation`            | UCAN Invocation                                                         | Yes      |         |
+| `catch`    | `&WasmTask`                  | Deterministic Wasm that fires on exceptions                             | No       | `{}`    |
+
+## 2.1 Fields
+
+### 2.1.1 Version
+
+The `v` field MUST contain the IPVM Workflow version.
+
+### 2.1.2 Metadata
+
+The OPTIONAL `meta` field contains a user-definable JSON object. This is useful for including things like tags, comments, and so on.
+
+### 2.1.3 Parent
+
+The OPTIONAL `parent` field references the Workflow (and the task label within it) that initiated this workflow, if any.
+
+### 2.1.4 Config
+
+The OPTIONAL global [`config` object](#3-configuration) sets the configuration for the workflow itself, and defaults for tasks.
+
+### 2.1.5 Defaults
+
+The OPTIONAL `defaults` field configures default [configs](#3-configuration) for tasks.
+
+### 2.1.6 Tasks
+
+The `tasks` field contains all of the IPVM [Tasks](#4-task-configuration) set to run in this Workflow, each labelled by a human-readable key.
+
+### 2.1.7 Exception Handler
+
+The OPTIONAL `catch` field contains a Task with predefined inputs. See the [Exception Handler](#5-exception-handler) section for more detail.
+
+## 2.2 IPLD Schema
+
+``` ipldsch
+type SignedWorkflow struct {
+  work Workflow (rename "ipvm/workflow")
+  sig  VarSig
+}
+
+type Workflow struct {
+  v        SemVer
+  meta     {String : Any}   (implicit {})
+  parent   nullable TaskRef (implicit Null)
+  global   Config           (implicit {})
+  defaults Config           (implicit {})
+  tasks    UCAN.Invocation
+  catch    nullable &Wasm   (implicit Null)
+}
+
+type TaskRef struct {
+  inv  &Invocation
+  task String -- Label for the task
+} representation tuple
+```
+
+## 2.3 JSON Examples
+
+``` json
+{
+  "ipvm/workflow": {
+    "v": "0.1.0",
+    "meta": {
+      "tags": ["fission", "bacalhau", "dag-house"]
+    },
+    "global": {
+      "time": [10, "minutes"]
+    },
+    "defaults": {
+      "gas": 1000,
+      "memory": [10, "mega", "bytes"]
+    },
+    "catch": "bafkreifsaaztjgknuha7tju6sugvrlbiwbyx5jf2pky2yxx5ifrpjscyhe",
+    "tasks": {
+      "ucan/invoke": {
+        "v": "0.1.0",
+        "nnc": "02468",
+        "prf": [
+          {"/": "bafkreie2cyfsaqv5jjy2gadr7mmupmearkvcg7llybfdd7b6fvzzmhazuy"},
+          {"/": "bafkreibbz5pksvfjyima4x4mduqpmvql2l4gh5afaj4ktmw6rwompxynx4"}
+        ],
+        "run": {
+          "notify-bob": {
+            "with": "mailto://alice@example.com",
+            "do": "msg/send",
+            "inputs": [
+              {
+                "to": "bob@example.com",
+                "subject": "DNSLink for example.com",
+                "body": "Hello Bob!"
+              }
+            ],
+            "meta": {
+              "ipvm/config": {
+                "time": [30, "minutes"],
+                "secret": true
+              }
+            }
+          },
+          "log-as-done": {
+            "with": "https://example.com/report",
+            "do": "crud/update",
+            "inputs": {
+              "from": "mailto://alice@example.com",
+              "to": ["bob@example.com"],
+              "event": "email-notification",
+              "value": {"ucan/promise": ["/", "notify-bob"]} // Pipelined promise
+            }
+          }
+        }
+      }
+    }
+  },
+  "sig": {"/": {"bytes": "5vNn4--uTeGk_vayyPuNTYJ71Yr2nWkc6AkTv1QPWSgetpsu8SHegWoDakPVTdxkWb6nhVKAz6JdpgnjABppC7"}}
+}
+```
+
+# 3 Configuration
+
+The IPVM configuration struct defines secrecy, quotas, and verification strategy:
+
+| Field    | Type              | Description                             | Required | Default                  |
+|----------|-------------------|-----------------------------------------|----------|--------------------------|
+| `secret` | `Boolean or null` | Whether the output is unsafe to publish | No       | `null`                   |
+| `check`  | `Verification`    | Verification strategy                   | No       | `"attestation"`          |
+| `time`   | `TimeInterval`    | Timeout                                 | No       | `[5, "minutes"]`         |
+| `memory` | `InfoSize`        | Memory limit                            | No       | `[100, "kilo", "bytes"]` |
+| `disk`   | `InfoSize`        | Disk limit                              | No       | `[10, "mega", "bytes"]`  |
+| `gas`    | `Integer`         | Gas limit                               | No       | `1000`                   |
+
+This MAY be set globally or configured on [individual Tasks](#4-task-configuration).
+
+## 3.1 Fields
+
+### 3.1.1 Secret Flag
+
+The `secret` flag marks a task as being unsuitable for publication.
+
+If the `secret` field is explicitly set, the task MUST be treated per that setting. If not set, the `secret` field defaults to `null`, which behaves as a soft `false`. If such a task consumes input from a `secret` source, it is also marked as `secret`.
+
+Note: there is no way to enforce secrecy at the task level, so such tasks SHOULD only be negotiated with runners that are trusted. If secrecy must be inviolable, consider using [multi-party computation (MPC)](https://en.wikipedia.org/wiki/Secure_multi-party_computation) or [fully homomorphic encryption (FHE)](https://en.wikipedia.org/wiki/Homomorphic_encryption#Fully_homomorphic_encryption) inside the task.
+
+### 3.1.2 Verification Strategy
+
+The OPTIONAL `check` field MUST supply a verification strategy if present. If omitted, it MUST default to `"attestation"`.
+
+### 3.1.4 Time Quota
+
+The OPTIONAL `time` field configures the upper limit in wall-clock time that the executor SHOULD allow.
+
+### 3.1.5 Memory Quota
+
+The OPTIONAL `memory` field configures the upper limit in system memory that the executor SHOULD allow.
+
+### 3.1.6 Disk Quota
+
+The OPTIONAL `disk` field configures the upper limit in disk space that the executor SHOULD allow.
+
+### 3.1.7 Gas Quota
+
+The OPTIONAL `gas` field configures the upper limit in Wasm gas that the executor SHOULD allow.
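+
+As a non-normative sketch of how the cascade resolves (the numbers here are arbitrary), a task-level `meta['ipvm/config']` overrides the workflow-level `defaults` field by field, and anything left unset falls back to the system defaults in the table above:
+
+``` json
+{
+  "workflow defaults":            {"gas": 1000, "time": [5, "minutes"]},
+  "task meta['ipvm/config']":     {"gas": 5000},
+  "effective task configuration": {"gas": 5000, "time": [5, "minutes"]}
+}
+```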
+
+## 3.2 IPLD Schema
+
+``` ipldsch
+type SystemConfig struct {
+  secret Boolean      (implicit False)
+  check  Verification (implicit Attestation)
+  gas    Integer      (implicit 1000)
+  time   optional TimeInterval
+  memory optional InfoSize
+  disk   optional InfoSize
+}
+```
+
+## 3.3 JSON Examples
+
+``` json
+{
+  "secret": true,
+  "check": {"optimistic": {"confirmations": 2, "referee": "did:key:zStEZpzSMtTt9k2vszgvCwF4fLQQSyA15W5AQ4z3AR6Bx4eFJ5crJFbuGxKmbma4"}},
+  "gas": 5000,
+  "time": [45, "minutes"],
+  "memory": [500, "kilo", "bytes"],
+  "disk": [20, "mega", "bytes"]
+}
+```
+
+# 4 Task Configuration
+
+> With hands of iron, there's not a task we couldn't do
+>
+> β€” [The Protomen](https://en.wikipedia.org/wiki/The_Protomen), The Good Doctor
+
+Tasks are the smallest level of work granularity in a workflow. Tasks describe everything required to negotiate and execute the work. IPVM Tasks are defined as a subtype of [UCAN Tasks](https://github.com/ucan-wg/invocation/blob/main/README.md#32-ipld-schema). Task types MAY require specific fields in the `inputs` field. Timeouts, gas, credits, transactional guarantees, result visibility, and so on MAY be separately configured in the `ipvm/config` field.
+
+Tasks MAY be configured in aggregate in the [global defaults](#215-defaults). Individual Task configuration MUST be embedded inside a [UCAN Action](https://github.com/ucan-wg/invocation)'s `meta['ipvm/config']` field.
+
+## 4.1 Fields
+
+Recall UCAN Invocation Tasks:
+
+| Field    | Type             | Description                                     | Required | Default |
+|----------|------------------|-------------------------------------------------|----------|---------|
+| `with`   | `URI`            |                                                 | Yes      |         |
+| `do`     | `Ability`        |                                                 | Yes      |         |
+| `inputs` | `Any`            |                                                 | Yes      |         |
+| `meta`   | `{String : Any}` | Fields that will be ignored during memoization  | No       | `{}`    |
+
+An OPTIONAL IPVM `Config` MAY be included at the `meta['ipvm/config']` path. The `meta` field SHOULD NOT be captured as part of task memoization, so this information will be omitted from the distributed invocation table. If included, the `Config` MUST set the IPVM configuration for this Task, overwriting any of the fields on the envelope's top-level `defaults` field, or system-wide defaults.
+
+## 4.3 JSON Examples
+
+``` json
+{
+  "simple": {
+    "with": "dns://example.com?TYPE=TXT",
+    "do": "crud/update",
+    "inputs": {
+      "value": "hello world"
+    },
+    "meta": {
+      "ipvm/config": {
+        "secret": false,
+        "time": [500, "milli", "seconds"],
+        "check": "attestation"
+      }
+    }
+  }
+}
+```
+
+``` js
+{
+  "some-wasm": {
+    "with": "wasm:1:Qm12345", // Or something... wasm:Qm12345?
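+    // NB: illustrative comments, not normative: "ipvm/run" invokes the exported
+    // function named in "func" below, and the third positional arg pipelines in
+    // the (hypothetical) "some-other-action" task's result via "ucan/promise".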
+    "do": "ipvm/run",
+    "inputs": {
+      "func": "calculate",
+      "args": [
+        1,
+        "hello world",
+        {"c": {"ucan/promise": ["/", "some-other-action"]}},
+        {"a": 1, "b": 2, "c": 3}
+      ]
+    },
+    "meta": {
+      "ipvm/config": {
+        "v": "0.1.0",
+        "secret": false,
+        "check": {
+          "optimistic": 17,
+          "referee": "did:key:zStEZpzSMtTt9k2vszgvCwF4fLQQSyA15W5AQ4z3AR6Bx4eFJ5crJFbuGxKmbma4"
+        }
+      }
+    }
+  }
+}
+```
+
+``` json
+{
+  "with": "ipfs://bafkreidvq3uqoxcxr44q5qhgdk5zk6jvziipyxguirqa6tkh5z5wtpesva",
+  "do": "docker/run",
+  "inputs": {
+    "func": "calculate",
+    "args": [
+      1,
+      "hello world",
+      {"c": {"ucan/promise": ["/", "some-other-action"]}},
+      {"a": 1, "b": 2, "c": 3}
+    ],
+    "container": {
+      "entry": "/",
+      "workdir": "/"
+    },
+    "env": {
+      "$FOO": "bar"
+    }
+  },
+  "meta": {
+    "ipvm/config": {
+      "v": "0.1.0",
+      "secret": false,
+      "check": {
+        "optimistic": 2,
+        "referee": "did:key:zStEZpzSMtTt9k2vszgvCwF4fLQQSyA15W5AQ4z3AR6Bx4eFJ5crJFbuGxKmbma4"
+      }
+    }
+  }
+}
+```
+
+# 5 Exception Handler
+
+If present, the OPTIONAL `catch` field MUST be run in response to a `Task` returning on the `Failure` branch. The deterministic & pure Wasm module MUST take a `Failure` object as input, and MUST return data in the following shape:
+
+``` ipldsch
+type Handle union {
+  | Success "ok"     -- End task with Success object
+  | String "rewire"  -- Task name inside the current Workflow
+  | String "msg"     -- Format the error message and panic
+} representation keyed
+```
+
+If the `msg` branch is returned, the invocation MUST immediately rethrow with the updated message.
+
+Note that while IPVM MUST treat the pure tasks together as transactional, it is not possible to roll back any destructive effects that have already been run. As such, it is RECOMMENDED to have few (if any) tasks depend on the output of a destructive effect, so they can be scheduled at the end of the workflow.
+
+# 6 Receipt Output
+
+| Field  | Type            | Description                                                                   | Required | Default |
+|--------|-----------------|-------------------------------------------------------------------------------|----------|---------|
+| `inv`  | `&Invocation`   | CID of the Invocation that generated this response                            | Yes      |         |
+| `out`  | `{String: Any}` | The results of each call, keyed by the task's label. MAY contain sub-receipts. 
| Yes      |         |
+| `meta` | `Any`           | Non-normative extended fields                                                 | No       | `null`  |
+
+If the `catch` field is set on the outer `Workflow`, the `out` field MAY include the output under the `ipvm/catch` key.
+
+# 7 Appendix
+
+## 7.1 Support Types
+
+``` ipldsch
+type TimeUnit enum {
+  | Seconds
+  | Minutes
+  | Days
+  | Weeks
+  | Years
+}
+
+type InfoUnit enum {
+  | Bits
+  | Nibble
+  | Bytes
+}
+
+type Unit union {
+  | TimeUnit
+  | InfoUnit
+}
+
+type SubPrefix enum {
+  | Pico  "p"
+  | Nano  "n"
+  | Micro "u"
+  | Milli "m"
+  | Centi "c"
+  | Deci  "d"
+}
+
+type SuperPrefix enum {
+  | Deca  "da"
+  | Hecto "h"
+  | Kilo  "k"
+  | Mega  "M"
+  | Giga  "G"
+  | Tera  "T"
+  | Peta  "P"
+  | Exa   "E"
+}
+
+type SIPrefix union {
+  | SubPrefix
+  | SuperPrefix
+}
+
+type TimeInterval struct {
+  magnitude Integer
+  prefix    optional SIPrefix
+  unit      TimeUnit
+} representation tuple
+
+type InfoSize struct {
+  magnitude Integer
+  prefix    optional SIPrefix
+  unit      InfoUnit
+} representation tuple
+
+type Measure union {
+  | TimeInterval
+  | InfoSize
+}
+```
+
+### 7.1.1 JSON Examples
+
+``` json
+[400, "nano", "seconds"]
+[5, "seconds"]
+[378, "exa", "bytes"]
+```
+
+# 8 Related Work and Prior Art
+
+The [Bacalhau Job (Alpha)](https://github.com/filecoin-project/bacalhau/blob/8568239299b5881bc90e3d6be2c9aa06c0cb3936/pkg/model/job.go#L113-L126) spec is a complete runner spec for Docker, Wasm, and Python source. At time of writing, it runs on a volunteer network, and has plans to integrate an authority layer.
+
+BucketVM and [`w3-machines`](https://github.com/web3-storage) are two approaches from [DAG House](https://dag.house) to extend UCAN to invocations and workflows. At time of writing, both approaches are focused on invocation inside a cloud microservice deployment. Configuration is not required, as jobs are not negotiated.
+
+[Cloud Native Buildpacks](https://buildpacks.io/) are descriptions of an environment that stack together. They output an [OCI](https://opencontainers.org/) container.
+
+[GitHub Workflows](https://docs.github.com/en/actions/using-workflows) is a configuration to run one or more jobs, hooked into events on their platform. Workflows can be composed out of discrete actions or from other workflows.
+
+[Lambda Workflows](https://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dg-create-workflow.html) are a serverless workflow layer built on top of AWS, and thus integrate with their other offerings like IAM, S3, payments, and so on.
+
+[Project Naiad](https://www.microsoft.com/en-us/research/video/introducing-project-naiad-and-differential-dataflow/) and its lineage (e.g. [Timely Dataflow](https://timelydataflow.github.io/timely-dataflow/), [Differential Dataflow](https://timelydataflow.github.io/differential-dataflow/)) offer an extremely powerful dataflow model, including differential updates, control flow cycles, and long-running processes, but in a trusted environment. Such features could be supported for a subset of task types in IPVM in the future.
+
+[WarpForge Formulas](https://github.com/warptools/warpforge/blob/master/examples/100-formula-parse/example-formulas.md) describe how to reproducibly build and cache packages. The functionality is a specialization of IPVM workflows, and may be configurable with IPVM in the future.
+ +# 9 Acknowledgments + +[Luke Marsden](https://github.com/lukemarsden) for a long fateful discussion while [stuck on a tarmac](https://www.theguardian.com/world/2022/nov/04/spanish-airspace-partially-closed-as-chinese-rocket-debris-falls-to-earth) about how to make IPVM and Bacalhau work more closely together. + +Thanks to [James Walker](https://github.com/walkah) for helping draw parallels between CIDs, IPLD, and raw bytes for promises in support of complex data pipelines. + +Many thanks to [Quinn Wilton](https://github.com/QuinnWilton) for her review of the spec, suggesting terms that would be most familiar to developers, talking through how to make the exception handling useful-but-safe in a static workflow, and suggesting further work involving fixed point computation. + +Many thanks to [Irakli Gozalishvili](https://github.com/Gozala) for the long discussions about invocation needs at [DAG House](https://dag.house), keeping the conversation grounded in a capabilities worldview, and for the many, many comments on various iterations of this spec across two repos. + +Thanks to [Blaine Cook](https://github.com/blaine) for several conversations about developer expectations and effect systems.