Major refactoring of the OCA Specification #86
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,8 +6,14 @@ description: Official OCA specification | |
# OCA Technical Specification | ||
|
||
<dl> | ||
<dt> | ||
Version: | ||
</dt> | ||
<dd> | ||
v1.0.2 | ||
</dd> | ||
<dt> | ||
Latest published version: | ||
Latest published version: | ||
</dt> | ||
<dd> | ||
|
||
|
@@ -826,41 +832,150 @@ _Example 19. Code snippet for a Sensitive Overlay_ | |
|
||
### Bundle | ||
|
||
An OCA Bundle contains a set of OCA objects consisting of a Capture Base and bound Overlays. An encoded cryptographic digest of the contained objects produces a deterministic identifier for the bundle. | ||
An OCA Bundle is a set of OCA objects which MUST include a `Capture Base` and MAY include any number of `Overlays`. An encoded cryptographic digest of the contained objects produces a | ||
deterministic identifier for the bundle. | ||
|
||
The following object types are REQUIRED in any OCA bundle to preserve the minimum amount of structural, definitional, and contextual information to capture the meaning of inputted data. | ||
#### Canonical form | ||
|
||
- Capture base | ||
- Character encoding overlay | ||
- Format overlay | ||
OCA Bundles MUST be serializable to be transferred over the network. The | ||
serialization algorithm MUST be deterministic and operate on the canonical form | ||
of the Bundle, which ensures proper ordering of the attributes within OCA | ||
Objects. The serialization algorithm consists of the following rules: | ||
|
||
The cardinality of several overlay types, particularly the language-specific ones (Entry, Information, Label, and Meta), can be multiple depending on the number of defined supported languages. | ||
- MUST consist of the following attributes in this order: `v`, `d`, `capture_base`, `overlays` | ||
- `v` - version string defined per section [Bundle Version](#bundle-version) | ||
- `d` - deterministic identifier of the bundle | ||
- `capture_base` - the `Capture Base` object defined as per section [Capture Base](#capture-base) | ||
- `overlays` - an array containing all the overlays, sorted in ascending order by the `d` attribute | ||
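The ordering rules above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the function names `canonical_bundle` and `serialize_bundle` are hypothetical, the compact JSON output (no insignificant whitespace) is an assumption that the text above does not state, and only the top-level attribute order and the overlay sort are handled. Attribute ordering inside the Capture Base and inside each Overlay is left untouched.

```python
import json

# Top-level attribute order required by the canonical form.
BUNDLE_KEY_ORDER = ("v", "d", "capture_base", "overlays")


def canonical_bundle(bundle: dict) -> dict:
    """Return a copy with top-level keys in canonical order and
    overlays sorted ascending by their `d` attribute."""
    missing = [key for key in BUNDLE_KEY_ORDER if key not in bundle]
    if missing:
        raise ValueError(f"bundle is missing attributes: {missing}")
    # Python dicts keep insertion order, so building the dict in the
    # required order is enough to fix the order in the JSON output.
    ordered = {key: bundle[key] for key in BUNDLE_KEY_ORDER}
    ordered["overlays"] = sorted(bundle["overlays"], key=lambda overlay: overlay["d"])
    return ordered


def serialize_bundle(bundle: dict) -> str:
    """Serialize the canonical form.

    Assumption: compact separators and raw UTF-8 output; the section
    above does not specify whitespace or escaping rules.
    """
    return json.dumps(canonical_bundle(bundle), separators=(",", ":"), ensure_ascii=False)
```

Because the function only reorders and sorts, calling it on an already canonical bundle returns an equal structure, so it can be applied defensively before hashing or transfer.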
|
||
##### Bundle Version | ||
|
||
To ensure proper versioning and identification of bundles within the OCA | ||
Specification, we define a standardized string format for the bundle version. | ||
This format encodes critical metadata about the bundle, allowing for consistent | ||
interpretation and management across implementations. | ||
|
||
*Bundle Version String Format* | ||
|
||
The bundle version string must adhere to the following format: | ||
|
||
``` | ||
EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis.json | ||
├── E3SAKe0z83pfBnhhcZl19PGGKBheb35WeCJ3V6RdqwY8.json | ||
├── Ejx0o0yuwp99vi0V-ssP6URZIXRMGj1oNKIZ1BXi4sHU.json | ||
├── EZv1B5nNl4Rty8CXFTALhr8T6qXeO0CcKliM03sdrkRA.json | ||
├── Eri3NLi1fr4QrKoFfTlK31KvWpwrSgGaZ0LLuWYQaZfI.json | ||
├── EY0UZ8aYAPusaWk_TON8c20gHth2tvZs4eWh7XAfXBcY.json | ||
├── E1mqEb4f6eOMgu5zR857WWlMUwGYwPzZgiM6sWRZkQ0M.json | ||
├── ESEMKWoKKIf5qvngKecV-ei8MwcQc_pPWCH1FrTWajAM.json | ||
├── EyzKEWuMs8kspj4r70_Lc8sdppnDx-hb9QqUQywjmDRY.json | ||
├── EIGknekgJFqjgQ8ah2NwL8zNWbFrllvXVLqezgB6U3Yg.json | ||
├── EgBxL29VsxoZso7YFirlMP334ZuC1mkel-lO7TxPxEq8.json | ||
├── ED9PH0ZBaOci-nbnYfPgYZWGQdkyWxA-nW3REmB3vhu0.json | ||
├── ElJEQGfAvfJEuB7JeNIcvmAPO2DIOaKkpkZyvxO-gQoc.json | ||
├── EpW9bQGs0Lk6k5cJikN0Ep-DN6z29fwZIsbVzMBgTlWY.json | ||
├── EIGj0LQKT9-6gCLV2QZVgi4YQZhrUl0-GKbN7sFTCSAI.json | ||
├── EHDwC_Ucuttrsxh2NVptgBnyG4EMbG5D8QsdbeF9G9-M.json | ||
└── meta.json | ||
OCAS<major><minor><format><size>_ | ||
Why is the size calculated? It's orthogonal to versioning? I can't think of a strong motivation to include it at this layer.

TL;DR: this is for effective streaming. If you go beyond HTTP protocol and merely focus on streams of bytes, consuming the whole chunk (Bundle) out of a stream is simply taking the FWIW, the

Streaming = a concern for the messaging layer, not for the application layer.

And that's why we have Bundles. If there's no need for exchange, there's no need for a Bundle concept.

@blelump I'm confused by your response. Message streaming information belongs inside an exchange packet, not inside a schema. As it stands, the "version" format (e.g., OCA is solely for defining passive objects, nothing else. It is not a messaging protocol. Messaging should be defined in exchange packets, not in the data schema itself. If you follow the Informatics Domain Model. This separation is clearly defined: Capture = Objects = Schema

Yes, see below for further explanation.
Yeah, we'd need to add this information.
Thanks to the Bundle canonical form, we know where to look for specific bytes. We specifically know where to look for
This is precisely what we're doing when applying CESR, that is, suffixing the JSON with a sophistically structured text that at first glance looks like garbage. Adding layering here in the context of other components we use the same way and join them, that is:

My $0.02CDN. I’d definitely like to leave off the size of the bundle in the version as it is a pain. Doable if the calculation is well-defined, but annoying at the application layer. I agree that if anyone wants to stream OCA data (which really doesn’t make sense to me), they are welcome to do that by putting a minimal wrapper / prefix that has the size. But it should be outside of the OCA specification. I definitely agree that a

The Informatics Domain Model (IDM) should be the blueprint for data-centric modeling, not the OCA Bundle itself. The OCA Bundle belongs strictly in the Blurring these domain boundaries creates two major issues:
Each domain supports a distinct type of search: a.) Attribute search (Object) → Finds structural attributes in an OCA Bundle.
Embedding value-based search parameters in the OCA Bundle mixes passive structure (attributes) with active mechanics (values), making searches imprecise.
Keeping domains separate ensures granular access control: In the case of the two domains in question (i.e., Object & Action) ... If message metadata is stored inside the OCA Bundle, Schema Guardians would have access to exchange intelligence, violating need-to-know governance. My suggestion would be to use an envelope for message/transmission metadata, and remove the "v" attribute (Versioning, Encoding Format & Message Size) from the OCA core specification. This would ensure:

@swcurran major and minor on this level of specification of data containers are likely to be what change in practice? In essence, you don't patch smth as critical as Bundle. It always has an impact. How about we make it optional? I mean the Such a change shall make both worlds happy. cc: @ryanbnl

That's an esoteric specification which, given his it makes very specific demands - requiring specific metadata to be added to a message payload - appears to break the fundamental principle of separation of concerns. OCA is a building block and as such we should not be making assumptions on usage. That means that we can't add metadata (the size) which is only relevant to a specific niche use-case. It's bad design. CESR looks at first glance to be like HL7 v2 and that was a disaster.
||
``` | ||
|
||
_Example 20. A representation of an OCA Bundle as a ZIP file containing a Capture Base (first row), multiple Overlays, and a metafile (meta.json) that provides key-value mappings between the file names and the names of the OCA object types. Apart from the metafile, each file name directly represents the encoded cryptographic digest of the file._ | ||
Where: | ||
|
||
See [ Appendix A ](#appendix-a-an-example-of-metafile-content) for more information on the content of a metafile (`meta.json` in the above example). | ||
- `OCAS`: A fixed prefix indicating "OCA Structure". This identifies the string as conforming to the OCA Specification's versioning scheme. | ||
- `<major>`: A single-digit integer (0-9) representing the major version of the specification. A change in the major version indicates backward-incompatible updates to the structure. | ||
- `<minor>`: A single-digit integer (0-9) representing the minor version of the specification. A change in the minor version indicates backward-compatible updates. | ||
- `<format>`: A string denoting the serialization format of the bundle. The only supported format is `JSON` (JavaScript Object Notation). | ||
- `<size>`: A six-digit, zero-padded integer representing the size of the object in hex notation. The size is calculated with the `d` field filled with dummy characters of the same length as the eventual derived value. The dummy character is `#`, that is, ASCII 35 decimal (23 hex). | ||
- `_`: A version string terminator. | ||
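As an illustration of the format described above, the following Python sketch splits a version string into its fields with a regular expression. The names `VERSION_RE`, `BundleVersion`, and `parse_bundle_version` are hypothetical. Accepting both upper-case and lower-case hex digits in `<size>` is an assumption, since the text does not say which case is used, so the size field is returned as the raw six-character string rather than being interpreted.

```python
import re
from typing import NamedTuple

# OCAS<major><minor><format><size>_ with JSON as the only supported format.
VERSION_RE = re.compile(
    r"OCAS(?P<major>[0-9])(?P<minor>[0-9])(?P<format>JSON)(?P<size>[0-9a-fA-F]{6})_"
)


class BundleVersion(NamedTuple):
    major: int
    minor: int
    serialization: str
    size_field: str  # raw six-character size field, not yet interpreted


def parse_bundle_version(value: str) -> BundleVersion:
    """Parse a bundle version string or raise ValueError if it is malformed."""
    match = VERSION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid bundle version string: {value!r}")
    return BundleVersion(
        major=int(match["major"]),
        minor=int(match["minor"]),
        serialization=match["format"],
        size_field=match["size"],
    )
```

Under the hex reading given in the `<size>` definition, a caller would obtain the byte count with `int(version.size_field, 16)`.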
|
||
*Example*: | ||
|
||
A valid bundle version string: | ||
``` | ||
OCAS11JSON000646_ | ||
``` | ||
|
||
This indicates: | ||
- `OCAS`: it is an OCA Bundle. | ||
- The major version is 1. | ||
- The minor version is 1. | ||
- The serialization format is JSON. | ||
- The object size in base64 encoding is 646 bytes. | ||
|
||
*Validation* | ||
|
||
Consumers of the OCA Specification must implement validation logic to ensure the bundle version string: | ||
- Matches the defined format and structure. | ||
- Uses only supported serialization formats. | ||
- Accurately represents the object's size in base64 encoding. | ||
|
||
Validation failure must result in the rejection of the bundle as non-compliant with the specification. | ||
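The size check could look roughly like the Python sketch below. It rests on several assumptions that the text does not settle: the bundle is serialized as compact UTF-8 JSON, only the top-level `d` is replaced by `#` characters, the `<size>` field is read as hexadecimal (following the field definition), and the helper names `expected_size` and `size_matches` are hypothetical. Because the version string has a fixed width, filling in the size field does not change the measured length.

```python
import json

DUMMY_CHAR = "#"  # ASCII 35, the placeholder named in the <size> definition


def expected_size(bundle: dict) -> int:
    """Byte length of the bundle with `d` replaced by dummy characters.

    Assumption: compact JSON encoded as UTF-8.
    """
    placeholder = dict(bundle)  # shallow copy; only the top-level `d` changes
    placeholder["d"] = DUMMY_CHAR * len(bundle["d"])
    encoded = json.dumps(placeholder, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def size_matches(bundle: dict) -> bool:
    """Compare the six-character size field in `v` with the measured size.

    Assumption: the size field is hexadecimal.
    """
    size_field = bundle["v"][-7:-1]  # six characters before the trailing "_"
    return int(size_field, 16) == expected_size(bundle)
```

A consumer would run this after checking the overall shape of the version string and reject the bundle when `size_matches` returns `False`.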
This is a big concern of mine, as I’ve expressed multiple times. Once more.

Missing from my comment above is the (I think unnecessary) calculation of the length of the OCA Bundle before calculating the the
If any consumer of an OCA Bundle cares, they would need to repeat the length calculation and verify it against the length. They are unlikely to do that, because the

Line 1043 (section: Deterministic Identifier) does exactly that. Is that not clear enough, or is anything missing?

The point of SAID is to not enforce any algorithms, since they can be use-case specific or may need to be rotated at any point in time. OCA should not enforce that for every use case. It is up to the ecosystem creator to decide what they want to use. For example, if you are creating a verifiable credential ecosystem, you can agree within the ecosystem to use only sha-256 (maybe you need something NIST approved), whereas a medical care use case that needs to run on IoT devices with constrained resources would pick blake3 for practical reasons. Remember, OCA is not about the use case; it is about meta semantics, which allow others to build their own use cases.

This means that there must be a spec on top of OCA for every community to use OCA, so that consumers know what cryptography they will need to include to be able to use the OCA. Since only two things need to be defined (hashing and encoding), I think it is very reasonable to pick, in the OCA spec, a finite number of options for those things: ideally just 1 for each, but several choices is fine. Use cases will not be impacted by those selections, but all implementations will be MUCH easier with those choices made. With no guardrails, a consumer has to assume a producer could use anything, or just hope that they pick the right ones.

Thanks for pointing me to Line 1043; added a comment there. Very happy to see it. Just to link what I said to this: the spec does limit the hash and encoding algorithms used per the Deterministic IDs. The permitted hash algorithms are in the CESR spec (bad idea: just put them here, even if they are the same list), and only one encoding algorithm is allowed (with no multibase to allow for a future change…).
Good stuff!
||
|
||
*Example*: | ||
TODO update example | ||
``` | ||
{ | ||
"bundle": { | ||
"v": "OCAS11JSON000646_", | ||
"d": "EKHBds6myKVIsQuT7Zr23M8Xk_gwq-2SaDRUprvqOXxa", | ||
"capture_base": { | ||
"d": "EBnF9U9XW1EqteIW0ucAR4CsTUqojvfIWkeifsLRuOUW", | ||
"type": "spec/capture_base/1.0", | ||
"attributes": { | ||
"d": "Text", | ||
"i": "Text", | ||
"passed": "Boolean" | ||
}, | ||
"classification": "" | ||
}, | ||
"overlays": { | ||
"character_encoding": { | ||
"d": "ED6Eio9KG2jHdFg3gXQpc0PX2xEI7aHnGDOpjU6VBfjs", | ||
"capture_base": "EBnF9U9XW1EqteIW0ucAR4CsTUqojvfIWkeifsLRuOUW", | ||
"type": "spec/overlays/character_encoding/1.0", | ||
"attribute_character_encoding": { | ||
"d": "utf-8", | ||
"i": "utf-8", | ||
"passed": "utf-8" | ||
} | ||
}, | ||
"conformance": { | ||
"d": "EJSRe8DnLonKf6GVT_bC1QHoY0lQOG6-ldqxu7pqVCU8", | ||
"capture_base": "EBnF9U9XW1EqteIW0ucAR4CsTUqojvfIWkeifsLRuOUW", | ||
"type": "spec/overlays/conformance/1.0", | ||
"attribute_conformance": { | ||
"d": "M", | ||
"i": "M", | ||
"passed": "M" | ||
} | ||
}, | ||
"information": [ | ||
{ | ||
"d": "EIBXpVvka3_4lheeajtitiafIP78Ig8LDMVX9dXpCC2l", | ||
"capture_base": "EBnF9U9XW1EqteIW0ucAR4CsTUqojvfIWkeifsLRuOUW", | ||
"type": "spec/overlays/information/1.0", | ||
"language": "eng", | ||
"attribute_information": { | ||
"d": "Schema digest", | ||
"i": "Credential Issuee", | ||
"passed": "Enables or disables passing" | ||
} | ||
} | ||
], | ||
"label": [ | ||
{ | ||
"d": "ECZc26INzjxVbNo7-hln6xN3HW3e1r6NGDmA5ogRo6ef", | ||
"capture_base": "EBnF9U9XW1EqteIW0ucAR4CsTUqojvfIWkeifsLRuOUW", | ||
"type": "spec/overlays/label/1.0", | ||
"language": "eng", | ||
"attribute_categories": [], | ||
"attribute_labels": { | ||
"d": "Schema digest", | ||
"i": "Credential Issuee", | ||
"passed": "Passed" | ||
}, | ||
"category_labels": {} | ||
} | ||
], | ||
"meta": [ | ||
{ | ||
"d": "EOxvie-zslkGmFzVqYAzTVtO7RyFXAG8aCqE0OougnGV", | ||
"capture_base": "EBnF9U9XW1EqteIW0ucAR4CsTUqojvfIWkeifsLRuOUW", | ||
"type": "spec/overlays/meta/1.0", | ||
"language": "eng", | ||
"description": "Entrance credential", | ||
"name": "Entrance credential" | ||
} | ||
] | ||
} | ||
} | ||
} | ||
``` | ||
_Example 20. Code snippet for an OCA Bundle._ | ||
|
||
If well-structured, the metadata in an OCA bundle can facilitate many ways for users to search for information, present results, and even manipulate and present information objects without compromising their integrity. | ||
|
||
### Code Tables | ||
|
||
|
@@ -1125,7 +1240,7 @@ Internet Assigned Numbers Authority (IANA) [https://www.iana.org/](https://www.i | |
</dd> | ||
|
||
<dt id="ref-ICAO"> | ||
[ICAO] | ||
[ICAO] | ||
</dt> | ||
<dd> | ||
|
||
|
@@ -1325,29 +1440,3 @@ United Nations. Sustainable Development Goals (SDGs) [https://sdgs.un.org/goals] | |
</div> | ||
|
||
## Appendices | ||
|
||
### Appendix A. An example of Metafile content | ||
|
||
```json | ||
{ | ||
"files": { | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] character_encoding": "E3SAKe0z83pfBnhhcZl19PGGKBheb35WeCJ3V6RdqwY8", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] conditional": "Ejx0o0yuwp99vi0V-ssP6URZIXRMGj1oNKIZ1BXi4sHU", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] conformance": "EZv1B5nNl4Rty8CXFTALhr8T6qXeO0CcKliM03sdrkRA", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] entry (en)": "Eri3NLi1fr4QrKoFfTlK31KvWpwrSgGaZ0LLuWYQaZfI", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] entry (fr)": "EY0UZ8aYAPusaWk_TON8c20gHth2tvZs4eWh7XAfXBcY", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] entry_code": "E1mqEb4f6eOMgu5zR857WWlMUwGYwPzZgiM6sWRZkQ0M", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] format": "ESEMKWoKKIf5qvngKecV-ei8MwcQc_pPWCH1FrTWajAM", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] information (en)": "EyzKEWuMs8kspj4r70_Lc8sdppnDx-hb9QqUQywjmDRY", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] information (fr)": "EIGknekgJFqjgQ8ah2NwL8zNWbFrllvXVLqezgB6U3Yg", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] label (en)": "EgBxL29VsxoZso7YFirlMP334ZuC1mkel-lO7TxPxEq8", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] label (fr)": "ED9PH0ZBaOci-nbnYfPgYZWGQdkyWxA-nW3REmB3vhu0", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] layout": "ElJEQGfAvfJEuB7JeNIcvmAPO2DIOaKkpkZyvxO-gQoc", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] meta (en)": "EpW9bQGs0Lk6k5cJikN0Ep-DN6z29fwZIsbVzMBgTlWY", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] meta (fr)": "EIGj0LQKT9-6gCLV2QZVgi4YQZhrUl0-GKbN7sFTCSAI", | ||
"[EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis] unit": "EHDwC_Ucuttrsxh2NVptgBnyG4EMbG5D8QsdbeF9G9-M", | ||
"capture_base-0": "EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis" | ||
}, | ||
"root": "EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis" | ||
} | ||
``` |
Why does it have to be serializable? During the calculation of the digests, an interim, deterministic form of the data being hashed needs to be created, but that is not a reason to canonicalize the “at rest” representation of the Bundle. Much better to say that the ordering of items SHOULD NOT be relied upon. It is fighting against nature to try to force an ordering on moving data.

Not sure if I follow your question: you start with `serialization` and then you are speaking about `ordering` and SHOULD NOT be relied upon. If you could elaborate a bit, that would be helpful.
Generally, serialization (with specific ordering) is required to calculate the hash. As soon as that is done, the format in which you present, store, or move the bundle does not matter, as long as there is a clear way to convert it (serialize it) to the form on which you can validate the hash. And this is what the spec describes: it tells you how the serialized version should look and in which order the attributes should be, to make sure that the hash can be calculated in a deterministic way.

My main question is: why is the statement in the spec? We know they are JSON objects. I assume the statement is there for some reason and I’d like to know what it is. Can it be removed from the spec?
I should have left it at that. I was guessing on the answer, but I should wait to hear the answer.