Major refactoring of the OCA Specification #86

mitfik · 2025-02-08T19:23:04Z

This is a major update of the specification that brings clarity and structure.

Below is a summary of the major changes, which should be discussed and reviewed with care:

Features:

Introduce community overlays
New representation of the Bundle (as single JSON file instead of zip file)
Allow for linking overlays to other overlays (previously each overlay needed to be linked to capture base only)

Others

Sensitive overlay takes over PII flagging from the capture base
Remove categories from Label overlay (and move it to presentation layer)
"Remove" of overlays - all overlays which were removed are nominated as community overlays and would be hosted in separate repository (Overlays Registry) this would allow
Introduce OCA WG as a new governance of the specification
Editorial changes and updating all code snippets

- add missing `d` field
- remove PII and classification which would be introduced as separate overlays

Allow linking an overlay to another overlay instead of only to the capture base. This provides a mechanism for linking an entry overlay to an entry codes overlay, ensuring consistency.

Without additional references, the model does not bring any value for an external reader. The model should be presented as an additional resource.

Remove information overlay in favour of presentation layer. See ORFg findings.

Move transformation overlay to community overlays.

It is not needed. We may add some explanation of how the implementation handles time internally, but that should not be relevant for the specification.

- add example
- update 639 iso link to point to latest spec version
- enforce 639-3 as the main codes for languages in overlays
- clean dead references from ISO which are not used

Categories are part of the presentation layer and should not be part of the label overlay.

Signed-off-by: Robert Mitwicki <[email protected]>

mitfik · 2025-02-09T09:11:24Z

What is the current status of both mapping overlays (attribute-to-attribute transformation) and framing overlays (attribute-to-term contextualization)? Should these be implemented as community overlays?

Attribute mapping is still part of the core spec. It is relatively simple and commonly used, but there was a proposal to move it out as a community overlay; it would be really good if the core spec were kept very simple and lightweight.
I would vote to move it out as a community overlay as well.

Framing, on the other hand, is quite extensive. I would propose making it a community overlay right away to keep the core spec very light. It is detailed functionality consisting of multiple functions that would significantly increase the complexity of the core spec, and a lot needs to be explained and addressed for people to understand it clearly.

For the community overlays, I started preparing an overlays registry, which is a proposal for where to store all the community overlays:

https://github.com/the-human-colossus-foundation/overlays-repository/

It is still a draft but ready to be worked on. Open to suggestions and proposals.

I would suggest getting that ready before we merge this, so we already have a solid place for community overlays.

carlyh-micb

Currently the technology of OCA-repo doesn't allow .'s in the attribute name.
If code is king then this should be documented in this section.
If not, then OCA-repo should be adjusted to allow .'s, or OCA-repo should document its deviation from this requirement.
However, I would strongly support allowing .'s into attribute names as it is described here "The string can be any valid Unicode code point." (technically other symbols such as , / \ = etc. can also be expressed as a valid Unicode code point).

carlyh-micb · 2025-02-10T15:35:52Z

docs/specification/README.md

+distill the most relevant aspects of SAIDs in the context of the OCA
+specification.
+
+#### How to calculate SAID:


The simplified concept of calculating a SAID.

Refer to the CESR specification for complete details of SAID calculation to ensure correct SAID calculations. The summary steps described here are insufficient to correctly calculate a SAID.

carlyh-micb

For the SAID calculations there was quite a long discovery and confirmation that this method does not work. Conceptually this is the idea but following it does not result in verifying a calculated SAID.
#58

From Kent Bull: https://kentbull.com/2024/09/22/keri-series-understanding-self-addressing-identifiers-said/

Bit boundaries and the choice of alphabet are also needed to get the correct SAID.

pknowl · 2025-02-11T15:26:41Z

In an OCA Bundle, can we change "d" to "digest"?

It looks strange to have mixed attribute naming methodology in the bundle.

carlyh-micb · 2025-02-11T15:35:26Z

For the standard overlay, the current standard has limitations. More information, such as that provided below, would make this standard overlay much more useful. You can include links that machines can read and follow, and be more specific about versions etc.

      "standard_id": "https://doi.org/10.1515/iupac",
      "standard_label": "IUPAC nomenclature",
      "standard_location": "https://iupac.org/what-we-do/nomenclature/",
      "standard_version": ""

ryanbnl · 2025-02-11T15:43:35Z

docs/specification/README.md

- Character encoding overlay
- Format overlay
+```
+OCAS<major><minor><format><size>_


Why is the size calculated? It's orthogonal to versioning? I can't think of a strong motivation to include it at this layer.

TL;DR: this is for effective streaming.

If you go beyond HTTP protocol and merely focus on streams of bytes, consuming the whole chunk (Bundle) out of a stream is simply taking the <size> of bytes off the stream, effectively enabling the transfer of the Bundles over the wire along with other chunks. Furthermore, because we precisely know where to look for particular information in the stream (that's why OCA always had custom canonical form and is not RFC 8785 compliant — Bundle JSON starts with the v attribute), we can immediately decide which parser can handle this chunk. In this case, OCAS<major><minor><format> enables us to unambiguously apply the appropriate parser for further handling this chunk.

FWIW, the <size> is CESR-Base64 encoded.

Streaming = a concern for the messaging layer, not for the application layer.

And that's why we have Bundles. If there's no need for exchange, there's no need for a Bundle concept.

@blelump I'm confused by your response. Message streaming information belongs inside an exchange packet, not inside a schema. As it stands, the "version" format (e.g., "v": "OCAA11JSON00714b_") contains the byte size of the messaging stream (OCAS<major><minor><format><size>_). This is in the wrong place.

OCA is solely for defining passive objects, nothing else. It is not a messaging protocol. Messaging should be defined in exchange packets, not in the data schema itself.

If you follow the Informatics Domain Model, this separation is clearly defined:
https://zenodo.org/records/14525852

Capture = Objects = Schema
Exchange = Actions = Packet

OCA is a messaging protocol? In your vision then - schema bundles only exist at the time of transit? There will never be a need for a library of schema bundles? That bundles wouldn't be stored next to data to help describe that data as it is stored? OCA as you envision it exists only for transit? And once a schema with data reaches its destination then is turned into something else?

@carlyh-micb The Bundle concept exists because of the need for exchange. It couples all the tiny pieces, CB and a set of overlays, into one whole that the Bundle author found cohesive and exhaustive.

Bundle became a first-class citizen in the OCA ecosystem precisely because of the data that flows. Bundles and OCA aren't needed if the data doesn't flow. In essence, if there's some data that is stored on a local computer and this is the only copy, OCA is not needed because this data doesn't flow.

In data-that-flow use cases, using or keeping the Bundles by the data receiver is perfectly fine.

I hope the reasoning doesn't require further elaboration and is now clear, especially regarding the need for a Bundle and its lifecycle.

@pknowl the term messaging protocol is quite broad—could you clarify what aspect you're referring to?

As explained above, a Bundle exists merely for the exchange. It's a data container that is well-defined within the protocol for exchange purposes. Specifically, because the OCA brings additional value merely when data flows, Bundle, as the enabler for proper flow, became a first-class citizen in the protocol (defined in the spec).

OCAS<major><minor><format><size>_ is part of this data container so that, when reading it, we can unambiguously find out which container variant we're dealing with. Versioning data containers enables proper parsing and backward compatibility and is no different from versioning any other representation or file format. It concerns the OCAS<major><minor> part. The <format><size> is then appended to it (in the above example, it is JSON00714b) to find the representation and message size unambiguously. This information is valuable when Bundle exchanges through a continuous stream of bytes type of protocols instead of discrete messages, which is characteristic of the HTTP protocol.
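To make the field layout being debated concrete, here is a minimal sketch that splits a `v` value into its parts and peeks at it from the start of a byte stream. The field widths (4-character protocol tag, one character each for major and minor, 4-character format, 6-character size) are assumptions inferred from the single example `OCAA11JSON00714b_` quoted in this thread, and the assumption that the serialization begins with exactly `{"v":"` is likewise illustrative. The size field is returned as raw text and deliberately not decoded.

```python
import re
from typing import NamedTuple


class VersionString(NamedTuple):
    protocol: str  # e.g. "OCAA"
    major: str
    minor: str
    fmt: str       # serialization format, e.g. "JSON"
    size: str      # raw size field, left undecoded


# Assumed layout, inferred from the one example in the thread.
_PATTERN = re.compile(r"^(OCA[A-Z])(\d)(\d)([A-Z]{4})([0-9A-Za-z_-]{6})_$")
_HEAD = b'{"v":"'   # assumed canonical opening bytes of a Bundle
_FIELD_LEN = 17     # total length of the version string under these assumptions


def parse_version(text: str) -> VersionString:
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognised version string: {text!r}")
    return VersionString(*match.groups())


def peek_version(buf: bytes) -> VersionString:
    """Read the version string at a fixed offset without parsing the JSON."""
    if not buf.startswith(_HEAD):
        raise ValueError("buffer does not start with the assumed canonical form")
    field = buf[len(_HEAD):len(_HEAD) + _FIELD_LEN].decode("ascii")
    return parse_version(field)
```

For example, `peek_version(b'{"v":"OCAA11JSON00714b_", ...')` yields the format `JSON` and the raw size field `00714b` without deserializing the rest of the payload, which is the fixed-offset property the canonical form is meant to provide.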

Therefore, from the Bundle receiver perspective, we use the v attribute to narrow the context specifically to avoid any ambiguity on how to read this Bundle.

Finally, it can be tempting to separate Bundle, a distinct concept serving as a data container for the exchange, from the core spec. However, due to OCA's inherent nature and applicability in ecosystems where data flows, Bundle is a first-class citizen and part of the core spec.

This information is valuable when Bundle exchanges through a continuous stream of bytes type of protocols instead of discrete messages, which is characteristic of the HTTP protocol.

You want to use the Bundle as a wire format? To make that actually viable the JSON serialization and encoding must be specified in detail. We haven't even specified that the Bundle itself needs to be encoded as UTF-8, let alone the subset (with/without BOM), the role of spacing, line endings etc.

Given the current specification, you have to parse the JSON itself in order to extract the value of "v" and thus do anything useful with the length. If we do what everyone else does and implement this on a different layer, then you can do things like reserve the first N bytes for this metadata, which enables a lot of fun stuff. I've even seen people do this by prefixing the JSON with a 16-character string.

This information is valuable when Bundle exchanges through a continuous stream of bytes type of protocols instead of discrete messages, which is characteristic of the HTTP protocol.

You want to use the Bundle as a wire format?

Yes, see below for further explanation.

To make that actually viable the JSON serialization and encoding must be specified in detail. We haven't even specified that the Bundle itself needs to be encoded as UTF-8, let alone the subset (with/without BOM), the role of spacing, line endings etc.

Yeah, we'd need to add this information.

Given the current specification, you have to parse the JSON itself in order to extract the value of "v" and thus do anything useful with the length.

Thanks to the Bundle canonical form, we know where to look for specific bytes. We specifically know where to look for <format><size> counting from the start of the stream. Therefore, we don't need to deserialize the potentially valid JSON string to extract v.

If we do what everyone else does and implement this on a different layer, then you can do things like reserve the first N bytes for this metadata, which enables a lot of fun stuff. I've even seen people do this by prefixing the JSON with a 16-character string.

This is precisely what we're doing when applying CESR, that is, suffixing the JSON with sophisticatedly structured text that at first glance looks like garbage. Adding layering here, in the context of other components that we use the same way and join together, that is: <some payload, i.e., OCA Bundle in JSON><attachments><a VC in JSON><attachments><JSON><attachments><JSON><attachments>, enables us to unambiguously find out what type of document we're dealing with in this chain. Enveloping any of these would add more complexity: in most cases, these attachments are digital signatures; therefore, verifying information would first require de-enveloping. Going further, OCA primarily serves as a DDE enabler. When considering its features, we also consider the broader concept of DDE and how to integrate them effectively. At the same time, by providing universal tooling, we lower the entry barrier to OCA and let people join the ecosystem without the need to implement all this stuff on their own, but instead consume it and use it.

My $0.02CDN. I’d definitely like to leave off the size of the bundle in the version as it is a pain. Doable if the calculation is well-defined, but annoying at the application layer. I agree that if anyone wants to stream OCA data (which really doesn’t make sense to me), they are welcome to do that by putting a minimal wrapper / prefix that has the size. But it should be outside of the OCA specification.
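For comparison, the "minimal wrapper / prefix" alternative can be sketched in a few lines. This is only an illustration of the idea raised in the thread, not a proposed wire format: the 16-character, zero-padded decimal ASCII header is an assumption chosen to echo the "16-character string" mentioned earlier, and the function names are hypothetical.

```python
import io

HEADER_LEN = 16  # assumed: fixed-width, zero-padded decimal byte count


def frame(payload: bytes) -> bytes:
    """Prefix a payload (e.g. a serialized Bundle) with its byte length."""
    header = f"{len(payload):0{HEADER_LEN}d}".encode("ascii")
    return header + payload


def read_frames(stream):
    """Yield each payload from a stream of concatenated frames."""
    while True:
        header = stream.read(HEADER_LEN)
        if not header:
            return  # clean end of stream
        if len(header) != HEADER_LEN:
            raise ValueError("truncated header")
        size = int(header.decode("ascii"))
        payload = stream.read(size)
        if len(payload) != size:
            raise ValueError("truncated payload")
        yield payload


# Two payloads back to back on one stream, split again without parsing JSON.
stream = io.BytesIO(frame(b'{"a":1}') + frame(b'{"b":2}'))
payloads = list(read_frames(stream))
```

With this layering the receiver never inspects the JSON to find message boundaries, and the Bundle itself carries no size information.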

I definitely agree that a digest and version at the same level as capture_base and overlays are needed. I’d like the version defined as simply a semver.

The Informatics Domain Model (IDM) should be the blueprint for data-centric modeling, not the OCA Bundle itself. The OCA Bundle belongs strictly in the Object domain (passive) [i.e., no mechanics], and must maintain distinct separation from event logs (Event domain), active execution algorithms (Intelligence domain), and framed concepts (Knowledge domain).

Blurring these domain boundaries creates two major issues:

Search & Discovery Breakdown

Each domain supports a distinct type of search:

a.) Attribute search (Object) → Finds structural attributes in an OCA Bundle.
b.) Field search (Event) → Queries recorded fields in an event history.
c.) Term search (Concept) → Searches by ontological terms or controlled vocabulary.
d.) Value search (Action) → Retrieves explicit exchange metadata and execution values, which may include:

Message size (byte length of the payload);

Location (where the bundle is stored/fetched);

Routing details (if streaming applies).

Embedding value-based search parameters in the OCA Bundle mixes passive structure (attributes) with active mechanics (values), making searches imprecise.

Role-Based Access Control (RBAC) Violations

Keeping domains separate ensures granular access control:

In the case of the two domains in question (i.e., Object & Action) ...
a.) Schema Guardians may be appointed to protect structural semantics in an OCA Bundle.
b.) Packet Trackers may be appointed to track message execution in transit (message size, location, routing).

If message metadata is stored inside the OCA Bundle, Schema Guardians would have access to exchange intelligence, violating need-to-know governance.

My suggestion would be to use an envelope for message/transmission metadata, and remove the "v" attribute (Versioning, Encoding Format & Message Size) from the OCA core specification. This would ensure:
✅ Schema Bundles remain purely structural (i.e., made up of passive structural attributes).
✅ Message metadata stays in the Action domain (i.e., within packet headers).
✅ RBAC integrity is preserved.

swcurran · 2025-02-12T20:09:56Z

docs/specification/README.md


 #### What are Overlays?

-Overlays are task-specific objects that provide cryptographically-bound layers of definitional or contextual metadata to a Capture Base. Any actor interacting with a published Capture Base can use Overlays to transform how inputted data and metadata are displayed to a viewer or guide an agent in applying a custom process to captured data.
+[Overlays](#overlays) are task-specific objects that provide cryptographically-bound layers of definitional or contextual metadata to a [Capture Base](#capture-base). Any actor interacting with a published [Capture Base](#capture-base) can use [Overlays](#overlays) to enrich meaning of the data, transform how inputted data and metadata are displayed to a viewer or guide an agent in applying a custom process to captured data.


Typo — “to enrich the meaning of the data"

swcurran · 2025-02-12T20:12:12Z

docs/specification/README.md

-    "fullName",
-    "dateOfBirth",
-    "photoImage"
-  ]
 }
 ```

 _Example 1. Code snippet for a Capture Base._

 #### Type


Suggest that “type” should be defined outside of the context of the Capture base, since it applies to all overlays. In the context of Capture Base, the fixed value of its type is all that needs to be defined.

swcurran · 2025-02-12T20:14:23Z

docs/specification/README.md

-  "type": "spec/capture_base/1.0",
-  "classification": "GICS:45102010",
+  "d": "EFEDyA__ap51wscacOwATP3c51icUeHT6D0tTbInQI9G",
+  "type": "spec/capture_base/1.0.0",


I think the version in the Capture Base type should be bumped to 2.0.0, since this is a breaking change to the definition.

I’d add that I’m not in favour of changing the Capture Base data model. While it makes sense to not have the PII flags in the capture base, the value of moving them out now, and breaking all existing implementations is questionable.

The use of “d” vs. “digest” is also a breaking change, so that would also force a 2.0.0 update — again with little added value. I don’t think changing anything is worth it.

I strongly believe the "v" attribute in its current format should be removed from the OCA spec entirely. Its format hinders adoption and breaks multiple use cases.

This messaging information belongs in a packet header, not within a passive schema. It’s not even an overlay—it should only function as an envelope, so its inclusion in the core spec is unnecessary.

The Capture Base must support 100% of use cases. If we can’t ensure that foundational flexibility, we’re compromising adoption at the very first hurdle.

I think that v or ver or version is crucial to the spec, but it only needs to be a semver that tells an OCA Bundle consumer “This OCA Bundle is using version x.y.z of the OCA specification”. That is crucial to being able to smoothly transition deployments from one version of the specification to the next; in short, to add new features and remove old ones. It is relatively easy to write an implementation that can handle multiple versions of the specification if there is a version. It’s really hard if the software has to “sniff” (check) arbitrary data here and there to determine the version the OCA Bundle Producer used.
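A rough sketch of what version-based dispatch could look like on the consumer side, assuming the Bundle carries a plain semver under a key named `version` (the key name, the reader functions, and the supported major versions are all hypothetical):

```python
def parse_semver(text):
    """Split 'x.y.z' into three integers; raises ValueError on anything else."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def read_v1(bundle):
    # Hypothetical reader for 1.x Bundles.
    return "handled by 1.x reader"


def read_v2(bundle):
    # Hypothetical reader for 2.x Bundles.
    return "handled by 2.x reader"


READERS = {1: read_v1, 2: read_v2}


def read_bundle(bundle):
    """Pick a reader from the declared major version instead of sniffing fields."""
    major, _minor, _patch = parse_semver(bundle["version"])
    try:
        reader = READERS[major]
    except KeyError:
        raise ValueError(f"unsupported OCA specification version: {bundle['version']}")
    return reader(bundle)
```

Dispatching on the major version alone keeps minor and patch releases readable by the same code path, which is the usual semver expectation.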

Yes, I concur!
#86 (comment)

swcurran · 2025-02-12T20:22:24Z

docs/specification/README.md

+An attribute name is a string that uniquely identifies an attribute within an OCA layer and is used to reference that attribute by other layers throughout the OCA bundle. The string can be any valid Unicode code point.
+Example of a valid attribute name:
+- `FullName`
+- `person/name/fullName`


I would like to see a reference made in the spec to the effect that "the use of / separators in the attribute name MAY indicate a hierarchy in the capture base, indicating that the capture base attributes define a flattening of that hierarchy”. That will be very useful for many use cases where the data is represented in a (for example) JSON data model, where the nodes above the attributes are necessary to know, but are not relevant in the OCA Bundle.

That note would be an alternative to the use of using a reference to the SAID of another OCA Bundle for representing hierarchical data.
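To illustrate the flattening reading of `/` separators, here is a small sketch that derives attribute names such as `person/name/fullName` from a nested JSON-like record. The sample record and the function name are made up for illustration; the specification does not define this procedure.

```python
def flatten_names(node, prefix=""):
    """Return '/'-joined paths to every leaf of a nested dict, in document order."""
    names = []
    for key, value in node.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            names.extend(flatten_names(value, path))  # descend into sub-objects
        else:
            names.append(path)  # leaf: this becomes a capture base attribute name
    return names


record = {
    "person": {
        "name": {"fullName": "Jane Doe"},
        "dateOfBirth": "1990-01-01",
    }
}
attribute_names = flatten_names(record)
```

Here the intermediate nodes (`person`, `name`) appear only as path segments, so the capture base stays flat while the original hierarchy remains recoverable from the names.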

swcurran · 2025-02-12T20:27:39Z

docs/specification/README.md

+data items or elements of the same data type. When you want to store many pieces
+of data that are related and have the same data type, it is often better to use
+an array instead of many separate variables (e.g., `Array[Text]`,
+`Array[Numeric]`, etc.).


Not crucial, but a data type that might be worth adding at this level is a Data URL. Currently, an OCA Bundle creator would use “Text” (although has that disappeared from the list?), and specify the Data URL standard, perhaps with the media type as the “format". Having Data URL as a first class entity would seem to be more useful.

swcurran · 2025-02-12T20:29:34Z

docs/specification/README.md


-Any attributes defined in a Capture Base that may contain identifying information about entities (i.e., personally identifiable information (PII) or quasi-identifiable information (QII)) can be flagged.
+`Overlay` as a task-specific object provides layers of definitional or contextual metadata. OCA specification recognize two core types of overlays:


Is it necessary to specify that there are two (at least) classes of overlays? I think that is just confusing. Having to know the “core type”, which is not technically defined in the spec and whose definition is blurry for many overlays, makes it confusing. I recommend just removing this.

swcurran · 2025-02-12T20:31:02Z

docs/specification/README.md

- [ Type ](#type-1)
- Overlay-specific attributes
+Overlays `MUST` comprises the following attributes, listed in order to form its canonical serialization:
+- `d` - [deterministic identifier](#deterministic-identifier) of the overlay


Changing to d is a breaking change (meaning all semvers will need to be bumped) with little additional value. Suggest leaving as digest.

swcurran · 2025-02-12T20:33:12Z

docs/specification/README.md

+
+##### Overlay
+
+The `overlay` attribute contains the [SAID](#ref-SAID) of the [Overlay](#overlays) to cryptographically anchor to that parent object.


Is it the literal overlay that is used, or some sort of reference? I don’t care either way, but it’s not clear to me, given the use of capture_base, whether Capture Base is meant.

swcurran · 2025-02-12T20:36:50Z

docs/specification/README.md

-type = "spec/overlay/" overlay_name "/" sem_ver
-overlay_name = ALPHA
-sem_ver = DIGIT "." DIGIT
+type = "("spec" / "community)/overlays/" overlay_name "/" sem_ver


There should be space for the name of the community that defines the overlay. The term “community” provides no value, and requires all communities to find out if the overlay name they want to use is already in use. Much better to have a per community namespace, and then any overlay name can be used within each community.

Once the community overlays are proposed for promotion into the “core” overlays, the assurance that there is only one “spec” overlay of a given name can be handled by the Working Group.

swcurran · 2025-02-12T20:40:16Z

docs/specification/README.md

    "capture_base": "EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis",
-    "type": "spec/overlays/character_encoding/1.0",
+    "type": "spec/overlays/character_encoding/1.0.2",


What changed, and why the “patch” semver? I guess because even though the tools emitted a digest, it is the addition of the d? I think by semver rules, that is a minor update (additional field). That said, in practice, this is a major change: renaming digest to d.

swcurran · 2025-02-12T20:44:26Z

docs/specification/README.md

@@ -318,65 +345,24 @@ The inputted format values are dependent on the following core data types as def

 _Example 3. Code snippet for a Format Overlay._

-##### Information Overlay


Is the information overlay being removed? We include it in all our use cases. Reasoning?

swcurran · 2025-02-12T20:48:04Z

docs/specification/README.md

@@ -405,7 +384,7 @@ _[language-specific object]_

 A Meta Overlay defines any language-specific information about a schema. It is used for discovery and identification and includes elements such as the schema name and description.

-In addition to the `capture_base`, `type`, and `language` attributes (see [Common attributes](#common-attributes)), the Meta Overlay SHOULD include the following attributes:
+In addition to the [Mandatory attributes](#mandatory-attributes) and [language](#language), the Meta Overlay SHOULD include the following attributes:


We make use of the Meta overlay to include a number of other, use case specific values. I think it would be valuable to include that "other name/value pairs MAY be included, and consumers of OCA Bundles MUST ignore any unexpected items”. This allows a producer of an OCA Bundle to include additional Meta data, and consumers to use the data they expect, and ignore the rest (vs. rejecting the overlay because of the extra, unexpected data).
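A consumer following the "ignore unexpected items" rule might look roughly like this. The set of known keys is an assumption pieced together from this PR (the mandatory attributes plus `language`, `name`, and `description`), and the extra `issuer_logo` entry is a made-up use-case-specific value.

```python
# Assumed set of keys this particular consumer understands.
KNOWN_META_KEYS = {"d", "type", "capture_base", "language", "name", "description"}


def read_meta(overlay):
    """Keep the items the consumer understands and silently drop the rest."""
    return {key: value for key, value in overlay.items() if key in KNOWN_META_KEYS}


overlay = {
    "type": "spec/overlays/meta/1.0.2",
    "language": "en",
    "name": "Example schema",
    "description": "An illustrative Meta overlay",
    "issuer_logo": "https://example.org/logo.png",  # extra, use-case-specific item
}
meta = read_meta(overlay)
```

The extra item does not cause a rejection; it is simply absent from what the consumer goes on to use.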

swcurran · 2025-02-12T20:49:41Z

docs/specification/README.md

-  "type": "spec/overlays/meta/1.0",
-  "language": "en",
+  "type": "spec/overlays/meta/1.0.2",
+  "language": "en_UK",


Shouldn’t that be a - not an _?

swcurran · 2025-02-12T20:54:00Z

docs/specification/README.md


 A Conformance Overlay indicates whether data entry for each attribute is mandatory or optional.

-In addition to the `capture_base` and `type` attributes (see [Common attributes](#common-attributes)), the Conformance Overlay MAY include the following attributes:
+In addition to the [Mandatory attributes](#mandatory-attributes), the Conformance Overlay MAY include the following attributes:


Just as a matter of interest: in this context, does “Mandatory” mean that the data element MUST be present, or does it mean the data element MUST have a value? In our context (credentials), ALL attributes must be present, but an Issuer might routinely not populate an attribute in a credential. Not crucial, but perhaps that might be clarified.

swcurran · 2025-02-12T20:58:39Z

docs/specification/README.md

-Transformation overlays provide information to convert data from one format or structure to another, such as raw data to processed, or unstructured to structured.
-
-##### Attribute Mapping Overlay
+#### Attribute Mapping Overlay


We don’t often use this, but how does the OCA Producer and consumer communicate what is the “other” set of attributes — the ones that are not in the Capture Base? For example, should this have a reference (SAID) for another bundle?

swcurran · 2025-02-12T21:00:27Z

docs/specification/README.md

-##### Unit Mapping Overlay
-
-A Unit Mapping Overlay defines target units for quantitative data when converting between different units of measurement. Conversion of units is the conversion between different units of measurement for the same quantity, typically through multiplicative conversion factors (see [Code Table for Unit mappings](#code-table-for-unit-mappings) for more information on conversion factors) which change the measured quantity value without changing its effects. The process of conversion depends on the specific situation and the intended purpose. This may be governed by regulation, contract, technical specifications or other published standards.
+#### Sensitive Overlay


As noted in an earlier comment — while I think it makes sense to have this as an overlay vs. built into the Capture Base, is it really worth the confusion that is going to be caused by moving it out?

swcurran · 2025-02-12T21:04:55Z

docs/specification/README.md


-A Sensitive Overlay defines attributes not necessarily flagged in the Capture Base that need protecting against unwarranted disclosure. For example, data that requires protection for legal or ethical reasons, personal privacy, or proprietary considerations.
+OCA Bundles MUST be serializable to be transferred over the network. The


Why does it have to be serializable? During the calculation of the digests, an interim, deterministic form of the data being hashed needs to be created, but that is not a reason to canonicalize the “at rest” representation of the Bundle. Much better to say that the ordering of items SHOULD NOT be relied upon. It is fighting against nature to try to force an ordering on moving data.

swcurran · 2025-02-12T21:27:56Z

docs/specification/README.md

-├── EHDwC_Ucuttrsxh2NVptgBnyG4EMbG5D8QsdbeF9G9-M.json
-└── meta.json
-```
+Validation failure must result in the rejection of the bundle as non-compliant with the specification.


This is a big concern of mine, as I’ve expressed multiple times. Once more.

Please add to the spec the algorithm for creating a digest for a given “chunk” of JSON.

Please do not refer to the CESR/SAID spec for that, but to put the algorithm in the spec. The algorithm is short, and easily defined.

Set the value of the digest item to a string of # characters of the length the digest will be

Calculate digest = remove_padding (encode ( prefix + hash ( JCS(JSON) ) ) )

Note that the OCA Bundle does NOT need to be stored canonicalized — the algorithm to calculate the SAID will canonicalize the relevant JSON in doing the SAID calculation.
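The two steps above can be sketched as follows. This is a literal reading of the formula in this comment, not the CESR SAID algorithm, and it will not reproduce SAIDs emitted by existing OCA tooling. Assumptions: SHA-256 as the hash, URL-safe Base64 as the encoding, a made-up one-byte prefix, and `json.dumps` with sorted keys standing in for JCS (RFC 8785), which only matches JCS for simple inputs with ASCII keys and no floating-point numbers.

```python
import base64
import hashlib
import json

DIGEST_FIELD = "digest"        # name used in this comment; the PR proposes "d"
HYPOTHETICAL_PREFIX = b"\x01"  # placeholder algorithm marker, not a real code
DIGEST_LENGTH = 44             # Base64 length of 1 prefix byte + 32 hash bytes


def canonicalize(obj):
    # Stand-in for JCS; adequate only for strings, ints, lists, and dicts.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def compute_digest(obj):
    # Step 1: put a same-length run of '#' characters in the digest field.
    draft = dict(obj)
    draft[DIGEST_FIELD] = "#" * DIGEST_LENGTH
    # Step 2: digest = remove_padding(encode(prefix + hash(JCS(JSON)))).
    raw = HYPOTHETICAL_PREFIX + hashlib.sha256(canonicalize(draft)).digest()
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def verify_digest(obj):
    """Recompute the digest and compare it with the stored one."""
    return obj.get(DIGEST_FIELD) == compute_digest(obj)
```

Because canonicalization happens inside `compute_digest`, two copies of the same object stored with different key orders produce the same digest, so the at-rest ordering does not matter.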

In doing that, please require that the hash and encoding algorithms used are embedded in the digest (the SAID prefix is fine, although I would prefer the more standard multiformats: multibase and multihash).

Please specify the specific, and ideally very few, hashing and encoding schemes. I would recommend only sha-256 and b58btc encoding, but am fine if others are specifically allowed. Without limiting the algorithms allowed (by version of the OCA specification), it is impossible to write an OCA Consumer that handles whatever algorithms are used by producers. There are just too many options.

Please document the process for calculating the digests for an OCA Bundle. Notably, it must be calculated as follows:

Calculate the digest for the Capture Base, and set the value of its digest to the SAID.

For each overlay:

Set the capture_base value to be the digest of the capture base.

Calculate the digest for the overlay, and set the value of its digest to the SAID.

Calculate the digest for the entire OCA Bundle, and set the value of the root digest to that SAID.
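The ordering of these steps matters because each overlay's digest depends on the capture base digest, and the root digest depends on both. A minimal sketch of that ordering follows, with the same caveats as any simplified digest: the `digest_of` helper (SHA-256, URL-safe Base64, a made-up prefix byte, sorted-key JSON in place of JCS) is an assumption for illustration, as is the Bundle shape with `capture_base` and a list under `overlays`.

```python
import base64
import hashlib
import json


def digest_of(obj, field="digest"):
    """Illustrative digest: placeholder in the digest field, hash, then encode."""
    draft = dict(obj)
    draft[field] = "#" * 44
    canonical = json.dumps(draft, sort_keys=True, separators=(",", ":")).encode("utf-8")
    raw = b"\x01" + hashlib.sha256(canonical).digest()  # hypothetical prefix byte
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def seal_bundle(capture_base, overlays):
    # 1. Digest the capture base first.
    cb = dict(capture_base)
    cb["digest"] = digest_of(cb)
    # 2. Point each overlay at the capture base, then digest the overlay.
    sealed = []
    for overlay in overlays:
        overlay = dict(overlay)
        overlay["capture_base"] = cb["digest"]
        overlay["digest"] = digest_of(overlay)
        sealed.append(overlay)
    # 3. Digest the whole Bundle last, once every inner digest is final.
    bundle = {"capture_base": cb, "overlays": sealed}
    bundle["digest"] = digest_of(bundle)
    return bundle
```

Changing any attribute in the capture base after sealing would invalidate the capture base digest, every overlay digest, and the root digest, which is the binding the process is meant to provide.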

Missing from my comment above is the (I think unnecessary) calculation of the length of the OCA Bundle before calculating the digest of the entire bundle. Thus the last step I have above (Calculate the digest for the entire OCA Bundle…) would be replaced with the steps:

Set the value of the root digest to a string of # characters the length the digest will be.

Determine the length of the OCA Bundle by doing this calculation: insert calculation of length of the bundle

Set the OCA Version string to be prefix + length of OCA Bundle + suffix (prefix and suffix are hardcoded per OCA Specification Version).

If any consumer of an OCA Bundle cares, they would need to repeat the length calculation and verify it against the length. They are unlikely to do that, because the digest verification would also fail if the OCA Bundle length has been changed.

swcurran · 2025-02-12T21:31:48Z

docs/specification/README.md

@@ -1038,6 +969,17 @@ Smith, S. Self-Addressing IDentifier (SAID) (2022) [ https://datatracker.ietf.or
 </dd>
 </dl>

+<dl>


Why does there need to be a reference to CESR here? The OCA Spec has nothing to do with CESR. If the answer is the need to find the SAID spec, I again urge us to NOT outsource the SAID / digest calculation to another spec. That is just complicating life for everyone, especially implementers. While I’d prefer the use of multihash and multibase for the digest, I’m fine with the use of the SAID algorithm and prefix; I just don’t want to have to dig into that spec for the little bit I need to use in the OCA Spec.

swcurran · 2025-02-12T21:59:16Z

I’d also like to see added to the specification an explanation of the concept of OCA Bundle Producers and Consumers. Or perhaps more accurately:

- OCA Bundle Publishers: entities that create OCA Bundles.
- Producers of data to which an OCA Bundle applies.
- Consumers of data to which an OCA Bundle applies.

Hope I helped. I suspect not, but I have to try...

pknowl · 2025-02-12T22:50:32Z

"d" should be written as "digest". The mixed attribute naming convention looks ugly. My OCD will keep triggering!

mitfik added 21 commits January 17, 2025 10:20

chore: remove old rc version which is outdated

4cee8fd

chore: Archive 1.0.1 version

90bd1dc

docs: Improve OCA bundle normative description

735d059

docs: Editorial changes

5dacdac

docs: Specify non-normative vs normative parts

55b790d

feat: Allow for SemVer in object type

ab3c357

feat: Remove PII and classification from capture base

d80cfb7

- add missing `d` field
- remove PII and classification which would be introduced as separate overlays

feat: Introduce linking to overlay to overlay

edf67d3

Allow linking an overlay to another overlay instead of only to the capture base. This provides a mechanism for linking an entry overlay to an entry codes overlay, assuring consistency.

docs: remove rugby model reference

fc1d713

Without additional references the model does not bring any value for an external reader. The model should be presented as an additional resource.

feat: remove information overlay

e2ab835

Remove information overlay in favour of presentation layer. See ORFg findings.

feat: remove transformation overlay

422939f

Move transformation overlay to community overlays.

feat: remove presentation overlay

34bb431

feat: remove layout overlay

e425402

feat: enhance sensitive overlay to replace flagging from capture base

2cc67c1

docs: remove non-normative section about basic concept

7f1d5ae

docs: move conventions section to the beginning

39e345b

docs: editorial changes to improve clarity

9dd8c13

docs: improve attribute name description and add ABNF

8077b64

docs: remove note about ISO datetime recommendation.

ded80eb

It is not needed, and we may add some explanation how implementation handles time internally but should not be relevant for the specification.

docs: improve overlay type description and align with SemVer

941971c

feat: Use 639-3 for language codes

6fbbcbe

- add example
- update the ISO 639 link to point to the latest spec version
- enforce 639-3 as the main codes for languages in overlays
- clean dead ISO references which are not used

mitfik force-pushed the major_update branch 2 times, most recently from b8ee6cd to 3b7e080 Compare February 8, 2025 19:48

mitfik added 7 commits February 8, 2025 20:49

chore: Fix links and improve indentation

edef45a

feat: remove categories from label overlay

c4812de

Categories are part of the presentation layer and should not be part of the label overlay. Signed-off-by: Robert Mitwicki <[email protected]>

feat: move conditional overlay as community overlay

699e68a

Signed-off-by: Robert Mitwicki <[email protected]>

chore: clarity about language common attribute

c4969f6

Signed-off-by: Robert Mitwicki <[email protected]>

chore: fix versioning in examples

beed75a

Signed-off-by: Robert Mitwicki <[email protected]>

chore: align description with new structure

942a5ba

Signed-off-by: Robert Mitwicki <[email protected]>

feat: add community overlay section

b97cc78

Signed-off-by: Robert Mitwicki <[email protected]>

chore: fix section levels

1aea159

mitfik force-pushed the major_update branch from 390025c to 1aea159 Compare February 9, 2025 08:57

carlyh-micb reviewed Feb 10, 2025


ryanbnl reviewed Feb 11, 2025


swcurran reviewed Feb 12, 2025



		Any attributes defined in a Capture Base that may contain identifying information about entities (i.e., personally identifiable information (PII) or quasi-identifiable information (QII)) can be flagged.
		`Overlay` as a task-specific object provides layers of definitional or contextual metadata. OCA specification recognize two core types of overlays:


		##### Overlay

		The `overlay` attribute contains the [SAID](#ref-SAID) of the [Overlay](#overlays) to cryptographically anchor to that parent object.

		@@ -318,65 +345,24 @@ The inputted format values are dependent on the following core data types as def

		_Example 3. Code snippet for a Format Overlay._

		##### Information Overlay


		A Sensitive Overlay defines attributes not necessarily flagged in the Capture Base that need protecting against unwarranted disclosure. For example, data that requires protection for legal or ethical reasons, personal privacy, or proprietary considerations.
		OCA Bundles MUST be serializable to be transferred over the network. The

               </dd>
               </dl>
+              <dl>


The simplified concept of calculating a SAID.
