Major refactoring of the OCA Specification #86

Open
wants to merge 41 commits into
base: master
from

mitfik
Contributor

@mitfik mitfik commented Feb 8, 2025

This is a major update of the specification, bringing clarity and structure.

Below is a summary of the major changes, which should be discussed and reviewed with care:

Features:

  • Introduce community overlays
  • New representation of the Bundle (as single JSON file instead of zip file)
  • Allow for linking overlays to other overlays (previously each overlay needed to be linked to capture base only)

Others

  • The Sensitive overlay takes over PII flagging from the capture base
  • Remove categories from the Label overlay (and move them to the presentation layer)
  • "Removal" of overlays - all overlays which were removed are nominated as community overlays and would be hosted in a separate repository (Overlays Registry) this would allow
  • Introduce the OCA WG as the new governance of the specification
  • Editorial changes and updates to all code snippets

mitfik added 21 commits January 17, 2025 10:20
- add missing `d` field
- remove PII and classification which would be introduced as separate
  overlays
Allow linking an overlay to another overlay instead of only to the capture
base. This provides a mechanism for linking the entry overlay to the entry
codes overlay, assuring consistency.
Without additional references the model does not bring any value for
external readers. The model should be presented as an additional resource.
Remove information overlay in favour of presentation layer. See ORFg
findings.
Move transformation overlay to community overlays.
It is not needed, and we may add some explanation how implementation
handles time internally but should not be relevant for the
specification.
- add example
- update 639 iso link to point to latest spec version
- enforce 639-3 as the main codes for languages in overlays
- clean dead references from ISO which are not used
@mitfik mitfik force-pushed the major_update branch 2 times, most recently from b8ee6cd to 3b7e080 Compare February 8, 2025 19:48
Categories are part of the presentation layer and should not be part of
the label overlay.

Signed-off-by: Robert Mitwicki <[email protected]>
Signed-off-by: Robert Mitwicki <[email protected]>
@mitfik
Contributor Author

mitfik commented Feb 9, 2025

> What is the current status of both mapping overlays (attribute-to-attribute transformation) and framing overlays (attribute-to-term contextualization)? Should these be implemented as community overlays?

Attribute mapping is still part of the core spec. It is relatively simple and commonly used, but there was a proposal to move it out as a community overlay, since it would be really good to keep the core spec very simple and lightweight.
I would vote to move it out as a community overlay as well.

Framing, on the other hand, is quite extensive and detailed functionality, so I would propose making it a community overlay right away to keep the core spec very light. It consists of multiple functions that would significantly increase the complexity of the core spec, and there are a lot of things that need to be explained and addressed for people to understand it clearly.

For the community overlays, I have started preparing an overlays registry, which is a proposal for where to store all the community overlays:

https://github.com/the-human-colossus-foundation/overlays-repository/

It is still a draft but ready to be worked on. Open to suggestions and proposals.

I would suggest getting that ready before we merge this, so we already have a solid place for community overlays.

Collaborator

@carlyh-micb carlyh-micb left a comment

Currently the technology of OCA-repo doesn't allow .'s in the attribute name.
If code is king then this should be documented in this section.
If not, then OCA-repo should be adjusted to allow .'s, or OCA-repo should document its deviation from this requirement.
However, I would strongly support allowing .'s in attribute names, as described here: "The string can be any valid Unicode code point." (Technically, other symbols such as , / \ = etc. can also be expressed as valid Unicode code points.)

distill the most relevant aspects of SAIDs in the context of the OCA
specification.

#### How to calculate SAID:
Collaborator

The simplified concept of calculating a SAID.

Refer to the CESR specification for complete details to ensure correct SAID calculation. The summary steps described here are insufficient to correctly calculate a SAID.

Collaborator

@carlyh-micb carlyh-micb left a comment

For the SAID calculations there was quite a long discovery and confirmation that this method does not work. Conceptually this is the idea but following it does not result in verifying a calculated SAID.
#58

From Kent Bull: https://kentbull.com/2024/09/22/keri-series-understanding-self-addressing-identifiers-said/

Bit boundaries and the choice of alphabet are also needed to get the correct SAID.

@pknowl
Collaborator

pknowl commented Feb 11, 2025

In an OCA Bundle, can we change "d" to "digest"?

It looks strange to have mixed attribute naming methodology in the bundle.

@carlyh-micb
Collaborator

For the standard overlay, the current standard has limitations. More information, such as provided below would make this standard overlay much more useful. You can include both links that machines can read and follow and be more specific about versions etc.

      "standard_id": "https://doi.org/10.1515/iupac",
      "standard_label": "IUPAC nomenclature",
      "standard_location": "https://iupac.org/what-we-do/nomenclature/",
      "standard_version": ""

- Character encoding overlay
- Format overlay
```
OCAS<major><minor><format><size>_
Collaborator

Why is the size calculated? It's orthogonal to versioning? I can't think of a strong motivation to include it at this layer.

Member

TL;DR: this is for effective streaming.

If you go beyond HTTP protocol and merely focus on streams of bytes, consuming the whole chunk (Bundle) out of a stream is simply taking the <size> of bytes off the stream, effectively enabling the transfer of the Bundles over the wire along with other chunks. Furthermore, because we precisely know where to look for particular information in the stream (that's why OCA always had custom canonical form and is not RFC 8785 compliant — Bundle JSON starts with the v attribute), we can immediately decide which parser can handle this chunk. In this case, OCAS<major><minor><format> enables us to unambiguously apply the appropriate parser for further handling this chunk.

FWIW, the <size> is CESR-Base64 encoded.

Collaborator

Streaming = a concern for the messaging layer, not for the application layer.

Member

And that's why we have Bundles. If there's no need for exchange, there's no need for a Bundle concept.

Collaborator

@pknowl pknowl Feb 11, 2025

@blelump I'm confused by your response. Message streaming information belongs inside an exchange packet, not inside a schema. As it stands, the "version" format (e.g., "v": "OCAA11JSON00714b_") contains the byte size of the messaging stream (OCAS<major><minor><format><size>_). This is in the wrong place.

OCA is solely for defining passive objects, nothing else. It is not a messaging protocol. Messaging should be defined in exchange packets, not in the data schema itself.

If you follow the Informatics Domain Model, this separation is clearly defined:
https://zenodo.org/records/14525852

Capture = Objects = Schema
Exchange = Actions = Packet

Member

> OCA is a messaging protocol? In your vision then - schema bundles only exist at the time of transit? There will never be a need for a library of schema bundles? That bundles wouldn't be stored next to data to help describe that data as it is stored? OCA as you envision it exists only for transit? And once a schema with data reaches its destination then is turned into something else?

@carlyh-micb The Bundle concept exists because of the need for exchange. It couples all the tiny pieces, CB and a set of overlays, into one whole that the Bundle author found cohesive and exhaustive.

Bundle became a first-class citizen in the OCA ecosystem precisely because of the data that flows. Bundles and OCA aren't needed if the data doesn't flow. In essence, if there's some data that is stored on a local computer and this is the only copy, OCA is not needed because this data doesn't flow.

In data-that-flow use cases, using or keeping the Bundles by the data receiver is perfectly fine.

I hope the reasoning doesn't require further elaboration and is now clear, especially regarding the need for a Bundle and its lifecycle.


> @blelump I'm confused by your response. Message streaming information belongs inside an exchange packet, not inside a schema. As it stands, the "version" format (e.g., "v": "OCAA11JSON00714b_") contains the byte size of the messaging stream (OCAS<major><minor><format><size>_). This is in the wrong place.
>
> OCA is solely for defining passive objects, nothing else. It is not a messaging protocol. Messaging should be defined in exchange packets, not in the data schema itself.
>
> If you follow the Informatics Domain Model, this separation is clearly defined: https://zenodo.org/records/14525852
>
> Capture = Objects = Schema
> Exchange = Actions = Packet

@pknowl the term messaging protocol is quite broad—could you clarify what aspect you're referring to?

As explained above, a Bundle exists merely for the exchange. It's a data container that is well-defined within the protocol for exchange purposes. Specifically, because the OCA brings additional value merely when data flows, Bundle, as the enabler for proper flow, became a first-class citizen in the protocol (defined in the spec).

OCAS<major><minor><format><size>_ is part of this data container to unambiguously find out with what container variant we're dealing with when reading it. Versioning data containers enables proper parsing and backward compatibility and is no different than versioning any other representation or file format. It concerns the OCAS<major><minor> part. The <format><size> is then appended to it (in the above example, it is JSON00714b) to find the representation and msg size unambiguously. This information is valuable when Bundle exchanges through a continuous stream of bytes type of protocols instead of discrete messages, which is characteristic of the HTTP protocol.

Therefore, from the Bundle receiver perspective, we use the v attribute to narrow the context specifically to avoid any ambiguity on how to read this Bundle.

Finally, it can be tempting to separate Bundle, a distinct concept serving as a data container for the exchange, from the core spec. However, due to OCA's inherent nature and applicability in ecosystems where data flows, Bundle is a first-class citizen and part of the core spec.
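
As a rough illustration of the fields discussed above, the following Python sketch splits a `v` value into its parts. The field widths (a 4-character protocol tag, one character each for major and minor, a 4-character format, a 6-character size and a `_` terminator) are assumptions read off the single example `OCAA11JSON00714b_` in this thread, and `parse_version_string` is a hypothetical helper, not part of any OCA tooling. The size is returned undecoded because its exact encoding is not pinned down in this discussion.

```python
import re
from dataclasses import dataclass

# Assumed layout, inferred only from the example "OCAA11JSON00714b_".
_VERSION_RE = re.compile(
    r"(?P<proto>[A-Z]{4})"          # protocol tag, e.g. "OCAA"
    r"(?P<major>[0-9a-f])"          # major version, one character
    r"(?P<minor>[0-9a-f])"          # minor version, one character
    r"(?P<fmt>[A-Z]{4})"            # serialization format, e.g. "JSON"
    r"(?P<size>[0-9A-Za-z_-]{6})"   # size field, left undecoded
    r"_"                            # terminator
)


@dataclass(frozen=True)
class VersionString:
    proto: str
    major: str
    minor: str
    fmt: str
    size: str  # raw characters; decoding is out of scope for this sketch


def parse_version_string(value: str) -> VersionString:
    """Split a version string into its fields or raise ValueError."""
    match = _VERSION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized version string: {value!r}")
    return VersionString(**match.groupdict())


parsed = parse_version_string("OCAA11JSON00714b_")
print(parsed.proto, parsed.major, parsed.minor, parsed.fmt, parsed.size)
```

Because every field has a fixed width in this assumed layout, a reader only needs a fixed number of characters of the value to decide which parser applies.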

Collaborator

> This information is valuable when Bundle exchanges through a continuous stream of bytes type of protocols instead of discrete messages, which is characteristic of the HTTP protocol.

You want to use the Bundle as a wire format? To make that actually viable, the JSON serialization and encoding must be specified in detail. We haven't even specified that the Bundle itself needs to be encoded as UTF-8, let alone the subset (with/without BOM), the role of spacing, line endings, etc.

Given the current specification, you have to parse the JSON itself in order to extract the value of "v", and only then can you do anything useful with the length. If we do what everyone else does and implement this on a different layer, then you can do things like reserve the first N bytes for this metadata, which enables a lot of fun stuff. I've even seen people do this by prefixing the JSON with a 16-character string.
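
To make the "different layer" idea concrete, here is a minimal Python sketch of length-prefixed framing kept entirely outside the JSON payload. The 16-character, zero-padded decimal header is a hypothetical layout chosen for illustration, and `frame` and `read_frames` are made-up names; nothing here is defined by the OCA specification.

```python
import io
import json

HEADER_LEN = 16  # hypothetical: fixed-width ASCII decimal byte count


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its byte length as a fixed-width header."""
    if len(payload) >= 10 ** HEADER_LEN:
        raise ValueError("payload too large for the header width")
    header = f"{len(payload):0{HEADER_LEN}d}".encode("ascii")
    return header + payload


def read_frames(stream):
    """Yield each payload from a stream of concatenated frames."""
    while True:
        header = stream.read(HEADER_LEN)
        if not header:
            return  # clean end of stream
        if len(header) != HEADER_LEN:
            raise ValueError("truncated header")
        size = int(header.decode("ascii"))
        body = stream.read(size)
        if len(body) != size:
            raise ValueError("truncated payload")
        yield body


# Two JSON documents sent back to back over one byte stream.
docs = [json.dumps({"n": 1}).encode("utf-8"),
        json.dumps({"name": "é"}).encode("utf-8")]
wire = b"".join(frame(d) for d in docs)
for body in read_frames(io.BytesIO(wire)):
    print(json.loads(body))
```

With this shape the reader never has to look inside the JSON to learn its length, and the payload itself carries no transport metadata.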

Member

> This information is valuable when Bundle exchanges through a continuous stream of bytes type of protocols instead of discrete messages, which is characteristic of the HTTP protocol.
>
> You want to use the Bundle as a wire format?

Yes, see below for further explanation.

> To make that actually viable the JSON serialization and encoding must be specified in detail. We haven't even specified that the Bundle itself needs to be encoded as UTF-8, let alone the subset (with/without BOM), the role of spacing, line endings etc.

Yeah, we'd need to add this information.

> Given the current specification you have to parse the JSON itself in order to extract the value from "v" and thus do anything useful with the length.

Thanks to the Bundle canonical form, we know where to look for specific bytes. We specifically know where to look for <format><size> counting from the start of the stream. Therefore, we don't need to deserialize the potentially valid JSON string to extract v.

> If we do what everyone else does and implement this on a different layer then you can do things like reserve the first N bytes for this metadata, which enables a lot of fun stuff. I've even seen people do this by prefixing the JSON with a 16-character string.

This is precisely what we're doing when applying CESR, that is, suffixing the JSON with sophisticated structured text that at first glance looks like garbage. Layering here in the context of other components that we use the same way and join together, that is: <some payload, i.e., OCA Bundle in JSON><attachments><a VC in JSON><attachments><JSON><attachments><JSON><attachments>, enables us to unambiguously find out what type of document we're dealing with in this chain. Enveloping any of these would add more complexity: in most cases these attachments are digital signatures, so verifying information would first require de-enveloping. Going further, OCA primarily serves as a DDE enabler. When considering its features, we also consider the broader concept of DDE and how to integrate them effectively. At the same time, by providing universal tooling, we lower the entry barrier to OCA and let people join the ecosystem without the need to implement all this stuff on their own; instead they can consume it and use it.

My $0.02CDN. I’d definitely like to leave off the size of the bundle in the version as it is a pain. Doable if the calculation is well-defined, but annoying at the application layer. I agree that if anyone wants to stream OCA data (which really doesn’t make sense to me), they are welcome to do that by putting a minimal wrapper / prefix that has the size. But it should be outside of the OCA specification.

I definitely agree that a digest and version at the same level as capture_base and overlays are needed. I’d like the version defined as simply a semver.

Collaborator

@pknowl pknowl Feb 12, 2025

The Informatics Domain Model (IDM) should be the blueprint for data-centric modeling, not the OCA Bundle itself. The OCA Bundle belongs strictly in the Object domain (passive) [i.e., no mechanics], and must maintain distinct separation from event logs (Event domain), active execution algorithms (Intelligence domain), and framed concepts (Knowledge domain).

Blurring these domain boundaries creates two major issues:

  1. Search & Discovery Breakdown

Each domain supports a distinct type of search:

a.) Attribute search (Object) → Finds structural attributes in an OCA Bundle.
b.) Field search (Event) → Queries recorded fields in an event history.
c.) Term search (Concept) → Searches by ontological terms or controlled vocabulary.
d.) Value search (Action) → Retrieves explicit exchange metadata and execution values, which may include:

  • Message size (byte length of the payload);
  • Location (where the bundle is stored/fetched);
  • Routing details (if streaming applies).

Embedding value-based search parameters in the OCA Bundle mixes passive structure (attributes) with active mechanics (values), making searches imprecise.

  2. Role-Based Access Control (RBAC) Violations

Keeping domains separate ensures granular access control:

In the case of the two domains in question (i.e., Object & Action) ...
a.) Schema Guardians may be appointed to protect structural semantics in an OCA Bundle.
b.) Packet Trackers may be appointed to track message execution in transit (message size, location, routing).

If message metadata is stored inside the OCA Bundle, Schema Guardians would have access to exchange intelligence, violating need-to-know governance.

My suggestion would be to use an envelope for message/transmission metadata, and remove the "v" attribute (Versioning, Encoding Format & Message Size) from the OCA core specification. This would ensure:
✅ Schema Bundles remain purely structural (i.e., made up of passive structural attributes).
✅ Message metadata stays in the Action domain (i.e., within packet headers).
✅ RBAC integrity is preserved.


#### What are Overlays?

Overlays are task-specific objects that provide cryptographically-bound layers of definitional or contextual metadata to a Capture Base. Any actor interacting with a published Capture Base can use Overlays to transform how inputted data and metadata are displayed to a viewer or guide an agent in applying a custom process to captured data.
[Overlays](#overlays) are task-specific objects that provide cryptographically-bound layers of definitional or contextual metadata to a [Capture Base](#capture-base). Any actor interacting with a published [Capture Base](#capture-base) can use [Overlays](#overlays) to enrich meaning of the data, transform how inputted data and metadata are displayed to a viewer or guide an agent in applying a custom process to captured data.

Typo — “to enrich the meaning of the data"

"fullName",
"dateOfBirth",
"photoImage"
]
}
```

_Example 1. Code snippet for a Capture Base._

#### Type

@swcurran swcurran Feb 12, 2025

Suggest that “type” should be defined outside of the context of the Capture base, since it applies to all overlays. In the context of Capture Base, the fixed value of its type is all that needs to be defined.

"type": "spec/capture_base/1.0",
"classification": "GICS:45102010",
"d": "EFEDyA__ap51wscacOwATP3c51icUeHT6D0tTbInQI9G",
"type": "spec/capture_base/1.0.0",

I think the version in the Capture Base type should be bumped to 2.0.0, since this is a breaking change definition.

I’d add that I’m not in favour of changing the Capture Base data model. While it makes sense to not have the PII flags in the capture base, the value of moving them out now, and breaking all existing implementations is questionable.

The use of “d” vs. “digest” is also a breaking change, so that would also force a 2.0.0 update — again with little added value. I don’t think changing anything is worth it.

Collaborator

@pknowl pknowl Feb 12, 2025

I strongly believe the "v" attribute in its current format should be removed from the OCA spec entirely. Its format hinders adoption and breaks multiple use cases.

This messaging information belongs in a packet header, not within a passive schema. It’s not even an overlay—it should only function as an envelope, so its inclusion in the core spec is unnecessary.

The Capture Base must support 100% of use cases. If we can’t ensure that foundational flexibility, we’re compromising adoption at the very first hurdle.

I think that v or ver or version is crucial to the spec, but it only needs to be a semver that tells an OCA Bundle consumer “This OCA Bundle is using version x.y.z of the OCA specification”. That is crucial to being able to smoothly transition deployments from one version of the specification to the next; in short, to add new features and remove old ones. It is relatively easy to write an implementation that can handle multiple versions of the specification if there is a version. It’s really hard if the software has to “sniff” (check) arbitrary data here and there to determine the version the OCA Bundle Producer used.
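
A minimal sketch of what a plain semver buys a consumer: the handler is chosen from the major version alone, with no sniffing of the payload. The `version` key, the handler functions and the `dispatch` helper are all hypothetical names used for illustration; the thread has not settled on a field name or on which versions exist.

```python
import re
from typing import Callable, Dict, Tuple

_SEMVER_RE = re.compile(r"([0-9]+)\.([0-9]+)\.([0-9]+)")


def parse_semver(text: str) -> Tuple[int, int, int]:
    """Parse a MAJOR.MINOR.PATCH string into three integers."""
    match = _SEMVER_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def handle_v1(bundle: dict) -> str:
    # Placeholder for logic that understands 1.x bundles.
    return "handled as 1.x"


def handle_v2(bundle: dict) -> str:
    # Placeholder for logic that understands 2.x bundles.
    return "handled as 2.x"


# One handler per supported major version (hypothetical set).
HANDLERS: Dict[int, Callable[[dict], str]] = {1: handle_v1, 2: handle_v2}


def dispatch(bundle: dict) -> str:
    """Route a bundle to the handler for its declared major version."""
    major, _minor, _patch = parse_semver(bundle["version"])
    handler = HANDLERS.get(major)
    if handler is None:
        raise ValueError(f"unsupported major version: {major}")
    return handler(bundle)


print(dispatch({"version": "1.0.2"}))
```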

Collaborator

Yes, I concur!
#86 (comment)

An attribute name is a string that uniquely identifies an attribute within an OCA layer and is used to reference that attribute by other layers throughout the OCA bundle. The string can be any valid Unicode code point.
Example of a valid attribute name:
- `FullName`
- `person/name/fullName`

I would like to see a reference made in the spec to the effect that "the use of / separators in the attribute name MAY indicate a hierarchy in the capture base, indicating that the capture base attributes define a flattening of that hierarchy”. That will be very useful for many use cases where the data is represented in a (for example) JSON data model, where the nodes above the attributes are necessary to know, but are not relevant in the OCA Bundle.

That note would be an alternative to the use of using a reference to the SAID of another OCA Bundle for representing hierarchical data.
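
A short sketch of the flattening this note describes, assuming the source data is nested JSON and that `/` is the separator, as in the `person/name/fullName` example. The `flatten` helper is hypothetical and only handles nested objects, not arrays.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested objects into slash-separated attribute names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into the node; only leaves become attributes.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


nested = {
    "person": {"name": {"fullName": "Ada Lovelace"}},
    "dateOfBirth": "1815-12-10",
}
print(sorted(flatten(nested)))
```

The resulting keys could then serve as capture base attribute names, with the intermediate nodes recoverable from the separators.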

data items or elements of the same data type. When you want to store many pieces
of data that are related and have the same data type, it is often better to use
an array instead of many separate variables (e.g., `Array[Text]`,
`Array[Numeric]`, etc.).

Not crucial, but a data type that might be worth adding at this level is a Data URL. Currently, an OCA Bundle creator would use “Text” (although has that disappeared from the list?), and specify the Data URL standard, perhaps with the media type as the “format". Having Data URL as a first class entity would seem to be more useful.


Any attributes defined in a Capture Base that may contain identifying information about entities (i.e., personally identifiable information (PII) or quasi-identifiable information (QII)) can be flagged.
`Overlay` as a task-specific object provides layers of definitional or contextual metadata. OCA specification recognize two core types of overlays:

Is it necessary to specify that there are two (at least) classes of overlays? I think that is just confusing. Having to know the “core type”, which is not technically defined in the spec and whose definition is blurry for many overlays, makes it confusing. I recommend just removing this.

- [ Type ](#type-1)
- Overlay-specific attributes
Overlays `MUST` comprises the following attributes, listed in order to form its canonical serialization:
- `d` - [deterministic identifier](#deterministic-identifier) of the overlay

Changing to d is a breaking change (meaning all semvers will need to be bumped) with little additional value. Suggest leaving as digest.


##### Overlay

The `overlay` attribute contains the [SAID](#ref-SAID) of the [Overlay](#overlays) to cryptographically anchor to that parent object.

Is it the literal overlay that is used, or some sort of reference? I don’t care either way, but it’s not clear to me, given the use of capture_base, whether Capture Base is meant.

type = "spec/overlay/" overlay_name "/" sem_ver
overlay_name = ALPHA
sem_ver = DIGIT "." DIGIT
type = "("spec" / "community)/overlays/" overlay_name "/" sem_ver

There should be space for the name of the community that defines the overlay. The term “community” provides no value, and requires all communities to find out if the overlay name they want to use is already in use. Much better to have a per community namespace, and then any overlay name can be used within each community.

Once the community overlays are proposed for promotion into the “core” overlays, the assurance that there is only one “spec” overlay of a given name can be handled by the Working Group.
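
For reference, a sketch of parsing the `type` string as the proposed grammar appears to intend it: a `spec` or `community` scope, the literal `/overlays/`, an overlay name and a version. The accepted name characters (lowercase letters and underscores) and the two- or three-part version are assumptions taken from examples in this PR such as `spec/overlays/character_encoding/1.0.2`. The per-community namespace suggested above is not modeled, since no syntax for it has been proposed yet. `parse_overlay_type` is a hypothetical helper.

```python
import re

_TYPE_RE = re.compile(
    r"(?P<scope>spec|community)/overlays/"
    r"(?P<name>[a-z][a-z_]*)/"
    r"(?P<version>[0-9]+\.[0-9]+(?:\.[0-9]+)?)"
)


def parse_overlay_type(value: str) -> dict:
    """Split an overlay type string into scope, name and version."""
    match = _TYPE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized overlay type: {value!r}")
    return match.groupdict()


print(parse_overlay_type("spec/overlays/character_encoding/1.0.2"))
```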

"capture_base": "EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis",
"type": "spec/overlays/character_encoding/1.0",
"type": "spec/overlays/character_encoding/1.0.2",

What changed and why the “patch” semver? I guess because even though the tools emitted a digest, it is the addition of the d? I think by semver rules, that is a minor update (additional field). That said, in practice, this is a major change: renaming digest to d.

@@ -318,65 +345,24 @@ The inputted format values are dependent on the following core data types as def

_Example 3. Code snippet for a Format Overlay._

##### Information Overlay

Is the information overlay being removed? We include it in all our use cases. Reasoning?

@@ -405,7 +384,7 @@ _[language-specific object]_

A Meta Overlay defines any language-specific information about a schema. It is used for discovery and identification and includes elements such as the schema name and description.

In addition to the `capture_base`, `type`, and `language` attributes (see [Common attributes](#common-attributes)), the Meta Overlay SHOULD include the following attributes:
In addition to the [Mandatory attributes](#mandatory-attributes) and [language](#language), the Meta Overlay SHOULD include the following attributes:

We make use of the Meta overlay to include a number of other, use case specific values. I think it would be valuable to include that "other name/value pairs MAY be included, and consumers of OCA Bundles MUST ignore any unexpected items”. This allows a producer of an OCA Bundle to include additional Meta data, and consumers to use the data they expect, and ignore the rest (vs. rejecting the overlay because of the extra, unexpected data).
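
A sketch of the "ignore what you don't expect" behaviour suggested here. The set of known keys is an assumption for illustration (`name` and `description` in particular are guessed from the Meta Overlay's stated purpose), and `read_meta_overlay` is a hypothetical helper.

```python
# Keys this hypothetical consumer understands; everything else is ignored.
KNOWN_META_KEYS = {"d", "capture_base", "type", "language", "name", "description"}


def read_meta_overlay(overlay: dict) -> dict:
    """Return only the items this consumer understands, without rejecting extras."""
    return {key: value for key, value in overlay.items() if key in KNOWN_META_KEYS}


overlay = {
    "capture_base": "EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis",
    "type": "spec/overlays/meta/1.0.2",
    "language": "en",
    "name": "Example schema",
    "issuer_website": "https://example.org",  # use-case-specific extra
}
print(sorted(read_meta_overlay(overlay)))
```

Any digest check would still have to run over the overlay as received, extras included; the filtering only affects what the consumer acts on.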

"type": "spec/overlays/meta/1.0",
"language": "en",
"type": "spec/overlays/meta/1.0.2",
"language": "en_UK",

Shouldn’t that be a - not an _?


A Conformance Overlay indicates whether data entry for each attribute is mandatory or optional.

In addition to the `capture_base` and `type` attributes (see [Common attributes](#common-attributes)), the Conformance Overlay MAY include the following attributes:
In addition to the [Mandatory attributes](#mandatory-attributes), the Conformance Overlay MAY include the following attributes:

Just as a matter of interest: in this context, does “Mandatory” mean that the data element MUST be present, or does it mean the data element MUST have a value? In our context (credentials), ALL attributes must be present, but an Issuer might routinely not populate an attribute in a credential. Not crucial, but perhaps that might be clarified.

Transformation overlays provide information to convert data from one format or structure to another, such as raw data to processed, or unstructured to structured.

##### Attribute Mapping Overlay
#### Attribute Mapping Overlay

We don’t often use this, but how does the OCA Producer and consumer communicate what is the “other” set of attributes — the ones that are not in the Capture Base? For example, should this have a reference (SAID) for another bundle?

##### Unit Mapping Overlay

A Unit Mapping Overlay defines target units for quantitative data when converting between different units of measurement. Conversion of units is the conversion between different units of measurement for the same quantity, typically through multiplicative conversion factors (see [Code Table for Unit mappings](#code-table-for-unit-mappings) for more information on conversion factors) which change the measured quantity value without changing its effects. The process of conversion depends on the specific situation and the intended purpose. This may be governed by regulation, contract, technical specifications or other published standards.
#### Sensitive Overlay

As noted in an earlier comment — while I think it makes sense to have this as an overlay vs. built into the Capture Base, is it really worth the confusion that is going to be caused by moving it out?


A Sensitive Overlay defines attributes not necessarily flagged in the Capture Base that need protecting against unwarranted disclosure. For example, data that requires protection for legal or ethical reasons, personal privacy, or proprietary considerations.
OCA Bundles MUST be serializable to be transferred over the network. The

Why does it have to be serializable? During the calculation of the digests, an interim, deterministic form of the data being hashed needs to be created, but that is not a reason to canonicalize the “at rest” representation of the Bundle. Much better to say that the ordering of items SHOULD NOT be relied upon. It is fighting against nature to try to force an ordering on moving data.

├── EHDwC_Ucuttrsxh2NVptgBnyG4EMbG5D8QsdbeF9G9-M.json
└── meta.json
```
Validation failure must result in the rejection of the bundle as non-compliant with the specification.

This is a big concern of mine, as I’ve expressed multiple times. Once more.

  • Please add to the spec the algorithm for creating a digest for a given “chunk” of JSON.
  • Please do not refer to the CESR/SAID spec for that, but to put the algorithm in the spec. The algorithm is short, and easily defined.
    • Set the value of the digest item to a string of # characters the same length as the digest will be
    • Calculate digest = remove_padding (encode ( prefix + hash ( JCS(JSON) ) ) )
    • Note that the OCA Bundle does NOT need to be stored canonicalized: the algorithm to calculate the SAID will canonicalize the relevant JSON in doing the SAID calculation.
  • In doing that, please require that the hash and encoding algorithms used are embedded in the digest (the SAID prefix is fine, although I would prefer the more standard multiformats, multibase and multihash).
  • Please specify the specific, and ideally very few, hashing and encoding schemes. I would recommend only sha-256 and b58btc encoding, but am fine if others are specifically allowed. Without limiting the algorithms allowed (by version of the OCA specification), it is impossible to write an OCA Consumer that handles whatever algorithms are used by producers. There are just too many options.
  • Please document the process for calculating the digests for an OCA Bundle. Notably, it must be calculated as follows:
    • Calculate the digest for the Capture Base, and set the value of its digest to the SAID.
    • For each overlay:
      • Set the capture_base value to be the digest of the capture base.
      • Calculate the digest for the overlay, and set the value of its digest to the SAID.
    • Calculate the digest for the entire OCA Bundle, and set the value of the root digest to that SAID.
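
To make the request concrete, the steps above can be sketched in a few lines of Python. This is an illustration only, not the SAID/CESR algorithm and not text from the specification. Several things here are assumptions: the digest item is named `d`, the prefix is the single character `E`, the hash is SHA-256, the encoding is unpadded base64url, and the bundle layout (`capture_base` plus `overlays`) is hypothetical. `json.dumps` with sorted keys stands in for JCS (RFC 8785) and matches it only for simple data such as ASCII keys, strings and integers. The resulting strings are not byte-compatible with CESR-encoded SAIDs.

```python
import base64
import hashlib
import json

DIGEST_FIELD = "d"   # assumed name of the digest item
PREFIX = "E"         # assumed one-character code naming the hash algorithm
DIGEST_LEN = 44      # 1 prefix char + 43 unpadded base64url chars (SHA-256)


def canonicalize(obj):
    """Stand-in for JCS: sorted keys, no insignificant whitespace, UTF-8."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def compute_digest(obj):
    """Digest of obj with its digest item replaced by '#' placeholders."""
    draft = dict(obj)
    draft[DIGEST_FIELD] = "#" * DIGEST_LEN
    raw = hashlib.sha256(canonicalize(draft)).digest()
    encoded = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return PREFIX + encoded


def seal(obj):
    """Return a copy of obj with its digest item filled in."""
    sealed = dict(obj)
    sealed[DIGEST_FIELD] = compute_digest(obj)
    return sealed


def seal_bundle(capture_base, overlays):
    """Capture base first, then each overlay, then the whole bundle."""
    sealed_base = seal(capture_base)
    sealed_overlays = [
        seal({**overlay, "capture_base": sealed_base[DIGEST_FIELD]})
        for overlay in overlays
    ]
    return seal({"capture_base": sealed_base, "overlays": sealed_overlays})


def verify(obj):
    """Recompute the digest and compare it with the stored one."""
    return obj.get(DIGEST_FIELD) == compute_digest(obj)
```

Because canonicalization happens inside `compute_digest`, the key order of the stored JSON does not affect the result, so a consumer can call `verify` on a bundle however it was serialized at rest.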

Missing from my comment above is the (I think unnecessary) calculation of the length of the OCA Bundle before calculating the digest of the entire bundle. Thus the last step I have above (Calculate the digest for the entire OCA Bundle…) would begin with these steps:

  • Set the value of the root digest to a string of # characters the length the digest will be.
  • Determine the length of the OCA Bundle by doing this calculation: insert calculation of length of the bundle
  • Set the OCA Version string to be prefix + length of OCA Bundle + suffix (prefix and suffix are hardcoded per OCA Specification Version).

If any consumer of an OCA Bundle cares, they would need to repeat the length calculation and verify it against the length. They are unlikely to do that, because the digest verification would also fail if the OCA Bundle length has been changed.

@@ -1038,6 +969,17 @@ Smith, S. Self-Addressing IDentifier (SAID) (2022) [ https://datatracker.ietf.or
</dd>
</dl>

<dl>

Why does there need to be a reference to CESR here? The OCA Spec has nothing to do with CESR. If the answer is the need to find the SAID spec, I again urge us to NOT outsource the SAID / digest calculation to another spec. That just complicates life for everyone, especially implementers. While I’d prefer the use of multihash and multibase for the digest, I’m fine with the use of the SAID algorithm and prefix. I just don’t want to have to dig into another spec for the little bit I need to use in the OCA Spec.

@swcurran

I’d also like to see added to the specification an explanation of the concept of OCA Bundle Producers and Consumers. Or perhaps more accurately:

  • OCA Bundle Publishers — entities that create OCA Bundles.
  • Producers of data to which an OCA Bundle applies.
  • Consumers of data to which an OCA Bundle applies.

Hope I helped. I suspect not, but I have to try...

@pknowl
Collaborator

pknowl commented Feb 12, 2025

"d" should be written as "digest". The mixed attribute naming convention looks ugly. My OCD will keep triggering!
