Major refactoring of the OCA Specification #86
- add missing `d` field
- remove PII and classification, which will be introduced as separate overlays

Allow linking an overlay to another overlay instead of only to the capture base. This provides a mechanism for linking an entry overlay to an entry codes overlay, ensuring consistency.
Without additional references, the model does not bring any value for an external reader. The model should be presented as an additional resource.
Remove the information overlay in favour of the presentation layer. See ORFg findings.
Move the transformation overlay to community overlays.
It is not needed. We may add some explanation of how the implementation handles time internally, but that should not be relevant to the specification.

- add example
- update the ISO 639 link to point to the latest spec version
- enforce 639-3 as the main codes for languages in overlays
- clean up dead ISO references that are not used
b8ee6cd to 3b7e080
Categories are part of the presentation layer and should not be part of the label overlay. Signed-off-by: Robert Mitwicki <[email protected]>
Attribute mapping is still part of the core spec. It is relatively simple and commonly used, but there was a proposal to move it out as a community overlay; it would be really good to keep the core spec very simple and lightweight.

Framing, on the other hand, is quite extensive and detailed functionality. It consists of multiple functions that would significantly increase the complexity of the core spec, and a lot needs to be explained and addressed for people to understand it clearly. I propose making it a community overlay right away to keep the core spec very light.

For the community overlays, I have started preparing an overlays registry as a proposal for storing all of them: https://github.com/the-human-colossus-foundation/overlays-repository/
It is still a draft but ready to be worked on, and I am open to suggestions and proposals. I suggest we get that ready before we merge this, so that we already have a solid place for community overlays.
Currently, OCA-repo does not allow `.` characters in attribute names.
If code is king, then this should be documented in this section.
If not, then either OCA-repo should be adjusted to allow `.` characters, or OCA-repo should document its deviation from this requirement.
However, I would strongly support allowing `.` in attribute names, since the text here says "The string can be any valid Unicode code point." (Technically, other symbols such as , / \ = can also be expressed as valid Unicode code points.)
docs/specification/README.md
Outdated
distill the most relevant aspects of SAIDs in the context of the OCA
specification.

#### How to calculate SAID:
The simplified concept of calculating a SAID.
Refer to the CESR specification for the complete details of SAID calculation. The summary steps described here are insufficient to calculate a SAID correctly.
For the SAID calculation, it took quite a long investigation to confirm that this method does not work. Conceptually this is the idea, but following it does not produce a SAID that verifies.
See #58.
From Kent Bull: https://kentbull.com/2024/09/22/keri-series-understanding-self-addressing-identifiers-said/
Bit boundaries and the choice of alphabet are also needed to get the correct SAID.
In an OCA Bundle, can we change "d" to "digest"? It looks strange to have a mixed attribute-naming convention in the bundle.
For the standard overlay, the current standard has limitations. More information, such as that provided below, would make this standard overlay much more useful. You can include links that machines can read and follow, and be more specific about versions, etc.
- Character encoding overlay
- Format overlay
```
OCAS<major><minor><format><size>_
```
Why is the size calculated? It's orthogonal to versioning? I can't think of a strong motivation to include it at this layer.
TL;DR: this is for effective streaming.
If you go beyond the HTTP protocol and focus merely on streams of bytes, consuming the whole chunk (Bundle) out of a stream is simply taking `<size>` bytes off the stream, effectively enabling the transfer of Bundles over the wire along with other chunks. Furthermore, because we know precisely where to look for particular information in the stream (that's why OCA has always had a custom canonical form and is not RFC 8785 compliant: Bundle JSON starts with the `v` attribute), we can immediately decide which parser can handle this chunk. In this case, `OCAS<major><minor><format>` enables us to unambiguously apply the appropriate parser for further handling of this chunk.
FWIW, the `<size>` is CESR-Base64 encoded.
Streaming = a concern for the messaging layer, not for the application layer.
And that's why we have Bundles. If there's no need for exchange, there's no need for a Bundle concept.
@blelump I'm confused by your response. Message streaming information belongs inside an exchange packet, not inside a schema. As it stands, the "version" format (e.g., `"v": "OCAA11JSON00714b_"`) contains the byte size of the messaging stream (`OCAS<major><minor><format><size>_`). This is in the wrong place.
OCA is solely for defining passive objects, nothing else. It is not a messaging protocol. Messaging should be defined in exchange packets, not in the data schema itself.
If you follow the Informatics Domain Model, this separation is clearly defined:
https://zenodo.org/records/14525852
Capture = Objects = Schema
Exchange = Actions = Packet
OCA is a messaging protocol? In your vision then - schema bundles only exist at the time of transit? There will never be a need for a library of schema bundles? That bundles wouldn't be stored next to data to help describe that data as it is stored? OCA as you envision it exists only for transit? And once a schema with data reaches its destination then is turned into something else?
@carlyh-micb The Bundle concept exists because of the need for exchange. It couples all the tiny pieces, CB and a set of overlays, into one whole that the Bundle author found cohesive and exhaustive.
Bundle became a first-class citizen in the OCA ecosystem precisely because of the data that flows. Bundles and OCA aren't needed if the data doesn't flow. In essence, if there's some data that is stored on a local computer and this is the only copy, OCA is not needed because this data doesn't flow.
In data-that-flow use cases, using or keeping the Bundles by the data receiver is perfectly fine.
I hope the reasoning doesn't require further elaboration and is now clear, especially regarding the need for a Bundle and its lifecycle.
@pknowl the term `messaging protocol` is quite broad; could you clarify what aspect you're referring to?
As explained above, a Bundle exists merely for the exchange. It's a data container that is well-defined within the protocol for exchange purposes. Specifically, because OCA brings additional value only when data flows, the Bundle, as the enabler of proper flow, became a first-class citizen in the protocol (defined in the spec).
`OCAS<major><minor><format><size>_` is part of this data container so that, when reading it, we can unambiguously determine which container variant we're dealing with. Versioning data containers enables proper parsing and backward compatibility and is no different from versioning any other representation or file format. This concerns the `OCAS<major><minor>` part. The `<format><size>` is then appended to it (in the above example, it is `JSON00714b`) to identify the representation and message size unambiguously. This information is valuable when a Bundle is exchanged over protocols based on a continuous stream of bytes rather than on discrete messages, the latter being characteristic of the HTTP protocol.
Therefore, from the Bundle receiver's perspective, we use the `v` attribute to narrow the context, specifically to avoid any ambiguity about how to read this Bundle.
Finally, it can be tempting to separate Bundle, a distinct concept serving as a data container for the exchange, from the core spec. However, due to OCA's inherent nature and applicability in ecosystems where data flows, Bundle is a first-class citizen and part of the core spec.
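For readers following the thread, the field layout under discussion can be illustrated with a minimal sketch that splits a `v` value into its parts. The field widths (a 4-character protocol tag, one character each for major and minor, a 4-character format, a 6-character size, and a `_` terminator) are assumptions inferred from the single example `OCAA11JSON00714b_` quoted above, not taken from the specification. `VersionString` and `parse_version_string` are hypothetical names. The size is deliberately left undecoded, since its encoding is only described in this thread as "CESR-Base64".

```python
import re
from typing import NamedTuple


class VersionString(NamedTuple):
    protocol: str  # e.g. "OCAA"
    major: str
    minor: str
    fmt: str       # e.g. "JSON"
    size: str      # raw size field, left undecoded on purpose


# Assumed layout, inferred from one example in this thread only.
_VERSION_RE = re.compile(
    r"(?P<protocol>[A-Z]{4})"
    r"(?P<major>[0-9a-f])"
    r"(?P<minor>[0-9a-f])"
    r"(?P<fmt>[A-Z]{4})"
    r"(?P<size>[0-9A-Za-z_-]{6})"
    r"_"
)


def parse_version_string(value: str) -> VersionString:
    """Split a `v` value into its fields without interpreting the size."""
    match = _VERSION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a recognised version string: {value!r}")
    return VersionString(**match.groupdict())


parsed = parse_version_string("OCAA11JSON00714b_")
print(parsed.protocol, parsed.major, parsed.minor, parsed.fmt, parsed.size)
```

A receiver could use `protocol`, `major`, `minor` and `fmt` to pick a parser, which is the role described for this prefix in the comment above.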
> This information is valuable when Bundle exchanges through a continuous stream of bytes type of protocols instead of discrete messages, which is characteristic of the HTTP protocol.

You want to use the Bundle as a wire format? To make that actually viable, the JSON serialization and encoding must be specified in detail. We haven't even specified that the Bundle itself needs to be encoded as UTF-8, let alone the subset (with/without BOM), the role of spacing, line endings, etc.
Given the current specification, you have to parse the JSON itself in order to extract the value of "v" and thus do anything useful with the length. If we do what everyone else does and implement this on a different layer, then you can do things like reserve the first N bytes for this metadata, which enables a lot of fun stuff. I've even seen people do this by prefixing the JSON with a 16-character string.
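A minimal sketch of the alternative described in this comment, framing on a separate layer with a fixed-width prefix, might look like the following. The 16-character zero-padded decimal header and the names `frame` and `read_frame` are illustrative assumptions; they are not part of OCA or of any existing tool.

```python
import io
import json

HEADER_LEN = 16  # assumed fixed header width, in ASCII characters


def frame(payload: dict) -> bytes:
    """Prefix the UTF-8 JSON body with its byte length as a fixed-width header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"{len(body):0{HEADER_LEN}d}".encode("ascii")
    return header + body


def read_frame(stream) -> dict:
    """Read one framed JSON document from a binary stream."""
    header = stream.read(HEADER_LEN)
    if len(header) != HEADER_LEN:
        raise EOFError("stream ended inside a frame header")
    size = int(header.decode("ascii"))
    body = stream.read(size)
    if len(body) != size:
        raise EOFError("stream ended inside a frame body")
    return json.loads(body.decode("utf-8"))


# Two documents back to back on one stream; the JSON is never inspected
# to find out where one ends and the next begins.
stream = io.BytesIO(frame({"v": "x"}) + frame({"n": 1}))
print(read_frame(stream))
print(read_frame(stream))
```

With this layering the length lives outside the JSON document, so the schema itself carries no transport metadata.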
> This information is valuable when Bundle exchanges through a continuous stream of bytes type of protocols instead of discrete messages, which is characteristic of the HTTP protocol.
> You want to use the Bundle as a wire format?

Yes, see below for further explanation.

> To make that actually viable the JSON serialization and encoding must be specified in detail. We haven't even specified that the Bundle itself needs to be encoded as UTF-8, let alone the subset (with/without BOM), the role of spacing, line endings etc.

Yeah, we'd need to add this information.

> Given the current specification, you have to parse the JSON itself in order to extract the value from "v" and thus do anything useful with the length.

Thanks to the Bundle canonical form, we know where to look for specific bytes. We specifically know where to look for `<format><size>`, counting from the start of the stream. Therefore, we don't need to deserialize the potentially valid JSON string to extract `v`.

> If we do what everyone else does and implement this on a different layer then you can do things like reserve the first N bytes for this metadata, which enables a lot of fun stuff. I've even seen people do this by prefixing the JSON with a 16-character string.

This is precisely what we're doing when applying CESR, that is, suffixing the JSON with sophisticatedly structured text that at first glance looks like garbage. Layering it this way, alongside other components that we use in the same manner and join together, that is: `<some payload, i.e., OCA Bundle in JSON><attachments><a VC in JSON><attachments><JSON><attachments><JSON><attachments>`, enables us to unambiguously determine what type of document we're dealing with in this chain. Enveloping any of these would add more complexity: in most cases these attachments are digital signatures, so verifying information would first require de-enveloping. Going further, OCA primarily serves as a DDE enabler. When considering its features, we also consider the broader concept of DDE and how to integrate them effectively. At the same time, by providing universal tooling, we lower the barrier to entry to OCA and let people join the ecosystem without needing to implement all this on their own; they can simply consume it and use it.
My $0.02CDN. I’d definitely like to leave off the size of the bundle in the version as it is a pain. Doable if the calculation is well-defined, but annoying at the application layer. I agree that if anyone wants to stream OCA data (which really doesn’t make sense to me), they are welcome to do that by putting a minimal wrapper / prefix that has the size. But it should be outside of the OCA specification.
I definitely agree that a `digest` and `version` at the same level as `capture_base` and `overlays` are needed. I’d like the `version` defined as simply a semver.
The Informatics Domain Model (IDM) should be the blueprint for data-centric modeling, not the OCA Bundle itself. The OCA Bundle belongs strictly in the `Object` domain (passive) [i.e., no mechanics], and must maintain distinct separation from event logs (`Event` domain), active execution algorithms (`Intelligence` domain), and framed concepts (`Knowledge` domain).
Blurring these domain boundaries creates two major issues:
- Search & Discovery Breakdown
Each domain supports a distinct type of search:
a.) Attribute search (Object) → Finds structural attributes in an OCA Bundle.
b.) Field search (Event) → Queries recorded fields in an event history.
c.) Term search (Concept) → Searches by ontological terms or controlled vocabulary.
d.) Value search (Action) → Retrieves explicit exchange metadata and execution values, which may include:
- Message size (byte length of the payload);
- Location (where the bundle is stored/fetched);
- Routing details (if streaming applies).
Embedding value-based search parameters in the OCA Bundle mixes passive structure (attributes) with active mechanics (values), making searches imprecise.
- Role-Based Access Control (RBAC) Violations
Keeping domains separate ensures granular access control:
In the case of the two domains in question (i.e., Object & Action) ...
a.) Schema Guardians may be appointed to protect structural semantics in an OCA Bundle.
b.) Packet Trackers may be appointed to track message execution in transit (message size, location, routing).
If message metadata is stored inside the OCA Bundle, Schema Guardians would have access to exchange intelligence, violating need-to-know governance.
My suggestion would be to use an envelope for message/transmission metadata, and remove the "v" attribute (Versioning, Encoding Format & Message Size) from the OCA core specification. This would ensure:
✅ Schema Bundles remain purely structural (i.e., made up of passive structural attributes).
✅ Message metadata stays in the Action domain (i.e., within packet headers).
✅ RBAC integrity is preserved.

#### What are Overlays?

Overlays are task-specific objects that provide cryptographically-bound layers of definitional or contextual metadata to a Capture Base. Any actor interacting with a published Capture Base can use Overlays to transform how inputted data and metadata are displayed to a viewer or guide an agent in applying a custom process to captured data.
[Overlays](#overlays) are task-specific objects that provide cryptographically-bound layers of definitional or contextual metadata to a [Capture Base](#capture-base). Any actor interacting with a published [Capture Base](#capture-base) can use [Overlays](#overlays) to enrich meaning of the data, transform how inputted data and metadata are displayed to a viewer or guide an agent in applying a custom process to captured data.
Typo: "to enrich the meaning of the data"
```
"fullName",
"dateOfBirth",
"photoImage"
]
}
```

_Example 1. Code snippet for a Capture Base._

#### Type
Suggest that “type” should be defined outside of the context of the Capture Base, since it applies to all overlays. In the context of Capture Base, the fixed value of its type is all that needs to be defined.
"type": "spec/capture_base/1.0",
"classification": "GICS:45102010",
"d": "EFEDyA__ap51wscacOwATP3c51icUeHT6D0tTbInQI9G",
"type": "spec/capture_base/1.0.0",
I think the version in the Capture Base type should be bumped to 2.0.0, since this is a breaking change definition.
I’d add that I’m not in favour of changing the Capture Base data model. While it makes sense to not have the PII flags in the capture base, the value of moving them out now, and breaking all existing implementations is questionable.
The use of “d” vs. “digest” is also a breaking change, so that would also force a 2.0.0 update — again with little added value. I don’t think changing anything is worth it.
I strongly believe the "v" attribute in its current format should be removed from the OCA spec entirely. Its format hinders adoption and breaks multiple use cases.
This messaging information belongs in a packet header, not within a passive schema. It’s not even an overlay—it should only function as an envelope, so its inclusion in the core spec is unnecessary.
The Capture Base must support 100% of use cases. If we can’t ensure that foundational flexibility, we’re compromising adoption at the very first hurdle.
I think that `v` or `ver` or `version` is crucial to the spec, but it only needs to be a semver that tells an OCA Bundle consumer “This OCA Bundle is using version x.y.z of the OCA specification”. That is crucial to being able to smoothly transition deployments from one version of the specification to the next; in short, to add new features and remove old ones. It is relatively easy to write an implementation that can handle multiple versions of the specification if there is a `version`. It’s really hard if the software has to “sniff” (check) arbitrary data here and there to determine the version the OCA Bundle Producer used.
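To illustrate why a plain semver makes multi-version consumers easy to write, here is a minimal sketch of dispatching on the major version. The top-level `version` key follows the proposal in this thread; the reader functions, the `READERS` table and the set of supported versions are hypothetical placeholders, not an existing OCA API.

```python
from typing import Callable, Dict, Tuple


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Split 'x.y.z' into integers; pre-release and build suffixes are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def read_v1(bundle: dict) -> str:
    # Placeholder for a reader that understands 1.x bundles.
    return "read with the 1.x reader"


def read_v2(bundle: dict) -> str:
    # Placeholder for a reader that understands 2.x bundles.
    return "read with the 2.x reader"


# Hypothetical table of supported major versions.
READERS: Dict[int, Callable[[dict], str]] = {1: read_v1, 2: read_v2}


def read_bundle(bundle: dict) -> str:
    """Pick a reader from the declared version instead of sniffing the content."""
    major, _minor, _patch = parse_semver(bundle["version"])
    reader = READERS.get(major)
    if reader is None:
        raise ValueError(f"unsupported OCA specification version: {bundle['version']}")
    return reader(bundle)


print(read_bundle({"version": "1.0.2", "capture_base": {}, "overlays": []}))
```

The consumer never has to inspect the shape of `capture_base` or `overlays` to guess which rules the producer followed.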
Yes, I concur!
#86 (comment)
An attribute name is a string that uniquely identifies an attribute within an OCA layer and is used to reference that attribute by other layers throughout the OCA bundle. The string can be any valid Unicode code point.
Example of a valid attribute name:
- `FullName`
- `person/name/fullName`
I would like to see a reference made in the spec to the effect that "the use of `/` separators in the attribute name MAY indicate a hierarchy in the capture base, indicating that the capture base attributes define a flattening of that hierarchy". That will be very useful for many use cases where the data is represented in a (for example) JSON data model, where the nodes above the attributes are necessary to know, but are not relevant in the OCA Bundle.
That note would be an alternative to using a reference to the SAID of another OCA Bundle for representing hierarchical data.
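As a small illustration of the flattening this note describes, the sketch below turns a nested JSON-like record into `/`-separated attribute names. The function name `flatten` and the sample record are made up for this example; the spec text under review does not define such a procedure.

```python
def flatten(node: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single level keyed by '/'-joined paths."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Intermediate nodes only contribute to the path, not to the attributes.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


record = {
    "person": {
        "name": {"fullName": "Ada Lovelace"},
        "dateOfBirth": "1815-12-10",
    }
}
print(flatten(record))
```

The resulting keys, such as `person/name/fullName`, have the same shape as the attribute name shown in the excerpt above.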
data items or elements of the same data type. When you want to store many pieces
of data that are related and have the same data type, it is often better to use
an array instead of many separate variables (e.g., `Array[Text]`,
`Array[Numeric]`, etc.).
Not crucial, but a data type that might be worth adding at this level is a Data URL. Currently, an OCA Bundle creator would use “Text” (although has that disappeared from the list?), and specify the Data URL standard, perhaps with the media type as the “format". Having Data URL as a first class entity would seem to be more useful.

Any attributes defined in a Capture Base that may contain identifying information about entities (i.e., personally identifiable information (PII) or quasi-identifiable information (QII)) can be flagged.
`Overlay` as a task-specific object provides layers of definitional or contextual metadata. OCA specification recognize two core types of overlays:
Is it necessary to specify that there are (at least) two classes of overlays? I think that is just confusing. Having to know the “core type”, which is not technically defined in the spec and whose definition is blurry for many overlays, makes it confusing. I recommend just removing this.
- [ Type ](#type-1)
- Overlay-specific attributes
Overlays `MUST` comprises the following attributes, listed in order to form its canonical serialization:
- `d` - [deterministic identifier](#deterministic-identifier) of the overlay
Changing to `d` is a breaking change (meaning all semvers will need to be bumped) with little additional value. Suggest leaving as `digest`.

##### Overlay

The `overlay` attribute contains the [SAID](#ref-SAID) of the [Overlay](#overlays) to cryptographically anchor to that parent object.
Is it the literal `overlay` that is used, or some sort of reference? Don’t care either way, but it's not clear to me, given the use of `capture_base`, if Capture Base is meant.
type = "spec/overlay/" overlay_name "/" sem_ver
overlay_name = ALPHA
sem_ver = DIGIT "." DIGIT
type = "("spec" / "community)/overlays/" overlay_name "/" sem_ver
There should be space for the name of the community that defines the overlay. The term “community” provides no value, and requires all communities to find out if the overlay name they want to use is already in use. Much better to have a per community namespace, and then any overlay name can be used within each community.
Once the community overlays are proposed for promotion into the “core” overlays, the assurance that there is only one “spec” overlay of a given name can be handled by the Working Group.
"capture_base": "EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis",
"type": "spec/overlays/character_encoding/1.0",
"type": "spec/overlays/character_encoding/1.0.2",
What changed, and why the “patch” version bump? I guess because, even though the tools emitted a `digest`, it is the addition of the `d`? I think by semver rules, that is a minor update (additional field). That said, in practice, this is a major change: renaming `digest` to `d`.
@@ -318,65 +345,24 @@ The inputted format values are dependent on the following core data types as def

_Example 3. Code snippet for a Format Overlay._

##### Information Overlay
Is the information overlay being removed? We include it in all our use cases. Reasoning?
@@ -405,7 +384,7 @@ _[language-specific object]_

A Meta Overlay defines any language-specific information about a schema. It is used for discovery and identification and includes elements such as the schema name and description.

In addition to the `capture_base`, `type`, and `language` attributes (see [Common attributes](#common-attributes)), the Meta Overlay SHOULD include the following attributes:
In addition to the [Mandatory attributes](#mandatory-attributes) and [language](#language), the Meta Overlay SHOULD include the following attributes:
We make use of the Meta overlay to include a number of other, use case specific values. I think it would be valuable to include that "other name/value pairs MAY be included, and consumers of OCA Bundles MUST ignore any unexpected items”. This allows a producer of an OCA Bundle to include additional Meta data, and consumers to use the data they expect, and ignore the rest (vs. rejecting the overlay because of the extra, unexpected data).
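The "MUST ignore unexpected items" behaviour suggested here could be sketched as follows. The list of expected keys is an assumption pieced together from the excerpts in this review (`d`, `type`, `capture_base`, `language`, plus `name` and `description`), and `read_meta_overlay` is a hypothetical helper, not part of any OCA library.

```python
# Keys this hypothetical consumer understands; everything else is ignored.
EXPECTED_KEYS = ("d", "type", "capture_base", "language", "name", "description")


def read_meta_overlay(overlay: dict) -> dict:
    """Keep the items the consumer expects and silently drop the rest."""
    return {key: overlay[key] for key in EXPECTED_KEYS if key in overlay}


overlay = {
    "type": "spec/overlays/meta/1.0",
    "language": "en",
    "name": "Driving licence",
    "description": "Example schema",
    "issuer_url": "https://example.org",  # use-case-specific extra item
}
print(read_meta_overlay(overlay))
```

A producer can then add use-case-specific items such as `issuer_url` without causing a conforming consumer to reject the overlay.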
"type": "spec/overlays/meta/1.0", | ||
"language": "en", | ||
"type": "spec/overlays/meta/1.0.2", | ||
"language": "en_UK", |
Shouldn’t that be a `-` not an `_`?

A Conformance Overlay indicates whether data entry for each attribute is mandatory or optional.

In addition to the `capture_base` and `type` attributes (see [Common attributes](#common-attributes)), the Conformance Overlay MAY include the following attributes:
In addition to the [Mandatory attributes](#mandatory-attributes), the Conformance Overlay MAY include the following attributes:
Just as a matter of interest: in this context, does “Mandatory” mean that the data element MUST be present, or does it mean the data element MUST have a value? In our context (credentials), ALL attributes must be present, but an Issuer might routinely not populate an attribute in a credential. Not crucial, but perhaps that might be clarified.
Transformation overlays provide information to convert data from one format or structure to another, such as raw data to processed, or unstructured to structured.

##### Attribute Mapping Overlay
#### Attribute Mapping Overlay
We don’t often use this, but how do the OCA Producer and Consumer communicate what the “other” set of attributes is, that is, the ones that are not in the Capture Base? For example, should this have a reference (SAID) to another bundle?
##### Unit Mapping Overlay

A Unit Mapping Overlay defines target units for quantitative data when converting between different units of measurement. Conversion of units is the conversion between different units of measurement for the same quantity, typically through multiplicative conversion factors (see [Code Table for Unit mappings](#code-table-for-unit-mappings) for more information on conversion factors) which change the measured quantity value without changing its effects. The process of conversion depends on the specific situation and the intended purpose. This may be governed by regulation, contract, technical specifications or other published standards.
#### Sensitive Overlay
As noted in an earlier comment — while I think it makes sense to have this as an overlay vs. built into the Capture Base, is it really worth the confusion that is going to be caused by moving it out?

A Sensitive Overlay defines attributes not necessarily flagged in the Capture Base that need protecting against unwarranted disclosure. For example, data that requires protection for legal or ethical reasons, personal privacy, or proprietary considerations.
OCA Bundles MUST be serializable to be transferred over the network. The
Why does it have to be serializable? During the calculation of the digests, an interim, deterministic form of the data being hashed needs to be created, but that is not a reason to canonicalize the “at rest” representation of the Bundle. Much better to say that the ordering of items SHOULD NOT be relied upon. It is fighting against nature to try to force an ordering on moving data.
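To illustrate the point, here is a minimal sketch showing that two differently ordered "at rest" representations produce the same hash once a deterministic form is built at hashing time. The field names and values are made up for the example, and `json.dumps` with sorted keys is only a stand-in for a real canonicalization scheme such as JCS.

```python
import hashlib
import json


def canonical_bytes(text: str) -> bytes:
    """Build an interim, deterministic byte form of a JSON document.

    Assumption: sorted keys plus compact separators approximate
    canonicalization; a real implementation would use full JCS.
    """
    parsed = json.loads(text)
    return json.dumps(parsed, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Two hypothetical "at rest" forms: same data, different ordering and spacing.
at_rest_a = '{"type": "overlay", "language": "eng"}'
at_rest_b = '{"language":"eng",   "type":"overlay"}'

hash_a = hashlib.sha256(canonical_bytes(at_rest_a)).hexdigest()
hash_b = hashlib.sha256(canonical_bytes(at_rest_b)).hexdigest()

# The stored forms differ, but the hashes agree.
print(at_rest_a == at_rest_b, hash_a == hash_b)
```

The stored ordering never needs to be fixed; only the hashing step needs a deterministic form.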
├── EHDwC_Ucuttrsxh2NVptgBnyG4EMbG5D8QsdbeF9G9-M.json | ||
└── meta.json | ||
``` | ||
Validation failure must result in the rejection of the bundle as non-compliant with the specification. |
This is a big concern of mine, as I’ve expressed multiple times. Once more.
- Please add to the spec the algorithm for creating a `digest` for a given “chunk” of JSON. Please do not refer to the CESR/SAID spec for that, but put the algorithm in the spec. The algorithm is short and easily defined.
  - Set the value of the `digest` item to a string of `#` characters of the length the digest will be.
  - Calculate `digest = remove_padding (encode ( prefix + hash ( JCS(JSON) ) ) )`
  - Note that the OCA Bundle does NOT need to be stored canonicalized: the algorithm to calculate the SAID will canonicalize the relevant JSON in doing the SAID calculation.
- In doing that, please require that the hash and encoding algorithms used are embedded in the digest (the SAID prefix is fine, although I would prefer the more standard multiformats: multibase and multihash).
- Please specify the specific, and ideally very few, hashing and encoding schemes. I would recommend only sha-256 and b58btc encoding, but am fine if others are specifically allowed. Without limiting the algorithms allowed (by version of the OCA specification), it is impossible to write an OCA Consumer that handles whatever algorithms are used by producers. There are just too many options.
- Please document the process for calculating the digests for an OCA Bundle. Notably, it must be calculated as follows:
  - Calculate the digest for the Capture Base, and set the value of its `digest` to the SAID.
  - For each overlay:
    - Set the `capture_base` value to be the `digest` of the capture base.
    - Calculate the `digest` for the overlay, and set the value of its `digest` to the SAID.
  - Calculate the digest for the entire OCA Bundle, and set the value of the root `digest` to that SAID.
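The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the SAID algorithm as specified elsewhere: the prefix is the multihash header for sha2-256, the encoding is unpadded base64url (chosen only because it is in the Python standard library, rather than b58btc), `json.dumps` with sorted keys stands in for full JCS, and the bundle layout with `capture_base` and `overlays` keys is hypothetical.

```python
import base64
import hashlib
import json

# Assumption: multihash header for sha2-256 (code 0x12, digest length 0x20).
PREFIX = b"\x12\x20"
# 2 prefix bytes + 32 hash bytes = 34 bytes -> 46 unpadded base64url characters.
DIGEST_LEN = 46


def canonicalize(obj) -> bytes:
    # Approximation of JCS: sorted keys, compact separators, UTF-8 output.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def compute_digest(obj: dict, field: str = "digest") -> str:
    # Step 1: set the digest field to a '#' placeholder of the final length.
    draft = dict(obj)
    draft[field] = "#" * DIGEST_LEN
    # Step 2: remove_padding(encode(prefix + hash(JCS(JSON)))).
    raw = PREFIX + hashlib.sha256(canonicalize(draft)).digest()
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def seal(obj: dict, field: str = "digest") -> dict:
    # Return a copy of obj with its digest field filled in.
    sealed = dict(obj)
    sealed[field] = compute_digest(obj, field)
    return sealed


def verify(obj: dict, field: str = "digest") -> bool:
    # Recompute the digest over the placeholder form and compare.
    return obj.get(field) == compute_digest(obj, field)


def seal_bundle(capture_base: dict, overlays: list) -> dict:
    # Capture Base first, then each overlay linked to it, then the root.
    sealed_base = seal(capture_base)
    sealed_overlays = []
    for overlay in overlays:
        linked = dict(overlay)
        linked["capture_base"] = sealed_base["digest"]
        sealed_overlays.append(seal(linked))
    # Hypothetical bundle layout, used only for this sketch.
    return seal({"capture_base": sealed_base, "overlays": sealed_overlays})


bundle = seal_bundle(
    {"attributes": {"name": "Text"}},
    [{"type": "label", "language": "eng"}],
)
print(verify(bundle), verify(bundle["capture_base"]), verify(bundle["overlays"][0]))
```

Because canonicalization happens inside `compute_digest`, the stored bundle can use any key order, and a consumer can verify each level independently.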
Missing from my comment above is the (I think unnecessary) calculation of the length of the OCA Bundle before calculating the `digest` of the entire bundle. Thus the last step I have above (`Calculate the digest for the entire OCA Bundle…`) becomes the steps:
- Set the value of the root `digest` to a string of `#` characters the length the digest will be.
- Determine the length of the OCA Bundle by doing this calculation: *insert calculation of length of the bundle*
- Set the OCA Version string to be `prefix` + length of OCA Bundle + `suffix` (prefix and suffix are hardcoded per OCA Specification Version).

If any consumer of an OCA Bundle cares, they would need to repeat the length calculation and verify it against the length. They are unlikely to do that, because the `digest` verification would also fail if the OCA Bundle length has been changed.
@@ -1038,6 +969,17 @@ Smith, S. Self-Addressing IDentifier (SAID) (2022) [ https://datatracker.ietf.or | |||
</dd> | |||
</dl> | |||
|
|||
<dl> |
Why does there need to be a reference to CESR here? The OCA Spec has nothing to do with CESR. If the answer is the need to find the SAID spec, I again urge us to NOT outsource the SAID / `digest` calculation to another spec. That is just complicating life for everyone, especially implementers. While I’d prefer the use of multihash and multibase for the `digest`, I’m fine with the use of the SAID algorithm and prefix. I just don’t want to have to dig into that spec for the little bit I need to use in the OCA Spec.
I’d also like to see added to the specification an explanation of the concept of OCA Bundle Producers and Consumers. Or perhaps more accurately:
Hope I helped. I suspect not, but I have to try... |
"d" should be written as "digest". The mixed attribute naming convention looks ugly. My OCD will keep triggering! |
This is a major update of the specification, bringing clarity and structure.
Below is a summary of the major changes, which should be discussed and reviewed with care:
Features:
Others