Discussion about how to improve UCUM #648

Open
cmungall opened this issue Nov 8, 2022 · 7 comments

@cmungall
Contributor

cmungall commented Nov 8, 2022

I mentioned this briefly in #460.

For more background on UCUM see

This is not a straightforward one, but it's very important to get this right.

First, as far as I can tell, there is no official resolver, though NLM does offer some services:

https://ucum.nlm.nih.gov/

Second, the syntax of a UCUM code does not necessarily conform to the syntax for CURIEs. This matters less for using Bioregistry as a web resolver, but it is important if we want to standardize how UCUM codes are written as CURIEs, and Bioregistry is the best hope for reaching consensus on this. At the moment, some groups are starting to simply write pseudo-CURIEs that ignore the W3C syntax, e.g. http://phenopackets.org/phenopacket-tools/constants.html#unit

There is a group in OBO (see the units channel, cc @jamesaoverton) that has worked for some time on a standard way of writing units as URIs, see https://units-of-measurement.org/

Example: https://units-of-measurement.org/dL.g-1

There is no Bioregistry entry for this system, but it is registered with w3id as uom, thus: https://w3id.org/uom/dL.g-1

There is a separate issue for this group to spec out the rules for encoding UCUM codes as CURIEs/URIs:

units-of-measurement/units-of-measurement#45

@cthoyt
Member

cthoyt commented Nov 8, 2022

Not sure where the confusion came from, but this prefix has already been registered: http://bioregistry.io/registry/ucum

@cmungall
Contributor Author

cmungall commented Nov 8, 2022

Oh wow, I strongly recommend you mark this as experimental or something. There are a lot of issues here: this confuses authority with resolvers, and the majority of CURIEs don't resolve; see the encoding issues above.

@cthoyt
Member

cthoyt commented Nov 8, 2022

this confuses authority with resolvers

Right now the Bioregistry doesn't explicitly keep track of whether providers are first-party, but if you think this would give records more context, then we can start tracking that.

and the majority of CURIEs don't resolve, see encoding issues above

Yup can confirm. Several of them don't resolve on the units-of-measures but I'm not sure if this means that they're invalid within the nomenclature itself. I'll keep up with the discussion here and on Slack and try to support whatever solution comes out as well as possible.

@cmungall
Contributor Author

cmungall commented Nov 8, 2022

Yup can confirm. Several of them don't resolve on the units-of-measures but I'm not sure if this means that they're invalid within the nomenclature itself.

The examples are all valid UCUM codes, but they are not all valid UOM CURIEs (and some are not valid CURIEs at all). The unofficial u-o-m resolver expects these to be percent-encoded (and normalized to exponent form).
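A minimal Python sketch of just the percent-encoding step. It assumes that encoding every character outside the RFC 3986 unreserved set is what the resolver wants; that rule and the `uom_iri` helper name are illustrative assumptions, not documented UOM behavior. Normalization to exponent form is not attempted here.

```python
from urllib.parse import quote

# Base taken from the w3id registration mentioned above.
UOM_BASE = "https://w3id.org/uom/"


def uom_iri(ucum_code: str) -> str:
    """Hypothetical helper: percent-encode a UCUM code into a UOM IRI.

    safe="" makes quote() encode "/" too; letters, digits and "_.-~"
    are always left untouched. No normalization is done here.
    """
    return UOM_BASE + quote(ucum_code, safe="")


for code in ["dL.g-1", "[diop]", "%", "dL/g"]:
    print(code, "->", uom_iri(code))
```

The `[diop]` case yields `https://w3id.org/uom/%5Bdiop%5D`, matching the encoded URL quoted later in this thread; `dL/g` gets encoded as `dL%2Fg` but is still not in the exponent form (`dL.g-1`) that UOM expects.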

@cthoyt
Member

cthoyt commented Nov 9, 2022

Alright, then I'll be happy to accept specific suggestions on improvements to this record!

cthoyt changed the title from "Add prefix: UCUM" to "Discussion about how to improve UCUM" on Nov 9, 2022
@kaiiam

kaiiam commented Nov 9, 2022

The unofficial u-o-m resolver expects these to be percent-encoded (and normalized to exponent form)

What you are referring to are the UOM final canonical IRIs/CURIEs. Our software/server lets you generate them: you put in any UCUM code and it creates the normalized exponent form for you. units-of-measurement/units-of-measurement#48, now merged into UOM, clarifies this well enough, I think. As for UOM resolution, it's not completely finished, so not all cases work, but it works for most units. E.g., m/d/s becomes https://units-of-measurement.org/m.s-1.d-1. In the future this will resolve for all UCUM codes. Hope that helps.

As for how you treat things on the Bioregistry, that's another story. I can't comment, other than that UOM isn't officially endorsed by UCUM, but UOM is allowed to use UCUM.
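To illustrate what "normalized exponent form" means, here is a rough Python sketch that reduces a flat UCUM expression to a net exponent per unit atom. The `to_exponents` name and the restricted grammar are assumptions for illustration, not UOM's software. It shows that m/d/s and m.s-1.d-1 denote the same term, but it does not reproduce UOM's canonical term ordering (which puts s before d in the example above).

```python
import re
from collections import Counter

# An atom followed by an optional signed integer exponent,
# e.g. "m" -> ("m", None), "s-1" -> ("s", "-1"), "m2" -> ("m", "2").
TERM_RE = re.compile(r"(?P<atom>.*?)(?P<exp>-?\d+)?")


def to_exponents(code: str) -> dict:
    """Parse a flat UCUM expression into {atom: net exponent}.

    Simplified sketch: supports "." and "/" between simple atoms with
    optional integer exponents, evaluated left to right, so each operator
    applies only to the next term. Parentheses, factors such as 10*3,
    annotations like {cells} and atoms ending in a digit are NOT handled.
    """
    exponents = Counter()
    sign = 1  # a leading "/" (as in "/s") flips the first term
    for part in re.split(r"([./])", code):
        if part == ".":
            sign = 1
        elif part == "/":
            sign = -1
        elif part:
            match = TERM_RE.fullmatch(part)
            atom, exp = match.group("atom"), match.group("exp")
            if not atom:
                raise ValueError(f"unsupported term: {part!r}")
            exponents[atom] += sign * int(exp or 1)
    # Drop atoms whose exponents cancel out.
    return {atom: e for atom, e in exponents.items() if e != 0}


print(to_exponents("m/d/s") == to_exponents("m.s-1.d-1"))
```

Comparing the resulting dictionaries sidesteps the question of term order; producing UOM's exact serialization would additionally need its ordering rules.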

cthoyt added a commit to gyorilab/mira that referenced this issue Nov 15, 2022
- use proper curies
- skip UMUC until biopragmatics/bioregistry#648 is resolved
- add dual labels
@cmungall
Contributor Author

Looks like the majority of these still don't work:

https://bioregistry.io/registry/ucum

Anything with a slash results in a server error: https://bioregistry.io/reference/ucum:dL/g (note that dL/g is not a canonical UOM serialization, but it is valid UCUM).

This resolves: https://bioregistry.io/reference/ucum:%25

But when we try to resolve it with the default provider, it gets a 404: https://units-of-measurement.org/%

Surprisingly, this works: https://bioregistry.io/reference/ucum:[diop], despite ucum:[diop] not being a valid CURIE (biopragmatics/curies#103). This redirects to https://units-of-measurement.org/[diop], which works, but the URL that should actually be used is the encoded https://w3id.org/uom/%5Bdiop%5D.

The more correct https://bioregistry.io/reference/ucum:%5Bdiop%5D also works.
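One way a resolver could avoid the 404 on `ucum:%25` is to decode the local ID once and re-encode it when filling in the provider URL. A Python sketch, assuming a `$1` placeholder in the provider URL format and the w3id base quoted above; the template, the `expand` name and the `ucum` prefix check are illustrative assumptions, not how the Bioregistry is actually implemented.

```python
from urllib.parse import quote, unquote

# Hypothetical provider URL format; "$1" marks where the local ID goes.
PROVIDER_FORMAT = "https://w3id.org/uom/$1"


def expand(curie: str, uri_format: str = PROVIDER_FORMAT) -> str:
    """Expand a ucum CURIE into a provider URL (illustrative sketch)."""
    prefix, sep, local_id = curie.partition(":")
    if prefix != "ucum" or not sep:
        raise ValueError(f"not a ucum CURIE: {curie!r}")
    # Decode once, so "ucum:%25" and "ucum:%" both yield the code "%"
    # (unquote leaves a bare "%" unchanged).
    code = unquote(local_id)
    # Re-encode before substitution; pasting the decoded code in verbatim
    # is what produces a broken URL ending in "/%".
    return uri_format.replace("$1", quote(code, safe=""))


for curie in ["ucum:%25", "ucum:[diop]", "ucum:%5Bdiop%5D"]:
    print(curie, "->", expand(curie))
```

This treats any `%XX` sequence in the local ID as an escape, so a raw code that happened to contain such a sequence would be misread; that trade-off is an assumption of the sketch.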

There is also the issue that the "default provider" for UCUM is UOM. This is a little problematic. I think the current prefix should be UOM, not UCUM, and the regex should forbid [diop] as a local/reference ID. You may still want a separate entry for UCUM that resolves to official UCUM URLs, but these are not precisely the same.
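As an illustration of a local-ID pattern that would forbid `[diop]` while still admitting percent-encoded forms, here is a hypothetical Python regex (not the Bioregistry's actual pattern for this prefix) that allows only RFC 3986 unreserved characters and `%XX` escapes:

```python
import re

# Hypothetical pattern: unreserved characters or %XX percent-escapes only.
LOCAL_ID_RE = re.compile(r"(?:[A-Za-z0-9._~-]|%[0-9A-Fa-f]{2})+")


def is_valid_local_id(local_id: str) -> bool:
    # fullmatch so the whole ID must conform, not just a prefix of it.
    return LOCAL_ID_RE.fullmatch(local_id) is not None


for candidate in ["dL.g-1", "%5Bdiop%5D", "[diop]", "dL/g", "%"]:
    print(candidate, is_valid_local_id(candidate))
```

Such a pattern accepts `dL.g-1` and the encoded `%5Bdiop%5D` but rejects raw `[diop]`, `dL/g` and a bare `%`. It only checks the character syntax; it does not verify that the code is valid UCUM or in UOM's canonical exponent form.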


3 participants