Section 6.1.2 Taxon Names and Identifiers standard name for biological taxon identifier #308
Thanks for spotting this inconsistency, @MathewBiddle. What you say seems fine to me, but it would be good if @roy-lowry could confirm that this is indeed the case (and that this is not indicative of something deeper ...). Once Roy has responded, a pull request would be most welcome, thanks.
These things are never as simple as they look at first. I've been back to the original discussion on Trac, and the history here is that biological_taxon_identifier was the original proposal for the identifier Standard Name, allowing identifiers from any standard to be used. This was criticised because users of the data had no way of knowing how to resolve the identifier, and so the strategy switched to providing information on how to resolve the identifier through the adoption of LSIDs. Unfortunately, when I set up the Standard Names I screwed up by setting up the request early on in the discussion (circa 2013!) asking for biological_taxon_identifier and then forgetting to update it to reflect the subsequent Trac discussion. So, in a nutshell, the Conventions Document is correct but the Standard Name is wrong. The fix would be to deprecate biological_taxon_identifier and alias it to a new Standard Name, biological_taxon_lsid. This would require changing references to 'biological_taxon_identifier' in several Standard Name descriptions to 'biological_taxon_lsid'. Shall I set this in motion? @japamment If the answer to the above question is 'yes', can I do it through this ticket or do I need to open a new one?
Also see #309
I am happy for the inconsistency to be fixed with an alias, as @roy-lowry suggests.
The following is the revised Standard Name specification - I'll put it here as a placeholder at least: biological_taxon_lsid "Biological taxon" is a name or other label identifying an organism or a group of organisms as belonging to a unit of classification in a hierarchical taxonomy. The quantity with standard name biological_taxon_lsid is the machine-readable identifier based on a taxon registration system using the syntax convention specified for the Life Science Identifier (LSID) - urn:lsid:::[:]. This includes the reference classification in the element and these are restricted by the LSID governance. It is strongly recommended in CF that the authority chosen is World Register of Marine Species (WoRMS) for oceanographic data and Integrated Taxonomic Information System (ITIS) for freshwater and terrestrial data. See Section 6.1.2 of the CF convention (version 1.8 or later) for information about biological taxon auxiliary coordinate variable. This identifier is a narrower equivalent to the scientificNameID field in the Darwin Core Standard. biological_taxon_lsid should replace biological_taxon_identifier by alias and also as text in the descriptions of Standard Names: colony_forming_unit_number_concentration_of_biological_taxon_in_sea_water
My limited GitHub skills have caused the LSID syntax not to render correctly due to embedded chevrons. This is it with curly brackets instead of chevrons so that it renders correctly: urn:lsid:{Authority}:{Namespace}:{ObjectID}[:{Version}]
Hi Roy, thanks. (Putting stuff in backticks usually does the trick: `urn:lsid:<Authority>:<Namespace>:<ObjectID>[:<Version>]` renders with the chevrons intact.)
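For readers who want to check identifiers against the LSID syntax quoted above, here is a minimal Python sketch. The `Lsid` tuple and `parse_lsid` function are hypothetical names introduced for illustration, and the pattern only checks the colon-separated shape of the URN; it does not resolve the identifier or confirm that the authority actually exists.

```python
import re
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """The four parts of urn:lsid:<Authority>:<Namespace>:<ObjectID>[:<Version>]."""
    authority: str
    namespace: str
    object_id: str
    version: Optional[str]


# Assumption: each part is a non-empty run of characters without a colon.
_LSID_RE = re.compile(
    r"^urn:lsid:(?P<authority>[^:]+):(?P<namespace>[^:]+)"
    r":(?P<object_id>[^:]+)(?::(?P<version>[^:]+))?$",
    re.IGNORECASE,
)


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its parts, or raise ValueError if malformed."""
    match = _LSID_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid LSID: {text!r}")
    return Lsid(
        match["authority"],
        match["namespace"],
        match["object_id"],
        match["version"],  # None when the optional version part is absent
    )


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:marinespecies.org:taxname:345515"))
```

The version part is optional, so it comes back as `None` for the WoRMS-style identifiers used in this thread.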
Hi all, could you please verify whether the following example is viable? We are planning to include/suggest the following TERM at some point but need some help. Term: number_concentration_of_prochlorococcus_in_sea_water -Definition: "Number concentration" means the number of particles or other specified objects per unit volume. Abundance of Prochlorococcus (ITIS: 610076; WoRMS: 345515) per unit volume of the water body by flow cytometry. Number of particles resolved as the cyanobacteria Prochlorococcus cells in a unit volume of any body of fresh or saltwater determined by flow cytometry analysis of unstained samples (NERC-1). -Units: [m-3] -References:
@fcarvalhopacheco That is an invalid Standard Name as it includes a taxon name. What you need is an array with taxon as one of its dimensions containing the abundances, with the Standard Name number_concentration_of_biological_taxon_in_sea_water. The taxon co-ordinate has two vectors with Standard Names biological_taxon_name and biological_taxon_lsid (currently erroneously called biological_taxon_identifier - the subject of this defect, which will hopefully be fixed in the near future) carrying the text name and the LSID for each taxon. This means we don't need 200 Standard Names for a dataset with abundances of 200 taxa. WoRMS is the preferred authority for marine organism LSIDs. Think of the data as a spreadsheet with abundances in the cells and columns called biological_taxon_name and biological_taxon_lsid. There's a skeleton example in Section 6.1.2 of the Conventions Document version 1.8. There is a complication in cases where the data set contains data for biological entities that aren't taxa, such as picophytoplankton. Each of these needs its own Standard Name for each measurement. I'm not totally comfortable with this. When I started setting up the taxon conventions back in 2013 I wanted all biological entities to be allowed, but this was rejected because they would be unconstrained plaintext labels and this was considered too loose for CF. A suggestion to constrain against the S25 vocabulary with BODC as the authority was also not well received. In the past few weeks I looked for support to treat all biological entities as taxa but got none, and I am not in a position to try to take it forward myself. Does that help?
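The "spreadsheet" layout described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `taxon_coordinates` and `check_abundance_shape` are hypothetical, the abundance values are made up, and the numbers 345515 and 160572 are the identifiers quoted in this thread, assumed here to be WoRMS AphiaIDs. A real file would hold these arrays as netCDF variables, as in the skeleton example in Section 6.1.2.

```python
from typing import Dict, List

# WoRMS LSIDs take the form urn:lsid:marinespecies.org:taxname:<AphiaID>.
WORMS_LSID_PREFIX = "urn:lsid:marinespecies.org:taxname:"


def taxon_coordinates(aphia_ids: Dict[str, int]) -> Dict[str, List[str]]:
    """Build the two parallel taxon coordinate vectors, keyed by Standard Name."""
    return {
        "biological_taxon_name": list(aphia_ids),
        "biological_taxon_lsid": [
            f"{WORMS_LSID_PREFIX}{aphia_id}" for aphia_id in aphia_ids.values()
        ],
    }


def check_abundance_shape(abundance: List[List[float]], n_taxa: int) -> None:
    """Every time step must carry exactly one abundance value per taxon."""
    for step, row in enumerate(abundance):
        if len(row) != n_taxa:
            raise ValueError(f"time step {step}: expected {n_taxa} values, got {len(row)}")


# Identifiers as quoted in the thread (assumed to be WoRMS AphiaIDs).
taxa = {"Prochlorococcus": 345515, "Synechococcus": 160572}
coords = taxon_coordinates(taxa)

# abundance[time][taxon], Standard Name
# number_concentration_of_biological_taxon_in_sea_water, units m-3.
# The values are invented purely for illustration.
abundance = [
    [1.2e11, 3.0e10],
    [9.8e10, 2.5e10],
]
check_abundance_shape(abundance, len(taxa))

if __name__ == "__main__":
    for name, lsid in zip(coords["biological_taxon_name"], coords["biological_taxon_lsid"]):
        print(f"{name}: {lsid}")
```

Adding another taxon means adding one entry to `taxa` and one column to `abundance`; no new Standard Name is involved.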
Thanks, @roy-lowry for the reply! So we don't need to create anything new, we just need to use the Standard Name: "variables" (still need to be confirmed)
Please, see if the following example for "Prochlorococcus" would be valid for our case: dimension: |
@fcarvalhopacheco I think you've got it!! You can certainly add three more taxa as you suggest - even 30 or 300 more taxa - preventing a massive propagation of new Standard Names that I feared would become unsustainable.
One minor point - the name for 160572 should be just Synechococcus (it's the Genus - the Nägeli is part of the name reference for the taxon, not part of the Genus name).
@roy-lowry Thanks! That's great. I will pass this information along.
Back to the original question posted above. Which term should we be using for files we are generating now? This is what we have right now, which will pass CF checkers but is not aligned with the guidance:
Updated to include data.
The slightly embarrassing answer is biological_taxon_lsid. However, this will fail compliance checkers because the defect correction specified above last November still hasn't been actioned. I did issue an e-mail reminder and was promised it would be in the next Standard Name update, which I think has passed. However, I've just checked and nothing has changed. @japamment Could we please get this defect corrected?
@roy-lowry @japamment Do you know if this will be an adjustment to the existing tables (v71 - v77), or will we have to wait until v78 is released?
@japamment @feggleton @davidhassell

> Happy New Year to you too, it’s good to hear from you. Thanks for drawing my attention to this one and apologies for missing it – Fran and I went through all the open standard names issues in the discuss repo on Monday to see which ones could be finalised, but I must admit we didn’t do the same with the conventions repo. I’ll pick this one up from the existing ticket (no need to start a new one) and make sure it gets progressed in time for the next update (i.e. not next week I’m afraid, as I don’t want to add new content after announcing it, but the next update in Feb/March). Hope that’s okay. Actually this is a very useful reminder, as I know there are some other standard name related conventions issues that need tidying up, so it would be good to try and resolve those over the next few weeks. Cheers,

Nothing has happened. I have e-mailed several times since to ask about progress, but received no responses, making me wonder if my e-mails were falling foul of a spam filter. Consequently, I'm trying a comment here as an alternative form of communication.
@roy-lowry my apologies and thank you for keeping this one on the radar. This ticket has now been actioned and biological_taxon_identifier will be turned into an alias of biological_taxon_lsid in the next standard names update. I have copied the syntax of the urn from an earlier post by @davidhassell - please can you check the CEDA editor to ensure the definition text contains the correct urn? I have also updated the definitions of the other 'taxon' names to refer to biological_taxon_lsid.
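For anyone producing files while both names are in circulation, the alias relationship can be handled with a small lookup. This is a hedged sketch, not the mechanism used by any CF checker: the `ALIASES` dictionary and `current_standard_name` function are hypothetical, and the single entry is hand-written from this ticket rather than read from the published standard name table.

```python
from typing import Dict

# Hypothetical, hand-maintained map from deprecated names to their replacements.
# Only the alias agreed in this ticket is listed.
ALIASES: Dict[str, str] = {
    "biological_taxon_identifier": "biological_taxon_lsid",
}


def current_standard_name(name: str) -> str:
    """Return the replacement for a deprecated name, or the name unchanged."""
    return ALIASES.get(name, name)


if __name__ == "__main__":
    for name in ("biological_taxon_identifier", "biological_taxon_name"):
        print(f"{name} -> {current_standard_name(name)}")
```

Names that are not aliased pass through untouched, so the function can be applied to every `standard_name` attribute in a file.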
@japamment Many, many thanks. Yes, David correctly fixed my attempt using unescaped chevrons, so what you have in the CEDA editor is correct.
In reading Section 6.1.2 Taxon Names and Identifiers, the second paragraph (and skeleton example) describe using the CF standard name biological_taxon_lsid. However, in CF standard name table v76, the term is biological_taxon_identifier. I assume the documentation should be updated? I can put in a pull request for the change, if that's appropriate.