Section 6.1.2 Taxon Names and Identifiers standard name for biological taxon identifier #308

Closed
MathewBiddle opened this issue Nov 19, 2020 · 19 comments
Labels
defect Conventions text meaning not as intended, misleading, unclear, has typos, format or language errors

Comments

@MathewBiddle

MathewBiddle commented Nov 19, 2020

In reading Section 6.1.2 Taxon Names and Identifiers, the second paragraph (and skeleton example) describe using the CF standard name biological_taxon_lsid. However, in CF standard name table v76, the term is biological_taxon_identifier. I assume the documentation should be updated?

I can put in a pull request for the change, if that's appropriate.

@davidhassell davidhassell added the defect Conventions text meaning not as intended, misleading, unclear, has typos, format or language errors label Nov 19, 2020
@davidhassell
Contributor

Thanks for spotting this inconsistency, @MathewBiddle.

What you say seems fine to me, but it would be good if @roy-lowry could confirm that this is indeed the case (and that this is not indicative of something deeper).

Once Roy has responded, a pull request would be most welcome, thanks.

@roy-lowry

These things are never as simple as they look at first. I've been back to the original discussion on Trac, and the history here is that biological_taxon_identifier was the original proposal for the identifier Standard Name, allowing identifiers from any standard to be used. This was criticised because users of the data had no way of knowing how to resolve the identifier, so the strategy switched to providing information on how to resolve the identifier through the adoption of LSIDs.

Unfortunately, when I set up the Standard Names I screwed up by setting up the request early on in the discussion (circa 2013!) asking for biological_taxon_identifier and then forgetting to update it to reflect the subsequent Trac discussion.

So, in a nutshell, the Conventions Document is correct but the Standard Name is wrong. The fix would be to deprecate biological_taxon_identifier and alias it to a new Standard Name, biological_taxon_lsid. This would require changing references to 'biological_taxon_identifier' in several Standard Name descriptions to 'biological_taxon_lsid'.

Shall I set this in motion?

@japamment If the answer to the above question is 'yes', can I do it through this ticket or do I need to open a new one?

@roy-lowry

Also see #309

@davidhassell
Contributor

I am happy for the inconsistency to be fixed with an alias, as @roy-lowry suggests.

@roy-lowry

The following is the revised Standard Name specification - I'll put here as a placeholder at least:

biological_taxon_lsid

"Biological taxon" is a name or other label identifying an organism or a group of organisms as belonging to a unit of classification in a hierarchical taxonomy. The quantity with standard name biological_taxon_lsid is the machine-readable identifier based on a taxon registration system using the syntax convention specified for the Life Science Identifier (LSID) - urn:lsid:::[:]. This includes the reference classification in the element and these are restricted by the LSID governance. It is strongly recommended in CF that the authority chosen is World Register of Marine Species (WoRMS) for oceanographic data and Integrated Taxonomic Information System (ITIS) for freshwater and terrestrial data. See Section 6.1.2 of the CF convention (version 1.8 or later) for information about biological taxon auxiliary coordinate variable. This identifier is a narrower equivalent to the scientificNameID field in the Darwin Core Standard.

biological_taxon_lsid should replace biological_taxon_identifier by alias and also as text in the descriptions of Standard Names:

colony_forming_unit_number_concentration_of_biological_taxon_in_sea_water
mass_concentration_of_biological_taxon_expressed_as_nitrogen_in_sea_water
mole_concentration_of_biological_taxon_expressed_as_carbon_in_sea_water
number_concentration_of_biological_taxon_in_sea_water
mass_concentration_of_biological_taxon_expressed_as_chlorophyll_in_sea_water
mass_concentration_of_biological_taxon_expressed_as_carbon_in_sea_water
mole_concentration_of_biological_taxon_expressed_as_nitrogen_in_sea_water

@roy-lowry

My limited GitHub skills have caused the LSID syntax not to render correctly due to embedded chevrons

This is it with curly brackets instead of chevrons so it renders correctly.

urn:lsid:{Authority}:{Namespace}:{ObjectID}[:{Version}]
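
As an illustration of the syntax above, here is a minimal Python sketch that splits an LSID into its four parts. The regular expression, the `Lsid` tuple and the `parse_lsid` function are hypothetical names made up for this example; they are not part of CF or of any LSID library, and the pattern only checks the colon-separated shape, not whether the authority or object actually exists.

```python
import re
from typing import NamedTuple, Optional

# Shape taken from the syntax quoted above:
#   urn:lsid:{Authority}:{Namespace}:{ObjectID}[:{Version}]
# Assumption: no component contains a colon.
LSID_PATTERN = re.compile(
    r"^urn:lsid:(?P<authority>[^:]+):(?P<namespace>[^:]+)"
    r":(?P<object_id>[^:]+)(?::(?P<version>[^:]+))?$",
    re.IGNORECASE,
)


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    version: Optional[str]  # None when the optional version is absent


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its parts; raise ValueError if it does not match."""
    match = LSID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an LSID: {text!r}")
    return Lsid(**match.groupdict())


print(parse_lsid("urn:lsid:marinespecies.org:taxname:345515"))
```

A check like this could be run over the values of a taxon identifier variable before writing a file, so that bare numeric identifiers are caught early.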

@davidhassell
Contributor

Hi Roy, thanks.

(Putting stuff in backticks usually does the trick: `urn:lsid:<Authority>:<Namespace>:<ObjectID>[:<Version>]` renders as urn:lsid:<Authority>:<Namespace>:<ObjectID>[:<Version>]. I used a generous sprinkling of protecting \ to get the chevrons and backticks to appear as plain text in the first version, but perhaps the backticks-only version is quicker)

@fcarvalhopacheco

fcarvalhopacheco commented Dec 9, 2020

Hi all,

Could you please verify if the following example could be viable? We are planning to include/suggest the following TERM at some point but need some help.


Term: number_concentration_of_prochlorococcus_in_sea_water

-Definition: "Number concentration" means the number of particles or other specified objects per unit volume. Abundance of Prochlorococcus (ITIS: 610076; WoRMS: 345515) per unit volume of the water body by flow cytometry. Number of particles resolved as the cyanobacteria Prochlorococcus cells in a unit volume of any body of fresh or saltwater determined by flow cytometry analysis of unstained samples (NERC-1).

-Units: [m-3]

-References:
NERC-1:http://vocab.nerc.ac.uk/collection/P01/current/P701A90Z/4/
NERC-2:http://vocab.nerc.ac.uk/collection/F02/current/F0200002/1/


@roy-lowry

@fcarvalhopacheco That is an invalid Standard Name as it includes a taxon name. What you need is an array with taxon as one of its dimensions containing the abundances with the Standard Name number_concentration_of_biological_taxon_in_sea_water. The taxon co-ordinate has two vectors with Standard Names biological_taxon_name and biological_taxon_lsid (currently erroneously called biological_taxon_identifier - the subject of this defect, which will hopefully be fixed in the near future) carrying the text name and the LSID for each taxon. This means we don't need 200 Standard Names for a dataset with abundances of 200 taxa. WoRMS is the preferred authority for marine organism LSIDs. Think of the data as a spreadsheet with abundances in the cells and columns called biological_taxon_name and biological_taxon_lsid.

There's a skeleton example in Section 6.1.2 of the Conventions Document version 1.8.

There is a complication in cases where the data set contains data for biological entities that aren't taxa such as picophytoplankton. Each of these needs its own Standard Name for each measurement. I'm not totally comfortable with this. When I started setting up the taxon conventions back in 2013 I wanted all biological entities to be allowed, but this was rejected because they would be unconstrained plaintext labels and this was considered too loose for CF. A suggestion to constrain against the S25 vocabulary with BODC as the authority was also not well received. In the past few weeks I looked for support to treat all biological entities as taxa but got none and am not in a position to try to take it forward myself.

Does that help?

@fcarvalhopacheco

fcarvalhopacheco commented Dec 10, 2020

Thanks, @roy-lowry for the reply!

So we don't need to create anything new; we just need to use the Standard Name number_concentration_of_biological_taxon_in_sea_water, including the biological_taxon_name and the biological_taxon_lsid for each of our "variables" (see below).

"variables" (still need to be confirmed)

"Prochlorococcus" = "urn:lsid:marinespecies.org:taxname:345515"
"Bacteria" = "urn:lsid:marinespecies.org:taxname:6"
"Synechococcus" = "urn:lsid:marinespecies.org:taxname:160572"
"Cyanobacteria" = "urn:lsid:marinespecies.org:taxname:146537"


Please, see if the following example for "Prochlorococcus" would be valid for our case:


dimensions:
  time = 100 ;
  string80 = 80 ;
  taxon = 1 ; // Can we include the other 3 taxa here, so total = 4?
variables:
  float time(time) ;
    time:standard_name = "time" ;
    time:units = "days since 2019-01-01" ;
  float abundance(time,taxon) ;
    abundance:standard_name = "number_concentration_of_biological_taxon_in_sea_water" ;
    abundance:coordinates = "taxon_lsid taxon_name" ;
  char taxon_name(taxon,string80) ;
    taxon_name:standard_name = "biological_taxon_name" ;
  char taxon_lsid(taxon,string80) ;
    taxon_lsid:standard_name = "biological_taxon_lsid" ;
data:
  time = // 100 values ;
  abundance = // 100 values ;
  taxon_name = "Prochlorococcus" ; // Can we include the other 3 taxon_name values here?
  taxon_lsid = "urn:lsid:marinespecies.org:taxname:345515" ; // Can we include the other 3 taxon_lsid values here?

@roy-lowry

@fcarvalhopacheco I think you've got it!! You can certainly add three more taxa as you suggest - even 30 or 300 more taxa, preventing a massive propagation of new Standard Names that I feared would become unsustainable.
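
To make the "add more taxa" point concrete, the following Python sketch generates a skeleton CDL fragment for any number of taxa, using the four name and LSID pairs listed earlier in this thread (which were still to be confirmed there). The function name `build_taxon_cdl` and its parameters are hypothetical, and the output keeps the `// N values` placeholders of the skeleton example, so it is an outline rather than something `ncgen` would accept as-is.

```python
# Taxon names and WoRMS LSIDs as posted earlier in this thread.
TAXA = [
    ("Prochlorococcus", "urn:lsid:marinespecies.org:taxname:345515"),
    ("Bacteria", "urn:lsid:marinespecies.org:taxname:6"),
    ("Synechococcus", "urn:lsid:marinespecies.org:taxname:160572"),
    ("Cyanobacteria", "urn:lsid:marinespecies.org:taxname:146537"),
]


def build_taxon_cdl(taxa, n_time=100, strlen=80):
    """Return skeleton CDL text with one taxon coordinate entry per (name, lsid) pair."""
    for name, lsid in taxa:
        if len(name) > strlen or len(lsid) > strlen:
            raise ValueError(f"{name!r} or its LSID exceeds {strlen} characters")
    strdim = f"string{strlen}"
    names = ", ".join(f'"{name}"' for name, _ in taxa)
    lsids = ", ".join(f'"{lsid}"' for _, lsid in taxa)
    return "\n".join([
        "dimensions:",
        f"  time = {n_time} ;",
        f"  {strdim} = {strlen} ;",
        f"  taxon = {len(taxa)} ;",
        "variables:",
        "  float time(time) ;",
        '    time:standard_name = "time" ;',
        '    time:units = "days since 2019-01-01" ;',
        "  float abundance(time,taxon) ;",
        '    abundance:standard_name = "number_concentration_of_biological_taxon_in_sea_water" ;',
        '    abundance:coordinates = "taxon_lsid taxon_name" ;',
        f"  char taxon_name(taxon,{strdim}) ;",
        '    taxon_name:standard_name = "biological_taxon_name" ;',
        f"  char taxon_lsid(taxon,{strdim}) ;",
        '    taxon_lsid:standard_name = "biological_taxon_lsid" ;',
        "data:",
        f"  time = // {n_time} values ;",
        f"  abundance = // {n_time * len(taxa)} values ;",
        f"  taxon_name = {names} ;",
        f"  taxon_lsid = {lsids} ;",
    ])


print(build_taxon_cdl(TAXA))
```

Only the `taxon` dimension size and the two coordinate vectors grow with the number of taxa; the Standard Name on `abundance` stays the same.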

@roy-lowry

roy-lowry commented Dec 10, 2020

One minor point - the name for 160572 should be just Synechococcus (it's the Genus - the Nägeli is part of the name reference for the taxon, not part of the Genus name).

@fcarvalhopacheco

@roy-lowry Thanks! That's great. I will pass this information along.

@MathewBiddle
Author

MathewBiddle commented May 3, 2021

Back to the original question posted above. Which term should we be using for files we are generating now?

This is what we have right now, which will pass CF checkers but is not aligned with the guidance:

	string taxon_lsid(obs) ;
		taxon_lsid:standard_name = "biological_taxon_identifier" ;
		taxon_lsid:long_name = "Namespaced Taxon Identifier" ;
		taxon_lsid:source = "WoRMS (2021). Halichoerus grypus (Fabricius, 1791). Accessed at: http://www.marinespecies.org/aphia.php?p=taxdetails&id=137080 on 2021-04-30" ;
		taxon_lsid:url = "http://www.marinespecies.org/aphia.php?p=taxdetails&id=137080" ;

// global attributes:
		:standard_name_vocabulary = "CF Standard Name Table v77" ;

data:

 taxon_lsid = "urn:lsid:marinespecies.org:taxname:137080",

Updated to include data.
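
For reference, the LSID and the taxon details URL in the snippet above share the same WoRMS AphiaID (137080). A small Python sketch of that relationship follows. The helper names `worms_lsid` and `worms_taxon_url` are hypothetical, and the string templates are simply copied from the values shown in this comment rather than from any official WoRMS client.

```python
def worms_lsid(aphia_id: int) -> str:
    """Build a WoRMS-style LSID from an AphiaID, following the pattern seen in this thread."""
    if aphia_id <= 0:
        raise ValueError("AphiaID must be a positive integer")
    return f"urn:lsid:marinespecies.org:taxname:{aphia_id}"


def worms_taxon_url(aphia_id: int) -> str:
    """Build the taxon details URL, using the URL form quoted in the snippet above."""
    if aphia_id <= 0:
        raise ValueError("AphiaID must be a positive integer")
    return f"http://www.marinespecies.org/aphia.php?p=taxdetails&id={aphia_id}"


print(worms_lsid(137080))
print(worms_taxon_url(137080))
```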

@roy-lowry

The slightly embarrassing answer is biological_taxon_lsid. However, this will fail compliance checkers because the defect correction specified above last November still hasn't been actioned. I did issue an e-mail reminder and was promised it would be in the next Standard Name update, which I think has passed. However, I've just checked and nothing has changed.

@japamment Could we please get this defect corrected?

@MathewBiddle
Author

@roy-lowry @japamment Do you know if this will be an adjustment to the existing tables (v71 - v77), or will we have to wait until v78 is released?

@roy-lowry

@japamment @feggleton @davidhassell
This is an e-mail I received on this issue in January:

Hi Roy,

Happy New Year to you too, it’s good to hear from you.

Thanks for drawing my attention to this one and apologies for missing it – Fran and I went through all the open standard names issues in the discuss repo on Monday to see which ones could be finalised, but I must admit we didn’t do the same with the conventions repo. I’ll pick this one up from the existing ticket (no need to start a new one) and make sure it gets progressed in time for the next update (i.e. not next week I’m afraid, as I don’t want to add new content after announcing it, but the next update in Feb/March). Hope that’s okay.

Actually this is a very useful reminder, as I know there are some other standard name related conventions issues that need tidying up, so it would be good to try and resolve those over the next few weeks.

Cheers,
Alison

Nothing has happened. I have e-mailed several times since to ask about progress but received no responses, making me wonder if my e-mails were falling foul of a spam filter. Consequently, I'm trying a comment here as an alternative form of communication.

@japamment
Member

@roy-lowry my apologies and thank you for keeping this one on the radar. This ticket has now been actioned and biological_taxon_identifier will be turned into an alias of biological_taxon_lsid in the next standard names update. I have copied the syntax of the urn from an earlier post by @davidhassell - please can you check the CEDA editor to ensure the definition text contains the correct urn?

I have also updated the definitions of the other 'taxon' names to refer to biological_taxon_lsid.
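
For anyone with files already written using the old name, here is a minimal Python sketch of how an alias can be resolved on read. The `ALIASES` dictionary is a hypothetical stand-in for the alias entries of the standard name table, holding only the single mapping discussed in this issue; `resolve_standard_name` and `normalise_attributes` are illustrative helpers, not part of any CF checker or library.

```python
# Hypothetical alias table: deprecated standard name -> current standard name.
ALIASES = {"biological_taxon_identifier": "biological_taxon_lsid"}


def resolve_standard_name(name, aliases=ALIASES):
    """Follow alias links until a current standard name is reached."""
    seen = set()
    while name in aliases:
        if name in seen:
            raise ValueError(f"alias cycle detected at {name!r}")
        seen.add(name)
        name = aliases[name]
    return name


def normalise_attributes(attrs, aliases=ALIASES):
    """Return a copy of a variable's attributes with standard_name de-aliased."""
    updated = dict(attrs)
    if "standard_name" in updated:
        updated["standard_name"] = resolve_standard_name(updated["standard_name"], aliases)
    return updated


old = {
    "standard_name": "biological_taxon_identifier",
    "long_name": "Namespaced Taxon Identifier",
}
print(normalise_attributes(old)["standard_name"])
```

Because the old name becomes an alias rather than being removed, files carrying either name can be treated as equivalent once the update is published.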

@roy-lowry

@japamment Many, many thanks. Yes, David correctly fixed my attempt using unescaped chevrons, so what you have in the CEDA editor is correct.
