Map all PRO terms used in CL to uniprot (where possible). #2293
@cmungall - any suggestions for a strategy?
It might be useful to ask Darren @nataled
I'll overlook the "something the rest of the world can use" comment ;) The results of that SPARQL query fall into two types:
For the first set, no single UniProtKB mapping is appropriate. Are you trying to obtain all the possible UniProtKB entries pertinent to those xrefs?
@nataled - many thanks for the details. There are various uses. In general, including IDs that bioinformaticians are familiar with opens up more possibilities for them to use the markers recorded in CL in their analyses. More specifically, we're working on a Cell Type knowledge base with a focus on cell markers in human and mouse. We have other sources of known and potential markers, both curated and computed, and I'd like to find some way to fold in the curated cell surface markers from CL. It looks to me like in most cases 'family' here means a general term for the gene across species.
It also looks like we could pull the mouse and human UniProt IDs from the PIR pages: https://proteininformationresource.org/cgi-bin/ipcSF?id=PIRSF016630. Is there an API option? If not, we will scrape. This will work for our KB plans. I think it would also be useful to include these IDs in CL under some annotation property (AP).
It seems we can use the structure of PRO to extract many of these, e.g. https://api.triplydb.com/s/WGSZidIVe
The subclasses are not (currently) in the import and, even if they were, we should still find some way to better support bioinformatician users. From looking at the numbers, this won't work in every case, but it is a good start. Suggested mechanism to extract: for all PRO terms used as markers for CL terms:
TBD: Accessible representation in CL.
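A minimal sketch of the extraction idea, assuming the PRO subclass axioms have already been loaded as (child, parent) pairs (for example from a query like the TriplyDB one linked above). It walks down from a PR marker term and keeps descendants whose local ID is a canonical UniProtKB accession, i.e. without an isoform dash. The function names and the IDs `PR:GENE_LEVEL_TERM` and `PR:000099999` are hypothetical, chosen only for illustration; restricting the result to human and mouse would additionally need each term's organism, which this sketch does not model.

```python
import re
from collections import defaultdict, deque

# UniProtKB accession format as documented by UniProt. Isoform IDs such as
# "P01730-1" do not match because of the end anchor.
UNIPROT_AC = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def build_children(subclass_edges):
    """Invert (child, parent) subClassOf pairs into a parent -> children map."""
    children = defaultdict(set)
    for child, parent in subclass_edges:
        children[parent].add(child)
    return children


def uniprot_descendants(marker_id, children):
    """Breadth-first walk below a PR marker term, returning the accessions of
    descendants whose local ID is a canonical UniProtKB accession."""
    found = set()
    seen = {marker_id}
    queue = deque([marker_id])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child in seen:  # guard against revisiting shared descendants
                continue
            seen.add(child)
            queue.append(child)
            local_id = child.split(":", 1)[-1]
            if UNIPROT_AC.match(local_id):
                found.add(local_id)
    return found


# Toy hierarchy with hypothetical IDs, for illustration only.
edges = [
    ("PR:P01730", "PR:GENE_LEVEL_TERM"),
    ("PR:P06332", "PR:GENE_LEVEL_TERM"),
    ("PR:000099999", "PR:GENE_LEVEL_TERM"),
    ("PR:P01730-1", "PR:P01730"),
]
children = build_children(edges)
print(sorted(uniprot_descendants("PR:GENE_LEVEL_TERM", children)))
# prints ['P01730', 'P06332']
```

The traversal still descends through non-accession and isoform nodes, so accession-level terms nested under intermediate terms are not missed.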
CC @AvolaAmg
Yes, I believe most of the PR terms used in CL are category=gene and follow a stereotypical text definition marking them as the product of the reflexive ortholog of the human gene, e.g.
https://www.ebi.ac.uk/ols4/ontologies/pr/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPR_000001408?lang=en
Ideally PRO would have logical definitions for these, which would make tracing back easier. It should be easy to do this via string matching, but ideally this would be done upstream in PRO.
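The string-matching route could look roughly like this. It assumes, without having checked it against PRO, that category=gene definitions contain a phrase of the form "translation product of the human X gene"; the example definition below is written for illustration in that assumed template and is not copied from PRO. The extracted symbol would then still need to be resolved to UniProtKB entries by some other means.

```python
import re

# Assumed wording of category=gene PR definitions; verify against PRO before use.
HUMAN_GENE_DEF = re.compile(
    r"translation product of the human (?P<symbol>[A-Za-z0-9.\-]+) gene"
)


def human_gene_symbol(definition):
    """Return the human gene symbol named in a templated PR definition, or None."""
    match = HUMAN_GENE_DEF.search(definition)
    return match.group("symbol") if match else None


# Illustrative definition text in the assumed template.
example = (
    "A protein that is a translation product of the human CD4 gene "
    "or a 1:1 ortholog thereof."
)
print(human_gene_symbol(example))  # prints CD4
```

Definitions that do not follow the template simply return `None`, which makes them easy to route to manual review.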
Another idea would be for PRO to release SSSOM with inferred downward mappings for all category=gene terms.
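If such downward mappings were published, one plausible shape is an SSSOM TSV. This is a minimal sketch assuming the PR-to-accession pairs have already been computed; `subject_id`, `predicate_id`, `object_id` and `mapping_justification` are standard SSSOM columns, but the choice of `skos:narrowMatch`, the placeholder justification `semapv:UnspecifiedMatching` and the ID `PR:GENE_LEVEL_TERM` are assumptions. The YAML metadata block (including the `curie_map`) that a real SSSOM file carries is omitted.

```python
import csv
import io

# Required SSSOM columns; the YAML metadata header (curie_map etc.) is omitted.
SSSOM_COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]


def downward_mappings_tsv(gene_to_accessions):
    """Render {PR gene-level term: {UniProtKB accessions}} as SSSOM-style TSV rows.

    skos:narrowMatch reads as "the object is narrower than the subject", i.e. the
    species-specific UniProtKB entry is narrower than the gene-level PR term.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(SSSOM_COLUMNS)
    for pr_id in sorted(gene_to_accessions):
        for accession in sorted(gene_to_accessions[pr_id]):
            writer.writerow([
                pr_id,
                "skos:narrowMatch",
                f"UniProtKB:{accession}",
                "semapv:UnspecifiedMatching",  # placeholder justification
            ])
    return buf.getvalue()


# Hypothetical input for illustration only.
print(downward_mappings_tsv({"PR:GENE_LEVEL_TERM": {"P06332", "P01730"}}), end="")
```

Sorting both keys and accessions keeps the output stable between releases, which makes diffs of the mapping file easier to review.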
(Sent by email in reply to David Osumi-Sutherland's comment of Sat, Feb 24, 2024, 8:51 AM.)
The file containing PIRSF membership can be found at https://proteininformationresource.org/projects/pirsf/. Note that the identifiers in this file don't contain 'PIR' (e.g., 'SF001234' instead of 'PIRSF001234'). This file goes beyond human and mouse, if that's what you need. If you only want human and mouse, then you can use our 'descendants' API for PRO: https://lod.proconsortium.org/api.html#/DAG/getDescendantByProIDs, which is part of a larger set of APIs given here: https://lod.proconsortium.org/api.html. You'll want to focus on the terms whose local IDs are UniProtKB accessions without a dash (i.e., canonical entries rather than isoforms).
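Two of the details above are easy to get wrong in a pipeline: restoring the dropped 'PIR' prefix and keeping only dash-free accession IDs. A minimal sketch, assuming the identifiers have already been read from the membership file or from the descendants API (neither response format is modeled here); the function names are hypothetical.

```python
import re

# UniProtKB accession format as documented by UniProt; "-N" isoform suffixes fail the end anchor.
UNIPROT_AC = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def normalize_pirsf(raw_id):
    """Restore the prefix dropped in the membership file: 'SF001234' -> 'PIRSF001234'."""
    raw_id = raw_id.strip()
    if raw_id.startswith("PIRSF"):
        return raw_id
    if raw_id.startswith("SF"):
        return "PIR" + raw_id
    raise ValueError(f"unexpected PIRSF identifier: {raw_id!r}")


def canonical_uniprot_terms(pr_ids):
    """Keep PR IDs whose local part is a UniProtKB accession without an isoform dash."""
    return [pr_id for pr_id in pr_ids if UNIPROT_AC.match(pr_id.split(":", 1)[-1])]


print(normalize_pirsf("SF001234"))  # prints PIRSF001234
print(canonical_uniprot_terms(["PR:P01730", "PR:P01730-1", "PR:000001408"]))
# prints ['PR:P01730']
```

Raising on unexpected identifiers, rather than passing them through, surfaces format changes in the upstream file early.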
This issue has not seen any activity in the past 6 months; it will be closed automatically in one year from now if no action is taken.
We need to be able to map the PRO terms used by CL to something the rest of the world can use. I think that means UniProt. Xrefs to UniProt are rare:
https://api.triplydb.com/s/tuAThwx4i
We mostly have xrefs to
Where we can't map based on ID, I think we may need to resort to lexical mapping. One option for this is GILDA.
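The two-stage idea (ID-based first, lexical as a fallback) can be sketched as follows. This is a deliberately simple stand-in that uses exact matching on normalized labels; it is not GILDA, and a dedicated grounding tool such as GILDA adds scoring and disambiguation that this sketch lacks. The function names and the toy name index are hypothetical.

```python
import re
import unicodedata


def normalize_label(text):
    """Lower-case, drop accents and collapse punctuation/whitespace to single spaces."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def build_index(names_by_accession):
    """{accession: [name, synonym, ...]} -> {normalized name: {accessions}}."""
    index = {}
    for accession, names in names_by_accession.items():
        for name in names:
            index.setdefault(normalize_label(name), set()).add(accession)
    return index


def map_pr_term(xrefs, labels, index):
    """Return (accessions, method): UniProtKB xrefs first, then exact label matches."""
    from_xrefs = {
        local_id
        for prefix, _, local_id in (x.partition(":") for x in xrefs)
        if prefix == "UniProtKB"
    }
    if from_xrefs:
        return from_xrefs, "xref"
    hits = set()
    for label in labels:
        hits |= index.get(normalize_label(label), set())
    return hits, ("lexical" if hits else "unmapped")


# Toy data for illustration only.
index = build_index({"P01730": ["T-cell surface glycoprotein CD4", "CD4"]})
print(map_pr_term(["UniProtKB:P01730"], [], index))  # prints ({'P01730'}, 'xref')
print(map_pr_term([], ["cd4"], index))               # prints ({'P01730'}, 'lexical')
```

Returning the method alongside the accessions keeps ID-based and lexical mappings distinguishable, so lexical hits can be reviewed separately.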
@addiehl - any other suggestions based on your prior work on these + other linked resources?