Confirm the new WoS API meets the minimum requirements for our current use cases #231

Closed
7 tasks done
peetucket opened this issue Oct 2, 2017 · 4 comments


peetucket commented Oct 2, 2017

@dazza-codes

A record for Russ Altman contains the WoS UID <UID>MEDLINE:24551397</UID>, and its dynamic data fields indicate that this UID could be a PMID, i.e.

  <dynamic_data>
    <citation_related>
      <tc_list>
        <silo_tc coll_id="MEDLINE" local_count="0"/>
      </tc_list>
    </citation_related>
    <cluster_related>
      <identifiers>
        <identifier type="eissn" value="1942-597X"/>
        <identifier type="pmid" value="MEDLINE:24551397"/>
      </identifiers>
    </cluster_related>
  </dynamic_data>

@dazza-codes dazza-codes self-assigned this Oct 4, 2017

dazza-codes commented Oct 4, 2017

Find a record with a PMID in the rails console:

> pub_ids = PublicationIdentifier.select('DISTINCT publication_id').where(identifier_type: 'PMID').limit(1).first
=> #<PublicationIdentifier:0x005605437947c8 id: nil, publication_id: 1>
> Publication.find(pub_ids.publication_id).pmid
=> 10000166

Assume this PMID value can be a MEDLINE-UID in the new WoS SOAP API, i.e. MEDLINE:{PMID}. Then on the wos-queries branch (PR #223), we can try to retrieve the record in the rails console.

wos_queries = WosQueries.new(WosClient.new(Settings.WOS.AUTH_CODE, :debug), 'MEDLINE')
records = wos_queries.retrieve_by_id('MEDLINE:10000166')

This constructs a query that includes:

    <woksearch:retrieveById>
      <databaseId>MEDLINE</databaseId>
      <uid>MEDLINE:10000166</uid>
      <queryLanguage>en</queryLanguage>

The query failed to find that record, and using the default WOK database instead makes no difference. The response includes:

        <recordsFound>0</recordsFound>
        <recordsSearched>27482575</recordsSearched>

However, when I use the PMID/MEDLINE UID from the comment above, it works!

records = wos_queries.retrieve_by_id('MEDLINE:24551397')

The response includes the metadata and the record data too:

        <queryId>3</queryId>
        <recordsFound>1</recordsFound>
        <recordsSearched>27482575</recordsSearched>
        <optionValue>
          <label>RecordIDs</label>
          <value>MEDLINE:24551397</value>
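The MEDLINE:{PMID} convention assumed above can be captured in a pair of small helpers. This is only a sketch: `medline_uid` and `pmid_from_uid` are hypothetical names, not part of sul_pub or the WoS client, and the prefix rule is the assumption from this comment rather than anything confirmed by the WoS docs.

```ruby
# Hypothetical helpers (not part of sul_pub or the WoS client).
# Assumes a MEDLINE UID is simply the PMID with a "MEDLINE:" prefix.
MEDLINE_PREFIX = 'MEDLINE:'.freeze

# Build a MEDLINE UID from a PMID given as an Integer or a String.
def medline_uid(pmid)
  digits = pmid.to_s.strip
  raise ArgumentError, "not a PMID: #{pmid.inspect}" unless digits.match?(/\A\d+\z/)

  "#{MEDLINE_PREFIX}#{digits}"
end

# Recover the PMID from a MEDLINE UID; returns nil for any other kind of UID.
def pmid_from_uid(uid)
  match = /\AMEDLINE:(\d+)\z/.match(uid.to_s)
  match && match[1]
end
```

With these, `medline_uid(10000166)` gives the string passed to `retrieve_by_id` above, and `pmid_from_uid` returns nil for a `WOS:` UID.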

Sampling a few PMIDs from the sul_pub prod-db to see if any can be found via the new SOAP API (note that the iteration must include a sleep(1) to avoid hitting throttle errors):

pmids = PublicationIdentifier.where(identifier_type: 'PMID').limit(50).sample(10).map(&:identifier_value)
#=> ["10002407", "1001090", "10007847", "10009788", "10012482", "10014304", "10013432", "10013714", "10012537", "10014322"]
pmids_found = pmids.map do |pmid|
  sleep(1)
  [pmid, wos_queries.retrieve_by_id("MEDLINE:#{pmid}").count > 0 ]
end

The hit rate for that PMID search looks low, somewhere between 10% and 50% in a few runs of those queries:

[["10002407", false], ["1001090", true], ["10007847", false], ["10009788", false], 
["10012482", false], ["10014304", false], ["10013432", false], ["10013714", false],
["10012537", false], ["10014322", false]]
[["10021829", true], ["10021418", true], ["10010525", false], ["10021470", true],
["10014084", false], ["10018836", false], ["10013226", false], ["10029025", true],
["10019707", false], ["10022419", true]]
[["10021829", true], ["10021418", true], ["10010525", false], ["10021470", true], 
["10014084", false], ["10018836", false], ["10013226", false], ["10029025", true], 
["10019707", false], ["10022419", true]]
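To put a number on those runs, the [pmid, found] pairs can be reduced to a hit rate. A minimal sketch, with the first two result lists copied from above; `hit_rate`, `RUN_1` and `RUN_2` are names made up for this example.

```ruby
# Results copied from the first two runs above: [pmid, found?] pairs.
RUN_1 = [['10002407', false], ['1001090', true], ['10007847', false],
         ['10009788', false], ['10012482', false], ['10014304', false],
         ['10013432', false], ['10013714', false], ['10012537', false],
         ['10014322', false]].freeze
RUN_2 = [['10021829', true], ['10021418', true], ['10010525', false],
         ['10021470', true], ['10014084', false], ['10018836', false],
         ['10013226', false], ['10029025', true], ['10019707', false],
         ['10022419', true]].freeze

# Fraction of lookups that returned at least one record (0.0 for an empty run).
def hit_rate(results)
  return 0.0 if results.empty?

  results.count { |_id, found| found }.fdiv(results.size)
end

puts format('run 1: %.0f%%', hit_rate(RUN_1) * 100)
puts format('run 2: %.0f%%', hit_rate(RUN_2) * 100)
```

Ten identifiers per run is a very small sample, so these numbers are only a rough indication.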


dazza-codes commented Oct 4, 2017

WRT the DOI search, the WoS API doc (ver 3.0, July 7, 2015) contains:

  • p. 29 has the Data Citation Index with the DO=DOI field
  • p. 33 has the SciELO Citation Index with the same field
  • p. 34 has the Web of Science Core Collection with the DO=DOI field
  • p. 53 has a couple of things with a DOI field and xpath:
    • Book Digital Object Identifier (DOI) .../dynamic_data/cluster_related/identifiers
    • Digital Object Identifier (DOI) .../dynamic_data/cluster_related/identifiers

I added a DOI search option to the WosQueries object to test this, e.g.

wos_client = WosClient.new(Settings.WOS.AUTH_CODE, :debug);
wos_queries = WosQueries.new(wos_client);
dois = PublicationIdentifier.where(identifier_type: 'DOI').limit(200).sample(10).map(&:identifier_value).compact
dois_found = dois.map {|doi| sleep(1); [doi, wos_queries.search_by_doi(doi).count > 0 ] }

This seems to work pretty well: 14 of the 15 DOIs in these example runs were found.

[["10.5210/ojphi.v5i2.4696", true], ["10.1080/15228886.2011.623294", false], 
["10.1016/j_ijggc.2009.06.002", true], ["10.1130/G33807.1", true], 
["10.1002/aqc.2365", true], ["10.1038/jid.2013.167", true]]
[["10.1021/ja310831m", true],
 ["10.1016/j.enggeo.2010.07.009", true],
 ["10.1016/j.eatbeh.2013.08.002", true],
 ["10.1111/j.1745-6584.2009.00640.x", true],
 ["10.4161/cib.24788", true],
 ["10.1029/2005TC001887", true],
 ["10.1029/2005JB004076", true],
 ["10.2113/gseegeosci.17.1.1", true],
 ["10.1007/s11552-013-9555-0", true]]
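The PMID and DOI samplers above share the same sleep-then-query loop, so it could be pulled into one helper. A rough sketch only: `throttled_lookup` is a hypothetical name, the block stands in for a call such as `wos_queries.search_by_doi(doi).count > 0`, and, unlike the loops above, it skips the pause before the first call. The 1 second default is the delay used in this thread, not a documented WoS limit.

```ruby
# Hypothetical helper: run one lookup per identifier, pausing between calls
# to stay under the API throttle. Returns [id, block_result] pairs.
# The sleeper argument exists so the pause can be replaced in tests.
def throttled_lookup(ids, delay: 1, sleeper: Kernel.method(:sleep))
  ids.each_with_index.map do |id, index|
    sleeper.call(delay) if index.positive? # no pause before the first call
    [id, yield(id)]
  end
end

# Stubbed usage: the list below stands in for the WoS service, so this runs
# without credentials. The two DOIs and their outcomes are taken from the
# example runs above.
KNOWN_DOIS = ['10.1021/ja310831m'].freeze
sample = ['10.1021/ja310831m', '10.1080/15228886.2011.623294']
found = throttled_lookup(sample, delay: 0) { |doi| KNOWN_DOIS.include?(doi) }
p found
```

In the console the block would wrap the real query, e.g. `throttled_lookup(dois) { |doi| wos_queries.search_by_doi(doi).count > 0 }`.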


dazza-codes commented Oct 4, 2017

RE: Can we get PMIDs for WosIDs?

First, let's retrieve a publication record by the WosID:

wos_id = '000070953800034'
PublicationIdentifier.where(identifier_type: 'WosItemID', identifier_value: wos_id)
wos_client = WosClient.new(Settings.WOS.AUTH_CODE, :debug);
wos_queries = WosQueries.new(wos_client);
records = wos_queries.retrieve_by_id("WOS:#{wos_id}")
records.print # view the record XML
# that works

It seems the only part of the REC data that contains additional identifiers is

  • xpath: //dynamic_data/cluster_related/identifiers
  • e.g. as in the comment above

  <dynamic_data>
    <cluster_related>
      <identifiers>
        <identifier type="eissn" value="1942-597X"/>
        <identifier type="pmid" value="MEDLINE:24551397"/>
      </identifiers>
    </cluster_related>
  </dynamic_data>
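Given that xpath, pulling a PMID out of a record could look roughly like the sketch below. It uses REXML and the fragment from this thread as sample input. `identifiers_from` and `pmid_from` are hypothetical names, and it assumes the elements carry no XML namespace, which I have not checked against full WoS records.

```ruby
require 'rexml/document'

# Sample input: the fragment shown in this thread, not a full WoS record.
RECORD_XML = <<~XML
  <dynamic_data>
    <cluster_related>
      <identifiers>
        <identifier type="eissn" value="1942-597X"/>
        <identifier type="pmid" value="MEDLINE:24551397"/>
      </identifiers>
    </cluster_related>
  </dynamic_data>
XML

IDENTIFIER_PATH = '//dynamic_data/cluster_related/identifiers/identifier'.freeze

# Collect every identifier into a Hash of type => value.
def identifiers_from(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, IDENTIFIER_PATH).each_with_object({}) do |node, ids|
    ids[node.attributes['type']] = node.attributes['value']
  end
end

# Return the bare PMID (without the "MEDLINE:" prefix), or nil if the record
# has no pmid identifier.
def pmid_from(xml)
  uid = identifiers_from(xml)['pmid']
  uid && uid.sub(/\AMEDLINE:/, '')
end

p identifiers_from(RECORD_XML)
p pmid_from(RECORD_XML)
```

Whether every WOS record carries a pmid identifier here is a separate question; this only shows how to read one when it is present.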
