-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Web of Science queries #223
Conversation
b89be30
to
f35ade1
Compare
lib/wos_queries.rb
Outdated
class WosQueries | ||
|
||
# Default database 'WOK' is an umbrella for everything | ||
DATABASE = 'WOK'.freeze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need a constant here, just put the string as the default in initialize
. That makes it part of the YARD.
lib/wos_queries.rb
Outdated
attr_reader :wos_client | ||
|
||
# @param wos_client [WosClient] a Web Of Science client | ||
# @param database [String] a WOS database identifier (default 'WOS') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default now WOK
e3bd992
to
d8876a1
Compare
I'm stuck trying to work around what seems to be a bug in Nokogiri, reported at |
3678a3b
to
2398dbd
Compare
I've worked around the Nokogiri issues related to immutability, by using the |
ecb83f6
to
ee4a4d6
Compare
3161344
to
a9d8375
Compare
lib/wos_queries.rb
Outdated
# @return [Hash] citingArticles parameters | ||
def citing_articles_params(uid) | ||
{ | ||
databaseId: @database, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should have an attr_reader :database
and use it in these methods instead of referencing the instance variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
# @param uids [Array<String>] a list of WOS UIDs | ||
# @return [Hash] retrieveById parameters | ||
def retrieve_by_id_params(uids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A ton of repetition here in the params
methods. Would it make sense to just have a base_params
method (that looks exactly like the body of this one) and then invoke:
base_params.merge(different_key: different_val)
in the places where individual key/val pairs need to be added or overridden? This would put the focus on what is unique to each set of params rather than repeating all the things that are in common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look closely at it, there are slight differences in each set of params. Given that we know the request XML is order-sensitive (right? why? weird, if so, but yea ...), doesn't this make it difficult to use a Hash.merge strategy with any confidence? I think we should be glad that savon allows us to use Hash params. These methods have already been refactored a bit to allow for some existing use-case flexibility; we might discover more as we go. I'm open to exploring whether or not a Hash.merge strategy might work and will try it out soon. If it's not a blocker right now, maybe we can move on and adapt as we go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I acknowledged those slight differences in my comment. My intent is to have a common starting point, then merge/reject from there. This would make it obvious what each of those differences is, I.E., what is special about this method.
The ordered XML thing does not apply since we are still giving the Savon gem a Hash. It figures out the order now, and it would figure out the order the same way, since we would be passing it exactly the same param. This is just about how we build the hash and how much we repeat ourselves. In general, this codebase has suffered from lack of facility manipulating ruby Hash, so I would like to start repairing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take another look at this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments and a couple requests.
spec/lib/wos_records_spec.rb
Outdated
describe '#to_xml' do | ||
it 'works with encoded records' do | ||
result = wos_records_encoded.to_xml | ||
expect(result).to be_an String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a stronger expectation here than just "gimme a String"? Since we are dealing with encoding, it is sorta important that not all strings are equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe an XSD validation for the record data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or at least expect it not to contain <
or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think full XSD is required. Just look for something that you expect the encoding to adjust (or not adjust) and target it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
spec/lib/wos_records_spec.rb
Outdated
describe '#decode_records' do | ||
it 'works' do | ||
result = wos_records_encoded.send(:decode_records) | ||
expect(result).to be_an String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
unless uid_intersection.empty? | ||
# create a new set of records, use a philosophy of immutability | ||
# Nokogiri::XML::NodeSet enumerable methods do not return new objects | ||
# Nokogiri::XML::Document.dup is a deep copy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: the hangup with Nokogiri is unsurprising and doesn't require a bunch of commentary. If you want to build a nodeset from existing nodes without retaining any connection back to the parent document (which all nodes have), then you can round-trip the node through serialization .to_xml
and back into a new Nokogiri XML node/fragment. This would reduce the memory utilization by not copying the whole document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it is not a bug (in Nokogiri).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can discuss this in person, if necessary, as I like it as it is. I hope it's not a blocker. The round trip via .to_xml
will not gain anything, as the doc only contains records anyway. The explanation fills a gap in the Nokogiri documentation and I believe will serve as a note to prevent later refactoring that could easily mistakenly introduce violations of the immutability desired (which should be caught by specs, but might not be understood without this commentary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The round trip is what divorces it from the parent document (since it becomes just a String value, with no ability to point to location of the originating document in Ruby memory). The separation is the basis of your motivation to .dup
. I agree this may benefit from better docs, but it is not unexpected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get how the .to_xml
works. What I'm saying is that the doc just contains REC elements, so there's nothing else duped by the doc.dup
anyway, so there's not much gained from all the extra serialization transforms involved in the .to_xml
; it will just slow things down and that might be worse that a faster doc.dup
when collecting 100's of records.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree there may be a computational tradeoff for the benefit of smaller memory footprint. We can defer on this one for later.
|
||
# @param name [String] a CSV name pattern: {last name}, {first_name} [{middle_name} | {middle initial}] | ||
# @return [WosRecords] | ||
def search_by_name(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you have these 5 public methods that are effectively aliases for private ones? Put the body of the private methods here and remove the "collator" methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to move some stuff where this is done in #242 already, into this PR. This pattern of using a collator emerged from doing one request first and then another and another and then noticing that the collators themselves could be refactored, until they became so small that they, well, don't need to exist any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
lib/wos_records.rb
Outdated
# @param record_setB [WosRecords] | ||
# @return [Set] duplicate WoS UID values | ||
def duplicate_uids(record_setB) | ||
uids.to_set.intersection(record_setB.uids.to_set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array
already has an intersection operator in Ruby, the &
. So without the need to convert to_set
, this should be uids & record_setB.uids
. Intersection also deduplicates multiples and preserves order!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll try that, good tip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this works for me:
> x = [*1..10]
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
> y = [*1..10].reject {|i| i % 2 == 0 }.reverse
=> [9, 7, 5, 3, 1]
> x & y
=> [1, 3, 5, 7, 9]
> x.reverse & y
=> [9, 7, 5, 3, 1]
> x.reverse & y.reverse
=> [9, 7, 5, 3, 1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Review changes pushed in separate commits to clearly identify how each commit addressed the review concerns. |
Most of the existing review concerns are fixed, let's review again.
spec/lib/wos_records_spec.rb
Outdated
describe 'Enumerable mixin' do | ||
it 'responds to Enumerable.instance_methods' do | ||
methods = Enumerable.instance_methods | ||
methods.each { |method| expect(wos_records_encoded).to respond_to(method) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to enumerate methods individually, because that is how Ruby inheritance works. Please replace this spec with described_class.is_a?(Enumerable)
or consider removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Thanks for the responses. The Would appreciate attention to one outstanding item re: |
dbae5f2
to
b2702a7
Compare
Fix #197
Provides query functionality for all the new WoS SOAP API endpoints
TODO:
Related issues:
Integration test
Use
bundle exec rails c
and, when you have the WoS credentials in place, run