NuGet efficiency #8679
Conversation
…into dev/rybrande/NugetJsonParse
…into dev/rybrande/NugetJsonParseRebased
This looks good to me!

The only thing I would recommend double-checking is the use of Hash#[] vs. Hash#fetch without a default, probably mainly in the response-parsing code in Dependabot::Nuget::NugetClient. fetch without a default will raise if the field is omitted, so we just want to make sure we always expect those fields to exist.
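For reference, a minimal standalone illustration (not dependabot code; the hash contents here are made up) of the difference being flagged: `Hash#[]` returns `nil` for a missing key, while `Hash#fetch` without a default raises `KeyError`.

```ruby
response = { "versions" => ["1.0.0", "1.1.0"] }

response["count"]          # => nil, silently
response.fetch("versions") # => ["1.0.0", "1.1.0"]

begin
  response.fetch("count")  # no default given, key absent
rescue KeyError => e
  puts e.message           # key not found: "count"
end

# Supplying a default (or a block) makes fetch safe for optional fields:
response.fetch("count", 0)        # => 0
response.fetch("count") { |_k| 0 } # => 0
```

So `fetch` without a default is effectively an assertion that the field is required.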
@bdragon I double-checked the usage in NugetClient.
In general they will do the same thing, but there is a subtle difference. When you create a hash, you have the option to provide a block that specifies the return value when a caller tries to access a key that doesn't exist:

```ruby
irb(main):001:0> a = {}
=> {}
irb(main):002:0> a[:missing]
=> nil
irb(main):003:0> b = Hash.new { |h, k| "unknown key '#{k}'" }
=> {}
irb(main):004:0> b[:missing]
=> "unknown key 'missing'"
```

The default behavior for a plain hash is to return `nil` for a missing key. When a key might be missing, that `nil` can then propagate silently. The main thing I look out for is a `nil` slipping through where the code expected a value.
Ah, I'm glad I asked! So it sounds like if we were taking hashes from outside callers (as a library for use by other projects) we might want a default.
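A sketch of that boundary distinction (method names and fields here are hypothetical, purely for illustration): inside code that built the hash itself, `fetch` without a default documents that the field is required; for hashes supplied by outside callers, a default keeps partial input from raising.

```ruby
# Hash we produced ourselves: "id" must be present, so let fetch raise
# loudly if an internal invariant is ever broken.
def internal_id(parsed)
  parsed.fetch("id")
end

# Caller-supplied options at a library boundary: fall back to a default
# rather than raising on partial input.
def public_timeout(options)
  options.fetch(:timeout, 20)
end

internal_id("id" => 42) # => 42
public_timeout({})      # => 20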
```
@@ -62,7 +64,9 @@ def build_url_for_details(repo_details)
  body = remove_wrapping_zero_width_chars(response.body)
  base_url = base_url_from_v3_metadata(JSON.parse(body))
  resolved_base_url = base_url || repo_details.fetch(:url).gsub("/index.json", "-flatcontainer")
  search_url = search_url_from_v3_metadata(JSON.parse(body))
  parsed_json = JSON.parse(body)
```
We should move this up so we can use it when getting the base_url from the response body.
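That is, parse the body once and reuse the result for both lookups instead of calling JSON.parse twice. A runnable sketch of the reordering, with hypothetical stand-ins for the helpers referenced in the diff:

```ruby
require "json"

# Hypothetical stand-ins so the reorder runs in isolation; the real
# helpers live in dependabot and are not reproduced here.
def base_url_from_v3_metadata(json)
  json.dig("resources", 0, "@id")
end

def search_url_from_v3_metadata(json)
  json.dig("resources", 1, "@id")
end

body = '{"resources":[{"@id":"https://example/base"},{"@id":"https://example/search"}]}'

# Parse once, up front, then reuse for every lookup that needs it.
parsed_json = JSON.parse(body)
base_url = base_url_from_v3_metadata(parsed_json)
search_url = search_url_from_v3_metadata(parsed_json)
```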
```
@@ -54,6 +55,7 @@ def find_dependency_urls
    end.compact.uniq
  end

  # rubocop:disable Metrics/AbcSize
  def build_url_for_details(repo_details)
```
Not for this PR, but I think this method would be a good candidate for some cleanup; it's doing a bit more than building a URL. Would we want to encapsulate parsing the repo metadata response somewhere else?
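One possible shape for that encapsulation (entirely hypothetical, not the dependabot implementation): a small value object that owns parsing of the v3 index response, so build_url_for_details only builds URLs.

```ruby
require "json"

# Hypothetical wrapper around a NuGet v3 service index response.
class RepositoryMetadata
  def initialize(body)
    @json = JSON.parse(body)
  end

  # Look up the "@id" URL of the first resource with the given "@type".
  def resource_url(type)
    resource = @json.fetch("resources", []).find { |r| r["@type"] == type }
    resource && resource["@id"]
  end
end

metadata = RepositoryMetadata.new(
  '{"resources":[{"@type":"SearchQueryService","@id":"https://example/query"}]}'
)
metadata.resource_url("SearchQueryService") # => "https://example/query"
```

URL-building code would then ask the metadata object for resources rather than re-parsing JSON inline.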
Is this ready to deploy (once it's up to date with the base branch)?

I believe so.

Looks like there was some unreliability in the smoke tests? It passed on re-run and doesn't seem related to my changes.

I've deployed the updater with this change, so I'm going to merge this PR. I'll cut a release of the dependabot gems on Thursday and it will include this change.
I saw some problems related to timeouts when running Dependabot for NuGet packages. It seems that we were using the query HTTP API in places where we already knew exactly which package we wanted. The query API can be quite slow on some NuGet registry implementations, which can exceed the 20-second timeout for that call. Rather than extend the timeout and tax NuGet registries, I'm trying to use the Registration API, which returns faster because it already knows which package it's working against. This has the downside that it returns a lot more JSON for us to parse, but given the slowness of the query API I've found it to be a significant savings.

While working through this I found some other performance problems that might affect particularly large or nested repositories, such as the index call for a NuGet repo being made repeatedly even though its result doesn't vary, and instances where caching results can keep us from walking transitive dependencies multiple times.
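The caching described above can be sketched with a simple memoizing wrapper (names and structure are illustrative, not the actual dependabot code): the per-repository index response is fetched once and reused on every later lookup.

```ruby
# Hypothetical cache keyed by repository URL; the fetcher block stands in
# for the real HTTP call to the repo's index endpoint.
class IndexCache
  def initialize(&fetcher)
    @fetcher = fetcher
    @cache = {}
  end

  def index_for(repo_url)
    # ||= stores the first result and short-circuits every later call.
    @cache[repo_url] ||= @fetcher.call(repo_url)
  end
end

calls = 0
cache = IndexCache.new { |url| calls += 1; "index for #{url}" }
cache.index_for("https://api.nuget.org/v3/index.json")
cache.index_for("https://api.nuget.org/v3/index.json")
calls # => 1
```

The same memoization idea applies to transitive-dependency walks: cache the result per package so a shared dependency is only resolved once.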