-
-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfixes in http driver functions that utilize goquery #390
Conversation
Codecov Report
@@ Coverage Diff @@
## master #390 +/- ##
========================================
+ Coverage 39.2% 39.2% +0.1%
========================================
Files 218 218
Lines 8877 8877
========================================
+ Hits 3477 3483 +6
+ Misses 5043 5039 -4
+ Partials 357 355 -2 |
Looks good! Just need to fix a failing e2e test. Regarding breaking changes - we haven't gotten to 1.0, so we can break the things :) |
This test fails. As far as I can see, this test should be performed dynamically by using The test used to find because |
go.sum
Outdated
@@ -1,6 +1,8 @@ | |||
github.com/Masterminds/glide v0.13.2/go.mod h1:STyF5vcenH/rUqTEv+/hBXlSTo7KYwg2oc2f4tzPWic= | |||
github.com/Masterminds/semver v1.4.2/go.mod h1:MB6lktGJrhw8PrUyiEoblNEGEQ+RzHPF078ddwwvV3Y= | |||
github.com/Masterminds/vcs v1.13.0/go.mod h1:N09YCmOQr6RLxC6UNHzuVwAdodYbbnycGHSmwVJjcKA= | |||
github.com/MontFerret/ferret v0.9.0 h1:eRnwpp2VIh+3/u6x7kAG0tR8mr3/PBnkqM8Ag9uocZ8= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this came from your fork?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry, I removed this.
pkg/drivers/http/driver.go
Outdated
@@ -152,7 +152,7 @@ func (drv *Driver) Open(ctx context.Context, params drivers.Params) (drivers.HTM | |||
|
|||
defer resp.Body.Close() | |||
|
|||
if resp.StatusCode != http.StatusOK { | |||
if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusInternalServerError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you add this extra condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also came from my local fork, because of some testing I was doing. I removed it.
Side note: an option to add custom "retry codes" (like what scrapy offers) would be good. Some sites return perfectly valid html but respond with status 500. I usually want to consider a request as failed only if it returned 200 or 202.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sure, you can def think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some additional questions regarding the PR.
Thank you for your time and effort! |
There were some bugs in the http driver's function that used goquery (
ELEMENT_EXISTS
,INNER_TEXT
):selection.Find()
always returns a non-null selection object even when empty.selection.Length()
must be used to check for contents.selection.Closest()
looks at selection's parents instead of children, which is the expected behaviour forELEMENT_EXISTS(element, selector)
This bugfix might break a lot of people's code; for example,
INNER_TEXT
often returned an empty string when it should've returned an error instead.