Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfixes in http driver functions that utilize goquery #390

Merged
merged 11 commits into from
Oct 7, 2019
Merged

Bugfixes in http driver functions that utilize goquery #390

merged 11 commits into from
Oct 7, 2019

Conversation

gabriel-marinkovic
Copy link
Contributor

There were some bugs in the http driver's function that used goquery (ELEMENT_EXISTS, INNER_TEXT):

  • goquery's selection.Find() always returns a non-null selection object even when empty. selection.Length() must be used to check for contents.
  • goquery's selection.Closest() looks at selection's parents instead of children, which is the expected behaviour for ELEMENT_EXISTS(element, selector)

This bugfix might break a lot of people's code; for example, INNER_TEXT often returned an empty string when it should've returned an error instead.

@codecov
Copy link

codecov bot commented Sep 22, 2019

Codecov Report

Merging #390 into master will increase coverage by 0.1%.
The diff coverage is 0%.

@@           Coverage Diff            @@
##           master    #390     +/-   ##
========================================
+ Coverage    39.2%   39.2%   +0.1%     
========================================
  Files         218     218             
  Lines        8877    8877             
========================================
+ Hits         3477    3483      +6     
+ Misses       5043    5039      -4     
+ Partials      357     355      -2

@ziflex
Copy link
Member

ziflex commented Sep 28, 2019

Looks good! Just need to fix a failing e2e test.

Regarding breaking changes - we haven't gotten to 1.0, so we can break the things :)

@gabriel-marinkovic
Copy link
Contributor Author

This test fails.

As far as I can see, this test should be performed dynamically by using DOCUMENT(url, true) instead of DOCUMENT(url) which is currently in the test. When the document isn't dynamic, both conditions should be false.

The test used to find because ELEMENT_EXISTS() for http driver always returned true.

go.sum Outdated
@@ -1,6 +1,8 @@
github.com/Masterminds/glide v0.13.2/go.mod h1:STyF5vcenH/rUqTEv+/hBXlSTo7KYwg2oc2f4tzPWic=
github.com/Masterminds/semver v1.4.2/go.mod h1:MB6lktGJrhw8PrUyiEoblNEGEQ+RzHPF078ddwwvV3Y=
github.com/Masterminds/vcs v1.13.0/go.mod h1:N09YCmOQr6RLxC6UNHzuVwAdodYbbnycGHSmwVJjcKA=
github.com/MontFerret/ferret v0.9.0 h1:eRnwpp2VIh+3/u6x7kAG0tR8mr3/PBnkqM8Ag9uocZ8=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this came from your fork?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, I removed this.

@@ -152,7 +152,7 @@ func (drv *Driver) Open(ctx context.Context, params drivers.Params) (drivers.HTM

defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusInternalServerError {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add this extra condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also came from my local fork, because of some testing I was doing. I removed it.

Side note: an option to add custom "retry codes" (like what scrapy offers) would be good. Some sites return perfectly valid html but respond with status 500. I usually want to consider a request as failed only if it returned 200 or 202.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sure, you can def think about it.

Copy link
Member

@ziflex ziflex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some additional questions regarding the PR.

@ziflex
Copy link
Member

ziflex commented Oct 7, 2019

Thank you for your time and effort!

@ziflex ziflex merged commit bb210fb into MontFerret:master Oct 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants