[WIP] Fix xPath scrapper blocked by some site with WAF #346

Closed
wants to merge 1 commit into from

Conversation

hiddenpants255
Contributor

Some sites have a WAF enabled, and those firewalls block the default user-agent string of the Go HTTP client.

The antchfx/htmlquery package that we use for the xPath scraper provides no way to replace the default user-agent string. I have implemented a local loadURL() function with the same signature that sets the user-agent to "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36" to bypass the firewall rules.
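
For readers following along, here is a minimal sketch of the idea using only the Go standard library. It is not the code from this PR's commit: `newRequest`, `fetchBody`, and the 30-second timeout are hypothetical choices made for illustration. A real `loadURL()` would presumably also parse the response body into an HTML node tree so that it matches the signature of the htmlquery loader; that step is left out here to keep the example free of third-party imports.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// userAgent is the browser-like string quoted in the description above.
const userAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"

// newRequest (hypothetical helper) builds a GET request that carries the
// given User-Agent header instead of Go's default one.
func newRequest(url, ua string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", ua)
	return req, nil
}

// fetchBody (hypothetical helper) sends the request and returns the raw
// response body. The timeout value is an assumption, not taken from the PR.
func fetchBody(url string) ([]byte, error) {
	req, err := newRequest(url, userAgent)
	if err != nil {
		return nil, err
	}

	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s for %s", resp.Status, url)
	}
	return io.ReadAll(resp.Body)
}
```

Keeping the request construction in its own function makes it easy to swap the hard-coded string for a configured value later.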

@WithoutPants WithoutPants changed the title Fix xPath scrapper blocked by some site with WAF [WIP] Fix xPath scrapper blocked by some site with WAF Feb 4, 2020
@WithoutPants
Collaborator

Marking this as WIP until I can make the user-agent string configurable.

@WithoutPants WithoutPants self-assigned this Feb 4, 2020
@WithoutPants WithoutPants added this to the Version 0.2.0 milestone Feb 4, 2020
@Leopere
Collaborator

Leopere commented Feb 11, 2020

An exhaustive list of user-agent strings: https://developers.whatismybrowser.com/useragents/explore/

@bnkai bnkai added the improvement Something needed tweaking. label Feb 29, 2020
@WithoutPants
Collaborator

Opened #409 instead of pushing directly to @hiddenpants255's develop branch. Closing this one.

Labels
improvement Something needed tweaking.
4 participants