HttpxHttpClient always follows redirects #1013

Closed
2tunnels opened this issue Feb 24, 2025 · 0 comments · Fixed by #1015
Labels
t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@2tunnels
Contributor

I've encountered an issue with the redirection handling logic in HttpxHttpClient. Currently, there's no straightforward way to disable automatic redirect following when initializing HttpxHttpClient. Ideally, it should be possible to configure it like this:

crawler = HttpCrawler(http_client=HttpxHttpClient(follow_redirects=False))

While the follow_redirects argument is correctly passed to the underlying AsyncClient, HttpxHttpClient overrides this behavior in the crawl method by explicitly setting follow_redirects=True:

response = await client.send(http_request, follow_redirects=True)
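To make the override concrete, here is a minimal sketch using stand-in classes rather than httpx or Crawlee themselves. `FakeAsyncClient` and `crawl_current` are hypothetical names; the only behaviour modelled is that a per-call `follow_redirects` argument takes precedence over the client-level setting, which is how I understand `AsyncClient.send` to behave.

```python
import asyncio

_USE_CLIENT_DEFAULT = object()  # sentinel meaning "no per-call value given"


class FakeAsyncClient:
    """Hypothetical stand-in for httpx.AsyncClient (not the real class)."""

    def __init__(self, follow_redirects: bool = False) -> None:
        self.follow_redirects = follow_redirects

    async def send(self, request: str, follow_redirects=_USE_CLIENT_DEFAULT) -> bool:
        # Return the redirect setting that would apply to this request.
        if follow_redirects is _USE_CLIENT_DEFAULT:
            return self.follow_redirects
        return follow_redirects


async def crawl_current(client: FakeAsyncClient, request: str) -> bool:
    # Mirrors the call quoted above: the per-call value is hardcoded.
    return await client.send(request, follow_redirects=True)


client = FakeAsyncClient(follow_redirects=False)
effective = asyncio.run(crawl_current(client, "GET /old-path"))
print(effective)  # True, even though the client was created with False
```

With the hardcoded argument, the value passed to the constructor never gets a chance to apply.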

I assume this was done because AsyncClient defaults to follow_redirects=False, so the explicit argument ensures that HTTP-based crawlers follow redirects by default. However, would it be preferable to handle this differently? For example, by adding a default follow_redirects entry to the kwargs in the _get_client method?
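The alternative I have in mind could look roughly like this. It is a sketch with hypothetical names (`SketchHttpClient`, `_get_client_kwargs`), not the actual Crawlee code: the default moves into the kwargs used to build the client, and the crawl method would then call `client.send(http_request)` without a per-call override.

```python
from typing import Any


class SketchHttpClient:
    """Hypothetical stand-in for HttpxHttpClient; only kwargs handling is shown."""

    def __init__(self, **async_client_kwargs: Any) -> None:
        self._async_client_kwargs = async_client_kwargs

    def _get_client_kwargs(self) -> dict[str, Any]:
        kwargs = dict(self._async_client_kwargs)
        # Follow redirects unless the caller explicitly said otherwise.
        kwargs.setdefault("follow_redirects", True)
        return kwargs


default_kwargs = SketchHttpClient()._get_client_kwargs()
explicit_kwargs = SketchHttpClient(follow_redirects=False)._get_client_kwargs()
print(default_kwargs["follow_redirects"], explicit_kwargs["follow_redirects"])  # True False
```

This keeps redirects on by default for HTTP-based crawlers while still respecting an explicit follow_redirects=False.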

@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Feb 24, 2025