HttpxHttpClient always follows redirects #1013

2tunnels · 2025-02-24T12:42:11Z

I've encountered an issue with the redirection handling logic in HttpxHttpClient. Currently, there's no straightforward way to disable automatic redirect following when initializing HttpxHttpClient. Ideally, it should be possible to configure it like this:

crawler = HttpCrawler(http_client=HttpxHttpClient(follow_redirects=False))

While the follow_redirects argument is correctly passed to the underlying AsyncClient, HttpxHttpClient overrides this behavior in the crawl method by explicitly setting follow_redirects=True:

crawlee-python/src/crawlee/http_clients/_httpx.py, line 168 in 179ec93:

response = await client.send(http_request, follow_redirects=True)

I assume this was done because AsyncClient defaults to follow_redirects=False, and the override ensures that HTTP-based crawlers follow redirects by default. However, would it be preferable to handle this differently? For example, by adding a default follow_redirects argument to kwargs in the _get_client method?
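A rough sketch of that idea, not the actual crawlee code: the helper name with_redirect_default is hypothetical, and the real signature of _get_client is not shown here. The point is only that a default applied to kwargs keeps redirects on unless the caller explicitly opts out.

```python
from typing import Any


def with_redirect_default(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return a copy of kwargs where follow_redirects defaults to True."""
    merged = dict(kwargs)  # copy, so the caller's dict is not mutated
    # setdefault only fills in the value when the caller did not provide one.
    merged.setdefault("follow_redirects", True)
    return merged


# Caller did not specify anything: redirects are followed by default.
default_kwargs = with_redirect_default({})

# Caller explicitly opted out: the explicit value is preserved.
opt_out_kwargs = with_redirect_default({"follow_redirects": False})
```

The resulting kwargs could then be passed to the AsyncClient constructor, and the send call would no longer need its own follow_redirects argument.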


github-actions bot added the t-tooling label (issues with this label are in the ownership of the tooling team) Feb 24, 2025

This was referenced Feb 24, 2025

fix: Remove follow_redirects override in HttpxHttpClient #1015

Merged

CurlImpersonateHttpClient always follows redirects #1016

Closed

Pijukatel closed this as completed in #1015 Feb 26, 2025

Pijukatel closed this as completed in 88afda3 Feb 26, 2025
