Allow cache TTL override for --find-links urls #8109
Comments
Hi @cjerdonek, would you be able to take a look at this? Saw you were the last person who made related changes. |
Take a look at #8042, seems like it can solve this problem as well |
Hi @NoahGorny, thanks for the pointer! This use case is a bit different from #8042, which sets the header in the outgoing requests to the server, whereas this is more about how to interpret or change the header in the response received from the server.
The goal here is to manipulate 'max-age' to force pip to cache the response, regardless of what 'max-age' is in the actual server response. Edit: clarified where the change should be. |
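To make the idea concrete, here is a minimal sketch of overriding 'max-age' in a response's Cache-Control value. The helper name `force_max_age` is hypothetical and is not part of pip or CacheControl; it only illustrates the header rewrite being asked for.

```python
def force_max_age(cache_control: str, ttl_seconds: int) -> str:
    """Return a Cache-Control value whose max-age is replaced by ttl_seconds.

    Hypothetical helper for illustration only; not pip's or CacheControl's API.
    """
    # Split the header into individual directives, skipping empty pieces.
    directives = [d.strip() for d in cache_control.split(",") if d.strip()]
    # Drop any max-age the server sent, then append the forced value.
    kept = [d for d in directives if not d.lower().startswith("max-age")]
    kept.append(f"max-age={ttl_seconds}")
    return ", ".join(kept)


if __name__ == "__main__":
    print(force_max_age("public, max-age=0", 8))
```

Note that this sketch leaves other directives (for example `no-store`) untouched, so a real change would still have to decide how those interact with the forced TTL.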
ok looks like pip is using the request header to determine whether the response should be cached. #8009
index dafe55c..f0066a7 100644
--- a/src/pip/_vendor/cachecontrol/controller.py
+++ b/src/pip/_vendor/cachecontrol/controller.py
@@ -84,7 +84,9 @@ class CacheController(object):
         retval = {}
+        cc_headers = 'max-age=8'
         for cc_directive in cc_headers.split(","):
             if not cc_directive.strip():
                 continue
Happened to work because it will change the parsing for both request and response. |
@wisechengyi I think that allowing custom headers for the request will solve it, but only if the |
pip/src/pip/_internal/index/collector.py Line 166 in 327a315
Spoke too soon.
On top of #8078
I wonder if server response also requires max-age. |
Yes, and they mean different things. The request’s |
IIUC the response header combined with #7729 (which is available in 20.1b1) should reduce the request to one per pip invocation. |
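A rough way to picture the difference between the two max-age values, as a simplified model of HTTP freshness (RFC 7234) rather than pip's or CacheControl's actual code. The function name and parameters below are made up for illustration.

```python
from typing import Optional


def cached_response_is_usable(
    age_seconds: int,
    response_max_age: int,
    request_max_age: Optional[int] = None,
) -> bool:
    """Simplified freshness check (illustrative assumption, not pip's logic).

    response_max_age: how long the server says the response stays fresh.
    request_max_age: the oldest cached response the client will accept.
    """
    lifetime = response_max_age
    if request_max_age is not None:
        # The client can only tighten the limit in this model.
        lifetime = min(lifetime, request_max_age)
    return lifetime > age_seconds


if __name__ == "__main__":
    # Request max-age=0 means the cached copy is never used without revalidation.
    print(cached_response_is_usable(5, 600, request_max_age=0))
    print(cached_response_is_usable(5, 600))
```

In this model a request with max-age=0 always forces a round trip, which is why the server's own max-age only helps once the request header stops insisting on 0.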
Thanks, @uranusjr! That's much clearer, and luckily pip is already able to support the caching mechanism if configured correctly. Having experimented with the code, to summarize it in a more understandable way:
Hi @amancevice, do you think it would be reasonable to incorporate the below change into #8078? I think it should still comply with the philosophy of the change, i.e. default header should be overwritten by the one specified on CLI.
|
Sounds good to me, @wisechengyi! But would you be opposed to a slightly different logic using the
diff --git a/src/pip/_internal/index/collector.py b/src/pip/_internal/index/collector.py
index e2c800c2..2c3ba917 100644
--- a/src/pip/_internal/index/collector.py
+++ b/src/pip/_internal/index/collector.py
@@ -163,7 +163,9 @@ def _get_html_response(url, session):
             # trip for the conditional GET now instead of only
             # once per 10 minutes.
             # For more information, please see pypa/pip#5670.
-            "Cache-Control": "max-age=0",
+            # However if we want to override Cache-Control, e.g. via CLI,
+            # we can still do so.
+            "Cache-Control": session.headers.get('Cache-Control', 'max-age=0'),
         },
     )
     resp.raise_for_status()
|
That looks much prettier. Thanks, @amancevice ! |
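For anyone skimming, the fallback in that diff boils down to the following. This is a standalone sketch with a plain dict standing in for `session.headers`; the function name `build_request_headers` is made up here and does not exist in pip.

```python
def build_request_headers(session_headers: dict) -> dict:
    """Mimic the header selection from the diff above (illustrative only)."""
    return {
        "Accept": "text/html",
        # Use the session-level Cache-Control if one was set (e.g. via CLI),
        # otherwise keep the existing default of max-age=0.
        "Cache-Control": session_headers.get("Cache-Control", "max-age=0"),
    }


if __name__ == "__main__":
    print(build_request_headers({}))
    print(build_request_headers({"Cache-Control": "max-age=600"}))
```

One caveat: a plain dict is case-sensitive, whereas the headers on a real requests session are case-insensitive, so the lookup in this sketch is stricter than the real thing.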
@wisechengyi you should make sure that the above change will not change default behaviour (without specifying max-age) |
I wrote a test for this last night that @wisechengyi can grab. (sorry for the very large diff, not sure if this is the best way to share it)
diff --git a/src/pip/_internal/index/collector.py b/src/pip/_internal/index/collector.py
index e2c800c2..2c3ba917 100644
--- a/src/pip/_internal/index/collector.py
+++ b/src/pip/_internal/index/collector.py
@@ -163,7 +163,9 @@ def _get_html_response(url, session):
             # trip for the conditional GET now instead of only
             # once per 10 minutes.
             # For more information, please see pypa/pip#5670.
-            "Cache-Control": "max-age=0",
+            # However if we want to override Cache-Control, e.g. via CLI,
+            # we can still do so.
+            "Cache-Control": session.headers.get('Cache-Control', 'max-age=0'),
         },
     )
     resp.raise_for_status()
diff --git a/tests/unit/test_collector.py b/tests/unit/test_collector.py
index cfc2af1c..a8ae51aa 100644
--- a/tests/unit/test_collector.py
+++ b/tests/unit/test_collector.py
@@ -60,6 +60,7 @@ def test_get_html_response_archive_to_http_scheme(url, content_type):
     if the scheme supports it, and raise `_NotHTML` if the response isn't HTML.
     """
     session = mock.Mock(PipSession)
+    session.headers = {}
     session.head.return_value = mock.Mock(**{
         "request.method": "HEAD",
         "headers": {"Content-Type": content_type},
@@ -87,6 +88,7 @@ def test_get_html_response_archive_to_http_scheme_is_html(url):
     request is responded with text/html.
     """
     session = mock.Mock(PipSession)
+    session.headers = {}
     session.head.return_value = mock.Mock(**{
         "request.method": "HEAD",
         "headers": {"Content-Type": "text/html"},
@@ -120,6 +122,7 @@ def test_get_html_response_no_head(url):
     look like an archive, only the GET request that retrieves data.
     """
     session = mock.Mock(PipSession)
+    session.headers = {}
     # Mock the headers dict to ensure it is accessed.
     session.get.return_value = mock.Mock(headers=mock.Mock(**{
@@ -145,6 +148,7 @@ def test_get_html_response_dont_log_clear_text_password(caplog):
     in its DEBUG log message.
     """
     session = mock.Mock(PipSession)
+    session.headers = {}
     # Mock the headers dict to ensure it is accessed.
     session.get.return_value = mock.Mock(headers=mock.Mock(**{
@@ -167,6 +171,57 @@ def test_get_html_response_dont_log_clear_text_password(caplog):
     ]
+@pytest.mark.parametrize(
+    ("url", "headers", "cache_control"),
+    [
+        (
+            "http://python.org/python-3.7.1.zip",
+            {},
+            "max-age=0",
+        ),
+        (
+            "https://pypi.org/pip-18.0.tar.gz",
+            {},
+            "max-age=0",
+        ),
+        (
+            "http://python.org/python-3.7.1.zip",
+            {"Cache-Control": "max-age=1"},
+            "max-age=1",
+        ),
+        (
+            "https://pypi.org/pip-18.0.tar.gz",
+            {"Cache-Control": "max-age=1"},
+            "max-age=1",
+        ),
+    ],
+)
+def test_get_html_response_override_cache_control(url, headers, cache_control):
+    """
+    `_get_html_response()` should use the session's default value for the
+    Cache-Control header if provided.
+    """
+    session = mock.Mock(PipSession)
+    session.headers = headers
+    session.head.return_value = mock.Mock(**{
+        "request.method": "HEAD",
+        "headers": {"Content-Type": "text/html"},
+    })
+    session.get.return_value = mock.Mock(headers={"Content-Type": "text/html"})
+
+    resp = _get_html_response(url, session=session)
+
+    assert resp is not None
+    assert session.mock_calls == [
+        mock.call.head(url, allow_redirects=True),
+        mock.call.head().raise_for_status(),
+        mock.call.get(url, headers={
+            "Accept": "text/html", "Cache-Control": cache_control,
+        }),
+        mock.call.get().raise_for_status(),
+    ]
+
+
 @pytest.mark.parametrize(
     ("html", "url", "expected"),
     [
@@ -416,6 +471,7 @@ def test_request_http_error(caplog):
     caplog.set_level(logging.DEBUG)
     link = Link('http://localhost')
     session = Mock(PipSession)
+    session.headers = {}
     session.get.return_value = resp = Mock()
     resp.raise_for_status.side_effect = requests.HTTPError('Http error')
     assert _get_html_page(link, session=session) is None
@@ -429,6 +485,7 @@ def test_request_retries(caplog):
     caplog.set_level(logging.DEBUG)
     link = Link('http://localhost')
     session = Mock(PipSession)
+    session.headers = {}
     session.get.side_effect = requests.exceptions.RetryError('Retry error')
     assert _get_html_page(link, session=session) is None
     assert (
@@ -501,6 +558,7 @@ def test_get_html_page_directory_append_index(tmpdir):
     expected_url = "{}/index.html".format(dir_url.rstrip("/"))
     session = mock.Mock(PipSession)
+    session.headers = {}
     fake_response = make_fake_html_response(expected_url)
     mock_func = mock.patch("pip._internal.index.collector._get_html_response")
     with mock_func as mock_func:
|
@amancevice, looks reasonable to me. Thanks! It might be a good idea to put it on #8078, so that if folks have further feedback, they can use the PR for review. |
Hi @NoahGorny, the tests Alexandar added should cover that. If you would be able to review #8078, that would be appreciated. |
What's the problem this feature will solve?
Our wheels (thousands of them) are hosted in a flat directory, e.g. https://myhost.com/wheels/ has
With --no-index --find-links https://myhost.com/wheels/, pip will query it for every artifact it needs transitively. However, when a large number of builds are running concurrently with pip, the wheel server can be overwhelmed.
Describe the solution you'd like
We'd like some mechanism to force the cache TTL for the index page. Something to the effect of:
In this case I was hardcoding the cache TTL to be 8 seconds.
It can be plumbed via an option, e.g. adding --force-index-cache-ttl=<some seconds> to below:
Please kindly let me know if the concept is acceptable or if there's a better approach to this.
Thanks!
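As a rough sketch of how such an option might be plumbed: the flag `--force-index-cache-ttl` is only the proposal from this issue and does not exist in pip, and `argparse` is used here purely for illustration rather than pip's own option handling.

```python
import argparse
from typing import List, Optional


def parse_forced_ttl(argv: List[str]) -> Optional[int]:
    """Parse the hypothetical --force-index-cache-ttl option.

    Returns the TTL in seconds, or None when the option is absent.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        "--force-index-cache-ttl",
        type=int,
        default=None,
        metavar="SECONDS",
    )
    # Ignore every other argument; only the proposed flag matters here.
    args, _unknown = parser.parse_known_args(argv)
    return args.force_index_cache_ttl


if __name__ == "__main__":
    print(parse_forced_ttl(["--no-index", "--force-index-cache-ttl=8"]))
```

The parsed value would then be handed to whatever code decides the cache lifetime for the index page.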