
Implement prioritized certificate fetching #391

Merged: 4 commits into istio:master, Feb 16, 2023
Conversation

@qfel (Contributor) commented on Feb 13, 2023

Provides the interface requested in #334.

This allows every certificate request to be assigned a priority. The SecretManager will issue a configurable number of concurrent requests, ordered by priority and then request time (priority always wins). In particular, a high-priority call inserted later can jump ahead of lower-priority calls already in the queue.

Certificate refreshes are queued at the lowest (Background) priority.
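
To make the ordering concrete, here is a minimal sketch; the names Priority and PendingRequest, and the exact priority levels, are illustrative rather than the PR's actual identifiers:

```rust
use std::cmp::Ordering;
use std::time::Instant;

// Illustrative priority levels; refreshes would use the lowest one.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    Background, // certificate refreshes
    Warmup,
    RealTime,
}

// A queued request: priority always wins, ties are broken by request time.
struct PendingRequest {
    priority: Priority,
    requested_at: Instant,
}

impl PendingRequest {
    // True if `self` should be dequeued before `other`.
    fn before(&self, other: &Self) -> bool {
        match self.priority.cmp(&other.priority) {
            Ordering::Equal => self.requested_at < other.requested_at,
            ord => ord == Ordering::Greater, // higher priority first
        }
    }
}
```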

The new functionality is not used yet - it is just available as a new method in SecretManager (and its implementation is used to back the existing fetch_certificates call). For code simplicity, SecretManager was changed to have no parallelism (while still allowing configurable concurrency). The code isn't computationally heavy enough to require multiple cores, and most of the time should be spent waiting on certificate fetches. This in particular lets us reduce synchronization and data-ownership issues.
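
As a rough illustration of that single-task design, here is a sketch under assumed names (MAX_CONCURRENT_FETCHES, Request, and fetch_certificate are hypothetical; it uses the futures crate's FuturesUnordered to track in-flight fetches):

```rust
use futures::stream::{FuturesUnordered, StreamExt};
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use tokio::sync::mpsc;

const MAX_CONCURRENT_FETCHES: usize = 8; // hypothetical limit

// Max-heap key: higher priority first; among equals, earlier sequence first.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Request(u8, Reverse<u64>);

async fn fetch_certificate(_req: Request) {
    // The actual CA call would go here.
}

async fn run(mut requests: mpsc::Receiver<Request>) {
    let mut queue = BinaryHeap::new();
    let mut workers = FuturesUnordered::new();
    loop {
        tokio::select! {
            Some(req) = requests.recv() => queue.push(req),
            Some(()) = workers.next() => {} // a fetch finished
            else => break, // channel closed and no fetches in flight
        }
        // Top up to the concurrency limit from the head of the queue.
        while workers.len() < MAX_CONCURRENT_FETCHES {
            match queue.pop() {
                Some(req) => workers.push(fetch_certificate(req)),
                None => break,
            }
        }
    }
}
```

A single task owns both the queue and the in-flight set, so no locks are needed around them; concurrency comes from the multiple futures polled inside that one task.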

Unit tests proved tricky - they rely on tokio's time control facilities. Those are pretty hard to get right and pretty hard to debug. Ideally tokio would evolve to at least make this type of testing more debuggable (e.g. panic on auto-advance for specific blocks of code). We may decide maintaining those tests is too much of a burden, but at least for the moment they let us test the code without having to add test-only instrumentation.
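
For readers unfamiliar with those facilities, here is a minimal example of tokio's paused-clock testing (not a test from this PR; it uses the futures crate's poll! macro):

```rust
use std::time::Duration;

#[tokio::test(start_paused = true)]
async fn deadline_fires_only_after_advance() {
    // With the clock paused, timers resolve only when time is advanced
    // manually (or auto-advanced when the runtime would otherwise idle).
    let deadline = tokio::time::sleep(Duration::from_secs(3600));
    tokio::pin!(deadline);

    // One second short of the deadline: still pending.
    tokio::time::advance(Duration::from_secs(3599)).await;
    assert!(futures::poll!(deadline.as_mut()).is_pending());

    // Crossing the deadline wakes the timer.
    tokio::time::advance(Duration::from_secs(1)).await;
    assert!(futures::poll!(deadline.as_mut()).is_ready());
}
```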

istio-testing added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) on Feb 13, 2023
@briansonnenberg (Contributor) left a comment

Looks great overall. Thank you for cleaning up my novice Rust code as well. :)

```rust
        None => break 'main,
    },
    Some(res) = workers.next() => match res {
        Err(_) => break 'main,
```
briansonnenberg (Contributor) commented:

Does breaking out of 'main here mean that if a particular worker request fails, even for a non-critical reason, the client will no longer process requests? Would it make sense to avoid that with some kind of backoff/retry or error handling?

@qfel (Author) replied:

Workers are refresh(...) calls, which can only fail with watch::error::SendError, and that cannot really happen because the recv end of the channel is kept alive via self. I figured there was no reason to panic.

Though I just realized that the other break cannot happen either, because the send end of requests is also referenced via self. So this code leaks the background worker, which AFAIR is not a regression; if you want, I can submit as-is and fix it later.
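
For context, tokio's watch::Sender::send fails (with watch::error::SendError) only once every receiver has been dropped; a minimal demonstration of that failure mode:

```rust
use tokio::sync::watch;

fn main() {
    let (tx, rx) = watch::channel(0u32);
    assert!(tx.send(1).is_ok());  // a receiver is alive: send succeeds
    drop(rx);                     // drop the last receiver
    assert!(tx.send(2).is_err()); // now send fails with SendError
}
```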

@qfel (Author) left a comment

Cannot reproduce test failures with cargo clippy --benches --tests --bins. What am I missing?

BTW, you probably want to add the --no-deps flag in the test.


```rust
#[derive(Clone)]
pub struct CaClient {
    cfg: ClientConfig,
    state: Arc<RwLock<ClientState>>,
```
Member commented:

What's the benefit of having a state struct holding the fetches field instead of defining it here with type Arc<RwLock<Vec<Identity>>>?

@qfel (Author) replied:

At some point I was thinking of putting more fields there, but then abandoned the idea. I can replace ClientState with a plain Vec if anybody cares.

Member replied:

I don't mind it like this, plus it's mock code, so I'm not too concerned.
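
For concreteness, the two shapes being compared look roughly like this (a sketch with placeholder types; the tokio RwLock flavor is an assumption):

```rust
use std::sync::Arc;
use tokio::sync::RwLock;

struct Identity;     // placeholder for the real type
struct ClientConfig; // placeholder for the real type

// As written in the PR: a named state struct, leaving room for more fields.
struct ClientState {
    fetches: Vec<Identity>,
}

struct CaClient {
    cfg: ClientConfig,
    state: Arc<RwLock<ClientState>>,
}

// The flattened alternative from the question:
// state: Arc<RwLock<Vec<Identity>>>
```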

```rust
impl Converter {
    pub fn new() -> Self {
        Self {
            sys_now: SystemTime::now(),
```
Member commented:

Isn't this pinning the now time to the moment the struct instance was created? Won't any function call (even a delayed one) refer to that time instead of the current time?

@qfel (Author) replied:

The now and sys_now values can be arbitrary; they only have to represent the ~same moment in time so that we can convert. One of them is always subtracted and the other added, so they cancel each other out.
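
A sketch of that cancellation argument (field names follow the snippet above; the conversion methods are illustrative, not the PR's code):

```rust
use std::time::{Duration, Instant, SystemTime};

struct Converter {
    now: Instant,
    sys_now: SystemTime,
}

impl Converter {
    fn new() -> Self {
        // Captured once, at (approximately) the same moment.
        Self { now: Instant::now(), sys_now: SystemTime::now() }
    }

    // t = now + d  maps to  sys_now + d: the capture time cancels out.
    fn instant_to_system(&self, t: Instant) -> SystemTime {
        self.sys_now + t.saturating_duration_since(self.now)
    }

    // sys = sys_now + d  maps to  now + d (times before sys_now clamp to now).
    fn system_to_instant(&self, t: SystemTime) -> Instant {
        self.now + t.duration_since(self.sys_now).unwrap_or(Duration::ZERO)
    }
}
```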


This uses a single task for all certificates for simplicity. As the actual work should be mostly IO-bound, I don't expect the lack of true parallelism to be an issue.
@qfel (Author) commented on Feb 15, 2023

/retest

Fixed background worker termination, cleaned up code a bit (mostly naming, very little structure changed).

@qfel (Author) commented on Feb 16, 2023

/retest

@qfel (Author) commented on Feb 16, 2023

/retest

@qfel (Author) commented on Feb 16, 2023

/retest

istio-testing merged commit a849147 into istio:master on Feb 16, 2023