Implement prioritized certificate fetching #391
Conversation
Looks great overall. Thank you for cleaning up my novice Rust code as well. :)
None => break 'main,
},
Some(res) = workers.next() => match res {
    Err(_) => break 'main,
Does breaking out of main here mean that if a particular worker request fails, even for non-critical reasons, the client will no longer process requests? Does it make sense to avoid that and have some kind of backoff retry / error handling?
Workers are refresh(...) calls, which can return only watch::error::SendError, which cannot really happen because the recv end of the channel is kept alive via self. Figured no reason to panic.
Though I just realized that the other break cannot happen because the send end of requests is also referenced via self. So this code leaks the background worker - which AFAIR is not a regression, if you want I can submit as-is and fix it later.
Cannot reproduce test failures with `cargo clippy --benches --tests --bins`. What am I missing?
BTW you probably want to add the `--no-deps` flag in the test.
#[derive(Clone)]
pub struct CaClient {
    cfg: ClientConfig,
    state: Arc<RwLock<ClientState>>,
What's the benefit of having a state struct holding the `fetches` field instead of defining it here with type `Arc<RwLock<Vec<Identity>>>`?
At some point I was thinking of putting some more fields there but then abandoned the idea. Can replace ClientState with Vec if anybody cares.
I don't mind it like this + it's mock code, so not too concerned.
impl Converter {
    pub fn new() -> Self {
        Self {
            sys_now: SystemTime::now(),
Isn't this pinning the now time to the moment the struct instance was created? Won't any function call (even a delayed one) refer to that time instead of the current time?
The now and sys_now values can be arbitrary; they only have to represent the ~same moment in time so that we can convert. One of them is always subtracted and the other added, so they cancel each other out.
This uses a single task for all certificates for simplicity. As the actual work should be mostly IO-bound, I don't expect the lack of true parallelism to be an issue.
/retest Fixed background worker termination, cleaned up code a bit (mostly naming, very little structure changed).
Provides the interface requested in #334.
This allows all certificate requests to be assigned a priority. The SecretManager will issue a configurable number of concurrent requests in the order of priority and request time (priority always wins). In particular, high-priority calls inserted later can skip lower-priority calls in the queue.
Certificate refreshes are queued at the lowest (Background) priority.
The new functionality is not used yet - it is just available as a new method on SecretManager (and its implementation is used to back the existing fetch_certificates call). For code simplicity, SecretManager was changed to have no parallelism (but it allows configurable concurrency). The code isn't computationally heavy enough to require multiple cores, and most of the time should be spent waiting on certificate fetches. This in particular lets us reduce synchronization and data-ownership issues.
Unit tests proved tricky - they rely on tokio's time-control facilities, which are pretty hard to get right and pretty hard to debug. Ideally tokio would evolve to at least make this type of testing more debuggable (e.g. panic on auto-advance for specific blocks of code). We may decide maintaining those tests is too much of a burden, but at least for the moment they let us test the code without having to add test-only instrumentation.