-
Notifications
You must be signed in to change notification settings - Fork 241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redo the chronos bounce code to be easier to read and actually clean up old jobs #7
Conversation
max_expected=1, | ||
) | ||
matching_jobs = chronos_tools.lookup_chronos_jobs( | ||
all_existing_jobs = chronos_tools.lookup_chronos_jobs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you instead use chronos_tools.match_job_names_to_service_instance
here so we don't have this regex stuff?
Sorry, I should've raised this in the first review. Can you explain the decision of doing the cleanup here vs doing it in cleanup_chronos_jobs? My first thoughts are that it belongs in cleanup_chronos_jobs given that it was written to do exactly this - remove things from chronos that we don't want in there. Adding this complexity to every bounce seems odd. You've probably thought about this though, so curious to hear what you think? |
client.update(job) | ||
|
||
|
||
def delete_job(client, job): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this, disable_job
and create_job
be in chronos_tools?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that
Yes, remember that cleanup_chronos_jobs is analogous to cleanup_marathon_jobs, it cleans up jobs for services that no longer exist, that don't have the help of of any other script. But that isn't enough for marathon because we can't really leave the old version of service left behind while the new one is running, the old one needs to be deleted and drained "in band", so that is why the marathon bounce cleans up apps. Chronos is the same thing. We need to clean up the jobs asap, to prevent them from potentially overlapping, or in the future, make sure the old job is cleaned up before the new one is in place (upthendown?). In general we need the same kind of "in-band" hook to be able to clean up jobs at the right moment, and not wait for a separate cleanup script. We still need cleanup_chronos_jobs, to clean up the "rouge" jobs that are left behind from de-provisioned services, or just some goofball (me) manually adding bogus jobs to the web interface. |
shipit |
I'll ship this tomorrow |
🚢 |
This is a followup on https://reviewboard.yelpcorp.com/r/120781/
This is becoming more relevant as the list of old disabled jobs continues to build on our chronos installations.