Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redo the chronos bounce code to be easier to read and actually clean up old jobs #7

Merged
merged 5 commits into from
Nov 2, 2015

Conversation

solarkennedy
Copy link
Contributor

This is a followup on https://reviewboard.yelpcorp.com/r/120781/

This is becoming more relevant as the list of old disabled jobs continues to build on our chronos installations.

max_expected=1,
)
matching_jobs = chronos_tools.lookup_chronos_jobs(
all_existing_jobs = chronos_tools.lookup_chronos_jobs(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you instead use chronos_tools.match_job_names_to_service_instance here so we don't have this regex stuff?

@Rob-Johnson
Copy link
Contributor

Sorry, I should've raised this in the first review.

Can you explain the decision of doing the cleanup here vs doing it in cleanup_chronos_jobs? My first thoughts are that it belongs in cleanup_chronos_jobs given that it was written to do exactly this - remove things from chronos that we don't want in there. Adding this complexity to every bounce seems odd. You've probably thought about this though, so curious to hear what you think?

client.update(job)


def delete_job(client, job):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this, disable_job and create_job be in chronos_tools?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that

@solarkennedy
Copy link
Contributor Author

Yes, remember that cleanup_chronos_jobs is analogous to cleanup_marathon_jobs, it cleans up jobs for services that no longer exist, that don't have the help of of any other script.

But that isn't enough for marathon because we can't really leave the old version of service left behind while the new one is running, the old one needs to be deleted and drained "in band", so that is why the marathon bounce cleans up apps.

Chronos is the same thing. We need to clean up the jobs asap, to prevent them from potentially overlapping, or in the future, make sure the old job is cleaned up before the new one is in place (upthendown?). In general we need the same kind of "in-band" hook to be able to clean up jobs at the right moment, and not wait for a separate cleanup script.

We still need cleanup_chronos_jobs, to clean up the "rouge" jobs that are left behind from de-provisioned services, or just some goofball (me) manually adding bogus jobs to the web interface.

@mrtyler
Copy link
Contributor

mrtyler commented Oct 29, 2015

shipit

@solarkennedy
Copy link
Contributor Author

I'll ship this tomorrow

@Rob-Johnson
Copy link
Contributor

🚢

@solarkennedy solarkennedy merged commit 343babe into master Nov 2, 2015
@solarkennedy solarkennedy self-assigned this Nov 3, 2015
@asottile asottile deleted the PAASTA-1220 branch May 31, 2016 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants