Added experimental manual mesos task reconciliation script (orphan killer) #238

Merged
merged 2 commits into from
Feb 5, 2016

Conversation

solarkennedy

This is an experimental script that looks at the slave state and kills orphaned containers.

The new Mesos upgrade left a bunch of them behind. Some of the bug reports suggest blowing away the metadata when you do upgrades... In the future, after we do an upgrade, we can run this script in dry-run mode to see whether any were left behind.

Comments welcome, but I want to ship this before the weekend so our boxes don't continue to have really full disks, bogus orphaned tasks running everywhere, and clusters out of resources.

Longer term, maybe we can run this as a monitoring check, returning 0 if there are no bogus containers and 1 if there are some.
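
A rough sketch of what such a check could look like, assuming the container task ids and the running task ids have already been collected. The function names and their inputs are hypothetical and not part of this PR, and the sketch uses Python 3 syntax rather than the Python 2 style of the script:

```python
def find_orphans(container_task_ids, running_task_ids):
    """Return the task ids of containers that Mesos does not report as running."""
    running = set(running_task_ids)  # set gives O(1) average membership tests
    return sorted(t for t in container_task_ids if t not in running)


def check_exit_code(orphans):
    """Map the result to the exit codes described above: 0 = clean, 1 = orphans found."""
    return 1 if orphans else 0
```

A wrapper script would then print the orphans and call `sys.exit(check_exit_code(orphans))` so the monitoring system can alert on a non-zero exit.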

@solarkennedy solarkennedy self-assigned this Feb 5, 2016
@solarkennedy solarkennedy changed the title Mesos reconcilation Added experimental manual mesos task reconciliation script (orphan killer) Feb 5, 2016
        print "Killing %s. (%s)" % (container["Names"][0], mesos_task_id)
        docker_client.kill(container)
    else:
        print "Would kill %s. (%s)" % (container["Names"][0], mesos_task_id)

Nitpick: Could you hint at using the --force option in this message?
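
One way the dry-run message could carry that hint, as a minimal sketch. It assumes `--force` is a boolean flag and that the script is a dry run without it; the helper names and the exact wording are hypothetical, and the code is written in Python 3 syntax:

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Kill orphaned containers (sketch).")
    # Assumption: without --force the script only reports what it would do.
    parser.add_argument("--force", action="store_true",
                        help="Actually kill containers instead of only reporting them.")
    return parser.parse_args(argv)


def describe_action(name, mesos_task_id, force):
    """Build the per-container message, pointing at --force in dry-run mode."""
    if force:
        return "Killing %s. (%s)" % (name, mesos_task_id)
    return ("Would kill %s. (%s) Re-run with --force to actually kill it."
            % (name, mesos_task_id))
```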

nhandler commented Feb 5, 2016

+1 for adding the monitoring check. That would have been very helpful in catching/fixing this issue.

nhandler commented Feb 5, 2016

Ship It!

solarkennedy added a commit that referenced this pull request Feb 5, 2016
Added experimental manual mesos task reconciliation script (orphan killer)
@solarkennedy solarkennedy merged commit 1caf534 into master Feb 5, 2016
@solarkennedy solarkennedy deleted the mesos_reconcilation branch February 5, 2016 21:36
    frameworks = state.get('frameworks')
    executors = [ex for fw in frameworks for ex in fw.get('executors', [])
                 if u'TASK_RUNNING' in [t[u'state'] for t in ex.get('tasks', [])]]
    return [e["id"] for e in executors]

Return a set here so that this runs in linear time instead of quadratic.

Line 50 checks, for every container, whether its container id is in the running task ids. Checking membership in a list is O(n), while checking membership in a set is O(1) on average.
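
A sketch of that change, assuming the state layout visible in the excerpt (frameworks containing executors containing tasks with a `state` field). The function names are hypothetical because the excerpt does not show the original `def` line, and the code uses Python 3 syntax:

```python
def running_executor_ids(state):
    """Return the ids of executors with at least one TASK_RUNNING task, as a set."""
    frameworks = state.get("frameworks", [])
    return {
        ex["id"]
        for fw in frameworks
        for ex in fw.get("executors", [])
        if any(t["state"] == "TASK_RUNNING" for t in ex.get("tasks", []))
    }


def orphaned_ids(container_task_ids, state):
    """Return the container task ids that are not among the running executors."""
    running = running_executor_ids(state)  # built once, in linear time
    # Each `in` test on a set is O(1) on average, so the loop is linear overall.
    return [c for c in container_task_ids if c not in running]
```

With a list, each `in` test scans the whole list, so n containers against n running ids cost on the order of n² comparisons; with a set the same loop does roughly n hash lookups.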
