
initial draft of minesweeper daemonset #1778

Merged
merged 5 commits into from
Jan 27, 2021
Merged

Conversation

minrk
Member

@minrk minrk commented Jan 18, 2021

  • daemonset for one instance per node
  • config and source for minesweeper are encrypted so it's not so trivial to see what we do here (also harder to review changes!)
  • produces reports on suspicious pods, terminates subset based on different criteria
  • daemonset runs as singleuser-server uid, root gid with limited capabilities due to hostPID access
  • chartpress images are now published to docker hub instead of private registry (same as z2jh), since we now have an image that all federation members use

Main things to consider:

  • how secret should this stuff be? As it works now, making it public would also make it ~trivial to circumvent if folks know where to look. Public review is no longer possible, though, as folks with git-crypt access need to check out the files locally and review them by hand. Currently the inspection of processes/pods is encrypted, but the scaffolding and image are publicly reviewable
  • image for this service is custom but not built in this repo by chartpress. As it is, it has nothing in it other than the kubernetes Python client package and ps. But we cannot use chartpress for images right now (I think) because non-GKE projects can't pull images from GKE. chartpress images are now published to dockerhub

allows secrets in images or the chart itself
@betatim
Member

betatim commented Jan 19, 2021

import threading

import kubernetes.client

local = threading.local()

def get_kube():
    """Get thread-local kubernetes client

    kubernetes client objects aren't threadsafe, I guess
    """
    if not hasattr(local, "kube"):
        local.kube = kubernetes.client.CoreV1Api()
    return local.kube

Do you understand more about this? Wondering if we should be doing this for BinderHub as well.
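As background, a self-contained sketch of how that thread-local pattern behaves under a thread pool (the stand-in object and pool size are illustrative, not taken from minesweeper):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

local = threading.local()

def get_client():
    # stand-in for kubernetes.client.CoreV1Api(); each thread lazily
    # constructs and caches its own instance in thread-local storage
    if not hasattr(local, "client"):
        local.client = object()
    return local.client

def worker(_):
    # every call in the same thread returns the same cached object
    return id(get_client())

with ThreadPoolExecutor(max_workers=4) as pool:
    ids = set(pool.map(worker, range(100)))

# at most one client per worker thread, not one per request
assert 1 <= len(ids) <= 4
```

The point of the pattern is that no client instance is ever shared across threads, sidestepping any thread-safety issues inside the client itself.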

Could we hide things like the patterns we look for in a config file in secrets/ and keep the rest of the script in a public place? Maybe include the source of suspicious_cmd() in what we keep secret. To me the rest is maybe not obvious, but also not mega useful for getting around the detection. WDYT?
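A minimal sketch of that split, assuming a hypothetical secrets/patterns.json holding the regexes while the matching scaffolding stays public (all names and the example pattern here are illustrative):

```python
import json
import re

def load_patterns(path="secrets/patterns.json"):
    """Load secret regexes from an encrypted config file (path is hypothetical)."""
    with open(path) as f:
        return [re.compile(p) for p in json.load(f)["suspicious"]]

def suspicious_cmd(cmd, patterns):
    """Public scaffolding: flag a command line if any secret pattern matches."""
    return any(p.search(cmd) for p in patterns)

# stand-in for load_patterns(); the real patterns would stay in secrets/
patterns = [re.compile("evil-binary")]
assert suspicious_cmd("run evil-binary --fast", patterns)
assert not suspicious_cmd("jupyter notebook", patterns)
```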

@minrk minrk force-pushed the minesweeper branch 2 times, most recently from a03e2c5 to 33b9b38 Compare January 19, 2021 08:52
@minrk
Member Author

minrk commented Jan 19, 2021

Do you understand more about this? Wondering if we should be doing this for BinderHub as well.

It appears to be in the 'handshake' itself. I don't understand enough about the kubernetes API to be sure what's going on here, but it surprised me a bit to see this issue here (and it was very reliable) but not in binderhub ~ever. Maybe it has to do with a first-run request for a token or something? Or maybe it's specific to the streaming websocket requests involved here.

It may be appropriate to do it in binderhub, but probably fine to wait-and-see. Could come up on a kubernetes version bump, perhaps.

Could we hide things like the patterns we look like in a config file in a config in secrets/ and keep the rest of the script in a public place?

Yes, we could definitely do that; however, just knowing what information we look at is enough to make a pretty good guess. I'll have a look.

produces reports on suspicious pods

terminates a subset of processes based on stricter criteria
instead of parsing ps output ourselves

makes extending/adding fields much easier
@minrk
Member Author

minrk commented Jan 19, 2021

@betatim I moved the inspection implementations into an encrypted minesweeper/secrets/herorat.py, which is much smaller. The rest of the logic: lookup all processes, report on suspicious ones, terminate those marked for deletion is now in a more readily reviewable minesweeper.py, while only the inspection/flagging is in the black box.

There are two flags to come out of inspection:

  • "suspicious" indicates that there should be a report, and
  • "should_terminate" indicates that it should be immediately terminated (suspicious is always True if should_terminate)

Both pods and processes are inspected and can be flagged for suspicion.
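A minimal sketch of that contract (the class name and structure are illustrative; only the two flags and their invariant come from the description above):

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    """Result of inspecting a pod or process."""
    suspicious: bool = False        # produce a report
    should_terminate: bool = False  # terminate immediately

    def __post_init__(self):
        # invariant: should_terminate implies suspicious
        if self.should_terminate:
            self.suspicious = True

assert Verdict(should_terminate=True).suspicious
assert not Verdict().suspicious
```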

The image being used is currently defined here: https://github.com/minrk/overwatch . The only requirements (so far): python, psutil, kubernetes.
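Since psutil is one of those requirements, the "enumerate all processes" step presumably looks something like this sketch (the attribute selection is illustrative, not copied from minesweeper):

```python
import psutil

def list_procs():
    """Enumerate processes with psutil instead of parsing `ps` output.

    psutil returns the requested attrs as a dict on each
    process's .info attribute, so adding fields is trivial.
    """
    return [
        p.info
        for p in psutil.process_iter(attrs=["pid", "name", "cmdline"])
    ]

procs = list_procs()
assert all("pid" in info for info in procs)
```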

run as singleuser-pod uid for access to certain attributes
and root group for others

drop all capabilities by default, since we don't want to be able to e.g. kill other processes
@minrk minrk requested a review from betatim January 22, 2021 11:17
@minrk
Member Author

minrk commented Jan 22, 2021

@betatim I believe I've implemented the changes discussed at the meeting:

  • minesweeper runs as singleuser uid instead of root to limit access to other processes
  • privileges are very limited with capabilities: drop: all
  • all images, including the new minesweeper image, are published to the jupyterhub org on docker hub, with the same permissions and mechanism as z2jh (I have already added the secrets to the repo, so it should work when merged)
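For illustration, a hedged sketch of what such a pod spec fragment could look like in the daemonset (the uid and values are illustrative, not copied from the chart):

```yaml
# DaemonSet pod spec fragment (illustrative values)
spec:
  hostPID: true             # see processes from every pod on the node
  containers:
    - name: minesweeper
      securityContext:
        runAsUser: 1000     # the singleuser-server uid (illustrative)
        runAsGroup: 0       # root gid, for access to some /proc attributes
        capabilities:
          drop: ["ALL"]     # no extra privileges, e.g. cannot signal arbitrary processes
```

Dropping all capabilities means terminations have to go through the kubernetes API rather than direct signals, which matches the design described above.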

make them public images

now that all federation members use at least one image (minesweeper),
publish in a public place like z2jh instead of the private gcp registry.

DOCKER_USERNAME, DOCKER_PASSWORD for jupyterhubbot have been added to secrets,
the same one we use for z2jh.
name:
tag:
name: jupyterhub/mybinder.org-analytics-publisher
tag: 'set-by-chartpress'
Member Author


These changes caused by chartpress --reset
