updating sysctls for 3.11
kalexand-rh committed Aug 6, 2018
1 parent ba06a71 commit a8a0ad8
Showing 1 changed file with 63 additions and 44 deletions.
107 changes: 63 additions & 44 deletions admin_guide/sysctls.adoc
@@ -14,18 +14,18 @@ toc::[]

Sysctl settings are exposed via Kubernetes, allowing users to modify certain
kernel parameters at runtime for namespaces within a container. Only sysctls
that are namespaced can be set independently on pods; if a sysctl is not
namespaced (called _node-level_), it cannot be set within {product-title}.
Moreover, only those sysctls considered _safe_ are whitelisted by default; other
_unsafe_ sysctls can be manually enabled on the node to be available to the
that are namespaced can be set independently on pods. If a sysctl is not
namespaced, called _node-level_, it cannot be set within {product-title}.
Moreover, only those sysctls considered _safe_ are whitelisted by default; you
can manually enable other _unsafe_ sysctls on the node to be available to the
user.

[[undersatnding-sysctls]]
== Understanding Sysctls
== Understanding sysctls

In Linux, the sysctl interface allows an administrator to modify kernel
parameters at runtime. Parameters are available via the *_/proc/sys/_* virtual
process file system. The parameters cover various subsystems such as:
process file system. The parameters cover various subsystems, such as:

- kernel (common prefix: *_kernel._*)
- networking (common prefix: *_net._*)
@@ -40,10 +40,10 @@ $ sudo sysctl -a
----

[[namespaced-vs-node-level-sysctls]]
== Namespaced Versus Node-Level Sysctls
== Namespaced versus node-level sysctls

A number of sysctls are _namespaced_ in today’s Linux kernels. This means that
they can be set independently for each pod on a node. Being namespaced is a
you can set them independently for each pod on a node. Being namespaced is a
requirement for sysctls to be accessible in a pod context within Kubernetes.

The following sysctls are known to be namespaced:
@@ -56,63 +56,64 @@ The following sysctls are known to be namespaced:

Sysctls that are not namespaced are called _node-level_ and must be set
manually by the cluster administrator, either by means of the underlying Linux
distribution of the nodes (e.g., via *_/etc/sysctls.conf_*) or using a DaemonSet
with privileged containers.
distribution of the nodes, such as by modifying the *_/etc/sysctl.conf_* file,
or by using a DaemonSet with privileged containers.
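
As a rough illustration of the DaemonSet approach, the following sketch runs one
privileged container on every node and writes a single node-level sysctl. It is
not taken from this commit: the object name, the `busybox` image, and the choice
of `vm.max_map_count` with the value `262144` are all assumptions made for the
example. The service account that runs these pods would also need permission to
start privileged containers.

[source,yaml]
----
# Hypothetical sketch: names, image, and sysctl value are illustrative only.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-sysctl-example
spec:
  selector:
    matchLabels:
      app: node-sysctl-example
  template:
    metadata:
      labels:
        app: node-sysctl-example
    spec:
      containers:
      - name: set-sysctl
        image: busybox
        securityContext:
          # Privileged mode makes /proc/sys writable inside the container.
          privileged: true
        command:
        - sh
        - -c
        - |
          # vm.* sysctls are not namespaced, so this changes the node itself.
          sysctl -w vm.max_map_count=262144
          # Keep the container running so the pod is not restarted in a loop.
          while true; do sleep 3600; done
----

A setting applied this way lasts only until the node reboots or another process
overwrites it, which is one reason to prefer the distribution's own sysctl
configuration when that is available.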

[NOTE]
====
Consider marking nodes with special sysctls as tainted. Only schedule pods onto
them that need those sysctl settings. Use the
link:http://kubernetes.io/docs/user-guide/kubectl/kubectl_taint/[Kubernetes _taints and toleration_ feature] to implement this.
xref:../admin_guide/scheduling/taints_tolerations.adoc#admin-guide-taints[taints
and toleration feature] to mark the nodes.
====
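
As a sketch of how a pod might opt in to such tainted nodes, the following
example adds a toleration and a node selector to the pod specification. The
taint key and node label `special-sysctls`, the value `true`, the pod name, and
the image are hypothetical. The sketch assumes that the administrator has
already applied a matching taint and label to the nodes.

[source,yaml]
----
# Hypothetical sketch: the taint key, label, and names are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-toleration-example
spec:
  # The toleration allows scheduling onto nodes that carry the matching taint.
  tolerations:
  - key: "special-sysctls"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  # The node selector restricts the pod to nodes that carry the matching label.
  nodeSelector:
    special-sysctls: "true"
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
----

A toleration alone only permits the pod to land on tainted nodes. The node
selector is what keeps it off the other nodes.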

[[safe-vs-unsafe-sysclts]]
== Safe Versus Unsafe Sysctls
== Safe versus unsafe sysctls

Sysctls are grouped into _safe_ and _unsafe_ sysctls. In addition to proper
namespacing, a safe sysctl must be properly isolated between pods on the same
node. This means that setting a safe sysctl for one pod:
node. This means that setting a safe sysctl for one pod must not:

- must not have any influence on any other pod on the node,
- must not allow to harm the node's health, and
- must not allow to gain CPU or memory resources outside of the resource limits of
a pod.
- Influence any other pod on the node
- Harm the node's health
- Gain CPU or memory resources outside of the resource limits of a pod

Most namespaced sysctls are not necessarily considered safe.

For {product-title} 3.3.1, the following sysctls are supported (whitelisted) in
the safe set:
Currently, {product-title} supports, or whitelists, the following sysctls
in the safe set:

- *_kernel.shm_rmid_forced_*
- *_net.ipv4.ip_local_port_range_*
- *_net.ipv4.tcp_syncookies_*

This list will be extended in future versions when the kubelet supports better
This list might be extended in future versions when the kubelet supports better
isolation mechanisms.

All safe sysctls are enabled by default. All unsafe sysctls are disabled by
default and must be allowed manually by the cluster administrator on a per-node
basis. Pods with disabled unsafe sysctls will be scheduled, but will fail to
default, and the cluster administrator must manually enable them on a per-node
basis. Pods with disabled unsafe sysctls will be scheduled but will fail to
launch.

[[enabling-unsafe-sysctls]]
== Enabling unsafe sysctls

The cluster administrator can allow certain unsafe sysctls for very special
situations such as high-performance or real-time application tuning.

If you want to use unsafe sysctls, cluster administrators must enable them
individually on nodes. They can enable only namespaced sysctls.

[WARNING]
====
Due to their nature of being unsafe, the use of unsafe sysctls is
at-your-own-risk and can lead to severe problems like wrong behavior of
containers, resource shortage, or complete breakage of a node.
====

[[enabling-unsafe-sysctls]]
== Enabling Unsafe Sysctls

With the warning above in mind, the cluster administrator can allow certain
unsafe sysctls for very special situations, e.g., high-performance or real-time
application tuning.

If you want to use unsafe sysctls, cluster administrators must enable them
individually on nodes. Only namespaced sysctls can be enabled this way.

. Specify the unsafe sysctls to use as the value of the `kubeletArguments`\ parameter in the appropriate xref:../admin_guide/manage_nodes.adoc#modifying-nodes[node configuration map]
file, as described in xref:../admin_guide/manage_nodes.adoc#configuring-node-resources[Configuring Node Resources]:
. Use the `*kubeletArguments*` field in the *_/etc/origin/node/node-config.yaml_*
file, as described in
xref:../admin_guide/manage_nodes.adoc#configuring-node-resources[Configuring Node Resources], to set the desired unsafe sysctls:
+
----
kubeletArguments:
@@ -134,31 +135,49 @@ ifdef::openshift-origin[]
endif::[]

[[setting-sysctls-for-a-pod]]
== Setting Sysctls for a Pod
== Setting sysctls for a pod

Sysctls are set on pods using the pod's `securityContext`. The `securityContext`
applies to all containers in the same pod.

The following example uses the pod `securityContext` to set a safe sysctl
`kernel.shm_rmid_forced` and two unsafe sysctls, `net.ipv4.route.min_pmtu` and
`kernel.msgmax`. There is no distinction between _safe_ and _unsafe_ sysctls in
the specification.

Sysctls are set on pods using annotations. They apply to all containers in the
same pod.
[WARNING]
====
To avoid destabilizing your operating system, modify sysctl parameters only
after you understand their effects.
====

Here is an example, with different annotations for safe and unsafe sysctls:
Modify the YAML file that defines the pod and add the `securityContext` spec, as
shown in the following example:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
  annotations:
    security.alpha.kubernetes.io/sysctls: kernel.shm_rmid_forced=1
    security.alpha.kubernetes.io/unsafe-sysctls: net.ipv4.route.min_pmtu=1000,kernel.msgmax=1 2 3
spec:
  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "0"
    - name: net.ipv4.route.min_pmtu
      value: "552"
    - name: kernel.msgmax
      value: "65536"
...
----

[NOTE]
====
A pod with the unsafe sysctls specified above will fail to launch on any node
that has not enabled those two unsafe sysctls explicitly. As with node-level
sysctls, use the
link:http://kubernetes.io/docs/user-guide/kubectl/kubectl_taint[taints and
where the cluster administrator has not explicitly enabled those two unsafe
sysctls. As with node-level sysctls, use the
xref:../admin_guide/scheduling/taints_tolerations.adoc#admin-guide-taints[taints and
toleration feature] or
xref:../admin_guide/manage_nodes.adoc#updating-labels-on-nodes[labels on nodes]
to schedule those pods onto the right nodes.