This software is pre-production and should not be deployed to production servers.
To build the WCA PEX distribution file, you need:
- GNU make
- docker
To build the PEX file inside Docker (using the provided Dockerfile), run:
make wca_package_in_docker
The command creates the dist/wca.pex file.
Copy dist/wca.pex to /usr/bin/wca.pex.
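The copy step can also be scripted. The following is a minimal Python sketch, assuming the default paths used in this guide; install_pex is a hypothetical helper, not part of WCA, and writing to /usr/bin normally requires root.

```python
import os
import shutil
import stat


def install_pex(src="dist/wca.pex", dst="/usr/bin/wca.pex"):
    """Copy the built PEX file to its install location and make it executable.

    Hypothetical helper for illustration; paths default to those in this guide.
    """
    if not os.path.isfile(src):
        raise FileNotFoundError("build output not found: {}".format(src))
    shutil.copy2(src, dst)  # copies file content and permission bits
    mode = os.stat(dst).st_mode
    # Add execute permission for user, group and others (like `chmod a+x`).
    os.chmod(dst, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dst
```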
To build a distribution file with support for storing metrics in Apache Kafka, follow the "Building executable binary with KafkaStorage component enabled" guide.
To run WCA, you need:
- CentOS 7.6 with kernel 3.10.0-862 or later and resctrl filesystem support (WCA should work on earlier versions of CentOS or on other Linux distributions, but it is tested only on CentOS 7.6)
- Python 3.6.x
All other WCA dependencies are bundled using PEX.
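The kernel requirement can be checked before installation. This is a rough Python sketch, assuming the release string has the usual CentOS form (for example 3.10.0-957.el7.x86_64) and that resctrl support shows up in /proc/filesystems; the function names are hypothetical. Distribution kernels backport features, so treat the version comparison as a heuristic only.

```python
import re

# Minimum kernel from the requirements above, as (major, minor, patch, build).
MIN_KERNEL = (3, 10, 0, 862)


def kernel_tuple(release):
    """Turn a release string such as '3.10.0-957.el7.x86_64' into (3, 10, 0, 957)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: {!r}".format(release))
    # A missing build number (e.g. '4.18.0') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def kernel_is_supported(release, minimum=MIN_KERNEL):
    return kernel_tuple(release) >= minimum


def resctrl_available(proc_filesystems_text):
    """True if 'resctrl' is listed in the contents of /proc/filesystems."""
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "resctrl":
            return True
    return False
```

On a host, pass platform.release() to kernel_is_supported and the text of /proc/filesystems to resctrl_available.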
For RDT-related features:
- Hardware with Intel RDT support.
RDT features can be used on the Skylake processor family. However, the errata list the following known issues:
- SKZ4 MBM does not accurately track write bandwidth,
- SKZ17 CMT counters may not count accurately,
- SKZ18 CAT may not restrict cacheline allocation under certain conditions,
- SKZ19 MBM counters may undercount.
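Whether the CPU advertises the RDT features this guide relies on can be read from /proc/cpuinfo. The sketch below assumes the flag names the Linux kernel reports (cqm_occup_llc, cqm_mbm_total, cqm_mbm_local, cat_l3) and maps them to the rdt= options used in this guide; the mapping and helper names are illustrative assumptions. A present flag does not rule out the errata listed above.

```python
# Assumed mapping from rdt= boot options to the corresponding /proc/cpuinfo flags.
RDT_CPU_FLAGS = {
    "cmt": "cqm_occup_llc",
    "mbmtotal": "cqm_mbm_total",
    "mbmlocal": "cqm_mbm_local",
    "l3cat": "cat_l3",
}


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_rdt_features(cpuinfo_text):
    """List the rdt= options whose CPU flag is absent, sorted by name."""
    flags = cpu_flags(cpuinfo_text)
    return sorted(opt for opt, flag in RDT_CPU_FLAGS.items() if flag not in flags)
```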
To enable RDT, add the kernel boot-time parameter rdt=cmt,mbmtotal,mbmlocal,l3cat (see the kernel documentation).
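After a reboot, the active boot parameters are visible in /proc/cmdline. This minimal Python sketch checks that every option requested above is enabled; it assumes the kernel's rdt= syntax, where a leading "!" disables a feature, and the helper names are hypothetical.

```python
REQUIRED_RDT = {"cmt", "mbmtotal", "mbmlocal", "l3cat"}


def rdt_boot_options(cmdline):
    """Collect features enabled by rdt= parameters on a kernel command line."""
    enabled = set()
    for token in cmdline.split():
        if token.startswith("rdt="):
            for feature in token[len("rdt="):].split(","):
                # '!feature' turns a feature off, so it does not count as enabled.
                if feature and not feature.startswith("!"):
                    enabled.add(feature)
    return enabled


def missing_rdt_boot_options(cmdline, required=REQUIRED_RDT):
    """Return the required options not enabled on the command line, sorted."""
    return sorted(required - rdt_boot_options(cmdline))
```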
To install Python 3, run:
yum install python3 # CentOS 7.6
Then, verify that Python is installed correctly:
python3 --version
It should output:
Python 3.6.x
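The same check can be automated, for example in a provisioning script. A minimal sketch, assuming python3 is on PATH; the helper names are hypothetical. Both output streams are captured because older interpreters printed the version to stderr.

```python
import re
import subprocess


def parse_python_version(output):
    """Extract (major, minor, micro) from text such as 'Python 3.6.8'."""
    match = re.search(r"Python (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unexpected output: {!r}".format(output))
    return tuple(int(x) for x in match.groups())


def installed_python3_version(executable="python3"):
    """Run `<executable> --version` and return its version tuple."""
    result = subprocess.run([executable, "--version"],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True, check=True)
    return parse_python_version(result.stdout)
```

A host meets the requirement when installed_python3_version()[:2] == (3, 6).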
WCA processes should not be run with root privileges. The following privileges are needed to run WCA as a non-root user:
- CAP_DAC_OVERRIDE capability - to allow a non-root user to use the cgroups filesystem.
- CAP_SETUID capability and the SECBIT_NO_SETUID_FIXUP secure bit set - to allow a non-root user to use the resctrl filesystem.
- /proc/sys/kernel/perf_event_paranoid - the content of the file must be set to 0 or -1 to allow a non-root user to collect all the necessary perf event information.
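Two of these privileges can be inspected from a running process. The sketch below parses the CapEff bitmask from /proc/&lt;pid&gt;/status, using the capability numbers from linux/capability.h (CAP_DAC_OVERRIDE is 1, CAP_SETUID is 7), and validates the perf_event_paranoid value. The helper names are hypothetical, and the SECBIT_NO_SETUID_FIXUP secure bit is not covered (it is not reported in /proc/&lt;pid&gt;/status).

```python
CAP_DAC_OVERRIDE = 1  # capability numbers from <linux/capability.h>
CAP_SETUID = 7


def effective_caps(proc_status_text):
    """Parse the CapEff hexadecimal bitmask from the contents of /proc/<pid>/status."""
    for line in proc_status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("CapEff line not found")


def has_capability(mask, cap):
    """True if bit number `cap` is set in the capability bitmask."""
    return bool(mask & (1 << cap))


def perf_paranoid_ok(value_text):
    """True if /proc/sys/kernel/perf_event_paranoid holds 0 or -1."""
    return int(value_text.strip()) in (0, -1)
```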
If it is impossible or undesirable to run WCA with the privileges outlined above, you must add the -0 argument (or its long form, --root) when starting the process.
Assumptions:
- the /var/lib/wca directory exists,
- the wca user and group already exist.
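These assumptions can be verified before the service is enabled. A minimal sketch using the standard pwd and grp modules; check_wca_prerequisites is a hypothetical helper whose defaults mirror the names above.

```python
import grp
import os
import pwd


def check_wca_prerequisites(directory="/var/lib/wca", user="wca", group="wca"):
    """Return a list of problems; an empty list means everything is in place."""
    problems = []
    if not os.path.isdir(directory):
        problems.append("missing directory {}".format(directory))
    try:
        pwd.getpwnam(user)  # raises KeyError if the user does not exist
    except KeyError:
        problems.append("missing user {}".format(user))
    try:
        grp.getgrnam(group)  # raises KeyError if the group does not exist
    except KeyError:
        problems.append("missing group {}".format(group))
    return problems
```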
Use the following template for the systemd unit file /etc/systemd/system/wca.service:
[Unit]
Description=Workload Collocation Agent

[Service]
ExecStart=/usr/bin/scl enable rh-python36 '/usr/bin/wca.pex \
  --config /etc/wca/wca_config.yml \
  --register $EXTRA_COMPONENT \
  --log info'
User=wca
Group=wca
# CAP_DAC_OVERRIDE allows removing resctrl groups and CAP_SETUID allows changing the effective UID to add tasks to the groups
CapabilityBoundingSet=CAP_DAC_OVERRIDE CAP_SETUID
AmbientCapabilities=CAP_DAC_OVERRIDE CAP_SETUID
# We must avoid dropping capabilities after changing the effective UID from root to wca
SecureBits=no-setuid-fixup
Restart=always
RestartSec=5
LimitNOFILE=500000
WorkingDirectory=/var/lib/wca

[Install]
WantedBy=multi-user.target
where:
- the --register flag is needed only if an external plugin is used,
- $EXTRA_COMPONENT should be replaced with the name of a class, e.g. your_custom_module.allocators:YourCustomAllocator.
The class name must comply with the pkg_resources format.
All dependencies of the class must be available in the currently used PYTHONPATH.
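To illustrate what a "module:ClassName" string refers to, the sketch below resolves one with importlib, which is roughly how pkg_resources resolves an entry point. It is a simplified stand-in, not WCA's own registration code; load_component is a hypothetical name. It fails with ImportError if the module is not on the current PYTHONPATH, which is why the class's dependencies must be importable there.

```python
import importlib


def load_component(spec):
    """Resolve a 'package.module:ClassName' string to the object it names.

    Simplified stand-in for pkg_resources entry point resolution.
    """
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError("expected 'module:attribute', got {!r}".format(spec))
    obj = importlib.import_module(module_name)  # searched on sys.path / PYTHONPATH
    for attr in attr_path.split("."):  # allows nested attributes such as Outer.Inner
        obj = getattr(obj, attr)
    return obj
```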
You can use wca.allocators:NOPAllocator, which is already bundled in the dist/wca.pex file and does not have to be registered (if you decide to use it, remove the --register option from the wca.service file).
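When the unit file is generated by a provisioning tool, the optional --register argument can be added only when a component is given. A minimal sketch; wca_exec_start is a hypothetical helper whose defaults copy the template above. shlex.quote applies POSIX shell quoting, which matches how the command string is handed to a shell by scl; for simple arguments like these it also suits the outer quotes systemd sees.

```python
import shlex


def wca_exec_start(extra_component=None, config="/etc/wca/wca_config.yml",
                   log_level="info", pex="/usr/bin/wca.pex"):
    """Return the ExecStart value for wca.service, adding --register only if needed."""
    args = [pex, "--config", config]
    if extra_component:
        args += ["--register", extra_component]
    args += ["--log", log_level]
    inner = " ".join(shlex.quote(a) for a in args)
    # The whole WCA command line is passed as a single argument to `scl enable`.
    return "/usr/bin/scl enable rh-python36 {}".format(shlex.quote(inner))
```

For example, wca_exec_start() yields the NOPAllocator form without --register, while wca_exec_start("your_custom_module.allocators:YourCustomAllocator") includes it.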
Note: Running WCA with a dedicated "wca" user is more secure, but requires enabling perf counters for non-root users.
Reconfigure the perf_event_paranoid sysctl parameter like this:
sudo sysctl -w kernel.perf_event_paranoid=-1
or, to make the setting persistent, set the following in /etc/sysctl.conf:
kernel.perf_event_paranoid = -1
More about perf_event_paranoid here.
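For configuration management, the persistent setting can be applied idempotently. The sketch below rewrites sysctl.conf content in memory, replacing an existing active entry or appending one; set_sysctl is a hypothetical helper, and commented-out lines are left untouched. The result still has to be written back to /etc/sysctl.conf and loaded, e.g. with sudo sysctl -p.

```python
import re


def set_sysctl(conf_text, key="kernel.perf_event_paranoid", value="-1"):
    """Return sysctl.conf content with `key = value`, replacing an active entry."""
    line = "{} = {}".format(key, value)
    # Match an uncommented `key = ...` line; [ \t]* avoids spanning blank lines.
    pattern = re.compile(r"^[ \t]*{}[ \t]*=.*$".format(re.escape(key)), re.MULTILINE)
    if pattern.search(conf_text):
        return pattern.sub(lambda _match: line, conf_text)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + line + "\n"
```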
It is recommended to build a PEX file with the external component and its dependencies bundled. See the prm plugin from platform-resource-manager as an example of this approach.
The configuration file /etc/wca/wca_config.yml must exist. See the example configuration file to be used with NOPAllocator:
runner: !AllocationRunner
  config: !AllocationRunnerConfig
    node: !MesosNode
      mesos_agent_endpoint: 'http://127.0.0.1:5051'
      timeout: 5
    interval: 1.
    metrics_storage: !LogStorage
      output_filename: '/tmp/metrics_storage.log'
    extra_labels:
      env_id: "$HOST_IP"
    anomalies_storage: !LogStorage
      output_filename: '/tmp/anomalies_storage.log'
    allocator: !NOPAllocator
...
...
The following configuration is required to use the MesosNode component to discover new tasks:
- The Mesos containerizer (--containerizers=mesos) must be used.
- The Mesos agent must be configured to support the following isolators: filesystem/linux, docker/volume, docker/runtime, cgroups/cpu, cgroups/perf_event.
- The Mesos agent must expose the operator API over a secure socket. TLS in WCA can be disabled in the configuration by modifying the mesos_agent_endpoint property (e.g. using an http:// instead of an https:// URL).
- The Mesos agent may be configured to use a Docker registry to fetch images.
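These Mesos requirements can be checked against the agent's flag values. A minimal sketch, assuming the values of the Mesos agent flags --isolation and --containerizers are available as comma-separated strings and that the WCA endpoint is a plain URL; all helper names are hypothetical.

```python
from urllib.parse import urlparse

REQUIRED_ISOLATORS = {
    "filesystem/linux", "docker/volume", "docker/runtime",
    "cgroups/cpu", "cgroups/perf_event",
}


def missing_isolators(isolation_flag):
    """Compare a comma-separated --isolation value with the isolators WCA needs."""
    configured = {item.strip() for item in isolation_flag.split(",") if item.strip()}
    return sorted(REQUIRED_ISOLATORS - configured)


def uses_mesos_containerizer(containerizers_flag):
    """True if 'mesos' appears in a comma-separated --containerizers value."""
    return "mesos" in {c.strip() for c in containerizers_flag.split(",")}


def endpoint_uses_tls(endpoint):
    """True for https:// endpoints, False for plain http://."""
    scheme = urlparse(endpoint).scheme
    if scheme not in ("http", "https"):
        raise ValueError("unsupported scheme in {!r}".format(endpoint))
    return scheme == "https"
```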