Building, installing and running WCA

This software is pre-production and should not be deployed to production servers.

To build the WCA PEX distribution file, you need:

  • GNU make
  • docker

To build the PEX file inside Docker (using the provided Dockerfile), please run:

make wca_package_in_docker

The command creates the dist/wca.pex file.

The dist/wca.pex file must be copied to /usr/bin/wca.pex.

To build a distribution file with support for storing metrics in Apache Kafka, please follow the "Building executable binary with KafkaStorage component enabled" guide.

Requirements for running WCA:

  • CentOS 7.6 with kernel 3.10.0-862 or later and support for the resctrl filesystem (WCA should work on earlier versions of CentOS and on other Linux distributions, but it is tested only on CentOS 7.6)
  • Python 3.6.x

All other WCA dependencies are bundled using PEX.
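The kernel and resctrl requirements can be checked on a host before installing. A minimal sketch, assuming a CentOS/RHEL-style kernel release string (a numeric dotted version followed by a numeric build number after the dash); the function names are illustrative and not part of WCA:

```python
import platform


def parse_release(release):
    """Turn a kernel release such as '3.10.0-862.el7.x86_64' into a
    comparable tuple, here (3, 10, 0, 862)."""
    version, _, rest = release.partition("-")
    numbers = [int(part) for part in version.split(".")]
    # On CentOS/RHEL the first field after '-' is the distribution build
    # number (862 in '862.el7.x86_64'); a missing or non-numeric one counts as 0.
    build = rest.split(".")[0]
    numbers.append(int(build) if build.isdigit() else 0)
    return tuple(numbers)


def kernel_at_least(release, minimum="3.10.0-862"):
    """True if the release is not older than the minimum this guide requires."""
    return parse_release(release) >= parse_release(minimum)


def resctrl_supported(filesystems_text):
    """True if 'resctrl' is listed in the content of /proc/filesystems."""
    return any(line.split()[-1] == "resctrl"
               for line in filesystems_text.splitlines() if line.strip())


def check_host():
    """Check the running host; returns (kernel_ok, resctrl_ok)."""
    with open("/proc/filesystems") as f:
        filesystems = f.read()
    return kernel_at_least(platform.release()), resctrl_supported(filesystems)
```

On kernels from other distributions the build number does not carry the same meaning, so a negative kernel result there is better treated as a prompt for manual review than as a hard failure.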

For RDT-related features:

RDT features can be used on the Skylake family of processors. However, there are known issues described in the errata:

  • SKZ4 MBM does not accurately track write bandwidth,
  • SKZ17 CMT counters may not count accurately,
  • SKZ18 CAT may not restrict cacheline allocation under certain conditions,
  • SKZ19 MBM counters may undercount.

To enable RDT, please add the kernel boot-time parameter rdt=cmt,mbmtotal,mbmlocal,l3cat (kernel documentation).
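After rebooting, the active boot parameters can be read from /proc/cmdline. A minimal sketch that reports which of the RDT options listed above are not enabled; it assumes a single rdt= token and treats options prefixed with "!" (which the kernel uses to turn a feature off) as not enabled. The helper names are hypothetical:

```python
REQUIRED_RDT = {"cmt", "mbmtotal", "mbmlocal", "l3cat"}


def rdt_options(cmdline):
    """Return the set of options passed via rdt= on a kernel command line."""
    options = set()
    for token in cmdline.split():
        if token.startswith("rdt="):
            options.update(opt for opt in token[len("rdt="):].split(",") if opt)
    return options


def missing_rdt_options(cmdline, required=REQUIRED_RDT):
    """Required options that are absent (a '!opt' entry does not count)."""
    return required - rdt_options(cmdline)


def missing_on_this_host(path="/proc/cmdline"):
    with open(path) as f:
        return missing_rdt_options(f.read())
```

An empty result from missing_on_this_host() means all four options were passed at boot.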

To install Python 3 on CentOS 7.6, run:

yum install python3  # centos 7.6

Then, verify that Python is installed correctly:

python3 --version

It should output:

Python 3.6.x
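The same check can be scripted, for example in a provisioning step. A minimal sketch that runs python3 --version and confirms the 3.6 series; the helper names are illustrative. It avoids subprocess.run(capture_output=..., text=...) because those arguments only exist from Python 3.7:

```python
import re
import subprocess


def parse_python_version(output):
    """Extract (major, minor, micro) from e.g. 'Python 3.6.8'."""
    match = re.search(r"Python (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognised version string: %r" % output)
    return tuple(int(part) for part in match.groups())


def python3_is_36(output):
    return parse_python_version(output)[:2] == (3, 6)


def installed_python3_output():
    # Python 3.4+ prints the version to stdout; older interpreters used
    # stderr, so both streams are captured and joined.
    result = subprocess.run(["python3", "--version"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout + result.stderr
```

Usage: python3_is_36(installed_python3_output()).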

WCA processes should not be run with root privileges. The following privileges are needed to run WCA as a non-root user:

If it is impossible or undesirable to run WCA with the privileges outlined above, you must add the -0 argument (or its long form, --root) when starting the process.
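The systemd unit template in this guide grants CAP_DAC_OVERRIDE and CAP_SETUID to the wca user. A minimal sketch for confirming that a running process actually holds them, assuming the Linux /proc/&lt;pid&gt;/status format, where CapEff is a hexadecimal bitmask and the bit numbers come from linux/capability.h; the helper names are illustrative:

```python
CAP_DAC_OVERRIDE = 1  # bit numbers from <linux/capability.h>
CAP_SETUID = 7


def effective_caps(status_text):
    """Return the CapEff bitmask from the content of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("CapEff line not found")


def has_caps(status_text, *caps):
    """True if every listed capability bit is set in CapEff."""
    mask = effective_caps(status_text)
    return all(mask & (1 << cap) for cap in caps)


def process_status(pid):
    with open("/proc/%d/status" % pid) as f:
        return f.read()
```

For a WCA process started by systemd, has_caps(process_status(pid), CAP_DAC_OVERRIDE, CAP_SETUID) should be true.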

Assumptions:

  • /var/lib/wca directory exists
  • the wca user and group already exist
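These assumptions can be verified with the Python standard library before installing the unit file. A minimal sketch; the function name is hypothetical and the defaults mirror the user, group and directory named above:

```python
import grp
import os
import pwd


def missing_prerequisites(user="wca", group="wca", workdir="/var/lib/wca"):
    """Return a list of human-readable problems; empty if all are met."""
    problems = []
    if not os.path.isdir(workdir):
        problems.append("directory %s does not exist" % workdir)
    try:
        pwd.getpwnam(user)
    except KeyError:
        problems.append("user %s does not exist" % user)
    try:
        grp.getgrnam(group)
    except KeyError:
        problems.append("group %s does not exist" % group)
    return problems
```

Calling missing_prerequisites() with no arguments checks exactly the setup this guide assumes.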

Please use the following template as the systemd unit file /etc/systemd/system/wca.service:

[Unit]
Description=Workload Collocation Agent

[Service]
ExecStart=/usr/bin/scl enable rh-python36 '/usr/bin/wca.pex \
    --config /etc/wca/wca_config.yml \
    --register $EXTRA_COMPONENT \
    --log info'
User=wca
Group=wca
# CAP_DAC_OVERRIDE allows removing resctrl groups; CAP_SETUID allows changing the effective UID to add tasks to the groups
CapabilityBoundingSet=CAP_DAC_OVERRIDE CAP_SETUID
AmbientCapabilities=CAP_DAC_OVERRIDE CAP_SETUID
# We must avoid dropping capabilities after changing effective uid from root to wca
SecureBits=no-setuid-fixup
Restart=always
RestartSec=5
LimitNOFILE=500000
WorkingDirectory=/var/lib/wca

[Install]
WantedBy=multi-user.target

where:

The --register flag is needed only if an external plugin is used. $EXTRA_COMPONENT should be replaced with the name of a class, e.g. your_custom_module.allocators:YourCustomAllocator. The class name must comply with the pkg_resources format (module.path:ClassName). All dependencies of the class must be available in the currently used PYTHONPATH.

You can use wca.allocators:NOPAllocator, which is already bundled within the dist/wca.pex file and does not have to be registered (if you decide to use it, remove the --register argument from the wca.service file).
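To see whether a --register value is well formed and importable from the current PYTHONPATH, the module:attribute string can be resolved with importlib. This is a minimal sketch of that idea, not WCA's own loader; it ignores the optional "[extras]" part of the pkg_resources syntax, and load_component is a hypothetical name:

```python
import importlib
import re

# module.path:ClassName (the module:attribute part of pkg_resources syntax)
_SPEC = re.compile(r"^[A-Za-z_][\w.]*:[A-Za-z_][\w.]*$")


def load_component(spec):
    """Import the object named by a 'module.path:Attr' string."""
    if not _SPEC.match(spec):
        raise ValueError("expected 'module.path:ClassName', got %r" % spec)
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj
```

For example, load_component("your_custom_module.allocators:YourCustomAllocator") raises ImportError unless your_custom_module (the example name used above) and its dependencies can be found on PYTHONPATH.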

Note: Running WCA as a dedicated "wca" user is more secure, but requires allowing non-root users to use perf counters. Reconfigure the perf_event_paranoid sysctl parameter like this: sudo sysctl -w kernel.perf_event_paranoid=-1, or, to make the setting persistent, set kernel.perf_event_paranoid = -1 in /etc/sysctl.conf. More about perf_event_paranoid here.
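The current value can be read back from /proc/sys/kernel/perf_event_paranoid; lower values are less restrictive. A minimal sketch that checks the host against the -1 recommended above; the helper names are illustrative:

```python
PARANOID_PATH = "/proc/sys/kernel/perf_event_paranoid"


def paranoid_level(text):
    """Parse the content of /proc/sys/kernel/perf_event_paranoid."""
    return int(text.strip())


def perf_ready_for_wca(text, required=-1):
    """True if the level is at most the one this guide recommends (-1)."""
    return paranoid_level(text) <= required


def read_paranoid(path=PARANOID_PATH):
    with open(path) as f:
        return f.read()
```

On a configured host, perf_ready_for_wca(read_paranoid()) should return True, both right after sysctl -w and after a reboot if /etc/sysctl.conf was updated.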

It is recommended to build a PEX file with the external component and its dependencies bundled. See the prm plugin from platform-resource-manager as an example of this approach.

The config file /etc/wca/wca_config.yml must exist. See the following example configuration file for use with NOPAllocator:

runner: !AllocationRunner
  config: !AllocationRunnerConfig
    node: !MesosNode
      mesos_agent_endpoint: 'http://127.0.0.1:5051'
    timeout: 5
    interval: 1.
    metrics_storage: !LogStorage
      output_filename: '/tmp/metrics_storage.log'
    extra_labels:
      env_id: "$HOST_IP"
    anomalies_storage: !LogStorage
      output_filename: '/tmp/anomalies_storage.log'
    allocator: !NOPAllocator
        ...
    ...

The following configuration is required in order to use the MesosNode component to discover new tasks:

  • Mesos containerizer (--containerizers=mesos) must be used.
  • The Mesos agent must be configured to support the following isolators:
    • filesystem/linux,
    • docker/volume,
    • docker/runtime,
    • cgroups/cpu,
    • cgroups/perf_event.
  • The Mesos agent must expose the operator API over a secure socket. TLS can be disabled in the WCA configuration by modifying the mesos_agent_endpoint property (for example, by using an http:// URL, as in the configuration above).
  • The Mesos agent may be configured to use a Docker registry to fetch images.
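The isolator and TLS requirements above can be checked against the agent's --isolation flag value and the endpoint configured in WCA. A minimal sketch; it assumes the flag value is available as a comma-separated string, and the helper names are illustrative:

```python
from urllib.parse import urlparse

REQUIRED_ISOLATORS = {
    "filesystem/linux",
    "docker/volume",
    "docker/runtime",
    "cgroups/cpu",
    "cgroups/perf_event",
}


def missing_isolators(isolation_flag):
    """Required isolators absent from an agent --isolation value,
    e.g. 'filesystem/linux,cgroups/cpu'."""
    configured = {item.strip() for item in isolation_flag.split(",") if item.strip()}
    return REQUIRED_ISOLATORS - configured


def endpoint_uses_tls(endpoint):
    """True if mesos_agent_endpoint points at an https:// URL."""
    return urlparse(endpoint).scheme == "https"
```

Extra isolators are harmless; only the five listed above are required for MesosNode task discovery.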