Integrate zeusd into zeus.device.gpu (#85)
jaywonchung authored May 30, 2024
1 parent 9f9394f commit f1857d3
Showing 30 changed files with 1,054 additions and 562 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/check_homepage_build.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -37,4 +37,4 @@ jobs:
- name: Install other homepage dependencies
run: pip install '.[docs]'
- name: Build homepage
run: mkdocs build --verbose
run: mkdocs build --verbose --strict
2 changes: 1 addition & 1 deletion .github/workflows/deploy_homepage.yaml
Expand Up @@ -17,7 +17,7 @@ env:
jobs:
deploy:
runs-on: ubuntu-latest
if: github.event.repository.fork == false
if: github.repository_owner == 'ml-energy'
steps:
- name: Checkout repository
uses: actions/checkout@v4
24 changes: 24 additions & 0 deletions .github/workflows/publish_crates_io.yaml
@@ -0,0 +1,24 @@
name: Release

on:
  push:
    tags:
      - zeusd-v*

jobs:
  cargo-publish:
    if: github.repository_owner == 'ml-energy'
    runs-on: ubuntu-latest
    env:
      CARGO_TERM_COLOR: always
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          sparse-checkout: zeusd
      - name: Publish to crates.io
        uses: katyo/publish-crates@v2
        with:
          path: zeusd
          registry-token: ${{ secrets.CRATES_IO_TOKEN }}
          check-repo: false
4 changes: 2 additions & 2 deletions .github/workflows/publish_pypi.yaml
Expand Up @@ -3,12 +3,12 @@ name: Publish Python package to PyPI
on:
push:
tags:
- v*
- zeus-v*

jobs:
publish:
runs-on: ubuntu-latest
if: github.event.repository.fork == false
if: github.repository_owner == 'ml-energy'
steps:
- name: Checkout repository
uses: actions/checkout@v3
3 changes: 2 additions & 1 deletion .github/workflows/push_docker.yaml
Expand Up @@ -5,7 +5,7 @@ on:
branches:
- master
tags:
- v*
- zeus-v*
paths:
- '.github/workflows/push_docker.yaml'
- 'capriccio/**'
Expand All @@ -21,6 +21,7 @@ on:

jobs:
build_and_push:
if: github.repository_owner == 'ml-energy'
runs-on: ubuntu-latest
steps:
- name: Remove unnecessary files
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
name: Check format, lint, and test
name: (Zeus) Check format, lint, and test

on:
push:
paths:
- '.github/workflows/fmt_lint_test.yaml'
- '.github/workflows/zeus_fmt_lint_test.yaml'
- 'zeus/**'
- 'tests/**'
- 'capriccio/*.py'
Expand All @@ -14,7 +14,7 @@ on:

# Jobs initiated by previous pushes get cancelled by a new push.
concurrency:
group: ${{ github.ref }}-lint-and-test
group: ${{ github.ref }}-zeus-lint-and-test
cancel-in-progress: true

jobs:
2 changes: 1 addition & 1 deletion .github/workflows/zeusd_fmt_lint_test.yaml
@@ -1,4 +1,4 @@
name: Check format, lint, and test for Zeusd
name: (Zeusd) Check format, lint, and test

on:
push:
38 changes: 26 additions & 12 deletions docs/getting_started/index.md
Expand Up @@ -83,32 +83,46 @@ docker build -t mlenergy/zeus:master --build-arg TARGETARCH=amd64 -f docker/Dock

## System privileges

!!! Important "Nevermind if you're just measuring"
No special system-level privileges are needed if you are just measuring time and energy.
!!! Important "Nevermind if you're just measuring GPU energy"
No special system-level privileges are needed if you are just measuring GPU time and energy.
However, when you're looking into optimizing energy and if that method requires changing the GPU's power limit or SM frequency, special system-level privileges are required.

### When are extra system privileges needed?

The Linux capability `SYS_ADMIN` is required in order to change the GPU's power limit or frequency.
Specifically, this is needed by the [`GlobalPowerLimitOptimizer`][zeus.optimizer.power_limit.GlobalPowerLimitOptimizer] and the [`PipelineFrequencyOptimizer`][zeus.optimizer.pipeline_frequency.PipelineFrequencyOptimizer].

### Obtaining privileges with Docker
### Option 1: Running applications in a Docker container

Using Docker, you can pass `--cap-add SYS_ADMIN` to `docker run`.
Since this significantly simplifies running Zeus, we recommend that users consider this option first.
Also, since Zeus is running inside a container, there is less potential for damage even if things go wrong.
This is also possible for Kubernetes Pods with `securityContext.capabilities.add` in container specs ([docs](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container){.external}).

### Obtaining privileges with `sudo`
### Option 2: Deploying the Zeus daemon (`zeusd`)

If you cannot use Docker, you can run your application with `sudo`.
This is not recommended due to security reasons, but it will work.
Granting `SYS_ADMIN` to the entire application just to be able to change the GPU's configuration is [granting too much](https://en.wikipedia.org/wiki/Principle_of_least_privilege){.external}.
Instead, Zeus provides the [**Zeus daemon** or `zeusd`](https://github.com/ml-energy/zeus/tree/master/zeusd){.external}, which is a simple server/daemon process that is designed to run with admin privileges and exposes the minimal set of APIs wrapping NVML methods for changing the GPU's configuration.
Then, an unprivileged (i.e., run normally by any user) application can ask `zeusd` via a Unix Domain Socket to change the local node's GPU configuration on its behalf.

### GPU management server
To deploy `zeusd`:

It is fair to say that granting `SYS_ADMIN` to the application is itself giving too much privilege.
We just need to be able to change the GPU's power limit or frequency, instead of giving the process privileges to administer the system.
Thus, to reduce the attack surface, we are considering solutions such as a separate GPU management server process on a node ([tracking issue](https://github.com/ml-energy/zeus/issues/29)), which has `SYS_ADMIN`.
Then, an unprivileged application process can ask the GPU management server via a UDS to change the GPU's configuration on its behalf.
``` { .sh .annotate }
# Install zeusd
cargo install zeusd

# Run zeusd with admin privileges
sudo zeusd \
--socket-path /var/run/zeusd.sock \ # (1)!
--socket-permissions 666 # (2)!
```

1. Unix domain socket path that `zeusd` listens to.
2. Applications need *write* access to the socket to be able to talk to `zeusd`. This string is interpreted as [UNIX file permissions](https://en.wikipedia.org/wiki/File-system_permissions#Numeric_notation).
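
The write-access requirement above can be checked before an application tries to connect. The following is a minimal sketch using only the Python standard library. The function name `zeusd_socket_usable` is hypothetical and not part of Zeus, and the path to pass in is whatever was given to `--socket-path`.

```python
import os
import stat


def zeusd_socket_usable(socket_path: str) -> bool:
    """Return True if `socket_path` is a Unix domain socket this process can write to.

    Hypothetical helper for illustration; not part of the Zeus API.
    """
    try:
        mode = os.stat(socket_path).st_mode
    except OSError:
        # The path is missing, or this process is not allowed to stat it.
        return False
    # Connecting to a Unix domain socket requires write permission on the socket file.
    return stat.S_ISSOCK(mode) and os.access(socket_path, os.W_OK)
```

A check like this only reports whether a connection attempt is plausible. It does not confirm that `zeusd` is actually listening on the socket.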

### Option 3: Running applications with `sudo`

This is probably the worst option.
However, if none of the options above work, you can run your application with `sudo`, which automatically has `SYS_ADMIN`.

## Next Steps

2 changes: 1 addition & 1 deletion examples/huggingface/run_clm.py
Expand Up @@ -56,7 +56,7 @@
from transformers.utils.versions import require_version

from zeus.monitor import ZeusMonitor
from zeus.optimizer import HFGlobalPowerLimitOptimizer
from zeus.optimizer.power_limit import HFGlobalPowerLimitOptimizer

# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.37.2")
9 changes: 5 additions & 4 deletions pyproject.toml
Expand Up @@ -27,6 +27,7 @@ dependencies = [
"pydantic", # The `zeus.utils.pydantic_v1` compatibility layer allows us to unpin Pydantic in most cases.
"rich",
"tyro",
"httpx"
]
dynamic = ["version"]

Expand All @@ -37,13 +38,13 @@ Documentation = "https://ml.energy/zeus"

[project.optional-dependencies]
# One day FastAPI will drop support for Pydantic V1. Then fastapi has to be pinned as well.
pfo = ["pydantic<2", "httpx"]
pfo-server = ["fastapi[all]", "pydantic<2", "lowtime", "aiofiles", "httpx", "torch"]
bso = ["pydantic<2", "httpx"]
pfo = ["pydantic<2"]
pfo-server = ["fastapi[all]", "pydantic<2", "lowtime", "aiofiles", "torch"]
bso = ["pydantic<2"]
bso-server = ["fastapi[all]", "sqlalchemy", "pydantic<2", "python-dotenv"]
migration = ["alembic", "sqlalchemy", "pydantic<2", "python-dotenv"]
lint = ["ruff", "black==22.6.0", "pyright", "pandas-stubs", "transformers"]
test = ["fastapi[all]", "sqlalchemy", "pydantic<2", "httpx", "pytest==7.3.2", "pytest-mock==3.10.0", "pytest-xdist==3.3.1", "anyio==3.7.1", "aiosqlite==0.20.0"]
test = ["fastapi[all]", "sqlalchemy", "pydantic<2", "pytest==7.3.2", "pytest-mock==3.10.0", "pytest-xdist==3.3.1", "anyio==3.7.1", "aiosqlite==0.20.0"]
docs = ["mkdocs-material[imaging]==9.5.19", "mkdocstrings[python]==0.25.0", "mkdocs-gen-files==0.5.0", "mkdocs-literate-nav==0.6.1", "mkdocs-section-index==0.3.9", "mkdocs-redirects==1.2.1", "urllib3<2", "black"]
# greenlet is for supporting apple mac silicon for sqlalchemy(https://docs.sqlalchemy.org/en/20/faq/installation.html)
dev = ["zeus-ml[pfo-server,bso,bso-server,migration,lint,test]", "greenlet"]
68 changes: 68 additions & 0 deletions zeus/device/common.py
@@ -0,0 +1,68 @@
"""Common utilities for device management."""

from __future__ import annotations

import os
import ctypes
from functools import lru_cache

from zeus.utils.logging import get_logger

logger = get_logger(__name__)


@lru_cache(maxsize=1)
def has_sys_admin() -> bool:
    """Check if the current process has `SYS_ADMIN` capabilities."""
    # First try to read procfs.
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("CapEff"):
                    bitmask = int(line.strip().split()[1], 16)
                    has = bool(bitmask & (1 << 21))
                    logger.info(
                        "Read security capabilities from /proc/self/status -- SYS_ADMIN: %s",
                        has,
                    )
                    return has
    except Exception:
        logger.info("Failed to read capabilities from /proc/self/status", exc_info=True)

    # If that fails, try to use the capget syscall.
    class CapHeader(ctypes.Structure):
        _fields_ = [("version", ctypes.c_uint32), ("pid", ctypes.c_int)]

    class CapData(ctypes.Structure):
        _fields_ = [
            ("effective", ctypes.c_uint32),
            ("permitted", ctypes.c_uint32),
            ("inheritable", ctypes.c_uint32),
        ]

    # Attempt to load libc and set up capget
    try:
        # use_errno=True is needed for ctypes.get_errno() below to see capget's errno.
        libc = ctypes.CDLL("libc.so.6", use_errno=True)
        capget = libc.capget
        capget.argtypes = [ctypes.POINTER(CapHeader), ctypes.POINTER(CapData)]
        capget.restype = ctypes.c_int
    except Exception:
        logger.info("Failed to load libc.so.6", exc_info=True)
        return False

    # Initialize the header and data structures
    header = CapHeader(version=0x20080522, pid=0)  # Use the current process
    data = CapData()

    # Call capget and check for errors
    if capget(ctypes.byref(header), ctypes.byref(data)) != 0:
        errno = ctypes.get_errno()
        logger.info(
            "capget failed with error: %s (errno %s)", os.strerror(errno), errno
        )
        return False

    bitmask = data.effective
    has = bool(bitmask & (1 << 21))
    logger.info("Read security capabilities from capget -- SYS_ADMIN: %s", has)
    return has
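
The procfs branch of `has_sys_admin` comes down to testing bit 21 (`CAP_SYS_ADMIN`) of the hexadecimal `CapEff` mask. As an illustration only, the sketch below isolates that parsing step into a pure function that takes the status text as input, which makes it easy to exercise without a real `/proc`. The names `cap_eff_has_sys_admin` and `CAP_SYS_ADMIN_BIT` are invented for this example and are not part of the commit.

```python
CAP_SYS_ADMIN_BIT = 21  # Bit index of CAP_SYS_ADMIN in the Linux capability sets.


def cap_eff_has_sys_admin(status_text: str) -> bool:
    """Return True if the CapEff line in procfs-style status text includes CAP_SYS_ADMIN.

    Hypothetical helper for illustration; returns False when no CapEff line is found.
    """
    for line in status_text.splitlines():
        if line.startswith("CapEff"):
            # The line looks like "CapEff:\t000001ffffffffff" (a hexadecimal bitmask).
            bitmask = int(line.split()[1], 16)
            return bool(bitmask & (1 << CAP_SYS_ADMIN_BIT))
    return False
```

Unlike the function in the commit, this sketch does no logging and has no `capget` fallback. It only shows the bitmask test.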
8 changes: 8 additions & 0 deletions zeus/device/exception.py
Expand Up @@ -8,3 +8,11 @@ class ZeusBaseGPUError(ZeusBaseError):
    def __init__(self, message: str) -> None:
        """Initialize Base Zeus Exception."""
        super().__init__(message)


class ZeusdError(ZeusBaseGPUError):
    """Exception class for Zeus daemon-related errors."""

    def __init__(self, message: str) -> None:
        """Initialize Zeusd error."""
        super().__init__(message)
73 changes: 47 additions & 26 deletions zeus/device/gpu/__init__.py
@@ -1,37 +1,55 @@
"""GPU device module for Zeus. Abstraction of GPU devices.
"""Abstraction layer for GPU devices.
The main function of this module is [`get_gpus`][zeus.device.gpu.get_gpus],
which returns a GPU Manager object specific to the platform.
!!! Important
In theory, any NVIDIA GPU would be supported.
On the other hand, for AMD GPUs, we currently only support ROCm 6.0 and later.
## Getting handles to GPUs
The main API exported from this module is the `get_gpus` function. It returns either
[`NVIDIAGPUs`][zeus.device.gpu.nvidia.NVIDIAGPUs] or [`AMDGPUs`][zeus.device.gpu.amd.AMDGPUs]
depending on the platform.
The main function of this module is [`get_gpus`][zeus.device.gpu.get_gpus], which returns a GPU Manager object specific to the platform.
To instantiate a GPU Manager object, you can do the following:
```python
from zeus.device import get_gpus
gpus = get_gpus() # Returns NVIDIAGPUs() or AMDGPUs() depending on the platform.
gpus = get_gpus()
```
There exists a 1:1 mapping between specific library functions and methods implemented in the GPU Manager object.
For example, for NVIDIA systems, if you wanted to do:
## Calling GPU management APIs
GPU management library APIs are mapped to methods on [`GPU`][zeus.device.gpu.common.GPU].
For example, for NVIDIA GPUs (which use `pynvml`), you would have called:
```python
handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
constraints = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
```
You can now do:
With the Zeus GPU abstraction layer, you would now call:
```python
gpus = get_gpus() # returns a NVIDIAGPUs object
constraints = gpus.getPowerManagementLimitConstraints(gpu_index)
gpus = get_gpus() # returns an NVIDIAGPUs object
constraints = gpus.getPowerManagementLimitConstraints(gpu_index)
```
Class hierarchy:
## Non-blocking calls
- [`GPUs`][zeus.device.gpu.GPUs]: Abstract class for GPU managers.
- [`NVIDIAGPUs`][zeus.device.gpu.NVIDIAGPUs]: GPU manager for NVIDIA GPUs, initialize NVIDIAGPU objects.
- [`AMDGPUs`][zeus.device.gpu.AMDGPUs]: GPU manager for AMD GPUs, initialize AMDGPU objects.
- [`GPU`][zeus.device.gpu.GPU]: Abstract class for GPU objects.
- [`NVIDIAGPU`][zeus.device.gpu.NVIDIAGPU]: GPU object for NVIDIA GPUs.
- [`AMDGPU`][zeus.device.gpu.AMDGPU]: GPU object for AMD GPUs.
Some implementations of `GPU` support non-blocking calls to setters.
If non-blocking calls are not supported, setting `block` will be ignored and the call will block.
Check [`GPU.supports_nonblocking_setters`][zeus.device.gpu.common.GPU.supports_nonblocking_setters]
to see if non-blocking calls are supported.
Note that non-blocking calls will not raise exceptions even if the call fails.
Currently, only [`ZeusdNVIDIAGPU`][zeus.device.gpu.nvidia.ZeusdNVIDIAGPU] supports non-blocking calls
to methods that set the GPU's power limit, GPU frequency, memory frequency, and persistence mode.
This is possible because the Zeus daemon supports a `block: bool` parameter in HTTP requests,
which can be set to `False` to make the call return immediately without checking the result.
## Error handling
The following exceptions are defined in this module:
Expand All @@ -55,9 +73,8 @@
- [`ZeusGPULibRMVersionMismatchError`][zeus.device.gpu.ZeusGPULibRMVersionMismatchError]: Error for library version mismatch.
- [`ZeusGPUMemoryError`][zeus.device.gpu.ZeusGPUMemoryError]: Error for memory issues.
- [`ZeusGPUUnknownError`][zeus.device.gpu.ZeusGPUUnknownError]: Error for unknown issues.
"""

from __future__ import annotations

from zeus.device.gpu.common import *
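
The non-blocking setter semantics described in the module docstring above can be illustrated with a small stand-alone sketch. Everything here is hypothetical: `LocalSketchGPU`, `DaemonSketchGPU`, and `result_is_known` are invented names that only mimic the documented behavior (a `block` argument that is ignored when unsupported, and a non-blocking call that returns without reporting failures). They are not the Zeus classes.

```python
class LocalSketchGPU:
    """Stand-in for a GPU implementation that only supports blocking setters."""

    supports_nonblocking_setters = False

    def __init__(self) -> None:
        self.power_limit_mw = None

    def set_power_limit(self, value_mw: int, block: bool = True) -> None:
        # `block` is ignored here: the setter always completes before returning.
        self.power_limit_mw = value_mw


class DaemonSketchGPU:
    """Stand-in for a GPU managed through a daemon that accepts a `block` flag."""

    supports_nonblocking_setters = True

    def __init__(self) -> None:
        self.power_limit_mw = None
        self.pending = []

    def set_power_limit(self, value_mw: int, block: bool = True) -> None:
        if block:
            self.power_limit_mw = value_mw
        else:
            # Fire-and-forget: the request is handed off and the caller never
            # learns whether it succeeded.
            self.pending.append(value_mw)


def result_is_known(gpu, block: bool) -> bool:
    """Return True if a setter call with this `block` value reports its outcome."""
    return block or not gpu.supports_nonblocking_setters
```

The practical consequence is that a caller who needs to know whether a setting was applied should either pass `block=True` or read the value back afterwards.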
Expand All @@ -70,17 +87,21 @@


def get_gpus(ensure_homogeneous: bool = False) -> GPUs:
"""Initialize and return a singleton GPU monitoring object for NVIDIA or AMD GPUs.
"""Initialize and return a singleton object for GPU management.
This function returns a GPU management object that aims to abstract
the underlying GPU vendor and their specific monitoring library
(pynvml for NVIDIA GPUs and amdsmi for AMD GPUs). Management APIs
are mapped to methods on the returned [`GPUs`][zeus.device.gpu.GPUs] object.
The function returns a GPU management object that aims to abstract the underlying GPU monitoring libraries
(pynvml for NVIDIA GPUs and amdsmi for AMD GPUs), and provides a 1:1 mapping between the methods in the object and related library functions.
GPU availability is checked in the following order:
This function attempts to initialize GPU monitoring using the pynvml library for NVIDIA GPUs
first. If pynvml is not available or fails to initialize, it then tries to use the amdsmi
library for AMD GPUs. If both attempts fail, it raises a ZeusErrorInit exception.
1. NVIDIA GPUs using `pynvml`
1. AMD GPUs using `amdsmi`
1. If both are unavailable, a `ZeusGPUInitError` is raised.
Args:
ensure_homogeneous (bool, optional): If True, ensures that all tracked GPUs have the same name. False by default.
ensure_homogeneous (bool): If True, ensures that all tracked GPUs have the same name.
"""
global _gpus
if _gpus is not None:
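
The availability order and singleton behavior described in the `get_gpus` docstring can be sketched in isolation as follows. This is an assumption-laden illustration rather than the code in the commit: `get_gpus_sketch`, `SketchGPUInitError`, and the factory callables are hypothetical stand-ins for the NVIDIA and AMD manager constructors and for the initialization error.

```python
from typing import Callable, Optional, Sequence


class SketchGPUInitError(Exception):
    """Hypothetical stand-in for the error raised when no GPU backend initializes."""


_gpus: Optional[object] = None  # Module-level cache holding the singleton.


def get_gpus_sketch(factories: Sequence[Callable[[], object]]) -> object:
    """Return the cached manager, or build one from the first factory that works."""
    global _gpus
    if _gpus is not None:
        return _gpus
    # Try each backend in priority order (e.g., NVIDIA first, then AMD).
    for factory in factories:
        try:
            _gpus = factory()
        except SketchGPUInitError:
            continue
        return _gpus
    raise SketchGPUInitError("No GPU backend could be initialized.")
```

Once a manager has been created, later calls return the same object regardless of the factories passed in, which mirrors the early return on a non-`None` cache.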