0.18.38
Intel Gaudi
dstack now supports Intel Gaudi accelerators with SSH fleets.
To use Intel Gaudi with dstack, create an SSH fleet. Once it's up, you can specify gaudi, gaudi2, or gaudi3 as the GPU name (or intel as the vendor name) in your run configuration:
type: dev-environment
python: "3.12"
ide: vscode
resources:
  gpu: gaudi2:8  # 8 × Gaudi 2
Note
To use SSH fleets with Intel Gaudi, ensure that the Gaudi software and drivers are installed on each host. This should include the drivers, hl-smi, and Habana Container Runtime.
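For reference, an SSH fleet for Gaudi hosts could be declared roughly as follows. This is a minimal sketch: the fleet name, user, key path, and host address are placeholders for illustration, not values from this release.

```yaml
type: fleet
# Hypothetical fleet name
name: gaudi-fleet

ssh_config:
  # Placeholder credentials and host address; replace with your own
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.168.100.10
```

Applying a file like this with dstack apply -f should bring the fleet up, after which a run configuration can request Gaudi resources by name.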
Volumes
Stop duration and force detachment
In some cases, a volume may get stuck in the detaching state. When this happens, the run is marked as stopped, but the instance remains in an inconsistent state, preventing its deletion or reuse. Additionally, the volume cannot be used with other runs.
To address this, dstack now keeps the run in the terminating state until the volume is fully detached. By default, dstack waits for 5m before forcing a detach. You can override this with stop_duration, either by setting a different duration or by setting it to off to wait indefinitely.
Note
Force detaching a volume may corrupt the file system and should only be used as a last resort. If volumes frequently require force detachment, contact your cloud provider’s support to identify the root cause.
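As an illustration, the following sketch sets a longer stop duration on a run that mounts a volume. The volume name, mount path, and the 10m value are assumptions chosen for the example; only stop_duration, its 5m default, and the off value come from the notes above.

```yaml
type: dev-environment
ide: vscode

volumes:
  # Hypothetical volume name and mount path
  - name: my-volume
    path: /volume_data

# Wait up to 10 minutes before forcing a detach (the default is 5m).
# Set to `off` to wait indefinitely instead.
stop_duration: 10m
```

A longer duration gives a slow cloud detach operation more time to finish cleanly, at the cost of the run staying in the terminating state for longer.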
Bug-fixes
This update also resolves an issue where dstack mistakenly marked a volume as attached even though it was actually detached.
UI
Fleets
The UI has been updated to simplify fleet and instance management. The Fleets page now allows users to terminate fleets and displays both active and terminated fleets. The new Instances page shows active and terminated instances across all fleets.
What's changed
- Add Intel Gaudi support for SSH fleets by @un-def in #2216
- Support models with non-standard finish_reason by @jvstme in #2229
- [Internal]: Ensure all files end with a newline by @jvstme in #2227
- [chore]: Refactor gateway modules by @jvstme in #2226
- [chore]: Move connection pool to proxy deps by @jvstme in #2235
- [chore]: Update migration ffa99edd1988 by @jvstme in #2217
- [chore]: Update/remove dstack-proxy TODOs by @jvstme in #2239
- [UI] New UI for fleets and instances by @olgenn in #2236
- Improve UX when no offers found by @jvstme in #2240
- Implement volumes force detach by @r4victor in #2242
Full changelog: 0.18.37...0.18.38