To use the model server, pull the image oneflowinc/oneflow-serving from Docker Hub. You only need to build from source if you want to modify the source code.
You can build on bare metal, or pull the Docker image and follow the instructions below to build inside a Docker container.
docker pull
To build from source, you need to build liboneflow first.
Build liboneflow from source
    git clone --depth=1
    cd oneflow
    mkdir build && cd build
    cmake -C ../cmake/caches/cn/cuda.cmake -DBUILD_CPP_API=ON -DWITH_MLIR=ON -G Ninja ..
    ninja
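Before configuring the backend, it can help to confirm that the liboneflow build produced the directories the later steps refer to. The following is a minimal sketch, assuming the build output contains `lib/` and `share/` subdirectories under `liboneflow_cpp` (as the paths used in the later steps suggest). The function name `check_liboneflow` is hypothetical and not part of the project.

```shell
# Hypothetical helper (not part of OneFlow-Serving): verify that a
# liboneflow_cpp directory contains the lib/ and share/ subdirectories
# that the backend build and the docker run command refer to.
check_liboneflow() {
    dir="$1"
    for sub in lib share; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing: $dir/$sub" >&2
            return 1
        fi
    done
    echo "ok: $dir"
}
```

For example, `check_liboneflow oneflow/build/liboneflow_cpp` could be run from the root of the serving checkout, assuming that is where the oneflow build lives.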
Build oneflow backend from source
    mkdir build && cd build
    cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
      -DTRITON_ENABLE_GPU=ON -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
    ninja
Launch triton server
    cd ../  # back to root of the serving
    docker run --runtime=nvidia --rm --network=host \
      -v$(pwd)/examples:/models \
      -v$(pwd)/build/ \
      -v$(pwd)/oneflow/build/liboneflow_cpp/lib/:/mylib \
      bash -c 'LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mylib/ /opt/tritonserver/bin/tritonserver \
      --model-repository=/models --backend-directory=/backends'
    curl -v localhost:8000/v2/health/ready  # ready check
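The server may take a while to load models, so a single `curl` can fail even when the launch is fine. The sketch below polls the same readiness endpoint until it responds successfully. The function name `wait_until_ready` and the default attempt count and delay are assumptions chosen for illustration, not values from the project.

```shell
# Hypothetical helper: poll the readiness endpoint until it succeeds or
# the attempt budget is exhausted.
#   $1: URL to poll (default: the endpoint used in the ready check)
#   $2: maximum number of attempts (assumed default: 30)
#   $3: seconds to sleep between attempts (assumed default: 2)
wait_until_ready() {
    url="${1:-localhost:8000/v2/health/ready}"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s: silent; -f: exit non-zero on HTTP error responses
        if curl -sf -o /dev/null "$url"; then
            echo "server ready after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "server not ready after $attempts attempts" >&2
    return 1
}
```

Calling `wait_until_ready` with no arguments checks `localhost:8000/v2/health/ready`; the non-zero return value on timeout makes it usable in scripts with `set -e`.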
To use XLA, TensorRT, or OpenVINO in OneFlow-Serving, build OneFlow-XRT and then rebuild the oneflow backend.
Build OneFlow-XRT with XLA, TensorRT or OpenVINO
    git clone
    cd oneflow-xrt
    mkdir build && cd build

    # Build OneFlow-XRT XLA
    cmake -G Ninja .. -DBUILD_XLA=ON && ninja

    # Build OneFlow-XRT TensorRT
    cmake -G Ninja .. -DBUILD_TENSORRT=ON -DTENSORRT_ROOT=/path/to/tensorrt && ninja

    # Build OneFlow-XRT OpenVINO
    cmake -G Ninja .. -DBUILD_OPENVINO=ON -DOPENVINO_ROOT=/path/to/openvino && ninja
Build oneflow backend from source
    mkdir build && cd build

    # Use TensorRT
    cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
      -DTRITON_ENABLE_GPU=ON -DUSE_TENSORRT=ON -DONEFLOW_XRT_ROOT=$(pwd)/oneflow-xrt/build/install -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
    ninja

    # Use XLA
    cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
      -DTRITON_ENABLE_GPU=ON -DUSE_XLA=ON -DONEFLOW_XRT_ROOT=$(pwd)/oneflow-xrt/build/install -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
    ninja

    # Use OpenVINO
    cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
      -DTRITON_ENABLE_GPU=ON -DUSE_OPENVINO=ON -DONEFLOW_XRT_ROOT=$(pwd)/oneflow-xrt/build/install -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
    ninja
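The three backend configurations differ only in which `-DUSE_...=ON` option is passed. As a sketch of how a build script might select one, the hypothetical function `xrt_cmake_flag` below maps a backend name to the matching option; the lowercase backend names are an assumption made for this example.

```shell
# Hypothetical helper: map a backend name to the CMake option used when
# rebuilding the oneflow backend with OneFlow-XRT support.
xrt_cmake_flag() {
    case "$1" in
        tensorrt) echo "-DUSE_TENSORRT=ON" ;;
        xla)      echo "-DUSE_XLA=ON" ;;
        openvino) echo "-DUSE_OPENVINO=ON" ;;
        *)
            echo "unknown backend: $1 (expected tensorrt, xla or openvino)" >&2
            return 1
            ;;
    esac
}
```

The result can be spliced into the cmake invocation, for example `cmake ... "$(xrt_cmake_flag tensorrt)" ...`, with the remaining options unchanged.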
Launch triton server
    cd ../  # back to root of the serving
    docker run --runtime=nvidia --rm --network=host \
      -v$(pwd)/examples:/models \
      -v$(pwd)/build/ \
      -v$(pwd)/oneflow/build/liboneflow_cpp/lib/:/mylib \
      -v$(pwd)/oneflow-xrt/build/install/lib:/xrt_libs \
      \
      bash -c 'LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mylib/:/xrt_libs /opt/tritonserver/bin/tritonserver \
      --model-repository=/models --backend-directory=/backends'
    curl -v localhost:8000/v2/health/ready  # ready check