
pip install -e . does not work #1065

Closed
1 of 4 tasks
wangkuiyi opened this issue Feb 7, 2024 · 15 comments
Labels: bug (Something isn't working)


@wangkuiyi
Contributor

System Info

x86, H100, Ubuntu

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

On a system with CUDA 12.3 and an H100, I installed the dependencies by running the scripts referenced in Dockerfile.multi:

```dockerfile
# https://www.gnu.org/software/bash/manual/html_node/Bash-Startup-Files.html
# The default values come from `nvcr.io/nvidia/pytorch`
ENV BASH_ENV=${BASH_ENV:-/etc/bash.bashrc}
ENV ENV=${ENV:-/etc/shinit_v2}
SHELL ["/bin/bash", "-c"]
FROM base as devel
COPY docker/common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
COPY docker/common/install_cmake.sh install_cmake.sh
RUN bash ./install_cmake.sh && rm install_cmake.sh
COPY docker/common/install_ccache.sh install_ccache.sh
RUN bash ./install_ccache.sh && rm install_ccache.sh
# Download & install internal TRT release
ARG TRT_VER
ARG CUDA_VER
ARG CUDNN_VER
ARG NCCL_VER
ARG CUBLAS_VER
COPY docker/common/install_tensorrt.sh install_tensorrt.sh
RUN bash ./install_tensorrt.sh \
    --TRT_VER=${TRT_VER} \
    --CUDA_VER=${CUDA_VER} \
    --CUDNN_VER=${CUDNN_VER} \
    --NCCL_VER=${NCCL_VER} \
    --CUBLAS_VER=${CUBLAS_VER} && \
    rm install_tensorrt.sh
# Install latest Polygraphy
COPY docker/common/install_polygraphy.sh install_polygraphy.sh
RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh
# Install mpi4py
COPY docker/common/install_mpi4py.sh install_mpi4py.sh
RUN bash ./install_mpi4py.sh && rm install_mpi4py.sh
# Install PyTorch
ARG TORCH_INSTALL_TYPE="skip"
COPY docker/common/install_pytorch.sh install_pytorch.sh
RUN bash ./install_pytorch.sh $TORCH_INSTALL_TYPE && rm install_pytorch.sh
```
I ran these scripts with ENV set to ~/.bashrc.
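For anyone reproducing this outside Docker, here is a minimal Python sketch of the same install sequence, with the script order copied from the Dockerfile.multi excerpt above. It assumes the scripts behave the same when invoked in place from docker/common (the Dockerfile copies each one into the working directory first); that is an unverified assumption. The values for TRT_VER, CUDA_VER, CUDNN_VER, NCCL_VER and CUBLAS_VER are not shown in the excerpt and must be supplied by you. The helper names `plan_install` and `run_install` are hypothetical.

```python
import os
import subprocess
from pathlib import Path

# Script order copied from the Dockerfile.multi excerpt.
BEFORE_TRT = ["install_base.sh", "install_cmake.sh", "install_ccache.sh"]
AFTER_TRT = ["install_polygraphy.sh", "install_mpi4py.sh"]


def plan_install(versions, torch_install_type="skip", script_dir="docker/common"):
    """Return bash commands mirroring the Dockerfile RUN steps, in order.

    `versions` maps TRT_VER, CUDA_VER, CUDNN_VER, NCCL_VER and CUBLAS_VER
    to values you supply yourself (hypothetical helper, not part of the repo).
    """
    d = Path(script_dir)
    cmds = [["bash", str(d / s)] for s in BEFORE_TRT]
    trt_flags = [f"--{name}={value}" for name, value in versions.items()]
    cmds.append(["bash", str(d / "install_tensorrt.sh"), *trt_flags])
    cmds += [["bash", str(d / s)] for s in AFTER_TRT]
    cmds.append(["bash", str(d / "install_pytorch.sh"), torch_install_type])
    return cmds


def run_install(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    # Mirror the reporter's setup: ENV points at ~/.bashrc instead of /etc/shinit_v2.
    env = dict(os.environ, ENV=os.path.expanduser("~/.bashrc"))
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)  # stop at the first failure
```

Keeping `dry_run=True` as the default lets you review the exact commands before anything is installed.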

With the dependencies installed, I ran the following command to build TensorRT-LLM from source:

```bash
pip install -e . --extra-index-url https://pypi.nvidia.com
```

The build finished very quickly, which does not look right: build_wheel.py usually takes 40 minutes to build everything.

After the build, pip list shows that tensorrt-llm is installed:

```
$ pip list | grep tensorrt
tensorrt                 9.2.0.post12.dev5
tensorrt-llm             0.9.0.dev2024020600 /root/TensorRT-LLM
```

However, importing it fails:

```
$ python3 -c 'import tensorrt_llm'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/TensorRT-LLM/tensorrt_llm/__init__.py", line 44, in <module>
    from .hlapi.llm import LLM, ModelConfig
  File "/root/TensorRT-LLM/tensorrt_llm/hlapi/__init__.py", line 1, in <module>
    from .llm import LLM, ModelConfig
  File "/root/TensorRT-LLM/tensorrt_llm/hlapi/llm.py", line 17, in <module>
    from ..executor import (GenerationExecutor, GenerationResult,
  File "/root/TensorRT-LLM/tensorrt_llm/executor.py", line 11, in <module>
    import tensorrt_llm.bindings as tllm
ModuleNotFoundError: No module named 'tensorrt_llm.bindings'
```

Expected behavior

My project requires me to build the main branch of TensorRT-LLM. It would be great if pip install could work, so I could declare TensorRT-LLM as a dependency in my project's pyproject.toml file.

Actual behavior

I had to build TensorRT-LLM by invoking build_wheel.py as in https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/build_from_source.md#build-tensorrt-llm

Additional notes

I was able to build vLLM, including its CUDA kernels, with pip install -e . Perhaps their build setup could serve as a reference.

wangkuiyi added the bug (Something isn't working) label on Feb 7, 2024
@TobyGE

TobyGE commented Feb 8, 2024

To install the main branch, you can use the following command:
```bash
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
```

Also check the README for more details.

@wangkuiyi
Contributor Author

That doesn’t work. As described above, the project requires TensorRT-LLM built from the main branch.

@TobyGE

TobyGE commented Feb 8, 2024 via email

@wangkuiyi
Contributor Author

wangkuiyi commented Feb 8, 2024 via email

@wangkuiyi
Contributor Author

It looks like pip install -e . does not automatically trigger building the Python bindings for the C++ runtime.
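One way to confirm this without importing the package (importing it is what raises the error) is to look for the compiled extension file in the source tree. The sketch below assumes the bindings are built as a single extension module named `bindings` directly inside the tensorrt_llm package directory, as the file name in a later traceback in this thread (bindings.cpython-310-x86_64-linux-gnu.so) suggests; `find_bindings` is a hypothetical helper.

```python
import importlib.machinery
from pathlib import Path


def find_bindings(package_dir):
    """List compiled `bindings` extension modules directly inside package_dir.

    Checks every extension suffix the running interpreter accepts, e.g.
    .cpython-310-x86_64-linux-gnu.so on CPython 3.10 / Linux. Nothing is imported.
    """
    pkg = Path(package_dir)
    return [
        pkg / f"bindings{suffix}"
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
        if (pkg / f"bindings{suffix}").is_file()
    ]


# Path taken from the traceback in this issue; adjust to your checkout.
if not find_bindings("/root/TensorRT-LLM/tensorrt_llm"):
    print("No compiled bindings found; build them (e.g. scripts/build_wheel.py) first.")
```

An empty result after pip install -e . is consistent with the ModuleNotFoundError above: the editable install exposes the Python sources, but nothing has compiled the extension.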

@jdemouth-nvidia
Collaborator

@Shixiaowei02, can you help with this issue, please?

@jdemouth-nvidia
Collaborator

@wangkuiyi, for your information, @Shixiaowei02 is based in China, so he won't be able to work on this issue until after the Chinese New Year break.

@wangkuiyi
Contributor Author

Thank you, @jdemouth-nvidia and @Shixiaowei02! No rush, please. After the Lunar New Year is totally fine.

@Shixiaowei02
Collaborator

I am working on fixing this issue now. Thanks for your support!

@ekagra-ranjan

Thanks! I am also facing this issue.

@andyluo7

I am facing the same issue.

@Shixiaowei02
Collaborator

@wangkuiyi, can you use these two commands to work around this issue for now? We will fix it soon and sync the fix to the main branch. Thank you!

```bash
python3 scripts/build_wheel.py --trt_root /usr/local/tensorrt
pip3 install -e .
```
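The same two-step workaround can be scripted so the steps always run in order and the install is skipped if the build fails. This is a minimal sketch, assuming it is run from the repository root and that TensorRT lives at /usr/local/tensorrt as in the commands above; the helper names are hypothetical. Running both steps through `sys.executable` keeps the build and the install on the same interpreter, which separate `python3` and `pip3` entries on PATH do not guarantee.

```python
import subprocess
import sys


def workaround_steps(trt_root="/usr/local/tensorrt"):
    """The two commands above, bound to the current interpreter (hypothetical helper)."""
    return [
        # 1. Compile the C++ runtime and its Python bindings.
        [sys.executable, "scripts/build_wheel.py", "--trt_root", trt_root],
        # 2. Editable install of the Python package.
        [sys.executable, "-m", "pip", "install", "-e", "."],
    ]


def run_steps(steps, repo_root=".", dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises before step 2 if the build in step 1 fails.
            subprocess.run(cmd, cwd=repo_root, check=True)
```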

@lifelongeeek

lifelongeeek commented Mar 2, 2024

After running build_wheel.py and then the editable install, I still got the same error:

```
    import tensorrt_llm.bindings as tllm
ModuleNotFoundError: No module named 'tensorrt_llm.bindings'
```

@Shixiaowei02
Collaborator

Shixiaowei02 commented Apr 11, 2024

Currently, the calling relationship between build_wheel.py and setup.py is inverted, so running pip install -e . on its own leaves the installation incomplete. Meanwhile, setup.py has been deprecated, so as a stopgap we now raise a friendlier error here. We will come back and refactor this when we have more bandwidth. Thank you!
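The "friendlier error" stopgap follows a common Python pattern: try the import of the compiled module and, on failure, re-raise an ImportError that tells the user what to do. The sketch below illustrates that general pattern only; it is not the project's actual code, and the `HINT` text and `import_bindings` name are hypothetical. Chaining with `from exc` keeps the original cause (for example an undefined-symbol error) visible in the traceback.

```python
import importlib

HINT = (
    "Import of the bindings module failed. If you are using an editable "
    "installation, run scripts/build_wheel.py first, then `pip install -e .`."
)


def import_bindings(module_name="tensorrt_llm.bindings"):
    """Import the compiled module, replacing a bare ImportError with a hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError (a subclass)
        # `from exc` marks the original error as the direct cause.
        raise ImportError(HINT) from exc
```

Without `from exc`, Python prints "During handling of the above exception, another exception occurred", which is the shape of the traceback in the next comment.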

@felixslu

felixslu commented Apr 16, 2024

> a friendlier error
@Shixiaowei02, I got this error with the recently released TensorRT-LLM v0.9.0. Could you give me some advice on how to fix it? Thanks!

```
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
Traceback (most recent call last):
  File "/opt/workspace/TensorRT-LLM_v0.9.0/tensorrt_llm/__init__.py", line 39, in <module>
    import tensorrt_llm.bindings  # NOQA
ImportError: /opt/workspace/TensorRT-LLM_v0.9.0/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK12tensorrt_llm8executor25SpeculativeDecodingConfig22getAcceptanceThresholdEv

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/workspace/TensorRT-LLM_v0.9.0/tensorrt_llm/__init__.py", line 41, in <module>
    raise ImportError(
ImportError: Import of the bindings module failed. Please check the package integrity. If you are attempting to use the pip development mode (editable installation), please execute build_wheels.py first, and then run `pip install -e .`
```
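The undefined symbol in this traceback is an Itanium C++ ABI mangled name; `c++filt` from GNU binutils decodes it to `tensorrt_llm::executor::SpeculativeDecodingConfig::getAcceptanceThreshold() const`. An undefined symbol at load time generally means the bindings extension was compiled against headers declaring a function that the shared library actually loaded at runtime does not export, for example a stale library left over from an earlier build; that is a general interpretation, not a diagnosis confirmed in this thread. The sketch below decodes only the narrow form seen here (a nested name, optionally `const`, taking no arguments) to show how the encoding works; `demangle_simple` is a hypothetical helper, and `c++filt` remains the right tool in practice.

```python
import re

# Only handles: _Z N [K] <len><identifier>... E v
# (nested name, optional const qualifier, no parameters).
_PATTERN = re.compile(r"_ZN(K?)((?:\d+[A-Za-z_]\w*)+)Ev")


def demangle_simple(symbol):
    m = _PATTERN.fullmatch(symbol)
    if m is None:
        raise ValueError(f"unsupported mangled name: {symbol}")
    is_const, body = m.group(1) == "K", m.group(2)
    parts, i = [], 0
    while i < len(body):
        j = i
        while j < len(body) and body[j].isdigit():
            j += 1  # read the decimal length prefix
        if j == i:
            raise ValueError(f"expected a length prefix at offset {i}")
        n = int(body[i:j])
        part = body[j:j + n]  # the identifier is exactly n characters long
        if len(part) != n:
            raise ValueError("identifier shorter than its length prefix")
        parts.append(part)
        i = j + n
    return "::".join(parts) + "()" + (" const" if is_const else "")


print(demangle_simple(
    "_ZNK12tensorrt_llm8executor25SpeculativeDecodingConfig22getAcceptanceThresholdEv"
))  # tensorrt_llm::executor::SpeculativeDecodingConfig::getAcceptanceThreshold() const
```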
