GOSS boosting error on GPU H100 #6811

Open
SergeevVladislav opened this issue Feb 2, 2025 · 1 comment

SergeevVladislav commented Feb 2, 2025

Description

I encountered the following error while training a binary classification model with lightgbm 4.5.0 on an H100 GPU with device="cuda":

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/pywrapper_utils/run_thread/full_batch_run_thread.py", line 47, in _execute_user_function
    result = self.user_main_function(**kwargs)
  File "/opt/module/source/main.py", line 31, in main
    model.perform_all_calculations()
  File "/opt/module/source/model/feature_selector.py", line 61, in perform_all_calculations
    selected_features: List[Tuple] = self.select_features(base_model, kfold)
  File "/opt/module/source/model/feature_selector.py", line 84, in select_features
    model.fit(X_train, y_train)
  File "/tmp/.local/lib/python3.9/site-packages/lightgbm/sklearn.py", line 1284, in fit
    super().fit(
  File "/tmp/.local/lib/python3.9/site-packages/lightgbm/sklearn.py", line 955, in fit
    self._Booster = train(
  File "/tmp/.local/lib/python3.9/site-packages/lightgbm/engine.py", line 307, in train
    booster.update(fobj=fobj)
  File "/tmp/.local/lib/python3.9/site-packages/lightgbm/basic.py", line 4135, in update
    _safe_call(
  File "/tmp/.local/lib/python3.9/site-packages/lightgbm/basic.py", line 296, in _safe_call
    raise LightGBMError(_LIB.LGBM_GetLastError().decode("utf-8"))
lightgbm.basic.LightGBMError: [CUDA] invalid argument /tmp/pip-install-9rgzugd6/lightgbm_37941d8e64514c0e844ef71f72ef6b9c/src/boosting/goss.hpp 63

Environment info

python3.9
cuda 12.4
scikit-learn==1.6.1

Command(s) you used to install LightGBM

pip install lightgbm --config-settings=cmake.define.USE_CUDA=ON

jameslamb commented (Collaborator)

Thanks for using LightGBM.

Are you able to share a minimal, reproducible example? Or at least, the exact parameters you passed to LightGBM?

The LightGBM functions you use and the configuration you pass to them change which underlying code is called. Providing those details reduces the effort required to investigate this.
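
For illustration, a minimal reproducible example of the kind requested above might look like the sketch below. Everything in it is an assumption rather than the reporter's actual setup: the issue does not state the parameters, so `build_params` and `reproduce` are hypothetical helpers, the data is synthetic, and GOSS is assumed to have been enabled through `data_sample_strategy="goss"` only because the error points at `goss.hpp`.

```python
def build_params(device="cuda"):
    """Return hypothetical LightGBM parameters for a reproduction attempt.

    The real parameters are not given in the issue; these are placeholders.
    """
    return {
        "objective": "binary",
        # Assumption: GOSS sampling was enabled, since the error is raised
        # from src/boosting/goss.hpp. How it was enabled is not stated.
        "data_sample_strategy": "goss",
        "device": device,
        "verbosity": 1,
    }


def reproduce(n_rows=10_000, n_cols=20, seed=0, device="cuda"):
    """Fit a small binary classifier on synthetic data.

    Imports are local so the parameter helper can be inspected on a
    machine that has no CUDA-enabled LightGBM build installed.
    """
    import numpy as np
    import lightgbm as lgb

    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_rows, n_cols))
    # Noisy binary target driven by the first feature.
    y = (X[:, 0] + rng.normal(size=n_rows) > 0).astype(int)

    model = lgb.LGBMClassifier(n_estimators=10, **build_params(device))
    model.fit(X, y)
    return model


# On the affected machine, call reproduce() and include the exact
# parameters printed by build_params() in the report.
```

Running `reproduce(device="cpu")` with the same data and parameters would help show whether the failure is specific to the CUDA code path.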
