[python-package] remove hard dependency on 'scikit-learn', fix minimal runtime dependencies #5942

Merged
merged 1 commit into from
Jun 24, 2023

jameslamb
Collaborator

Today, pip install lightgbm absolutely requires that you're able to also install numpy, scipy, and scikit-learn (and therefore all of their recursive dependencies).

This PR proposes removing the hard requirement on scikit-learn, so that users who aren't using the lightgbm.sklearn interface don't have to pay the download time, disk usage, and incompatibility-risk cost of having to install scikit-learn.

scikit-learn is already used conditionally throughout the package...

"""sklearn"""
try:
    from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin

if not SKLEARN_INSTALLED:
    raise LightGBMError('scikit-learn is required for lightgbm.sklearn. '
                        'You must install scikit-learn and restart your session to use this module.')

... so this is only a change to the Python package's dependency metadata and doesn't require any changes to the library code.
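
For readers unfamiliar with the pattern, here is a minimal sketch of how an optional dependency can be detected at import time and enforced only when a feature needs it. This is an illustration, not LightGBM's actual code: `is_installed` and `require` are hypothetical helpers, and the sketch raises `ImportError` where the excerpt above raises `LightGBMError`.

```python
import importlib


def is_installed(module_name):
    """Return True if the module can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def require(module_name, feature):
    """Raise a descriptive error only when an optional module is actually needed."""
    if not is_installed(module_name):
        raise ImportError(
            f"{module_name} is required for {feature}. "
            "Install it and restart your session to use this feature."
        )


# The core package imports fine either way; only the optional feature is gated.
require("json", "the JSON example")  # stdlib module, so this does not raise
```

Because the check happens at the point of use, the rest of the package stays usable when the optional dependency is absent.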

While doing this, I also discovered that the changes from #5759 broke the required dependencies, such that pip install lightgbm wouldn't automatically install numpy and scipy if they weren't present. This PR fixes that as well.
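
To make the distinction between required and optional dependencies concrete, here is a rough sketch that splits requirement strings into hard dependencies and per-extra dependencies. The `split_requirements` helper and the sample strings are hypothetical and only illustrate the shape of the metadata; they are not copied from the package.

```python
import re

# Matches the `extra == "name"` environment marker used for optional dependencies.
_EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def split_requirements(requirements):
    """Split requirement strings into (hard_deps, {extra_name: [deps]}).

    Environment markers other than `extra` are dropped for brevity.
    """
    hard, extras = [], {}
    for req in requirements:
        spec, _, marker = req.partition(";")
        match = _EXTRA_RE.search(marker)
        if match:
            extras.setdefault(match.group(1), []).append(spec.strip())
        else:
            hard.append(spec.strip())
    return hard, extras


# Illustrative input only (an assumption, not the real metadata).
sample = [
    "numpy",
    "scipy",
    'scikit-learn; extra == "scikit-learn"',
]
hard, extras = split_requirements(sample)
print(hard)
print(extras)
```

For an installed distribution, `importlib.metadata.requires("lightgbm")` returns a list of strings in this form (or `None` if no requirements are declared), which could be passed to the helper.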

How I tested this

Built the wheel (using --nomp only to make compilation a little faster; it's irrelevant to this test):

docker run \
    --rm \
    -v $(pwd):/opt/lgb-build \
    -w /opt/lgb-build \
    python:3.11 \
    sh ./build-python.sh bdist_wheel --nomp

Confirmed that pip install lightgbm pulls in just numpy and scipy:

docker run \
    --rm \
    -v $(pwd):/opt/lgb-build \
    -w /opt/lgb-build \
    python:3.11 \
    /bin/bash -c "pip install --find-links=./dist 'lightgbm' && pip freeze"
lightgbm==3.3.5.99
numpy==1.25.0
scipy==1.10.1

Confirmed that pip install lightgbm[scikit-learn] works and pulls in numpy, scikit-learn, scipy, and all of scikit-learn's other dependencies:

docker run \
    --rm \
    -v $(pwd):/opt/lgb-build \
    -w /opt/lgb-build \
    -it python:3.11 \
    /bin/bash -c "pip install --find-links=./dist 'lightgbm[scikit-learn]' && pip freeze"
joblib==1.2.0
lightgbm==3.3.5.99
numpy==1.25.0
scikit-learn==1.2.2
scipy==1.10.1
threadpoolctl==3.1.0
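
The manual comparison of the two `pip freeze` outputs above could also be checked programmatically. The sketch below is one way to do that; `parse_freeze` is a hypothetical helper, and the two strings are the outputs shown above.

```python
def parse_freeze(text):
    """Parse `name==version` lines from `pip freeze` output into a dict."""
    packages = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep:  # skip blank lines and anything not in name==version form
            packages[name.lower()] = version
    return packages


FREEZE_DEFAULT = """\
lightgbm==3.3.5.99
numpy==1.25.0
scipy==1.10.1
"""

FREEZE_WITH_EXTRA = """\
joblib==1.2.0
lightgbm==3.3.5.99
numpy==1.25.0
scikit-learn==1.2.2
scipy==1.10.1
threadpoolctl==3.1.0
"""

default = parse_freeze(FREEZE_DEFAULT)
with_extra = parse_freeze(FREEZE_WITH_EXTRA)

# The plain install should contain only lightgbm, numpy, and scipy.
assert set(default) == {"lightgbm", "numpy", "scipy"}

# The extra should add scikit-learn and its own dependencies, nothing else.
added = sorted(set(with_extra) - set(default))
assert added == ["joblib", "scikit-learn", "threadpoolctl"]
```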

Notes for Reviewers

I called this breaking because anyone currently relying on pip install lightgbm to pull in scikit-learn (e.g. without specifying the scikit-learn dependency separately) will find their setup broken when upgrading to lightgbm 4.0. I think that's acceptable in exchange for making it possible to build a lighter-weight, less-risky-to-build Python environment using just lightgbm, numpy, and scipy.
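
For projects affected by this, the fix is to declare scikit-learn explicitly or install lightgbm[scikit-learn]. As a rough sketch of how such a project might fail fast with a clear message instead of hitting an error deep inside a training script, something like the following could run at startup. `missing_modules` and `check_environment` are hypothetical helpers, not part of lightgbm.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


def check_environment(module_names):
    """Raise early, with an actionable message, if any required module is absent."""
    missing = missing_modules(module_names)
    if missing:
        raise RuntimeError(
            "Missing required modules: " + ", ".join(missing) + ". "
            "Declare them explicitly in your project's dependencies."
        )


# Note that scikit-learn's import name is `sklearn`, so a real project would
# call check_environment(["lightgbm", "sklearn"]). A stdlib module is used
# here so the example runs anywhere.
check_environment(["json"])
```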

I got the idea for this by observing that xgboost does the same thing:

https://github.com/dmlc/xgboost/blob/54da4b31856625e9cca1848e1aa8ab8bf584e5fe/python-package/pyproject.toml#L30-L33

https://github.com/dmlc/xgboost/blob/54da4b31856625e9cca1848e1aa8ab8bf584e5fe/python-package/pyproject.toml#L41

@jameslamb jameslamb merged commit 9edea60 into master Jun 24, 2023
@jameslamb jameslamb deleted the scikit-learn-extra branch June 24, 2023 01:17
@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 27, 2023