Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python-package] prefix all internal object names with "_" #5313

Closed
42 tasks done
Tracked by #5153
jameslamb opened this issue Jun 21, 2022 · 12 comments · Fixed by #5947
Closed
42 tasks done
Tracked by #5153

[python-package] prefix all internal object names with "_" #5313

jameslamb opened this issue Jun 21, 2022 · 12 comments · Fixed by #5947

Comments

@jameslamb
Copy link
Collaborator

jameslamb commented Jun 21, 2022

Short Description

I'm opening this RFC to propose the following:

all objects in the Python package which are intended to be internal should have names beginning with _

The convention "anything starting with _ is internal" is common in Python programming. If LightGBM's Python package followed that convention consistently, it would simplify LightGBM's position on which parts of the Python package are considered "internal" and subject to breaking changes in any release.

Detailed Description

In pursuit of the v4.0.0 release (#5153), we have been accepting breaking changes which simplify the project and reduce its maintenance burden, so that limited maintainer time can be focused on providing a good experience with LightGBM's main functionality.

I think "all internal objects should begin with _" is one such type of change. Making it clearer which parts of the Python package are considered internal reduces the surface area of the public API which users need to consider, and increases contributors' flexibility to change implementations without breaking user code.

R packages support a "namespace" concept (see "Writing R extensions"), which clearly separates exported and internal objects, and I think that mechanism has been really helpful. The closest thing that Python has, to my knowledge, is the __all__ member for modules (https://docs.python.org/3/tutorial/modules.html#importing-from-a-package). __all__ controls the behavior of "star imports".

For example, if you run the following:

from lightgbm import *
dir()

the only LightGBM objects you'll see are those in the package's __all__.

This mechanism is a good start, but prefixing all internal objects with _ would be an even stronger signal about which things are internal. It would also improve the consistency of the Python package's code.

Specific Proposal

expand to see original proposal (click me)

I am proposing that the following be prefixed with a _ and treated as internal:

Considerations for Implementation

Check that any name being modified is only used at runtime and not part of the state for objects like Booster or LGBMClassifier which might be pickled (since pickle only stores references to imports and expects to be able to reference them when running load()).

Status: Accepted

@StrikerRUS @jmoralez @shiyu1994 @guolinke Whenever you have time, will you please comment here on whether or not you support this proposal? Thank you for your time and consideration.

This was originally opened as a request for comment, but it's been accepted and we're now accepting contributions to make more of the Python package's namespace private?

How to Contribute

Anyone is welcome to submit a pull request to help contribute to this effort.

Please limit pull requests to 1 function/method/module per pull request, to make them easier to review.

When you open pull requests, please put Contributes to #5313 in the description.

Add __all__ entries to modules.

Prefix objects, functions, class attributes with _

@jmoralez
Copy link
Collaborator

Thanks for writing this up, I support it. Are we also adding the __all__? I think it's good having both.

@jameslamb
Copy link
Collaborator Author

Are we also adding the __all__? I think it's good having both.

There is already a __all__ controlling what happens when you from lightgbm import *.

__all__ = ['Dataset', 'Booster', 'CVBooster', 'Sequence',
'register_logger',
'train', 'cv',
'LGBMModel', 'LGBMRegressor', 'LGBMClassifier', 'LGBMRanker',
'DaskLGBMRegressor', 'DaskLGBMClassifier', 'DaskLGBMRanker',
'log_evaluation', 'record_evaluation', 'reset_parameter', 'early_stopping',
'plot_importance', 'plot_split_value_histogram', 'plot_metric', 'plot_tree', 'create_tree_digraph']

Are you talking about also adding a __all__ to each module, for example one in python-package/lightgbm/sklearn.py that would limit what gets imported by from lightgbm.sklearn import *?

I don't know enough about this to know whether or not that would work. Like I'm not sure if __all__ as a concept is specific to __init__.py files. But I'd be interested in trying it!

@jmoralez
Copy link
Collaborator

jmoralez commented Jun 24, 2022

Yeah I meant at the module level, I believe it would be very useful for callback for example.

Edit: Adding reference

The public names defined by a module are determined by checking the module’s namespace for a variable named _all_

https://docs.python.org/3/reference/simple_stmts.html#import

Edit2:
On that same paragraph it says:

If _all_ is not defined, the set of public names includes all names found in the module’s namespace which do not begin with an underscore character ('_')

So we can set __all__ on the modules as a first step and that would avoid importing the current internal functions that aren't prefixed by _.

@jameslamb
Copy link
Collaborator Author

Ok nice! I hadn't thought of that. Totally support that idea. I think that new public classes and functions are rare enough in this project that the communication benefit of doing that is worth the slight additional maintenance burden.

@StrikerRUS
Copy link
Collaborator

I'm +1 for

all objects in the Python package which are intended to be internal should have names beginning with _

Also I'm proposing rethinking which properties of Dataset, Booster and other public classes should be public.
For example, I don't think that pandas_categorical or handle properties should be public.

self.pandas_categorical = None

self.handle = None

@guolinke
Copy link
Collaborator

I'm +1

@shiyu1994
Copy link
Collaborator

I'm +1 for this proposal! Thanks.

@jameslamb
Copy link
Collaborator Author

Ok thanks all! I'll update the description and convert this from an RFC to a regular maintenance issue, similar to #3756 .

@jameslamb jameslamb changed the title [RFC] [python] prefix all internal object names with "_" [python-package] prefix all internal object names with "_" Jul 14, 2022
@jameslamb
Copy link
Collaborator Author

I've updated the description and title to describe how to contribute on this initiative. @StrikerRUS @jmoralez feel free to edit it with any changes you'd like!

For example, I didn't list out any class attributes yet (but agree with the idea in #5313 (comment)).

@jameslamb
Copy link
Collaborator Author

I'm glad we originally opened this as a good first issue and that some contributors made their first PRs into the project working on it!

But I'm going to take this over and finish it. It's been open for 6 months, and I'd like to get these changes in for v4.0.0 (#5153).

@jameslamb
Copy link
Collaborator Author

jameslamb commented Dec 30, 2022

The last piece of this is @StrikerRUS 's proposal to also make some more of the class attributes for public classes private.

Here's my proposal for which class attributes I think we should make private in the v4.0.0 release:

  • Booster.handle
    • ctypes pointer to the Booster object on the C++ side. Keeping this private would allow changing the implementation for how that reference is managed
  • Booster.network
  • Dataset.handle
    • ctypes pointer to the Dataset object on the C++ side. Keeping this private would allow changing the implementation for how that reference is managed
  • Dataset.params_back_up
  • Dataset.need_slice
  • no changes to lightgbm.dask classes
  • no changes to lightgbm.sklearn classes
  • no changes to lightgbm.basic.CVBooster

@jmoralez @guolinke @StrikerRUS @shiyu1994 @btrotta what do you think?

@github-actions
Copy link

github-actions bot commented Oct 4, 2023

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
5 participants