Refactor: decythonize configuration space #321

Merged
merged 4 commits into from
Jun 13, 2023

Contributor

@eddiebergman eddiebergman commented Apr 28, 2023

Unreviewable - Just the summary to read

This PR essentially converts configuration_space.pyx and util.pyx into their equivalent .py files. According to our benchmark (see bottom), this came at no performance cost. It lets language servers, the "smarts" behind most editors, actually auto-complete, jump to definition, show docs, etc.

A benefit of making as much of the code as possible plain Python is that contributing becomes easier, for ourselves and for others. Tooling is getting smarter and better, but it does not interoperate with Cython.


Part of this also ended up being a refactor and cleanup of configuration_space.py into two separate files, namely configuration_space.py and configuration.py. I went through and cleaned up a lot of old-school string formatting, turned the error messages into reusable strings rather than copied ones, and made some small non-functional cleanups.


While going through this code, I noticed a lot of redundancy with respect to methods available and behaviours. For example, take a look at #265.

As both ConfigurationSpace and Configuration act as mappings from str -> Hyperparameter and str -> Value respectively, interfacing with them as mappings should be the preferred way to do so. Mappings are, by their interface, immutable, so this should remove unintended side effects while keeping behaviour similar to objects that people are familiar with.
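
As a rough illustration of why the mapping interface makes most of the accessor methods redundant, here is a minimal sketch using a hypothetical `ToySpace` class (an assumption for illustration, not the actual `ConfigurationSpace` implementation). Subclassing `collections.abc.Mapping` and defining `__getitem__`, `__iter__` and `__len__` provides `keys()`, `values()`, `items()`, `in` and `dict(...)` for free, and because no `__setitem__` is defined, item assignment fails.

```python
from collections.abc import Mapping
from typing import Iterator


class ToySpace(Mapping):
    """Hypothetical stand-in for a space mapping names to hyperparameters."""

    def __init__(self, hyperparameters: dict) -> None:
        # Copy so later changes to the caller's dict do not leak in
        self._hyperparameters = dict(hyperparameters)

    def __getitem__(self, name: str) -> object:
        return self._hyperparameters[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._hyperparameters)

    def __len__(self) -> int:
        return len(self._hyperparameters)


# Plain strings stand in for real hyperparameter objects here
space = ToySpace({"lr": "float in [1e-5, 1e-1]", "depth": "int in [1, 10]"})

names = list(space.keys())     # instead of a `get_hyperparameter_names`-style method
values = list(space.values())  # instead of a `get_hyperparameters`-style method
as_dict = dict(space)          # instead of a `get_hyperparameters_dict`-style method

try:
    space["lr"] = "something else"  # Mapping defines no __setitem__
    assignment_failed = False
except TypeError:
    assignment_failed = True
```

The three mixin-provided calls line up with the replacements suggested in the deprecation messages, which is the reason the dedicated getters can eventually go away.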

# ConfigurationSpace
    # ------------ Marked Deprecated --------------------
    # Probably best to only remove these once we actually
    # make some other breaking changes
    # * Search `Marked Deprecated` to find others

    def get_hyperparameter(self, name: str) -> Hyperparameter:
        """Hyperparameter from the space with a given name.

        Parameters
        ----------
        name : str
            Name of the searched hyperparameter

        Returns
        -------
        :ref:`Hyperparameters`
            Hyperparameter with the name ``name``
        """
        warnings.warn(
            "Prefer `space[name]` over `get_hyperparameter`",
            DeprecationWarning,
            stacklevel=2,
        )
        return self[name]

    def get_hyperparameters(self) -> list[Hyperparameter]:
        """All hyperparameters in the space.

        Returns
        -------
        list(:ref:`Hyperparameters`)
            A list with all hyperparameters stored in the configuration space object
        """
        warnings.warn(
            "Prefer using `list(space.values())` over `get_hyperparameters`",
            DeprecationWarning,
            stacklevel=2,
        )
        return list(self._hyperparameters.values())

    def get_hyperparameters_dict(self) -> dict[str, Hyperparameter]:
        """All the ``(name, Hyperparameter)`` contained in the space.

        Returns
        -------
        dict(str, :ref:`Hyperparameters`)
            An OrderedDict of names and hyperparameters
        """
        warnings.warn(
            "Prefer using `dict(space)` over `get_hyperparameters_dict`",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._hyperparameters.copy()

    def get_hyperparameter_names(self) -> list[str]:
        """Names of all the hyperparameter in the space.

        Returns
        -------
        list(str)
            List of hyperparameter names
        """
        warnings.warn(
            "Prefer using `list(space.keys())` over `get_hyperparameter_names`",
            DeprecationWarning,
            stacklevel=2,
        )
        return list(self._hyperparameters.keys())

    # ---------------------------------------------------
 
 
# Configuration 
    # ------------ Marked Deprecated --------------------
    # Probably best to only remove these once we actually
    # make some other breaking changes
    # * Search `Marked Deprecated` to find others
    def get_dictionary(self) -> dict[str, Any]:
        """A representation of the :class:`~ConfigSpace.configuration_space.Configuration`
        in dictionary form.

        Returns
        -------
        dict
            Configuration as dictionary
        """
        warnings.warn(
            "`Configuration` acts like a dictionary."
            " Please use `dict(config)` instead of `get_dictionary`"
            " if you explicitly need a `dict`",
            DeprecationWarning,
            stacklevel=2,
        )
        return dict(self)

    # ---------------------------------------------------
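
The deprecation pattern used above can be exercised in isolation. The following is a minimal sketch with a hypothetical `ToyConfig` class (an assumption for illustration, not ConfigSpace's `Configuration`): the old method warns and then delegates to the mapping interface, and `warnings.catch_warnings(record=True)` is used to confirm that the warning fires.

```python
import warnings
from collections.abc import Mapping


class ToyConfig(Mapping):
    """Hypothetical stand-in for a configuration; not ConfigSpace's class."""

    def __init__(self, values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def get_dictionary(self):
        # Deprecated shim: warn, then delegate to the mapping interface.
        # stacklevel=2 attributes the warning to the caller's line.
        warnings.warn(
            "Prefer `dict(config)` over `get_dictionary`",
            DeprecationWarning,
            stacklevel=2,
        )
        return dict(self)


config = ToyConfig({"lr": 0.01, "depth": 3})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    old_style = config.get_dictionary()

new_style = dict(config)  # same result, no warning
```

Running a test suite with `python -W error::DeprecationWarning` is one way to find remaining callers of the old methods, since each call then raises instead of warning.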

Beyond this point, the work was fixing up errors reported by mypy and flake8. I updated the stack to my preferred trio of black, ruff and mypy, which revealed some subtle bugs in the code (e.g. non-existent variables in edge-case loops) and some slight mis-typings. Since users can now utilize the types, I went through and fixed these.


Lastly, I enabled dependabot and @Neonkraft, you've been selected as tribute to hit merge on its mini PRs to update the GitHub workflows :)


Moving forward, I'm really not sure these benchmarks are enough to tackle any work on the actual de-cythonizing of the hyperparameters. In an ideal world, we simply move the expensive hot-loop of sampling to cython and have the rest of the class be normal ass python, but I'm not sure if we'll notice any degradation doing so.


Loose Benchmark dump

# Original
python scripts/benchmark_sampling.py
###
/home/skantify/code/ConfigSpace/test/test_searchspaces/auto-sklearn_2017_11_17.pcs
Average time sampling 100 configurations 0.02957932949066162
Average time retrieving a nearest neighbor 0.004668627050187853
Average time checking one configuration 0.000151949600314924

# Python ConfigSpace
python scripts/benchmark_sampling.py
###
/home/skantify/code/ConfigSpace/test/test_searchspaces/auto-sklearn_2017_11_17.pcs
Average time sampling 100 configurations 0.02950618267059326
Average time retrieving a nearest neighbor 0.004467318058013916
Average time checking one configuration 0.00014262521543179817

# Changes to configuration.py
python scripts/benchmark_sampling.py
###
/home/skantify/code/ConfigSpace/test/test_searchspaces/auto-sklearn_2017_11_17.pcs
Average time sampling 100 configurations 0.029507386684417724
Average time retrieving a nearest neighbor 0.004249808523390028
Average time checking one configuration 0.000141533725826923

# More cleanup in configuration.py
python scripts/benchmark_sampling.py
###
/home/skantify/code/ConfigSpace/test/test_searchspaces/auto-sklearn_2017_11_17.pcs
Average time sampling 100 configurations 0.029939210414886473
Average time retrieving a nearest neighbor 0.004236165417565239
Average time checking one configuration 0.0001415610185918961

# More or less final cleanup
python scripts/benchmark_sampling.py
###
/home/skantify/code/ConfigSpace/test/test_searchspaces/auto-sklearn_2017_11_17.pcs
Average time sampling 100 configurations 0.02384169101715088
Average time retrieving a nearest neighbor 0.0036894316143459742
Average time checking one configuration 0.00012484226634795653

# After updated tooling fixes
python scripts/benchmark_sampling.py
###
/home/skantify/code/ConfigSpace/test/test_searchspaces/auto-sklearn_2017_11_17.pcs
Average time sampling 100 configurations 0.02370922565460205
Average time retrieving a nearest neighbor 0.0036389772097269696
Average time checking one configuration 0.00012414814954135508
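
To put the dump in perspective, the sketch below compares the first ("Original") and last ("After updated tooling fixes") runs. The numbers are copied from the output above; the helper `relative_change` is just an illustrative name. These are single runs on one machine, so the differences may well be within noise, but all three timings come out roughly 18% to 22% lower in the last run.

```python
# Timings in seconds, copied from the first and last runs in the dump above
original = {
    "sampling 100 configurations": 0.02957932949066162,
    "retrieving a nearest neighbor": 0.004668627050187853,
    "checking one configuration": 0.000151949600314924,
}
final = {
    "sampling 100 configurations": 0.02370922565460205,
    "retrieving a nearest neighbor": 0.0036389772097269696,
    "checking one configuration": 0.00012414814954135508,
}


def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after`; negative means faster."""
    return (after - before) / before


changes = {name: relative_change(original[name], final[name]) for name in original}

for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```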

Contributor

@mfeurer mfeurer left a comment

Hey there. I just started having a look into this. Thanks a lot for cleaning the code! I was just wondering whether we should touch networkx at all since these are verbatim copies from the original code, and we don't intend to change anything in these files any more.

@eddiebergman
Contributor Author

The only thing touched was formatting. I don't think it matters much except not formatting it just means we need to add some linting rules to ignore it. I don't really mind either way.

Let me know your decision :)

@mfeurer
Contributor

mfeurer commented May 11, 2023

The only thing touched was formatting. I don't think it matters much except not formatting it just means we need to add some linting rules to ignore it. I don't really mind either way.

Then let's format them to make our workflow files easier.

Contributor

@mfeurer mfeurer left a comment

Very partial review number one.

@@ -153,7 +155,8 @@ def Float(
             log=log,
             meta=meta,
         )
-    elif isinstance(distribution, Normal):
+
+    if isinstance(distribution, Normal):
Contributor

Is this considered more readable?

Contributor Author

It's certainly a possible point of debate. I just went with ruff's default, which says the elif is unnecessary. There are cases where it is more readable in my opinion, but here I have no strong opinion.
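
For context, here is a minimal sketch of the style question, assuming the preceding branch in `Float` ends in a `return` (the truncated hunk does not show this). The function names below are hypothetical. When every earlier branch returns, `elif` and a fresh `if` behave identically, which is why the linter flags the `elif` as superfluous; this appears to correspond to ruff's `RET505` check.

```python
def describe_with_elif(value):
    # Style before the change: chained elif/else after returning branches
    if isinstance(value, bool):
        return "bool"
    elif isinstance(value, int):
        return "int"
    else:
        return "other"


def describe_without_elif(value):
    # Style after the change: each branch returns, so plain `if` suffices
    if isinstance(value, bool):
        return "bool"

    if isinstance(value, int):
        return "int"

    return "other"


samples = [True, 3, "x", 2.5]
results_elif = [describe_with_elif(v) for v in samples]
results_plain = [describe_without_elif(v) for v in samples]
```

Both versions return the same results for every input, so the choice is purely about readability.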

Development

Successfully merging this pull request may close these issues.

ConfigSpace numpy issue
2 participants