Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce Circular Imports with pandas.core.reshape.concat #29133

Closed
wants to merge 11 commits into from

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Oct 21, 2019

While working on #29124 I found a lot of general uses for this, but had to keep importing in the function body to prevent circular imports. By switching the isinstance checks to using ABC classes and moving the import of Index objects those circular imports can mostly be done away with

The only thing left that was causing circular imports was in the Categorical space. Can try to figure that out later, though cc @TomAugspurger in case that's a path that has already been gone down

@WillAyd WillAyd added the Refactor Internal refactoring of code label Oct 21, 2019
if isinstance(sample, Series):
# TODO: Should this really require a class import?
"""
if isinstance(sample, ABCSeries):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is maybe a relic of the old panel days but as of now it seems a little overkill to import the generic class definition to resolve "index" and "columns" to their respective axis numbers. For now I just did this in the function body directly below commented code, but this probably belongs somewhere else as a utility function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my intuition is that you're right here. A lot of _axis_reversed stuff feels leftover

@@ -7134,7 +7133,6 @@ def _join_compat(
self, other, on=None, how="left", lsuffix="", rsuffix="", sort=False
):
from pandas.core.reshape.merge import merge
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the merge import also be moved up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was hoping so but doesn't appear to be the case. I think its intertwined with some of the Categorical stuff so might make sense if / when I can resolve that space

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great.

Categorical is used in a lot of places where I think/hope a less high-powered tool could be used, and doing so couple simplify the dependency structure of the code base quite a bit.

@jbrockmendel
Copy link
Member

pending green, LGTM

@TomAugspurger
Copy link
Contributor

Can try to figure that out later, though cc @TomAugspurger in case that's a path that has already been gone down

I don't recall anyone trying it before.

axis = DataFrame._get_axis_number(axis)
else:
axis = sample._get_axis_number(axis)
"""
if not isinstance(axis, int):
axis = {"index": 0, "columns": 1}[axis]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you need the alias "rows" here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha yep - nice little surprise there. Actually taking a look at the axis handling atm; might clean up and do as a pre-cursor.

Copy link
Member Author

@WillAyd WillAyd Oct 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to sit on this PR for a bit. Somewhat of a rabbit hole but I think will make sense to get traction on #29140 and simplify the axis handling instead of doing this as a one-off, especially since there's not much of a rush here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is fixing this more complicated than just adding the "rows" alias to that dict? runtime imports of concat are a non-trivial pain point for me when trying to reason about code, so i'd like to see this go through

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No that's it. I was hoping to do a more general utility function for axis validation (see #29140) but I can just add this here as well if you find it helpful. Not the worst cruft...

@WillAyd
Copy link
Member Author

WillAyd commented Oct 23, 2019

FYI not sure if anyone has seen this but here are the official Python docs on dealing with circular imports:

https://docs.python.org/3/faq/programming.html#how-can-i-have-modules-that-mutually-import-each-other

We tend to violate Guido's recommendations a lot (ex: doing from pandas.core.base import PandasObject instead of import pandas.core.base as base ... base.PandasObject but also defining class variables that aren't built ins or constant values.

Something to consider...

@jbrockmendel
Copy link
Member

@WillAyd can you re-push. For some reason re-starting azure builds doesn't always take

@jbrockmendel
Copy link
Member

needs rebase. the azure failure hopefully unrelated

@WillAyd
Copy link
Member Author

WillAyd commented Nov 2, 2019

Yea still looking at this locally but trying to get a more comprehensive fix in, which could include categorical. The problem is that this module can spur a lot of imports from the index area which import from pandas.core.base which import from pandas.core.indexes ... so a lot of the circular imports I think can be traced back to that. Might have a few pre-cursor PRs to make this a full fledged change

@jbrockmendel
Copy link
Member

which could include categorical.

that would be really nice if you can pull it off

@jbrockmendel
Copy link
Member

which could include categorical.

I'm looking at some related refactoring*, wondering what you have in mind for Categorical, in particular if .categories would cease to be an Index.

* Roughly, trying to get core.algorithms, core.nanops, and maybe core.missing to not depend on Index/Series etal

@WillAyd
Copy link
Member Author

WillAyd commented Nov 3, 2019

I'm looking at some related refactoring*, wondering what you have in mind for Categorical, in particular if .categories would cease to be an Index.

Nothing concrete. Categorical might even be a red herring and the import of ExtensionArray from pandas.core.arrays really be the problem, as pandas.core.arrays imports things like Categorical before base is fully defined

I'm looking at this off and on so if it's something you are fully interested I certainly don't mind if you are motived to take over some aspects of this

@WillAyd
Copy link
Member Author

WillAyd commented Nov 7, 2019

Closing to clear queue; will pick back up when I get a little more free time

@WillAyd WillAyd closed this Nov 7, 2019
@WillAyd WillAyd deleted the circ-import-fix branch January 16, 2020 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants