BUG: IntervalIndex intersections are not intersections in many cases #38743

Closed
3 tasks done
ivirshup opened this issue Dec 28, 2020 · 0 comments · Fixed by #38834
Labels
Bug, Interval
ivirshup commented Dec 28, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas._testing as tm

idx_unique = tm.makeIntervalIndex(k=5)
idx_non_unique = idx_unique[[0, 0, 1, 2, 3, 4]]

assert idx_unique.intersection(idx_non_unique).equals(idx_non_unique)
assert idx_non_unique.intersection(idx_unique).equals(idx_non_unique)
# As an implementation note:
assert idx_non_unique._intersection_non_unique(idx_unique).equals(idx_non_unique)

All of these assertions currently pass, but they should probably fail.

Problem description

Intersections on IntervalIndexes should not repeat duplicated elements, similar to all other Index types.
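
For readers without access to the private `pandas._testing` helpers, the comparison can be sketched with public constructors only. This is a minimal illustration, not the reporter's code: `pd.interval_range` stands in for `tm.makeIntervalIndex`, and all variable names are made up for the example. On a pandas build affected by this bug the IntervalIndex result is expected to contain the duplicate; on builds that include the fix it should be unique, so the printed values depend on the installed version.

```python
import pandas as pd

# Five unit-width intervals: (0, 1], (1, 2], ..., (4, 5].
# Assumption: interval_range is used here in place of tm.makeIntervalIndex.
interval_unique = pd.interval_range(start=0, end=5)
# Repeat the first interval so the index contains a duplicate.
interval_non_unique = interval_unique.take([0, 0, 1, 2, 3, 4])

# The same shape of data on a plain integer Index, for comparison.
plain_unique = pd.Index([0, 1, 2, 3, 4])
plain_non_unique = plain_unique.take([0, 0, 1, 2, 3, 4])

plain_result = plain_unique.intersection(plain_non_unique)
interval_result = interval_unique.intersection(interval_non_unique)

# Report whether each intersection dropped the repeated element.
print("plain Index result unique:  ", plain_result.is_unique)
print("IntervalIndex result unique:", interval_result.is_unique)
print("IntervalIndex result length:", len(interval_result))
```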

I came across this while writing tests for #38654

For example, if we defined a test:

import pytest

import pandas._testing as tm


def check_intersection_commutative(left, right):
    assert left.intersection(right).equals(right.intersection(left))


@pytest.mark.parametrize("index_maker", tm.index_subclass_makers_generator())
def test_intersection_uniqueness(index_maker):
    if index_maker is tm.makeMultiIndex:
        pytest.skip("This index maker is bugged")
    ind_unique = index_maker(k=5)
    ind_non_unique = ind_unique[[0, 0, 1, 2, 3, 4]]

    check_intersection_commutative(ind_unique, ind_non_unique)
    assert ind_unique.intersection(ind_non_unique).is_unique

Only IntervalIndex would fail (not including the bugged makeMultiIndex).
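
The same two properties (commutativity and uniqueness of the result) can also be checked outside of pytest. The sketch below is an assumption-laden variant of the test above: `intersection_report` is a hypothetical helper, and the three candidate indexes are built with public constructors rather than `tm.index_subclass_makers_generator()`, so it covers far fewer index types than the proposed test would.

```python
import pandas as pd


def intersection_report(unique_index):
    """Return (is_commutative, is_unique) for a 5-element unique index
    intersected with a copy of itself that repeats the first element."""
    non_unique = unique_index.take([0, 0, 1, 2, 3, 4])
    forward = unique_index.intersection(non_unique)
    backward = non_unique.intersection(unique_index)
    is_commutative = forward.equals(backward)
    is_unique = forward.is_unique and backward.is_unique
    return is_commutative, is_unique


# Hypothetical stand-ins for the private index makers used in the issue.
candidates = {
    "int": pd.Index([0, 1, 2, 3, 4]),
    "datetime": pd.date_range("2020-01-01", periods=5),
    "interval": pd.interval_range(start=0, end=5),
}

for name, index in candidates.items():
    print(name, intersection_report(index))
```

A pair other than `(True, True)` for any entry would indicate the behaviour described in this issue for that index type.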

Expected Output

All of the assertions in the original example should fail.

Output of pd.show_versions()

dev environment
INSTALLED VERSIONS
------------------
commit           : 9f1a41dee0e9167361d9f435b4c3d499d01e415a
python           : 3.8.5.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Tue Nov 10 00:10:30 PST 2020; root:xnu-6153.141.10~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.3.0.dev0+210.g9f1a41dee0.dirty
numpy            : 1.18.5
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 49.2.0.post20200712
Cython           : 0.29.21
pytest           : 5.4.3
hypothesis       : 5.20.3
sphinx           : 3.1.1
blosc            : None
feather          : None
xlsxwriter       : 1.2.9
lxml.etree       : 4.5.2
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.16.1
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : 1.3.2
fsspec           : 0.7.4
fastparquet      : 0.4.1
gcsfs            : 0.6.2
matplotlib       : 3.2.2
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.4
pandas_gbq       : None
pyarrow          : 0.17.1
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.5.1
sqlalchemy       : 1.3.18
tables           : 3.6.1
tabulate         : 0.8.7
xarray           : 0.16.0
xlrd             : 1.2.0
xlwt             : 1.3.0
numba            : 0.50.1
pandas 1.1.5 environment
INSTALLED VERSIONS
------------------
commit           : b5958ee1999e9aead1938c0bba2b674378807b3d
python           : 3.8.6.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Tue Nov 10 00:10:30 PST 2020; root:xnu-6153.141.10~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.5
numpy            : 1.19.4
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.3.3
setuptools       : 51.0.0
Cython           : 0.29.21
pytest           : 6.2.1
hypothesis       : None
sphinx           : 3.3.1
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.19.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 0.8.5
fastparquet      : 0.4.1
gcsfs            : None
matplotlib       : 3.3.3
numexpr          : 2.7.1
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 2.0.0
pytables         : None
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.5.4
sqlalchemy       : 1.3.18
tables           : 3.6.1
tabulate         : 0.8.7
xarray           : 0.16.2
xlrd             : 1.2.0
xlwt             : None
numba            : 0.52.0
@ivirshup added the Bug and Needs Triage labels on Dec 28, 2020
@rhshadrach added the Index label and removed the Needs Triage label on Dec 28, 2020
@rhshadrach added this to the Contributions Welcome milestone on Dec 28, 2020
@jreback modified the milestones: Contributions Welcome, 1.3 on Dec 30, 2020
@jreback added the Interval label and removed the Index label on Dec 30, 2020