TST: #15752 Add test_drop_duplicates for Categorical dtypes #18072

Merged
merged 6 commits into from
Nov 6, 2017

Conversation

jamestran201

@jamestran201 jamestran201 commented Nov 2, 2017

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks

pep8speaks commented Nov 2, 2017

Hello @tmnhat2001! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 06, 2017 at 02:47 Hours UTC

@jamestran201 jamestran201 reopened this Nov 2, 2017
@codecov

codecov bot commented Nov 2, 2017

Codecov Report

Merging #18072 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18072      +/-   ##
==========================================
- Coverage   91.25%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50120    50120              
==========================================
- Hits        45737    45727      -10     
- Misses       4383     4393      +10
Flag Coverage Δ
#multiple 89.04% <ø> (-0.01%) ⬇️
#single 40.32% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.41% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15fa4bd...289ad35. Read the comment docs.

@codecov

codecov bot commented Nov 2, 2017

Codecov Report

Merging #18072 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18072      +/-   ##
==========================================
+ Coverage   91.25%   91.26%   +<.01%     
==========================================
  Files         163      163              
  Lines       50120    50122       +2     
==========================================
+ Hits        45737    45743       +6     
+ Misses       4383     4379       -4
Flag Coverage Δ
#multiple 89.07% <ø> (+0.02%) ⬆️
#single 40.32% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️
pandas/io/parsers.py 95.5% <0%> (-0.01%) ⬇️
pandas/core/indexes/timedeltas.py 91.19% <0%> (ø) ⬆️
pandas/core/indexes/category.py 97.46% <0%> (ø) ⬆️
pandas/core/reshape/merge.py 94.26% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.51% <0%> (ø) ⬆️
pandas/core/indexes/period.py 92.87% <0%> (+0.19%) ⬆️
pandas/plotting/_converter.py 65.2% <0%> (+1.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15fa4bd...f9c65c7. Read the comment docs.

@jamestran201 jamestran201 reopened this Nov 2, 2017
@pytest.mark.parametrize(
"input1, input2, cat_array",
[
(
Contributor

Instead of duplicating code like this, you can stack multiple parametrize decorators.
Just parametrize on the dtype and construct the arrays inside the test.
Something like:

@pytest.mark.parametrize(
    "dtype", ['int_', ....])
def test_drop_duplicates_non_bool(self, dtype):
    input1 = np.array([1, 2, 3, 3], dtype=dtype)
    ....

this will work for datetimes and timedeltas as well (use 'M8' and 'm8')
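
For readers following along, here is a minimal sketch of what that suggestion could look like. It is not the test that was merged: the dtype list, the function name, and the use of `pd.testing.assert_series_equal` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import pytest


# The dtype list below is illustrative, not the list used in the PR.
@pytest.mark.parametrize("dtype", ["int64", "uint64", "float64", "timedelta64[h]"])
def test_drop_duplicates_non_bool(dtype):
    # Build categories and input from the same integers, cast to the dtype under test.
    cat_array = np.array([1, 2, 3, 4, 5], dtype=np.dtype(dtype))
    input1 = np.array([1, 2, 3, 3], dtype=np.dtype(dtype))
    ser = pd.Series(pd.Categorical(input1, categories=cat_array, ordered=True))

    # Only the last element repeats an earlier value.
    expected = pd.Series([False, False, False, True])
    pd.testing.assert_series_equal(ser.duplicated(), expected)
    pd.testing.assert_series_equal(ser.drop_duplicates(), ser[~expected])
```

One test body then covers every dtype, and adding a new dtype is a one-line change to the parameter list.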

@jreback jreback added the Categorical and Testing labels Nov 2, 2017
@jamestran201
Author

Thanks, that makes the test a lot cleaner. While making changes to the test, I noticed an issue when creating a Series from a datetime64[D] Categorical: every value comes back as NaT.

>>> import pandas as pd
>>> import numpy as np
>>> cat_array = np.array([1, 2, 3, 4, 5], dtype=np.dtype("datetime64[D]"))
>>> cat_array
array(['1970-01-02', '1970-01-03', '1970-01-04', '1970-01-05', '1970-01-06'], dtype='datetime64[D]')
>>> input1 = np.array([1, 2, 3, 3], dtype=np.dtype("datetime64[D]"))
>>> input1
array(['1970-01-02', '1970-01-03', '1970-01-04', '1970-01-04'], dtype='datetime64[D]')
>>> tc1 = pd.Series(pd.Categorical(input1, categories=cat_array, ordered=True))
>>> tc1
0   NaT
1   NaT
2   NaT
3   NaT
dtype: category
Categories (5, datetime64[ns]): [1970-01-02 < 1970-01-03 < 1970-01-04 < 1970-01-05 < 1970-01-06]
pd.show_versions():
INSTALLED VERSIONS
------------------
commit: e1dc3c9
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0.dev0+41.ge1dc3c9
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.6.0
Cython: 0.27.2
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Nov 3, 2017

@tmnhat2001 hmm that looks like a bug.

Can you mark that as xfail? Then we can merge this and address it in a follow-up.

@jreback
Contributor

jreback commented Nov 3, 2017

@tmnhat2001 I think this is the issue: #7996

@jamestran201
Author

Yeah, that looks like it. There's no issue if datetime64[h] is used. Can we use that instead?
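
A small sketch for comparing the two units side by side, assuming the same arrays as the session above. The helper name `count_missing` is hypothetical, and no expected output is given because the result depends on the pandas version in use.

```python
import numpy as np
import pandas as pd


def count_missing(unit):
    """Count NaT entries after building a categorical Series at the given datetime64 unit."""
    dtype = np.dtype(f"datetime64[{unit}]")
    categories = np.array([1, 2, 3, 4, 5], dtype=dtype)
    values = np.array([1, 2, 3, 3], dtype=dtype)
    ser = pd.Series(pd.Categorical(values, categories=categories, ordered=True))
    return int(ser.isna().sum())


# On the version reported above, "D" produced four NaT values; other versions may differ.
for unit in ("D", "h"):
    print(unit, count_missing(unit))
```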

@@ -797,6 +797,96 @@ def test_set_categories_inplace(self):
cat.set_categories(['a', 'b', 'c', 'd'], inplace=True)
tm.assert_index_equal(cat.categories, pd.Index(['a', 'b', 'c', 'd']))

@pytest.mark.parametrize(
Contributor

can you add an xfail with datetime64[D], something like this

"timedelta64[h]", pytest.param('datetime64[D]', marks=pytest.mark.xfail(
                         reason='write the issue number'))
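
Spelled out in full, the suggestion might look like the sketch below. The dtype list and function name are illustrative, and using "GH 7996" as the reason is an assumption based on the issue linked earlier in this thread, not necessarily what was committed.

```python
import numpy as np
import pandas as pd
import pytest

# "GH 7996" is assumed from the issue referenced in this conversation.
DTYPES = [
    "int64",
    "timedelta64[h]",
    pytest.param("datetime64[D]", marks=pytest.mark.xfail(reason="GH 7996")),
]


@pytest.mark.parametrize("dtype", DTYPES)
def test_duplicated_categorical(dtype):
    cat_array = np.array([1, 2, 3, 4, 5], dtype=np.dtype(dtype))
    input1 = np.array([1, 2, 3, 3], dtype=np.dtype(dtype))
    ser = pd.Series(pd.Categorical(input1, categories=cat_array, ordered=True))
    # Only the last element repeats an earlier value.
    assert ser.duplicated().tolist() == [False, False, False, True]
```

Wrapping a single value in `pytest.param` attaches the mark to that case only, so the other dtypes still have to pass. By default `xfail` is non-strict, so if the underlying bug is fixed the case is reported as XPASS rather than breaking the suite.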

Author

yup, just added it

@jreback jreback added this to the 0.22.0 milestone Nov 6, 2017
@jreback jreback merged commit 9c49e64 into pandas-dev:master Nov 6, 2017
@jreback
Contributor

jreback commented Nov 6, 2017

thanks @tmnhat2001

@jreback
Contributor

jreback commented Nov 6, 2017

We'd certainly take PRs for the fix for the xfail here, or for any other issues you'd like to work on!

@jamestran201
Author

thanks for helping!

@jamestran201 jamestran201 deleted the issue15752-categorical branch November 7, 2017 04:01