TST: #15752 Add test_drop_duplicates for Categorical dtypes #18072

jamestran201 · 2017-11-02T03:54:27Z

tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2017-11-02T03:54:30Z

Hello @tmnhat2001! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 06, 2017 at 02:47 Hours UTC

codecov · 2017-11-02T10:35:13Z

Codecov Report

Merging #18072 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18072      +/-   ##
==========================================
- Coverage   91.25%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50120    50120              
==========================================
- Hits        45737    45727      -10     
- Misses       4383     4393      +10

Flag	Coverage Δ
#multiple	`89.04% <ø> (-0.01%)`	⬇️
#single	`40.32% <ø> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.75% <0%> (-0.1%)`	⬇️
pandas/core/indexes/datetimes.py	`95.41% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15fa4bd...289ad35. Read the comment docs.

codecov · 2017-11-02T10:35:30Z

Codecov Report

Merging #18072 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18072      +/-   ##
==========================================
+ Coverage   91.25%   91.26%   +<.01%     
==========================================
  Files         163      163              
  Lines       50120    50122       +2     
==========================================
+ Hits        45737    45743       +6     
+ Misses       4383     4379       -4

Flag	Coverage Δ
#multiple	`89.07% <ø> (+0.02%)`	⬆️
#single	`40.32% <ø> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.75% <0%> (-0.1%)`	⬇️
pandas/io/parsers.py	`95.5% <0%> (-0.01%)`	⬇️
pandas/core/indexes/timedeltas.py	`91.19% <0%> (ø)`	⬆️
pandas/core/indexes/category.py	`97.46% <0%> (ø)`	⬆️
pandas/core/reshape/merge.py	`94.26% <0%> (ø)`	⬆️
pandas/core/indexes/datetimes.py	`95.51% <0%> (ø)`	⬆️
pandas/core/indexes/period.py	`92.87% <0%> (+0.19%)`	⬆️
pandas/plotting/_converter.py	`65.2% <0%> (+1.81%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15fa4bd...f9c65c7. Read the comment docs.

jreback · 2017-11-02T11:38:12Z

pandas/tests/test_categorical.py

+    @pytest.mark.parametrize(
+        "input1, input2, cat_array",
+        [
+            (


so instead of duplicating code like this you can use multiple parametrize
just parametrize on the dtype and construct inside
something like

@pytest.mark.parametrize("dtype", ['int_', ....])
def test_drop_duplicates_non_bool(self, dtype):
    input1 = np.array([1, 2, 3, 3], dtype=dtype)
    ....

this will work for datetimes and timedeltas as well (use 'M8' and 'm8')
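
A minimal sketch of the suggestion above, assuming a standalone test function rather than the method in pandas/tests/test_categorical.py. The dtype list and the exact assertions are illustrative choices, not the ones used in this PR; the input and category arrays mirror the values shown in the thread.

```python
import numpy as np
import pandas as pd
import pytest


# Hypothetical dtype list for illustration; the thread elides the real one.
@pytest.mark.parametrize("dtype", ["int64", "float64"])
def test_drop_duplicates_non_bool(dtype):
    # Build the inputs inside the test from the parametrized dtype,
    # instead of repeating one hand-written case per dtype.
    cat_array = np.array([1, 2, 3, 4, 5], dtype=np.dtype(dtype))
    input1 = np.array([1, 2, 3, 3], dtype=np.dtype(dtype))
    tc1 = pd.Series(pd.Categorical(input1, categories=cat_array))

    # Only the last element repeats an earlier value.
    expected = pd.Series([False, False, False, True])
    pd.testing.assert_series_equal(tc1.duplicated(), expected)
    pd.testing.assert_series_equal(tc1.drop_duplicates(), tc1[~expected])
```

Adding another dtype then means adding one string to the list rather than another copy of the test body.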

jamestran201 · 2017-11-03T03:01:17Z

Thanks, that makes the test a lot cleaner. While making changes to the test, I noticed an issue with creating a series of datetime64[D] categorical.

>>> import pandas as pd
>>> import numpy as np
>>> cat_array = np.array([1, 2, 3, 4, 5], dtype=np.dtype("datetime64[D]"))
>>> cat_array
array(['1970-01-02', '1970-01-03', '1970-01-04', '1970-01-05', '1970-01-06'], dtype='datetime64[D]')
>>> input1 = np.array([1, 2, 3, 3], dtype=np.dtype("datetime64[D]"))
>>> input1
array(['1970-01-02', '1970-01-03', '1970-01-04', '1970-01-04'], dtype='datetime64[D]')
>>> tc1 = pd.Series(pd.Categorical(input1, categories=cat_array, ordered=True))
>>> tc1
0   NaT
1   NaT
2   NaT
3   NaT
dtype: category
Categories (5, datetime64[ns]): [1970-01-02 < 1970-01-03 < 1970-01-04 < 1970-01-05 < 1970-01-06]

pd.show_versions():

INSTALLED VERSIONS
------------------
commit: e1dc3c9
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0.dev0+41.ge1dc3c9
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.6.0
Cython: 0.27.2
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
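
One thing visible in the output above is that the inputs are datetime64[D] while the categories are reported as datetime64[ns]. The NumPy-only sketch below shows how the same dates are stored under the two units. Whether this unit difference is what produces the NaT values is an assumption here, not something the thread confirms.

```python
import numpy as np

# Same construction as in the report: integers interpreted as days since epoch.
days = np.array([1, 2, 3, 3], dtype="datetime64[D]")

# The same dates expressed in nanoseconds, the unit shown for the categories.
nanos = days.astype("datetime64[ns]")

# NumPy treats the two arrays as equal dates when compared element-wise...
same_dates = bool((days == nanos).all())

# ...but the stored integers differ: day counts versus nanosecond counts.
day_ints = days.view("int64")
nano_ints = nanos.view("int64")
```

If a lookup compared the stored integers without aligning units first, none of the values would match their category, which would be consistent with the all-NaT result.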

jreback · 2017-11-03T10:28:57Z

@tmnhat2001 hmm that looks like a bug.

can you mark that xfail, then we can merge this and address it in a followup.

jreback · 2017-11-03T12:58:54Z

@tmnhat2001 I think this is the issue: #7996

jamestran201 · 2017-11-04T00:26:13Z

Yeah, that looks like it. There's no issue if datetime64[h] is used. Can we use that instead?

jreback · 2017-11-04T17:10:46Z

pandas/tests/test_categorical.py

@@ -797,6 +797,96 @@ def test_set_categories_inplace(self):
        cat.set_categories(['a', 'b', 'c', 'd'], inplace=True)
        tm.assert_index_equal(cat.categories, pd.Index(['a', 'b', 'c', 'd']))

+    @pytest.mark.parametrize(


can you add an xfail with datetime64[D], something like this

"timedelta64[h]",
pytest.param('datetime64[D]',
             marks=pytest.mark.xfail(reason='write the issue number'))

yup, just added it
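
For context, a small sketch of how such a pytest.param entry sits inside a parametrize list. The list name DTYPES, the test name, and the test body are hypothetical stand-ins, and the reason string is a placeholder rather than the issue reference used in the PR.

```python
import numpy as np
import pytest

# Hypothetical parameter list: one plain entry and one entry marked xfail.
DTYPES = [
    "timedelta64[h]",
    pytest.param(
        "datetime64[D]",
        marks=pytest.mark.xfail(reason="placeholder: issue number goes here"),
    ),
]


@pytest.mark.parametrize("dtype", DTYPES)
def test_dtype_is_time_like(dtype):
    # Stand-in body: the real test would build a Categorical from this dtype.
    assert np.dtype(dtype).kind in ("m", "M")
```

Only the marked parameter is reported as an expected failure; the other parameters in the list run as ordinary test cases.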

jreback · 2017-11-06T13:35:31Z

thanks @tmnhat2001

jreback · 2017-11-06T13:35:54Z

we'd certainly take PRs for the xfail fix here, or for any other issues you'd like to work on!

jamestran201 · 2017-11-07T03:16:48Z

thanks for helping!

Add test_drop_duplicates for Categorical dtypes

e3cdb6e

Fix PEP8 issue

289ad35

jamestran201 closed this Nov 2, 2017

jamestran201 reopened this Nov 2, 2017

jamestran201 closed this Nov 2, 2017

jamestran201 reopened this Nov 2, 2017

jreback requested changes Nov 2, 2017

jreback added the Categorical and Testing labels Nov 2, 2017

Parameterize dtypes

e1dc3c9

Fix PEP8 issue

20c41e3

Fix PEP8 issue

7b7e577

jreback reviewed Nov 4, 2017

Use datetime64[D] and mark it as xfail

f9c65c7

jreback added this to the 0.22.0 milestone Nov 6, 2017

jreback approved these changes Nov 6, 2017

jreback merged commit 9c49e64 into pandas-dev:master Nov 6, 2017

jamestran201 deleted the issue15752-categorical branch November 7, 2017 04:01

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

TST: pandas-dev#15752 Add test_drop_duplicates for Categorical dtypes (…

c94e5f0

…pandas-dev#18072)
