Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add Index.fillna #11343

Merged
merged 1 commit into from
Nov 13, 2015
Merged

ENH: Add Index.fillna #11343

merged 1 commit into from
Nov 13, 2015

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Oct 16, 2015

Closes #10089.

  • value can only accept scalar.
  • MultiIndex.fillna raises NotImplementedError because isnull is not defined for MI.
  • Moved hasnans and related properties to base Index.

@sinhrks sinhrks added Enhancement Indexing Related to indexing on series/frames, not to indexes themselves Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Oct 16, 2015
@sinhrks sinhrks added this to the 0.17.1 milestone Oct 16, 2015
if len(index) == 0:
pass
elif isinstance(index, MultiIndex):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert NotImplementedError here

@sinhrks
Copy link
Member Author

sinhrks commented Oct 16, 2015

Thanks, did the suggested changes.

A test fails in Series.fillna which uses DatetimeIndex.putmask. Looks #11329 fixes the same part, I'm leaving it now.

@sinhrks sinhrks changed the title ENH: Add Index.fillna (WIP) ENH: Add Index.fillna Oct 16, 2015
@sinhrks sinhrks force-pushed the idx_fillna branch 2 times, most recently from 163b6cc to c08ac92 Compare October 17, 2015 23:24
@sinhrks
Copy link
Member Author

sinhrks commented Oct 18, 2015

Rebased, and now green.

@sinhrks sinhrks changed the title (WIP) ENH: Add Index.fillna ENH: Add Index.fillna Oct 18, 2015
try:
np.putmask(values, mask, value)
return self._shallow_copy(values)
except ValueError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if this catches TypeError as well it can be much more generic

@sinhrks
Copy link
Member Author

sinhrks commented Oct 25, 2015

Thanks, simplified the logic based on the comments.

return np.array([], dtype=np.int64)

@cache_readonly
def hasnans(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is override core/base.py, but do we use that anylonger? (I don't think Series uses this)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't notice base.py. Should remove it from base.py because Series can be changed inplace?

--------
numpy.ndarray.putmask
"""
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this any longer once you call _assert_can_do_op in Index.putmask

@sinhrks sinhrks force-pushed the idx_fillna branch 4 times, most recently from 687ce6a to bf1f842 Compare October 31, 2015 01:14
@sinhrks sinhrks changed the title ENH: Add Index.fillna (WIP)ENH: Add Index.fillna Oct 31, 2015
@sinhrks sinhrks force-pushed the idx_fillna branch 2 times, most recently from 6ea192b to b20820a Compare October 31, 2015 03:16
@sinhrks sinhrks changed the title (WIP)ENH: Add Index.fillna ENH: Add Index.fillna Oct 31, 2015
@sinhrks
Copy link
Member Author

sinhrks commented Oct 31, 2015

Thanks, updated based on the comments.

  • _convert_for_op converts value for scalar op
  • assert_can_do_op check whether the value is acceptable for scalar op
    • timezone check is split to different method. Because it is used in list-likes such as equals.
  • Some changes related to output detailed error messages / keep current messages


if isinstance(item, np.datetime64):
# np.datetime64 doesn't have tz, once convert to Timestamp
item = Timestamp(item)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this seems to be not necessary?

e.g. you are checking if its np.datetime64 below so it will never be the case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, needs change. This is for current _has_same_tz cannot get tz from np.datetime64.

https://github.com/pydata/pandas/pull/11343/files#diff-23ecb29e7ceba52109a365e447400d2eR562

  1. move the condition to _has_same_tz ? ( convert np.datetime64 to Timestamp).
  2. If we keep 1594th line, 1598th line can be if isinstance(item, datetime):

Maybe option 1 looks better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeh 1 is better, this might work

vzone = tslib.get_timezone(getattr(Timestamp(other), 'tzinfo', '__no_tz__'))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually if you always wrap other then you can simply return Timestamp(other).tz I think

(though may be overlooking something)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In _has_same_tz, other can be DTI or list-likes. Thus added if for np.datetime64.

@jreback
Copy link
Contributor

jreback commented Oct 31, 2015

@sinhrks looks really good!

just a minor comment / question

@sinhrks
Copy link
Member Author

sinhrks commented Nov 1, 2015

Reabsed, and now green.

@jreback
Copy link
Contributor

jreback commented Nov 1, 2015

lgtm

merge when ready

thanks!

might want to add s short example in the Index docs (iirc where we go over Index set ops)

@sinhrks
Copy link
Member Author

sinhrks commented Nov 1, 2015

Thanks. Added brief doc to indexing.rst. Merging within few days if there is no further comments.

CC: @jorisvandenbossche

Missing values
~~~~~~~~~~~~~~

.. _indexing.missing:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this above the title?

@jorisvandenbossche
Copy link
Member

@sinhrks Looks good! One more remark, doc-wise: I see there is a downcast keyword, but there is no example given of its usage in docstring or rst docs. That would be nice to include to clarify its usage I think.

jreback added a commit that referenced this pull request Nov 13, 2015
@jreback jreback merged commit 2b0886c into pandas-dev:master Nov 13, 2015
@jreback
Copy link
Contributor

jreback commented Nov 13, 2015

@sinhrks thanks!

always gr8 PR's from you!

@jreback
Copy link
Contributor

jreback commented Nov 13, 2015

I don't see an entry in API.rst can you add?

@sinhrks
Copy link
Member Author

sinhrks commented Nov 13, 2015

@jreback Ah, let me add it with #11537.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Indexing Related to indexing on series/frames, not to indexes themselves Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants