ENH: Add Index.normalize as a Series or DataFrame method #5502

Closed
cancan101 opened this issue Nov 13, 2013 · 7 comments
@cancan101
Contributor

Currently normalize does not work on a Series or a DataFrame:

pd.Series(range(5), index=pd.date_range('2013-1-1', periods=5,freq='D')).normalize()
AttributeError: 'Series' object has no attribute 'normalize'

However, changing the timezone, which I consider to be a similar type of operation, does work:

In [50]: pd.Series(range(5), index=pd.date_range('2013-1-1', periods=5,freq='D')).tz_localize('America/New_York')
Out[50]: 
2013-01-01 00:00:00-05:00    0
2013-01-02 00:00:00-05:00    1
2013-01-03 00:00:00-05:00    2
2013-01-04 00:00:00-05:00    3
2013-01-05 00:00:00-05:00    4
Freq: D, dtype: int64

My current workaround:

s = pd.Series(range(5), index=pd.date_range('2013-1-1', periods=5,freq='D'))
s.index = s.index.normalize()
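
The workaround above mutates s in place, and the example index already sits at midnight, so the effect is not visible. A minimal sketch of a non-mutating variant, assuming a reasonably recent pandas in which Series.set_axis returns a new object; the 09:30 timestamps are made up for illustration:

```python
import pandas as pd

# Illustrative data: timestamps that are NOT at midnight, so normalize() changes them.
idx = pd.date_range("2013-01-01 09:30", periods=3, freq="D")
s = pd.Series(range(3), index=idx)

# set_axis returns a new Series with the replaced index, leaving s untouched
# (assumption: a recent pandas version where set_axis is not in-place).
normalized = s.set_axis(s.index.normalize())

print(normalized.index)
```

Because set_axis returns a new Series, this form can also be used in the middle of a method chain.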
@jreback
Contributor

jreback commented Nov 13, 2013

related #4551

@hayd
Contributor

hayd commented Nov 14, 2013

What are you expecting normalize to do?

Surely it would make more sense for it to be the following:

In [1]: df
Out[1]: 
   A  B
0  1  2
1  3  4
2  5  6

In [2]: df / np.sum(df.values)
Out[2]: 
          A         B
0  0.047619  0.095238
1  0.142857  0.190476
2  0.238095  0.285714

In [3]: df.div(df.sum())
Out[3]: 
          A         B
0  0.111111  0.166667
1  0.333333  0.333333
2  0.555556  0.500000

In [4]: df.div(df.sum(1), axis=0)
Out[4]: 
          A         B
0  0.333333  0.666667
1  0.428571  0.571429
2  0.454545  0.545455

@cancan101
Contributor Author

If we are going down that route, I would argue that the name normalize is pretty vague as it is currently used; to_midnight would be far more explicit. That being said, I would be fine with to_midnight being the name of the new method added to Series or DataFrame.

@hayd
Contributor

hayd commented Nov 14, 2013

I think similar to Series string methods, we should put DatetimeIndex methods (like normalize) in a class. I'm not sure these make sense in DataFrame but certainly useful for Series.

@pwaller
Contributor

pwaller commented May 15, 2017

I just went looking for a normalize function, expecting df.div(df.sum()) (as in @hayd's comment). The problem I have is that df is a chained pipe expression, so I went with (for a series) s.pipe(lambda s: s.div(df.sum())). I did also try df.div(lambda df: df.sum()), but I don't know if that's in the right spirit or not.
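
For the chained case described above, one possible sketch uses pipe with a function that receives the intermediate object, so the chain does not have to refer back to a named df. The helper name normalize_columns is hypothetical, and the frame reuses the values from @hayd's example. As far as I know, div does not evaluate a callable argument, which is why the function is passed to pipe instead:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]})

def normalize_columns(frame):
    # Hypothetical helper: divide every column by its own sum,
    # i.e. the df.div(df.sum()) variant from the earlier comment.
    return frame.div(frame.sum())

# pipe passes the current intermediate frame to the helper.
result = df.pipe(normalize_columns)

print(result)
```

Each column of result sums to 1, matching the df.div(df.sum()) output shown earlier in the thread.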

@datapythonista datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
@mroeschke mroeschke removed the Timezones label Jan 14, 2019
@mroeschke mroeschke changed the title from "Ability to normalize a Series or DataFrame" to "ENH: Add Index.normalize as a Series or DataFrame method" Mar 31, 2020
@mroeschke mroeschke added the Enhancement label and removed the Internals label Mar 31, 2020
@jbrockmendel
Member

-1, the API is too big as it is, plus it isn't obvious that it refers to the index and not the data

@jreback
Copy link
Contributor

jreback commented Sep 20, 2020

yep we already have .dt to direct this
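
A short sketch of what the .dt route looks like, with made-up timestamps. Note that .dt operates on datetime values held in the Series itself, not on its index; for a DatetimeIndex the s.index.normalize() workaround from the original post still applies:

```python
import pandas as pd

# Illustrative Series whose VALUES (not index) are datetimes.
s = pd.Series(pd.to_datetime(["2013-01-01 09:30", "2013-01-02 17:45"]))

# The .dt accessor exposes DatetimeIndex-style methods on the values;
# normalize() resets the time component to midnight.
midnight = s.dt.normalize()

print(midnight)
```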

@jreback jreback closed this as completed Sep 20, 2020