Resample: use year-start as rule #351

@MarcoGorelli MarcoGorelli commented Mar 1, 2023

Currently, the resample results look a bit odd, in that the labels are always from the year-end:

In [1]: import numpy as np
   ...: import pandas as pd
   ...: import pyleoclim as pyleo
   ...: ser_index = pd.DatetimeIndex([
   ...:     np.datetime64('0000-01-01', 's'),
   ...:     np.datetime64('2000-01-01', 's'),
   ...: ])
   ...: ser = pd.Series(range(2), index=ser_index)
   ...: ts = pyleo.Series.from_pandas(ser, metadata={'time_unit': 'years CE', 'time_name': 'datetime'})

In [2]: ts.resample('1ka').mean()
Out[2]: {'log': ()}

None
datetime [years CE]
0.999337       0.0
1000.998793    NaN
2001.000986    1.0
Name: value, dtype: float64

In [3]: ts.resample('1ka').mean().datetime_index
Out[3]: DatetimeIndex(['0-12-31', '1000-12-31', '2000-12-31'], dtype='datetime64[s]', name='datetime', freq=None)

This is because we're using multiples of 'Y' as the resample rule, which pandas interprets as "year end" (see https://pandas.pydata.org/docs/dev/user_guide/timeseries.html#dateoffset-objects).

My bad here, I should've used 'AS' (year start) the first time. I think this makes the results align better with user expectations anyway.
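For reference, here is a minimal pandas-only sketch of the labelling difference. It is not pyleoclim's implementation: the monthly series is made up for illustration, and it uses the offset objects `pd.offsets.YearEnd` and `pd.offsets.YearBegin` instead of the string aliases, since the alias spellings differ between pandas versions.

```python
import pandas as pd

# Illustrative data: two years of monthly values starting 2000-01-01.
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
ser = pd.Series(range(24), index=idx)

# Year-end rule: each bin is labelled with the last day of its year.
year_end = ser.resample(pd.offsets.YearEnd()).mean()

# Year-start rule: each bin is labelled with the first day of its year.
year_start = ser.resample(pd.offsets.YearBegin()).mean()

print(year_end.index.strftime("%Y-%m-%d").tolist())
print(year_start.index.strftime("%Y-%m-%d").tolist())
```

The aggregated values are the same in both cases; only the bin labels move from 31 December to 1 January.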

With this change, the example from the top of this PR looks like this:

In [1]: import numpy as np
   ...: import pandas as pd
   ...: import pyleoclim as pyleo
   ...: ser_index = pd.DatetimeIndex([
   ...:     np.datetime64('0000-01-01', 's'),
   ...:     np.datetime64('2000-01-01', 's'),
   ...: ])
   ...: ser = pd.Series(range(2), index=ser_index)
   ...: ts = pyleo.Series.from_pandas(ser, metadata={'time_unit': 'years CE', 'time_name': 'datetime'})

In [2]: ts.resample('1ka').mean()
Out[2]: {'log': ()}

None
datetime [years CE]
0.000000       0.0
1000.002194    NaN
2000.001649    1.0
Name: value, dtype: float64

In [3]: ts.resample('1ka').mean().datetime_index
Out[3]: DatetimeIndex(['0-01-01', '1000-01-01', '2000-01-01'], dtype='datetime64[s]', name='datetime', freq=None)

There is still some rounding error in the years CE representation, due to using an approximation for the number of seconds in the year:

# UDUNITS, see: http://cfconventions.org/cf-conventions/cf-conventions#time-coordinate
SECONDS_PER_YEAR = 31556925.974592 # 86400 * 365.24219878
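To show where the residual comes from, here is a rough NumPy sketch of the arithmetic. It assumes the years CE value is simply the number of elapsed seconds since 0000-01-01 divided by SECONDS_PER_YEAR; the actual conversion in pyleoclim may differ in detail, and the names `origin`, `labels`, and `years_ce` are mine.

```python
import numpy as np

# UDUNITS mean tropical year, as quoted above.
SECONDS_PER_YEAR = 31556925.974592  # 86400 * 365.24219878

# Assumed origin for the "years CE" axis (illustrative choice).
origin = np.datetime64("0000-01-01", "s")

# The year-start labels produced by the resample.
labels = np.array(
    [np.datetime64(d, "s") for d in ("0000-01-01", "1000-01-01", "2000-01-01")]
)

# Calendar seconds elapsed since the origin, as floats.
elapsed_seconds = (labels - origin) / np.timedelta64(1, "s")

# A whole number of calendar years is not a whole number of mean
# tropical years, which is where the fractional part comes from.
years_ce = elapsed_seconds / SECONDS_PER_YEAR
print(np.round(years_ce, 6))
```

Under this assumption the values round to 0.0, 1000.002194 and 2000.001649, the same as the output above: 1000 Gregorian calendar years contain 365243 days, slightly more than 1000 years of 365.24219878 days.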

I've also added a test for resample.interpolate.

@CommonClimate CommonClimate merged commit df067b3 into LinkedEarth:Development Mar 2, 2023