Resample: use year-start as rule #351

MarcoGorelli · 2023-03-01T18:50:14Z

Currently, the resample results look a bit odd, in that the labels are always from the year-end:

In [1]: import numpy as np
   ...: import pandas as pd
   ...: import pyleoclim as pyleo
   ...: ser_index = pd.DatetimeIndex([
   ...:     np.datetime64('0000-01-01', 's'),
   ...:     np.datetime64('2000-01-01', 's'),
   ...: ])
   ...: ser = pd.Series(range(2), index=ser_index)
   ...: ts = pyleo.Series.from_pandas(ser, metadata={'time_unit': 'years CE', 'time_name': 'datetime'})

In [2]: ts.resample('1ka').mean()
Out[2]: {'log': ()}

None
datetime [years CE]
0.999337       0.0
1000.998793    NaN
2001.000986    1.0
Name: value, dtype: float64

In [3]: ts.resample('1ka').mean().datetime_index
Out[3]: DatetimeIndex(['0-12-31', '1000-12-31', '2000-12-31'], dtype='datetime64[s]', name='datetime', freq=None)

This is because we're using multiples of 'Y' as the resample rule, which means "year end" - https://pandas.pydata.org/docs/dev/user_guide/timeseries.html#dateoffset-objects
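To see the labelling difference in isolation, here is a minimal plain-pandas sketch (not the pyleoclim code path). It passes offset objects rather than string aliases, on the assumption that this sidesteps alias renames between pandas versions (newer releases spell these rules 'YE' and 'YS'). The toy series and variable names are made up for illustration.

```python
import pandas as pd

# Two observations, both stamped on 1 January.
ser = pd.Series(
    [0.0, 1.0],
    index=pd.DatetimeIndex(["2000-01-01", "2002-01-01"]),
)

# Year-end rule: each bin is labelled with 31 December.
year_end = ser.resample(pd.offsets.YearEnd()).mean()

# Year-start rule: each bin is labelled with 1 January.
year_start = ser.resample(pd.offsets.YearBegin()).mean()

print(year_end.index)
print(year_start.index)
```

With the year-start rule the labels coincide with the timestamps of the input, which is what the change here relies on.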

My bad here, I should've used 'AS' the first time. I think this makes it align better with user expectations anyway.

The above example would look like this:

In [1]: import numpy as np
   ...: import pandas as pd
   ...: import pyleoclim as pyleo
   ...: ser_index = pd.DatetimeIndex([
   ...:     np.datetime64('0000-01-01', 's'),
   ...:     np.datetime64('2000-01-01', 's'),
   ...: ])
   ...: ser = pd.Series(range(2), index=ser_index)
   ...: ts = pyleo.Series.from_pandas(ser, metadata={'time_unit': 'years CE', 'time_name': 'datetime'})

In [2]: ts.resample('1ka').mean()
Out[2]: {'log': ()}

None
datetime [years CE]
0.000000       0.0
1000.002194    NaN
2000.001649    1.0
Name: value, dtype: float64

In [3]: ts.resample('1ka').mean().datetime_index
Out[3]: DatetimeIndex(['0-01-01', '1000-01-01', '2000-01-01'], dtype='datetime64[s]', name='datetime', freq=None)

There is still some rounding error in the years CE representation, due to using an approximation for the number of seconds in the year:

Pyleoclim_util/pyleoclim/utils/tsbase.py

Lines 24 to 25 in 0da854c

    
           # UDUNITS, see: http://cfconventions.org/cf-conventions/cf-conventions#time-coordinate 
        
           SECONDS_PER_YEAR = 31556925.974592  # 86400 * 365.24219878
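As a rough check on where the residual comes from, the sketch below counts the calendar days from 0000-01-01 to the start of a given year (proleptic Gregorian leap rule) and divides the elapsed seconds by SECONDS_PER_YEAR. This is an assumption about how the conversion behaves, not a copy of the pyleoclim implementation, and the helper names are hypothetical. It does reproduce the 1000.002194 and 2000.001649 values in the output above.

```python
# Constant quoted from tsbase.py above.
SECONDS_PER_YEAR = 31556925.974592  # 86400 * 365.24219878


def is_leap(year: int) -> bool:
    """Proleptic Gregorian leap-year rule (year 0 counts as a leap year)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def years_ce(year: int) -> float:
    """Seconds elapsed from 0000-01-01 to <year>-01-01, in mean years."""
    days = sum(366 if is_leap(y) else 365 for y in range(year))
    return days * 86400 / SECONDS_PER_YEAR


for y in (0, 1000, 2000):
    # Prints roughly 0.0, 1000.002194 and 2000.001649.
    print(y, round(years_ce(y), 6))
```

The calendar averages 365.2425 days per year over a 400-year cycle, while the constant assumes 365.24219878, so the two drift apart slowly and the result is never exactly an integer away from the origin.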

I've also added a test for resample.interpolate.

use year-start as resample rule

306284b

MarcoGorelli requested a review from CommonClimate March 1, 2023 18:50

CommonClimate approved these changes Mar 2, 2023

CommonClimate merged commit df067b3 into LinkedEarth:Development Mar 2, 2023
