Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gkernel result decaying rapidly towards zero #392

Closed
alexkjames opened this issue Apr 19, 2023 · 7 comments
Closed

gkernel result decaying rapidly towards zero #392

alexkjames opened this issue Apr 19, 2023 · 7 comments
Assignees
Labels

Comments

@alexkjames
Copy link
Contributor

Describe the bug
gkernel is not functioning as it should, values are moving exponentially towards zero

To Reproduce
To demonstrate the unusual behavior run

ts = pyleo.utils.load_dataset('LR04')
ts.gkernel().plot()

Expected behavior
Shouldn't decay towards zero

Additional context
This is likely due to code introduced by #381, which divides the value axis of the gkernel method by the sum of the weights each time values are calculated, resulting in the series decaying to zero very quickly.

@alexkjames alexkjames added the bug label Apr 19, 2023
@alexkjames alexkjames self-assigned this Apr 19, 2023
@alexkjames alexkjames changed the title gkernel producing stranger results gkernel result decaying rapidly towards zero Apr 19, 2023
@alexkjames alexkjames mentioned this issue Apr 19, 2023
@CommonClimate
Copy link
Collaborator

Thanks for catching my mistake. I see that things are working fine for LR04 now, but I just tried it on EDC dD and I get this:
EDC_gkernel

It seems like the default choice of bin size (median) is too fine, so gkernel creates lots of empty segments. I tried to use step_type='max' instead, but:

  • (a) nomenclature is inconsistent with MultipleSeries.common_time(), which calls this variable step_style; shouldn't they be aligned?
  • (b) I get an error message with step_type='max':
/Users/julieneg/Documents/GitHub/Pyleoclim_util/pyleoclim/utils/tsutils.py:477: RuntimeWarning: invalid value encountered in double_scalars
  yc[i]  = sum(weight*yslice)/sum(weight) # normalize by the sum of weights
Out[9]: 
(<Figure size 720x288 with 1 Axes>,
 <Axes: xlabel='Age [y BP]', ylabel='$\\delta \\mathrm{D}$ [‰]'>)

Can you please look into it and see what is lurking behind?

@alexkjames
Copy link
Contributor Author

a) I believe they aren't aligned because we didn't want to change the APIs and break legacy examples (which I think is wise)
b) I'll look into it, that's a strange error.

@alexkjames
Copy link
Contributor Author

@CommonClimate I did some testing and I realized that gkernel wasn't actually acknowledging the passed step_type, nor has it ever acknowledged step_type. The parameter was overwritten in old versions of the code base, and in the new version it didn't get passed to the underlying utility. This has been fixed in my most recent commits.

In the example you attached, what's being shown is in fact the result of the usage of the 'max' step_style/step_type. However, the default parameter value for h, the kernel e-folding scale, means that the calculated weights are tiny for windows that contain only one or two data points in the low resolution portion of the series. When gkernel tries to calculate values using these tiny weights, python raises a warning and returns a nan value because we try to divide by a very, very small number.

I changed the docstring to warn people about this behavior. The function works properly otherwise as far as I can tell, it just requires that the user exercise some caution.

@CommonClimate
Copy link
Collaborator

hmm, i'll see if we can get it to behave more like bin() by making h a function of the increment size.

re: alignment, i'll add a step_style parameter and send a deprecation warning for step_type.

@alexkjames
Copy link
Contributor Author

alexkjames commented Apr 19, 2023

I think it has less to do with step size than it does with the minimum number of points in a given window. As far as I can tell if you have windows that have only one or two points in them, the weights will be tiny and python will be unhappy (unless h is large). Making it so that windows that only contain a single point just return that point might alleviate the issue somewhat, although you can still get really low weight values for windows that have just two or three points as well (edit: I just tried this and it didn't seem to make much of a difference, so its probably best to try your approach).

sounds good on the deprecation warning

@CommonClimate
Copy link
Collaborator

actually, I just got this with your code:

EDC_gkernel

import pyleoclim as pyleo
ts = pyleo.utils.load_dataset('EDC-dD').convert_time_unit('ky BP')
fig, ax = ts.plot() 
tsg = ts.gkernel(step_type='max')
tsg.label = ts.label + ' (gkernel)'
tsg.plot(ax=ax,color='C1')
fig.tight_layout()

that's exactly what I was hoping the code would do, so no change needed. I'll just update the parameters and push a fix

@CommonClimate
Copy link
Collaborator

addressed by #393

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants