
Function for regressing/correlating multiple fields? #3784

Closed
AndrewILWilliams opened this issue Feb 20, 2020 · 6 comments · Fixed by #4089
@AndrewILWilliams
Contributor

I came across this StackOverflow thread about applying linear regression across multi-dimensional arrays and there were some very efficient, xarray-based solutions suggested to the OP which leveraged vectorized operations. I was wondering if there was any interest in adding a function like this to future releases of xarray? It seems like a very neat and useful functionality to have, and not just for meteorological applications!

https://stackoverflow.com/questions/52108417/how-to-apply-linear-regression-to-every-pixel-in-a-large-multi-dimensional-array

Most answers draw from this blog post:
https://hrishichandanpurkar.blogspot.com/2017/09/vectorized-functions-for-correlation.html
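The approach in those answers boils down to computing the covariance as the mean product of anomalies along one dimension, which xarray broadcasts over all remaining dimensions. A minimal sketch of that idea (the helper names `cov` and `corr` are hypothetical, not an existing xarray API at the time of this thread):

```python
import numpy as np
import xarray as xr

def cov(da_a, da_b, dim):
    # mask to points valid in both arrays, then average the product of anomalies
    valid = da_a.notnull() & da_b.notnull()
    da_a, da_b = da_a.where(valid), da_b.where(valid)
    return ((da_a - da_a.mean(dim)) * (da_b - da_b.mean(dim))).mean(dim)

def corr(da_a, da_b, dim):
    # Pearson correlation: covariance normalised by the two standard deviations
    valid = da_a.notnull() & da_b.notnull()
    da_a, da_b = da_a.where(valid), da_b.where(valid)
    return cov(da_a, da_b, dim) / (da_a.std(dim) * da_b.std(dim))
```

Because everything is expressed in whole-array operations, correlating two `(time, lat, lon)` fields along `time` produces a `(lat, lon)` map in one vectorized pass, with no Python-level loop over pixels.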

@max-sixty
Collaborator

Very much so! Unfortunately we've not managed to merge anything yet; check out #1115 and #2652.

We'd be very keen to merge a minimum-viable-product and then iterate from there. If you'd be interested in contributing towards this that would be great!

@AndrewILWilliams
Contributor Author

I'll take a look at them!

@AndrewILWilliams
Contributor Author

@max-sixty I've just had a peruse through a few of the relevant issues; do we know what the status of #3550 is? It seems like @r-beer was pretty close on this, right?

@max-sixty
Collaborator

Yes that's so close! I'm sad that didn't make it in.

@r-beer if you see this: do you have a one-sentence update? Might you come back to this area or should someone else try to land it?

@AndrewILWilliams
Contributor Author

Hi @max-sixty, just coming back to this now. It seems @r-beer isn't available... do you know roughly how far his PR was from completion? I'm getting a little lost trying to follow #3550, sorry!

Was the main todo to avoid the drop=True after broadcasting? Is there any idea about what to do instead?

@AndrewILWilliams
Contributor Author

In a fit of covid-induced insanity, I've decided to have a crack at finishing up #3550! I'm playing around with @r-beer's changes at the moment, but I'm finding the tests quite confusing; I think they're wrong? Maybe someone could help me out with this?

Here's something from test_computation.py in #3550:

```python
def test_cov(da_a, da_b, dim):
    def pandas_cov(ts1, ts2):
        """Ensure the ts are aligned and missing values ignored"""
        ts1, ts2 = xr.align(ts1, ts2)
        valid_values = ts1.notnull() & ts2.notnull()

        ts1 = ts1.where(valid_values, drop=True)
        ts2 = ts2.where(valid_values, drop=True)

        return ts1.to_series().cov(ts2.to_series())

    expected = pandas_cov(da_a, da_b)
    actual = xr.cov(da_a, da_b, dim)

    assert_allclose(actual, expected)
```

What I don't understand is: why would we expect the pandas covariance or correlation functions to return anything remotely like the output of xr.cov()? The line ts1.to_series().cov(ts2.to_series()) always produces a scalar value, whereas in most reasonable use cases xr.cov(da_a, da_b, dim) would produce an array of values (e.g. the pixel-wise correlation in time between two DataArrays).
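The dimensionality point can be made concrete with a small sketch. Since xr.cov did not exist yet at the time of this thread, the hand-rolled helper `sample_cov` below is a hypothetical stand-in; it uses ddof=1 normalisation so the 1-D case matches pandas' Series.cov, while the same reduction on an n-D field leaves the pixel dimensions intact:

```python
import numpy as np
import xarray as xr

def sample_cov(a, b, dim):
    # sample covariance along `dim` (ddof=1, matching pandas Series.cov)
    n = a.count(dim)
    return ((a - a.mean(dim)) * (b - b.mean(dim))).sum(dim) / (n - 1)

rng = np.random.default_rng(42)

# 1-D case: reducing over the only dimension yields a scalar,
# directly comparable with pandas
ts1 = xr.DataArray(rng.normal(size=100), dims="time")
ts2 = xr.DataArray(rng.normal(size=100), dims="time")
scalar = sample_cov(ts1, ts2, "time")

# n-D case: reducing over "time" leaves one value per pixel
f1 = xr.DataArray(rng.normal(size=(100, 4, 5)), dims=("time", "lat", "lon"))
f2 = xr.DataArray(rng.normal(size=(100, 4, 5)), dims=("time", "lat", "lon"))
pixelwise = sample_cov(f1, f2, "time")  # has dims ("lat", "lon")
```

So a pandas-based reference is only a valid oracle when the test fixtures are 1-D series reduced over their single dimension; for multi-dimensional fixtures the two outputs are not even the same shape.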

I wasn't sure whether to open a PR for this yet. I'm working on it, but I'd need some help setting up appropriate tests...
