Unexpected results for diagonal entries when using generic callable in corr #25726
Comments
Why do you expect the self-correlation to be 0? |
I am not expecting the correlation coefficient to be 0. What I compute in the example above is the p-value for the significance of the correlation, and I expect this p-value to be 0. The following example calculates the correlation coefficient and the p-value for a self-correlation:
from scipy.stats import pearsonr
a = [1,2,3]
print(pearsonr(a, a))
(1.0, 0.0)
In a way, I am misusing |
Ahh, understood. I'm not sure about changing this. cc @shadiakiki1986 if you have thoughts. |
In line with my request, there is a question on StackOverflow about calculating p-values. Using a callable in corr seems the perfect answer to this question: https://stackoverflow.com/questions/25571882/pandas-columns-correlation-with-statistical-significance |
Not quite perfect though, as you're clashing with the semantics of the method (even though the types and signature work out fine).
|
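For reference, here is a minimal sketch of the p-value use case discussed above, assuming scipy is installed. The helper name pearson_pvalue and the sample data are illustrative and not taken from the thread. Because corr fills in the diagonal itself, the diagonal shows 1.0 rather than the p-value of a self-correlation.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def pearson_pvalue(a, b):
    # pearsonr returns (correlation coefficient, p-value); keep the p-value.
    return pearsonr(a, b)[1]


# Illustrative data, not from the original issue.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 1.0, 4.0, 3.0, 6.0],
    "z": [5.0, 3.0, 4.0, 1.0, 2.0],
})

pvalues = df.corr(method=pearson_pvalue)
diagonal = np.diag(pvalues.to_numpy())  # filled in by corr, not by the callable
print(pvalues)
```

Only the off-diagonal entries come from pearson_pvalue here.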
I can see this. So what I am asking for is a more general method than corr that computes pairwise summary statistics of columns. Does such a method already exist for pandas or would this be a feature request?
In case you want to keep the current implementation for the diagonals: would you agree that it would be good to mention in the documentation that pandas expects the callable to return 1 for diagonal elements? At least, I did not expect this. |
I'm not sure. Documenting this behavior, if that's indeed what we want, would certainly be welcome.
|
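A more general pairwise method of the kind asked about above could look roughly like the sketch below. The function pairwise_apply is hypothetical and not a pandas API. It calls the supplied function for every ordered pair of columns, so the diagonal comes from the callable and the result need not be symmetric. Unlike corr, this sketch does no NaN handling and has no min_periods.

```python
import pandas as pd


def pairwise_apply(df, func):
    """Hypothetical helper, not a pandas API: apply func to every ordered pair of columns."""
    cols = list(df.columns)
    values = [
        [func(df[a].to_numpy(), df[b].to_numpy()) for b in cols]
        for a in cols
    ]
    return pd.DataFrame(values, index=cols, columns=cols)


df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)], columns=['dogs', 'cats'])

# Deliberately asymmetric statistic: difference of the first elements.
first_diff = pairwise_apply(df, lambda a, b: a[0] - b[0])
print(first_diff)
```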
I would prefer if diagonal elements are computed by the supplied method as this is what I need. |
As @fabianrost84 indicated earlier, the 1 in the diagonal is indeed hard-coded and not returned by the generic callable passed to the .corr function. The below callable would still generate 1's along the diagonal
import pandas as pd
import numpy as np
return_zero = lambda a, b: 0
df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)], columns=['dogs', 'cats'])
df.corr(method=return_zero)
It would be convenient for this particular p-value use case to have the callable calculate the diagonals too, but it opens the door to other changes such as the resultant matrix from corr not having to be symmetric. For the p-value issue, simply subtracting the diagonal would do, e.g. df.corr(method=...) - np.eye(len(df.columns)). I'm all for documenting the behavior and keeping the implementation as is. |
I think that's my preference too.
|
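The workaround of subtracting the identity matrix can be sketched as follows, reusing the dogs/cats frame from the comment above. The second variant, which overwrites the diagonal on a copy with np.fill_diagonal, is an addition not proposed in the thread. It may be preferable when the desired diagonal value is something other than 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)], columns=['dogs', 'cats'])


def return_zero(a, b):
    return 0


raw = df.corr(method=return_zero)        # diagonal is filled with 1.0 by corr
shifted = raw - np.eye(len(df.columns))  # subtracting the identity zeroes the diagonal

# Alternative (not from the thread): overwrite the diagonal on a copy.
values = raw.to_numpy(copy=True)
np.fill_diagonal(values, 0.0)
overwritten = pd.DataFrame(values, index=raw.index, columns=raw.columns)
```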
What about this?
"""
* callable: callable with input two 1d ndarrays
and returning a float. The callable is expected to be commutative
and to return 1.0 for two identical input ndarrays.
.. versionadded:: 0.24.0 |
I don't think that wording is precise. The callable method itself doesn't
require these conditions. I would say something like "Note that the
returned matrix from corr will have 1 along the diagonals and will be
symmetric regardless of the callable's behavior"
|
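A small check of the behavior described in the suggested wording, as a sketch: the callable first_difference below is a made-up, deliberately asymmetric function that returns 0 for identical inputs, yet the matrix returned by corr is still symmetric with 1 on the diagonal. This reflects the pandas implementation discussed in this thread; other versions might behave differently.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)], columns=['dogs', 'cats'])


def first_difference(a, b):
    # Deliberately asymmetric: f(a, b) == -f(b, a), and f(a, a) == 0.
    return a[0] - b[0]


result = df.corr(method=first_difference)
is_symmetric = np.allclose(result.to_numpy(), result.to_numpy().T)
diagonal = np.diag(result.to_numpy())
print(result)
```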
Should I add this to #25729 and make the PR a bit more general? Or should I open a new PR for this? |
#25729 is already merged, so I'll open a new PR. |
Code Sample, a copy-pastable example if possible
Problem description
I want to use the method argument of corr to compute p-values. However, diagonal elements are set to 1. I would expect them to be 0. They are set to 1 here:
pandas/pandas/core/frame.py
Lines 7025 to 7026 in cb00deb
Although I can see that for a 'normal' correlation 1 is expected, this is not the case in my example. Hence, I would suggest removing these two lines from frame.py.
Expected Output
Output of pd.show_versions()