Expectation "expect_column_values_to_be_between" returns error if null values present #72
Comments
Great Expectations should be handling null values properly (as you expect), and you're definitely running into a bug. The expect_column_values_to_be_between expectation should discard null values before performing the tests (while making that information available to you as part of the summary_obj if you choose the SUMMARY output format). I'm not sure what is causing the problem yet. The following works for me:
Could you share anything about the values in the column on which you're trying to run the expectation and potentially also the build environment you're running on? We're just starting to test across different platforms, and it's possible that we're having an issue there.
I tried running your code above, and it throws the same error, so we must be having some version/environment issue. I'm running Python 2.7, installed through Anaconda. I tried it in a Jupyter notebook, in IPython, and as a command-line script. I installed great-expectations from this commit hash: 1a5b357e54382bbe0aa0ab76e861026db211ad03. What else would be helpful to know?
Can you run
Here are the results.
I'm on macOS Sierra 10.12.6
@eringong I can replicate your issue using the environment you have. However, I believe it's an issue that pandas has fixed in pandas-dev/pandas#13637, related to the way that Series are broadcast together. Can you upgrade your pandas to the current version (or at least a newer one)?
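For readers following along, here is a rough sketch of why Series alignment matters once nulls are dropped. It does not reproduce the pandas bug referenced above and is not Great Expectations code; the sample data and variable names are assumptions for illustration only. A boolean mask computed on the non-null subset has a shorter index than the original column, so this sketch re-aligns it explicitly instead of relying on how a given pandas version combines mismatched Series.

```python
import numpy as np
import pandas as pd

# Illustrative data only (assumed): a column with two NaNs.
col = pd.Series([1.0, np.nan, 3.0, 7.0, np.nan])

# The range test is computed on the non-null values, so its index is [0, 2, 3].
nonnull = col.dropna()
in_range = (nonnull >= 0) & (nonnull <= 5)

# Re-align the mask to the full column explicitly; positions that were null
# are filled with False rather than left to implicit alignment.
mask = in_range.reindex(col.index, fill_value=False).astype(bool)

# Safe to use as an indexer because mask and col now share the same index.
selected = col[mask]
print(selected.tolist())
```

Making the alignment explicit keeps the behaviour independent of the installed pandas version, which is the kind of difference this thread ran into.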
Does this mean we should update requirements.txt to require pandas 0.19? On Wed, Aug 30, 2017 at 5:30 AM, jcampbell [email protected] wrote:
If it's definitely the issue, then yes, I think so. (Though the current requirements should have alerted on install that the scipy version is out of date too, I think.)
Updated all conda packages and it works beautifully now. Thank you!
On the scipy version alert, it is interesting that when I installed great-expectations, I got this message from the install:
although as we confirmed, conda actually was running scipy-0.18.1, so should have failed.
Glad it's working! I updated the requirements file in develop to clarify that we have that dependency on a feature in newer pandas. For the issue of the scipy requirement, I suspect the issue was from scipy having been installed with pip (perhaps by something else you pip-installed) rather than via conda. I probably should have asked for the result of
…index-based bitwise comparison of Series objects (#72)
When I run
df.my_df.expect_column_values_to_be_between
on columns that have null values, it throws this IndexingError.
The column has 2 NaNs. When I drop the NaNs, the expectation works without error.
Not sure if this is intended behavior of the expectation, but it would be a nice feature if the expectation could handle missing data, perhaps as an argument of the expectation (expect_some_null_values = True/False).
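As a minimal sketch of the behaviour discussed in this thread (nulls discarded before the range test, with the null count still reported), something like the following could be written in plain pandas. The helper name `values_between` and the keys of the returned dict are hypothetical; this is not the Great Expectations implementation or its output format.

```python
import numpy as np
import pandas as pd


def values_between(column, min_value, max_value):
    """Hypothetical helper: range check that ignores null values."""
    nonnull = column.dropna()                    # discard nulls before testing
    ok = nonnull.between(min_value, max_value)   # inclusive on both ends
    return {
        "success": bool(ok.all()),
        "null_count": int(column.isnull().sum()),   # nulls are reported, not failed
        "unexpected_list": nonnull[~ok].tolist(),   # non-null values out of range
    }


result = values_between(pd.Series([1.0, np.nan, 3.0, np.nan, 9.0]), 0, 5)
print(result)
```

Here the two NaNs do not cause a failure by themselves; only the out-of-range value 9.0 does.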