-
Notifications
You must be signed in to change notification settings - Fork 956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PoC: Modify single IV to accept nans #1296
base: main
Are you sure you want to change the base?
PoC: Modify single IV to accept nans #1296
Conversation
This PR is stale because it has been open for 60 days with no activity. |
thanks for adding this @AshwinParanjape (and for your patience--I somehow missed this PR). Can you add a test case for this based on the referenced issue? Have also added an inline comment on informing the user about missing data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for adding this fix. added a few comments.
y0_z = np.mean(data[self._target_estimand.outcome_variable[0]][instrument == 0]) | ||
x1_z = np.mean(data[self._target_estimand.treatment_variable[0]][instrument == 1]) | ||
x0_z = np.mean(data[self._target_estimand.treatment_variable[0]][instrument == 0]) | ||
y1_z = np.nanmean(data[self._target_estimand.outcome_variable[0]][instrument == 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the motivation, but sometimes the user may not be aware of the missing data (especially when it may be due to a bug in their data processing code).
For this reason, can we issue a warning message as this nanmean operation is done?
the code structure can be: try np.mean operations, if it fails, then we issue a warning that "The dataset contains missing data, proceeding to ignore it", and then using nanmean as a fallback.
for warning, you can use, self.logger.warning("message")
I modified the binary and continuous single IV case to see how it be made to work with partial data.
For binary IV, np.mean -> np.nanmean is sufficient
For continuous valued IV, I'm using np.ma (masked array) to mask the NaNs from covariance computations