-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Documentation #17
base: main
Are you sure you want to change the base?
Documentation #17
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, added some minor changes, which we will discuss offline.
Other than that, LGTM 🚀
@@ -43,10 +69,10 @@ def _sample(self: EAR, data: pd.DataFrame, column: str | int, error_rate: float, | |||
raise ValueError(msg) | |||
|
|||
# we offset the upper bound of the lower_error_index by a) the existing number of errors in the row, and b) the number of errors to-be generated. | |||
upper_bound = len(se_data) - sum(se_mask) - n_errors | |||
upper_bound = len(se_data) - sum(se_mask) - n_errors # upper bound = length of data - current number of errors - number of errors to be generated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upper_bound = len(se_data) - sum(se_mask) - n_errors # upper bound = length of data - current number of errors - number of errors to be generated | |
upper_bound = len(se_data) - sum(se_mask) - n_errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine without this comment. Maybe thinking about renaming se_...
lower_error_index = self._random_generator.integers(0, upper_bound) if upper_bound > 0 else 0 | ||
error_index_range = range(lower_error_index, lower_error_index + n_errors) | ||
selected_rows = data_column_error_free.sort_values(by=condition_to_column).iloc[error_index_range, :] | ||
selected_rows = data_column_error_free.sort_values(by=condition_to_column).iloc[error_index_range, :] # Sort by the condition_to_column values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
selected_rows = data_column_error_free.sort_values(by=condition_to_column).iloc[error_index_range, :] # Sort by the condition_to_column values | |
selected_rows = data_column_error_free.sort_values(by=condition_to_column).iloc[error_index_range, :] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine without this comment. Maybe thinking about renaming se_...
error_index_range = range(lower_error_index, lower_error_index + n_errors) | ||
selected_rows = se_data_error_free.sort_values().iloc[error_index_range] | ||
selected_rows = se_data_error_free.sort_values().iloc[error_index_range] # Introduce errors to locations of sorted values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
selected_rows = se_data_error_free.sort_values().iloc[error_index_range] # Introduce errors to locations of sorted values | |
selected_rows = se_data_error_free.sort_values().iloc[error_index_range] |
I think that's clear enough without this comment.
|
||
Description: | ||
Does error checking for the abstract method '_sample'. | ||
Assigns the _random_generator attribute. | ||
Calls subclass _sample method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description: | |
Does error checking for the abstract method '_sample'. | |
Assigns the _random_generator attribute. | |
Calls subclass _sample method. |
I feel that's not necessary for users.
Added docstrings to many functions (some were private and simple enough to not seem to need one).