Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation #17

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Documentation #17

wants to merge 7 commits into from

Conversation

chandlerNick
Copy link

Added docstrings to many functions (some were private and simple enough to not seem to need one).

@chandlerNick chandlerNick reopened this Feb 7, 2025
Copy link
Member

@se-jaeger se-jaeger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, added some minor changes, which we will discuss offline.
Other than that, LGTM 🚀

@@ -43,10 +69,10 @@ def _sample(self: EAR, data: pd.DataFrame, column: str | int, error_rate: float,
raise ValueError(msg)

# we offset the upper bound of the lower_error_index by a) the existing number of errors in the row, and b) the number of errors to-be generated.
upper_bound = len(se_data) - sum(se_mask) - n_errors
upper_bound = len(se_data) - sum(se_mask) - n_errors # upper bound = length of data - current number of errors - number of errors to be generated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
upper_bound = len(se_data) - sum(se_mask) - n_errors # upper bound = length of data - current number of errors - number of errors to be generated
upper_bound = len(se_data) - sum(se_mask) - n_errors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine without this comment. Maybe thinking about renaming se_...

lower_error_index = self._random_generator.integers(0, upper_bound) if upper_bound > 0 else 0
error_index_range = range(lower_error_index, lower_error_index + n_errors)
selected_rows = data_column_error_free.sort_values(by=condition_to_column).iloc[error_index_range, :]
selected_rows = data_column_error_free.sort_values(by=condition_to_column).iloc[error_index_range, :] # Sort by the condition_to_column values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
selected_rows = data_column_error_free.sort_values(by=condition_to_column).iloc[error_index_range, :] # Sort by the condition_to_column values
selected_rows = data_column_error_free.sort_values(by=condition_to_column).iloc[error_index_range, :]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine without this comment. Maybe thinking about renaming se_...

error_index_range = range(lower_error_index, lower_error_index + n_errors)
selected_rows = se_data_error_free.sort_values().iloc[error_index_range]
selected_rows = se_data_error_free.sort_values().iloc[error_index_range] # Introduce errors to locations of sorted values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
selected_rows = se_data_error_free.sort_values().iloc[error_index_range] # Introduce errors to locations of sorted values
selected_rows = se_data_error_free.sort_values().iloc[error_index_range]

I think that's clear enough without this comment.

Comment on lines +49 to +53

Description:
Does error checking for the abstract method '_sample'.
Assigns the _random_generator attribute.
Calls subclass _sample method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Description:
Does error checking for the abstract method '_sample'.
Assigns the _random_generator attribute.
Calls subclass _sample method.

I feel that's not necessary for users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants