Fix pdp gen failure from different institution ids #47
113fab5 to 15dc7a0
Hey, good catch. Since uniqueness of institution_id only matters when generating multiple records, I think we should make it an optional parameter, so the single-record case still works outside the context of the generate data script. Does that make sense to you? Maybe I'm trying to preserve a use case that doesn't exist in practice, but making it a required arg for both types of records feels a bit heavy.
@@ -8,7 +8,7 @@

 class Provider(BaseProvider):
     def raw_course_record(
-        self, cohort_record: t.Optional[dict] = None, normalize_col_names: bool = False
+        self, cohort_record: t.Optional[dict] = None, normalize_col_names: bool = False, institution_id: int = 12345
I don't feel super strongly about this, but following existing patterns, it seems like we'd want to pull institution_id from the cohort_record if provided or otherwise generate a random one, rather than passing both a cohort record and a specified institution id. In this case, I think you'd just leave all the below code as it was, and just drop this extra arg?
Oh ya that seems way better. I was missing some context that you added with your comment.
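For reference, the fallback suggested in this thread might look roughly like the sketch below. It is not the code from this PR: the helper name resolve_institution_id, the "institution_id" key, and the five-digit id range are all assumptions made for illustration.

```python
import random
import typing as t


def resolve_institution_id(cohort_record: t.Optional[dict] = None) -> int:
    """Reuse the cohort record's institution id, or generate a random one.

    Hypothetical helper: the key name and the id range are assumptions,
    not taken from the actual provider code.
    """
    if cohort_record is not None and "institution_id" in cohort_record:
        # A cohort record was supplied, so the course record stays consistent with it.
        return cohort_record["institution_id"]
    # No cohort record (or no id on it): fall back to a random five-digit id.
    return random.randint(10000, 99999)
```

With this shape, the course record needs no extra institution_id argument, since the id travels with the cohort record whenever one is passed.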
03f0e05 to 076cf91
076cf91 to b40bc75
     for _ in range(args.num_students)
 ]
 course_records = [
     FAKER.raw_course_record(
-        cohort_record, normalize_col_names=args.normalize_col_names
-    )
+        cohort_record, normalize_col_names=args.normalize_col_names)
Just curious, are you using an autoformatter on this code? This paren should be dangling according to ruff 🤷♂️
I did not; I ran ruff with this latest commit.
    print(df_obs)


def test_multiple_raw_cohort_records():
nice 👍
There's a cohort schema check that ensures there's a single institution for all the data, but the pdp generation script creates a new institution id per record. This creates the institution id once and uses it for both cohort and course data. Also fixed what's probably a copy-paste error: test_raw_cohort_record -> test_raw_course_record in test_raw_course.py
Co-authored-by: Burton DeWilde <[email protected]>
- Removed course changes (it was already looking at cohort)
- Default to generating an id if it's not provided
0f6d944 to e1fb9a4
There's a cohort schema check that ensures there's a single institution for all the data, but the pdp generation script creates a new institution id per record (so data created using the script will fail the validation). This creates the institution id once and uses it for both cohort and course data.
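As a rough illustration of the fix described above, the sketch below generates one institution id up front and reuses it for every cohort and course record, then applies a single-institution check. The functions make_cohort_record, make_course_record, and has_single_institution are hypothetical stand-ins, not the project's provider methods or its schema validation, and the field names are assumed.

```python
import random


def make_cohort_record(institution_id=None):
    """Hypothetical stand-in for a cohort record generator."""
    if institution_id is None:
        # Default to generating an id if it's not provided.
        institution_id = random.randint(10000, 99999)
    return {
        "institution_id": institution_id,
        "student_id": random.randint(1, 10**6),
    }


def make_course_record(cohort_record):
    """Hypothetical stand-in: the course record copies ids from its cohort record."""
    return {
        "institution_id": cohort_record["institution_id"],
        "student_id": cohort_record["student_id"],
        "course_number": random.randint(100, 499),
    }


def has_single_institution(records):
    """Simplified stand-in for the schema check: exactly one distinct institution id."""
    return len({r["institution_id"] for r in records}) == 1


# Create the institution id once, then share it across all generated records.
institution_id = random.randint(10000, 99999)
cohort_records = [make_cohort_record(institution_id) for _ in range(5)]
course_records = [make_course_record(c) for c in cohort_records]
```

Calling has_single_institution(cohort_records + course_records) on this data passes, whereas records generated with a fresh id each time would almost always fail the same check.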