Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pdp gen failure from different institution ids #47

Merged
merged 4 commits into from
Jan 4, 2025

Conversation

ZakMiller
Copy link
Contributor

There's a cohort schema check that ensures there's a single institution for all the data, but the pdp generation script creates a new institution id per record (so data created using the script will fail the validation). This creates the institution id once and uses it for both cohort and course data.

changes

  • Generates institution id once in the parent script and then passes it down.
  • Adds a test that uses the new field to pass the same value down to multiple cohorts/courses (this would fail without this change) for both cohort (which has the check) and course (which doesn't)
  • Ranames test_raw_cohort_record to test_raw_course_record in test_raw_course.py (unrelated, looks like a copy paste error)

context

  • Noticed this while running through the templates using the existing test data in the repo. Some more similar changes to follow.

questions

  • Should the same schema check exist for the course schema?
  • Where should I create the asana ticket (that's mentioned in CONTRIBUTING.md)?

@ZakMiller ZakMiller force-pushed the bugfix/consistent-pdp-institution-id branch from 113fab5 to 15dc7a0 Compare December 30, 2024 21:06
Copy link
Member

@bdewilde bdewilde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, good catch. Since uniqueness of institution_id only matters when generating multiple records, I think we should make it an optional parameter, so the single-record case still works outside the context of the generate data script. Does that make sense to you? Maybe I'm trying to preserve a use case that doesn't exist in practice, but making it a required arg for both types of records feels a bit heavy.

src/student_success_tool/generation/pdp/raw_cohort.py Outdated Show resolved Hide resolved
src/student_success_tool/generation/pdp/raw_cohort.py Outdated Show resolved Hide resolved
tests/generation/pdp/test_raw_course.py Show resolved Hide resolved
@@ -8,7 +8,7 @@

class Provider(BaseProvider):
def raw_course_record(
self, cohort_record: t.Optional[dict] = None, normalize_col_names: bool = False
self, cohort_record: t.Optional[dict] = None, normalize_col_names: bool = False, institution_id: int = 12345
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel super strongly about this, but following existing patterns, it seems like we'd want to pull institution_id from the cohort_record if provided or otherwise generate a random one, rather than passing cohort record && a specified institution id. In this case, I think you'd just leave all the below code as it was, and just drop this extra arg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ya that seems way better. I was missing some context that you added with your comment.

@ZakMiller ZakMiller force-pushed the bugfix/consistent-pdp-institution-id branch from 03f0e05 to 076cf91 Compare January 3, 2025 19:57
@ZakMiller ZakMiller force-pushed the bugfix/consistent-pdp-institution-id branch from 076cf91 to b40bc75 Compare January 3, 2025 21:53
for _ in range(args.num_students)
]
course_records = [
FAKER.raw_course_record(
cohort_record, normalize_col_names=args.normalize_col_names
)
cohort_record, normalize_col_names=args.normalize_col_names)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, are you using an autoformatter on this code? This parens should be dangling according to ruff 🤷‍♂️

Copy link
Contributor Author

@ZakMiller ZakMiller Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not, I ran ruff (with this latest commit).

print(df_obs)


def test_multiple_raw_cohort_records():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice 👍

ZakMiller and others added 4 commits January 4, 2025 14:26
There's a cohort schema check that ensures there's a single institution
for all the data, but the pdp generation script creates a new
institution id per record. This creates the institution id once and uses
it for both cohort and course data.

Also fixed what's probably a copy paste error, test_raw_cohort_record ->
test_raw_course_record in test_raw_course.py
- Removed course changes (it was already looking at cohort)
- Default to generating an id if it's not provided
@ZakMiller ZakMiller force-pushed the bugfix/consistent-pdp-institution-id branch from 0f6d944 to e1fb9a4 Compare January 4, 2025 19:26
@ZakMiller ZakMiller merged commit 58d2e2d into develop Jan 4, 2025
5 checks passed
@ZakMiller ZakMiller deleted the bugfix/consistent-pdp-institution-id branch January 4, 2025 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants