Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor SHAP inference functionality #51

Merged
merged 4 commits into from
Jan 8, 2025

Conversation

bdewilde
Copy link
Member

@bdewilde bdewilde commented Jan 8, 2025

changes

  • auto-formats / lints / tidies up inference src and tests modules
  • refactors single-batch shap values calculation logic into a separate function, and renames the spark udf that applies that batch func to chunks of a larger dataset
  • adds a unit test for the single-batch shap values func

context

Prior art: PR #48

Btw @vishpillai123 , I had started putting together an inference nb using code in this repo, but was waiting until I could chat with Daniel (Wednesday!) about how this is meant to work in practice, given confusion about the dbutils nb widgets and such.

questions

@bdewilde bdewilde marked this pull request as ready for review January 8, 2025 02:22
Copy link
Contributor

@vishpillai123 vishpillai123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring provides more clarity, though my only concern is whether it has the same SHAP values output when we test on a particular school (it should). Either way, we'll have the notebook eventually to provide test cases/sanity checks.

So, I realized db.utils is used in the context of Databricks jobs! It's quite elegant actually. I'm using the one Kelsey created and it's really nice once you get the privs sorted out.

@bdewilde
Copy link
Member Author

bdewilde commented Jan 8, 2025

So, I realized db.utils is used in the context of Databricks jobs! It's quite elegant actually. I'm using the one Kelsey created for Lee College and it's really nice once you get the privs sorted out.

I'm chatting with Daniel about this stuff in an hour, hopefully we'll both be clear on how the pipeline is meant to run in production by EoD. :)

@bdewilde bdewilde merged commit 514dfcb into develop Jan 8, 2025
5 checks passed
@bdewilde bdewilde deleted the refactor-inference-functionality branch January 8, 2025 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants