
Add an onboarding topic-aware user embedder with various embedding sources and strategies #147

Merged
39 commits merged into main on Jan 30, 2025

Conversation

Sris-Ty
Contributor

@Sris-Ty commented Jan 7, 2025

No description provided.

@Sris-Ty requested a review from karlhigley on January 7, 2025 at 16:35
@karlhigley
Collaborator

karlhigley commented Jan 23, 2025

@mdekstrand Does the failing "Check data pipeline up-to-date" job here imply that I need to run dvc repro and commit the results? If so, that's fine for me, but how do people without a pile of GPUs under their desk do that?

EDIT: On a related note, how do I select CUDA and optimize CPU parallelism with dvc repro?

@mdekstrand
Contributor

@karlhigley Yes, that's what it means. I'm fine with merging with that check failing, so long as the MIND subset check passes — that way we make sure we can reproduce the pipeline, even if we haven't actually done so yet. I wish there were a way to mark a GitHub action so that its failure is a warning instead of an error — then we could more clearly indicate which checks are mandatory and which checks (so far just this one) are optional. I think it is useful to know that we've made changes that result in out-of-date pipelines, even if we don't fix that right away, although I'm definitely open to counterarguments there :).

To control runs: setting the environment variable POPROX_REC_DEVICE=cuda forces CUDA, although I think it's enabled by default when supported (make sure you're running under the dev-cuda Pixi environment, not dev; it may also not work in the dev container, since I don't know whether containers have CUDA access). POPROX_CPU_COUNT controls the degree of parallelism, for now; I've been thinking about changing that in the future as a byproduct of making it easier to prevent the CPU oversubscription problems we saw previously, but that's the current setting. There are not currently settings to control use of multiple GPUs.
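Roughly, something like this is what I have in mind; the variable names and the dev-cuda environment are as described above, while the pixi run -e invocation and the worker count are illustrative and may need adjusting for your setup:

```shell
# Illustrative sketch only: POPROX_REC_DEVICE and POPROX_CPU_COUNT are the
# settings described above; the worker count and the pixi invocation are
# assumptions about this repo's setup.
# Force CUDA and cap pipeline parallelism at 8 workers while reproducing
# the DVC pipeline inside the dev-cuda Pixi environment.
POPROX_REC_DEVICE=cuda POPROX_CPU_COUNT=8 pixi run -e dev-cuda dvc repro
```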

@karlhigley
Collaborator

@mdekstrand Thanks, that helps! It is possible to give containers CUDA access, but I don't think our dev containers have that yet. I'll give these settings a go using the Pixi environment, though.
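For reference, giving a container GPU access generally looks something like the sketch below (assuming the NVIDIA Container Toolkit is installed on the host; the image tag is just illustrative). Our dev container configuration would need the equivalent setting before CUDA would work inside it:

```shell
# Illustrative sketch, assuming the NVIDIA Container Toolkit on the host.
# The --gpus flag exposes the host GPUs to the container; the nvidia-smi
# call just verifies that the GPU is visible inside it.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```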

@karlhigley
Collaborator

I'm running the dvc repro process, but it's going to take a while. I'll submit a separate PR once that finishes, but I want to get this merged so we can start sending ourselves some test newsletters with the new pipelines and get a feel for how well they work when there's a significant click history to draw on.

@mdekstrand
Contributor

@karlhigley @Sris-Ty do you know what would be making the tests take so long?

@karlhigley
Collaborator

No—it's been stuck there for quite a while. Not sure if it's something about the pipelines that join multiple recommenders?

@mdekstrand
Contributor

I cancelled and restarted the job to see if it was an ephemeral hang, but no. We should put a timeout on that job, but also figure out what's being slow here.

@mdekstrand
Contributor

The tests run and pass in reasonable time on my MacBook.

@Sris-Ty
Contributor Author

Sris-Ty commented Jan 24, 2025

In that case, I am hoping that joining the different recommenders is okay, since it runs in reasonable time?

@mdekstrand
Contributor

@Sris-Ty I am not sure what you mean.

I doubt the joining is the problem, as the tests ran fine on my local machine. I am testing in the devcontainer to see if they fail there.

@mdekstrand
Contributor

The tests also exhibit this problem in my devcontainer, so it's apparently something that shows up on Linux but not macOS, and it is not specific to x86.

@karlhigley merged commit 9131d25 into main on Jan 30, 2025
6 of 7 checks passed
@mdekstrand
Contributor

Glad we could merge this!
