Replies: 2 comments
-
I haven't looked into this at all. What is it good for?
-
It's supposed to give faster inference on multi-GPU setups, I guess. Mainline sets it to 4; I set it to 1 because I didn't notice much improvement back in the day, but I did notice higher VRAM consumption and GPU load.
-
@ikawrakow, could you add a CLI argument (or at least an environment variable, which I guess is much simpler, though I'm failing to get it right) to set GGML_SCHED_MAX_COPIES without recompiling? It affects VRAM usage and performance, and it would be great to be able to set it conveniently for benchmarking and customized use.
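For reference, here is a rough sketch of the environment-variable route. It is not code from ggml or from this repository. It assumes that GGML_SCHED_MAX_COPIES has to stay a compile-time macro because it sizes arrays in the scheduler, so a runtime setting can only choose a copy count between 1 and that compiled maximum. The helper names `parse_sched_copies` and `sched_copies_from_env` are hypothetical, and the environment variable name is passed in by the caller rather than being an existing option.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Compile-time upper bound. Assumed to size fixed arrays, so it stays a macro. */
#ifndef GGML_SCHED_MAX_COPIES
#define GGML_SCHED_MAX_COPIES 4
#endif

/* Hypothetical helper: turn a string into a copy count in [1, compiled_max].
 * Missing, empty, or malformed input falls back to compiled_max, which keeps
 * the behaviour of a build that has no override at all. */
int parse_sched_copies(const char *s, int compiled_max) {
    assert(compiled_max >= 1);
    if (s == NULL || *s == '\0') {
        return compiled_max;
    }
    errno = 0;
    char *end = NULL;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0') {
        return compiled_max; /* not a clean base-10 integer */
    }
    if (v < 1) {
        return 1;            /* at least one copy is always needed */
    }
    if (v > compiled_max) {
        return compiled_max; /* cannot exceed the compiled array sizes */
    }
    return (int) v;
}

/* Hypothetical helper: read the override from the environment variable
 * named by var_name (the name itself is the caller's choice). */
int sched_copies_from_env(const char *var_name, int compiled_max) {
    return parse_sched_copies(getenv(var_name), compiled_max);
}
```

The scheduler setup would then call something like `sched_copies_from_env("GGML_SCHED_COPIES", GGML_SCHED_MAX_COPIES)` wherever it currently uses the macro as the number of copies. The variable name here is only a placeholder. Raising the count above the compiled maximum would still need a rebuild.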