ggml-cuda : add TQ2_0 kernels, for ternary inference on GPU #11183

Open

compilade wants to merge 7 commits into master from compilade/cuda-tq2_0

+213 -2

Commits on Dec 28, 2024

ggml-cuda : add TQ2_0 support
compilade committed

Commits on Jan 9, 2025

Commits on Jan 10, 2025

ggml-cuda : remove some superfluous comments for TQ2_0 tile loading
compilade committed

Commits on Jan 12, 2025

ggml-cuda : slight optimizations for TQ2_0
compilade and JohannesGaessler committed
ggml-metal : supports_op returns false for ternary types
compilade committed
ggml-cuda : use i and j instead of i0 and i in vec_dot_tq2_0_q8_1
compilade committed