ggml-cuda : add TQ2_0 kernels, for ternary inference on GPU #18222
| Job | Run time |
|---|---|
| | 12m 30s |
| | 5m 34s |
| | 12m 26s |
| | 3m 25s |
| | 2m 22s |
| | 2m 47s |
| | 2m 47s |
| | 5m 3s |
| | 3m 49s |
| | 2m 59s |
| | 2m 46s |
| | 8m 4s |
| | 2m 6s |
| | 12m 7s |
| | 2m 59s |
| | 2m 18s |
| | 2m 13s |
| | 19m 38s |
| | 3m 40s |
| | 12m 26s |
| | 4m 53s |
| | 3m 14s |
| | 4m 42s |
| | 12m 7s |
| | 41m 54s |
| | 37m 29s |
| | 4m 44s |
| | 5m 11s |
| | 30m 2s |
| | 4m 40s |
| | 6m 47s |
| | 13m 48s |
| | 5m 47s |
| | 6m 9s |
| | 5m 6s |
| | 5m 52s |
| | 4m 49s |
| | 2m 58s |
| | 2m 53s |
| | 3m 51s |
| | 0s |
| | 0s |
| Total | 5h 28m 55s |