This repository benchmarks the proving overhead and the accuracy drop of zkLLM. (Note: the README and the repository are still under development.)
Proving time, as reported in prior works, includes the time needed for the sumcheck and the commitment opening. We follow this standard and exclude irrelevant parts such as compilation and loading.
We provide the following files to measure the overhead of the corresponding tensor operations:

- `tlookup-bm.cu`
- `matmul-bm.cu`
- `hadamard-bm.cu`
- `com-bm.cu`
The first three files cover the sumchecks only, while `com-bm.cu` handles the commitments. Proving time measurement for the entire inference process is not yet fully automated: it involves listing all operations, storing and loading the input/output tensors of each operation via file I/O, and then measuring the overheads of the sumcheck and the commitment opening one by one. A minimal sketch of this workflow appears below.
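The following is a rough sketch of that manual workflow, assuming the benchmark scripts and 13B attention shapes described further below; the operation list and file names are placeholders, not part of the repository:

```python
import subprocess
import numpy as np

# Placeholder list of (name, benchmark command) pairs for the most expensive
# operations of one inference pass; arguments follow the examples further below.
ops = [
    ("attn-tlookup", ["python", "tlookup-bm.py", "1600", "4194304", "0", "65536"]),
    ("attn-hadamard", ["python", "hadamard-bm.py", "1600", "4194304"]),
    ("attn-openings", ["python", "com-bm.py", "3200", "4194304", "65536"]),
]

for name, cmd in ops:
    # Store the operation's input tensor via file I/O so it can be reloaded later.
    x = np.random.randint(0, 1 << 16, size=(2048, 2048), dtype=np.int64)
    np.save(f"{name}-input.npy", x)
    # Measure the sumcheck / commitment-opening overhead of this operation alone.
    subprocess.run(cmd, check=True)
```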
For a rough estimate of the proving time, use:

- `tlookup-bm.py`
- `matmul-bm.py`
- `com-bm.py`
- `hadamard-bm.py`
These scripts measure the proving time on tensors of the designated sizes, filled with random values. The bottleneck is the attention mechanism, or `zkAttn`, in LLaMa-2, where the input size is `num_layer * num_head * seq_len * seq_len`, i.e., `32 * 32 * 2048 * 2048` (7B) or `40 * 40 * 2048 * 2048` (13B).
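As a quick sanity check on these sizes (pure arithmetic, independent of the repository code):

```python
# Attention input sizes for LLaMa-2, matching the figures quoted above.
for model, (num_layer, num_head, seq_len) in {
    "7B": (32, 32, 2048),
    "13B": (40, 40, 2048),
}.items():
    instances = num_layer * num_head  # number of per-head attention matrices
    entries = seq_len * seq_len       # entries per attention matrix
    print(model, instances, entries)
# 7B:  1024 instances of 4194304 entries each
# 13B: 1600 instances of 4194304 entries each
```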
`zkAttn` involves 5 (or 2) `tlookup`s and two Hadamard products. The most time-consuming operations are listed below; the remaining operations have asymptotically lower complexity and contribute no more than about 30 s to the total proving time.
- Sumcheck of `tlookup`: there are `40 * 40 = 1600` instances with input dimensions `2048 * 2048 = 4194304`. The input range is `[0, 65536)`. Multiply the measured time by 2 or 5, the total number of `tlookup`s.

  ```bash
  python tlookup-bm.py 1600 4194304 0 65536  # around 80 s
  ```
- Sumcheck of Hadamard: there are `40 * 40 = 1600` instances with tensor dimensions `2048 * 2048 = 4194304`.

  ```bash
  python hadamard-bm.py 1600 4194304  # around 90 s
  ```
- Openings: there are `40 * 40 * 2 = 3200` instances (the input and its multiplicative inverse for each `tlookup`) with tensor dimensions `2048 * 2048 = 4194304`. Multiply the measured time by 2 or 5, the total number of `tlookup`s.

  ```bash
  python com-bm.py 3200 4194304 65536  # around 15 s; 65536 is the generator dimension
  ```
The numbers above are from an A6000 GPU, which is slower than the A100 used for the experiments. Moreover, even with the same model of GPU, the actual computing time can fluctuate by as much as 2x, hypothetically due to the "unnatural" data type at the lowest hardware levels. The bottleneck thus costs `5 * 80 + 90 + 5 * 15 = 565` s or `2 * 80 + 90 + 2 * 15 = 280` s for the 13B model, which confirms the order of magnitude of the proving time. The arithmetic is spelled out below.
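The back-of-the-envelope totals can be reproduced directly from the per-operation A6000 timings measured above:

```python
# Bottleneck proving cost for the 13B model, using the A6000 timings above.
tlookup_s, hadamard_s, opening_s = 80, 90, 15
for n in (5, 2):  # number of tlookups in zkAttn
    print(n, n * tlookup_s + hadamard_s + n * opening_s)
# 5 tlookups -> 565 s, 2 tlookups -> 280 s
```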
For accuracy, refer to `./acc`. Feedback indicates an overemphasis on preserving accuracy; we are therefore reducing the number of bits used, resulting in fewer `tlookup`s for `zkAttn`, as illustrated below.
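To illustrate why fewer bits mean fewer `tlookup`s, here is a sketch under the assumption that each `tlookup` handles one fixed-width limb of the looked-up value; the bit widths below are hypothetical, not the actual `zkAttn` parameters:

```python
import math

# Illustrative only: assume each tlookup covers one fixed-width limb of the
# looked-up value; the bit widths here are placeholders, not zkAttn's parameters.
def num_tlookups(value_bits: int, limb_bits: int = 16) -> int:
    return math.ceil(value_bits / limb_bits)

print(num_tlookups(80))  # 5 tlookups
print(num_tlookups(32))  # 2 tlookups
```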