lookup: add print for drafting performance #5450

JohannesGaessler · 2024-02-11T10:25:35Z

This PR measures the time needed to create the draft in the lookup example and prints it at the end of the run. As it turns out, the drafting time is negligible:

n_draft   = 4
n_predict = 1025
n_drafted = 712
t_draft   = 2.85 ms, 4.01 us per token, 249387.04 tokens per second
n_accept  = 410
accept    = 57.584%

ggerganov

It's better to measure the entire prompt_lookup() call:

{
    const int64_t t_start_us = ggml_time_us();

    prompt_lookup();

    t_draft_us += ggml_time_us() - t_start_us;
}
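
The same accumulate-around-the-call pattern can be factored into a small helper if several call sites need it. The sketch below is an assumption-laden illustration, not part of the PR: `time_us()` is a stand-in built on `std::chrono::steady_clock` (the real code uses `ggml_time_us()`), and `timed()` is a hypothetical helper name.

```cpp
#include <chrono>
#include <cstdint>

// Stand-in for ggml_time_us(): monotonic time in microseconds (assumption for this sketch).
static int64_t time_us() {
    using namespace std::chrono;
    return duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count();
}

// Hypothetical helper: run fn() and add its wall-clock duration to t_accum_us.
// Wrapping the whole call means any setup done inside fn is measured too.
template <typename F>
void timed(int64_t & t_accum_us, F && fn) {
    const int64_t t_start_us = time_us();
    fn();
    t_accum_us += time_us() - t_start_us;
}
```

With this helper the call site would read `timed(t_draft_us, [&] { prompt_lookup(); });`, which measures the entire call as suggested above.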

ggerganov approved these changes Feb 11, 2024

lookup: add print for drafting performance

846aaa5

JohannesGaessler force-pushed the lookup-perf-print branch from 4594871 to 846aaa5 Compare February 11, 2024 11:44

JohannesGaessler merged commit e4640d8 into ggerganov:master Feb 11, 2024
48 of 53 checks passed

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024

lookup: add print for drafting performance (ggerganov#5450)

a9e9b67

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024

lookup: add print for drafting performance (ggerganov#5450)

61f17fb
