Token

A Bhat edited this page Jul 3, 2023 · 3 revisions

About

In the context of large language models (LLMs), a token is the unit of text that the model processes. A token can be a whole word, part of a word, or a punctuation mark.

Rule of thumb

1 token ≈ 4 characters of common English text.
This translates to roughly 0.75 words per token.
100 tokens ≈ 75 words.
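
The rule of thumb can be turned into a quick estimator. The sketch below is only an illustration of the ratios above: the function and constant names are made up for this page, and the results are rough estimates, since real tokenizers produce different counts depending on the model and the text.

```python
# Heuristic ratios from the rule of thumb above.
# These are approximations for common English text, not exact tokenizer output.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate the token count of a string from its character length."""
    return round(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate how many tokens a given number of English words needs."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_words_from_tokens(token_count: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(token_count * WORDS_PER_TOKEN)
```

For example, `estimate_words_from_tokens(100)` returns 75, matching the ratio above. For an exact count, run the text through the tokenizer of the specific model instead.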

Specific examples

The collected works of Shakespeare are about 900,000 words, or roughly 1.2M tokens.

See also

LLM