A Bhat edited this page Jul 3, 2023 · 3 revisions

About

  • In the context of large language models (LLMs), a token is the unit of text that the model processes. A token can be a whole word, part of a word, or a punctuation mark.

Rule of thumb

  • 1 token ≈ 4 characters of common English text.
  • This works out to roughly 0.75 words per token.
  • 100 tokens ≈ 75 words.
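
The rule of thumb above can be turned into a quick estimator. The following is a minimal sketch, not a tokenizer: the function names and constants are illustrative assumptions, and actual token counts depend on the model's tokenizer and on the text itself.

```python
import math

# Rule-of-thumb constants (approximate, common English text only).
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate tokens as the character count divided by 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens as the word count divided by 0.75, rounded up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


print(estimate_tokens_from_words(75))  # 100
```

These estimates are only suitable for rough budgeting; for exact counts, run the text through the tokenizer of the model in question.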

Specific examples

  • The collected works of Shakespeare are about 900,000 words, or roughly 1.2M tokens.
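
As a quick sanity check, the Shakespeare figures are consistent with the 0.75 words-per-token rule of thumb. A short sketch of the arithmetic (variable names are illustrative, and both figures are the approximations quoted above):

```python
# Approximate figures for the collected works of Shakespeare.
shakespeare_words = 900_000
shakespeare_tokens = 1_200_000

# Implied number of words per token for this corpus.
words_per_token = shakespeare_words / shakespeare_tokens

print(words_per_token)  # 0.75
```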

See also
