
lua-cgemma

Lua bindings for gemma.cpp.

Requirements

Before starting, you should have installed:

Installation

1st step: Clone the source code from GitHub: git clone https://github.com/ufownl/lua-cgemma.git

2nd step: Build and install:

To build and install using the default settings, just enter the repository's directory and run the following commands:

mkdir build
cd build
cmake .. && make
sudo make install

3rd step: See here to learn how to obtain model weights and tokenizer.

Usage

Synopsis

-- Create a Gemma instance
local gemma, err = require("cgemma").new({
  tokenizer = "/path/to/tokenizer.spm",
  model = "gemma3-4b",
  weights = "/path/to/4b-it-sfp.sbs"
})
if not gemma then
  error("Oops! "..err)
end

-- Create a chat session
local session, err = gemma:session()
if not session then
  error("Oops! "..err)
end

while true do
  print("New conversation started")

  -- Multi-turn chat loop
  while session:ready() do
    io.write("> ")
    local text = io.read()
    if not text then
      print("End of file")
      return
    end
    -- Generate reply
    local reply, err = session(text)
    if not reply then
      error("Oops! "..err)
    end
    print("reply: ", reply)
  end

  print("Exceeded the maximum number of tokens")
  session:reset()
end

APIs for Lua

cgemma.info

syntax: cgemma.info()

Show information about the cgemma module.

cgemma.scheduler

syntax: <cgemma.scheduler>sched, <string>err = cgemma.scheduler([<table>options])

Create a scheduler instance.

A successful call returns a scheduler instance. Otherwise, it returns nil and a string describing the error.

Available options and default values:

{
  num_threads = 0,  -- Maximum number of threads to use. (0 = unlimited)
  pin = -1,  -- Pin threads? (-1 = auto, 0 = no, 1 = yes)
  skip_packages = 0,  -- Index of the first socket to use. (0 = unlimited)
  max_packages = 0,  -- Maximum number of sockets to use. (0 = unlimited)
  skip_clusters = 0,  -- Index of the first CCX to use. (0 = unlimited)
  max_clusters = 0,  -- Maximum number of CCXs to use. (0 = unlimited)
  skip_lps = 0,  -- Index of the first LP to use. (0 = unlimited)
  max_lps = 0,  -- Maximum number of LPs to use. (0 = unlimited)
}

cgemma.scheduler.cpu_topology

syntax: <string>desc = sched:cpu_topology()

Query CPU topology.

cgemma.new

syntax: <cgemma.instance>inst, <string>err = cgemma.new(<table>options)

Create a Gemma instance.

A successful call returns a Gemma instance. Otherwise, it returns nil and a string describing the error.

Available options:

{
  tokenizer = "/path/to/tokenizer.spm",  -- Path of tokenizer model file.
  model = "gemma3-4b",  -- Model type:
                        -- 2b-it (Gemma 2B parameters, instruction-tuned)
                        -- 2b-pt (Gemma 2B parameters, pretrained)
                        -- 7b-it (Gemma 7B parameters, instruction-tuned)
                        -- 7b-pt (Gemma 7B parameters, pretrained)
                        -- gr2b-it (Griffin 2B parameters, instruction-tuned)
                        -- gr2b-pt (Griffin 2B parameters, pretrained)
                        -- gemma2-2b-it (Gemma2 2B parameters, instruction-tuned)
                        -- gemma2-2b-pt (Gemma2 2B parameters, pretrained)
                        -- 9b-it (Gemma2 9B parameters, instruction-tuned)
                        -- 9b-pt (Gemma2 9B parameters, pretrained)
                        -- 27b-it (Gemma2 27B parameters, instruction-tuned)
                        -- 27b-pt (Gemma2 27B parameters, pretrained)
                        -- paligemma-224 (PaliGemma 224*224)
                        -- paligemma-448 (PaliGemma 448*448)
                        -- paligemma2-3b-224 (PaliGemma2 3B 224*224)
                        -- paligemma2-3b-448 (PaliGemma2 3B 448*448)
                        -- paligemma2-10b-224 (PaliGemma2 10B 224*224)
                        -- paligemma2-10b-448 (PaliGemma2 10B 448*448)
                        -- gemma3-4b (Gemma3 4B parameters)
                        -- gemma3-1b (Gemma3 1B parameters)
                        -- gemma3-12b (Gemma3 12B parameters)
                        -- gemma3-27b (Gemma3 27B parameters)
  weights = "/path/to/4b-it-sfp.sbs",  -- Path of model weights file. (required)
  weight_type = "sfp",  -- Weight type:
                        -- sfp (8-bit FP, default)
                        -- f32 (float)
                        -- bf16 (bfloat16)
                        -- nuq (non-uniform quantization)
                        -- f64 (double)
                        -- c64 (complex double)
                        -- u128 (uint128)
  seed = 42,  -- Random seed. (default is random setting)
  scheduler = sched_inst,  -- Instance of scheduler, if not provided a default
                           -- scheduler will be attached.
  disabled_words = {...},  -- Words you don't want to generate.
}

Note

If the weights file is not in the new single-file format, the tokenizer and model options are required.
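
As a rough illustration of how the scheduler and instance options fit together, the sketch below creates a thread-limited scheduler, prints its CPU topology, and attaches it to a new instance. It assumes a single-file weights file (so tokenizer and model are omitted). The helper name `new_limited_instance` and its parameters are hypothetical and not part of the library; the module table is passed in so the helper does not depend on where `cgemma` was loaded.

```lua
-- Sketch: build a Gemma instance on a thread-limited scheduler.
-- `cgemma` is the module table, normally obtained with require("cgemma").
-- `new_limited_instance` is a hypothetical helper, not a library function.
local function new_limited_instance(cgemma, weights_path, num_threads)
  local sched, err = cgemma.scheduler({ num_threads = num_threads })
  if not sched then
    return nil, "scheduler: " .. tostring(err)
  end
  -- cpu_topology() returns a description string; print it for diagnostics
  print(sched:cpu_topology())
  local inst, err2 = cgemma.new({ weights = weights_path, scheduler = sched })
  if not inst then
    return nil, "instance: " .. tostring(err2)
  end
  return inst
end

-- Usage (path is a placeholder):
-- local inst, err = new_limited_instance(require("cgemma"), "/path/to/weights.sbs", 8)
```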

cgemma.instance.disabled_tokens

syntax: <table>tokens = inst:disabled_tokens()

Query the disabled tokens of a Gemma instance.

cgemma.instance.embed_image

syntax: <cgemma.image_tokens>img, <string>err = inst:embed_image(<string>data_or_path)

Load image data from the given Lua string or a specific file (PPM format: P6, binary) and embed it into the image tokens.

syntax: <cgemma.image_tokens>img, <string>err = inst:embed_image(<integer>width, <integer>height, <table>values)

Create an image with the given width, height, and pixel values, and embed it into the image tokens.

A successful call returns a cgemma.image_tokens object containing the image tokens. Otherwise, it returns nil and a string describing the error.
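
Since the string form of embed_image expects binary PPM (P6) data, one way to feed it generated pixels is to assemble the PPM in memory. The following is a minimal sketch: `make_ppm` and its `pixel` callback are hypothetical names introduced for illustration, and the image size the model expects is not stated here, so the 224*224 size in the usage comment is only an assumption based on the PaliGemma model names.

```lua
-- Build a binary PPM (P6) image as a Lua string.
-- `pixel(x, y)` must return r, g, b as integers in the range 0..255.
-- (`make_ppm` is a hypothetical helper, not part of cgemma.)
local function make_ppm(width, height, pixel)
  -- P6 header: magic number, width and height, maximum channel value
  local parts = { string.format("P6\n%d %d\n255\n", width, height) }
  for y = 0, height - 1 do
    for x = 0, width - 1 do
      local r, g, b = pixel(x, y)
      parts[#parts + 1] = string.char(r, g, b)
    end
  end
  return table.concat(parts)
end

-- Usage (assumes `inst` was created from a vision-capable model):
-- local data = make_ppm(224, 224, function(x, y) return x, y, 0 end)
-- local img, err = inst:embed_image(data)
```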

cgemma.instance.session

syntax: <cgemma.session>sess, <string>err = inst:session([<table>options])

Create a chat session.

A successful call returns the session. Otherwise, it returns nil and a string describing the error.

Available options and default values:

{
  max_generated_tokens = 2048,  -- Maximum number of tokens to generate.
  prefill_tbatch = 256,  -- Prefill: max tokens per batch.
  decode_qbatch = 16,  -- Decode: max queries per batch.
  temperature = 1.0,  -- Temperature for top-K.
  top_k = 1,  -- Number of top-K tokens to sample from.
  no_wrapping = false,  -- Whether to force disable instruction-tuned wrapping.
}

cgemma.session.ready

syntax: <boolean>ok = sess:ready()

Check if the session is ready to chat.

cgemma.session.reset

syntax: sess:reset()

Reset the session to start a new conversation.

cgemma.session.dumps

syntax: <string>data, <string>err = sess:dumps()

Dump the current state of the session to a Lua string.

A successful call returns a Lua string that stores state data (binary) of the session. Otherwise, it returns nil and a string describing the error.

cgemma.session.loads

syntax: <boolean>ok, <string>err = sess:loads(<string>data)

Load the state data from the given Lua string to restore a previous session.

A successful call returns true. Otherwise, it returns false and a string describing the error.

cgemma.session.dump

syntax: <boolean>ok, <string>err = sess:dump(<string>path)

Dump the current state of the session to a specific file.

A successful call returns true. Otherwise, it returns false and a string describing the error.

cgemma.session.load

syntax: <boolean>ok, <string>err = sess:load(<string>path)

Load the state data from the given file to restore a previous session.

A successful call returns true. Otherwise, it returns false and a string describing the error.
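
The dump and load methods can be combined to carry a conversation from one session to another. The sketch below uses the in-memory variants, dumps and loads. `copy_state` is a hypothetical helper, and it assumes both sessions were created by the same Gemma instance, which the text above does not state explicitly.

```lua
-- Copy the conversation state of `src` into `dst` via dumps()/loads().
-- Returns true on success, or false plus an error message.
-- (`copy_state` is a hypothetical helper, not part of cgemma.)
local function copy_state(src, dst)
  local data, err = src:dumps()
  if not data then
    return false, "dumps failed: " .. tostring(err)
  end
  local ok, err2 = dst:loads(data)
  if not ok then
    return false, "loads failed: " .. tostring(err2)
  end
  return true
end

-- Usage (assumes `gemma` is a Gemma instance and `session` has some history):
-- local fork = assert(gemma:session())
-- assert(copy_state(session, fork))
```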

cgemma.session.stats

syntax: <table>statistics = sess:stats()

Get statistics for the current session.

Example of statistics:

{
  prefill_duration = 1.6746909224894,
  prefill_tokens = 26,
  prefill_tokens_per_second = 15.525252839701,
  time_to_first_token = 1.9843131969683,
  generate_duration = 38.562645539409,
  tokens_generated = 212,
  generate_tokens_per_second = 5.4975481332926
}
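
For logging, the statistics table can be condensed into a single line. This is a small sketch with a hypothetical `format_stats` helper; it treats the duration fields as seconds, which is consistent with the example above (26 tokens / 1.67 ≈ 15.5 tokens per second).

```lua
-- Format a statistics table (as returned by sess:stats()) as one line.
-- (`format_stats` is a hypothetical helper, not part of cgemma.)
local function format_stats(s)
  return string.format(
    "prefill: %d tokens in %.2fs (%.1f tok/s) | first token: %.2fs | " ..
    "generated: %d tokens in %.2fs (%.1f tok/s)",
    s.prefill_tokens, s.prefill_duration, s.prefill_tokens_per_second,
    s.time_to_first_token,
    s.tokens_generated, s.generate_duration, s.generate_tokens_per_second)
end

-- Usage (assumes `session` is a cgemma session that has generated a reply):
-- print(format_stats(session:stats()))
```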

metatable(cgemma.session).__call

syntax: <string or boolean>reply, <string>err = sess([<cgemma.image_tokens>img, ]<string>text[, <function>stream])

Generate reply.

A successful call returns the content of the reply (without a stream function) or true (with a stream function). Otherwise, it returns nil and a string describing the error.

The stream function is defined as follows:

function stream(token, pos, prompt_size)
  if pos < prompt_size then
    -- Gemma is processing the prompt
    io.write(pos == 0 and "reading and thinking ." or ".")
  elseif token then
    -- Stream the token text output by Gemma here
    if pos == prompt_size then
      io.write("\nreply: ")
    end
    io.write(token)
  else
    -- Gemma's output reaches the end
    print()
  end
  io.flush()
  -- return `true` indicates success; return `false` indicates failure and terminates the generation
  return true
end
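
Because a call with a stream function returns true rather than the reply text, a caller that also wants the full reply has to accumulate the tokens itself. One possible approach is sketched below; `make_collector` is a hypothetical helper that follows the same `(token, pos, prompt_size)` convention as the function above.

```lua
-- Create a stream function that collects generated tokens, plus a getter
-- that returns the text collected so far.
-- (`make_collector` is a hypothetical helper, not part of cgemma.)
local function make_collector()
  local pieces = {}
  local function stream(token, pos, prompt_size)
    -- Skip prompt processing (pos < prompt_size) and the end marker (nil token)
    if pos >= prompt_size and token then
      pieces[#pieces + 1] = token
    end
    return true  -- keep generating
  end
  local function text()
    return table.concat(pieces)
  end
  return stream, text
end

-- Usage (assumes `session` is a ready cgemma session):
-- local stream, text = make_collector()
-- local ok, err = session("Hello!", stream)
-- if ok then print(text()) end
```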

cgemma.batch

syntax: <cgemma.batch_result>result, <string>err = cgemma.batch([<cgemma.image_tokens>img, ]<cgemma.session>sess, <string>text[, <function>stream], ...)

Generate replies for multiple queries via the batch interface.

A successful call returns a cgemma.batch_result object. Otherwise, it returns nil and a string describing the error.

The stream function is the same as in metatable(cgemma.session).__call.

Note

  1. Each element in a batch must start with a session, followed by a string and an optional stream function. Supplying a stream function puts the corresponding session in stream mode instead of normal mode;
  2. All sessions in a batch must be created by the same Gemma instance;
  3. Sessions in a batch must not be duplicated;
  4. Inference arguments of a batch call are derived from its sessions: max_generated_tokens, prefill_tbatch, and decode_qbatch take the minimum value across all sessions, temperature takes the average, and top_k takes the maximum;
  5. The embedded image can only be given as the first argument to a batch call.
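
To make the rule about inference arguments concrete, the following sketch computes the values a batch call would use from the option tables of its sessions. It only illustrates the arithmetic described in the note (minimum, average, maximum); it is not the library's implementation, `effective_batch_args` is a hypothetical name, and every option table is assumed to be fully populated.

```lua
-- Combine per-session options the way the note above describes:
-- minimum for the three limits, average for temperature, maximum for top_k.
-- (`effective_batch_args` is a hypothetical helper, not part of cgemma.)
local function effective_batch_args(all_options)
  local eff = {
    max_generated_tokens = math.huge,
    prefill_tbatch = math.huge,
    decode_qbatch = math.huge,
    top_k = 0,
  }
  local temperature_sum = 0
  for _, o in ipairs(all_options) do
    eff.max_generated_tokens = math.min(eff.max_generated_tokens, o.max_generated_tokens)
    eff.prefill_tbatch = math.min(eff.prefill_tbatch, o.prefill_tbatch)
    eff.decode_qbatch = math.min(eff.decode_qbatch, o.decode_qbatch)
    eff.top_k = math.max(eff.top_k, o.top_k)
    temperature_sum = temperature_sum + o.temperature
  end
  eff.temperature = temperature_sum / #all_options
  return eff
end
```

For example, batching a session that uses the defaults with one created with max_generated_tokens = 512, prefill_tbatch = 128, decode_qbatch = 32, temperature = 0.5, and top_k = 40 would, under this rule, run with 512, 128, 16, 0.75, and 40 respectively.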

cgemma.batch_result.stats

syntax: <table>statistics = result:stats()

Get statistics for the batch call that returned the current result.

The statistics fields are the same as in cgemma.session.stats.

metatable(cgemma.batch_result).__call

syntax: <string or boolean>reply, <string>err = result(<cgemma.session>sess)

Query the reply corresponding to the session in the result.

A successful call returns the content of the reply (normal mode) or true (stream mode). Otherwise, it returns nil and a string describing the error.
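
Putting the batch interface together, a sketch of a batch call in normal mode (no stream functions, no image) might look like this. `batch_ask` and the `{ session, text }` pair layout are hypothetical conventions chosen for the example; the module table is passed in as `cgemma`.

```lua
local unpack = table.unpack or unpack  -- Lua 5.2+ or Lua 5.1/LuaJIT

-- Run one batch call for an array of { session, text } pairs and return
-- the replies in the same order.
-- (`batch_ask` is a hypothetical helper, not part of cgemma.)
local function batch_ask(cgemma, queries)
  -- Flatten the pairs into the argument list: sess1, text1, sess2, text2, ...
  local args = {}
  for _, q in ipairs(queries) do
    args[#args + 1] = q[1]
    args[#args + 1] = q[2]
  end
  local result, err = cgemma.batch(unpack(args))
  if not result then
    return nil, err
  end
  -- Query each session's reply by calling the batch result with the session
  local replies = {}
  for i, q in ipairs(queries) do
    local reply, reply_err = result(q[1])
    if not reply then
      return nil, reply_err
    end
    replies[i] = reply
  end
  return replies
end

-- Usage (assumes `s1` and `s2` are sessions from the same Gemma instance):
-- local replies, err = batch_ask(require("cgemma"), { { s1, "Hi" }, { s2, "Hello" } })
```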

Migrating to single-file weights format

The weights file now has a new format: a single file that allows the tokenizer and the model type to be contained directly. A tool to migrate from multi-file to single-file is available.

gemma.migrate_weights \
  --tokenizer /path/to/tokenizer.spm --weights /path/to/2.0-2b-it-sfp.sbs \
  --model gemma2-2b-it --output_weights /path/to/2.0-2b-it-sfp-single.sbs

After migration, you can create a Gemma instance using the new weights file like this:

-- Create a Gemma instance
local gemma, err = require("cgemma").new({
  weights = "/path/to/2.0-2b-it-sfp-single.sbs"
})
if not gemma then
  error("Oops! "..err)
end

License

BSD-3-Clause license. See LICENSE for details.
