
issue with tools + streaming + OpenAI on #133 #139

Closed
ultronozm opened this issue Jan 13, 2025 · 8 comments
Comments

@ultronozm (Contributor)

Evaluating

(let* ((provider (make-llm-openai
                  :key (exec-path-from-shell-getenv "OPENAI_KEY")
                  :chat-model "gpt-4")))
  (llm-tester-tool-use-streaming provider))

yields Debugger entered--Lisp error: (wrong-type-argument sequencep :null)

ahyatt added a commit that referenced this issue Jan 14, 2025
@ahyatt (Owner)

ahyatt commented Jan 14, 2025

Thanks for noticing this! It was indeed broken after the change to how we deserialize JSON.

@ahyatt ahyatt closed this as completed Jan 14, 2025
@ultronozm (Contributor, Author)

Thanks, looks good.

Here's a further issue in the same vein (which I'd be happy to file as a new issue on the same topic; let me know): evaluating

(let* ((provider (make-llm-openai
                  :key (exec-path-from-shell-getenv "OPENAI_KEY")
                  :chat-model "gpt-4"))
       (results nil)
       (add-fn (llm-make-tool-function
                :function (lambda (callback a b)
                            (let ((result (format "%s" (+ a b))))
                              (push (list :tool-call (cons (list a b) result)) results)
                              (funcall callback result)))
                :name "add"
                :description "Sums two numbers."
                :args '((:name "a" :description "A number." :type "integer" :required t)
                        (:name "b" :description "A number." :type "integer" :required t))
                :async t))
       (prompt (llm-make-chat-prompt
                (concat
                 "Tell a joke in ten words or less. "
                 "Then compute 2+3 and tell me what you got.")
                :tools (list add-fn))))
  (llm-chat-streaming provider prompt
                      (lambda (_partial)) (lambda (_final)) (lambda (_err _msg))))

yields a backtrace starting with Error running timer ‘plz--respond’: (wrong-type-argument sequencep t)

@ahyatt (Owner)

ahyatt commented Jan 15, 2025

The spec changed in the past week: instead of :required, we now have :optional. I believe that is the cause of your issue.
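For reference, the add tool from above rewritten for the new spec might look like the following sketch. This assumes (based on the comment above, not verified against the current source) that a required argument now simply omits the keyword, rather than setting :required t:

```elisp
;; Sketch only: assumes the new spec drops :required in favor of :optional,
;; so required arguments just omit the keyword entirely.
(llm-make-tool-function
 :function (lambda (callback a b)
             ;; Call back with the sum as a string.
             (funcall callback (format "%s" (+ a b))))
 :name "add"
 :description "Sums two numbers."
 :args '((:name "a" :description "A number." :type "integer")
         (:name "b" :description "A number." :type "integer"))
 :async t)
```

An explicitly optional argument would presumably add :optional t to its plist instead.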

@ultronozm (Contributor, Author)

Thanks. I tried what you suggested, but the issue seems to be more fundamental:

(let ((provider (make-llm-openai
                 :key (exec-path-from-shell-getenv "OPENAI_KEY")
                 :chat-model "gpt-4")))
  (llm-tester-chat-streaming provider))

=> Error running timer ‘plz--respond’: (wrong-type-argument arrayp t)

@ahyatt (Owner)

ahyatt commented Jan 16, 2025

Thanks, I can reproduce that. I'm reopening.

@ahyatt ahyatt reopened this Jan 16, 2025
ahyatt added a commit that referenced this issue Jan 16, 2025
@ahyatt ahyatt closed this as completed Jan 16, 2025
@ultronozm (Contributor, Author)

ultronozm commented Jan 16, 2025

Thanks. I'll record here (if you don't mind) a further issue on the same topic. The issue occurs for "gpt-4" but apparently not for newer models such as "gpt-4o". Evaluating

(let* ((provider (make-llm-openai
                  :key (exec-path-from-shell-getenv "OPENAI_KEY")
                  :chat-model "gpt-4"))
       (results nil)
       (add-fn (llm-make-tool-function
                :function (lambda (callback a b)
                            (let ((result (format "%s" (+ a b))))
                              (push (list :tool-call (cons (list a b) result)) results)
                              (funcall callback result)))
                :name "add"
                :description "Sums two numbers."
                :args '((:name "a" :description "A number." :type "integer" :required t)
                        (:name "b" :description "A number." :type "integer" :required t))
                :async t))
       (prompt (llm-make-chat-prompt
                (concat
                 "Tell a joke in ten words or less. "
                 "Then, compute 2+3 using the provided tool.")
                :tools
                (list add-fn)))
       responses done)
  (push (list :provider (symbol-name (type-of provider))) results)
  (push (list :chat-model (llm-openai-chat-model provider)) results)
  (push (list :prompt (copy-sequence prompt)) results)
  (llm-chat-streaming
   provider prompt
   (lambda (partial)
     (push (list :partial partial) results))
   (lambda (final)
     (push (list :final final) results)
     (push (list :prompt-after (copy-sequence prompt)) results)
     (setq done t))
   (lambda (err msg)
     (push (list :error err msg) results)
     (setq done t)))
  (while (not done)
    (sleep-for 0.1))
  (setq results (nreverse results))
  (pp-display-expression results "*test*"))

yields, e.g.,

((:provider "llm-openai") (:chat-model "gpt-4")
 (:prompt
  #s(llm-chat-prompt nil nil
                     (#s(llm-chat-prompt-interaction user
                                                     "Tell a joke in ten words or less. Then, compute 2+3 using the provided tool."
                                                     nil))
                     (#s(llm-tool-function
                         #[(callback a b)
                           ((let ((result (format "%s" (+ a b))))
                              (setq results
                                    (cons
                                     (list :tool-call
                                           (cons (list a b) result))
                                     results))
                              (funcall callback result)))
                           nil]
                         "add" "Sums two numbers."
                         ((:name "a" :description "A number." :type
                                 "integer" :required t)
                          (:name "b" :description "A number." :type
                                 "integer" :required t))
                         t))
                     nil nil nil nil))
 (:partial "Why") (:partial "Why don") (:partial "Why don't")
 (:partial "Why don't scientists")
 (:partial "Why don't scientists trust")
 (:partial "Why don't scientists trust atoms")
 (:partial "Why don't scientists trust atoms?")
 (:partial "Why don't scientists trust atoms? They")
 (:partial "Why don't scientists trust atoms? They make")
 (:partial "Why don't scientists trust atoms? They make up")
 (:partial "Why don't scientists trust atoms? They make up everything")
 (:partial
  "Why don't scientists trust atoms? They make up everything.\n\n")
 (:partial
  "Why don't scientists trust atoms? They make up everything.\n\nNow")
 (:partial
  "Why don't scientists trust atoms? They make up everything.\n\nNow,")
 (:partial
  "Why don't scientists trust atoms? They make up everything.\n\nNow, let")
 (:partial
  "Why don't scientists trust atoms? They make up everything.\n\nNow, let's")
 (:partial
  "Why don't scientists trust atoms? They make up everything.\n\nNow, let's add")
 (:partial
  "Why don't scientists trust atoms? They make up everything.\n\nNow, let's add ")
 (:partial
  "Why don't scientists trust atoms? They make up everything.\n\nNow, let's add 2")
 (:partial
  "Why don't scientists trust atoms? They make up everything.\n\nNow, let's add 2 and")
 (:partial
  "Why don't scientists trust atoms? They make up everything.\n\nNow, let's add 2 and ")
 (:partial
  "Why don't scientists trust atoms? They make up everything.\n\nNow, let's add 2 and 3")
 (:partial
  "Why don't scientists trust atoms? They make up everything.\n\nNow, let's add 2 and 3.\n")
 (:tool-call ((2 3) . "5")) (:final (("add" . "5")))
 (:prompt-after
  #s(llm-chat-prompt nil nil
                     (#s(llm-chat-prompt-interaction user
                                                     "Tell a joke in ten words or less. Then, compute 2+3 using the provided tool."
                                                     nil)
                        #s(llm-chat-prompt-interaction assistant
                                                       (#s(llm-provider-utils-tool-use
                                                           "call_TcvKlZyGP1NndBE03XUwZAsM"
                                                           "add"
                                                           ((a . 2)
                                                            (b . 3))))
                                                       nil)
                        #s(llm-chat-prompt-interaction tool-results
                                                       nil
                                                       (#s(llm-chat-prompt-tool-result
                                                           "call_TcvKlZyGP1NndBE03XUwZAsM"
                                                           "add" "5"))))
                     (#s(llm-tool-function
                         #[(callback a b)
                           ((let ((result (format "%s" (+ a b))))
                              (setq results
                                    (cons
                                     (list :tool-call
                                           (cons (list a b) result))
                                     results))
                              (funcall callback result)))
                           nil]
                         "add" "Sums two numbers."
                         ((:name "a" :description "A number." :type
                                 "integer" :required t)
                          (:name "b" :description "A number." :type
                                 "integer" :required t))
                         t))
                     nil nil nil nil)))

The main issue is that the AI's partial text responses preceding the tool call are not logged in the conversation history (see :prompt-after). On a related note, there is no "final" text callback generated by llm.

@ahyatt (Owner)

ahyatt commented Jan 17, 2025

I don't think this last issue is going to work as things stand: we either do a tool call or return text, not both. I'm thinking of adding more flexibility in the future to change this, because there are a few situations in which there are essentially multiple different kinds of output. But that will probably mean a breaking change, or at least a significant addition to the API.

@ultronozm (Contributor, Author)

Gotcha, thanks for the explanation. I think it's something worth pursuing, even if it requires a breaking change. Also, it's not just about older models like gpt-4: for instance, the latest Sonnet also streams text before tool calls.
