Support streaming response #43
plz.el (outdated diff):

```
@@ -320,6 +320,10 @@ NOTE: In v0.8 of `plz', only one error will be signaled:
to update their code while using v0.7 (i.e. any `condition-case'
forms should now handle only `plz-error', not the other two).

DURING is an optional callback function called each time a part
of the response body is arriving. It is called with one
argument, the partial raw response.
```
Here, and in the similar documentation below, can you make clear whether the partial raw response is the cumulative response or only what has arrived since the last callback? Also, you may want to make clear whether the callback is expected to receive all of the content eventually.
This is great, thank you for adding this!

Hi @r0man, this looks like a very good start. Two issues that I can think of quickly are:

Thanks.
Hi @alphapapa and @ahyatt,
What do you think of the following?

**Streaming of any data format**

To support streaming of any data format, I think one strategy could be to call the

**Streaming of SSE (server-sent events)**

There's a standard called SSE, which the OpenAI API uses, for example. It is basically text separated by 2 newlines. We could write a process filter that works with this format. We pass complete lines to the

**Line based**

The /stream endpoint of httpbin.org seems to be line based instead of following the SSE standard. So maybe we should support this one as well?

**Other data format**

Do you have any other data format in mind that I did not mention?

Wdyt?
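A minimal sketch of the SSE framing described above, assuming LF or CRLF line endings: events are separated by a blank line, each line is `field: value`, lines starting with `:` are comments, multiple `data:` lines are joined with newlines, and the event type defaults to `message`. The `my-sse-` names are hypothetical (not part of plz.el or any released library), and fields such as `id` and `retry` are ignored. The parser returns the complete events plus the unparsed tail, so a process filter could keep that tail and prepend it to the next chunk.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'subr-x)

(defun my-sse--field (line)
  "Split LINE into (FIELD . VALUE); a single space after the colon is dropped."
  (let ((colon (string-match ":" line)))
    (if (null colon)
        (cons line "")
      (let ((value (substring line (1+ colon))))
        (cons (substring line 0 colon)
              (if (string-prefix-p " " value) (substring value 1) value))))))

(defun my-sse-parse-event (raw)
  "Parse RAW, the lines of one SSE event, into a plist.
Return nil when the event has no data field."
  (let ((type "message") (data '()))
    (dolist (line (split-string raw "\n"))
      ;; Skip empty lines and comment lines starting with a colon.
      (unless (or (string-empty-p line) (string-prefix-p ":" line))
        (let* ((field (my-sse--field line))
               (name (car field)))
          (cond ((string= name "event") (setq type (cdr field)))
                ((string= name "data") (push (cdr field) data))))))
    (when data
      (list :event type :data (string-join (nreverse data) "\n")))))

(defun my-sse-parse (text)
  "Parse the complete events in TEXT.
Return (EVENTS . REST), where REST is a trailing incomplete event
to be prepended to the next chunk."
  (let ((text (replace-regexp-in-string "\r\n" "\n" text))
        (start 0)
        (events '()))
    (while (string-match "\n\n" text start)
      ;; Save the match positions before parsing clobbers the match data.
      (let ((raw (substring text start (match-beginning 0)))
            (next (match-end 0)))
        (setq start next)
        (let ((event (my-sse-parse-event raw)))
          (when event (push event events)))))
    (cons (nreverse events) (substring text start))))
```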
Perhaps it might be possible to add two new arguments: a callback (you already did this) plus a predicate that says when the callback is to be used (a message called with the latest information, returning non-
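One possible reading of the callback-plus-predicate idea, sketched under the assumption that the predicate is tested against the text accumulated since the last callback. The original sentence is cut off, so this is an interpretation, not the proposed API. `my-make-gated-callback` is a hypothetical helper, not part of plz.el; the function it returns could be called from a process filter with each new chunk.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-make-gated-callback (predicate callback)
  "Return a function of one CHUNK that accumulates chunks.
Whenever PREDICATE returns non-nil for the accumulated text, call
CALLBACK with that text and start accumulating afresh."
  (let ((pending ""))
    (lambda (chunk)
      (setq pending (concat pending chunk))
      (when (funcall predicate pending)
        (funcall callback pending)
        (setq pending "")))))
```

For example, a predicate such as `(lambda (s) (string-suffix-p "\n" s))` would deliver only complete lines to the callback.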
This is timely in a few ways: the hyperdrive.el project (presented at EmacsConf last Sunday) uses

Notably, LSP uses

I've been discussing some of this with João Távora on emacs-devel. Over at

But it would be good to keep all of these things in mind as we try to design a solution for

Given how many variables there are, I'd suggest that we not merge anything into

Ideally, I think we would have test cases to support each of the three formats we've mentioned: EventSource/SSE, JSONRPC, and LSP-style JSONRPC (perhaps also considering the JSON Lines format, which has some overlap). Then we could more easily see what the code would look like and try to make it easy to use.

@r0man Since your interest lies with

Thanks to all for your interest in helping with this feature.
If this could be a new branch, that would help. I could also experiment with using
@alphapapa and @ahyatt Sounds good to me. I can take a look at the llm library and how I could use plz with it. In fact, I already did some experiments on using plz in llm. I basically added an llm-request-plz.el file that has the same interface as the Emacs-native llm-request.el. I just stopped when I realized that I need streaming support for it.

@alphapapa What's not clear to me from your message is: should I focus only on the llm side while you look into the streaming support in plz, or do you want me to continue on plz.el with what I described above?
@r0man As long as the FSF copyright assignment is or eventually gets signed by you, I'm happy for you to experiment with code and APIs to add streaming support in your branches of
Hi @alphapapa and @ahyatt, priorities changed a bit and I might (nothing promised, priorities change quickly) look into this again. I'm also nearing the completion of the FSF copyright assignment process.

I was still thinking about how to extend plz.el to support streaming. I suspect it involves a process filter, and that the handling of the response needs to be tweaked. Based on thinking about this, looking around the code, and my previous experience working on clj-http and cljs-http, I would suggest the following:

Right now the user specifies, up front, a function in plz.el that parses the response. If you specify that you want JSON by passing in the json-parse function, the response is parsed as JSON, regardless of whether it actually is JSON. In an ideal world, if you send an Accept header asking for application/json, you should also get it back. But I have also seen many cases where this breaks along the way. In the OpenAI API, which I would like to use, the response format is triggered by a parameter in the request: if you set stream to true in the body of the request, it sends a response in the text/event-stream format, otherwise in application/json. So you need to coordinate this in 2 places (if you even send an Accept header). An Accept header can also contain multiple MIME types with different priorities, and the server might do content negotiation, so you might not know in advance what you get.

So my proposal would be to change the response handling code to be more flexible and actually look at the Content-Type header when parsing the response. The idea is that we have a process filter that reads the status code and the headers, and then decides which function to call (maybe multiple times in the case of streaming responses) depending on the Content-Type. If the user passes in a function, we use that one; otherwise we look up a handler in an alist from content type to handler function.
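A minimal sketch of the Content-Type dispatch proposed above, assuming handlers receive the whole body as a string. `my-media-type-handlers`, `my-bare-media-type`, and `my-handle-body` are hypothetical names, not plz.el API. An explicitly passed handler wins; otherwise the bare media type (with parameters such as `charset` stripped) is looked up in the alist, and unknown types fall back to the raw body. A streaming variant would call the handler repeatedly from a process filter instead of once.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'json)
(require 'subr-x)

(defvar my-media-type-handlers
  '(("application/json" . json-read-from-string)
    ("text/plain" . identity))
  "Hypothetical alist from a bare media type to a body handler.
Users could override or extend it.")

(defun my-bare-media-type (content-type)
  "Return the lower-cased media type of CONTENT-TYPE without parameters."
  (downcase (string-trim (car (split-string content-type ";")))))

(defun my-handle-body (content-type body &optional handler)
  "Decode BODY according to CONTENT-TYPE.
An explicit HANDLER wins; otherwise look one up in
`my-media-type-handlers'; unknown types return BODY unchanged."
  (let ((fn (or handler
                (alist-get (my-bare-media-type content-type)
                           my-media-type-handlers nil nil #'string=)
                #'identity)))
    (funcall fn body)))
```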
plz.el could provide a default content-type-to-handler alist, which could be overridden by the user, all while keeping the current interface intact.

Another thought I had was about middleware. I'm not sure if you are familiar with how clj-http works, but it has this concept of middleware. There is a request map and a response map, and both get passed through multiple middleware functions that all have the same signature. Each middleware can modify the request before it is actually sent, and can also modify the response before it gets passed back to the user. There is a middleware to encode input, another to decode output, yet another that handles compression, and so on. In the end all the middleware gets composed together and you get a "batteries included" HTTP client. You can see an example of it here: https://github.com/dakrone/clj-http/blob/3.x/src/clj_http/client.clj#L1125

The nice thing about this design is that you can mix and match those middleware functions. If the default client does not fit your needs, you can build your own with the middleware you need and maybe add more. Maybe that ship has already sailed, but I wanted to mention it anyway.

Wdyt? (mostly about the first part)
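In clj-http, a middleware is a function that takes a client function and returns a new client function, so the full client is built by wrapping. The sketch below transposes that idea to Emacs Lisp under stated assumptions: requests and responses are plists, headers are an alist, and `my-fake-client` stands in for a real transport. None of these names are plz.el API, and the upcasing middleware is only a placeholder for real body decoding.

```elisp
;;; -*- lexical-binding: t; -*-
;; Requests and responses are plists; headers are an alist.
;; All names here are hypothetical, not part of plz.el.

(defun my-fake-client (request)
  "Stand-in transport: echo REQUEST's Accept header in the body."
  (list :status 200
        :body (format "accept=%s"
                      (cdr (assoc "Accept" (plist-get request :headers))))))

(defun my-wrap-accept-json (client)
  "Middleware: ask for JSON unless the request sets Accept itself."
  (lambda (request)
    (let ((headers (plist-get request :headers)))
      (funcall client
               (if (assoc "Accept" headers)
                   request
                 (plist-put (copy-sequence request) :headers
                            (cons '("Accept" . "application/json") headers)))))))

(defun my-wrap-upcase-body (client)
  "Middleware: post-process the body (placeholder for real decoding)."
  (lambda (request)
    (let ((response (funcall client request)))
      (plist-put (copy-sequence response) :body
                 (upcase (plist-get response :body))))))

(defun my-compose-middleware (client &rest middleware)
  "Wrap CLIENT with MIDDLEWARE; the first one listed runs outermost."
  (dolist (wrap (reverse middleware) client)
    (setq client (funcall wrap client))))
```

Because every middleware has the same shape, a caller can drop or reorder them freely, which is the mix-and-match property described above.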
I'm super happy that you can hopefully devote more time to this. I like the ideas in the clj-http client; such things could be useful to implement lower-level logging. I agree with @alphapapa's proposed approach above: implement something you think is reasonable, and I'll try it out as a backend for the
Hi @r0man,
No problem. Thanks for the update.
For non-streaming responses, this should already be doable by using

For streaming ones, yes, it will be necessary to use a bespoke process filter. I'd like to keep the API unchanged as much as possible, so my suggestion would be to start by copying the

Thanks for the information about

To that end, if an application did have a need like that, I'd suggest that we consider implementing a layer on top of

@ahyatt As you may have noticed, I've so far tried not to commit much logging code to

Let me know what you think. Thanks.
@alphapapa Alright, thanks for your answer. I will give your suggestion a try.
Force-pushed from 5381ede to a8f3309.
Hi @alphapapa and @ahyatt, I updated this PR. This is what changed:
I think this is the least invasive change I could come up with. @ahyatt, is this something that would work in the llm library? I have a version of the llm-request.el file that uses plz.el for the custom provider I use at work, and I believe it would be sufficient. We would still need to interpret the server-sent events protocol by piecing together the chunks there, though.

Yesterday I was also experimenting with some approaches that would add support for server-sent events into plz.el, basically what @alphapapa suggested in his first message as the second point/issue. I could not come up with something satisfying yet. There are multiple places (the process filter, the :plz-then callback) that would need to be kept in sync, customized, and made to play well together. The protocols are also quite different, as I now understand. Server-sent events, for example, are not "just" line-based JSON blobs, but lines with a prefix followed by a JSON blob. JSON-RPC is different as well. I'm starting to think the code parsing all those different protocols would be too complex to bake into plz.el, and should instead be built on top of plz.el, as separate libraries. Wdyt?
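To make the contrast concrete: a line-based stream (such as JSON Lines) only needs to be split on newlines, with each complete line decoded as JSON, whereas SSE adds `field:` prefixes and blank-line event boundaries on top. The sketch below handles only the line-based case and assumes the chunks are already decoded text; `my-make-json-lines-decoder` is a hypothetical name. Partial lines are buffered until a later chunk completes them, which is the main thing a streaming process filter has to get right.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'json)
(require 'subr-x)

(defun my-make-json-lines-decoder (callback)
  "Return a function that takes raw chunks of a JSON Lines stream.
Each complete line is decoded with `json-read-from-string' and passed
to CALLBACK; a trailing partial line is kept until more data arrives."
  (let ((pending ""))
    (lambda (chunk)
      (setq pending (concat pending chunk))
      (let ((start 0))
        (while (string-match "\n" pending start)
          ;; Capture positions first: decoding JSON clobbers match data.
          (let ((line (string-trim-right
                       (substring pending start (match-beginning 0))))
                (next (match-end 0)))
            (setq start next)
            (unless (string-empty-p line)
              (funcall callback (json-read-from-string line)))))
        (setq pending (substring pending start))))))
```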
Hi @alphapapa (and /cc @ahyatt), I pushed an update to this branch. I believe it now contains most of what we initially discussed: handling different streaming data formats without users having to write a complete process filter. It can also be extended to formats not yet supported. It is not intended to be merged (unless you change your mind). I plan to create 1 or 2 new repositories for this. Probably

To get this working, we would need 1 or 2 small changes to plz.el:

1.) Allow setting the process filter via an option to the plz function. I can do this already for asynchronous requests by setting the process filter on the process object returned by the plz function. For synchronous requests, however, I'm out of luck. So what do you think of making this change here:

2.) I followed your advice and now the :as 'buffer option in the plz-media-type function. Previously I had changed plz to add the process object to the plz-response struct. I got rid of that change to keep changes to plz itself minimal. However, now I'm running into the following issue. On synchronous requests that raise an error the I don't know how to get a handle on the buffer used for the request. Ideally I would like to kill the buffer and not leave it around. The point where I would like to get at that buffer is here:

Can you please support us with 1.), and do you have any suggestions for how to handle 2.)? Thanks, Roman.
I added another small change in the meantime, calling

I sometimes saw the following error being raised:

I haven't found a way to reproduce this yet. But when it happens, the status code and the headers are in the buffer, but the buffer is narrowed. I think this is happening when

The call to
This all looks great to me, and works in the

@alphapapa it's worth making sure the interfaces in the

But these are syntactic matters, which are easy to adapt the
This allows setting the process filter via an option to the plz function. This could already be done for asynchronous requests by setting the process filter on the process object returned by the plz function. However, this does not work for synchronous requests, since the plz function only returns (or signals an error) after the process has already finished.
When using a process filter to stream responses, the following error is sometimes raised:

```
Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
  <=(200 nil 299)
  (let* ((status val)) (<= 200 status 299))
  (if (let* ((status val)) (<= 200 status 299)) (let ((status val)) (ignore status) (ignore status) (funcall (process-get process :plz-then))) (let nil (let ((err (make-plz-error :response (plz--response)))) (let* ((val (process-get process :plz-else))) (cond ((null val) (let nil (process-put process :plz-result err))) ((functionp val) (let (...) (funcall fn err))) (t (let (...) (error "No clause matching `%S'" x80))))))))
```

This only happens sometimes, and I haven't found a reliable way to reproduce it. Whenever it happens, the code was called from the `(plz--sentinel process "finished\n")` form, which seems to work around a race condition that might happen. The code then runs in the context of a narrowed process buffer. The call to widen seems to fix the issue; I haven't seen it since.
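The failure mode above can be reproduced in miniature, independent of plz.el: if code that parses the status line runs while the buffer is narrowed to the body, the headers are invisible and the status comes back as `nil`, which is exactly what `(<= 200 nil 299)` chokes on. The sketch below uses two hypothetical helpers (not plz.el functions); the second wraps the parse in `save-restriction` plus `widen`, so it sees the headers and still leaves the caller's narrowing in place afterwards.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-http-status-naive ()
  "Parse the status code at the start of the accessible region."
  (save-excursion
    (goto-char (point-min))
    (when (looking-at "HTTP/[0-9.]+ \\([0-9]\\{3\\}\\)")
      (string-to-number (match-string 1)))))

(defun my-http-status ()
  "Like `my-http-status-naive', but temporarily widen first.
`save-restriction' restores the caller's narrowing afterwards."
  (save-restriction
    (widen)
    (my-http-status-naive)))
```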
Hi @alphapapa and @ahyatt, I have now updated this PR with 2 small changes to plz.el. It would be great if we could add them to plz.el. Can you support us with this, @alphapapa?

I have now extracted the other code into 2 separate repositories. I chose 2 repos because plz-event-source contains more of the server-sent events specification than what was necessary for plz-media-type. The repositories are here: https://github.com/r0man/plz-media-type

I based my README on the one from plz.el to get an info manual, so thanks for this, @alphapapa. I was wondering whether those repos also need a mascot. Maybe a different picture of a pidgin, or some other bird? Where did you get your picture from, @alphapapa? Do you have more pidgin pictures?

I plan to submit those repos to ELPA. I think I can do that only after my paperwork is ready. I sent it last week and I'm waiting for a response from the FSF. Wdyt?
Hi @r0man,

I got your email that you sent outside of GitHub. I will have to ask for your patience, as I have a lot of things going on in real life right now.

I view this project to support streaming requests with

I will try to provide some specific feedback today, if I can find time, and some more later this week. Feel free to ping me if you need to, but note that I get a lot of email, so mentioning me on GitHub is generally a better way to get my attention on these issues. (I already saw these updates here, but I hadn't taken the time to deal with them yet.)

Generally I would recommend keeping the number of additional packages to a minimum. What you've done with

Thanks.
BTW, I have to give you credit for the funniest question I've ever been asked on GitHub:
:D No, I don't have more pigeon pictures. Digging up the various animal mascot images I've used on some of my packages is generally tedious, requiring extensive searching of public domain-type images and then editing the most suitable one I can find to look decent in a readme. The one on
Hi @alphapapa, alright, thanks for answering. Please take your time.
Thank you @r0man for working on this!!!
We did something similar in hyperdrive.el, which has a top-level alist that maps types to functions (similar to

For the purpose of this example,
Hi @josephmturner, thanks for your interest, the links, and the snippet of code. Your snippet is roughly where I started. I had a few more requirements, though:
To get all of this working you need to align the stars in the right way:
I went through all of this, and that is how I ended up with plz-media-type. I helped @ahyatt convert the LLM library to use curl. We are using plz-media-type in the following file, and I believe it is working fine so far (at least I noticed an improvement in reliability compared to the url-retrieve based implementation, and it works with my employer's proxy):

Personally, I would have gone even a step further in the llm-request-plz:
We went back and forth about the design on this and came to the following conclusion:
Coming back to your example and what I believe @alphapapa is looking for. You can use plz-media-type and operate "within" plz API. That would look something like this:
This, however, only gets you HTTP bodies in the 2xx range decoded. For other HTTP responses you can do the same in the :else option, just slightly differently. But this is only the asynchronous case. For the synchronous case you can do something similar with the return value of the plz function for the 2xx range. For other HTTP responses you need to catch the signaled error and again do something slightly different than in the :else case.

You haven't talked about how streaming works in your snippet. With plz-media-type you can also do it "within" the plz API (at least in the asynchronous case). You would install a process filter on the process object returned by the plz function and call

That's a lot of plumbing code. I don't want to write all of this for each HTTP request I do. The

I would like to complete the migration to plz in the LLM library, so that my colleagues and I can use an LLM that sits behind a corporate proxy. My time on this is ticking, and I would like to finish this project soon with (at least) @ahyatt and my colleagues being satisfied. So, what I would like to get out of this PR here is 2 small changes:

For improvements on plz-media-type and/or plz-event-source, I'm happy to take suggestions and/or PRs. I personally will probably not work on any significant API changes there unless severe flaws come up.
That should be no problem.
I'd need to know more details about that issue. The buffer is intentionally narrowed to the body so that functions that process the body content won't see the headers.

Other than that, AFAICT the main issue with your list of "aligned stars" is that you want to unify handling of responses rather than having it split between successful/2xx ones and errored/non-2xx ones. That seems reasonable to me, so we could talk about ways to make that easier within

What do you think? Thanks.
Hi @alphapapa, the option for installing the process filter would be great. Thanks for supporting this.

About the narrow/widen issue: I understand that the callbacks should be called with the narrowed buffer, and I don't want to change that. What I would like to change is that when the

I tried to describe the issue in the comment above:

Unfortunately I don't have a reliable way to reproduce this. It happens in rare cases that I don't fully understand. The only place where I narrow the buffer for a brief moment in plz-media-type is here, through the use of calling https://github.com/r0man/plz-media-type/blob/main/plz-media-type.el#L194

But I immediately widen it again, so when the process finishes and

I'm not sure what is causing this, but I suspect it might be related to this:

About your suggestion to have another
Sure. Would you be willing to send a PR that implements just that?
I think we need to understand the problem before we try to fix it. AFAIK this error has never happened in
The problem seems to be that something is trying to parse the HTTP headers, but the buffer is narrowed to the body, so the headers aren't visible, so the status code is returned as nil, which is unexpected in the condition. In any case, the code around line 515 that you cite mustn't be changed. That code was hard-won over many hours of hair-pulling spread over a few years of development. Now that it finally works reliably, changing anything in that area would have to be a last resort to fix a very serious problem.
What I'm trying to avoid is a situation in which people decide to use

I understand that you've already done that work and it works for you, but my goal here is to improve

I opened a PR over here: #50
Thanks for your patience, @r0man. Does the below pseudocode address all of your concerns?
If this approach does not address all of your concerns, I'll be happy to try again :)
Hi @josephmturner, thanks for the pseudocode. Have you looked at the plz-media-type code before writing this? I think that's more or less how plz-media-type works internally. At least the overall pattern is very familiar to me, with some small differences regarding global state and how :else, :then, and :filter are packaged up (you use a list, I use a CLOS class). So what is your idea of how we/I/you should proceed on this?
@r0man You're right that the difference between the list and CLOS approach is an implementation detail. IIUC, with
I need more time to understand how the |
Looking again at

In a similar vein, the decision to use EIEIO is one that you may want to revisit in the future. I used EIEIO in some earlier work of mine and realized that it was just overkill for what was needed. A few structs, accessor functions, and other functions get the job done more simply, and generally with better performance. With regard to downstream applications, using EIEIO in your library means that the applications must use it as well, and that might be a barrier to adoption. It's often just an unnecessary, extra layer of abstraction and complexity. (And since EIEIO isn't a full implementation of CLOS, it lacks some of the features that would make it as useful as CLOS.)

So, I would recommend that we make the minimal changes to

What do you think? Thanks.
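For comparison, here is roughly what the struct-based alternative could look like for a media-type descriptor: `cl-defstruct` generates the constructor, predicate, and accessors, with no EIEIO dependency for downstream code. This is a hypothetical sketch (the `my-media-type` names are not from plz-media-type, which uses EIEIO classes), and it assumes a decoder receives text already decoded with the descriptor's coding system.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

(cl-defstruct (my-media-type (:constructor my-media-type-create))
  "Hypothetical descriptor for decoding one media type."
  ;; Bare media type, e.g. "application/json".
  (name nil)
  ;; Coding system used to turn raw bytes into text.
  (coding-system 'utf-8)
  ;; Function called with the decoded text.
  (decoder #'identity))

(defun my-media-type-decode (type bytes)
  "Decode the unibyte string BYTES as described by media TYPE."
  (funcall (my-media-type-decoder type)
           (decode-coding-string bytes (my-media-type-coding-system type))))
```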
Hi, none of plz-media-type is actually specific to LLMs. The response formats are generally applicable. I also don't agree that there is no need for such a thing: look at all those LLM packages, all of them do their own ad-hoc parsing.

About EIEIO, yeah, a matter of taste, I guess. So far, I'm happy with it. I also read Chris's article about structs and EIEIO. I think documentation for the slots and validation is useful enough, at least to me.

I would say let's close the issue. I think once the process filter option is merged into the main branch and released, there is nothing more we would need to add to plz.

About the issue with the process filter: in the meantime I studied jsonrpc.el and saw that it uses timers to move the message handling onto the Emacs main loop. This is done here: https://github.com/emacs-mirror/emacs/blob/master/lisp/jsonrpc.el#L785

I'm probably going to do something similar, and started this over here:
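A minimal sketch of the general technique mentioned above: deferring message handling from a process filter to the main loop with a zero-delay timer. It does not reproduce jsonrpc.el's actual code; `my-make-deferred-handler` is a hypothetical helper. Messages are queued in order, and a single pending timer drains the whole queue, so a burst of chunks does not schedule one timer per message.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-make-deferred-handler (handler)
  "Return a function that queues each message and runs HANDLER later.
Instead of calling HANDLER inside the process filter, queued messages
are handed to a zero-delay timer, so HANDLER runs from the main loop."
  (let ((queue '()) (timer nil))
    (lambda (msg)
      (setq queue (append queue (list msg)))
      (unless timer
        (setq timer
              (run-at-time
               0 nil
               (lambda ()
                 ;; Take the current batch and reset state before running
                 ;; HANDLER, so new messages schedule a fresh timer.
                 (let ((batch queue))
                   (setq queue '() timer nil)
                   (mapc handler batch)))))))))
```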
@ahyatt, what do you think about how we should proceed on this? I see the following options:
I think either your first or third option seems good to me. If we do the first one, we should call it the

If you think the

@ahyatt I don't know of any other client. My suggestion is to let the libraries mature within
FWIW, I would suggest developing the libraries within

Also, I would tend to agree that
Line 739 in 12f747c
I think there is nothing more to be done here, except for the "filter option" PR to be merged and released once Craig has responded.
Hi @alphapapa and @ahyatt,
This change adds support for streaming responses by adding a process filter and a callback function. The process filter is based on the default process filter as described in the manual. It inserts the response into the process buffer. Additionally, if a callback function is provided via the `during` keyword, it is called every time a part of the response body is received.

I'm interested in writing a curl-based provider for the `llm` library and saw the following issue here: #42
Would you like to collaborate in resolving this issue?
Thanks, Roman.
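A minimal sketch of the filter described in this PR, assuming the `during` callback receives only the newly arrived chunk (not the cumulative output) and that early chunks may still contain the HTTP status line and headers, since those arrive on the same stream. The insertion part follows the "ordinary insertion filter" example from the Emacs Lisp manual; `my-make-streaming-filter` is a hypothetical name, not the code in this PR.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-make-streaming-filter (during)
  "Return a process filter that inserts output, then calls DURING.
DURING, if non-nil, receives each new chunk of output (not the
cumulative output); early chunks may include HTTP headers."
  (lambda (process output)
    (when (buffer-live-p (process-buffer process))
      (with-current-buffer (process-buffer process)
        (let ((moving (= (point) (process-mark process))))
          (save-excursion
            ;; Insert the text, advancing the process marker.
            (goto-char (process-mark process))
            (insert output)
            (set-marker (process-mark process) (point)))
          (when moving (goto-char (process-mark process))))))
    (when during
      (funcall during output))))
```

Such a function could be installed with `set-process-filter` on the process returned by an asynchronous `plz` call or, assuming the `:filter` option proposed in #50 is available, passed to `plz` directly.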