Make use of the new stream feature from Wllama to simplify the code #980
Changes Made
Removed the ChatCompletionOptions import since we're no longer using the onNewToken callback approach.
Added a standard JavaScript AbortController to handle interruption of the text generation process.
Implemented the new streaming API, iterating over the returned AsyncIterator and processing chunks as they arrive (see the sketch after this list).
Simplified the state management by setting the generating state once before starting the stream processing.
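The following is a minimal sketch of the pattern described above, assuming the stream feature is enabled through an option on createCompletion and yields chunks exposing the text generated so far. The option names (stream, abortSignal), the chunk field (currentText), and the helper names (generate, stopGeneration, onUpdate) are illustrative assumptions, not confirmed Wllama signatures.

```ts
// Hedged sketch of the streaming + AbortController approach.
import { Wllama } from '@wllama/wllama';

let abortController = new AbortController();

async function generate(
  wllama: Wllama,
  prompt: string,
  onUpdate: (text: string) => void,
) {
  // Fresh controller per run so a previous abort does not leak into this one.
  abortController = new AbortController();

  // Assumed: `stream: true` makes createCompletion return an AsyncIterator,
  // and `abortSignal` wires interruption into the generation loop.
  const stream = await wllama.createCompletion(prompt, {
    nPredict: 512,
    stream: true,
    abortSignal: abortController.signal,
  });

  // Consume chunks as they arrive instead of registering an onNewToken callback.
  for await (const chunk of stream) {
    onUpdate(chunk.currentText); // assumed chunk shape
  }
}

// Interrupt an in-flight generation from anywhere in the UI.
function stopGeneration() {
  abortController.abort();
}
```

Because the generating state is set once before the for await loop starts, the UI no longer has to be updated from inside a callback on every token.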
Benefits of This Approach
More efficient streaming: consuming an AsyncIterator avoids the per-token callback indirection and follows modern JavaScript patterns.
Cleaner interruption handling: Using the standard AbortController pattern makes the code more maintainable and consistent with other JavaScript APIs.
Better resource management: The streaming approach can potentially reduce memory usage since we're processing chunks as they arrive.
Simplified code: Removed the callback-based approach in favor of a more straightforward async/await pattern.
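For contrast, a hedged sketch of the callback-based pattern this change removes. The onNewToken signature shown here, including the optionals object with an abort hook, is an assumption for illustration and may differ from the exact ChatCompletionOptions shape.

```ts
// Previous style: per-token callback plus a manual interruption hook.
import { Wllama } from '@wllama/wllama';

async function generateWithCallback(
  wllama: Wllama,
  prompt: string,
  onUpdate: (text: string) => void,
  shouldStop: () => boolean,
) {
  await wllama.createCompletion(prompt, {
    nPredict: 512,
    onNewToken: (_token, _piece, currentText, optionals) => {
      onUpdate(currentText);               // UI update driven from the callback
      if (shouldStop()) optionals.abortSignal(); // assumed manual abort hook
    },
  });
}
```

The streaming version replaces this inversion of control with a plain for await loop and a standard AbortController, which is what keeps the new code shorter and easier to follow.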