Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add whisper, Stable diffusion instruction #28

Merged
merged 9 commits into from
Jan 17, 2025
Merged

Add whisper, Stable diffusion instruction #28

merged 9 commits into from
Jan 17, 2025

Conversation

alabulei1
Copy link
Contributor

No description provided.

Copy link
Contributor

juntao commented Jan 9, 2025

Hello, I am a PR summary agent on flows.network. Here are my reviews of code commits in this PR.


Overall Summary

Potential Issues and Errors

  1. Documentation Discrepancy in Stable Diffusion Guide:

    • In quick-start-sd.md, the section "Start the API server" mentions starting a "whisper" API server instead of "Stable Diffusion". This needs to be corrected.
  2. URL Consistency:

    • Ensure that the new URL format in the response example (e.g., http://localhost:8080/v1/files/download/...) is correct and accessible for users.
  3. Link Corrections:

    • Verify that all fixed links are correctly pointing to their respective destinations, especially after restructuring and renaming files.

Most Important Findings

  1. Comprehensive Guides:

    • Both the Whisper and Stable Diffusion guides provide detailed and user-friendly setup instructions, covering different platforms and hardware configurations.
  2. API Usage Examples:

    • Practical examples in both guides help users understand how to interact with the APIs for transcription, translation, image generation tasks.
  3. Documentation Restructuring and Categorization:

    • The documentation has been restructured into better-organized categories with _category_.json files, improving navigation and clarity.
  4. Model Version Consistency:

    • Updated model references throughout the documentation ensure consistency in versioning (e.g., changing Llama 3.1 8B to Llama 3.2 1B).
  5. Link Corrections:

    • Fixed broken links to enhance user navigation and experience.
  6. Target WASM Version Update:

    • Updated Rust build target from wasm32-wasi to wasm32-wasip1 for compatibility with newer WebAssembly versions.

These findings indicate a significant improvement in documentation clarity, organization, and usability across various AI model sections.

Details

Commit af61d75bdae64e895f5dd007cbe7b750b8ffd4b2

Key Changes Summary

  1. New Documentation File: Added a new markdown file docs/user-guide/quick-start-whisper.md with detailed instructions on setting up and using the Whisper API server.
  2. Installation Instructions: Included step-by-step installation instructions for WasmEdge and specific Whisper plugins across different platforms (Mac Apple Silicon, CUDA 12.0, CUDA 11.0).
  3. Server Setup: Provided guidance on downloading and running the Whisper API server Wasm application, including starting the server on port 8080.
  4. API Usage Examples: Included examples of using the API to transcribe audio files in different languages and translate non-English audio into English.

Most Important Findings

  1. Comprehensive Guide: The PR introduces a detailed and user-friendly guide for setting up Whisper, which is beneficial for new users.
  2. Platform-Specific Instructions: The inclusion of platform-specific installation instructions ensures broad accessibility across various systems.
  3. API Usage Examples: Practical examples provided help users quickly understand how to interact with the API for transcription and translation tasks.

Commit 1d521524fdc696a8f364bc9ebd51ec3d0cc53306

Key Changes:

  1. New Documentation File: Created quick-start-sd.md in the docs/user-guide/ directory.
  2. Installation Instructions: Added detailed installation steps for WasmEdge version 0.14.1, including commands specific to Mac Apple Silicon and Ubuntu systems with CUDA support.
  3. API Server Setup: Included instructions for downloading the portable API server application and a Stable Diffusion model.
  4. Server Start Command: Provided command to start the API server using the downloaded model.
  5. API Usage Example: Demonstrated how to use the API to generate an image from a text prompt, including example request and response.

Most Important Findings:

  • The new documentation file provides comprehensive instructions for setting up and using Stable Diffusion with WasmEdge, covering different hardware configurations.
  • There is a discrepancy in the "Start the API server" section where it mentions starting the "whisper" API server, which should likely be "Stable Diffusion." This needs to be corrected.

Commit 561737254ad03254f96f63e4c014376ee2f6b90c

  1. Updated Response Example: The response example in quick-start-sd.md has been significantly changed. The new JSON structure is more concise and includes a URL that starts with the host (http://localhost:8080/v1/files/download/...), whereas the old one was just a path.
  2. Changed Prompt Description: The prompt in the example response is updated from "A cute cat" to "A cute baby cat".
  3. Additional Information: Added a sentence at the end to instruct users on how to view the generated image based on the provided prompt.

Key Findings:

  • The most notable change is the format and content of the example response, which could impact users' expectations and understanding.
  • Ensure that the new URL in the response example is correct and accessible for the intended use case.

Commit a490151528561b30413e15ec695dbc6f3d9c8e56

-### Key Changes Summary

  1. Updated Quick Start Guide for Stable Diffusion:
    • Clarified the instruction on how to view the generated image by providing a specific URL format (http://localhost:8080/v1/files/download/file_6420f32d-0b9a-4554-8e0b-a8deac0ab02) that users need to open in their browser. This replaces the general instruction, making it more precise and user-friendly.

Commit 939ffca60c54751c73c44b55446f3b1c43b36f3d

Key Changes and Findings

  1. Target WASM Version Update:

    • Updated the Rust build target from wasm32-wasi to wasm32-wasip1 across multiple developer guide documents (basic-llm-app.md, chatbot-llm-app.md, create-embeddings-collection.md, embedding-app.md). This change is crucial for ensuring compatibility with newer versions of WebAssembly.
  2. Documentation Restructuring:

    • Created a new index.md file in the user-guide/ directory to serve as an overview, categorizing different AI models and applications (LLM, Speech to Text, Text to Speech, Text to Image, Multimodal Vision).
  3. Category JSON Creation:

    • Added _category_.json files for new categories including LLM (llm/_category_.json), Speech-to-Text (speech-to-text/_category_.json), Text-to-Speech (text-to-speech/_category_.json), and Multimodal (multimodal/_category_.json). These files help organize the documentation better.
  4. File Renaming:

    • Renamed several files to reflect the new categorization structure (e.g., api-reference.md to llm/api-reference.md, get-started-with-llamaedge.md to llm/get-started-with-llamaedge.md). This renaming is intended to improve navigation and clarity within the documentation.
  5. Additions in Text-to-Image Section:

    • Added a new file flux.md under text-to-image/. This document likely provides instructions or details about using FLUX, an image generation model.
    • Renamed quick-start-sd.md to text-to-image/quick-start-sd.md and updated it minimally.
  6. Deletions:

    • Removed unused images (lobechat-llamaedge-01.png, lobechat-llamaedge-02.png, quick-start-command-01.png) and the quick-start-command.md file, which was presumably replaced or integrated into other sections.
  7. Speech-to-Text Section Enhancements:

    • Added new files (speech-to-text/_category_.json, api-reference.md, cli.md) to support detailed documentation on using models like Whisper for speech recognition.
  8. Multimodal Vision Documentation:

    • Introduced a new section with _category_.json and llava.md for multimodal vision applications, likely focusing on Llava and Qwen-VL models.
  9. Text-to-Speech Section:

    • Added documentation (gpt-sovits.md) to describe how to use GPT-SOVITs for text-to-speech conversion.

Summary

The pull request primarily focuses on enhancing the user guide by restructuring it into better-organized categories, updating build targets, and adding detailed sections for new models like Whisper, Stable Diffusion, and FLUX. This reorganization aims to improve clarity and ease of use for developers exploring different AI model capabilities.

Commit 28797df23d7e9b10218f303b895992b68007c19d

Key Changes:

  • Corrected Links: Fixed incorrect links throughout the documentation to point to the correct paths. This includes updates in create-embeddings-collection.md, llamaedge_vs_ollama.md, index.md, get-started-with-llamaedge.md, rag-service.md, and tool-call.md. The changes primarily involve adjusting relative URLs to absolute or correcting path levels.
  • Consistency: Ensured link consistency across different documentation files, making the navigation more intuitive for users.

Commit 4e09432a91dae1de540e67ea69f9651ffda0ed87

Key Changes Summary

  • Fixed Broken Links: Corrected the URLs in two markdown files (get-started-with-llamaedge.md and tool-call.md) to point to the correct location within /docs/user-guide/openai-api/intro.md instead of /docs/openai-api/intro.md.

Commit c7a54b7c55fad61cf784b885c8b424f7cd45ba40

Key Changes:

  1. New Documentation for OpenAI API Tutorial:

    • Added a new file docs/user-guide/llm/full-openai.md providing a comprehensive guide to setting up an OpenAI-compatible API server using LlamaEdge and WasmEdge, including installation steps, model downloads, and API request details.
  2. Updated Model References in Existing Documentation:

    • Changed the reference from Llama 3.1 8B to Llama 3.2 1B in docs/user-guide/llm/get-started-with-llamaedge.md for both download links and WasmEdge runtime commands.
    • Updated the model names in command examples within the same document.
  3. Clarifications on API Server Setup:

    • Provided detailed instructions on how to configure and run an OpenAI-compatible API server with Llama 3.2 1B and nomic-embed-text-v1.5 models.
  4. Consistent Model Naming Across Documentation:

    • Ensured uniformity in model naming and file references, particularly updating the model version from 3.1 to 3.2 throughout the relevant sections.

@alabulei1 alabulei1 changed the title Add whisper Add whisper and Stable difussion Jan 9, 2025
@alabulei1 alabulei1 changed the title Add whisper and Stable difussion Add whisper, Stable diffusion instruction Jan 13, 2025
@juntao juntao merged commit 32dc9e1 into main Jan 17, 2025
3 checks passed
@juntao juntao deleted the alabulei1-patch-1 branch January 17, 2025 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants