  • NetEase
  • Hangzhou, China

Starred repositories

A Datacenter Scale Distributed Inference Serving Framework

Rust 3,004 197 Updated Mar 23, 2025
Python 8,061 674 Updated Mar 22, 2025

NetEase Cloud Music song downloader with full ID3 metadata, e.g. front cover image, artist name, album name, song title, and so on.

Python 530 87 Updated Jun 7, 2024

💬 MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024

Python 8 1 Updated Jun 11, 2024

No fortress, purely open ground. OpenManus is Coming.

Python 39,031 6,469 Updated Mar 22, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python 5,831 568 Updated Mar 23, 2025

Fully open reproduction of DeepSeek-R1

Python 23,180 2,111 Updated Mar 23, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 14,647 1,557 Updated Dec 25, 2024

Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

Python 1,251 121 Updated Mar 23, 2025

A Conversational Speech Generation Model

Python 11,037 856 Updated Mar 22, 2025

Use any web browser or WebView as GUI, with your preferred language in the backend and modern web technologies in the frontend, all in a lightweight portable library.

C 3,451 221 Updated Mar 19, 2025

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

276 7 Updated Feb 11, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 8,250 785 Updated Mar 20, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 8,951 957 Updated Mar 21, 2025

An extremely fast Python package and project manager, written in Rust.

Rust 45,805 1,291 Updated Mar 23, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,646 279 Updated Mar 10, 2025

Spark-TTS Inference Code

Python 5,904 605 Updated Mar 21, 2025

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS…

C++ 5,330 604 Updated Mar 22, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,360 807 Updated Mar 1, 2025

Solve Visual Understanding with Reinforced VLMs

Python 4,273 264 Updated Mar 23, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 1,651 107 Updated Mar 21, 2025

[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jupyter Notebook 145 9 Updated Feb 12, 2025

The open-source repo for docs.dify.ai

Shell 443 306 Updated Mar 21, 2025

Align Anything: Training All-modality Model with Feedback

Python 2,975 386 Updated Mar 23, 2025

SkyReels V1: The first and most advanced open-source human-centric video foundation model

Python 1,869 174 Updated Mar 10, 2025

[CVPR 2025] Official repository for “MagicArticulate: Make Your 3D Models Articulation-Ready”

Python 237 4 Updated Mar 22, 2025

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

Jupyter Notebook 358 39 Updated Jul 12, 2024

A simple screen parsing tool towards pure vision based GUI agent

Jupyter Notebook 20,957 1,713 Updated Mar 17, 2025

OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU.

Python 342 20 Updated Mar 18, 2025