SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day".

Python 177 8 Updated Mar 8, 2025

LiuZH-19 / SongGen

Python 193 15 Updated Mar 10, 2025

lucidrains / vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Python 3,008 243 Updated Mar 11, 2025

lucadellalib / focalcodec

A low-bitrate single-codebook 16 kHz speech codec based on focal modulation

Python 78 10 Updated Feb 12, 2025

nivibilla / local-llasa-tts

Examples of using the llasa-tts models locally

Jupyter Notebook 153 18 Updated Feb 3, 2025

facebookresearch / audiobox-aesthetics

Unified automatic quality assessment for speech, music, and sound.

Python 412 25 Updated Mar 7, 2025

Ola-Omni / Ola

Ola: Pushing the Frontiers of Omni-Modal Language Model

Python 302 14 Updated Feb 28, 2025

qiuqiangkong / audio_understanding

Python 99 4 Updated Feb 6, 2025

multimodal-art-projection / YuE

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 4,399 478 Updated Mar 12, 2025

OpenMOSS / SpeechGPT-2.0-preview

GPT-4o-level, real-time spoken dialogue system.

Python 290 19 Updated Jan 27, 2025

deepseek-ai / DeepSeek-R1

86,129 11,109 Updated Feb 24, 2025

OpenBMB / MiniCPM-o

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 18,943 1,362 Updated Mar 3, 2025

john852517791 / awesome-fake-audio-detection

A list of tools, papers and code related to Fake Audio Detection.

70 2 Updated Mar 12, 2025

jishengpeng / Languagecodec

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

Python 207 15 Updated Nov 22, 2024

adapter-hub / adapters

A Unified Library for Parameter-Efficient and Modular Transfer Learning

Jupyter Notebook 2,662 357 Updated Mar 9, 2025

zhenye234 / X-Codec-2.0

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 224 25 Updated Mar 12, 2025

zhenye234 / LLaSA_training

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 454 34 Updated Mar 12, 2025

avishaiElmakies / unsupervised_speech_segmentation_using_slm

Python 16 Updated Jan 8, 2025

NVIDIA / Cosmos-Tokenizer

A suite of image and video neural tokenizers

Jupyter Notebook 1,573 72 Updated Feb 11, 2025

OS-Agent-Survey / OS-Agent-Survey

This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use".

219 9 Updated Feb 18, 2025

yukara-ikemiya / minimal-musicgen-for-developers

[PyTorch] Minimal codebase for MusicGen models

Python 57 Updated Jan 7, 2025

NVIDIA / Cosmos

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Jupyter Notebook 7,670 493 Updated Mar 7, 2025

Sang-Hoon Lee sh-lee-prml

Stars

SesameAILabs / csm

Bartelds / ctc-dro

ajd12342 / paraspeechcaps

ASLP-lab / DiffRhythm

hustvl / DiffusionDrive

LTH14 / fractalgen

stepfun-ai / Step-Audio

VITA-MLLM / LUCY

slp-rl / slamkit