  • Ajou University
  • Suwon


A Conversational Speech Generation Model

6,752 214 Updated Feb 26, 2025

Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition.

Python 5 1 Updated Mar 5, 2025

Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'

Python 91 3 Updated Mar 11, 2025

Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

Python 998 82 Updated Mar 13, 2025

[CVPR 2025] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving

Python 547 32 Updated Feb 27, 2025

PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437

Python 919 46 Updated Feb 25, 2025
Python 3,934 315 Updated Mar 12, 2025

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

Python 27 3 Updated Mar 10, 2025

SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day"

Python 177 8 Updated Mar 8, 2025
Python 193 15 Updated Mar 10, 2025

Vector (and Scalar) Quantization, in PyTorch

Python 3,008 243 Updated Mar 11, 2025

A low-bitrate single-codebook 16 kHz speech codec based on focal modulation

Python 78 10 Updated Feb 12, 2025

Examples of using the llasa-tts models locally

Jupyter Notebook 153 18 Updated Feb 3, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 412 25 Updated Mar 7, 2025

Ola: Pushing the Frontiers of Omni-Modal Language Model

Python 302 14 Updated Feb 28, 2025

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 4,399 478 Updated Mar 12, 2025

GPT-4o-level, real-time spoken dialogue system.

Python 290 19 Updated Jan 27, 2025

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 18,943 1,362 Updated Mar 3, 2025

A list of tools, papers and code related to Fake Audio Detection.

70 2 Updated Mar 12, 2025

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

Python 207 15 Updated Nov 22, 2024

A Unified Library for Parameter-Efficient and Modular Transfer Learning

Jupyter Notebook 2,662 357 Updated Mar 9, 2025

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 224 25 Updated Mar 12, 2025

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 454 34 Updated Mar 12, 2025

A suite of image and video neural tokenizers

Jupyter Notebook 1,573 72 Updated Feb 11, 2025

This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use".

219 9 Updated Feb 18, 2025

[PyTorch] Minimal codebase for MusicGen models

Python 57 Updated Jan 7, 2025

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Jupyter Notebook 7,670 493 Updated Mar 7, 2025