Huazhong University of Science and Technology
Stars
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
The first decoder-only multimodal state space model
This repository gives a guideline for learning CUDA and TensorRT from the beginning.
Project page of "RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning"
[CVPR 2025] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
[CVPR 2025] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving
(TPAMI 2024) A Survey on Open Vocabulary Learning
Official implementation of CVPR24 Highlight paper "Open-vocabulary object 6D pose estimation"
Bridging Large Vision-Language Models and End-to-End Autonomous Driving
[ICLR 2025] ControlAR: Controllable Image Generation with Autoregressive Models
Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream [CVPR2024]
Free and Open Source, Distributed, RESTful Search Engine
Open source RabbitMQ: core server and tier 1 (built-in) plugins
🚌 The IK Analysis plugin integrates the Lucene IK analyzer into Elasticsearch and OpenSearch, with support for customized dictionaries.
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
📚 200+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA, and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
Several simple examples of calling custom CUDA operators from popular neural network toolkits.
[CVPR 2025] DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
[AAAI 2025] Linear-complexity Visual Sequence Learning with Gated Linear Attention
A Framework of Small-scale Large Multimodal Models
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
LAVIS - A One-stop Library for Language-Vision Intelligence
[CVPR 2024] Code for "Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation"