Stars
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
It's not a list of papers, but a list of paper reading lists...
The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
A generative world for general-purpose robotics & embodied AI learning.
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Empowering Unified MLLM with Multi-granular Visual Generation
A curated list of resources for using LLMs to develop more competitive grant applications.
[CVPR 2024] Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
Official implementation of FouriScale (ECCV2024)
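A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are inexpensive to train, including base models, domain-specific fine-tuned models and applications, datasets, and tutorials.
The official implementation of "Lightweight Image Super-Resolution with Superpixel Token Interaction" (ICCV2023)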
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[ICCV 2023] VPD is a framework that leverages the high-level and low-level knowledge of a pre-trained text-to-image diffusion model for downstream visual perception tasks.
A latent text-to-image diffusion model
Teach-DETR: Better Training DETR with Teachers
[CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching".
[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
A collection of super-resolution papers, data, and repositories
Sequencer: Deep LSTM for Image Classification
[ICLR 2022] Official implementation of the paper "DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR"
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
A PyTorch Library for Meta-learning Research
Awesome Knowledge-Distillation. A categorized collection of knowledge distillation papers (2014-2021).
Crawl & visualize ICLR papers and reviews.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
This is an official implementation of "Video Swin Transformer".