
Cosmos is a world model development platform that consists of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at robotics and AV labs. Cosmos is purpose-built for Physical AI. The Cosmos repository enables end users to run the Cosmos models, run inference scripts, and generate videos.



Read first

This is a forked Windows installation tutorial, and the main code will not be updated. This forked GitHub project is intended for people who want to install Cosmos and generate videos on a native Windows machine. You can also follow the guide and skip the WSL references to run it natively on Linux. As stated above, by the time you use this forked project, the original Cosmos project may have been updated; refer to the original project page if parts of this guide are not working.

The section below is from the original GitHub page. Jump down to Installation to get started.


NVIDIA Cosmos is a developer-first world foundation model platform designed to help Physical AI developers build their Physical AI systems better and faster. Cosmos contains

  1. pre-trained models, available via Hugging Face under the NVIDIA Open Model License, which allows commercial use of the models for free
  2. training scripts under the Apache 2 License, offered through the NVIDIA NeMo Framework for post-training the models for various downstream Physical AI applications

Details of the platform are described in the Cosmos paper. Preview access is available at build.nvidia.com.
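
Before running inference, the pre-trained checkpoints listed below need to be downloaded locally. A minimal sketch using the Hugging Face CLI is shown here; the model repository name under the nvidia organization and the checkpoints/ layout are assumptions for illustration, so check the Cosmos documentation for the officially supported download scripts.

# Sketch: download a checkpoint from Hugging Face (assumes the model is published
# as nvidia/Cosmos-1.0-Diffusion-7B-Text2World and you have accepted its license)
pip install -U "huggingface_hub[cli]"
huggingface-cli login   # paste a Hugging Face access token when prompted
huggingface-cli download nvidia/Cosmos-1.0-Diffusion-7B-Text2World \
    --local-dir checkpoints/Cosmos-1.0-Diffusion-7B-Text2World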

Key Features

Model Family

| Model name | Description | Try it out |
|---|---|---|
| Cosmos-1.0-Diffusion-7B-Text2World | Text to visual world generation | Inference |
| Cosmos-1.0-Diffusion-14B-Text2World | Text to visual world generation | Inference |
| Cosmos-1.0-Diffusion-7B-Video2World | Video + Text based future visual world generation | Inference |
| Cosmos-1.0-Diffusion-14B-Video2World | Video + Text based future visual world generation | Inference |
| Cosmos-1.0-Autoregressive-4B | Future visual world generation | Inference |
| Cosmos-1.0-Autoregressive-12B | Future visual world generation | Inference |
| Cosmos-1.0-Autoregressive-5B-Video2World | Video + Text based future visual world generation | Inference |
| Cosmos-1.0-Autoregressive-13B-Video2World | Video + Text based future visual world generation | Inference |
| Cosmos-1.0-Guardrail | Guardrail contains pre-Guard and post-Guard for safe use | Embedded in model inference scripts |

Installation

The original Cosmos project was intended to be run locally or deployed on a server via a Linux operating system. This guide walks you through the installation of Cosmos on Windows via WSL2.
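
As a rough sketch of the prerequisite step (assuming WSL2 is not already enabled on your machine), WSL2 with an Ubuntu distribution can be installed from an elevated PowerShell prompt; the distribution name below is only an example, and the full walkthrough is in the installation guide linked further down.

# Run in an elevated PowerShell on Windows (sketch; Ubuntu-22.04 is an example distribution)
wsl --install -d Ubuntu-22.04
# After rebooting, confirm the distribution runs under WSL version 2
wsl -l -v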

Hardware requirements

Cosmos requires a recent GPU with at least 24 GB of VRAM. To run the 14B models, a minimum of 48 GB of VRAM is needed.

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Blackwell
  • NVIDIA Hopper
  • NVIDIA Ampere

Minimum hardware configuration for the 7B models: NVIDIA RTX 3090; recommended: RTX 4090 or better.
Minimum hardware configuration for the 14B models: NVIDIA RTX A6000; recommended: RTX 6000 Ada or better.
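
Before installing anything, it can help to confirm how much VRAM your GPU actually exposes. A minimal check with nvidia-smi (run inside WSL2 or on the Linux host) is sketched below; the 24 GB and 48 GB thresholds are the ones stated above.

# Report the GPU name and total VRAM (values are in MiB)
nvidia-smi --query-gpu=name,memory.total --format=csv
# A 24 GB card reports roughly 24576 MiB; the larger models want roughly twice that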

Follow the Cosmos Installation Guide for Windows via WSL2 to set up Docker.
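
Once Docker is set up inside WSL2, a quick sanity check is to confirm that containers can see the GPU. A minimal sketch is shown below; the CUDA base image tag is only an example, and any recent nvidia/cuda image should work.

# Verify GPU passthrough from WSL2 into Docker (sketch; the image tag is an example)
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
# If the usual nvidia-smi table prints, the container runtime can access the GPU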

Example Usage

Inference

For inference with the pretrained models, please refer to Cosmos Diffusion Inference and Cosmos Autoregressive Inference.

The code snippet below illustrates the basic inference usage.

PROMPT="A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. \
The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. \
A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, \
suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. \
The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of \
field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."

# Example using 7B model
PYTHONPATH=$(pwd) python cosmos1/models/diffusion/inference/text2world.py \
    --checkpoint_dir checkpoints \
    --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
    --prompt "$PROMPT" \
    --offload_prompt_upsampler \
    --video_save_name Cosmos-1.0-Diffusion-7B-Text2World

(Example output: text2world_example.mp4)
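
The same script also drives the larger checkpoint. As a sketch, switching to the 14B Text2World model only changes the checkpoint directory and output name; the 14B checkpoint must already be downloaded, and because it needs considerably more VRAM, additional offloading flags from the Cosmos Diffusion Inference documentation may be required.

# Example using the 14B model (sketch; assumes the 14B checkpoint is present under checkpoints/)
PYTHONPATH=$(pwd) python cosmos1/models/diffusion/inference/text2world.py \
    --checkpoint_dir checkpoints \
    --diffusion_transformer_dir Cosmos-1.0-Diffusion-14B-Text2World \
    --prompt "$PROMPT" \
    --offload_prompt_upsampler \
    --video_save_name Cosmos-1.0-Diffusion-14B-Text2World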

We also offer multi-GPU inference support for Diffusion Text2World WFM models through NeMo Framework.

Post-training

The NeMo Framework provides GPU-accelerated post-training. General post-training is currently supported for both diffusion and autoregressive models, with other types of post-training coming soon.

License and Contact

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

NVIDIA Cosmos source code is released under the Apache 2 License.

NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact [email protected].
