Cosmos-Predict2 Text2Image Video2World

ComfyUI Cosmos-Predict2 Workflow

What is the Cosmos-Predict2 ComfyUI Workflow?

The Cosmos-Predict2 ComfyUI workflow runs NVIDIA's physical world foundation model inside ComfyUI. It covers two tasks: text-to-image generation and video-to-world generation, which predicts a realistic video sequence that follows the laws of physics.

The workflow uses a 2B-parameter foundation model designed for physical AI scenarios. Beyond producing images, the model accounts for physics, environmental interactions, and realistic dynamics, which makes it suited to industrial simulation, autonomous driving scenarios, urban planning, and scientific research.

Key Features and Benefits of Cosmos-Predict2

Dual Generation Modes: A single ComfyUI workflow supports both text-to-image generation for static visuals and video-to-world generation for dynamic scene prediction.

Physical Accuracy: Unlike standard image generators, Cosmos-Predict2 is built to keep generated content consistent with real-world physics and environmental interactions.

Professional Applications: The model targets use cases where accuracy matters most, including industrial simulation, autonomous driving development, urban planning visualization, and scientific research.

Flexible Video Control: The video generation component includes optional first and last frame control, giving you direct control over how a sequence starts and ends.

How to Use Cosmos-Predict2 in ComfyUI

Cosmos-Predict2 Text-to-Image Workflow

Set your image dimensions

Use the EmptySD3LatentImage node to define the output size:
- Default: 1024x1024 pixels
- Adjust width and height to fit your requirements
- Keep batch_size at 1 for single image generation
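
The size settings above can also be written down as a node entry in ComfyUI's API-format workflow JSON. The following is a minimal sketch, not the workflow's actual export: the helper name `empty_latent_node` is hypothetical, and the input names (`width`, `height`, `batch_size`) are assumed to match what your ComfyUI version exports for the EmptySD3LatentImage node.

```python
import json


def empty_latent_node(width=1024, height=1024, batch_size=1):
    """Build an API-format entry for the EmptySD3LatentImage node.

    Hypothetical helper for illustration; the input names are assumed
    to match the node's widgets in your ComfyUI version.
    """
    if width <= 0 or height <= 0 or batch_size <= 0:
        raise ValueError("width, height, and batch_size must be positive")
    return {
        "class_type": "EmptySD3LatentImage",
        "inputs": {
            "width": width,
            "height": height,
            "batch_size": batch_size,
        },
    }


# Defaults from this guide: 1024x1024, one image per run.
print(json.dumps(empty_latent_node(), indent=2))
```

Comparing this output against a workflow exported from your own ComfyUI instance is a quick way to confirm the assumed input names.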

Craft your text prompt

In the CLIP Text Encode (Prompt) node:
- Write detailed, descriptive prompts for best results
- Cosmos-Predict2 excels with physical world descriptions
- Include environmental details and spatial relationships

<img src="https://cdn.runcomfy.net/workflow_assets/1248/readme01.webp" alt="Cosmos-Predict2" width="750"/>

Generate and save

Hit Run to create your image. It is saved automatically to the output directory.

Cosmos-Predict2 Video-to-World Workflow

Upload your input image

Use the Load Image node to import the starting frame for video generation.

Configure video parameters

In the CosmosPredict2ImageToVideoLatent node:
- Width/Height: Set to 848x480 for optimal performance
- Length: 33 frames, which gives a video of about 2 seconds at 16 fps
- Batch_size: Keep at 1

<img src="https://cdn.runcomfy.net/workflow_assets/1248/readme02.webp" alt="Cosmos-Predict2" width="750"/>
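
The relationship between Length and clip duration is plain arithmetic: frames divided by frames per second. The sketch below uses a hypothetical helper, `clip_duration_seconds`, to check the figure quoted above; it assumes the clip is played back at the same 16 fps used in this guide.

```python
def clip_duration_seconds(length, fps=16):
    """Return the playback duration in seconds for `length` frames at `fps`.

    Hypothetical helper for illustration; it only does the division.
    """
    if length <= 0 or fps <= 0:
        raise ValueError("length and fps must be positive")
    return length / fps


# 33 frames at 16 fps: 33 / 16 = 2.0625 seconds, i.e. about 2 seconds.
duration = clip_duration_seconds(33, fps=16)
print(f"{duration:.2f} s")
```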

Optional frame control

Enable the bypassed nodes (Ctrl+B) for first and last frame control:
- Upload additional images to guide the start and end points of the video
- Useful for creating specific narrative sequences

Run video generation

Run the workflow to create physics-aware video sequences that maintain temporal consistency.

Essential Settings for Cosmos-Predict2

KSampler Configuration:
- Steps: 35 (default; a balance between quality and generation time)
- CFG: 4.0 (guidance strength)
- Sampler: euler (recommended)
- Scheduler: karras
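
For reference, the KSampler values above could be expressed as an API-format node entry like the one below. This is a sketch under stated assumptions: the helper `ksampler_node` and the node IDs passed to it are hypothetical, `denoise=1.0` is an assumed default that this guide does not specify, and the input names are assumed to match the KSampler node in your ComfyUI version.

```python
def ksampler_node(model_ref, positive_ref, negative_ref, latent_ref, seed=0):
    """Build an API-format KSampler entry using the settings from this guide.

    Each *_ref is a [node_id, output_index] link to another node in the
    workflow. The IDs used in the example call are placeholders, not the
    IDs of the real workflow.
    """
    return {
        "class_type": "KSampler",
        "inputs": {
            "seed": seed,
            "steps": 35,             # default from this guide
            "cfg": 4.0,              # guidance strength from this guide
            "sampler_name": "euler",
            "scheduler": "karras",
            "denoise": 1.0,          # assumption: full denoise, not stated in the guide
            "model": model_ref,
            "positive": positive_ref,
            "negative": negative_ref,
            "latent_image": latent_ref,
        },
    }


# Placeholder node IDs "1" to "4" stand in for the loader, prompt, and latent nodes.
node = ksampler_node(["1", 0], ["2", 0], ["3", 0], ["4", 0], seed=42)
print(node["inputs"]["steps"], node["inputs"]["cfg"])
```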

Video Generation Settings:
- FPS: 16 frames per second (recommended)
- Format: The codec is auto-detected for your system
- Frame count: Lower values generate faster; higher values give smoother motion

Acknowledgement

This ComfyUI workflow integrates NVIDIA's Cosmos-Predict2 foundation model for physical world AI generation. Credit goes to the NVIDIA research team for developing the model and to the ComfyUI community for enabling the integration. The model weights and technical implementation follow NVIDIA's official Cosmos-Predict2 specifications.

More Resources About Cosmos-Predict2

Explore technical resources and documentation related to Cosmos-Predict2:

- GitHub Repository (Cosmos-predict2): Official implementation and model files.
- HuggingFace Hub (Cosmos-Predict2): Pre-trained model weights and documentation for ComfyUI integration.
