What is the Cosmos-Predict2 ComfyUI Workflow?
The Cosmos-Predict2 ComfyUI workflow brings NVIDIA's next-generation physical world foundation model to your fingertips, enabling both high-quality text-to-image generation and innovative video-to-world transformation. Think of it as having a digital crystal ball that can not only create stunning images from text descriptions but also predict and generate realistic video sequences that follow the laws of physics.
This workflow uses a 2B-parameter foundation model designed for physical AI scenarios. What makes Cosmos-Predict2 special? It doesn't just generate pretty pictures: it is built to model physics, environmental interactions, and realistic dynamics, which makes it well suited to industrial simulation, autonomous driving scenarios, urban planning, and scientific research applications.
Key Features and Benefits of Cosmos-Predict2
Dual Generation Modes: Cosmos-Predict2 supports both text-to-image generation for creating static visuals and video-to-world transformation for dynamic scene prediction, all within a single Cosmos-Predict2 ComfyUI workflow.
Physical Accuracy: Unlike standard image generators, Cosmos-Predict2 is built to preserve physical accuracy and environmental interactivity, so generated content follows real-world physics and dynamics.
Professional Applications: Cosmos-Predict2 is designed for serious use cases including industrial simulation, autonomous driving development, urban planning visualization, and scientific research where accuracy matters most.
Flexible Video Control: The Cosmos-Predict2 video generation component includes optional first and last frame control, allowing precise direction over temporal sequences and scene transitions within the Cosmos-Predict2 workflow.
How to Use Cosmos-Predict2 in ComfyUI
Cosmos-Predict2 Text-to-Image Workflow
- Set your image dimensions
- Use the EmptySD3LatentImage node to define output size for your Cosmos-Predict2 generation:
- Default: 1024x1024 pixels
- Adjust width and height based on your Cosmos-Predict2 requirements
- Keep batch_size at 1 for single image generation
- Craft your text prompt
- In the CLIP Text Encode (Prompt) node for Cosmos-Predict2:
- Write detailed, descriptive prompts for best Cosmos-Predict2 results
- Cosmos-Predict2 excels with physical world descriptions
- Include environmental details and spatial relationships in your Cosmos-Predict2 prompts <img src="https://cdn.runcomfy.net/workflow_assets/1248/readme01.webp" alt="Cosmos-Predict2" width="750"/>
- Generate and save
- Hit Run to create your Cosmos-Predict2 image, which saves automatically to the output directory.
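The text-to-image settings above can also be written down as data. The following is a minimal sketch, assuming ComfyUI's API-style workflow layout (a dictionary of node ids, each with a `class_type` and `inputs`). The node ids `"10"`, `"11"`, and `"12"`, the link to a text-encoder loader, and the sample prompt are hypothetical placeholders, not values taken from this workflow.

```python
import json


def text_to_image_nodes(prompt_text, width=1024, height=1024, batch_size=1):
    """Build API-style entries for the latent-size node and the prompt node."""
    if width <= 0 or height <= 0 or batch_size <= 0:
        raise ValueError("width, height and batch_size must be positive")
    return {
        # Hypothetical node id "10": output size, using the defaults listed above.
        "10": {
            "class_type": "EmptySD3LatentImage",
            "inputs": {"width": width, "height": height, "batch_size": batch_size},
        },
        # Hypothetical node id "11": ["12", 0] assumes a text-encoder loader
        # node exists at id "12" and exposes its encoder on output 0.
        "11": {
            "class_type": "CLIPTextEncode",
            "inputs": {"text": prompt_text, "clip": ["12", 0]},
        },
    }


# Illustrative prompt with environmental detail and spatial relationships.
nodes = text_to_image_nodes(
    "A forklift carrying a wooden pallet down a narrow warehouse aisle, "
    "steel shelving on both sides, overhead lights reflecting on the floor"
)
print(json.dumps(nodes, indent=2))
```

Keeping the settings in a function like this makes it easy to vary width, height, or the prompt without editing the graph by hand; the actual node ids depend on how the workflow was exported.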
Cosmos-Predict2 Video-to-World Workflow
- Upload your input image
- Use the Load Image node to import your starting frame for Cosmos-Predict2 video generation.
- Configure video parameters
- In the CosmosPredict2ImageToVideoLatent node:
- Width/Height: Set to 848x480 for optimal Cosmos-Predict2 performance
- Length: 33 frames, which gives a Cosmos-Predict2 video of roughly 2 seconds at 16 fps
- Batch_size: Keep at 1 for Cosmos-Predict2 processing <img src="https://cdn.runcomfy.net/workflow_assets/1248/readme02.webp" alt="Cosmos-Predict2" width="750"/>
- Optional frame control
- Enable the bypassed nodes (Ctrl+B) for first and last frame control in Cosmos-Predict2:
- Upload additional images to guide Cosmos-Predict2 video start and end points
- Perfect for creating specific narrative sequences with Cosmos-Predict2
- Run video generation
- Execute the Cosmos-Predict2 workflow to create physics-aware video sequences that maintain temporal consistency.
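The relationship between frame count and clip length in the video parameters is simple arithmetic: duration equals frames divided by frames per second, so 33 / 16 = 2.0625 seconds. A small sketch of that calculation follows; the helper name `clip_duration_seconds` is illustrative and not part of ComfyUI.

```python
def clip_duration_seconds(frame_count, fps=16.0):
    """Return the playback length in seconds for a given frame count and FPS."""
    if frame_count <= 0 or fps <= 0:
        raise ValueError("frame_count and fps must be positive")
    return frame_count / fps


# Values from the video parameters above: 33 frames at 16 fps.
duration = clip_duration_seconds(33, fps=16)
print(f"33 frames at 16 fps is about {duration:.2f} seconds")
```

The same helper can be used to estimate how a different Length value would change the clip duration before running a generation.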
Essential Settings for Cosmos-Predict2
- KSampler Configuration for Cosmos-Predict2:
- Steps: 35 (default for Cosmos-Predict2 quality balance)
- CFG: 4.0 for proper Cosmos-Predict2 guidance strength
- Sampler: euler (recommended for Cosmos-Predict2)
- Scheduler: karras for smooth Cosmos-Predict2 generation
- Cosmos-Predict2 Video Generation Settings:
- FPS: 16 frames per second (optimal for Cosmos-Predict2)
- Format: Auto-detects best codec for your Cosmos-Predict2 system
- Frame count: lower counts generate faster, while higher counts give longer sequences at the same FPS
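For reference, the sampler settings above could be captured as a KSampler entry in ComfyUI's API-style workflow layout. This is a sketch under stated assumptions: the linked node ids (`"1"` to `"4"`), the seed, and the `denoise` value of 1.0 are hypothetical placeholders, while steps, CFG, sampler, and scheduler come from the list above.

```python
import json

# Sampler values taken from the settings listed above.
KSAMPLER_SETTINGS = {
    "steps": 35,
    "cfg": 4.0,
    "sampler_name": "euler",
    "scheduler": "karras",
}


def ksampler_node(seed, settings=KSAMPLER_SETTINGS):
    """Build an API-style KSampler entry from the recommended settings."""
    inputs = dict(settings)  # copy so the shared defaults are not mutated
    inputs.update({
        "seed": seed,
        "denoise": 1.0,            # assumption: full denoise from an empty latent
        "model": ["1", 0],         # hypothetical link to the model loader
        "positive": ["2", 0],      # hypothetical link to the positive prompt
        "negative": ["3", 0],      # hypothetical link to the negative prompt
        "latent_image": ["4", 0],  # hypothetical link to the latent node
    })
    return {"class_type": "KSampler", "inputs": inputs}


node = ksampler_node(seed=42)
print(json.dumps(node, indent=2))
```

Holding the recommended values in one dictionary makes it straightforward to compare a modified run (for example, fewer steps for a quick preview) against the defaults.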
Acknowledgement
This Cosmos-Predict2 ComfyUI workflow integrates NVIDIA's Cosmos-Predict2 foundation model for physical world AI generation. Special recognition goes to the NVIDIA research team for developing the model and to the ComfyUI community for enabling its integration. The model weights and technical implementation follow NVIDIA's official Cosmos-Predict2 specifications.
More Resources About Cosmos-Predict2
Explore technical resources and documentation related to Cosmos-Predict2:
- GitHub Repository – Official Cosmos-Predict2 implementation and model files. Cosmos-predict2
- HuggingFace Hub – Pre-trained Cosmos-Predict2 model weights and documentation for ComfyUI integration. Cosmos-Predict2



