Cosmos-Predict2 | Text2Image & Video2World
This ComfyUI workflow runs NVIDIA's Cosmos-Predict2, a physical world foundation model designed for high-quality visual generation. Create images from text descriptions, or generate videos from a starting image with an emphasis on physical accuracy and environmental interaction. The model is built to simulate complex physical phenomena and dynamic scenes, which makes it a strong fit for industrial simulation, autonomous driving visualization, urban planning, and scientific research.

What is the Cosmos-Predict2 ComfyUI Workflow?
The Cosmos-Predict2 ComfyUI workflow brings NVIDIA's next-generation physical world foundation model into ComfyUI, enabling both high-quality text-to-image generation and video-to-world generation. Think of it as a digital crystal ball: it can create images from text descriptions, and it can also predict and generate video sequences that aim to follow the laws of physics.
This workflow uses the 2B-parameter version of Cosmos-Predict2, a foundation model designed specifically for physical AI scenarios. What sets it apart is its focus on the physical world: rather than only producing attractive pictures, it is designed to model physics, environmental interactions, and realistic dynamics. That makes it well suited to industrial simulation, autonomous driving scenarios, urban planning, and scientific research.
Key Features and Benefits of Cosmos-Predict2
Dual Generation Modes: Cosmos-Predict2 supports both text-to-image generation for static visuals and video-to-world generation for dynamic scene prediction, all within a single ComfyUI workflow.
Physical Accuracy: Unlike general-purpose image generators, Cosmos-Predict2 is built for physical accuracy and environmental interactivity, so the content it generates aims to follow real-world physics and dynamics.
Professional Applications: Cosmos-Predict2 targets demanding use cases such as industrial simulation, autonomous driving development, urban planning visualization, and scientific research, where accuracy matters most.
Flexible Video Control: The video generation branch includes optional first- and last-frame control, giving precise direction over temporal sequences and scene transitions.
How to Use Cosmos-Predict2 in ComfyUI
Cosmos-Predict2 Text-to-Image Workflow
- Set your image dimensions
  - Use the EmptySD3LatentImage node to define the output size:
    - Default: 1024x1024 pixels
    - Adjust width and height to suit your needs
    - Keep batch_size at 1 to generate a single image
- Craft your text prompt
  - In the CLIP Text Encode (Prompt) node:
    - Write detailed, descriptive prompts for best results
    - Cosmos-Predict2 performs best with descriptions of physical-world scenes
    - Include environmental details and spatial relationships
- Generate and save
  - Hit Run to create your image; it is saved automatically to the output directory.
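To make the dimension and prompt steps concrete, here is a minimal Python sketch. It assumes the EmptySD3LatentImage node accepts widths and heights in steps of 16 and allocates an empty latent with 16 channels at 1/8 of the pixel resolution. `snap_dimension`, `sd3_latent_shape`, and `t2i_input_nodes` are hypothetical helpers, and the partial graph is written in ComfyUI's API (prompt) format with hypothetical node IDs; the model and text-encoder loaders are not shown.

```python
def snap_dimension(pixels: int, step: int = 16) -> int:
    """Round a requested size down to a multiple of `step`, never below one step."""
    return max(step, (pixels // step) * step)


def sd3_latent_shape(width: int, height: int, batch_size: int = 1) -> tuple:
    """Assumed empty-latent shape: (batch, 16 channels, height / 8, width / 8)."""
    w, h = snap_dimension(width), snap_dimension(height)
    return (batch_size, 16, h // 8, w // 8)


def t2i_input_nodes(prompt: str, width: int = 1024, height: int = 1024) -> dict:
    """Partial API-format graph for the size and prompt steps.

    Node IDs "5" and "6", and the link to a text-encoder loader with ID "2",
    are hypothetical placeholders; the loader nodes are not defined here.
    """
    return {
        "5": {
            "class_type": "EmptySD3LatentImage",
            "inputs": {
                "width": snap_dimension(width),
                "height": snap_dimension(height),
                "batch_size": 1,  # one image per run
            },
        },
        "6": {
            "class_type": "CLIPTextEncode",
            "inputs": {"text": prompt, "clip": ["2", 0]},  # output 0 of node "2"
        },
    }


if __name__ == "__main__":
    print(sd3_latent_shape(1024, 1024))  # (1, 16, 128, 128)
    nodes = t2i_input_nodes(
        "A delivery robot crossing a wet city street at dusk, "
        "reflections on the asphalt, pedestrians waiting at the curb",
        width=1000,
        height=600,
    )
    print(nodes["5"]["inputs"])  # {'width': 992, 'height': 592, 'batch_size': 1}
```

Rounding down is just one simple choice; it keeps a requested size from silently growing past what you asked for. The example prompt follows the advice above by naming the scene, the environment, and the spatial relationships between objects.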
Cosmos-Predict2 Video-to-World Workflow
- Upload your input image
  - Use the Load Image node to import the starting frame for video generation.
- Configure video parameters
  - In the CosmosPredict2ImageToVideoLatent node:
    - Width/Height: 848x480 (recommended for this workflow)
    - Length: 33 frames, about 2 seconds of video at 16 fps
    - Batch_size: keep at 1
- Optional frame control
  - Select the bypassed nodes and press Ctrl+B to enable first- and last-frame control:
    - Upload additional images to guide where the video starts and ends
    - Useful for creating specific narrative sequences
- Run video generation
  - Execute the workflow to create physics-aware video sequences that maintain temporal consistency.
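The video parameters involve a little arithmetic: 33 frames at 16 fps is 33 / 16 = 2.0625 seconds, which is where "about 2 seconds" comes from. The Python sketch below reproduces that calculation. As an assumption not stated in this workflow, it also models the length input as accepting values of the form 4k + 1 (1, 5, 9, ..., 33, ...) and the latent as 16 channels compressed 4x in time and 8x in space. All function names are hypothetical.

```python
import math


def clip_duration_seconds(length: int, fps: int = 16) -> float:
    """Playback length, in seconds, of a clip with `length` frames at `fps`."""
    return length / fps


def valid_length(frames: int) -> int:
    """Round a frame count up to the next value of the form 4k + 1 (assumed constraint)."""
    if frames <= 1:
        return 1
    return 4 * math.ceil((frames - 1) / 4) + 1


def frames_for_duration(seconds: float, fps: int = 16) -> int:
    """Smallest assumed-valid frame count covering roughly `seconds` of video."""
    return valid_length(round(seconds * fps))


def video_latent_shape(width: int, height: int, length: int) -> tuple:
    """Assumed latent shape: (batch, 16, latent frames, height / 8, width / 8)."""
    return (1, 16, (length - 1) // 4 + 1, height // 8, width // 8)


if __name__ == "__main__":
    length = frames_for_duration(2.0)           # 32 frames, rounded up to 33
    print(length, clip_duration_seconds(length))  # 33 2.0625
    print(video_latent_shape(848, 480, length))   # (1, 16, 9, 60, 106)
```

Rounding up rather than down means a requested duration is never cut short; it is why 2 seconds at 16 fps lands on 33 frames rather than 29.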
Essential Settings for Cosmos-Predict2
- KSampler configuration:
  - Steps: 35 (default balance of quality and speed)
  - CFG: 4.0 (guidance strength)
  - Sampler: euler (recommended)
  - Scheduler: karras (for smooth generation)
- Video generation settings:
  - FPS: 16 frames per second (recommended)
  - Format: auto (picks a suitable codec for your system)
  - Frame count: fewer frames generate faster; more frames produce a longer clip at the same frame rate
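The KSampler values above map directly onto the KSampler node's inputs. The Python sketch below shows one way to keep them together and emit the node in ComfyUI's API (prompt) format, plus the JSON body you would POST to a ComfyUI server's `/prompt` route. It is a fragment under stated assumptions: the upstream node IDs are hypothetical placeholders, and a real request would also need the loader, decode, and save nodes. `SamplerSettings`, `ksampler_node`, and `prompt_request_body` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SamplerSettings:
    """KSampler values recommended in this workflow."""
    steps: int = 35
    cfg: float = 4.0
    sampler_name: str = "euler"
    scheduler: str = "karras"
    denoise: float = 1.0  # full denoise, starting from an empty latent


def ksampler_node(settings: SamplerSettings, seed: int) -> dict:
    """KSampler entry in ComfyUI API format.

    The upstream node IDs ("1" model, "6" positive prompt, "7" negative
    prompt, "5" empty latent) are hypothetical placeholders.
    """
    inputs = asdict(settings)
    inputs.update(
        seed=seed,
        model=["1", 0],
        positive=["6", 0],
        negative=["7", 0],
        latent_image=["5", 0],
    )
    return {"class_type": "KSampler", "inputs": inputs}


def prompt_request_body(graph: dict) -> str:
    """JSON body for submitting a workflow graph to the server's /prompt route."""
    return json.dumps({"prompt": graph})


if __name__ == "__main__":
    graph = {"3": ksampler_node(SamplerSettings(), seed=42)}
    print(prompt_request_body(graph))
```

A frozen dataclass makes the recommended values the defaults while still letting you override a single field, for example `SamplerSettings(steps=20)` for quicker previews.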
Acknowledgement
This ComfyUI workflow integrates NVIDIA's Cosmos-Predict2 foundation model, a major step forward in physical world AI generation. Special thanks to the NVIDIA research team for developing the model and to the ComfyUI community for making its integration seamless. The model weights and implementation follow NVIDIA's official Cosmos-Predict2 release.
More Resources About Cosmos-Predict2
Explore technical resources and documentation related to Cosmos-Predict2:
- GitHub Repository – Official Cosmos-Predict2 implementation and model files.
- HuggingFace Hub – Pre-trained Cosmos-Predict2 model weights and documentation for ComfyUI integration.
Want More ComfyUI Workflows?
RunComfy is the premier ComfyUI platform, offering an online ComfyUI environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, which lets artists use the latest AI tools to create incredible art.