OmniGen2 | Text-to-Image & Editing

Run OmniGen2's unified multimodal generation in ComfyUI. This workflow uses a 7B-parameter model with a dual-path Transformer architecture for text-to-image generation and text-guided image editing. Built on the Qwen 2.5 VL foundation, OmniGen2 is designed for compositional understanding, long-prompt following, and precise image modifications while preserving visual quality and consistency.

ComfyUI OmniGen2 Description

What is the OmniGen2 ComfyUI Workflow?

The OmniGen2 ComfyUI workflow combines text-to-image synthesis and instruction-based image editing in a single framework. The same model that generates images from your text descriptions can also interpret and carry out editing instructions written in natural language.

This workflow uses a 7B-parameter model built on the Qwen 2.5 VL foundation, with a dual-path Transformer architecture. The distinguishing feature is its decoupled design: separate pathways handle text and image generation, which lets the model keep strong language understanding while producing high-fidelity images that follow your prompt.

Key Features and Benefits of OmniGen2

Dual Generation Modes: OmniGen2 creates new images from text or edits existing ones with natural language commands through the intuitive interface.

Advanced Architecture: The OmniGen2 dual-path design separates text and image processing for optimal performance.

Compositional Understanding: OmniGen2 is built to handle complex prompts that combine multiple elements.

Precise Image Editing: Make targeted changes while leaving the rest of your image intact.

Multimodal Reflection: OmniGen2 self-analyzes and refines outputs for improved results.

How to Use OmniGen2 in ComfyUI

OmniGen2 Text-to-Image Workflow

Set your image dimensions

Use the EmptySD3LatentImage node to define the output size:
- Adjust width and height to the resolution you need
- Keep batch_size at 1 for single image generation
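
If you export the workflow in ComfyUI's API format, the size settings above can also be applied from a script. The sketch below assumes the API-format layout (a dict mapping node IDs to a `class_type` and an `inputs` dict); the node ID `"11"` and the one-node excerpt are hypothetical placeholders, not values taken from this workflow.

```python
import copy

def set_latent_size(workflow, width, height, batch_size=1):
    """Return a copy of an API-format workflow with every
    EmptySD3LatentImage node set to the given size."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("class_type") == "EmptySD3LatentImage":
            node["inputs"].update(width=width, height=height,
                                  batch_size=batch_size)
    return patched

# Hypothetical one-node excerpt of an exported workflow.
example = {
    "11": {
        "class_type": "EmptySD3LatentImage",
        "inputs": {"width": 512, "height": 512, "batch_size": 4},
    }
}

resized = set_latent_size(example, 1024, 1024)
print(resized["11"]["inputs"])
```

The function works on a copy so the loaded workflow stays unchanged if you want to queue several sizes from the same export.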

Craft your text prompt

In the CLIP Text Encode (Prompt) nodes:
- Write a detailed, descriptive prompt in the first encoder
- Use the second encoder for negative prompts, or leave it empty
- OmniGen2 responds well to complex compositional descriptions
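
The prompt step can be scripted the same way against an API-format export. This is a minimal sketch: it assumes the two encoders are `CLIPTextEncode` nodes with a `text` input, and the node IDs `"6"` and `"7"` are hypothetical. Which node is the positive or negative encoder depends on how it is wired in your export, so check the IDs before using it.

```python
def set_prompts(workflow, positive_id, negative_id, prompt, negative=""):
    """Write prompt text into two CLIPTextEncode nodes, in place."""
    for node_id, text in ((positive_id, prompt), (negative_id, negative)):
        node = workflow[node_id]
        if node["class_type"] != "CLIPTextEncode":
            raise ValueError(f"node {node_id} is not a CLIPTextEncode node")
        node["inputs"]["text"] = text
    return workflow

# Hypothetical excerpt; ["10", 0] stands for a link to a CLIP loader node.
wf = {
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "", "clip": ["10", 0]}},
    "7": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "", "clip": ["10", 0]}},
}

set_prompts(wf, "6", "7",
            prompt="A red fox reading a newspaper on a park bench at dawn",
            negative="blurry, low quality")
print(wf["6"]["inputs"]["text"])
```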

Generate and save

- Click Run to generate your image
- The VAE Decode node converts the latents into the final image
- The Save Image node automatically saves the result to the output folder
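
Clicking Run has a scripted equivalent: a ComfyUI server accepts an API-format workflow as a JSON POST to its `/prompt` endpoint. The sketch below only builds the request, assuming a local server at the default address `http://127.0.0.1:8188`; the empty dict is a placeholder for your exported workflow, and the actual submission is left commented out so the snippet runs without a server.

```python
import json
import urllib.request

def build_queue_request(workflow, server="http://127.0.0.1:8188"):
    """Build (but do not send) a POST request that queues a workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_queue_request({})  # placeholder: pass your exported workflow
print(request.get_method(), request.full_url)

# To submit to a running ComfyUI instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```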

OmniGen2 Image Editing Workflow

Upload your source image

Use the Load Image node to import the image you want to edit with OmniGen2

Write your editing instruction

In the CLIP Text Encode (Prompt) node:
- Describe the changes you want clearly and specifically
- Examples: "Transform character's hair color into natural silver-white", "Add aviator sunglasses"
- Plain natural-language commands are enough; no special syntax is required

Configure OmniGen2 editing parameters

Scale Image to Total Pixels node:
- upscale_method: area (maintains quality during resizing)
- megapixels: 2.00 (controls total pixel count)
  - This resizes your image to approximately 2 million pixels total while keeping its aspect ratio
  - For example: a 1920x1080 image (about 2.07 million pixels) is already close to the target, so it is resized only slightly
  - Higher values = more detail but slower processing
  - Lower values = faster generation but less detail
  - 2.00 is the value set in this workflow and a good starting point for editing

The VAE Encode node then converts your scaled image to latent space.
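
To see what a given megapixels value does to a particular image, the resize arithmetic can be sketched as follows. This assumes the node scales both sides by the same factor so that width times height lands near the target, and that one "megapixel" is counted as 1024 x 1024 pixels; both are assumptions about the node's behavior, and `scale_to_total_pixels` is an illustrative helper, not a ComfyUI function.

```python
import math

def scale_to_total_pixels(width, height, megapixels):
    """Estimate the output size for a target total pixel count,
    keeping the aspect ratio."""
    # Assumption: one "megapixel" is 1024 * 1024 pixels.
    target = int(megapixels * 1024 * 1024)
    scale = math.sqrt(target / (width * height))
    return round(width * scale), round(height * scale)

new_w, new_h = scale_to_total_pixels(1920, 1080, 2.00)
print(new_w, new_h, new_w * new_h)

# A smaller budget trades detail for speed.
print(scale_to_total_pixels(1920, 1080, 1.00))
```

Because both sides are multiplied by the same factor, the aspect ratio is preserved up to rounding; only the total pixel count changes.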

Optional: Enable second image input

The purple (bypassed) nodes allow multi-image operations:
- Select the bypassed nodes and press Ctrl+B to toggle bypass mode
- Upload a second image for style transfer or object insertion
- Useful for tasks like "combine elements from image 1 and image 2"
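
If you prefer to check or flip the bypass state in a saved workflow file rather than in the editor, a sketch along these lines may help. It assumes the UI-format workflow JSON (a top-level `nodes` list whose entries carry `id`, `type`, and `mode`) and that the frontend stores bypassed nodes with `mode` 4 and active nodes with `mode` 0; treat both values as assumptions and verify them against your own saved file. The sample nodes are hypothetical.

```python
BYPASS_MODE = 4  # assumption: value stored for bypassed nodes
ACTIVE_MODE = 0  # assumption: value stored for normally executed nodes

def bypassed_nodes(ui_workflow):
    """List (id, type) for every bypassed node in a UI-format workflow."""
    return [(node["id"], node["type"])
            for node in ui_workflow.get("nodes", [])
            if node.get("mode") == BYPASS_MODE]

def enable_nodes(ui_workflow, node_ids):
    """Switch the given nodes from bypassed to active, in place."""
    for node in ui_workflow.get("nodes", []):
        if node["id"] in node_ids and node.get("mode") == BYPASS_MODE:
            node["mode"] = ACTIVE_MODE

# Hypothetical excerpt: a first image loader plus a bypassed second one.
ui_wf = {"nodes": [
    {"id": 1, "type": "LoadImage", "mode": 0},
    {"id": 2, "type": "LoadImage", "mode": 4},
    {"id": 3, "type": "VAEEncode", "mode": 4},
]}

print(bypassed_nodes(ui_wf))
enable_nodes(ui_wf, {2, 3})
print(bypassed_nodes(ui_wf))
```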

Generate edited result

Run the workflow to apply your edits. The output should follow your instruction while keeping the untouched parts of the image consistent with the source.

Acknowledgments

This ComfyUI workflow integrates the OmniGen2 model developed by researchers at the Beijing Academy of Artificial Intelligence. Special recognition goes to the team for creating this unified multimodal generation system, which pushes the boundaries of what is possible with a 7B-parameter model. The architecture is a notable step forward in balancing model efficiency with generation quality.

More Resources About OmniGen2

OmniGen2 is released under open-source licensing, making it freely available for both research and commercial applications. For more information about OmniGen2:

GitHub Repository - Official implementation and model architecture details:
Project Page - Comprehensive overview with demos and technical insights:
ComfyUI Examples - Step-by-step tutorials and additional workflows:
