Stable Diffusion 3 (SD3) | Text to Image

The Stable Diffusion 3 node is now available in the RunComfy Beta Version, making it easy to use in your projects. You can use the node directly within this workflow or integrate it into your existing workflows. Please ensure you first obtain your API key from the Stability API key page.

ComfyUI Workflow

Stable Diffusion 3 in ComfyUI
Want to run this workflow?
  • Fully operational workflows
  • No missing nodes or models
  • No manual setups required
  • Features stunning visuals

1. Integrating Stable Diffusion 3 Into Your Creative Workflow

1.1. Starting with Stable Diffusion 3 API

To incorporate Stable Diffusion 3 into your projects, begin by accessing the APIs for both the standard version and the Turbo variant through the Stability AI Developer Platform API.

  • Obtaining Your API Key: First, grab your Stability API key. You'll receive 25 free credits to start, which you can use to generate images.
  • Usage Costs:
    • SD3: Each image generation costs 6.5 credits.
    • SD3 Turbo: A more cost-effective option at 4 credits per image.

Please ensure that your API key has enough credit. If you queue a prompt but do not receive a result, check your credit balance on the Stability Platform. 😃
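Based on the per-image costs quoted above (6.5 credits for SD3, 4 for SD3 Turbo), a small helper like the following can sketch how far a credit balance goes. The model names used as dictionary keys here are just illustrative labels, not necessarily the exact identifiers the API expects:

```python
# Rough credit-budgeting helper, using the per-image costs quoted above:
# SD3 costs 6.5 credits per image, SD3 Turbo costs 4.
CREDIT_COST = {"sd3": 6.5, "sd3-turbo": 4.0}

def credits_needed(model: str, num_images: int) -> float:
    """Total credits a batch of generations will consume."""
    return CREDIT_COST[model] * num_images

def images_affordable(model: str, balance: float) -> int:
    """How many whole images the current credit balance covers."""
    return int(balance // CREDIT_COST[model])
```

With the 25 free starter credits, this works out to 3 SD3 images (19.5 credits) or 6 SD3 Turbo images (24 credits) before topping up.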

1.2. Integrating Stable Diffusion 3 Node into Your Workflow (Use RunComfy Beta Version)

The Stable Diffusion 3 node comes preloaded in the RunComfy Beta Version, so no manual installation is needed. You can either use the node directly within this workflow or drop it into your existing workflows.

Here are some key features of the Stable Diffusion 3 Node:

  • Positive Prompts: Direct the model to focus on specific themes or elements in your artwork.
  • Negative Prompts: Specify what elements should be avoided in the images. (Note: The SD3 Turbo model does not support negative prompts.)
  • Aspect Ratios: Choose from a wide range, including "21:9", "16:9", "5:4", "3:2", "1:1", "2:3", "4:5", "9:16", "9:21". (Note: SD3's image-to-image mode does not support aspect ratio selection.)
  • Mode: Supports both text-to-image and image-to-image generation.
  • Model Options: Includes support for both SD3 and SD3 Turbo models.
  • Seed: Ensures reproducible results across generations.
  • Strength: Applicable only in image-to-image mode; controls how much the input image influences the result.
ComfyUI Stable Diffusion 3
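The constraints listed above (no negative prompts for SD3 Turbo, no aspect-ratio selection in image-to-image mode, strength only in image-to-image mode) can be sketched as a request builder. The endpoint and field names below follow Stability's public v2beta "stable-image" REST API; the node's actual internals may differ, so treat this as an illustrative mapping rather than the node's implementation:

```python
import os

# Illustrative mapping of the node's options onto Stability's REST API.
# Endpoint and field names follow the v2beta stable-image API (assumption:
# the node uses something equivalent under the hood).
API_URL = "https://api.stability.ai/v2beta/stable-image/generate/sd3"

def build_sd3_request(prompt, model="sd3", mode="text-to-image",
                      negative_prompt=None, aspect_ratio="1:1",
                      seed=0, strength=None):
    data = {"prompt": prompt, "model": model, "mode": mode, "seed": seed}
    if negative_prompt:
        if model == "sd3-turbo":
            # SD3 Turbo does not support negative prompts
            raise ValueError("SD3 Turbo does not support negative prompts")
        data["negative_prompt"] = negative_prompt
    if mode == "text-to-image":
        data["aspect_ratio"] = aspect_ratio  # not selectable in image-to-image
    elif strength is not None:
        data["strength"] = strength  # only meaningful in image-to-image
    headers = {
        "Authorization": f"Bearer {os.environ.get('STABILITY_API_KEY', '')}",
        "Accept": "image/*",
    }
    return API_URL, headers, data
```

Posting `data` as multipart form data to `API_URL` with those headers would then return the generated image bytes, debiting your credit balance per call.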

2. What is Stable Diffusion 3?

Stable Diffusion 3 is a cutting-edge AI model specifically designed for generating images from text prompts. It represents the third iteration in the Stable Diffusion series and aims to deliver improved accuracy, better adherence to the nuances of prompts, and superior visual aesthetics compared to earlier versions and other models like DALL·E 3, Midjourney v6, and Ideogram v1.

3. Technical Architecture of Stable Diffusion 3

At the core of Stable Diffusion 3 lies the Multimodal Diffusion Transformer (MMDiT) architecture. This innovative framework enhances how the model processes and integrates textual and visual information. Unlike its predecessors that utilized a single set of neural network weights for both image and text processing, Stable Diffusion 3 employs separate weight sets for each modality. This separation allows for more specialized handling of text and image data, leading to improved text understanding and spelling in the generated images.

Components of MMDiT Architecture

  • Text Embedders: Stable Diffusion 3 uses a combination of three text embedding models, including two CLIP models and T5, to convert text into a format that the AI can understand and process.
  • Image Encoder: An enhanced autoencoding model is used for converting images into a form suitable for the AI to manipulate and generate new visual content.
  • Dual Transformer Approach: The architecture features two distinct transformers for text and images, which operate independently but are interconnected for attention operations. This setup allows both modalities to influence each other directly, enhancing the coherence between the text input and the image output.
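The dual-transformer idea can be made concrete with a toy, single-head sketch of joint attention: each modality is projected by its own weight set, the token streams are concatenated for one shared attention step, and the result is split back. All dimensions and weights here are illustrative, not the model's actual sizes:

```python
import numpy as np

# Toy single-head sketch of MMDiT-style joint attention.
# Illustrative sizes only; real SD3 uses far larger, learned weights.
rng = np.random.default_rng(0)
d = 16                                # shared embedding width
txt = rng.standard_normal((7, d))     # 7 text tokens
img = rng.standard_normal((64, d))    # 64 image patch tokens

# Separate weight sets per modality -- the key departure from earlier
# Stable Diffusion versions, which shared one set for both.
Wq_t, Wk_t, Wv_t = (rng.standard_normal((d, d)) for _ in range(3))
Wq_i, Wk_i, Wv_i = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Project each modality with its own weights, then attend jointly so
# text and image tokens can influence each other directly.
q = np.concatenate([txt @ Wq_t, img @ Wq_i])
k = np.concatenate([txt @ Wk_t, img @ Wk_i])
v = np.concatenate([txt @ Wv_t, img @ Wv_i])
out = softmax(q @ k.T / np.sqrt(d)) @ v

txt_out, img_out = out[:7], out[7:]   # split back into the two streams
```

After the joint attention step, each stream keeps its own shape and continues through its own transformer branch, which is what lets the modalities stay specialized while still conditioning on one another.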

4. What’s New and Improved in Stable Diffusion 3

  • Adherence to Prompts: SD3 excels in closely following the specifics of user prompts, particularly those that involve complex scenes or multiple subjects. This precision in understanding and rendering detailed prompts allows it to outperform other leading models such as DALL·E 3, Midjourney v6, and Ideogram v1, making it highly reliable for projects requiring strict adherence to given instructions.
  • Text in Images: With its advanced Multimodal Diffusion Transformer (MMDiT) architecture, SD3 significantly enhances the clarity and readability of text within images. By employing separate sets of weights for processing image and language data, the model achieves superior text comprehension and spelling accuracy. This is a substantial improvement over earlier versions of Stable Diffusion, addressing one of the common challenges in text-to-image AI applications.
  • Visual Quality: SD3 not only matches but in many cases surpasses the visual quality of images generated by its competitors. The images produced are not only aesthetically pleasing but also maintain high fidelity to the prompts, thanks to the model's refined ability to interpret and visualize textual descriptions. This makes SD3 a top choice for users seeking exceptional visual aesthetics in their generated imagery.
ComfyUI Stable Diffusion 3

For detailed insights into the model, please see the Stable Diffusion 3 research paper.
