Step1x Edit2: Region-Precise Text-Guided Image Editing on Playground and API

stepfun-ai/stepx-edit2

Transform any image with natural-language edits for identity-true, region-precise results, streamlining product retouching, background replacement, and creative variations through API or browser.


Introduction to Step1x Edit2

StepFun AI's Step1X-Edit v2 turns natural-language instructions and a reference image into high-fidelity, identity-preserving, region-precise edits at $0.2 per image. By replacing manual masking and layer-by-layer retouching with reasoning-led multimodal edits that follow your brief and preserve surrounding context, Step1x Edit2 eliminates tedious selections and rework for e-commerce teams, design studios, and marketing workflows. Developers can use Step1x Edit2 on RunComfy both in the browser and via an HTTP API, so there is no need to host or scale the model yourself.
Ideal for: SKU-Accurate Product Retouching | Region-Precise Background Replacement | Brand-Consistent Creative Variations

Examples of Step1x Edit2 in Action

  • Vintage floral hand mirror with a gold frame and yellow rose handle, styled beside a book and chain, created with Step1x Edit2.
  • Off-road vehicle at the edge of a black volcanic cliff over a golden desert landscape, rendered with Step1x Edit2 realism.
  • Futuristic girl with a vinyl record and glittery outfit in holographic fashion, styled with Step1x Edit2.
  • A Middle Eastern farmer with cows in a desert village setting, created with Step1x Edit2.
  • Minimalist still-life interior with persimmons in a glass vase and an ocean view, created with Step1x Edit2.
  • Blue shark swimming near the ocean surface in clear water, a marine wildlife composition from Step1x Edit2.

Model Overview


  • Provider: StepFun
  • Task: text-to-image
  • Max Resolution: Up to 1024×1024
  • Summary: Step1x Edit2 is a text-driven visual model built to transform images with identity-true, region-precise edits and generate creative variations at up to 1024×1024. It leverages reasoning-enhanced prompt understanding (thinking + reflection) to follow complex instructions, minimizing unintended changes. Step1x Edit2 is production-ready for product retouching, background replacement, and controlled creative outcomes via API or browser.

Key Capabilities


Identity-true, region-precise outcomes

  • Step1x Edit2 preserves subject identity and unedited areas while executing localized, instruction-driven changes.
  • Expect high consistency on faces, logos, and product details, with spatially targeted edits that avoid global drift.

Reasoning-enhanced instruction following

  • Step1x Edit2 optionally activates a thinking–editing–reflection loop to interpret abstract or multi-step prompts.
  • This improves adherence to nuanced goals (e.g., color/material changes while keeping composition) and reduces over/under-editing.

Flexible prompt control for production workflows

  • Step1x Edit2 supports negative prompts, guidance scaling, seeding, and step counts to tune fidelity vs. creativity, as sketched below.
  • Combined with a safety checker and synchronous delivery mode, Step1x Edit2 integrates cleanly into automated pipelines.
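
A minimal sketch of how these controls combine, using the parameter names documented in the Input Parameters section below; the specific values are illustrative assumptions, not recommended settings.

    # Two illustrative parameter presets for Step1x Edit2 requests.
    # Parameter names follow the Input Parameters tables below; the values
    # are assumptions chosen to show the fidelity-vs-creativity trade-off.
    FAITHFUL_EDIT = {
        "negative_prompt": "blurry, low resolution, extra objects",
        "seed": 1234,               # fixed seed for reproducible A/B comparisons
        "guidance_scale": 7.5,      # above the default 6: follow the prompt more strictly
        "num_inference_steps": 60,  # above the default 50: more detail, more latency
    }
    EXPLORATORY_EDIT = {
        "negative_prompt": "",
        "seed": 0,                  # default; vary it to get diverse candidates
        "guidance_scale": 4.5,      # below the default: looser adherence, more variation
        "num_inference_steps": 40,
    }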

Input Parameters


Core Prompts


  • prompt (string, default ""): Text instruction describing the desired result.
  • image_url (image_uri string, default ""): Source image URL for editing-driven generation.

Guidance & Sampling


  • negative_prompt (string, default ""): Terms to avoid (objects, colors, styles, artifacts).
  • seed (integer, default 0): Set to control stochasticity and reproducibility.
  • guidance_scale (float, default 6): Classifier-free guidance strength; higher values follow the prompt more strictly.
  • num_inference_steps (integer, default 50): Diffusion steps; higher values can improve detail at the cost of latency.
  • output_format (string: jpeg or png, default jpeg): Output image encoding.

Advanced & Runtime


  • enable_thinking_mode (boolean, default true): Enables reasoning to re-interpret complex instructions before editing.
  • enable_reflection_mode (boolean, default true): Post-edit review to fix unintended changes and decide completion.
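
Putting the tables together, a request body might look like the sketch below. Parameter names and defaults come from the tables above; the prompt text, image URL, and non-default values are illustrative placeholders.

    # Example request body assembled from the parameter tables above.
    # The prompt text and image URL are placeholders.
    payload = {
        # Core prompts
        "prompt": "Change the mug to matte black and keep the logo unchanged",
        "image_url": "https://example.com/source/mug.jpg",
        # Guidance & sampling
        "negative_prompt": "blurry, low resolution, warped logo",
        "seed": 0,                      # default; set a fixed value for reproducibility
        "guidance_scale": 6,            # default
        "num_inference_steps": 50,      # default
        "output_format": "png",         # jpeg (default) or png
        # Advanced & runtime
        "enable_thinking_mode": True,   # default; reasoning before editing
        "enable_reflection_mode": True, # default; post-edit review
    }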

How Step1x Edit2 compares to other models


  • Vs Step1X-Edit v1.0/v1.1/v1p2: Compared to earlier versions in the family, Step1x Edit2 delivers stronger instruction following with optional thinking and reflection, improved precision on localized modifications, and streamlined inference controls. Key improvements include better handling of abstract edits and fewer unintended changes to unedited regions. Ideal Use Case: choose Step1x Edit2 when you need robust, identity-safe edits and instruction fidelity at up to 1024×1024.
  • Vs Flux 2: Compared to Flux 2, Step1x Edit2 delivers superior identity preservation on edit tasks and precise region targeting, while Flux 2 often excels at ultra-high-resolution, large-scene text-to-image generation. Ideal Use Case: choose Step1x Edit2 for edit-critical workflows where controlled changes matter more than 4K scene generation.
  • Vs Z-Image-Turbo: Compared to Z-Image-Turbo, Step1x Edit2 delivers more reliable targeted edits and semantic consistency, while Z-Image-Turbo emphasizes speed for pure text-to-image generation at moderate resolutions. Ideal Use Case: choose Step1x Edit2 when your pipeline prioritizes edit accuracy over raw generation speed.
  • Vs Seedream 4.5: Compared to Seedream, Step1x Edit2 delivers better region-precise edits and identity fidelity, while Seedream focuses on creative scene composition. Ideal Use Case: choose Step1x Edit2 for product retouching, background swaps, and identity-safe creative variations.
  • Vs Nano Banana Pro: Compared to Nano Banana Pro, Step1x Edit2 delivers more conservative, content-faithful changes to existing images; Nano Banana Pro emphasizes stylized design and high-resolution composition. Ideal Use Case: choose Step1x Edit2 for brand and asset integrity and controlled edits.

API Integration


Developers can integrate Step1x Edit2 using the RunComfy API with standard HTTP requests and JSON payloads. Step1x Edit2 supports straightforward parameterization for prompts, guidance, safety, and reasoning, enabling fast adoption into existing pipelines and CI/CD.


Note: The exact API endpoint for Step1x Edit2 is listed in the RunComfy API Docs.
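
The sketch below shows what a call could look like from Python, assuming a Bearer-token header and a synchronous JSON response; the endpoint URL is a placeholder, and the actual path, authentication scheme, and response shape are defined in the RunComfy API Docs.

    # Hedged sketch of an HTTP call to the Step1x Edit2 API.
    # The endpoint URL below is a placeholder; substitute the endpoint from
    # the RunComfy API Docs, and keep your API key out of source control.
    import os
    import requests

    API_ENDPOINT = "https://<runcomfy-api-endpoint>/stepfun-ai/stepx-edit2"  # placeholder
    API_KEY = os.environ["RUNCOMFY_API_KEY"]

    payload = {
        "prompt": "Replace the background with a seamless white studio sweep",
        "image_url": "https://example.com/source/product.jpg",
        "guidance_scale": 6,
        "num_inference_steps": 50,
        "output_format": "jpeg",
    }

    response = requests.post(
        API_ENDPOINT,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=300,
    )
    response.raise_for_status()
    print(response.json())  # the response schema is documented in the API Docs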


Official resources and licensing


  • Hugging Face: https://huggingface.co/stepfun-ai/Step1X-Edit-v1p1-diffusers
  • GitHub: https://github.com/stepfun-ai/Step1X-Edit
  • Official Website: https://fal.ai/models/fal-ai/stepx-edit2
  • License: Apache-2.0 for the open-source Step1X-Edit family; commercial use is permitted under its terms.

Related Playgrounds

reve/edit

Transform visuals with smart region edits and multi-image blending for precise, high-fidelity results.

flux-1-kontext/pro/text-to-image

Fast, precise, iterative AI image editing model.

flux-2/pro/text-to-image

Create reliable, studio-grade visuals with precise color and layout control.

qwen-image/qwen-image-edit-2511

Advanced image-to-image tool with geometry-aware edits and consistent identity control for creative workflows.

imagen-4/text-to-image

Sharp visual clarity and fast output for layout-rich image design

seedream-4-5/edit

Transform visuals with Seedream 4.5 for coherent, photoreal image creation and precise brand consistency.

Frequently Asked Questions

What are the main capabilities of Step1x Edit2 when used for text-to-image generation?

Step1x Edit2 excels at both precise image editing and text-to-image creation, allowing users to add, remove, or restyle visual elements through natural language prompts. Its reasoning loop enhances understanding of abstract instructions, producing consistent, high-quality visual results suitable for advanced creative pipelines.

How does Step1x Edit2 differ from earlier versions in terms of text-to-image output quality?

Compared with v1.0 and v1.1, Step1x Edit2 introduces reasoning and reflection modes that significantly improve prompt fidelity in both editing and text-to-image modes. The resulting images show higher realism, better lighting consistency, and improved control over edits based on user instructions.

What are the typical technical limitations of Step1x Edit2 for image resolution and token length?

Step1x Edit2 generally supports up to 1024×1024 output resolution per generation and accepts text prompts up to roughly 512 tokens for text-to-image or edit-based tasks. Beyond these parameters, output quality may degrade or inference may fail due to memory constraints.

How many reference inputs can Step1x Edit2 handle for combined text-to-image and editing modes?

Step1x Edit2 typically allows one primary reference image plus up to two auxiliary control references when using extensions such as ControlNet or IP-Adapter. This enables finer control over layout, depth, or style when blending reference-guided and text-to-image synthesis.

What improvements make Step1x Edit2 stand out against models like Nano Banana Pro or Seedream 4.5?

Step1x Edit2 offers open-source deployment, instruction-driven editing, and reasoning-assisted outputs not found in most proprietary systems. While Nano Banana Pro excels at realism and narrative imagery, Step1x Edit2 provides interpretable and reproducible results, particularly for precise text-to-image corrections and localized edits.

How can developers move from testing Step1x Edit2 in the RunComfy Playground to full production integration?

To move from the RunComfy Playground to production, developers should use the RunComfy API, which mirrors Playground behavior. With API keys, USD-based billing, and secure endpoints, text-to-image or edit requests can be automated and scaled while maintaining consistent model fidelity, as sketched below.
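
As a sketch of that transition, assuming the same HTTP interface as the example above, production code typically reads the key from the environment and retries transient failures; the endpoint URL and backoff policy here are illustrative assumptions.

    # Illustrative production wrapper: environment-based API key and simple
    # retry with exponential backoff. The endpoint URL is a placeholder.
    import os
    import time
    import requests

    API_ENDPOINT = "https://<runcomfy-api-endpoint>/stepfun-ai/stepx-edit2"  # placeholder
    API_KEY = os.environ["RUNCOMFY_API_KEY"]

    def edit_image(payload: dict, retries: int = 3) -> dict:
        """POST an edit request, retrying transient failures with backoff."""
        for attempt in range(retries):
            try:
                resp = requests.post(
                    API_ENDPOINT,
                    json=payload,
                    headers={"Authorization": f"Bearer {API_KEY}"},
                    timeout=300,
                )
                resp.raise_for_status()
                return resp.json()
            except requests.RequestException:
                if attempt == retries - 1:
                    raise
                time.sleep(2 ** attempt)  # back off 1s, 2s, 4s between attempts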

Does Step1x Edit2 require high-end hardware for optimal text-to-image results?

While Step1x Edit2 benefits from GPUs with 40–80 GB VRAM for maximum quality, it can run efficiently on smaller devices using FP8 quantization or LoRA fine-tuning. For light workloads or testing, the RunComfy Playground automatically manages hardware selection to optimize both speed and cost.

Can Step1x Edit2 be fine-tuned for specific visual domains or tasks such as product design?

Yes. Step1x Edit2 supports LoRA-based fine-tuning, enabling developers and artists to adapt the model for domain-specific stylistic or object categories. This process enhances accuracy in text-to-image synthesis where brand or thematic consistency is critical.

What licensing terms govern the use of Step1x Edit2 outputs in commercial settings?

Step1x Edit2 is released under the Apache-2.0 license, allowing commercial usage provided attribution and license terms are respected. However, users generating text-to-image content via external tools like RunComfy should also review their platform-specific usage and billing policies.

What kind of output quality benchmarks demonstrate Step1x Edit2’s progress?

Benchmarks such as GEdit-Bench and KRIS-Bench show Step1x Edit2 achieving improved scores in sharpness, realism, and prompt faithfulness, particularly for complex text-to-image edits. Its reflective reasoning mechanism reduces artifact rates and enhances the precision of modified regions.
