Step1x Edit2: Region-Precise Text-Guided Image Editing on Playground and API | RunComfy

stepfun-ai/step1x-edit2

Transform any image with natural-language edits that deliver identity-preserving, region-precise results, streamlining product retouching, background replacement, and creative variations through the API or in the browser.

Key parameters:

- Negative prompt: addresses details you don't want in the image, such as colors, objects, scenery, or fine details (e.g., a moustache, blur, low resolution).
- Thinking mode: uses multimodal language-model knowledge to interpret abstract editing instructions.
- Reflection mode: reviews outputs, corrects unintended changes, and determines when editing is complete.
- Output format: the file format of the generated image.
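As a rough illustration, these options might map onto a request body like the Python sketch below. The field names (`negative_prompt`, `enable_thinking`, `enable_reflection`, `output_format`) are assumptions for illustration, not the documented schema; consult the RunComfy API reference for the real parameter names.

```python
# Hypothetical request payload illustrating the parameters above.
# Field names are illustrative assumptions, not the documented schema.
payload = {
    "image_url": "https://example.com/source.png",   # reference image to edit
    "prompt": "replace the background with a clean white studio sweep",
    "negative_prompt": "blurry, low resolution, extra objects",
    "enable_thinking": True,     # interpret abstract editing instructions
    "enable_reflection": True,   # review the output and self-correct
    "output_format": "png",
}
```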

Introduction to Step1x Edit2

StepFun AI's Step1X-Edit v2 turns natural-language instructions and a reference image into high-fidelity edits at $0.2 per image, with identity-preserving, region-precise control. Instead of manual masking and layer-by-layer retouching, Step1x Edit2 applies reasoning-led multimodal edits that follow your brief and preserve surrounding context, eliminating tedious selections and rework for e-commerce teams, design studios, and marketing workflows. On RunComfy, developers can use Step1x Edit2 both in the browser and via an HTTP API, with no need to host or scale the model themselves.
Ideal for: SKU-Accurate Product Retouching | Region-Precise Background Replacement | Brand-Consistent Creative Variations

Examples of Step1x Edit2 in Action


Frequently Asked Questions

What are the main capabilities of Step1x Edit2 when used for text-to-image generation?

Step1x Edit2 excels at both precise image editing and text-to-image creation, allowing users to add, remove, or restyle visual elements through natural language prompts. Its reasoning loop enhances understanding of abstract instructions, producing consistent, high-quality visual results suitable for advanced creative pipelines.

How does Step1x Edit2 differ from earlier versions in terms of text-to-image output quality?

Compared with v1.0 and v1.1, Step1x Edit2 introduces reasoning and reflection modes that significantly improve prompt fidelity in both editing and text-to-image modes. The resulting images show higher realism, better lighting consistency, and improved control over edits based on user instructions.

What are the typical technical limitations of Step1x Edit2 for image resolution and token length?

Step1x Edit2 generally supports up to 1024×1024 output resolution per generation and accepts text prompts up to roughly 512 tokens for text-to-image or edit-based tasks. Beyond these parameters, output quality may degrade or inference may fail due to memory constraints.
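As a rough client-side guard against these limits, you could validate requests before submission. The sketch below simply restates the figures mentioned above, and the whitespace split is only a crude approximation of the model's real tokenizer.

```python
# Minimal sanity check against the limits stated above (1024x1024, ~512 tokens).
# The whitespace split is a rough proxy for real tokenization.
MAX_SIDE = 1024
MAX_PROMPT_TOKENS = 512

def validate_request(width: int, height: int, prompt: str) -> None:
    if width > MAX_SIDE or height > MAX_SIDE:
        raise ValueError(f"Resolution {width}x{height} exceeds {MAX_SIDE}x{MAX_SIDE}")
    approx_tokens = len(prompt.split())
    if approx_tokens > MAX_PROMPT_TOKENS:
        raise ValueError(f"Prompt is ~{approx_tokens} tokens; limit is {MAX_PROMPT_TOKENS}")

validate_request(1024, 768, "replace the background with a white studio sweep")
```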

How many reference inputs can Step1x Edit2 handle for combined text-to-image and editing modes?

Step1x Edit2 typically allows one primary reference image plus up to two auxiliary control references when using extensions such as ControlNet or IP-Adapter. This enables finer control over layout, depth, or style when blending reference-guided and text-to-image synthesis.

What improvements make Step1x Edit2 stand out against models like Nano Banana Pro or Seedream 4.5?

Step1x Edit2 offers open-source deployment, instruction-driven editing, and reasoning-assisted outputs not found in most proprietary systems. While Nano Banana Pro excels at realism and narrative imagery, Step1x Edit2 provides interpretable and reproducible results, particularly for precise text-to-image corrections and localized edits.

How can developers move from testing Step1x Edit2 in the RunComfy Playground to full production integration?

To move Step1x Edit2 from the RunComfy Playground to production, developers should use the RunComfy API, which mirrors playground behavior. With API keys, USD-based billing, and secure endpoints, text-to-image or edit requests can be automated and scaled while maintaining consistent model fidelity.
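As a minimal sketch of such an integration, the snippet below posts an edit request with Python's requests library. The endpoint path, payload fields, and response shape are assumptions for illustration only; check the RunComfy API reference for the actual contract.

```python
# Hypothetical sketch of calling Step1x Edit2 through the RunComfy API.
# The endpoint URL and JSON fields are illustrative assumptions, not the
# documented schema; authenticate with your real RunComfy API key.
import os
import requests

API_KEY = os.environ["RUNCOMFY_API_KEY"]
ENDPOINT = "https://api.runcomfy.net/v1/..."  # placeholder: see the API docs

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "image_url": "https://example.com/product.png",
        "prompt": "remove the price sticker, keep the label intact",
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())  # typically a result image URL or a job id to poll
```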

Does Step1x Edit2 require high-end hardware for optimal text-to-image results?

While Step1x Edit2 benefits from GPUs with 40–80 GB VRAM for maximum quality, it can run efficiently on smaller devices using FP8 quantization or LoRA fine-tuning. For light workloads or testing, the RunComfy Playground automatically manages hardware selection to optimize both speed and cost.

Can Step1x Edit2 be fine-tuned for specific visual domains or tasks such as product design?

Yes. Step1x Edit2 supports LoRA-based fine-tuning, enabling developers and artists to adapt the model for domain-specific stylistic or object categories. This process enhances accuracy in text-to-image synthesis where brand or thematic consistency is critical.
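As a conceptual sketch of what LoRA adaptation involves, the snippet below injects adapters into a stand-in attention block with Hugging Face's peft library. The module names (`to_q`/`to_k`/`to_v`) and hyperparameters are assumptions for illustration; a real fine-tune would target the actual Step1X-Edit backbone and add a training loop over your domain data.

```python
# Conceptual LoRA sketch using the peft library on a stand-in attention block.
# Module names and hyperparameters are illustrative assumptions; adapt them
# to the real Step1X-Edit checkpoint and add a training loop.
import torch.nn as nn
from peft import LoraConfig, inject_adapter_in_model

class TinyAttention(nn.Module):  # stand-in for a real attention block
    def __init__(self, dim: int = 64):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

model = TinyAttention()
for p in model.parameters():
    p.requires_grad = False  # freeze base weights; only LoRA matrices will train

config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["to_q", "to_k", "to_v"])
model = inject_adapter_in_model(config, model)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # lora_A / lora_B parameters for each targeted projection
```

Because only the low-rank adapter matrices are trained, this keeps fine-tuning memory and storage costs far below a full retrain while still capturing brand- or domain-specific style.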

What licensing terms govern the use of Step1x Edit2 outputs in commercial settings?

Step1x Edit2 is released under the Apache-2.0 license, allowing commercial usage provided attribution and license terms are respected. However, users generating text-to-image content via external tools like RunComfy should also review their platform-specific usage and billing policies.

What kind of output quality benchmarks demonstrate Step1x Edit2’s progress?

Benchmarks such as GEdit-Bench and KRIS-Bench show Step1x Edit2 achieving improved scores in sharpness, realism, and prompt faithfulness, particularly for complex text-to-image edits. Its reflective reasoning mechanism reduces artifact rates and enhances the precision of modified regions.