GPT Image 1.5: Identity-Preserving Image Editing & Generation on playground and API

openai/gpt-image-1-5/image-to-image

Generate and edit realistic images from text or photos with up to 4x faster renders, precise multi-step edits, consistent lighting, and accurate small text for design, e-commerce, and marketing visuals.

The rate is $0.009 per image for low quality, $0.034 per image for medium quality, and $0.133 per image for high quality.

Introduction to GPT Image 1.5 Capabilities

OpenAI's GPT Image 1.5 generates and edits images from text and existing photos. Pricing starts at $0.009 per image, renders are up to 4x faster than GPT Image 1, outputs default to 1024x1024, and the model delivers precise multi-step editing and faithful small-text rendering. Instead of manual masking and round-trips between apps, GPT Image 1.5 applies context-aware, identity-preserving transformations with add, remove, and combine controls, which removes tedious selection steps and keeps lighting, composition, and text consistent across edits. It is built for e-commerce teams, designers, marketers, and enterprise content pipelines. For developers, GPT Image 1.5 on RunComfy can be used both in the browser and via an HTTP API, so you don’t need to host or scale the model yourself.
Ideal for: Identity-Consistent Product Photography Edits | Photorealistic Try-On and Style Variations | Campaign-Ready Visuals with Accurate Text

Examples Created with GPT Image 1.5

Hyper-realistic winter portrait of a woman in a snow-covered fur-lined hood.

Silhouetted fisherman on a bridge at sunset near a mosque skyline, reflecting soft golden tones, inspired by GPT Image 1.5.

Overhead view of a vintage typewriter, journal, pencil, and coffee cup floating on fabric in a pool, crafted using GPT Image 1.5.

Two men wearing Santa hats sharing Coca-Cola in a festive setting, created using GPT Image 1.5.

Golden retriever, tabby cat, and pet rat snuggled on a couch watching TV, depicted using GPT Image 1.5 Image model.

Man reading a newspaper featuring a comparison between a humanoid robot and a banana with a computer chip, highlighting GPT Image 1.5 model.

Model Overview

Provider: OpenAI
Task: image-to-image
Max Resolution/Duration: Up to 1536×1024 (or 1024×1536); default 1024×1024
Summary: GPT Image 1.5 is a high-fidelity image-to-image model built for precise, multi-step edits and fast iteration. It preserves lighting, composition, and subject likeness while following detailed instructions, and it renders small, dense text with greater clarity. GPT Image 1.5 is optimized for production workflows in design, e-commerce, and marketing that demand consistent visual quality and speed.

Key Capabilities

Precision edits that preserve lighting, composition, and likeness

GPT Image 1.5 performs targeted transformations on existing images (style changes, add/remove elements, apparel/hairstyle adjustments) while maintaining scene lighting, camera composition, and subject identity.
Results stay consistent across iterative edits, enabling reliable multi-step workflows without degradation.
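
A multi-step workflow of this kind can be sketched as a chain in which each edit's output becomes the next edit's input. The sketch below is illustrative only: run_edit_chain and edit_fn are hypothetical names, and edit_fn stands in for whatever client call actually submits an instruction plus an image URL and returns the URL of the edited result.

```python
from typing import Callable, List


def run_edit_chain(
    source_url: str,
    instructions: List[str],
    edit_fn: Callable[[str, str], str],
) -> List[str]:
    """Apply instructions one after another and return every intermediate URL.

    edit_fn(instruction, image_url) is a hypothetical callable that performs
    one image-to-image edit and returns the URL of the edited image.
    """
    results = []
    current = source_url
    for instruction in instructions:
        # The previous result is the input of the next edit.
        current = edit_fn(instruction, current)
        results.append(current)
    return results


if __name__ == "__main__":
    # Stand-in for a real API call, so the chain can be tried offline.
    def fake_edit(instruction: str, image_url: str) -> str:
        return f"{image_url}?edit={len(instruction)}"

    steps = ["Swap the jacket for a red parka", "Add light snowfall"]
    for url in run_edit_chain("https://example.com/portrait.png", steps, fake_edit):
        print(url)
```

Keeping every intermediate URL makes it possible to roll back to an earlier step if a later edit drifts.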

4× faster generation for iterative creative cycles

GPT Image 1.5 delivers up to four times faster renders than GPT Image 1, reducing turnaround for review-and-revise loops.
Faster sampling makes A/B exploration and fine control over edits practical at scale.

Stronger prompt adherence and clearer small text

GPT Image 1.5 follows complex, instruction-heavy prompts more reliably than prior GPT Image / DALL·E models.
It improves the rendering of small and dense text (labels, UI elements, packaging), critical for e-commerce and brand assets.

Input Parameters

Core Prompts

Parameter	Type	Default/Range	Description
prompt	string	default: ""	Required. Instruction text describing the generation or the edit to apply.
image_urls	array[string]	default: []	One or more image URLs to use as sources or references for image-to-image edits.
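
As a minimal sketch, the two core fields can be checked on the client before a request is sent. The function name build_core_inputs and the specific checks (non-empty prompt, absolute http or https URLs) are assumptions for illustration, not documented API behavior.

```python
from typing import List, Optional
from urllib.parse import urlparse


def build_core_inputs(prompt: str, image_urls: Optional[List[str]] = None) -> dict:
    """Return the prompt and image_urls fields after basic client-side checks."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required and must not be empty")
    urls = list(image_urls or [])
    for url in urls:
        parsed = urlparse(url)
        # Assumption: sources must be absolute http(s) URLs.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return {"prompt": prompt.strip(), "image_urls": urls}
```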

Dimensions & Settings

Parameter	Type	Default/Range	Description
image_size	string (enum)	auto, 1024x1024, 1536x1024, 1024x1536 (default: auto)	Target aspect/size. Use auto to let the model choose; specify exact dimensions for square, landscape, or portrait.
background	string (enum)	auto, transparent, opaque (default: auto)	Background handling. Transparent enables export-ready assets; opaque keeps a solid background.
quality	string (enum)	low, medium, high (default: high)	Rendering quality/performance tradeoff. High emphasizes fidelity.
input_fidelity	string (enum)	low, high (default: high)	Degree to preserve content from the first input image; high maintains stronger likeness and layout.
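
The enum values and defaults in the table above lend themselves to a small validation helper. This is a sketch under the assumption that the request uses exactly these field names; resolve_settings is a hypothetical helper, not part of any SDK.

```python
ALLOWED_VALUES = {
    "image_size": ("auto", "1024x1024", "1536x1024", "1024x1536"),
    "background": ("auto", "transparent", "opaque"),
    "quality": ("low", "medium", "high"),
    "input_fidelity": ("low", "high"),
}

# Defaults as listed in the table above.
DEFAULTS = {
    "image_size": "auto",
    "background": "auto",
    "quality": "high",
    "input_fidelity": "high",
}


def resolve_settings(**overrides: str) -> dict:
    """Merge overrides into the defaults, rejecting unknown names or values."""
    settings = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in ALLOWED_VALUES:
            raise KeyError(f"unknown setting: {name}")
        if value not in ALLOWED_VALUES[name]:
            allowed = ", ".join(ALLOWED_VALUES[name])
            raise ValueError(f"{name} must be one of: {allowed}")
        settings[name] = value
    return settings
```

For example, resolve_settings(quality="low", image_size="1536x1024") yields a landscape draft configuration while leaving background and input_fidelity at their defaults.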

Output & Delivery

Parameter	Type	Default/Range	Description
output_format	string (enum)	jpeg, png, webp (default: png)	Output file format. Use PNG for transparency, JPEG for smaller size, WEBP for modern compression.
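
Because JPEG has no alpha channel, a transparent background only makes sense with PNG or WEBP output. The sketch below encodes that as a client-side convention; the function names are hypothetical, and whether the API itself rejects the jpeg plus transparent combination is not stated on this page.

```python
OUTPUT_FORMATS = ("jpeg", "png", "webp")
ALPHA_CAPABLE = ("png", "webp")  # JPEG cannot store transparency


def choose_output_format(background: str = "auto", prefer_small_files: bool = False) -> str:
    """Pick a format following the guidance in the table above."""
    if background == "transparent":
        return "png"
    return "jpeg" if prefer_small_files else "png"


def check_output_format(background: str, output_format: str) -> None:
    """Raise ValueError for combinations that cannot carry transparency."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of: {', '.join(OUTPUT_FORMATS)}")
    if background == "transparent" and output_format not in ALPHA_CAPABLE:
        raise ValueError("transparent backgrounds need png or webp output")
```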

How GPT Image 1.5 compares to other models

Vs GPT Image 1 (gpt-image-1): Compared to GPT Image 1, GPT Image 1.5 delivers roughly 4× faster generation, stronger adherence to complex prompts, clearer small text, and better preservation of lighting, composition, and likeness across edits. Ideal when iterative precision and turnaround speed are both critical.
Vs DALL·E 3: Compared to DALL·E 3, GPT Image 1.5 emphasizes instruction-following and image-to-image editing fidelity, maintaining scene integrity during object additions/removals and style shifts. Choose GPT Image 1.5 for multi-step edits that must retain identity and layout.
Vs Flux 2: Flux 2 can target very high native resolutions and local deployment scenarios, but GPT Image 1.5 focuses on end-to-end speed, consistent editing, realistic transformations (e.g., try-ons), and streamlined UI/API integration. Use GPT Image 1.5 when enterprise-ready workflows and fast, repeatable edits matter most.

Detailed Research: GPT Image 1.5 vs. Google Nano Banana Pro

0. Executive Takeaways

Both models are production-grade image generation systems, but they optimize for different workflows:

- GPT Image 1.5 focuses on strong instruction-following, fast iteration, and precise image editing (OpenAI claims up to 4× faster generation).

- Nano Banana Pro emphasizes studio-style control, higher output resolution (up to 4K), multi-reference composition (up to 14 images), and optional Search grounding for factual visuals.

Neither OpenAI nor Google publicly discloses full architecture details or parameter counts. What is available and reliable are their interfaces, modalities, limits, and workflow primitives.

LMArena human-preference tests rank GPT Image 1.5 #1 in Text-to-Image (as of Dec 16, 2025), with Nano Banana Pro close behind.

Microsoft Foundry benchmarks show GPT Image 1.5 outperforming Nano Banana Pro on prompt alignment and diagram/flowchart tasks.

Community feedback suggests: GPT Image 1.5 excels at prompt adherence and reference-image conditioning. Nano Banana Pro excels at design-heavy outputs (text-in-image, infographics) but can show occasional artifacts or style drift.

1. Research Methodology

1.1 Official Sources

OpenAI: ChatGPT Images release notes, Images API documentation, and prompting guides.
Google / DeepMind: Nano Banana Pro (Gemini 3 Pro Image) launch posts, Gemini API docs, and Google Cloud announcements.

1.2 Community Sources (Qualitative)

Reddit and X (Twitter) discussions focusing on generation quality, prompt control, and editing behavior.

1.3 Third-Party Benchmarks

LMArena (human preference leaderboards).
Microsoft Azure AI Foundry published benchmark tables.
Open benchmarks and research projects (GenExam, RISEBench), where applicable.

2. Official Technical Comparison

2.1 Model Naming & Release Context

OpenAI: gpt-image-1.5 (snapshot: gpt-image-1.5-2025-12-16), marketed in ChatGPT as ChatGPT Images.
Google: Nano Banana Pro, also referred to as Gemini 3 Pro Image or gemini-3-pro-image-preview.

2.2 Architecture & Parameter Disclosure

Neither vendor publicly discloses: (1) the core generative architecture (e.g., diffusion vs. autoregressive internals), (2) the training recipe, or (3) the parameter count.

GPT Image 1.5 is described as a natively multimodal language model capable of image generation and editing.
Nano Banana Pro is built on Gemini 3, integrating reasoning, real-world knowledge, and optional Search grounding.
Google applies SynthID watermarking to generated images for provenance.

2.3 Inputs, Outputs, and Limits

2.3.1 GPT Image 1.5 (OpenAI)

Limits & Formats

Images per request: 1–10
Edit inputs: up to 16 images, ≤50MB each
Supported formats: PNG, JPEG, WEBP
Output sizes: 1024×1024, 1536×1024, 1024×1536, auto
Prompt length: up to 32,000 characters
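
The limits listed above can be turned into a preflight check that runs before any upload. This is a rough sketch: preflight is a hypothetical helper, the format test looks only at the file extension, and "50MB" is read as 50,000,000 bytes, which is an assumption on the conservative side.

```python
import os
from typing import List, Tuple

MAX_EDIT_IMAGES = 16
MAX_IMAGE_BYTES = 50 * 1000 * 1000  # assumption: decimal megabytes
MAX_PROMPT_CHARS = 32_000
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def preflight(prompt: str, images: List[Tuple[str, int]]) -> List[str]:
    """Return a list of limit violations; an empty list means the inputs fit.

    images holds (filename, size_in_bytes) pairs for the edit inputs.
    """
    problems = []
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt has {len(prompt)} characters (limit {MAX_PROMPT_CHARS})")
    if len(images) > MAX_EDIT_IMAGES:
        problems.append(f"{len(images)} input images (limit {MAX_EDIT_IMAGES})")
    for name, size in images:
        extension = os.path.splitext(name)[1].lower()
        if extension not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {extension or '(none)'}")
        if size > MAX_IMAGE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds {MAX_IMAGE_BYTES}")
    return problems
```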

Workflow Characteristics

Strong preservation of lighting, composition, and subject identity during edits
Emphasis on fast iteration and controllable edits

2.3.2 Nano Banana Pro (Gemini 3 Pro Image)

Limits & Formats

Maximum resolution: up to 4K
Reference images: up to 14
Input formats: PNG, JPEG, WEBP, HEIC, HEIF
Inline image payload limit: <20MB (File API recommended for larger inputs)

Workflow Characteristics

Strong studio-style controls for layout, typography, and composition
Optional Search grounding for factual and real-world accuracy
SynthID watermarking applied to outputs

3. User Community Feedback

3.1 GPT Image 1.5

Common Praise

Strong prompt adherence
Reliable use of reference images
Predictable behavior during iterative edits

Common Criticism

Occasional fine-detail artifacts when zoomed in

3.2 Nano Banana Pro

Common Praise

Excellent text-in-image and infographic generation
Strong layout and design-oriented outputs

Common Criticism

Style fidelity issues when matching references
Occasional unexpected or inconsistent edits

3.3 Production Risk Notes

Public discussions highlight potential bias or stereotyping risks in certain Nano Banana Pro generations, which may be relevant for production pipelines.

4. Benchmarks & Comparative Evaluations

4.1 Human Preference (LMArena)

Text-to-Image: GPT Image 1.5 ranked #1; Nano Banana Pro ranked slightly lower.
Image Editing: GPT Image 1.5 marginally outperforms Nano Banana Pro.

4.2 Microsoft Foundry Benchmarks

Prompt Alignment: GPT Image 1.5 > Nano Banana Pro
Diagram / Flowchart Accuracy: GPT Image 1.5 slightly higher

These results are based on Microsoft’s internal datasets and evaluation criteria.

4.3 Open Benchmarks

GenExam and RISEBench evaluations show Nano Banana Pro performing strongly relative to earlier Gemini and GPT-Image-1 models.
These benchmarks do not yet directly evaluate GPT Image 1.5 and should be interpreted as contextual signals.

4.4 Metrics Availability

FID: No authoritative public FID comparison exists for these two proprietary models.
Prompt Adherence: Supported by Microsoft Foundry metrics and LMArena rankings.
Generation Speed: OpenAI and Microsoft report up to 4× faster generation for GPT Image 1.5; Google does not publish an equivalent speed multiplier.

5. Practical Selection Guide

Choose GPT Image 1.5 When:

Tight prompt adherence is critical
Fast iteration and precise edits are required
A simple, production-friendly Images API is preferred

Choose Nano Banana Pro When:

High-resolution (4K) output is required
Workflows involve typography, infographics, or UI-style visuals
Grounded, real-world knowledge improves output quality
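
The guide above can be written down as a small decision helper. This is a deliberately simplified sketch: the Requirements fields and suggest_model are hypothetical names, and a real choice would also weigh cost, latency, and the benchmark caveats discussed earlier.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    needs_4k_output: bool = False
    typography_or_infographics: bool = False
    needs_search_grounding: bool = False


def suggest_model(req: Requirements) -> str:
    """Return the model the selection guide points to for these requirements."""
    # Any of these criteria points to Nano Banana Pro in the guide above.
    if req.needs_4k_output or req.typography_or_infographics or req.needs_search_grounding:
        return "Nano Banana Pro"
    # Otherwise default to the model favored for prompt adherence and fast edits.
    return "GPT Image 1.5"
```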

6. Licensing & Usage Notes

GPT Image 1.5: Proprietary; usage governed by OpenAI API and platform terms.
Nano Banana Pro: Proprietary; usage governed by Google Cloud / Gemini API terms; SynthID watermarking applied.

API Integration

Developers can integrate GPT Image 1.5 through the RunComfy API using standard HTTP requests. Send prompts plus optional image URLs, select size and quality, and receive rendered outputs in common formats. Integration is streamlined for both synchronous responses and typical job histories.
Note: API Endpoint for GPT Image 1.5
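
A request of this shape might be assembled as follows, using only the Python standard library. Several details are assumptions, since this page does not state them: the endpoint URL is a placeholder, the Bearer authorization scheme and the flat JSON body are guesses, and RUNCOMFY_API_KEY is a hypothetical environment variable name. Only the parameter names come from the tables above. The sketch builds the request without sending it.

```python
import json
import os
import urllib.request

# Placeholder only: the real endpoint URL is not given on this page.
ENDPOINT = "https://api.example.invalid/openai/gpt-image-1-5/image-to-image"


def build_request(payload: dict, api_key: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an image-to-image edit."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key is passed as a Bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )


if __name__ == "__main__":
    payload = {
        "prompt": "Replace the background with a plain studio backdrop",
        "image_urls": ["https://example.com/product.png"],
        "image_size": "1024x1024",
        "quality": "medium",
        "output_format": "png",
    }
    # RUNCOMFY_API_KEY is a hypothetical variable name.
    request = build_request(payload, os.environ.get("RUNCOMFY_API_KEY", "test-key"))
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request, timeout=60)
```

Separating request construction from sending keeps the payload easy to inspect and test before any credits are spent.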

Official resources

Official Website: https://openai.com/blog/chatgpt-images-gpt-image-1-5
Official Documentation: https://platform.openai.com/docs/guides/images/image-generation
License: Proprietary (OpenAI Terms). Commercial use is permitted via the OpenAI API under applicable terms; some enterprise uses may require a separate agreement.

Explore Related Capabilities

If you need to generate images from scratch rather than edit an existing image, use the same model configured for text-to-image: GPT Image 1.5 Text to Image. It is optimized for prompt-driven creation while retaining the instruction-following strengths of GPT Image 1.5.

Related Playgrounds

nano-banana/pro/edit

Turn sketches into precise 2K-4K visuals with smart correction and seamless creative control.

flux-2/lora/edit

Refine images with adaptive style control, LoRA merging, and high-res rendering for consistent design output.

sam-3/image-to-image

Advanced concept-driven image editing with unified segmentation and detection for creators.

gemini-3-pro-image-preview/text-to-image

Create precise, consistent visuals with 4K detail and adaptive text-to-image rendering for design and production needs.

flux-2/max/edit

Precision visual editing tool for consistent, photorealistic brand assets

nano-banana/text-to-image

Seamlessly craft, edit, and fuse images for storytelling, branding, and beyond

Frequently Asked Questions

What are the main capabilities of GPT Image 1.5 in image-to-image generation?

GPT Image 1.5 can create original visuals from text or modify existing images using image-to-image workflows. It excels in preserving fine details, lighting, and texture across multiple edits, offering up to 4× faster generation compared to GPT Image 1. This makes it ideal for creative professionals who need consistency and realism in iterative edits.

How does GPT Image 1.5 differ from earlier models like GPT Image 1 in image-to-image editing?

Compared to GPT Image 1, GPT Image 1.5 introduces improved prompt adherence, more realistic lighting and composition, and richer texture handling in image-to-image transformations. It also provides smoother iterative editing and better text fidelity, which helps developers and technical artists retain visual consistency through complex editing workflows.

What technical limitations should developers know about when working with GPT Image 1.5 image-to-image generation?

GPT Image 1.5 outputs at most 1536×1024 or 1024×1536 pixels (about 1.6 MP), with 1024×1024 as the square option, and prompts can be up to 32,000 characters long. Image-to-image edits take their sources from image_urls; the underlying OpenAI limits allow up to 16 input images of at most 50MB each. Because input_fidelity applies to the first input image, developers compositing several references should list the primary subject first.

Are there aspect ratio constraints or format restrictions in GPT Image 1.5 image-to-image outputs?

Yes. GPT Image 1.5 supports three output sizes: square 1024x1024 (1:1), landscape 1536x1024 (3:2), and portrait 1024x1536 (2:3), plus auto, which lets the model choose. Inputs with other aspect ratios may be cropped or padded to fit. Supported formats are PNG, JPEG, and WEBP for both input and output in image-to-image editing sessions.
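
When a source image does not match one of the output sizes in the parameter table (1024x1024, 1536x1024, 1024x1536), one option is to request the size whose aspect ratio is closest, so that any cropping or padding stays small. The helper below is a sketch; closest_size is a hypothetical name, and comparing ratios on a log scale is one reasonable choice among several.

```python
import math

# Output sizes from the parameter table, as (width, height).
SIZES = {
    "1024x1024": (1024, 1024),
    "1536x1024": (1536, 1024),
    "1024x1536": (1024, 1536),
}


def closest_size(width: int, height: int) -> str:
    """Return the supported size whose aspect ratio is nearest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(name: str) -> float:
        w, h = SIZES[name]
        # Log scale treats 3:2 and 2:3 as equally far from square.
        return abs(math.log(w / h) - target)

    return min(SIZES, key=distance)
```

With this rule, a 1920x1080 source maps to 1536x1024 and a 1080x1920 source maps to 1024x1536.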

How can I transition from testing GPT Image 1.5 in the RunComfy Playground to full production via API?

Once your prototype using GPT Image 1.5 works as expected in the RunComfy Playground, you can migrate to the RunComfy API, which mirrors the playground’s parameters, including image-to-image calls. You’ll authenticate with your API key, use the ‘generation’ endpoint, and manage USD credits or paid tiers for production-level scalability.

What makes GPT Image 1.5 superior to competitors in the image-to-image editing space?

GPT Image 1.5 stands out for its balanced blend of image quality, speed, and consistency across edits. While rivals like Flux 2 may offer higher resolution, GPT Image 1.5 provides more stable identity preservation, coherent lighting, and semantic prompt accuracy—especially useful in image-to-image editing scenarios for commercial applications.

Does GPT Image 1.5 handle text rendering inside images better than earlier versions during image-to-image edits?

Yes. GPT Image 1.5 improves legibility of small or dense text elements embedded in generated graphics. When performing image-to-image edits involving logos or signage, the model retains crisp outlines and consistent font rendering, surpassing GPT Image 1 and many competing systems in text fidelity.

Can GPT Image 1.5 be used for commercial image-to-image projects?

In general, you may use GPT Image 1.5 outputs commercially, but always confirm the applicable licensing terms on the official OpenAI platform or RunComfy policy pages. Commercial workflows involving image-to-image editing should verify output rights and data policies, as these may differ depending on API integration modes.

How does GPT Image 1.5 ensure consistent visual identity in multi-step image-to-image processes?

OpenAI does not publicly document the internal mechanism, but GPT Image 1.5 is designed to preserve facial likeness, textures, and lighting over successive edits, and setting input_fidelity to high asks the model to keep stronger likeness and layout from the first input image. This helps developers and technical artists perform multi-stage image-to-image transformations, such as character or product retexturing, with less visual drift.

Is there a way to optimize generation cost while using GPT Image 1.5 image-to-image features?

Yes. Pricing is per image and scales with quality ($0.009 low, $0.034 medium, $0.133 high), so drafting at low or medium quality and reserving high for final renders reduces USD spend on RunComfy’s GPT Image 1.5 API. Efficient prompting, batching, and targeted image-to-image edits instead of full regenerations also lower processing costs while maintaining control over fine visual adjustments.
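
The effect of quality choice on spend can be estimated from the per-image rates listed at the top of this page ($0.009 low, $0.034 medium, $0.133 high). The sketch below assumes those rates are the only cost component; estimate_cost is a hypothetical helper.

```python
from decimal import Decimal
from typing import Dict

# Per-image rates in USD, as listed on this page.
PRICE_PER_IMAGE = {
    "low": Decimal("0.009"),
    "medium": Decimal("0.034"),
    "high": Decimal("0.133"),
}


def estimate_cost(image_counts: Dict[str, int]) -> Decimal:
    """Return the total USD cost for a mapping of quality tier to image count."""
    total = Decimal("0")
    for quality, count in image_counts.items():
        if quality not in PRICE_PER_IMAGE:
            raise ValueError(f"unknown quality tier: {quality!r}")
        total += PRICE_PER_IMAGE[quality] * count
    return total


if __name__ == "__main__":
    drafts_then_finals = estimate_cost({"low": 20, "high": 2})
    all_high = estimate_cost({"high": 22})
    print(f"20 low drafts + 2 high finals: ${drafts_then_finals}")
    print(f"22 high renders: ${all_high}")
```

Under these rates, 20 low-quality drafts plus 2 high-quality finals come to $0.446, compared with $2.926 for 22 high-quality renders.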
