meta/sam-3/image-to-image

Unify detection, segmentation, and editing with SAM 3 image-to-image, transforming text or visual prompts into precise, editable results for seamless creative, AR, and research workflows.


Introduction to SAM 3 Image-to-Image Generator

Unveiled by Meta on November 19, 2025, SAM 3 (Segment Anything Model 3) is the next generation of vision foundation models in the Segment Anything family. Built around promptable concept segmentation (PCS), it unifies detection, segmentation, and tracking across images and video, all driven by text or visual cues. SAM 3 introduces a DETR-based architecture with a presence head for refined concept discrimination and shares its backbone with a memory-based tracker derived from SAM 2. Trained on the expanded SA-Co dataset, which features millions of diverse concept annotations, it delivers open-vocabulary segmentation that captures every instance of a concept with near-human precision and roughly doubles the accuracy of prior systems on concept-segmentation benchmarks. From real-time creative editing to AR, VR, and 3D reconstruction, SAM 3 pushes the limits of image analysis and generation.

The SAM 3 image-to-image tool lets you turn prompts into segmented or reconstructed visual outputs directly from any RGB image or exemplar. Designed for creators, researchers, and developers, it accelerates your workflow, letting you detect, edit, or generate detailed scene elements effortlessly. You get precision segmentation and intelligent image understanding in one streamlined tool built for high-impact visual creation.


What makes SAM 3 stand out

SAM 3 is a high-fidelity image-to-image system that unifies detection, segmentation, and editing in a structure-aware workflow. SAM 3 preserves geometry, materials, and composition while applying targeted, text-driven changes, producing realistic results without full-frame resynthesis. With strong region understanding and mask precision, SAM 3 enables reliable localized edits, global restyling, and asset preparation across creative, AR, and research pipelines. Thanks to its adaptability to varied scenes and lighting, SAM 3 remains robust on cluttered layouts and dependable in production where consistency and editability are crucial.

Key capabilities of SAM 3:

  • Structure-preserving edits that maintain pose, layout, depth cues, and material response.
  • Segmentation-driven control with high-quality masks for region-constrained operations.
  • Text-conditioned changes that add, remove, or modify elements without letting the base scene drift.
  • Background cleanup or replacement while keeping the subject intact and edges stable.
  • Consistent lighting and realism with perspective-aware harmonization of shadows and reflections.
  • Edge-safe transformations that avoid halos, bleed, and unintended warping on fine details.
  • Resolution- and aspect-robust behavior suitable for batch workflows and pipelines with SAM 3.

Prompting guide for SAM 3

To use SAM 3, start by providing image_url as the required base image. Optionally include text_prompt to describe the edit in clear, concrete terms. State what to change and what to preserve so SAM 3 can limit its operations to the intended regions. Use spatial language to help SAM 3 localize changes, and specify lighting, material, or style targets only when needed. Keep prompts for SAM 3 concise and iterative to refine outcomes without destabilizing structure.
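
For programmatic use, a call needs little more than these two fields. The sketch below is a minimal Python illustration: the endpoint URL, authentication header, and response shape are assumed placeholders rather than a documented API; only image_url, text_prompt, and the model identifier meta/sam-3/image-to-image come from this page.

    import requests

    # Minimal illustrative call. The endpoint URL, auth header, and response
    # fields are placeholders/assumptions; only `image_url`, `text_prompt`,
    # and the model id "meta/sam-3/image-to-image" come from this page.
    API_URL = "https://example.com/v1/run/meta/sam-3/image-to-image"  # placeholder
    API_KEY = "YOUR_API_KEY"  # placeholder

    payload = {
        "image_url": "https://example.com/photos/street.jpg",  # required base image
        "text_prompt": (
            "Only modify the background; replace the sky with overcast clouds; "
            "keep building edges sharp."
        ),  # optional edit instruction
    }

    response = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=120,
    )
    response.raise_for_status()
    print(response.json())  # inspect returned fields, e.g. output image or mask URLs

Iterating on a result then amounts to resubmitting with a short, revised text_prompt while keeping the same image_url, in line with the guidance above.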

Example prompts for SAM 3:

  • "Preserve the person and pose; remove the power lines from the background only."
  • "Only modify the background; replace the sky with overcast clouds; keep building edges sharp."
  • "Add a wooden bench to the right of the subject; match perspective and lighting; preserve shadows."
  • "Replace the storefront sign with 'BLOSSOM' in white sans-serif; perspective-matched; do not change facade."
  • "Subtle cinematic grade across the scene; do not alter skin tone or composition."
  • "Remove reflections on the left window; keep interior and signage unchanged."

Pro tips for SAM 3:

  • Specify scope and constraints first: what to keep, what to edit, and where.
  • Use precise spatial terms like left, right, foreground, background, upper-right quadrant.
  • Prefer a few strong descriptors over many competing adjectives.
  • Iterate with short updates; adjust nouns and constraints rather than rewriting the whole prompt.
  • Supply a high-resolution, well-cropped image_url; remove irrelevant regions before processing with SAM 3 (see the cropping sketch below).
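
A lightweight way to follow the last tip is to crop the source image to the region of interest before hosting it. The snippet below is a minimal sketch using Pillow; the file names and crop box are hypothetical and would be replaced with your own.

    from PIL import Image  # pip install pillow

    # Illustrative pre-processing sketch: file names and crop box are placeholders.
    # Cropping away irrelevant regions before upload keeps SAM 3 focused on the
    # area you actually want to edit.
    img = Image.open("street_full.jpg").convert("RGB")

    # Keep only the region of interest: (left, upper, right, lower) in pixels.
    roi = img.crop((400, 250, 2000, 1500))
    roi.save("street_cropped.jpg", quality=95)
    # Host "street_cropped.jpg" and pass its URL as image_url.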


Frequently Asked Questions

What is SAM 3 and what can it do for image-to-image applications?

SAM 3, also known as Segment Anything Model 3, is Meta’s latest vision foundation model designed for open-vocabulary segmentation and tracking. It excels at identifying and masking objects in both stills and videos, enabling detailed image-to-image transformations such as content editing, object replacement, and layout refinement.

How does SAM 3 perform compared to earlier versions for image-to-image processing?

SAM 3 significantly outperforms SAM 1 and SAM 2 in accuracy and versatility. It introduces Promptable Concept Segmentation (PCS), which allows broader natural language input and improved object-level consistency in image-to-image tasks like transferring texture or color between objects.

Is SAM 3 free to use for image-to-image segmentation projects?

Users can try SAM 3 through Runcomfy’s AI playground with free trial credits. After that, each run of SAM 3 for image-to-image generation or segmentation consumes credits. The credit usage policy can be found in the platform’s ‘Generation’ section.

Who should consider using SAM 3 for image-to-image editing and research?

SAM 3 is best suited for computer vision researchers, developers, and digital content creators. It’s ideal for anyone working on tasks like image-to-image manipulation, augmented reality development, annotation automation, or e-commerce visual previews.

What kinds of inputs and outputs does SAM 3 support for image-to-image workflows?

SAM 3 supports RGB images as input and outputs segmentation masks, tracked object identities, and refined results suitable for image-to-image enhancement workflows. It also connects with SAM 3D to generate single-image 3D reconstructions.
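
As a sketch of how such outputs might be consumed downstream, the snippet below applies a returned mask to the source image to produce a transparent cut-out. It assumes the mask is delivered as a grayscale PNG at a URL, which is an assumption about the output format rather than something specified here; the URLs are placeholders.

    from io import BytesIO

    import numpy as np
    import requests
    from PIL import Image

    # Illustrative post-processing sketch. Assumes the segmentation mask is a
    # grayscale PNG reachable at a URL; actual output formats may differ.
    def cut_out_subject(image_url: str, mask_url: str) -> Image.Image:
        """Apply a binary mask to an RGB image, returning an RGBA cut-out."""
        image = Image.open(BytesIO(requests.get(image_url, timeout=60).content)).convert("RGB")
        mask = Image.open(BytesIO(requests.get(mask_url, timeout=60).content)).convert("L")
        mask = mask.resize(image.size)  # align sizes if they differ

        rgba = np.dstack([np.asarray(image), np.asarray(mask)])  # mask becomes alpha
        return Image.fromarray(rgba, mode="RGBA")

    cutout = cut_out_subject(
        "https://example.com/photos/street.jpg",  # placeholder input image
        "https://example.com/outputs/mask.png",   # placeholder mask output
    )
    cutout.save("subject_cutout.png")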

What are the main benefits of using SAM 3 for image-to-image segmentation?

SAM 3’s main strengths include open-vocabulary understanding, fast detection, and high-quality segmentation. It enables realistic image-to-image transformations by correctly identifying all instances of a concept in complex images or videos, enhancing productivity for creative and analytical applications.

How does SAM 3 ensure accuracy during image-to-image segmentation or object tracking?

SAM 3 incorporates a DETR-based architecture with a presence head that boosts fine-grained recognition. For image-to-image segmentation, it maintains contextual consistency and tracks object identities over frames, resulting in cleaner and more coherent outputs.

Where can I access SAM 3 for image-to-image experimentation?

Users can access SAM 3 directly at Runcomfy’s AI playground via a web browser. The tool works smoothly on desktops and mobile devices, making it convenient for experimenting with image-to-image segmentation and visual prompt refinement.

What are some current limitations of SAM 3 for image-to-image use cases?

While SAM 3 delivers excellent segmentation quality, its image-to-image capabilities depend on input clarity and prompt precision. It may require GPU power for real-time performance, and results can vary in low-light or highly abstract scenes.