Unify detection, segmentation, and editing with SAM 3 image-to-image, transforming text or visual prompts into precise, editable results for seamless creative, AR, and research workflows.






SAM 3 is a high-fidelity image-to-image system that unifies detection, segmentation, and editing in a structure-aware workflow. It preserves geometry, materials, and composition while applying targeted, text-driven changes, producing realistic results without full-frame resynthesis. With strong region understanding and mask precision, SAM 3 enables reliable localized edits, global restyling, and asset preparation across creative, AR, and research pipelines. Its adaptability to varied scenes and lighting keeps it robust on cluttered layouts and dependable in production, where consistency and editability are crucial.

Key capabilities of SAM 3:
- Structure-aware editing that preserves geometry, materials, and composition
- Targeted, text-driven changes without full-frame resynthesis
- Precise region understanding and mask quality for localized edits and global restyling
- Robust performance on cluttered layouts and in varied lighting
To use SAM 3, start by providing image_url as the required base image. Optionally include text_prompt to describe the edit in clear, concrete terms. State what to change and what to preserve so SAM 3 can limit its operations to the intended regions. Use spatial language to help SAM 3 localize changes, and specify lighting, material, or style targets only when needed. Keep prompts for SAM 3 concise and iterative to refine outcomes without destabilizing structure.
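The workflow above can be sketched as a small helper that assembles a run's inputs. The function name and payload fields below are illustrative assumptions based only on the parameters described here (image_url required, text_prompt optional), not a documented RunComfy client:

```python
def build_sam3_request(image_url, text_prompt=None):
    # Assemble a SAM 3 run payload: image_url is required, text_prompt is optional.
    # These field names are assumptions for illustration, not a published API schema.
    if not image_url:
        raise ValueError("image_url is required")
    payload = {"image_url": image_url}
    if text_prompt:
        # Keep prompts concrete: state what to change and what to preserve.
        payload["text_prompt"] = text_prompt
    return payload

# Spatial language plus an explicit preservation clause helps localize the edit.
request = build_sam3_request(
    image_url="https://example.com/living-room.jpg",
    text_prompt="Recolor the sofa on the left to deep green; keep lighting and layout unchanged",
)
```

Iterating on the text_prompt string while keeping image_url fixed matches the concise, incremental refinement style recommended above.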
SAM 3, also known as Segment Anything Model 3, is Meta’s latest vision foundation model designed for open-vocabulary segmentation and tracking. It excels at identifying and masking objects in both stills and videos, enabling detailed image-to-image transformations such as content editing, object replacement, and layout refinement.
SAM 3 significantly outperforms SAM 1 and SAM 2 in accuracy and versatility. It introduces Promptable Concept Segmentation (PCS), which allows broader natural language input and improved object-level consistency in image-to-image tasks like transferring texture or color between objects.
Users can try SAM 3 through Runcomfy’s AI playground with free trial credits. After that, each SAM 3 image-to-image generation or segmentation run consumes credits; the credit usage policy can be found in the platform’s ‘Generation’ section.
SAM 3 is best suited for computer vision researchers, developers, and digital content creators. It’s ideal for anyone working on tasks like image-to-image manipulation, augmented reality development, annotation automation, or e-commerce visual previews.
SAM 3 supports RGB images as input and outputs segmentation masks, tracked object identities, and refined results suitable for image-to-image enhancement workflows. It also connects with SAM 3D to generate single-image 3D reconstructions.
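Segmentation masks of this kind are typically boolean arrays aligned with the input image, which makes a localized edit a simple composite: change pixels inside the mask, leave everything else untouched. A minimal NumPy sketch (the toy image, mask, and recolor operation are illustrative, not SAM 3 output):

```python
import numpy as np

def apply_masked_edit(image, mask, edit_fn):
    """Apply edit_fn only where mask is True; pixels outside the mask are preserved."""
    out = image.copy()
    out[mask] = edit_fn(image[mask])
    return out

# Toy 4x4 RGB image and a boolean mask covering its left half.
img = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[:, :2] = True

# Recolor masked pixels by adding green; unmasked pixels stay black.
edited = apply_masked_edit(img, mask, lambda px: px + np.array([0, 128, 0], dtype=np.uint8))
```

The same compositing pattern underlies most mask-driven image-to-image pipelines: the model supplies the mask, and downstream code decides what happens inside it.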
SAM 3’s main strengths include open-vocabulary understanding, fast detection, and high-quality segmentation. It enables realistic image-to-image transformations by correctly identifying all instances of a concept in complex images or videos, enhancing productivity for creative and analytical applications.
SAM 3 incorporates a presence head and a DETR-based architecture that together boost fine-grained recognition. For image-to-image segmentation, it maintains contextual consistency and tracks object identities across frames, yielding cleaner and more coherent outputs.
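One way to picture the presence-head idea: a global score for whether the prompted concept appears in the image at all, used to suppress per-query detection scores when it does not. The gating scheme and threshold below are a simplified illustration, not Meta’s implementation:

```python
import math

def gate_detections(query_scores, presence_logit, threshold=0.5):
    """Scale per-query detection scores by a global presence probability.

    A simplified view of how a presence head can suppress false positives
    when the prompted concept is absent from the scene.
    """
    presence = 1.0 / (1.0 + math.exp(-presence_logit))  # sigmoid
    gated = [s * presence for s in query_scores]
    return [s for s in gated if s >= threshold]

# Concept clearly present: confident queries survive the gate.
kept = gate_detections([0.9, 0.8, 0.2], presence_logit=4.0)
# Concept absent: even confident queries are suppressed.
suppressed = gate_detections([0.9, 0.8], presence_logit=-4.0)
```

Decoupling "is the concept here at all" from "where is each instance" is what lets this style of head clean up open-vocabulary detections.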
Users can access SAM 3 directly at Runcomfy’s AI playground via a web browser. The tool works smoothly on desktops and mobile devices, making it convenient for experimenting with image-to-image segmentation and visual prompt refinement.
While SAM 3 delivers excellent segmentation quality, its image-to-image capabilities depend on input clarity and prompt precision. It may require GPU power for real-time performance, and results can vary in low-light or highly abstract scenes.