SAM 3 in ComfyUI Workflow | Precision Image Segmentation AI

SAM 3 Image and Video Segmentation Workflow for ComfyUI

This workflow brings SAM 3 to ComfyUI for fast, accurate object detection and segmentation on both images and videos. It is designed for artists and technical users who need reliable masks for VFX, rotoscoping, compositing, and AI-assisted editing. With text prompts, box selection, and frame-to-frame propagation, SAM 3 delivers consistent masks that hold up in complex scenes.

The graph includes two image pipelines and one video pipeline. You can segment by describing the target with text, by drawing boxes around it, or by initializing on the first video frame and letting SAM 3 propagate masks through the entire clip. The workflow previews results inline and saves visualization overlays and mask-only outputs.

Key models in the ComfyUI SAM 3 workflow

SAM 3 is the segmentation model that powers both image and video masking in this graph. It is provided through the ComfyUI integration PozzettiAndrea/ComfyUI-SAM3 and supplies masks and region proposals across diverse content.

How to use the ComfyUI SAM 3 workflow

At a glance, the workflow has three lanes: Image with semantic text prompting, Image with box prompting, and Video with initialization plus propagation. All lanes use the same SAM 3 weights and converge on previews and saves.

Image

The Image group loads a picture with LoadImage (#4) and the SAM 3 weights with LoadSAM3Model (#1). From there, the image flows to two alternative SAM 3 segmentation branches so you can choose the fastest way to get a clean mask. Each branch returns a visualization overlay for quick QC and a binary mask for downstream work. Use the image lane when you need a single high-quality SAM 3 mask quickly.

Image Solution One: Semantic Segmentation

This path segments with language cues. DeepTranslatorTextNode (#16) lets you type a natural language description in your preferred language, which is then routed into SAM3Segmentation (#82). SAM 3 interprets the text and returns a mask plus a colorized overlay you can save via SaveImage (#23) and inspect with MaskPreview (#15). Use short, concrete nouns for best results, and refine by being more specific if multiple objects match.
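
If you would rather drive this branch from a script than from the canvas, one option is to export the graph in ComfyUI's API format and patch the text before queueing it. The sketch below assumes that export layout (a dict keyed by node id, each entry holding `class_type` and `inputs`). The input names `text` and `prompt` and the two-node stand-in graph are assumptions for illustration, not taken from ComfyUI-SAM3, so check your own exported JSON for the real field names.

```python
import copy
import json


def set_node_input(workflow, node_id, input_name, value):
    """Return a copy of an API-format workflow with one node input replaced."""
    node_key = str(node_id)
    if node_key not in workflow:
        raise KeyError(f"node {node_key} not found in workflow")
    patched = copy.deepcopy(workflow)
    patched[node_key]["inputs"][input_name] = value
    return patched


# Stand-in for an exported graph: only the two nodes this branch names are shown,
# and the input names ("text", "prompt") are assumed for illustration.
workflow = {
    "16": {"class_type": "DeepTranslatorTextNode", "inputs": {"text": ""}},
    "82": {"class_type": "SAM3Segmentation", "inputs": {"prompt": ["16", 0]}},
}

patched = set_node_input(workflow, 16, "text", "red umbrella")

# A running ComfyUI server accepts a body of this shape on its /prompt endpoint.
payload = json.dumps({"prompt": patched})
```

Returning a copy leaves the exported graph untouched, so the same template can be reused to queue several prompts, one per target object.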

Image Solution Three: Boxes

This path segments with region-of-interest boxes. Use SAM3BBoxCollector (#84) to draw one or more boxes around what you want, then run SAM3Segmentation (#81) to compute the mask guided by those boxes. You can add exclusion boxes to suppress nearby distractors and get a tighter SAM 3 mask. Results are previewed with PreviewImage (#65) and MaskPreview (#66) and can be exported for comp work.

Video

The Video group loads your clip with VHS_LoadVideo (#75) from the Video Helper Suite and initializes the model with SAM3VideoModelLoader (#69). Use SAM3VideoSegmentation (#78) to set the initial selection on the first frame, optionally refining it with points from SAM3PointCollector (#79) or with boxes. SAM3Propagate (#77) then runs SAM 3 forward and backward through the clip to keep masks consistent through motion and occlusion. SAM3VideoOutput (#76) yields both an overlay visualization and per-frame masks, which are encoded as MP4s with CreateVideo (#70, #74) and saved via SaveVideo (#71, #72). Use this lane when you need clean, temporally stable SAM 3 masks for editing or compositing.

Key nodes in the ComfyUI SAM 3 workflow

LoadSAM3Model (#1) Loads the SAM 3 weights for image tasks. If you swap weights, keep your image lanes consistent so previews and saves reflect the same SAM 3 backbone.

SAM3Segmentation (#82) Text-driven image segmentation. Provide a clear text prompt describing the target class. If multiple objects are detected, make the description more specific or run multiple passes to collect separate SAM 3 masks.

SAM3Segmentation (#81) Box-driven image segmentation. Draw one or more tight boxes around the object. Use additional boxes to exclude adjacent regions if the mask bleeds, then re-run to refine the SAM 3 output.
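
One way to judge whether a box-guided mask is bleeding is to measure how much of it falls outside the box you drew. The helper below is a hypothetical QC check, not part of ComfyUI-SAM3. It assumes the mask is a list of rows of 0/1 values and the box is `(x1, y1, x2, y2)` in pixels, with the right and bottom edges exclusive.

```python
def bleed_fraction(mask, box):
    """Fraction of foreground mask pixels that lie outside the box.

    mask: list of rows, each a list of 0/1 values (assumed layout).
    box:  (x1, y1, x2, y2) in pixels; x2 and y2 are exclusive.
    """
    x1, y1, x2, y2 = box
    total = 0
    outside = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                total += 1
                if not (x1 <= x < x2 and y1 <= y < y2):
                    outside += 1
    # An empty mask has nothing to bleed, so report 0.0 instead of dividing by zero.
    return outside / total if total else 0.0


mask = [
    [0, 1, 1, 1],
    [0, 1, 1, 0],
]
box = (1, 0, 3, 2)  # covers columns 1-2, rows 0-1
fraction = bleed_fraction(mask, box)  # one of five foreground pixels is outside
```

A value near zero suggests the box already contains the mask. A larger value is a hint that an exclusion box over the stray region may help.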

SAM3VideoModelLoader (#69) Initializes the SAM 3 video model for the clip lane. Keep this consistent with your image model choice if you plan to match looks across stills and footage.

SAM3VideoSegmentation (#78) Sets the initial selection on the first frame using text, points, or boxes. Start with the simplest cue that cleanly isolates the subject. A clean first-frame mask makes propagation across the rest of the video easier and faster.

SAM3Propagate (#77) Propagates the initial mask through the sequence. Adjust its behavior when subjects move quickly, change scale, or partially occlude. If drift appears after a scene change or cut, re-initialize near the cut and propagate again to keep SAM 3 results stable.
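
The re-initialization advice can be planned ahead of time by splitting the clip into shots. The sketch below is a hypothetical helper that assumes you already know the frame index at which each new shot begins (the `cut_frames` list). It does not detect cuts and does not call any SAM 3 node. It only returns the frame ranges you would initialize and propagate separately.

```python
def split_at_cuts(num_frames, cut_frames):
    """Split a clip into (start, end) frame ranges, with end exclusive.

    cut_frames holds the index of the first frame of each new shot.
    Indices outside 1..num_frames-1 are ignored, and duplicates collapse.
    """
    if num_frames <= 0:
        return []
    starts = sorted({0, *(c for c in cut_frames if 0 < c < num_frames)})
    ends = starts[1:] + [num_frames]
    return list(zip(starts, ends))


segments = split_at_cuts(120, [48, 90])
for start, end in segments:
    print(f"initialize on frame {start}, propagate through frame {end - 1}")
```

Each range starts on the first frame after a cut, so every shot gets its own initial selection instead of inheriting a mask from the previous shot.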

SAM3VideoOutput (#76) Packages the propagated SAM 3 masks and a visualization overlay. Use the overlay MP4 to review quality frame by frame, and use the mask-only MP4 for direct ingest in comp or editorial.

SAM3BBoxCollector (#84) Interactive box tool for image selection. Draw tight positive boxes and optional negative boxes to guide SAM 3 toward precise boundaries, then preview and iterate.

SAM3PointCollector (#79) Interactive point tool for video initialization. Add a few well-placed positive and negative clicks on the first frame to steer SAM 3 when text or boxes alone are ambiguous.

VHS_LoadVideo (#75) Video ingestion from the Video Helper Suite (Kosinkadink/ComfyUI-VideoHelperSuite). Use it to load your clip, inspect frames, and hand off images to the SAM 3 video nodes for initialization and propagation.

Optional extras

- Combine text and boxes on tough images. Use a specific SAM 3 text description, then add boxes to suppress nearby clutter.
- For multiple objects, run separate passes and save each SAM 3 mask, then layer them in your compositor.
- On videos with hard cuts, re-initialize right after the cut before running SAM 3 propagation again for consistent masks.
- Save both the overlay and the mask-only video. The overlay is ideal for QC, while the mask-only file drops straight into rotoscoping or keying pipelines.
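
To make the multi-object tip concrete, here is a minimal sketch of layering separately saved masks into a single label map. It is an illustration only: it assumes each mask has been read into a list of rows of 0/1 values at the same resolution, and the rule that later masks win where they overlap is a design choice of this sketch, not something the workflow prescribes.

```python
def layer_masks(masks):
    """Merge binary masks into one label map.

    Label 0 is background and mask number i (1-based) gets label i.
    Where masks overlap, the later mask overwrites the earlier one.
    """
    if not masks:
        return []
    height = len(masks[0])
    width = len(masks[0][0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    for label, mask in enumerate(masks, start=1):
        if len(mask) != height or any(len(row) != width for row in mask):
            raise ValueError("all masks must share the same dimensions")
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    labels[y][x] = label
    return labels


person = [[1, 1, 0], [0, 0, 0]]
dog = [[0, 1, 1], [0, 0, 0]]
label_map = layer_masks([person, dog])
```

Ordering the list from back to front mirrors how layers stack in a compositor, with the last mask sitting on top.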

Acknowledgements

This workflow builds on ComfyUI-SAM3 by PozzettiAndrea, whose development and maintenance of the integration we gratefully acknowledge. For authoritative details, refer to the original documentation and repository listed below.

Resources

PozzettiAndrea/ComfyUI-SAM3
- GitHub: ComfyUI-SAM3

Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.


