SAM 3 | Advanced Object Segmentation Tool

Workflow Name: RunComfy/SAM-3
Workflow ID: 0000...1312
With this segmentation workflow, you can identify and isolate objects in any image or video frame. It generates precise masks and tracks objects consistently across frames, saving time in compositing, editing, and post-production. Built for creators who need control and accuracy, it is designed to stay fast and reliable across a wide range of scene types, including visually complex ones. It suits VFX designers and AI artists who want clean, consistent segmentation outputs.

SAM 3 Image and Video Segmentation Workflow for ComfyUI

This workflow brings SAM 3 to ComfyUI for fast, accurate object detection and segmentation on both images and videos. It is designed for artists and technical users who need reliable masks for VFX, rotoscoping, compositing, and AI-assisted editing. With text prompts, box selection, and frame-to-frame propagation, SAM 3 delivers consistent masks that hold up in complex scenes.

The graph includes two image pipelines and one video pipeline. You can segment by describing the target with text, by drawing boxes around it, or by initializing on the first video frame and letting SAM 3 propagate masks through the entire clip. The workflow previews results inline and saves visualization overlays and mask-only outputs.

Key models in the ComfyUI SAM 3 workflow

  • SAM 3. The next-generation segmentation model that powers both image and video masking in this graph. It is provided via the ComfyUI integration in PozzettiAndrea/ComfyUI-SAM3 and supplies robust masks and region proposals across diverse content.

How to use the ComfyUI SAM 3 workflow

At a glance, the workflow has three lanes: Image with semantic text prompting, Image with box prompting, and Video with initialization plus propagation. All lanes use the same SAM 3 weights and converge on previews and saves.

Image

The Image group loads a picture with LoadImage (#4) and the SAM 3 weights with LoadSAM3Model (#1). From there, the image flows to two alternative SAM 3 segmentation branches so you can choose the fastest way to get a clean mask. Each branch returns a visualization overlay for quick QC and a binary mask for downstream work. Use the image lane when you need a single high-quality SAM 3 mask quickly.

Image Solution One: Semantic Segmentation

This path segments with language cues. DeepTranslatorTextNode (#16) lets you type a natural language description in your preferred language, which is then routed into SAM3Segmentation (#82). SAM 3 interprets the text and returns a mask plus a colorized overlay you can save via SaveImage (#23) and inspect with MaskPreview (#15). Use short, concrete nouns for best results, and refine by being more specific if multiple objects match.
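
If you drive this lane from a script rather than the UI, the text prompt can be swapped in an exported workflow before it is queued. The sketch below assumes the workflow was exported in ComfyUI's API format (a dictionary keyed by node ID, where each node has `class_type` and `inputs`). The input name `text` on node #16 is a hypothetical placeholder; check your own export for the actual field name.

```python
import copy
import json

def set_text_prompt(workflow, node_id, prompt, input_name="text"):
    """Return a copy of an API-format workflow with one text input replaced.

    input_name is an assumption: inspect your exported JSON for the real key.
    """
    patched = copy.deepcopy(workflow)
    node = patched[str(node_id)]
    if input_name not in node["inputs"]:
        raise KeyError(f"node {node_id} has no input named {input_name!r}")
    node["inputs"][input_name] = prompt
    return patched

# Toy stand-in for an exported workflow; real exports contain many more nodes.
workflow = {
    "16": {"class_type": "DeepTranslatorTextNode", "inputs": {"text": "person"}},
}
patched = set_text_prompt(workflow, 16, "red umbrella")
print(json.dumps(patched["16"]["inputs"]))
```

The helper copies the workflow instead of editing it in place, so the same export can be reused for several prompts, one pass per target object.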

Image Solution Two: Boxes

This path segments with region-of-interest boxes. Use SAM3BBoxCollector (#84) to draw one or more boxes around what you want, then run SAM3Segmentation (#81) to compute the mask guided by those boxes. You can add exclusion boxes to suppress nearby distractors and get a tighter SAM 3 mask. Results are previewed with PreviewImage (#65) and MaskPreview (#66) and can be exported for comp work.
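
To make the idea of inclusion and exclusion boxes concrete, here is a minimal illustration of the coarse region they jointly describe. This is not how SAM 3 computes its mask; the model uses the boxes as guidance and returns a much finer boundary. The function name and the `(x0, y0, x1, y1)` pixel convention are assumptions made for this sketch.

```python
def box_region_mask(width, height, positive_boxes, negative_boxes=()):
    """Coarse 0/1 mask: inside any positive box and outside every negative box.

    Boxes are (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    """
    def inside(x, y, box):
        x0, y0, x1, y1 = box
        return x0 <= x < x1 and y0 <= y < y1

    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            keep = any(inside(x, y, b) for b in positive_boxes)
            drop = any(inside(x, y, b) for b in negative_boxes)
            row.append(1 if keep and not drop else 0)
        mask.append(row)
    return mask

# One positive box, with an exclusion box covering its right-hand side.
mask = box_region_mask(6, 4, positive_boxes=[(1, 1, 5, 3)],
                       negative_boxes=[(3, 0, 6, 4)])
for row in mask:
    print("".join(str(v) for v in row))
```

A region like this can serve as a quick sanity check: if the final mask extends well outside it, the boxes probably need tightening.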

Video

The Video group loads your clip with VHS_LoadVideo (#75) from the Video Helper Suite and initializes the model with SAM3VideoModelLoader (#69). Use SAM3VideoSegmentation (#78) to set the initial selection on the first frame, optionally aided by points via SAM3PointCollector (#79) or boxes if needed. Then SAM3Propagate (#77) drives SAM 3 forward and backward through the clip to maintain consistent masks even with motion and occlusion. SAM3VideoOutput (#76) yields both an overlay visualization and per-frame masks, which are turned into MP4s with CreateVideo (#70, #74) and saved via SaveVideo (#71, #72). Use this lane when you need clean, temporally stable SAM 3 masks for editing or compositing.

Key nodes in Comfyui SAM 3 workflow

LoadSAM3Model (#1)
Loads the SAM 3 weights for image tasks. If you swap weights, keep your image lanes consistent so previews and saves reflect the same SAM 3 backbone.

SAM3Segmentation (#82)
Text-driven image segmentation. Provide a clear text prompt describing the target class. If multiple objects are detected, make the description more specific or run multiple passes to collect separate SAM 3 masks.

SAM3Segmentation (#81)
Box-driven image segmentation. Draw one or more tight boxes around the object. Use additional boxes to exclude adjacent regions if the mask bleeds, then re-run to refine the SAM 3 output.

SAM3VideoModelLoader (#69)
Initializes the SAM 3 video model for the clip lane. Keep this consistent with your image model choice if you want masks from stills and footage to match.

SAM3VideoSegmentation (#78)
Sets the initial selection on the first frame using text, points, or boxes. Start with the simplest cue that cleanly isolates the subject. The cleaner the first-frame mask, the easier and faster propagation will be across the rest of the video.

SAM3Propagate (#77)
Propagates the initial mask through the sequence. Review its settings when subjects move quickly, change scale, or become partially occluded. If drift appears after a scene change or cut, re-initialize near the cut and propagate again to keep SAM 3 results stable.
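
One way to decide where to re-initialize is to locate hard cuts before running propagation. The sketch below is a simple frame-difference heuristic, not part of the SAM 3 nodes. It assumes frames are already decoded to flat lists of grayscale values, and the threshold of 40 is an arbitrary starting point that you would tune per clip.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two equally sized grayscale frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def find_cuts(frames, threshold=40.0):
    """Return indices i where frame i differs sharply from frame i - 1."""
    return [i for i in range(1, len(frames))
            if mean_abs_diff(frames[i - 1], frames[i]) > threshold]

def split_segments(n_frames, cuts):
    """Turn cut indices into (start, end) frame ranges, end exclusive."""
    starts = [0] + list(cuts)
    ends = list(cuts) + [n_frames]
    return list(zip(starts, ends))

# Four tiny frames: two dark ones, then two bright ones after a hard cut.
frames = [
    [10, 12, 11, 10],
    [11, 12, 10, 10],
    [200, 198, 205, 201],
    [201, 199, 204, 200],
]
cuts = find_cuts(frames)
segments = split_segments(len(frames), cuts)
print(cuts, segments)
```

Each resulting segment can then be treated as its own clip: initialize on the segment's first frame and propagate only within that range.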

SAM3VideoOutput (#76)
Packages the propagated SAM 3 masks and a visualization overlay. Use the overlay MP4 to review quality frame by frame, and use the mask-only MP4 for direct ingest in comp or editorial.
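
If the mask-only MP4 is encoded with a lossy codec, decoded frames may contain gray values near edges rather than pure black and white. A minimal sketch of re-binarizing a decoded frame follows. It assumes the frame is available as rows of 0 to 255 grayscale integers; the midpoint threshold of 128 is an assumption, not a value taken from the workflow.

```python
def binarize(frame, threshold=128):
    """Map a grayscale frame (rows of 0-255 ints) to a strict 0/1 mask."""
    return [[1 if value >= threshold else 0 for value in row] for row in frame]

# Values slightly off 0 and 255 mimic compression noise in a decoded mask frame.
decoded = [
    [3, 250, 247],
    [12, 130, 90],
]
print(binarize(decoded))
```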

SAM3BBoxCollector (#84)
Interactive box tool for image selection. Draw tight positive boxes and optional negative boxes to guide SAM 3 toward precise boundaries, then preview and iterate.

SAM3PointCollector (#79)
Interactive point tool for video initialization. Add a few well-placed positive and negative clicks on the first frame to steer SAM 3 when text or boxes alone are ambiguous.

VHS_LoadVideo (#75)
Video ingestion from the Video Helper Suite (Kosinkadink/ComfyUI-VideoHelperSuite). Use it to load your clip, inspect frames, and hand off images to the SAM 3 video nodes for initialization and propagation.

Optional extras

  • Combine text and boxes on tough images. Use a specific SAM 3 text description, then add boxes to suppress nearby clutter.
  • For multiple objects, run separate passes and save each SAM 3 mask, then layer them in your compositor.
  • On videos with hard cuts, re-initialize right after the cut before running SAM 3 propagation again for consistent masks.
  • Save both the overlay and the mask-only video. The overlay is ideal for QC, while the mask-only file drops straight into rotoscoping or keying pipelines.
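
The multi-object tip above can be sketched as a small layering step outside ComfyUI. The example assumes each pass produced a binary mask of the same size, represented here as nested lists; the function name and the "later mask wins on overlap" rule are illustrative choices, not behavior of the SAM 3 nodes.

```python
def layer_masks(masks):
    """Combine binary masks into a label map: 0 is background, k is the k-th mask.

    Later masks win where masks overlap, like upper layers in a compositor.
    """
    height, width = len(masks[0]), len(masks[0][0])
    labels = [[0] * width for _ in range(height)]
    for index, mask in enumerate(masks, start=1):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    labels[y][x] = index
    return labels

# Two toy masks from separate passes that overlap in one pixel.
person = [[1, 1, 0], [1, 1, 0]]
dog = [[0, 1, 1], [0, 0, 1]]
print(layer_masks([person, dog]))
```

Ordering the list from background to foreground objects keeps the overlap rule predictable when masks touch.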

Acknowledgements

This workflow implements and builds upon the following works and resources. We gratefully acknowledge PozzettiAndrea for creating and maintaining ComfyUI-SAM3. For authoritative details, please refer to the original documentation and repositories listed below.

Resources

  • PozzettiAndrea/ComfyUI-SAM3
    • GitHub: ComfyUI-SAM3

Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.
