This workflow brings SAM 3 to ComfyUI for fast, accurate object detection and segmentation on both images and videos. It is designed for artists and technical users who need reliable masks for VFX, rotoscoping, compositing, and AI-assisted editing. With text prompts, box selection, and frame-to-frame propagation, SAM 3 delivers consistent masks that hold up in complex scenes.
The graph includes two image pipelines and one video pipeline. You can segment by describing the target with text, by drawing boxes around it, or by initializing on the first video frame and letting SAM 3 propagate masks through the entire clip. The workflow previews results inline and saves visualization overlays and mask-only outputs.
At a glance, the workflow has three lanes: Image with semantic text prompting, Image with box prompting, and Video with initialization plus propagation. All lanes use the same SAM 3 weights and converge on previews and saves.
The Image group loads a picture with LoadImage (#4) and the SAM 3 weights with LoadSAM3Model (#1). From there, the image flows to two alternative SAM 3 segmentation branches so you can choose the fastest way to get a clean mask. Each branch returns a visualization overlay for quick QC and a binary mask for downstream work. Use the image lane when you need a single high-quality SAM 3 mask quickly.
This path segments with language cues. DeepTranslatorTextNode (#16) lets you type a natural language description in your preferred language, which is then routed into SAM3Segmentation (#82). SAM 3 interprets the text and returns a mask plus a colorized overlay you can save via SaveImage (#23) and inspect with MaskPreview (#15). Use short, concrete nouns for best results, and refine by being more specific if multiple objects match.
This path segments with region-of-interest boxes. Use SAM3BBoxCollector (#84) to draw one or more boxes around what you want, then run SAM3Segmentation (#81) to compute the mask guided by those boxes. You can add exclusion boxes to suppress nearby distractors and get a tighter SAM 3 mask. Results are previewed with PreviewImage (#65) and MaskPreview (#66) and can be exported for comp work.
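To make the idea of positive and exclusion boxes concrete, here is a minimal sketch of how drawn boxes could be represented and normalized before being handed to a segmentation step. The `BoxPrompt` class, `normalize_box`, and `split_prompts` are hypothetical names used only for illustration; they are not part of ComfyUI-SAM3, and the format the nodes actually exchange may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxPrompt:
    """Hypothetical container for one drawn box, in pixel coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float
    positive: bool = True  # False marks an exclusion box


def normalize_box(box, width, height):
    """Return (x0, y0, x1, y1) scaled to [0, 1], corners ordered and clamped."""
    left, right = sorted((box.x0, box.x1))
    top, bottom = sorted((box.y0, box.y1))

    def clamp(value):
        return min(1.0, max(0.0, value))

    return (
        clamp(left / width),
        clamp(top / height),
        clamp(right / width),
        clamp(bottom / height),
    )


def split_prompts(boxes, width, height):
    """Separate inclusion boxes from exclusion boxes, both normalized."""
    positives = [normalize_box(b, width, height) for b in boxes if b.positive]
    negatives = [normalize_box(b, width, height) for b in boxes if not b.positive]
    return positives, negatives


# Example: one box around the subject, one exclusion box drawn corner-to-corner
# in reverse order on a 400x250 image.
boxes = [
    BoxPrompt(100, 50, 300, 250),
    BoxPrompt(400, 200, 350, 100, positive=False),
]
positives, negatives = split_prompts(boxes, width=400, height=250)
```

Ordering and clamping the corners means a box dragged from bottom-right to top-left, or one that runs past the image edge, still yields a valid region.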
The Video group loads your clip with VHS_LoadVideo (#75) from the Video Helper Suite and initializes the model with SAM3VideoModelLoader (#69). Use SAM3VideoSegmentation (#78) to set the initial selection on the first frame, optionally aided by points via SAM3PointCollector (#79) or boxes if needed. Then SAM3Propagate (#77) drives SAM 3 forward and backward through the clip to maintain consistent masks even with motion and occlusion. SAM3VideoOutput (#76) yields both an overlay visualization and per-frame masks, which are turned into MP4s with CreateVideo (#70, #74) and saved via SaveVideo (#71, #72). Use this lane when you need clean, temporally stable SAM 3 masks for editing or compositing.
LoadSAM3Model (#1)
Loads the SAM 3 weights for image tasks. If you swap weights, keep your image lanes consistent so previews and saves reflect the same SAM 3 backbone.
SAM3Segmentation (#82)
Text-driven image segmentation. Provide a clear text prompt describing the target class. If multiple objects are detected, make the description more specific or run multiple passes to collect separate SAM 3 masks.
SAM3Segmentation (#81)
Box-driven image segmentation. Draw one or more tight boxes around the object. Use additional boxes to exclude adjacent regions if the mask bleeds, then re-run to refine the SAM 3 output.
SAM3VideoModelLoader (#69)
Initializes the SAM 3 video model for the clip lane. Keep this consistent with your image model choice if you plan to match looks across stills and footage.
SAM3VideoSegmentation (#78)
Sets the initial selection on the first frame using text, points, or boxes. Start with the simplest cue that cleanly isolates the subject. The cleaner the first-frame mask, the better the starting point for propagation across the rest of the video.
SAM3Propagate (#77)
Propagates the initial mask through the sequence. Adjust its settings when subjects move quickly, change scale, or are partially occluded. If drift appears after a scene change or cut, re-initialize near the cut and propagate again to keep SAM 3 results stable.
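One way to spot drift without scrubbing every frame is to compare each exported mask with the previous one and flag sudden drops in overlap. The sketch below is an assumption about how you might post-check masks outside the graph; it is not part of SAM3Propagate. It treats each mask as a nested list of 0/1 values, and the 0.5 threshold is an arbitrary starting point that fast but legitimate motion can also trip.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks (rows of 0/1)."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            if pixel_a and pixel_b:
                intersection += 1
            if pixel_a or pixel_b:
                union += 1
    # Two empty masks are treated as identical.
    return 1.0 if union == 0 else intersection / union


def find_drift_frames(masks, threshold=0.5):
    """Return frame indices whose mask overlaps the previous frame's mask
    by less than `threshold` (candidates for re-initialization)."""
    return [
        index
        for index in range(1, len(masks))
        if mask_iou(masks[index - 1], masks[index]) < threshold
    ]


# Example: the mask shifts slightly on frame 1, then jumps on frame 2.
frame_0 = [[1, 1, 0], [1, 1, 0]]
frame_1 = [[1, 1, 0], [1, 0, 0]]
frame_2 = [[0, 0, 1], [0, 0, 1]]
suspect_frames = find_drift_frames([frame_0, frame_1, frame_2])
```

Frames flagged this way are only candidates: review them in the overlay, and re-initialize near the flagged frame if the mask has really lost the subject.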
SAM3VideoOutput (#76)
Packages the propagated SAM 3 masks and a visualization overlay. Use the overlay MP4 to review quality frame by frame, and use the mask-only MP4 for direct ingest in comp or editorial.
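The difference between the overlay and the mask-only output can be illustrated with a small sketch. This is not the node's implementation; `blend_overlay` and `mask_to_gray` are hypothetical helpers that show, for one frame stored as nested lists, how a tint is blended over masked pixels and how a binary mask maps to a black-and-white matte.

```python
def blend_overlay(image, mask, color=(200, 0, 0), alpha=0.5):
    """Tint masked pixels of an RGB frame.

    image: rows of (r, g, b) integer tuples; mask: rows of 0/1 values.
    Returns a new frame and leaves the input untouched.
    """
    blended = []
    for image_row, mask_row in zip(image, mask):
        row = []
        for pixel, inside in zip(image_row, mask_row):
            if inside:
                pixel = tuple(
                    round((1 - alpha) * channel + alpha * tint)
                    for channel, tint in zip(pixel, color)
                )
            row.append(tuple(pixel))
        blended.append(row)
    return blended


def mask_to_gray(mask):
    """Convert a 0/1 mask into 0/255 grayscale values for a matte."""
    return [[255 if inside else 0 for inside in row] for row in mask]


# Example: a two-pixel frame where only the first pixel is masked.
frame = [[(0, 0, 0), (100, 100, 100)]]
mask = [[1, 0]]
overlay = blend_overlay(frame, mask)
matte = mask_to_gray(mask)
```

The overlay keeps the original footage visible for review, while the matte carries only the selection, which is why the mask-only file is the one to bring into comp.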
SAM3BBoxCollector (#84)
Interactive box tool for image selection. Draw tight positive boxes and optional negative boxes to guide SAM 3 toward precise boundaries, then preview and iterate.
SAM3PointCollector (#79)
Interactive point tool for video initialization. Add a few well-placed positive and negative clicks on the first frame to steer SAM 3 when text or boxes alone are ambiguous.
VHS_LoadVideo (#75)
Video ingestion from the Video Helper Suite (Kosinkadink/ComfyUI-VideoHelperSuite). Use it to load your clip, inspect frames, and hand off images to the SAM 3 video nodes for initialization and propagation.
This workflow implements and builds upon the following works and resources. We gratefully acknowledge PozzettiAndrea for creating and maintaining ComfyUI-SAM3. For authoritative details, please refer to the original documentation and repositories.
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.