
SAM 3.1 ComfyUI | Native Segmentation & Tracking

Workflow Name: RunComfy/SAM-3.1-ComfyUI
Workflow ID: 0000...1407
This workflow lets you segment images and track objects across frames with pinpoint accuracy. Using comfy-core detection and tracking, it enables real-time bounding-box previews and mask extraction without third-party nodes. You can easily isolate elements for compositing or editing, making it ideal for motion designers, editors, and AI creators who need reusable masks. The result is clean, native integration with full visual control over the workflow.

SAM 3.1 ComfyUI workflow for prompt-guided segmentation, bounding-box preview, and video tracking#

This SAM 3.1 ComfyUI workflow delivers native, promptable image segmentation with instant bounding-box visualization and frame-accurate video object tracking with mask extraction. It uses the built-in comfy-core SAM 3.1 nodes, so you get first-class performance and stability without third‑party custom nodes. The result is fast, reusable mattes for compositing, isolation, or downstream editing across still images and full videos.

Designed for artists, editors, and pipeline engineers, SAM 3.1 ComfyUI makes it easy to start with a text cue or a bounding box, validate the selection in-place, then propagate a clean mask through an entire clip. Under the hood it loads the sam3.1_multiplex_fp16 checkpoint and runs the official SAM3_Detect, SAM3_VideoTrack, SAM3_TrackToMask, and SAM3_TrackPreview nodes that were added as native support to ComfyUI. See the model files on Hugging Face and the ComfyUI pull request for background: Comfy-Org/sam3.1, ComfyUI PR #13408.

Key models in the SAM 3.1 ComfyUI workflow#

  • Comfy-Org SAM 3.1 Multiplex FP16 checkpoint. The sam3.1_multiplex_fp16 weights power promptable image segmentation and the tracker used by the SAM 3.1 nodes. Load it with CheckpointLoaderSimple and it supplies the model and text-conditioning used throughout the workflow. Source: Comfy-Org/sam3.1.

How to use the SAM 3.1 ComfyUI workflow#

The graph has two independent lanes. Image Masking lets you segment a still image and preview bounding boxes for quick QA. Video Masking initializes a mask on a reference frame, tracks the object across the clip, previews the track, and exports masks for editing or compositing.

Image Masking#

This lane is ideal for single frames or for prototyping your prompt before you run tracking. Start by loading an image with LoadImage (#4) and writing a short text cue in CLIPTextEncode (#3), for example “a bird” or “red car”. The text conditioning and image are fed to SAM3_Detect (#1), which returns both a mask and automatic bounding boxes around the detected subject. Use MaskPreview+ (#5) to visually inspect the matte and DrawBBoxes (#6) plus PreviewImage (#7) to confirm the box placement. If the selection is ambiguous, refine the text, add positive or negative points, or provide a tighter box to steer SAM 3.1 ComfyUI toward the intended object.
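SAM3_Detect returns the mask and its bounding boxes natively, so you never compute them yourself. Still, the relationship between the two is easy to see in a short sketch. The function below is purely illustrative (numpy arrays standing in for ComfyUI's mask tensors), not the node's implementation:

```python
import numpy as np

def bbox_from_mask(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero region, or None if empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Tiny example: a 6x6 mask with a 2x3 foreground blob.
mask = np.zeros((6, 6), dtype=np.uint8)
mask[2:4, 1:4] = 1
print(bbox_from_mask(mask))  # (1, 2, 3, 3)
```

If DrawBBoxes shows a box much larger than the subject, the underlying mask is over-selecting, which is your cue to refine the prompt.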

Video Masking#

This lane scales the same promptable segmentation to full clips. Load a video in VHS_LoadVideoPath (#12); it provides frames and metadata to the rest of the graph. A reference frame is chosen with ImageFromBatch (#15) and described in text via CLIPTextEncode (#14). SAM3_Detect (#13) generates the initial mask on that frame, which serves as the seed for SAM3_VideoTrack (#8) to follow the object across remaining frames using the same model and text conditioning. Convert the resulting track into per-frame mattes with SAM3_TrackToMask (#9). For a quick binary preview or to invert foreground/background, the masks pass through InvertMask (#19) and MaskToImage (#16), then VHS_VideoCombine (#17) can render a simple mask video. For an interactive look at the result over the original frames, SAM3_TrackPreview (#10) plays the overlay at the source frame rate provided by VHS_VideoInfoLoaded (#18). Adjust the starting frame or prompt if you see drift, then re-run to lock the track before exporting.
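The InvertMask and MaskToImage steps are simple array operations. As a conceptual sketch only (numpy in place of ComfyUI's tensors, assuming a float mask in [0, 1]):

```python
import numpy as np

def invert_mask(mask):
    """Flip foreground/background for a float mask in [0, 1] (what InvertMask does conceptually)."""
    return 1.0 - mask

def mask_to_image(mask):
    """Replicate a single-channel mask into 3 channels so it can be saved as a grayscale
    image or video frame (MaskToImage's role before VHS_VideoCombine)."""
    return np.repeat(mask[..., None], 3, axis=-1)

m = np.array([[0.0, 1.0], [0.25, 0.75]])
inv = invert_mask(m)
img = mask_to_image(inv)
print(img.shape)  # (2, 2, 3)
```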

Key nodes in the SAM 3.1 ComfyUI workflow#

SAM3_Detect (#1)#

Generates an object mask and bounding boxes for a still image based on your prompt and optional points or boxes. Use it to validate your subject choice quickly in SAM 3.1 ComfyUI. If the mask feels too broad or includes lookalikes, tighten the textual description or draw a more constrained box to improve separation.
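DrawBBoxes handles box rendering inside the graph; the sketch below only illustrates the idea of burning a rectangle outline into an image array. The function name and 1-pixel outline are illustrative assumptions, not the node's code:

```python
import numpy as np

def draw_bbox(img, box, color=(255, 0, 0)):
    """Draw a 1-pixel rectangle outline onto an HxWx3 uint8 image, in place."""
    x0, y0, x1, y1 = box
    img[y0, x0:x1 + 1] = color  # top edge
    img[y1, x0:x1 + 1] = color  # bottom edge
    img[y0:y1 + 1, x0] = color  # left edge
    img[y0:y1 + 1, x1] = color  # right edge
    return img

canvas = np.zeros((8, 8, 3), dtype=np.uint8)
draw_bbox(canvas, (1, 2, 5, 6))
```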

SAM3_Detect (#13)#

Seeds the video tracker by producing a clean mask on a chosen reference frame. Tracking quality in SAM 3.1 ComfyUI strongly depends on this seed, so pick a frame where the target is visible and minimally occluded. If the subject changes appearance later, reinitialize from another frame and concatenate results in your editor.
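Picking a good seed frame is usually done by eye, but if you batch-detect over several candidate frames, a crude heuristic is to seed from the frame whose mask covers the most pixels (a rough proxy for "visible and minimally occluded"). This heuristic is an assumption of ours, not part of the workflow:

```python
import numpy as np

def best_seed_frame(masks):
    """Return (index, areas) where index is the frame whose mask has the
    largest pixel count. `masks` is a list of 2-D arrays, one per candidate frame."""
    areas = [int(np.count_nonzero(m)) for m in masks]
    return int(np.argmax(areas)), areas

candidates = [np.zeros((4, 4)), np.ones((4, 4)), np.eye(4)]
idx, areas = best_seed_frame(candidates)
print(idx, areas)  # 1 [0, 16, 4]
```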

SAM3_VideoTrack (#8)#

Propagates the initial mask through the clip using the same model and text cue. Keep the conditioning consistent with the seed to avoid latching onto similar objects. When tracking a small or fast-moving subject, start from a frame with a confident seed and consider shortening the segment if lighting or scale shifts dramatically.
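Drift shows up as a sudden drop in overlap between consecutive per-frame masks. If you export the mask sequence, a quick intersection-over-union check can flag suspect frames before you re-seed. This post-hoc check is our own suggestion, not something the tracker exposes:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 1.0

def flag_drift(masks, threshold=0.5):
    """Return indices of frames whose mask overlaps poorly with the previous frame."""
    return [i for i in range(1, len(masks)) if iou(masks[i - 1], masks[i]) < threshold]

a = np.zeros((6, 6), dtype=bool); a[1:4, 1:4] = True
b = a.copy()                                            # stable frame
c = np.zeros((6, 6), dtype=bool); c[4:6, 4:6] = True    # mask jumped elsewhere
print(flag_drift([a, b, c]))  # [2]
```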

SAM3_TrackToMask (#9)#

Converts the tracker output to a mask sequence for export. You can output all frames or select a subset by entering indices or simple ranges. This is the handoff point to either write a video preview or to save a PNG sequence for compositing in your preferred tool.
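Index-and-range selection strings typically expand as shown below. The exact syntax the node accepts is an assumption here (check the node's tooltip for its real format); this sketch only demonstrates the common "0,3,5-8" convention:

```python
def parse_frame_selection(spec):
    """Expand a selection string like '0,3,5-8' into a sorted list of frame indices.
    The accepted syntax is an assumption; consult the node's documentation."""
    indices = set()
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            indices.update(range(int(lo), int(hi) + 1))
        else:
            indices.add(int(part))
    return sorted(indices)

print(parse_frame_selection("0, 3, 5-8"))  # [0, 3, 5, 6, 7, 8]
```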

SAM3_TrackPreview (#10)#

Plays back the tracked result over the original frames for instant quality control. The preview uses the source frame rate reported by VHS_VideoInfoLoaded (#18) so timing matches your clip. Use it to spot drift, occlusion failures, or identity swaps before committing to a full export.
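An overlay preview is conceptually just alpha-blending a colored mask over each frame. The blend below is a minimal stand-in for what the preview renders, with our own choice of color and alpha:

```python
import numpy as np

def overlay_mask(frame, mask, color=(0, 255, 0), alpha=0.5):
    """Blend a colored mask over an HxWx3 uint8 frame, as a track preview would."""
    out = frame.astype(np.float32)
    tint = np.array(color, dtype=np.float32)
    sel = mask.astype(bool)
    out[sel] = (1 - alpha) * out[sel] + alpha * tint
    return out.astype(np.uint8)

frame = np.full((4, 4, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8); mask[0, 0] = 1
out = overlay_mask(frame, mask)  # pixel (0, 0) tinted green, the rest untouched
```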

Optional extras#

  • Use bounding boxes to disambiguate when your text prompt matches multiple subjects in frame.
  • If the target changes scale or lighting mid-clip, split the video into logical segments and re-seed SAM3_Detect (#13) per segment for steadier tracking.
  • For matte exports as an image sequence, route SAM3_TrackToMask (#9) to a SaveImage node instead of VHS_VideoCombine (#17).
  • Keep prompts short and specific. In SAM 3.1 ComfyUI, concise nouns with a key attribute often outperform long prose.
  • When you only need a still mask from a specific frame, run Image Masking on that frame directly to bypass tracking and save time.
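The per-segment re-seeding tip above amounts to cutting the frame range wherever appearance shifts, then running detect-and-track on each piece. A small helper (our own illustrative sketch, not a workflow node) makes the bookkeeping explicit:

```python
def split_segments(num_frames, cut_points):
    """Split [0, num_frames) into (start, end) segments at the given cut frames,
    so each segment can be re-seeded with its own SAM3_Detect pass."""
    bounds = [0] + sorted(c for c in cut_points if 0 < c < num_frames) + [num_frames]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

print(split_segments(120, [40, 90]))  # [(0, 40), (40, 90), (90, 120)]
```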

Acknowledgements#

This workflow implements and builds upon the following works and resources. We gratefully acknowledge Innovate Futures @ Benji for the ComfyUI with SAM 3.1 segmentation workflow, and Comfy-Org for the SAM 3.1 model files and the native ComfyUI SAM 3.1 support PR. For authoritative details, please refer to the original documentation and repositories linked below.

Resources#

  • Innovate Futures @ Benji/Workflow source
    • Docs / Release Notes: ComfyUI With SAM 3.1 Segmentation Native Support! No Custom Node Needed @Benji's AI Playground
  • Comfy-Org/SAM 3.1 model files
    • GitHub: facebookresearch/sam3
    • Hugging Face: Comfy-Org/sam3.1
    • arXiv: SAM 3: Segment Anything with Concepts (2511.16719)
    • Docs / Release Notes: RELEASE_SAM3p1.md
  • Comfy-Org/Native ComfyUI SAM 3.1 support PR
    • GitHub: Comfy-Org/ComfyUI#13408

Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.
