
ComfyUI Node: SAM3 Video Segmentation

Class Name: SAM3VideoSegmentation
Category: SAM3/video
Author: PozzettiAndrea (Account age: 2240 days)
Extension: ComfyUI-SAM3
Last Updated: 2025-12-22
GitHub Stars: 0.32K

How to Install ComfyUI-SAM3

Install this extension via the ComfyUI Manager by searching for ComfyUI-SAM3:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-SAM3 in the search bar and install the extension.
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


SAM3 Video Segmentation Description

Facilitates efficient video object tracking and segmentation in ComfyUI using the SAM3 model.

SAM3 Video Segmentation:

The SAM3VideoSegmentation node performs video object tracking and segmentation within the ComfyUI framework using the SAM3 model. The node is stateless: it retains no global mutable state, and all state information is encoded in immutable outputs. Its primary function is to process video frames and apply segmentation to identify and track objects across frames. To avoid redundant computation, it hashes the video content and configuration parameters and caches the resulting video states, so identical inputs are not reprocessed. This is particularly useful for AI artists and developers who want to run video segmentation without managing state themselves, as the node automatically handles session initialization, prompt addition, and inference-state reconstruction.
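The caching idea can be pictured with a minimal sketch: hash the raw frame data together with the configuration, and reuse a previously computed state when the same key appears again. The function names and cache layout below are illustrative assumptions, not the actual ComfyUI-SAM3 implementation.

    import hashlib
    import json

    import numpy as np

    _VIDEO_STATE_CACHE = {}  # cache key -> previously computed video state

    def video_state_key(frames: np.ndarray, config: dict) -> str:
        """Derive a cache key from the raw frame bytes and the run configuration."""
        h = hashlib.sha256()
        h.update(frames.tobytes())                             # video content
        h.update(json.dumps(config, sort_keys=True).encode())  # prompts, thresholds, ...
        return h.hexdigest()

    def get_or_compute_state(frames: np.ndarray, config: dict, compute_fn):
        """Recompute only when this exact video/config pair has not been seen before."""
        key = video_state_key(frames, config)
        if key not in _VIDEO_STATE_CACHE:
            _VIDEO_STATE_CACHE[key] = compute_fn(frames, config)
        return _VIDEO_STATE_CACHE[key]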

SAM3 Video Segmentation Input Parameters:

video_frames

The video_frames parameter represents the sequence of frames from the video that you want to process for segmentation. This parameter is crucial as it forms the basis of the segmentation task, allowing the node to analyze and track objects across the provided frames. The frames should be in a format compatible with the node's processing capabilities, typically as a tensor or array. There are no explicit minimum or maximum values for this parameter, but the quality and resolution of the frames can impact the segmentation accuracy and performance.
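As a rough illustration, ComfyUI image inputs are usually float tensors shaped [frames, height, width, channels] with values in [0, 1]; the check below assumes that convention, which may differ from this node's exact requirements.

    import torch

    def as_video_frames(frames: torch.Tensor) -> torch.Tensor:
        """Validate and normalize a frame batch to float32 in [0, 1] (assumed convention)."""
        if frames.dim() != 4:
            raise ValueError(f"expected [N, H, W, C] frames, got shape {tuple(frames.shape)}")
        return frames.clamp(0.0, 1.0).to(torch.float32)

    frames = as_video_frames(torch.rand(16, 480, 640, 3))  # 16 dummy 640x480 RGB frames
    print(frames.shape, frames.dtype)  # torch.Size([16, 480, 640, 3]) torch.float32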

score_threshold

The score_threshold parameter is used to set the minimum confidence level required for object detection within the video frames. This threshold helps filter out less confident detections, ensuring that only objects with a detection score above this value are considered for segmentation. Adjusting this parameter can significantly impact the results, with higher values leading to more precise but potentially fewer detections, while lower values may increase the number of detected objects but with reduced accuracy. The default value is typically set to a balanced level, but it can be adjusted based on the specific requirements of your task.
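The effect of the threshold can be shown with a tiny, hypothetical example: detections whose score falls below the threshold are simply dropped. The detection fields here are made up for illustration and are not the node's internal structure.

    detections = [
        {"object_id": 0, "score": 0.91},
        {"object_id": 1, "score": 0.42},
        {"object_id": 2, "score": 0.18},
    ]

    def filter_by_score(dets, score_threshold=0.5):
        """Keep only detections whose confidence meets the threshold."""
        return [d for d in dets if d["score"] >= score_threshold]

    print(filter_by_score(detections, 0.5))  # keeps only object 0
    print(filter_by_score(detections, 0.4))  # keeps objects 0 and 1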

prompt_mode

The prompt_mode parameter determines how prompts are added to the video segmentation process. Prompts guide the segmentation algorithm, providing additional context or constraints that improve the accuracy of object tracking. Prompt modes are mutually exclusive: only one can be active at a time. The available options depend on the specific implementation and can include text prompts, positive/negative points, or bounding boxes. Selecting the appropriate prompt mode is essential for achieving the desired segmentation results.
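A hedged sketch of the "one mode at a time" rule is shown below; the mode names mirror the parameters documented on this page, but the validation logic itself is an assumption for illustration.

    SUPPORTED_PROMPT_MODES = {"text", "points", "boxes"}

    def validate_prompt_mode(prompt_mode, text_prompt="", points=None, boxes=None):
        """Reject unknown modes and require the matching input for the chosen mode."""
        if prompt_mode not in SUPPORTED_PROMPT_MODES:
            raise TypeError(f"Unsupported prompt mode: {prompt_mode!r}")
        if prompt_mode == "text" and not text_prompt:
            raise ValueError("prompt_mode='text' requires a non-empty text_prompt")
        if prompt_mode == "points" and not points:
            raise ValueError("prompt_mode='points' requires at least one point")
        if prompt_mode == "boxes" and not boxes:
            raise ValueError("prompt_mode='boxes' requires at least one box")

    validate_prompt_mode("text", text_prompt="a red car")  # passes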

text_prompt

The text_prompt parameter allows you to provide a textual description or keyword that can be used to influence the segmentation process. This parameter is particularly useful when you want to focus the segmentation on specific objects or features described by the text. The effectiveness of this parameter depends on the model's ability to interpret and apply the text prompt to the video frames. There are no strict constraints on the content of the text prompt, but it should be relevant to the objects or features you wish to segment.

positive_points

The positive_points parameter is a list of coordinates that specify points within the video frames that should be positively reinforced during the segmentation process. These points act as anchors, guiding the algorithm to focus on specific areas of interest. This parameter is useful for refining the segmentation results, especially in complex scenes where certain objects may be difficult to detect. The number and placement of positive points can vary based on the complexity of the video and the desired level of segmentation detail.

negative_points

The negative_points parameter is similar to positive_points, but it specifies points that should be negatively reinforced, indicating areas that should be ignored or de-emphasized during segmentation. This parameter helps reduce false positives by explicitly marking regions that are not of interest. Like positive points, the number and placement of negative points can be adjusted to suit the specific requirements of your video segmentation task.
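The snippet below shows one common way to combine the two point lists, following the SAM-style convention of (x, y) pixel coordinates labeled 1 for positive (include) and 0 for negative (exclude) clicks; confirm the exact format this node expects before relying on it.

    positive_points = [(320, 240), (350, 260)]  # pixels inside the object of interest
    negative_points = [(50, 50)]                # pixel in the background to suppress

    def to_labeled_points(pos, neg):
        """Merge positive and negative clicks into parallel (coords, labels) lists."""
        coords = list(pos) + list(neg)
        labels = [1] * len(pos) + [0] * len(neg)
        return coords, labels

    coords, labels = to_labeled_points(positive_points, negative_points)
    print(coords)  # [(320, 240), (350, 260), (50, 50)]
    print(labels)  # [1, 1, 0]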

positive_boxes

The positive_boxes parameter allows you to define bounding boxes around objects or regions of interest within the video frames. These boxes serve as a more structured form of guidance compared to individual points, providing clear boundaries for the segmentation algorithm to focus on. This parameter is particularly useful for segmenting larger or more complex objects that may not be adequately captured by points alone. The size and position of the positive boxes should be carefully chosen to encompass the desired objects without including extraneous areas.

negative_boxes

The negative_boxes parameter functions similarly to positive_boxes, but it defines bounding boxes around areas that should be excluded from the segmentation process. These boxes help the algorithm avoid regions that are not relevant to the task, reducing the likelihood of false detections. As with positive boxes, the size and placement of negative boxes should be carefully considered to effectively exclude unwanted areas while maintaining the focus on relevant objects.
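For illustration, boxes are assumed here to use the [x1, y1, x2, y2] pixel convention (top-left and bottom-right corners) common in SAM-family tooling; a quick sanity check can catch degenerate boxes before they reach the node.

    positive_boxes = [[100, 80, 420, 360]]  # region containing the object
    negative_boxes = [[0, 0, 90, 70]]       # region to exclude

    def validate_boxes(boxes):
        """Reject boxes whose corners are not in top-left / bottom-right order."""
        for x1, y1, x2, y2 in boxes:
            if x2 <= x1 or y2 <= y1:
                raise ValueError(f"degenerate box: {[x1, y1, x2, y2]}")

    validate_boxes(positive_boxes + negative_boxes)  # passes silently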

frame_idx

The frame_idx parameter specifies the index of the frame within the video sequence that is currently being processed. This parameter is essential for maintaining the correct order of frames during segmentation and ensuring that the results are consistent across the entire video. The value of frame_idx should correspond to the position of the frame within the sequence, starting from zero for the first frame.
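A minimal bounds check for frame_idx, matching the "Invalid frame index" error described later on this page (the exact message is illustrative):

    def check_frame_idx(frame_idx: int, num_frames: int) -> int:
        """Ensure the index refers to an existing frame (0-based)."""
        if not 0 <= frame_idx < num_frames:
            raise ValueError(
                f"Invalid frame index {frame_idx}: must be in [0, {num_frames - 1}]"
            )
        return frame_idx

    check_frame_idx(0, 16)    # first frame of a 16-frame clip -> OK
    # check_frame_idx(16, 16) # would raise ValueError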

SAM3 Video Segmentation Output Parameters:

video_state

The video_state output parameter represents the computed state of the video after segmentation has been applied. This state includes information about the detected objects, their positions, and any associated prompts or configurations. The video_state is immutable, meaning it cannot be altered once created, ensuring consistency and reliability in the segmentation results. This output is crucial for further processing or analysis, as it encapsulates all the necessary data from the segmentation task.
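One way to picture an immutable state object is a frozen dataclass, which cannot be modified after creation; the fields below are assumptions for illustration and not the node's actual schema.

    from dataclasses import dataclass, field
    from typing import Tuple

    @dataclass(frozen=True)
    class VideoState:
        cache_key: str                   # hash of video content + configuration
        num_frames: int
        object_ids: Tuple[int, ...] = field(default_factory=tuple)

    state = VideoState(cache_key="abc123", num_frames=16, object_ids=(0, 1))
    # state.num_frames = 32  # would raise dataclasses.FrozenInstanceError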

SAM3 Video Segmentation Usage Tips:

  • To optimize segmentation accuracy, carefully select the score_threshold to balance between precision and recall, adjusting it based on the complexity of the video content.
  • Utilize positive_points and negative_points to refine segmentation results, especially in scenes with overlapping objects or complex backgrounds.
  • Experiment with different prompt_mode settings to determine which method provides the best guidance for your specific video segmentation task.

SAM3 Video Segmentation Common Errors and Solutions:

AttributeError: 'SAM3VideoSegmentation' object has no attribute '<name>'

  • Explanation: This error occurs when attempting to access an attribute or method that does not exist within the SAM3VideoSegmentation class.
  • Solution: Ensure that you are using the correct attribute or method name as defined in the class. Refer to the documentation or source code to verify the available attributes and methods.

ValueError: Invalid frame index

  • Explanation: This error is raised when the frame_idx parameter is set to a value that is out of bounds for the provided video frames.
  • Solution: Check that the frame_idx value corresponds to a valid index within the sequence of video frames, starting from zero for the first frame.

TypeError: Unsupported prompt mode

  • Explanation: This error indicates that the specified prompt_mode is not recognized or supported by the node.
  • Solution: Verify that the prompt_mode value is one of the supported options. Consult the documentation or source code for a list of valid prompt modes.

SAM3 Video Segmentation Related Nodes

Go back to the ComfyUI-SAM3 extension to check out more related nodes.