RunComfy


ComfyUI Node: HunyuanVideo Vision Encode

Class Name: HyVideo15VisionEncode
Category: HunyuanVideoWrapper1.5
Author: yuanyuan-spec (Account age: 32 days)
Extension: HunyuanVideo-1.5 nodes
Latest Updated: 2025-12-02
Github Stars: 0.02K


Table of Contents

Description
HyVideo15VisionEncode:
HyVideo15VisionEncode Input Parameters:
HyVideo15VisionEncode Output Parameters:
HyVideo15VisionEncode Usage Tips:
HyVideo15VisionEncode Common Errors and Solutions:
Related Nodes

How to Install HunyuanVideo-1.5 nodes

Install this extension through the ComfyUI Manager by searching for HunyuanVideo-1.5 nodes:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter HunyuanVideo-1.5 nodes in the search bar and install the extension from the results.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


HunyuanVideo Vision Encode Description

Encodes visual data into vision states for seamless integration in video generation.

HunyuanVideo Vision Encode:

The HyVideo15VisionEncode node is part of the HunyuanVideo 1.5 suite. It uses a vision encoder model to transform input images into a set of vision states, a structured representation that the other components of the HunyuanVideo system can consume directly. Its main job is to make sure the visual data is accurately represented before the later stages of video synthesis, which makes it a key step in the video generation pipeline.

HunyuanVideo Vision Encode Input Parameters:

vision_encoder

The vision_encoder parameter specifies the vision encoder model to be used for processing the input images. This model is responsible for extracting meaningful features from the visual data, which are then encoded into vision states. The choice of vision encoder can significantly impact the quality and characteristics of the encoded output.

hyvid_cfg

The hyvid_cfg parameter provides configuration settings for the HunyuanVideo system. These settings may include various options that control the behavior of the vision encoding process, such as model parameters and processing preferences. Proper configuration ensures that the encoding process aligns with the desired output characteristics.

latents_dict

The latents_dict parameter contains latent variables that are used during the encoding process. These variables may represent additional information or constraints that guide the encoding, ensuring that the resulting vision states are consistent with the intended video output.

enable_offloading

The enable_offloading parameter is a boolean that controls whether offloading is used during encoding. Offloading helps manage limited compute resources by distributing work across the available hardware. In ComfyUI wrappers this usually means moving models off the GPU while they are idle, which lowers VRAM usage at some cost in speed. The default value is True.

reference_image

The reference_image parameter allows you to provide an optional reference image that can be used to guide the encoding process. This image serves as a visual reference, helping to ensure that the encoded vision states align with specific visual characteristics. The default value is None.

vision_num_semantic_tokens

The vision_num_semantic_tokens parameter specifies the number of semantic tokens to be used in the vision encoding process. These tokens represent distinct visual features or concepts extracted from the input images. The default value is 729.

vision_states_dim

The vision_states_dim parameter defines the dimensionality of the vision states produced by the encoder. This dimension determines the size and complexity of the encoded representation, with a default value of 1152.
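The parameters above fit the usual shape of a ComfyUI node declaration. The following is a minimal sketch of that convention, not the extension's actual source. The class name VisionEncodeSketch, the socket type strings such as HYVID15_VISION_ENCODER, the method name encode, and the split between required and optional inputs are all assumptions made for illustration. Only the parameter names, the defaults, and the category come from the descriptions on this page.

```python
class VisionEncodeSketch:
    """Hypothetical stand-in for HyVideo15VisionEncode (not the real implementation)."""

    @classmethod
    def INPUT_TYPES(cls):
        # Socket type names below are assumed; the real extension may use others.
        return {
            "required": {
                "vision_encoder": ("HYVID15_VISION_ENCODER",),
                "hyvid_cfg": ("HYVID15_CFG",),
                "latents_dict": ("HYVID15_LATENTS",),
                "enable_offloading": ("BOOLEAN", {"default": True}),
            },
            "optional": {
                # Leaving this socket unconnected corresponds to the default of None.
                "reference_image": ("IMAGE",),
                "vision_num_semantic_tokens": ("INT", {"default": 729, "min": 1}),
                "vision_states_dim": ("INT", {"default": 1152, "min": 1}),
            },
        }

    RETURN_TYPES = ("HYVID15_VISION_STATES",)  # assumed type name
    RETURN_NAMES = ("vision_states",)
    FUNCTION = "encode"
    CATEGORY = "HunyuanVideoWrapper1.5"

    def encode(self, vision_encoder, hyvid_cfg, latents_dict,
               enable_offloading=True, reference_image=None,
               vision_num_semantic_tokens=729, vision_states_dim=1152):
        # The actual encoding logic lives in the extension; this sketch only
        # documents the interface.
        raise NotImplementedError("interface sketch only")


spec = VisionEncodeSketch.INPUT_TYPES()
print(sorted(spec["required"]))
print(sorted(spec["optional"]))
```

Reading the declaration this way makes it easy to see which inputs must be wired in a workflow and which can be left at their defaults.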

HunyuanVideo Vision Encode Output Parameters:

vision_states

The vision_states output parameter represents the encoded visual data in the form of vision states. These states are a structured representation of the input images, capturing essential visual features and characteristics. The vision states are crucial for subsequent stages of video generation, as they provide the foundational data needed to synthesize high-quality video content.
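To get a feel for the size of this output, the sketch below works through the arithmetic for the default settings. It assumes the vision states form a dense tensor of shape (batch, vision_num_semantic_tokens, vision_states_dim). That layout is an assumption based on the two parameter descriptions, not something this page confirms. The helper names vision_states_shape and size_in_bytes are hypothetical.

```python
from math import isqrt


def vision_states_shape(batch_size, num_semantic_tokens=729, states_dim=1152):
    """Return the assumed (batch, tokens, dim) shape of vision_states."""
    return (batch_size, num_semantic_tokens, states_dim)


def size_in_bytes(shape, bytes_per_element=2):
    """Memory for a dense tensor of this shape (2 bytes per element = fp16/bf16)."""
    total = bytes_per_element
    for dim in shape:
        total *= dim
    return total


shape = vision_states_shape(batch_size=1)
print(shape)                   # (1, 729, 1152)
print(size_in_bytes(shape))    # 1679616
print(isqrt(729) ** 2 == 729)  # True
```

With the defaults, one image's worth of vision states is about 1.7 MB in half precision. Note that 729 is 27 × 27, which would be consistent with a square grid of patch tokens, although the page does not say how the tokens are arranged.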

HunyuanVideo Vision Encode Usage Tips:

Ensure that the vision_encoder model is appropriately selected based on the desired output quality and characteristics, as different models may offer varying levels of detail and feature extraction capabilities.
Utilize the reference_image parameter to guide the encoding process when specific visual characteristics are required, ensuring that the resulting vision states align with your creative vision.

HunyuanVideo Vision Encode Common Errors and Solutions:

Error: "Invalid vision encoder model"

Explanation: This error occurs when the specified vision_encoder model is not recognized or is incompatible with the node.
Solution: Verify that the vision_encoder parameter is set to a valid and supported model. Consult the documentation for a list of compatible models.

Error: "Configuration settings missing in hyvid_cfg"

Explanation: This error indicates that essential configuration settings are missing from the hyvid_cfg parameter.
Solution: Ensure that all necessary configuration options are included in the hyvid_cfg parameter. Refer to the documentation for required settings.

Error: "Latents dictionary is empty"

Explanation: This error suggests that the latents_dict parameter does not contain any latent variables, which are needed for the encoding process.
Solution: Populate the latents_dict with appropriate latent variables to guide the encoding process effectively.
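The three checks above can be run before the node is queued, so that a misconfigured workflow fails early with a clear message. The helper below is a hypothetical pre-flight check, not part of the extension. It assumes that hyvid_cfg and latents_dict behave like dictionaries and that a missing encoder shows up as None. The error strings are copied from the list above.

```python
def check_vision_encode_inputs(vision_encoder, hyvid_cfg, latents_dict):
    """Raise ValueError with the messages listed above if an input is unusable."""
    if vision_encoder is None:
        raise ValueError("Invalid vision encoder model")
    if not hyvid_cfg:  # None or an empty mapping
        raise ValueError("Configuration settings missing in hyvid_cfg")
    if not latents_dict:  # None or an empty mapping
        raise ValueError("Latents dictionary is empty")


# Usage: a placeholder encoder and config, but no latents.
try:
    check_vision_encode_inputs(object(), {"placeholder": True}, {})
except ValueError as err:
    print(err)  # Latents dictionary is empty
```

The checks run in the same order as the errors are listed, so the first problem found is the one reported.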

HunyuanVideo Vision Encode Related Nodes

Go back to the extension to check out more related nodes.

HunyuanVideo-1.5 nodes


RunComfy

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, enabling artists to harness the latest AI tools to create incredible art.
