Transforms text and images into video conditioning data for nuanced video creation influenced by both text and visuals.
The TextEncodeHunyuanVideo_ImageToVideo node transforms textual prompts into video conditioning data, using both text and image inputs to guide the video generation process. It is particularly useful for AI artists who want to create videos influenced by both descriptive text and visual elements. By combining the CLIP model with its vision outputs, it enables a nuanced, dynamic interaction between text prompts and image references, producing videos that are richly detailed and contextually relevant. The node's primary function is to encode these inputs into a format that can condition video generation models, so that the resulting videos align closely with the specified themes, styles, and visual cues.
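In a workflow, the node typically sits between a CLIP loader, a CLIP vision encoder, and the video sampler. Below is a minimal sketch of how it might appear in a ComfyUI API-format workflow, written as a Python dict; the node ids and the upstream nodes they point to are illustrative assumptions, and only the TextEncodeHunyuanVideo_ImageToVideo entry reflects the inputs documented on this page.

```python
# Hypothetical ComfyUI API-format fragment (expressed as a Python dict).
# Node ids "11"-"13" and the upstream CLIP loader / CLIP Vision Encode nodes
# are assumptions for illustration; only the TextEncodeHunyuanVideo_ImageToVideo
# inputs mirror this documentation.
workflow_fragment = {
    "13": {
        "class_type": "TextEncodeHunyuanVideo_ImageToVideo",
        "inputs": {
            "clip": ["11", 0],                # output 0 of an assumed CLIP loader node
            "clip_vision_output": ["12", 0],  # output 0 of an assumed CLIP Vision Encode node
            "prompt": "A foggy harbor at dawn, slow panning shot",
            "image_interleave": 2,            # default value; range 1-512
        },
    },
}
```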
The clip parameter refers to the CLIP model instance used for processing the text and image inputs. It plays a crucial role in tokenizing the text prompt and integrating the visual data from the image, ensuring that both elements are effectively combined to influence the video generation process.
The clip_vision_output parameter represents the output of the CLIP model's vision component. This output provides visual context for the text prompt, allowing the node to produce more cohesive, visually informed conditioning data. It is essential for aligning the visual style and content of the video with the reference image.
The prompt parameter is a string input that allows you to specify the textual description or theme for the video. This parameter supports multiline and dynamic prompts, enabling you to craft detailed and complex descriptions that guide the video generation process. The prompt serves as the primary narrative or thematic guide for the video.
The image_interleave parameter is an integer that controls the balance between the reference image and the text prompt during encoding. It defaults to 2 and can range from 1 to 512; higher values give the text prompt more influence, while lower values let the reference image weigh more heavily. This parameter allows you to balance the impact of the text and image inputs, tailoring the video output to your creative vision.
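A practical way to tune this parameter is to queue the same workflow several times with different image_interleave values and compare the results. The sketch below assumes a ComfyUI server running locally and a workflow dict in API format (such as the fragment shown earlier); the server address, node id, and value list are placeholders for your own setup.

```python
# Sketch: sweep image_interleave over a few values and queue each variant via the
# ComfyUI HTTP API. The server address, node id, and value list are assumptions.
import copy
import json
import urllib.request

def queue_with_interleave(base_workflow: dict, node_id: str, value: int,
                          server: str = "http://127.0.0.1:8188") -> None:
    wf = copy.deepcopy(base_workflow)
    # Smaller values weight the reference image more, larger values weight the text prompt more
    wf[node_id]["inputs"]["image_interleave"] = value
    req = urllib.request.Request(
        f"{server}/prompt",
        data=json.dumps({"prompt": wf}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Example: compare image-dominant and text-dominant conditioning
# for value in (1, 2, 8, 32):
#     queue_with_interleave(workflow, "13", value)
```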
The CONDITIONING output is the encoded data produced by processing the text and image inputs through the node. This output is crucial for conditioning video generation models, as it encapsulates the combined influence of the text prompt and visual data, ensuring that the generated video aligns with the specified themes and visual cues.
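In a complete workflow, this CONDITIONING output is typically wired into the positive conditioning input of a downstream sampler node. The fragment below continues the earlier sketch; the sampler class and its remaining inputs are illustrative placeholders, not part of this node's interface.

```python
# Hypothetical downstream wiring: the CONDITIONING from node "13" (the encoder above)
# is used as the positive conditioning of an illustrative sampler node.
sampler_fragment = {
    "14": {
        "class_type": "KSampler",   # illustrative; any sampler that accepts CONDITIONING
        "inputs": {
            "positive": ["13", 0],  # CONDITIONING produced by TextEncodeHunyuanVideo_ImageToVideo
            # model, negative conditioning, latent, and sampler settings omitted for brevity
        },
    },
}
```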
Experiment with different image_interleave values to find the right balance between text and image influence for your specific project needs.

If the clip parameter is not properly initialized or is incompatible with the node's requirements, ensure that it is set to a valid CLIP model instance that supports both text and vision processing.

If the clip_vision_output does not match the expected format or is not derived from a compatible CLIP model, ensure that it is obtained from the vision component of a compatible CLIP model and is correctly formatted.