RunComfy


ComfyUI Node: Kling 2.6 Text to Video with Audio

Class Name

KlingTextToVideoWithAudio

Category
api node/video/Kling

Author
ComfyAnonymous (account age: 763 days)

Extension
ComfyUI

Latest Updated
2026-05-13

Github Stars
112.77K


How to Install ComfyUI

Install this extension via the ComfyUI Manager by searching for "ComfyUI":

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter "ComfyUI" in the search bar.

After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

Kling 2.6 Text to Video with Audio Description

Generates video with synchronized audio from a text prompt, for multimedia content that needs both visuals and sound.

Kling 2.6 Text to Video with Audio:

The KlingTextToVideoWithAudio node turns a text prompt into a video with synchronized audio. The output visually represents the prompt and also carries a matching audio track, which makes it suited to multimedia content that needs both visual and auditory components. It lets AI artists and content creators produce such clips without extensive video-production expertise.

Kling 2.6 Text to Video with Audio Input Parameters:

video

The video parameter takes the video file to which the transformation is applied. This clip serves as the base visual content: it is modified according to the text prompt and synchronized with the audio. The video should be between 2 and 10 seconds long, with a resolution between 720px and 1920px, which keeps it within the quality and duration range the node can process.
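As a rough illustration of these limits, the sketch below checks a clip's duration and resolution before it is sent to the node. It is a hypothetical helper, not part of the node: `check_video_input` and the constant names are illustrative, and it assumes the 720px to 1920px range applies to both width and height, which the description does not state explicitly. Reading the duration and dimensions from the file itself (for example with ffprobe) is left to the caller.

```python
from typing import List

# Hypothetical pre-flight check for the video limits described above.
# Assumption: the 720px-1920px range applies to BOTH width and height.
MIN_DURATION_S = 2.0
MAX_DURATION_S = 10.0
MIN_SIDE_PX = 720
MAX_SIDE_PX = 1920


def check_video_input(duration_s: float, width: int, height: int) -> List[str]:
    """Return a list of problems; an empty list means the clip looks acceptable."""
    problems: List[str] = []
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration {duration_s:.1f}s is outside "
            f"{MIN_DURATION_S:.0f}-{MAX_DURATION_S:.0f}s"
        )
    # Check each side independently so every violation is reported.
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
            problems.append(f"{name} {side}px is outside {MIN_SIDE_PX}-{MAX_SIDE_PX}px")
    return problems


if __name__ == "__main__":
    print(check_video_input(5.0, 1280, 720))    # a 720p clip within both limits
    print(check_video_input(12.0, 3840, 2160))  # too long and too large on both sides
```

Returning a list instead of raising on the first failure lets a workflow report every problem with a clip at once.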

audio

The audio parameter accepts the audio file to synchronize with the video. It should contain clear, distinguishable vocals for accurate lip-syncing and must not exceed 5MB. It supplies the auditory component that complements the video's visuals.
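A minimal pre-flight size check for the audio file might look like the following. The function names are hypothetical, and because the documentation does not say whether 5MB means 5,000,000 or 5,242,880 bytes, the sketch uses the stricter decimal value, so a file that passes here passes under either reading.

```python
import os

# Assumption: the stated 5MB ceiling is read as the stricter decimal value
# (5,000,000 bytes); the binary reading would allow 5,242,880 bytes.
MAX_AUDIO_BYTES = 5_000_000


def audio_size_ok(num_bytes: int, max_bytes: int = MAX_AUDIO_BYTES) -> bool:
    """Return True if an audio payload of num_bytes fits under the limit."""
    return num_bytes <= max_bytes


def audio_file_ok(path: str) -> bool:
    """Check a file on disk against the same limit (hypothetical helper)."""
    return audio_size_ok(os.path.getsize(path))


if __name__ == "__main__":
    print(audio_size_ok(4_800_000))  # under the limit
    print(audio_size_ok(5_200_000))  # over the stricter decimal limit
```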

voice_language

The voice_language parameter specifies the language spoken in the audio, so that lip-syncing matches it accurately. It defaults to English (en); the other available options are the languages listed in the KlingLipSyncVoiceLanguage enumeration.
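The defaulting behavior can be sketched as a small resolver. This is illustrative only: `resolve_voice_language` is a hypothetical name, and because the documentation names only English (`en`), the supported set below is a placeholder to be filled in from the actual KlingLipSyncVoiceLanguage values. Unknown codes raise an error mirroring the "Unsupported voice language" case in the Common Errors section.

```python
from typing import FrozenSet, Optional

DEFAULT_LANGUAGE = "en"
# Placeholder: only "en" is named in the documentation. Replace with the
# real KlingLipSyncVoiceLanguage values before relying on this check.
SUPPORTED_LANGUAGES: FrozenSet[str] = frozenset({"en"})


def resolve_voice_language(
    code: Optional[str], supported: FrozenSet[str] = SUPPORTED_LANGUAGES
) -> str:
    """Return a normalized language code, falling back to the default when unset."""
    if code is None or not code.strip():
        return DEFAULT_LANGUAGE
    normalized = code.strip().lower()
    if normalized not in supported:
        raise ValueError(f"Unsupported voice language: {code!r}")
    return normalized


if __name__ == "__main__":
    print(resolve_voice_language(None))    # falls back to the default
    print(resolve_voice_language(" EN "))  # whitespace and case are normalized
```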

Kling 2.6 Text to Video with Audio Output Parameters:

video

The video output is the final processed video, combining the transformed visuals with the synchronized audio. It is the node's primary result.

video_id

The video_id output is a string that uniquely identifies the processed video. You can use it to reference the video in later operations or to keep outputs organized within a larger project.

duration

The duration output gives the length of the processed video in seconds. It is useful for verifying that the video meets your specifications and for planning further editing or integration into other media.

Kling 2.6 Text to Video with Audio Usage Tips:

Ensure that the audio file contains clear vocals to achieve accurate lip-syncing with the video.
Use video files that are within the specified resolution and duration limits to optimize processing efficiency and output quality.
Experiment with different voice languages to see how the node handles various linguistic nuances in the lip-syncing process.

Kling 2.6 Text to Video with Audio Common Errors and Solutions:

"Audio file too large"

Explanation: The audio file exceeds the maximum size limit of 5MB.
Solution: Compress the audio file to reduce its size or select a different audio file that meets the size requirements.

"Video file too large or incorrect dimensions"

Explanation: The video file is either larger than 100MB or does not meet the required resolution specifications.
Solution: Resize or compress the video file to fit within the 720px to 1920px resolution range and ensure it is under 100MB.
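When compressing to get under the 100MB ceiling, the useful quantity is the average bitrate, since file size is roughly bitrate multiplied by duration. The sketch below computes the highest average bitrate that fits, assuming 100MB means 100,000,000 bytes and reserving an illustrative 5% margin for container overhead and the audio track; neither figure comes from the node's documentation, and `max_average_bitrate_bps` is a hypothetical helper.

```python
# Assumptions: 100MB = 100,000,000 bytes (decimal), and 5% of the budget is
# reserved for container overhead and audio. Both values are illustrative.
MAX_VIDEO_BYTES = 100_000_000
OVERHEAD_MARGIN = 0.05


def max_average_bitrate_bps(
    duration_s: float,
    max_bytes: int = MAX_VIDEO_BYTES,
    margin: float = OVERHEAD_MARGIN,
) -> float:
    """Largest average bitrate (bits per second) that keeps the file under max_bytes."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    usable_bits = max_bytes * 8 * (1.0 - margin)  # bytes -> bits, minus the margin
    return usable_bits / duration_s


if __name__ == "__main__":
    # 2 s and 10 s are the shortest and longest clips the node accepts.
    for seconds in (2, 5, 10):
        mbps = max_average_bitrate_bps(seconds) / 1_000_000
        print(f"{seconds:>2}s clip: at most ~{mbps:.0f} Mbit/s")
```

For the longest allowed clip (10 seconds) this works out to about 76 Mbit/s, so re-encoding at a typical delivery bitrate is usually enough to bring a file under the limit.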

"Unsupported voice language"

Explanation: The specified voice language is not supported by the node.
Solution: Choose a supported language from the KlingLipSyncVoiceLanguage options, ensuring it matches the language of the audio content.
