ComfyUI Node: Kling 2.6 Text to Video with Audio

Class Name

KlingTextToVideoWithAudio

Category
api node/video/Kling
Author
ComfyAnonymous (Account age: 763 days)
Extension
ComfyUI
Latest Updated
2026-05-13
Github Stars
112.77K

How to Install ComfyUI

Install this extension via the ComfyUI Manager by searching for ComfyUI:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

Kling 2.6 Text to Video with Audio Description

Transform text prompts into videos with synchronized audio for multimedia content creation.

Kling 2.6 Text to Video with Audio:

The KlingTextToVideoWithAudio node transforms text prompts into video content with synchronized audio. The generated video both represents the input text visually and carries an audio track, so the result works as a complete multimedia clip rather than a silent one. The node is aimed at AI artists and content creators who need output with both visual and auditory components but do not have extensive video production expertise.

Kling 2.6 Text to Video with Audio Input Parameters:

video

The video parameter takes the video file to which the text-to-video transformation is applied. This video is the base visual content that is modified according to the text prompt and synchronized with the audio. The file should be between 2 and 10 seconds long, with a resolution between 720px and 1920px, so that it is of sufficient quality for processing and short enough for the node to handle.
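The limits above can be checked before a workflow is queued. The following is a minimal sketch, not part of the node: `VideoInfo` and `video_problems` are hypothetical names, the metadata is assumed to have been read elsewhere (for example with ffprobe), and the 720px to 1920px range is assumed to apply to both width and height.

```python
from dataclasses import dataclass

# Limits quoted in the parameter description.
MIN_DURATION_S = 2.0
MAX_DURATION_S = 10.0
MIN_SIDE_PX = 720   # assumption: applies to width and height alike
MAX_SIDE_PX = 1920


@dataclass
class VideoInfo:
    """Hypothetical container for metadata obtained outside this sketch."""
    duration_s: float
    width: int
    height: int


def video_problems(info: VideoInfo) -> list[str]:
    """Return a human-readable message for every limit the video violates."""
    problems = []
    if not (MIN_DURATION_S <= info.duration_s <= MAX_DURATION_S):
        problems.append(
            f"duration {info.duration_s}s is outside "
            f"{MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    for name, side in (("width", info.width), ("height", info.height)):
        if not (MIN_SIDE_PX <= side <= MAX_SIDE_PX):
            problems.append(
                f"{name} {side}px is outside {MIN_SIDE_PX}-{MAX_SIDE_PX}px"
            )
    return problems
```

An empty list means the clip satisfies every documented limit; otherwise each entry names one violated constraint.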

audio

The audio parameter accepts the audio file that is synchronized with the video. The file should contain clear, distinguishable vocals so that lip-syncing is accurate, and it must not exceed 5MB in size. The audio supplies the auditory component of the output and complements the visual content of the video.
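A size check along these lines can catch an oversized file before upload. This is a sketch under one stated assumption: "5MB" is interpreted as 5 × 1024 × 1024 bytes, which the page does not confirm. `audio_within_limit` is a hypothetical helper, not a function of the node.

```python
import os

# Assumption: "5MB" means 5 MiB. Use 5_000_000 for a decimal interpretation.
MAX_AUDIO_BYTES = 5 * 1024 * 1024


def audio_within_limit(path: str, limit: int = MAX_AUDIO_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```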

voice_language

The voice_language parameter specifies the language spoken in the audio file, which the lip-syncing process needs in order to match mouth movements to the speech. The default is English (en); the other available options are those defined in the KlingLipSyncVoiceLanguage enumeration.
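Validating a language code against an enumeration might look like the sketch below. `VoiceLanguage` is a stand-in written for this example, not the real KlingLipSyncVoiceLanguage: only `en` is named on this page, and the `zh` member is an assumption added to make the lookup non-trivial.

```python
from enum import Enum


class VoiceLanguage(str, Enum):
    """Stand-in enumeration; only EN is confirmed by the page, ZH is assumed."""
    EN = "en"
    ZH = "zh"


def parse_voice_language(code: str = "en") -> VoiceLanguage:
    """Normalize a language code and map it to an enum member."""
    try:
        return VoiceLanguage(code.strip().lower())
    except ValueError:
        supported = ", ".join(member.value for member in VoiceLanguage)
        raise ValueError(
            f"Unsupported voice language {code!r}; expected one of: {supported}"
        ) from None
```

Listing the supported values in the error message makes an "Unsupported voice language" failure easier to resolve.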

Kling 2.6 Text to Video with Audio Output Parameters:

video

The video output is the final video file, with the text-to-video transformation applied and the audio synchronized. It is the primary result of the node.

video_id

The video_id output is a string that uniquely identifies the processed video. Use it to reference the video in subsequent operations or to keep track of clips within a larger project.

duration

The duration output gives the length of the processed video in seconds. Use it to verify that the video meets the desired specifications and to plan further editing or integration into other media.
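For organizing results in a larger project, the three outputs can be kept together and looked up by video_id. The sketch below is illustrative only: `KlingResult`, `index_results`, and `total_duration` are hypothetical names, and the duration is coerced with `float()` because this page does not state whether it arrives as a number or a string.

```python
from typing import Any, Iterable, NamedTuple


class KlingResult(NamedTuple):
    """Hypothetical grouping of the node's three outputs."""
    video: Any        # the video object or file handed on by the node
    video_id: str     # unique identifier of the processed video
    duration: Any     # length in seconds; type assumed, so coerced on use


def index_results(results: Iterable[KlingResult]) -> dict[str, KlingResult]:
    """Build a lookup table keyed by video_id."""
    return {result.video_id: result for result in results}


def total_duration(results: Iterable[KlingResult]) -> float:
    """Sum clip lengths in seconds, accepting numeric or string durations."""
    return sum(float(result.duration) for result in results)
```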

Kling 2.6 Text to Video with Audio Usage Tips:

  • Ensure that the audio file contains clear vocals to achieve accurate lip-syncing with the video.
  • Use video files that are within the specified resolution and duration limits to optimize processing efficiency and output quality.
  • Experiment with different voice languages to see how the node handles various linguistic nuances in the lip-syncing process.

Kling 2.6 Text to Video with Audio Common Errors and Solutions:

"Audio file too large"

  • Explanation: The audio file exceeds the maximum size limit of 5MB.
  • Solution: Compress the audio file to reduce its size or select a different audio file that meets the size requirements.
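When compressing, it helps to know the highest bitrate that still fits under the limit for a clip of a given length. The sketch below is plain arithmetic under two assumptions: "5MB" means 5 × 1024 × 1024 bytes, and container overhead is ignored, so a real encode should target a somewhat lower value. `max_bitrate_kbps` is a hypothetical helper.

```python
# Assumption: "5MB" means 5 MiB.
MAX_AUDIO_BYTES = 5 * 1024 * 1024


def max_bitrate_kbps(duration_s: float, limit_bytes: int = MAX_AUDIO_BYTES) -> int:
    """Largest whole kbps value whose audio stream fits in `limit_bytes`."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    bits_available = limit_bytes * 8
    # bits / seconds gives bits per second; divide by 1000 for kbps.
    return int(bits_available / duration_s / 1000)
```

For clips at the 10-second maximum, the resulting ceiling is far above common compressed-audio bitrates, so exceeding 5MB usually points to an uncompressed or much longer source file.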

"Video file too large or incorrect dimensions"

  • Explanation: The video file is either larger than 100MB or does not meet the required resolution specifications.
  • Solution: Resize or compress the video file to fit within the 720px to 1920px resolution range and ensure it is under 100MB.
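A target size for the resize step can be computed ahead of time. This sketch assumes the 720px to 1920px range applies to both width and height and that the aspect ratio should be preserved; `fit_resolution` is a hypothetical helper and does not perform the resize itself.

```python
from typing import Optional, Tuple


def fit_resolution(
    width: int, height: int, lo: int = 720, hi: int = 1920
) -> Optional[Tuple[int, int]]:
    """Scale (width, height) uniformly so both sides land within [lo, hi].

    Returns None when the aspect ratio is too extreme for both sides to fit.
    """
    scale = 1.0
    if max(width, height) > hi:
        scale = hi / max(width, height)   # shrink so the long side equals hi
    elif min(width, height) < lo:
        scale = lo / min(width, height)   # enlarge so the short side equals lo
    new_w, new_h = round(width * scale), round(height * scale)
    if min(new_w, new_h) >= lo and max(new_w, new_h) <= hi:
        return new_w, new_h
    return None
```

A `None` result means the clip would need cropping or padding rather than scaling alone.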

"Unsupported voice language"

  • Explanation: The specified voice language is not supported by the node.
  • Solution: Choose a supported language from the KlingLipSyncVoiceLanguage options, ensuring it matches the language of the audio content.

Copyright 2025 RunComfy. All Rights Reserved.