RunComfy


ComfyUI Node: LTXV Reference Audio (ID-LoRA)

Class Name

LTXVReferenceAudio

Category
conditioning/audio

Author
ComfyAnonymous (Account age: 763 days)

Extension
ComfyUI

Last Updated
2026-05-13

Github Stars
112.77K


Table of Content

Description
LTXVReferenceAudio:
LTXVReferenceAudio Input Parameters:
LTXVReferenceAudio Output Parameters:
LTXVReferenceAudio Usage Tips:
LTXVReferenceAudio Common Errors and Solutions:
Related Nodes

How to Install ComfyUI

Install this extension via the ComfyUI Manager by searching for ComfyUI:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


LTXV Reference Audio (ID-LoRA) Description

A specialized node for speaker identity transfer: it encodes reference audio into conditioning and strengthens the speaker identity effect in audio synthesis tasks.

LTXV Reference Audio (ID-LoRA):

LTXVReferenceAudio is a specialized node for speaker identity transfer using ID-LoRA. It encodes a reference audio clip into conditioning that guides the speaker identity in audio synthesis, so the unique vocal characteristics of the reference speaker can be transferred to another audio sample. To strengthen this effect, the node runs an additional forward pass without the reference audio; contrasting this pass with the normal one amplifies the speaker identity in the generated audio. The node is most useful where preserving or transferring a speaker's identity matters, such as voice cloning or personalized text-to-speech.
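The extra forward pass can be pictured with a small sketch. It assumes a classifier-free-guidance-style formula, output = with_ref + scale * (with_ref - without_ref); the page does not state the node's actual formula, and `identity_guided` is a hypothetical helper that works on plain Python lists rather than ComfyUI tensors.

```python
from typing import List


def identity_guided(pred_with_ref: List[float],
                    pred_without_ref: List[float],
                    scale: float) -> List[float]:
    """Hypothetical CFG-style combination of two model predictions.

    pred_with_ref:    prediction from the normal pass (reference audio present)
    pred_without_ref: prediction from the extra pass (reference audio removed)
    scale:            stands in for identity_guidance_scale (assumed formula)
    """
    if len(pred_with_ref) != len(pred_without_ref):
        raise ValueError("predictions must have the same length")
    # Push each value further along the direction the reference audio adds.
    # With scale == 0 the normal prediction is returned unchanged.
    return [w + scale * (w - wo) for w, wo in zip(pred_with_ref, pred_without_ref)]


with_ref = [1.0, 0.5, -0.25]
without_ref = [0.5, 0.5, 0.25]
print(identity_guided(with_ref, without_ref, 0.0))  # [1.0, 0.5, -0.25]
print(identity_guided(with_ref, without_ref, 2.0))  # [2.0, 0.5, -1.25]
```

Where the two passes agree (the middle value), guidance changes nothing; where they differ, a larger scale exaggerates the difference the reference speaker introduces.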

LTXV Reference Audio (ID-LoRA) Input Parameters:

model

The audio synthesis model used to process the reference audio; it defines the framework in which the speaker identity transfer takes place. It has no minimum, maximum, or default value, since what it accepts depends on the model architecture you are working with.

positive

The positive conditioning, which steers the model toward the desired output and sets the context in which the reference audio is applied. It typically encodes the attributes or features the model should emphasize.

negative

The counterpart to positive: the negative conditioning specifies attributes or features the model should avoid or minimize in the output. Together with the positive conditioning, it gives the speaker identity transfer a balanced conditioning context.

reference_audio

The core input of this node: the audio clip that serves as the reference for speaker identity transfer. The audio must be at least 1.8 seconds long, and the total reference audio duration must not exceed 15.1 seconds. The node encodes the audio into latents and patchifies them for integration into the model.
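The duration limits can be checked before queueing a workflow. The sketch below assumes, as the two error messages suggest, that the 1.8-second minimum applies to each clip and the 15.1-second maximum to their sum. Duration is simply sample count divided by sample rate; `check_reference_durations` is a hypothetical helper, and the 16000 Hz rate in the usage lines is an arbitrary example, not the rate LTXV uses.

```python
from typing import List, Sequence

MIN_SECONDS = 1.8   # minimum reference duration stated for this node
MAX_SECONDS = 15.1  # maximum total reference duration stated for this node


def clip_seconds(num_samples: int, sample_rate: int) -> float:
    """Duration of one clip in seconds: samples divided by samples per second."""
    return num_samples / sample_rate


def check_reference_durations(clip_lengths: Sequence[int], sample_rate: int) -> List[float]:
    """Hypothetical pre-flight check mirroring the node's two error messages.

    Assumes the minimum applies per clip and the maximum to the total.
    Returns each clip's duration in seconds if all limits are met.
    """
    durations = [clip_seconds(n, sample_rate) for n in clip_lengths]
    for d in durations:
        if d < MIN_SECONDS:
            raise ValueError(
                f"Reference audio is too short: {d:.2f}s. Minimum duration is 1.8 seconds.")
    total = sum(durations)
    if total > MAX_SECONDS:
        raise ValueError(
            f"Total reference audio duration is {total:.2f}s. Maximum is 15.1 seconds.")
    return durations


print(check_reference_durations([80000, 48000], 16000))  # [5.0, 3.0]
try:
    check_reference_durations([16000], 16000)
except ValueError as err:
    print(err)  # Reference audio is too short: 1.00s. Minimum duration is 1.8 seconds.
```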

audio_vae

The Audio Variational Autoencoder (VAE) that encodes the reference audio waveform into the latent representation the node works with.

identity_guidance_scale

Controls the strength of the identity guidance, that is, how prominently the reference speaker's identity is reflected in the output. The value starts at 0; higher values strengthen the identity effect.

start_percent

The point at which identity guidance starts, expressed as a fraction of the sampling process (0.0 is the beginning, 1.0 the end). Use it to control when the identity transfer begins during audio synthesis.

end_percent

The point at which identity guidance stops, on the same 0.0 to 1.0 scale as start_percent. Together, the two values define the portion of the sampling process over which the speaker identity transfer is applied.
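A minimal sketch of how a start/end window gates guidance. For simplicity it measures progress as step index divided by step count; ComfyUI generally converts such percentages to noise levels (sigmas) rather than step indices, so treat this as an approximation. `guidance_active` is a hypothetical helper, not part of the node.

```python
def guidance_active(step: int, total_steps: int,
                    start_percent: float, end_percent: float) -> bool:
    """Hypothetical check: is this sampling step inside the guidance window?

    Progress is approximated as step / total_steps (0.0 = first step).
    """
    if not 0.0 <= start_percent <= end_percent <= 1.0:
        raise ValueError("expected 0.0 <= start_percent <= end_percent <= 1.0")
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = step / total_steps
    return start_percent <= progress <= end_percent


# With 20 steps and a window of 0.0 to 0.5, guidance runs on steps 0 through 10.
active = [s for s in range(20) if guidance_active(s, 20, 0.0, 0.5)]
print(len(active), active[0], active[-1])  # 11 0 10
```

Limiting guidance to the early part of sampling is one way to let identity shape the overall voice while leaving later, finer steps untouched; whether that helps for this node is something to test in practice.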

LTXV Reference Audio (ID-LoRA) Output Parameters:

waveform

The waveform output is the processed audio waveform that incorporates the speaker identity transfer. It reflects the unique vocal characteristics of the reference speaker as applied to the target audio.

sample_rate

The sample rate of the processed audio waveform, needed to play the audio back at the correct speed and pitch.

LTXV Reference Audio (ID-LoRA) Usage Tips:

Keep your reference audio at least 1.8 seconds long and the total reference duration at no more than 15.1 seconds to avoid duration errors.
Experiment with the identity_guidance_scale to find the optimal balance between maintaining the reference speaker's identity and achieving natural-sounding audio.

LTXV Reference Audio (ID-LoRA) Common Errors and Solutions:

Reference audio is too short: `<duration>`s. Minimum duration is 1.8 seconds.

Explanation: The reference audio provided is shorter than the required minimum duration of 1.8 seconds.
Solution: Use a longer reference audio clip that meets the minimum duration requirement.

Total reference audio duration is `<duration>`s. Maximum is 15.1 seconds.

Explanation: The combined duration of all reference audio clips exceeds the maximum allowed duration of 15.1 seconds.
Solution: Reduce the total duration of the reference audio clips to comply with the maximum limit.
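One way to apply this fix automatically is to trim the clips to a sample budget. The sketch below is a hypothetical helper that treats clips as plain lists of samples (not ComfyUI's audio type) and assumes the minimum applies per clip, so it skips any fragment that would end up shorter than 1.8 seconds instead of passing it on.

```python
import math
from typing import List

MIN_SECONDS = 1.8          # stated minimum reference duration
MAX_TOTAL_SECONDS = 15.1   # stated maximum total reference duration


def trim_to_budget(clips: List[List[float]], sample_rate: int,
                   max_seconds: float = MAX_TOTAL_SECONDS,
                   min_seconds: float = MIN_SECONDS) -> List[List[float]]:
    """Hypothetical helper: keep clips in order, cutting them so the combined
    length stays within max_seconds."""
    budget = int(max_seconds * sample_rate)          # int() truncates, rounding the budget down
    min_len = math.ceil(min_seconds * sample_rate)   # shortest clip worth keeping, in samples
    kept: List[List[float]] = []
    for clip in clips:
        piece = clip[:budget]
        if len(piece) < min_len:
            # Too short on its own, or cut too short by the remaining budget:
            # keeping it would trip the minimum-duration check.
            continue
        kept.append(piece)
        budget -= len(piece)
    return kept


# Two 10-second clips at 16000 Hz: the second is cut to 5.1 seconds.
clips = [[0.0] * 160000, [0.0] * 160000]
print([len(c) for c in trim_to_budget(clips, 16000)])  # [160000, 81600]
```

Cutting from the end of the list keeps the earliest clips intact; if some clips are cleaner recordings of the speaker, order them first.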
