RunComfy

ComfyUI Node: Qwen3-TTS Audio Compare

Class Name: Qwen3AudioCompare
Category: Qwen3-TTS/Evaluation
Author: wanaigc (Account age: 0 days)
Extension: ComfyUI-Qwen3-TTS
Latest Updated: 2026-03-21
Github Stars: 0.09K

Table of Contents

Description
Qwen3AudioCompare:
Qwen3AudioCompare Input Parameters:
Qwen3AudioCompare Output Parameters:
Qwen3AudioCompare Usage Tips:
Qwen3AudioCompare Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-Qwen3-TTS

Install this extension via the ComfyUI Manager by searching for ComfyUI-Qwen3-TTS:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-Qwen3-TTS in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Qwen3-TTS Audio Compare Description

Compares audio samples for speaker similarity, mel spectrogram distance, and speaking rate.

Qwen3-TTS Audio Compare:

The Qwen3AudioCompare node compares a generated audio sample against a reference sample on three metrics: speaker similarity, mel spectrogram distance, and speaking rate. It is aimed at AI artists and developers working with text-to-speech (TTS) systems who need to check how closely a generated voice matches a reference in voice characteristics and pacing. A speaker encoder model produces the speaker similarity score, which indicates how well the generated voice matches the reference voice. The mel spectrogram distance measures acoustic similarity, and the speaking rate comparison shows whether the generated audio keeps a pace close to the reference. The node outputs a text report containing these metrics.

Qwen3-TTS Audio Compare Input Parameters:

reference_audio

The reference_audio parameter is the original audio sample that serves as the benchmark for comparison. It is crucial for determining the baseline characteristics of the speaker's voice, which the generated audio will be compared against. This parameter should be provided in the ComfyUI audio format, which includes a waveform and a sample rate. The quality and clarity of the reference audio can significantly impact the accuracy of the comparison results.

generated_audio

The generated_audio parameter is the audio sample produced by a TTS system that you wish to evaluate. Like the reference audio, it should be in the ComfyUI audio format. This parameter is analyzed against the reference audio to determine how closely it matches in terms of speaker similarity, acoustic features, and speaking rate. Ensuring that the generated audio is of high quality will help in obtaining more reliable comparison results.

speaker_encoder_model

The speaker_encoder_model parameter specifies the model used to extract speaker embeddings from the audio samples. This model plays a critical role in calculating the speaker similarity score, which measures how closely the generated voice matches the reference voice. The choice of speaker encoder model can affect the sensitivity and accuracy of the similarity assessment.
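As a rough illustration of how a speaker similarity score can be derived from speaker embeddings, the sketch below computes cosine similarity between two vectors. This is a common approach, but whether this node uses cosine similarity is an assumption, and the function name and the short example embeddings are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; a real speaker encoder produces much longer vectors.
reference_embedding = [0.2, 0.7, -0.1, 0.4]
generated_embedding = [0.25, 0.65, -0.05, 0.35]

score = cosine_similarity(reference_embedding, generated_embedding)
print(f"speaker similarity: {score:.3f}")
```

Values close to 1 mean the two embeddings point in nearly the same direction. How the node maps its score to the overall rating in the report is not specified here.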

local_model_path

The local_model_path parameter is an optional path to a locally stored speaker encoder model. If provided, the node will use this model instead of a default or pre-loaded model. This allows for flexibility in using custom or specialized models that may better suit specific use cases or provide improved performance for certain types of voices.

Qwen3-TTS Audio Compare Output Parameters:

report

The report output parameter is a comprehensive text report that summarizes the results of the audio comparison. It includes the speaker similarity score, mel spectrogram distance, speaking rate ratio, and an overall rating of the voice match quality. The report also provides an interpretation guide to help you understand the significance of the scores and metrics, making it easier to assess the performance of TTS systems and make informed decisions about potential improvements.

Qwen3-TTS Audio Compare Usage Tips:

Ensure that both the reference and generated audio samples are of high quality and free from noise to improve the accuracy of the comparison results.
Use a speaker encoder model that is well-suited to the type of voices you are working with, as this can significantly impact the speaker similarity score.
Consider using the local_model_path parameter to experiment with different speaker encoder models and find the one that provides the best results for your specific application.

Qwen3-TTS Audio Compare Common Errors and Solutions:

"Speaker encoder model not found"

Explanation: This error occurs when the specified speaker encoder model cannot be located or loaded.
Solution: Ensure that the speaker_encoder_model parameter is correctly specified and that the model file is accessible. If using a local model, verify the local_model_path is correct.
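Before running the node, a quick check outside ComfyUI can confirm that a local model path actually exists. The helper below is a minimal sketch; the function name is hypothetical and it is not part of the extension.

```python
from pathlib import Path
from typing import Optional


def check_local_model_path(local_model_path: Optional[str]) -> Optional[Path]:
    """Return the path if it exists, or None when no path was supplied."""
    if not local_model_path or not local_model_path.strip():
        # Nothing supplied: the node would use its default or pre-loaded model.
        return None
    path = Path(local_model_path.strip()).expanduser()
    if not path.exists():
        raise FileNotFoundError(f"Speaker encoder model not found: {path}")
    return path
```

Stripping whitespace first catches a common cause of this error: a trailing space or newline pasted into the path field.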

"Mismatch in sample rates"

Explanation: This error indicates that the reference and generated audio samples have different sample rates, which can affect the comparison.
Solution: Ensure both audio samples have the same sample rate before inputting them into the node. You may need to resample one of the audio files to match the other's sample rate.
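To show what matching sample rates involves, here is a naive sketch that resamples a mono signal by linear interpolation using only the standard library. It applies no anti-aliasing filter, so a dedicated audio resampler is preferable for real work. The function names are hypothetical and the signals are plain Python lists rather than ComfyUI tensors.

```python
from typing import List, Tuple


def resample_linear(samples: List[float], orig_rate: int, target_rate: int) -> List[float]:
    """Naively resample a mono signal by linear interpolation."""
    if orig_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if orig_rate == target_rate or not samples:
        return list(samples)
    out_len = max(1, round(len(samples) * target_rate / orig_rate))
    step = orig_rate / target_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def match_sample_rates(
    ref: List[float], ref_rate: int, gen: List[float], gen_rate: int
) -> Tuple[List[float], List[float], int]:
    """Resample the generated signal to the reference rate if the rates differ."""
    if ref_rate != gen_rate:
        gen = resample_linear(gen, gen_rate, ref_rate)
    return ref, gen, ref_rate
```

Resampling the generated audio to the reference rate, rather than the reverse, keeps the benchmark sample untouched. That is a design choice of this sketch, not a requirement stated by the node.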

"Invalid audio format"

Explanation: This error suggests that the audio inputs are not in the expected ComfyUI format.
Solution: Verify that both reference_audio and generated_audio are provided in the correct format, including a waveform tensor and a sample rate integer.
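The format check described above can be sketched as a small validator. It assumes the usual ComfyUI audio layout: a dict with a "waveform" entry (normally a 3-D tensor shaped [batch, channels, samples]) and an integer "sample_rate" entry. The function name is hypothetical, and the check relies only on a `shape` attribute so that it runs without any tensor library installed.

```python
from typing import Any


def validate_comfy_audio(audio: Any, name: str = "audio") -> None:
    """Raise ValueError if `audio` does not look like a ComfyUI audio dict."""
    if not isinstance(audio, dict):
        raise ValueError(f"{name}: expected a dict, got {type(audio).__name__}")
    for key in ("waveform", "sample_rate"):
        if key not in audio:
            raise ValueError(f"{name}: missing key '{key}'")
    # Duck-typed check: any object exposing a 3-element `shape` is accepted.
    shape = getattr(audio["waveform"], "shape", None)
    if shape is None or len(shape) != 3:
        raise ValueError(
            f"{name}: waveform should be a 3-D tensor [batch, channels, samples]"
        )
    rate = audio["sample_rate"]
    if isinstance(rate, bool) or not isinstance(rate, int) or rate <= 0:
        raise ValueError(f"{name}: sample_rate should be a positive integer")
```

Calling this on both reference_audio and generated_audio, with `name` set accordingly, produces an error message that says which input is malformed and why.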

Qwen3-TTS Audio Compare Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-Qwen3-TTS
