ACE-Step Settings:
The AceStepSettings node configures the parameters used to generate audio with the ACE-Step framework. It controls the behavior of both the language model (LM) and the diffusion model (DiT) in the generation pipeline, covering sampling, guidance, and inference options. The node bundles these values into a single settings output, giving AI artists fine-grained control over how closely the generated audio follows their creative intent.
ACE-Step Settings Input Parameters:
seed
The seed parameter is a string that determines the randomness of the audio generation process. By default, it is set to "-1", which means a random seed is used each time, ensuring varied outputs. If a specific seed is provided, it will produce the same output for the same input, allowing for reproducibility.
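The seed behavior described above can be sketched as follows. This is a minimal illustration, not the node's actual code: the helper name resolve_seed, the assumption that the string must contain an integer, and the 32-bit upper bound are all hypothetical.

```python
import random


def resolve_seed(seed: str) -> int:
    """Turn the string seed into an integer (hypothetical helper)."""
    # Assumption: the string holds an integer; int() raises ValueError otherwise.
    value = int(seed.strip())
    if value == -1:
        # "-1" requests a fresh random seed on every run.
        return random.randint(0, 2**32 - 1)
    return value


print(resolve_seed("1234"))  # fixed seed: the same value every run
print(resolve_seed("-1"))    # random seed: varies between runs
```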
thinking
The thinking parameter is a boolean that, when enabled, activates the 5Hz language model audio code generation in llm_dit mode. This setting is crucial for generating audio codes that are influenced by the language model, providing a more dynamic and context-aware audio output. The default value is True.
use_cot_caption
The use_cot_caption parameter is a boolean that enables the language model to rewrite or enhance captions using Chain of Thought (CoT) reasoning. This feature allows for more sophisticated and contextually rich captions, enhancing the overall quality of the generated audio. The default value is True.
use_cot_language
The use_cot_language parameter is a boolean that allows the language model to automatically detect the vocal language. This feature is particularly useful for generating audio in multiple languages or when the input language is not specified. The default value is True.
temperature
The temperature parameter is a float that controls the sampling temperature of the language model. It influences the randomness of the output, with lower values leading to more deterministic results and higher values producing more varied outputs. The default value is 0.85, with a range from 0.0 to 2.0.
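To make the effect of temperature concrete, here is a generic sketch of temperature-scaled softmax over a toy list of logits. It illustrates the standard technique rather than ACE-Step's internal sampler; the function name and the choice to treat a temperature of 0.0 as greedy selection are assumptions.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0.0:
        # Assumption: a temperature of 0 means greedy decoding (all mass on the best token).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(toy_logits, 0.5)  # concentrates mass on the top token
flat = softmax_with_temperature(toy_logits, 2.0)   # spreads mass more evenly
```

Lower temperatures give the most probable token a larger share, which is why outputs become more deterministic.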
lm_cfg_scale
The lm_cfg_scale parameter is a float that sets the classifier-free guidance (CFG) scale for the language model. Higher values make the language model follow the input prompt more closely, while values near 1.0 leave its predictions largely unguided. The default value is 2.0, with a range from 1.0 to 5.0.
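Classifier-free guidance is commonly written as a blend of a prompt-conditioned prediction and an unconditioned one. The sketch below shows that common formulation on toy logits; whether ACE-Step applies exactly this formula is an assumption, and the function name apply_cfg is hypothetical.

```python
def apply_cfg(cond_logits, uncond_logits, scale):
    """Blend conditional and unconditional logits: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


cond = [2.0, 0.0]    # toy logits computed with the prompt
uncond = [1.0, 1.0]  # toy logits computed without the prompt

unguided = apply_cfg(cond, uncond, 1.0)  # scale 1.0 returns the conditional logits unchanged
guided = apply_cfg(cond, uncond, 2.0)    # scale 2.0 pushes further in the prompt's direction
```

Under this formulation, a scale of 1.0 corresponds to no extra guidance, which is consistent with the documented lower bound of the range.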
lm_top_p
The lm_top_p parameter is a float that sets the nucleus (top-p) sampling threshold for the language model. Sampling is restricted to the smallest set of most probable tokens whose cumulative probability reaches top-p, so lower values reduce the diversity of the output. The default value is 0.9, with a range from 0.0 to 1.0.
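As an illustration of nucleus sampling, the following sketch filters a toy probability list down to the tokens that make up the top-p mass and renormalizes them. It is a generic version of the technique with a hypothetical function name, not code taken from ACE-Step.

```python
def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1.
    return {i: probs[i] / total for i in kept}


toy_probs = [0.5, 0.3, 0.15, 0.05]
nucleus = top_p_filter(toy_probs, 0.9)  # keeps tokens 0, 1 and 2; drops the unlikely tail
narrow = top_p_filter(toy_probs, 0.4)   # keeps only token 0
```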
lm_top_k
The lm_top_k parameter is an integer that sets k for top-k sampling in the language model. Sampling is limited to the k most probable tokens, and a value of 0 disables this filter. The default value is 0, with a range from 0 to 200.
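A generic top-k filter, including the "0 disables" behavior described above, might look like the sketch below. The function name is hypothetical and the code is an illustration of the technique rather than ACE-Step's implementation.

```python
def top_k_filter(logits, top_k):
    """Mask every logit outside the top_k largest; top_k <= 0 disables the filter."""
    if top_k <= 0 or top_k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[top_k - 1]
    # Tokens tied with the threshold are all kept, so slightly more than k may survive.
    return [x if x >= threshold else float("-inf") for x in logits]


toy_logits = [1.0, 3.0, 2.0, 0.5]
filtered = top_k_filter(toy_logits, 2)    # only the two largest logits survive
unfiltered = top_k_filter(toy_logits, 0)  # 0 leaves the logits untouched
```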
dit_guidance_scale
The dit_guidance_scale parameter is a float that sets the classifier-free guidance scale for the diffusion model. This parameter influences the strength of the guidance during the diffusion process, with a default value of 7.0 and a range from 0.0 to 20.0.
dit_inference_steps
The dit_inference_steps parameter is an integer that determines the number of diffusion steps during the inference process. More steps can lead to higher quality outputs but may increase computation time. The default value is 8, with a range from 1 to 200.
dit_infer_method
The dit_infer_method parameter is a selection from a fixed list of options that specifies the inference method for the diffusion model: either "ode" (ordinary differential equation) or "sde" (stochastic differential equation). The choice determines whether sampling follows a deterministic or a stochastic formulation of the diffusion process. The default option is "ode".
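The general difference between the two families of methods can be shown on a toy one-dimensional problem. The sketch below is not ACE-Step's sampler: the drift term, the noise scale, and the function name integrate are all assumptions chosen only to contrast a deterministic Euler step with a stochastic Euler-Maruyama step.

```python
import math
import random


def integrate(x0, steps, method="ode", noise_scale=0.1, rng=None):
    """Integrate the toy dynamics dx = -x dt from t=0 to t=1 in `steps` steps."""
    if method not in ("ode", "sde"):
        raise ValueError(f"unsupported method: {method!r}")
    rng = rng or random.Random(0)
    dt = 1.0 / steps
    x = x0
    for _ in range(steps):
        x += -x * dt  # deterministic Euler update shared by both methods
        if method == "sde":
            # Euler-Maruyama adds Gaussian noise scaled by sqrt(dt) at every step.
            x += noise_scale * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


ode_result = integrate(1.0, 8, "ode")                        # identical on every call
sde_result_a = integrate(1.0, 8, "sde", rng=random.Random(1))
sde_result_b = integrate(1.0, 8, "sde", rng=random.Random(2))  # differs from sde_result_a
```

In this toy setting the "ode" path depends only on the starting point and step count, while the "sde" path also depends on the random number stream.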
ACE-Step Settings Output Parameters:
settings
The settings output parameter bundles all of the values configured on this node into a single object. Connect it to other nodes or processes within the ACE-Step framework so that the language and diffusion models run with the specified configuration.
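One way to picture the settings output is as a single record holding every parameter listed above with its documented default. The actual structure of the output is not described here, so the dataclass below, including its name AceStepSettingsSketch, is purely a hypothetical sketch.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AceStepSettingsSketch:
    """Hypothetical container; field names and defaults mirror the parameters above."""
    seed: str = "-1"
    thinking: bool = True
    use_cot_caption: bool = True
    use_cot_language: bool = True
    temperature: float = 0.85
    lm_cfg_scale: float = 2.0
    lm_top_p: float = 0.9
    lm_top_k: int = 0
    dit_guidance_scale: float = 7.0
    dit_inference_steps: int = 8
    dit_infer_method: str = "ode"


defaults = AceStepSettingsSketch()
custom = AceStepSettingsSketch(seed="1234", temperature=0.6)
settings_dict = asdict(custom)  # plain dict form, convenient for passing downstream
```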
ACE-Step Settings Usage Tips:
- Experiment with the temperature parameter to find the right balance between creativity and coherence in your audio outputs. Lower values will produce more predictable results, while higher values can introduce more variation and surprise.
- Utilize the use_cot_caption and use_cot_language parameters to enhance the contextual richness and language adaptability of your audio outputs, especially when working with multilingual content.
ACE-Step Settings Common Errors and Solutions:
Invalid seed value
- Explanation: The seed value provided is not a valid string or number.
- Solution: Ensure that the seed is either a valid string or set to "-1" for random generation.
Temperature out of range
- Explanation: The temperature value is set outside the allowed range of 0.0 to 2.0.
- Solution: Adjust the temperature to be within the specified range to ensure proper functioning.
Unsupported inference method
- Explanation: The dit_infer_method is set to a value other than "ode" or "sde".
- Solution: Choose either "ode" or "sde" as the inference method to proceed with the diffusion process.
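The three checks described in this section can be sketched as a small pre-flight validation. This is a hypothetical helper, not part of the node: the function name validate_settings is invented for illustration, and treating a valid seed as text that parses as an integer is an assumption.

```python
def validate_settings(seed, temperature, dit_infer_method):
    """Return the list of error titles that apply to the given values."""
    errors = []
    try:
        # Assumption: a valid seed is text that parses as an integer, such as "-1" or "1234".
        int(str(seed).strip())
    except ValueError:
        errors.append("Invalid seed value")
    if not 0.0 <= temperature <= 2.0:
        errors.append("Temperature out of range")
    if dit_infer_method not in ("ode", "sde"):
        errors.append("Unsupported inference method")
    return errors


ok = validate_settings("-1", 0.85, "ode")     # no problems found
bad = validate_settings("abc", 3.0, "euler")  # triggers all three checks
```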
