AceStep 1.5 SFT Generate:
AceStepSFTGenerate is an all-in-one ComfyUI node for generating audio with the AceStep 1.5 SFT pipeline. It combines latent creation, text encoding, sampling with Adaptive Projected Guidance (APG) or Adaptive Dynamic Guidance (ADG), and VAE decoding in a single node. The goal is to simplify audio generation while matching the output quality of the official AceStep Gradio pipeline, so that AI artists can produce audio without wiring up and managing a separate node for each stage.
AceStep 1.5 SFT Generate Input Parameters:
T
This parameter controls the timestep schedule shift applied to model sampling. It directly influences the denoising process, with a default value of 3.0, a minimum of 0.0, and a maximum of 5.0. Adjusting this value can affect the smoothness and clarity of the generated audio, where a value of 1.0 corresponds to linear sigma mapping.
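To illustrate how a schedule shift reshapes noise levels, the sketch below uses the remapping commonly found in flow-matching samplers, sigma' = shift * sigma / (1 + (shift - 1) * sigma). That this node uses exactly this formula is an assumption, and the function name `shift_sigma` is hypothetical. The sketch only covers shift values greater than 0; it does reproduce the property described above, namely that a value of 1.0 leaves the schedule linear.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a noise level sigma in [0, 1] with a schedule shift.

    Assumed formula (common in flow-matching samplers), not confirmed
    to be the one this node uses.
    """
    if shift <= 0.0:
        raise ValueError("this sketch only covers shift > 0")
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# A linear schedule from 0 to 1 in five points.
linear = [i / 4 for i in range(5)]

# shift = 1.0 keeps the schedule unchanged (linear mapping).
unchanged = [shift_sigma(s, 1.0) for s in linear]

# shift = 3.0 (the documented default) pushes intermediate values upward,
# so more of the schedule is spent at higher noise levels.
shifted = [shift_sigma(s, 3.0) for s in linear]
```

Under this assumed formula the endpoints 0 and 1 stay fixed for any positive shift; only the spacing of the intermediate steps changes.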
lora
This parameter allows you to stack one or more AceStep 1.5 SFT LoRA Loader nodes. It is used to incorporate additional learned representations into the generation process, enhancing the model's ability to produce diverse and high-quality audio outputs.
style_tags
This parameter accepts tags from the Music Analyzer node, which are appended to the caption when connected. It is a string input that allows for stylistic customization of the generated audio, ensuring that the output aligns with specific artistic or thematic requirements.
style_bpm
This integer parameter receives the beats per minute (BPM) supplied by the Music Analyzer node. A value greater than 0 overrides the default BPM, allowing for precise control over the tempo of the generated audio.
style_keyscale
This string parameter specifies the key or scale from the Music Analyzer node. When not empty, it overrides the default keyscale, enabling users to tailor the harmonic structure of the audio output to their preferences.
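The way the three style inputs interact with the node's own caption, BPM, and keyscale can be sketched as follows. This is a minimal illustration of the rules stated above (tags are appended, `style_bpm` wins when greater than 0, `style_keyscale` wins when not empty), not the node's actual code. The function name `apply_style_inputs`, its argument names, and the ", " separator used when appending tags are assumptions.

```python
def apply_style_inputs(caption, bpm, keyscale,
                       style_tags="", style_bpm=0, style_keyscale=""):
    """Merge Music Analyzer outputs into the node's own settings (hypothetical helper)."""
    tags = style_tags.strip()
    if tags:
        # Tags are appended to the caption; the ", " separator is an assumption.
        caption = f"{caption}, {tags}" if caption else tags
    if style_bpm > 0:
        # Only a value greater than 0 overrides the default BPM.
        bpm = style_bpm
    if style_keyscale.strip():
        # Only a non-empty string overrides the default keyscale.
        keyscale = style_keyscale.strip()
    return caption, bpm, keyscale


# style_bpm is connected, style_keyscale is left empty.
result = apply_style_inputs("lofi piano", 90, "C major",
                            style_tags="jazzy, warm", style_bpm=120)
```

In this example the BPM is replaced by the analyzer value while the keyscale keeps its default, because the empty string does not count as an override.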
omega
This parameter applies AceStep's official omega logistic rescale to each model output before the sampler step. It has a default value of 0.0, with a range from -8.0 to 8.0, and adjusts the granularity of the output, affecting the overall tonal balance and dynamics.
erg_scale
This parameter approximates AceStep's official ERG by building a weaker auxiliary tag/lyric branch for guidance. Positive values enhance this effect, with a default of 0.0 and a range from -0.9 to 2.0. It influences the guidance baseline within the active interval, impacting the expressiveness of the audio.
cfg_interval_start
This parameter defines the start of the CFG/APG guidance application as a fraction of the schedule. It ranges from 0.0 to 1.0, with a default of 0.0, and determines when the guidance begins, affecting the initial stages of audio generation.
cfg_interval_end
This parameter sets the end of the CFG/APG guidance application as a fraction of the schedule. It ranges from 0.0 to 1.0, with a default of 1.0, and specifies when the guidance stops, influencing the latter stages of audio generation.
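Taken together, `cfg_interval_start` and `cfg_interval_end` define a window of the schedule in which guidance is applied. The sketch below shows one way such a window could be evaluated per step. How the node actually converts a step into a schedule fraction is not stated here, so the step-index-based progress and the function name `guidance_active` are assumptions.

```python
def guidance_active(step: int, total_steps: int,
                    start: float = 0.0, end: float = 1.0) -> bool:
    """Return True if guidance applies at this step (hypothetical helper).

    Progress is assumed to run from 0.0 at the first step
    to 1.0 at the last step.
    """
    progress = step / max(total_steps - 1, 1)
    return start <= progress <= end


# With the defaults (0.0 to 1.0) every step is guided.
all_guided = all(guidance_active(s, 11) for s in range(11))

# Restricting the window skips guidance at the very start and very end.
mid_step_guided = guidance_active(5, 11, start=0.2, end=0.8)
first_step_guided = guidance_active(0, 11, start=0.2, end=0.8)
last_step_guided = guidance_active(10, 11, start=0.2, end=0.8)
```

Narrowing the window from either side leaves the corresponding early or late steps unguided, which is how the two parameters affect the initial and latter stages of generation respectively.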
AceStep 1.5 SFT Generate Output Parameters:
audio_output
This output parameter provides the final audio result generated by the node. It represents the culmination of the latent creation, text encoding, sampling, and decoding processes, delivering a high-quality audio file that reflects the input parameters and guidance applied.
out_latent
This parameter outputs the latent representation used during the generation process. It is crucial for understanding the intermediate state of the audio before decoding, offering insights into the model's internal workings and potential areas for further refinement.
positive_conditioning
This output provides the positive conditioning data used during the generation process. It is essential for replicating or modifying the generation process, as it contains the specific conditions that guided the model towards the desired output.
negative_conditioning
This output delivers the negative conditioning data, which is used to steer the model away from undesired outcomes. It complements the positive conditioning by providing a balanced approach to guidance, ensuring the audio output meets the intended artistic goals.
AceStep 1.5 SFT Generate Usage Tips:
- Experiment with the T parameter to find the optimal balance between smoothness and clarity in your audio outputs, especially if you are aiming for a specific style or genre.
- Utilize the lora parameter to incorporate additional learned representations, which can significantly enhance the diversity and quality of your audio outputs.
- Adjust the style_bpm and style_keyscale parameters to align the generated audio with specific musical requirements, ensuring that the tempo and harmonic structure meet your artistic vision.
AceStep 1.5 SFT Generate Common Errors and Solutions:
"[AceStep SFT] WARNING: denoise < 1.0 is being used with LATENT/AUDIO from another node, but external positive_conditioning/negative_conditioning was not connected."
- Explanation: This warning indicates that the denoise parameter is set to a value less than 1.0, but the necessary conditioning data from another node is not connected, which may lead to artifacts in the audio output.
- Solution: Ensure that the positive and negative conditioning data from the relevant nodes are properly connected to avoid inconsistencies and potential artifacts in the generated audio.
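The condition behind this warning can be summarized in a small check. This is a sketch reconstructed from the warning text only; the function name `needs_conditioning_warning` and its boolean arguments are hypothetical, and treating either missing conditioning input as "not connected" is an assumption.

```python
def needs_conditioning_warning(denoise: float,
                               has_external_source: bool,
                               has_positive: bool,
                               has_negative: bool) -> bool:
    """Return True when the situation described by the warning applies.

    has_external_source: a LATENT/AUDIO input from another node is connected.
    has_positive / has_negative: external conditioning inputs are connected.
    """
    partial_denoise = denoise < 1.0
    conditioning_connected = has_positive and has_negative
    return partial_denoise and has_external_source and not conditioning_connected


# Partial denoise on an external latent without conditioning: warning expected.
warn = needs_conditioning_warning(0.6, True, False, False)

# Same setup with both conditioning inputs connected: no warning.
ok = needs_conditioning_warning(0.6, True, True, True)
```

A full denoise (1.0) or a latent created inside the node would not trigger the check in this sketch, which matches the wording of the warning.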
