Make Training Dataset:
The MakeTrainingDataset node facilitates the creation of a training dataset by encoding images and text into a format suitable for machine learning models. It uses a Variational Autoencoder (VAE) to transform images into latent representations and a CLIP model to encode text into conditioning data. The primary benefit of this node is its ability to convert raw data into a structured format that can be used directly for training AI models, particularly in tasks involving paired image and text data. By automating the encoding step, it simplifies dataset preparation, letting AI artists focus on creative work rather than technical preprocessing. It is particularly useful for training models on custom datasets, since it ensures the data is encoded consistently and efficiently.
Make Training Dataset Input Parameters:
images
This parameter accepts a list of images to encode into latent representations. The images are processed by a VAE model, which compresses them into a lower-dimensional space while preserving essential features. This transformation reduces the complexity of the data and makes it manageable for training. There is no specific minimum or maximum for this parameter, but the quality and resolution of the images affect the encoding results.
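The per-image flow can be sketched as follows. Note that DummyVAE and encode_images are hypothetical stand-ins for illustration, not the node's actual code; a real implementation would call the VAE model's own encode method on each image tensor.

```python
# Sketch of the per-image encoding flow (illustrative only).
class DummyVAE:
    """Stand-in for a real VAE model; a real VAE would return a learned
    latent tensor rather than a simple downsample."""
    def encode(self, image):
        # Placeholder "compression": halve each spatial dimension.
        h, w = len(image), len(image[0])
        return [[image[y * 2][x * 2] for x in range(w // 2)]
                for y in range(h // 2)]

def encode_images(images, vae):
    """Encode each image into a latent dict, one entry per input image."""
    return [{"samples": vae.encode(img)} for img in images]

images = [[[0.0] * 8 for _ in range(8)]]  # one 8x8 "image"
latents = encode_images(images, DummyVAE())
```

Each input image yields exactly one latent entry, so the output list stays aligned with the input list.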
vae
The VAE model input is used to encode the images into latent representations. This model is responsible for transforming high-dimensional image data into a compact latent space, which is essential for efficient training. The choice of VAE model can affect the quality of the latent representations, so selecting a model that aligns with your training goals is important.
clip
This parameter requires a CLIP model, which is used to encode text data into conditioning information. The CLIP model is adept at understanding and representing text in a way that can be used alongside image data in training. The encoded text serves as additional context or guidance for the model during training, enhancing its ability to learn meaningful patterns.
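Text encoding in ComfyUI typically happens in two steps: tokenize the caption, then encode the tokens. The sketch below mimics that flow with a hypothetical DummyCLIP stand-in (its token-length "embedding" is a placeholder, not real CLIP behavior).

```python
class DummyCLIP:
    """Stand-in for a real CLIP model; ComfyUI code would typically call
    clip.tokenize(text) followed by clip.encode_from_tokens(tokens)."""
    def tokenize(self, text):
        return text.lower().split()
    def encode_from_tokens(self, tokens):
        # A real encoder returns an embedding tensor; token lengths stand in here.
        return [float(len(t)) for t in tokens]

def encode_texts(texts, clip):
    """Produce one conditioning list per caption: a [embedding, extras] pair."""
    out = []
    for text in texts:
        tokens = clip.tokenize(text)
        out.append([[clip.encode_from_tokens(tokens), {}]])
    return out

conds = encode_texts(["A red fox"], DummyCLIP())
```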
texts
The texts parameter is an optional input that accepts a list of text captions corresponding to the images. The list can match the number of images, contain a single caption that is repeated for all images, or be omitted entirely, in which case an empty string is used for every image. The text data provides additional context for the images, which can improve the model's understanding and performance during training.
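The three documented cases can be resolved with a small helper. This is an illustrative sketch (broadcast_captions is not part of the node's public API), but it captures the behavior described above.

```python
def broadcast_captions(texts, num_images):
    """Resolve the optional texts input against the number of images."""
    if not texts:            # omitted entirely -> empty string per image
        return [""] * num_images
    if len(texts) == 1:      # single caption -> repeated for all images
        return texts * num_images
    return list(texts)       # one caption per image -> used as-is

print(broadcast_captions(None, 2))        # two empty strings
print(broadcast_captions(["a cat"], 2))   # caption repeated twice
```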
Make Training Dataset Output Parameters:
latents
The latents output is a list of latent dictionaries, each representing the encoded form of an input image. These latent representations are crucial for training machine learning models, as they capture the essential features of the images in a compact form. The latents serve as the primary input for model training, enabling efficient and effective learning.
conditioning
The conditioning output is a list of conditioning lists, each corresponding to the encoded text data. This output provides the contextual information derived from the text captions, which can be used to guide the model during training. The conditioning data helps the model to associate textual descriptions with visual features, enhancing its ability to learn complex relationships.
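Putting the two outputs together, the node returns parallel lists: the latent at index i corresponds to the conditioning at index i. The sketch below illustrates that pairing with hypothetical toy encoders standing in for the VAE and CLIP models (the function name and encoder callbacks are assumptions for illustration).

```python
def make_training_dataset(images, captions, encode_image, encode_text):
    """Sketch of the node's pairing logic: each image yields one latent dict,
    each caption yields one conditioning list, and the lists stay aligned."""
    latents = [{"samples": encode_image(img)} for img in images]
    conditioning = [[[encode_text(cap), {}]] for cap in captions]
    return latents, conditioning

# Toy encoders so the sketch runs without any model weights.
latents, conds = make_training_dataset(
    images=["img_a", "img_b"],
    captions=["a cat", "a dog"],
    encode_image=lambda img: f"latent({img})",
    encode_text=lambda cap: f"embed({cap})",
)
```

Keeping the two lists index-aligned is what lets a training loop pair each latent with its text conditioning.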
Make Training Dataset Usage Tips:
- Ensure that the images provided are of high quality and relevant to the training task, as this will improve the quality of the latent representations.
- Choose a VAE model that is well-suited to the type of images you are working with to ensure optimal encoding performance.
- If using text captions, ensure they are descriptive and relevant to the images to provide meaningful conditioning data for the model.
- Experiment with different CLIP models to find one that best captures the nuances of your text data and aligns with your training objectives.
Make Training Dataset Common Errors and Solutions:
Error: "VAE model not found"
- Explanation: This error occurs when the specified VAE model is not available or incorrectly specified.
- Solution: Verify that the VAE model is correctly installed and specified in the input parameters. Ensure that the model path or identifier is correct.
Error: "CLIP model not found"
- Explanation: This error indicates that the specified CLIP model is missing or incorrectly specified.
- Solution: Check that the CLIP model is properly installed and specified in the input parameters. Confirm that the model path or identifier is accurate.
Error: "Mismatch between number of images and texts"
- Explanation: This error arises when the number of images does not match the number of text captions provided.
- Solution: Ensure that the number of text captions matches the number of images, or provide a single caption to be used for all images if applicable.
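A simple guard before encoding catches this mismatch early. The helper below is an illustrative sketch (not the node's actual validation code), and the error message is assumed for the example.

```python
def validate_inputs(images, texts):
    """Raise early if the texts list cannot be broadcast over the images:
    valid lengths are 0 (omitted), 1 (repeated), or one caption per image."""
    if texts and len(texts) not in (1, len(images)):
        raise ValueError("Mismatch between number of images and texts")

validate_inputs(["i1", "i2"], ["caption"])       # single caption: OK
validate_inputs(["i1", "i2"], ["c1", "c2"])      # one per image: OK
try:
    validate_inputs(["i1", "i2"], ["c1", "c2", "c3"])
except ValueError as e:
    err = str(e)
```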
