
ComfyUI Node: Save Training Dataset

Class Name: SaveTrainingDataset
Category: dataset
Author: ComfyAnonymous (Account age: 763 days)
Extension: ComfyUI
Last Updated: 2026-05-13
GitHub Stars: 112.77K

How to Install ComfyUI

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI in the search bar.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.


Save Training Dataset Description

Saves encoded training datasets (latents and conditioning) to disk as shard files so they can be retrieved and reused later.

Save Training Dataset:

The SaveTrainingDataset node writes an encoded training dataset, consisting of latents and conditioning data, to disk. The data is stored in a folder you name inside the output directory and is split into shard files, each holding a fixed number of samples, so that large datasets stay manageable and can be retrieved and reused later. The node is experimental and is an output node: it exports data rather than passing it on for further processing.
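The flow described above can be approximated with a short sketch. This is not the node's actual implementation: the function name `save_sharded`, the `shard_NNNN.pkl` file naming, the dictionary layout inside each shard, and the use of `pickle` are all assumptions made for illustration, and the sample data is made up.

```python
import pickle
import tempfile
from pathlib import Path


def save_sharded(latents, conditioning, output_dir,
                 folder_name="training_dataset", shard_size=1000):
    """Write paired samples into numbered shard files (hypothetical layout)."""
    target = Path(output_dir) / folder_name
    # exist_ok=False: fail instead of writing into an existing folder
    # (an assumed behaviour for this sketch).
    target.mkdir(parents=True, exist_ok=False)

    written = []
    for shard_idx, start in enumerate(range(0, len(latents), shard_size)):
        end = start + shard_size
        shard = {
            "latents": latents[start:end],
            "conditioning": conditioning[start:end],
        }
        path = target / f"shard_{shard_idx:04d}.pkl"  # assumed naming scheme
        with open(path, "wb") as f:
            pickle.dump(shard, f)
        written.append(path)
    return written


# Stand-in data: five fake samples, two per shard -> three shard files.
with tempfile.TemporaryDirectory() as tmp:
    files = save_sharded(
        latents=[{"samples": [i]} for i in range(5)],
        conditioning=[[f"cond-{i}"] for i in range(5)],
        output_dir=tmp,
        shard_size=2,
    )
    shard_names = [p.name for p in files]
    with open(files[-1], "rb") as f:
        last_shard = pickle.load(f)
```

Here the last shard holds only the single leftover sample, which is how a dataset whose size is not a multiple of `shard_size` would end under this scheme.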

Save Training Dataset Input Parameters:

latents

This parameter accepts a list of latent dictionaries produced by the MakeTrainingDataset node. Latents are the encoded representations of your training data, and they are what the node writes to disk. The list has no fixed minimum or maximum length; its size depends on your dataset.

conditioning

This parameter accepts a list of conditioning lists produced by the MakeTrainingDataset node. Conditioning data provides the context that guides the model during training. Like latents, the list has no predefined size limit; its length depends on your dataset and training requirements.
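Since latents and conditioning both come from MakeTrainingDataset, it is natural to treat them as parallel lists, where the i-th latent belongs with the i-th conditioning entry. The sketch below assumes that pairing; the helper `pair_samples` and the placeholder values are hypothetical, and whether the node performs a length check like this is not stated on this page.

```python
def pair_samples(latents, conditioning):
    """Pair the i-th latent with the i-th conditioning entry."""
    if len(latents) != len(conditioning):
        raise ValueError(
            f"latents ({len(latents)}) and conditioning "
            f"({len(conditioning)}) differ in length"
        )
    return list(zip(latents, conditioning))


# Placeholder values standing in for real latent dicts and conditioning lists.
latents = [{"samples": "latent-0"}, {"samples": "latent-1"}]
conditioning = [["cond-0"], ["cond-1"]]
pairs = pair_samples(latents, conditioning)
```

Checking lengths before saving catches a mismatched dataset early, rather than at training time.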

folder_name

This string parameter specifies the name of the folder, inside the output directory, where the dataset is saved. The default value is "training_dataset". Use a project-specific name to keep datasets organized and easy to locate later.

shard_size

This parameter sets the number of samples stored in each shard file. The default value is 1000, the minimum is 1, and the maximum is 100000. It is an advanced parameter that controls how the dataset is divided into smaller files. The shard size affects how many files are written and how much data each load reads, which matters most for large datasets.
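The relationship between dataset size, shard size, and the number of shard files is plain ceiling division. The sketch below works it through; the helper name `shard_ranges` is hypothetical, and the index ranges are an illustration of the arithmetic rather than the node's code.

```python
import math


def shard_ranges(num_samples, shard_size=1000):
    """Return (start, end) index pairs, end exclusive, one per shard."""
    num_shards = math.ceil(num_samples / shard_size)
    return [
        (i * shard_size, min((i + 1) * shard_size, num_samples))
        for i in range(num_shards)
    ]


ranges = shard_ranges(2500, shard_size=1000)
# ranges == [(0, 1000), (1000, 2000), (2000, 2500)]
```

With 2500 samples and the default shard size, this gives two full shards and a third holding the remaining 500 samples.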

Save Training Dataset Output Parameters:

This node has no output parameters. It saves the input data to disk, so the results are stored in the specified folder rather than passed to other nodes.

Save Training Dataset Usage Tips:

  • Choose a folder_name that is unique and descriptive, both to avoid overwriting an existing dataset and to make the dataset easier to find later.
  • Adjust the shard_size to suit your system and the size of your dataset. Smaller shards can each be loaded more quickly, but they increase the number of files.

Save Training Dataset Common Errors and Solutions:

"Folder already exists"

  • Explanation: This error occurs when the specified folder_name already exists in the output directory, which could lead to data being overwritten.
  • Solution: Choose a different folder_name or ensure that the existing folder is backed up or cleared if you intend to overwrite it.
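One way to avoid a folder name collision is to pick an unused name before running the node. The helper below is a hypothetical convenience, not part of ComfyUI: it appends a numeric suffix to the base name until it finds a folder that does not exist yet. The `output_dir` argument stands in for wherever your ComfyUI output directory is located.

```python
import tempfile
from pathlib import Path


def unique_folder_name(output_dir, base="training_dataset"):
    """Return base, or base_1, base_2, ..., whichever does not exist yet."""
    candidate = base
    counter = 1
    while (Path(output_dir) / candidate).exists():
        candidate = f"{base}_{counter}"
        counter += 1
    return candidate


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    first = unique_folder_name(tmp)    # nothing exists yet
    (Path(tmp) / first).mkdir()        # simulate a saved dataset
    second = unique_folder_name(tmp)   # base name is now taken
```

The returned string can then be entered as the folder_name value.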

"Invalid shard size"

  • Explanation: This error is triggered when the shard_size is set outside the allowed range of 1 to 100000.
  • Solution: Adjust the shard_size to fall within the specified range to ensure proper dataset sharding.
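If shard_size is set from a script, the documented range can be checked before the value reaches the node. This is a minimal sketch: the function `check_shard_size` and the constant names are hypothetical, and only the bounds 1 and 100000 come from this page.

```python
MIN_SHARD_SIZE = 1
MAX_SHARD_SIZE = 100000


def check_shard_size(shard_size):
    """Return shard_size unchanged if it is an int within the documented range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(shard_size, bool) or not isinstance(shard_size, int):
        raise TypeError("shard_size must be an integer")
    if not MIN_SHARD_SIZE <= shard_size <= MAX_SHARD_SIZE:
        raise ValueError(
            f"shard_size must be between {MIN_SHARD_SIZE} and "
            f"{MAX_SHARD_SIZE}, got {shard_size}"
        )
    return shard_size


validated = check_shard_size(1000)
```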

Copyright 2025 RunComfy. All Rights Reserved.
