Save Training Dataset:
The SaveTrainingDataset node writes an encoded training dataset, consisting of latents and conditioning data, to disk. Saving the encoded data in a structured layout means it can be reloaded and reused later, which saves time when you work with large datasets. The node is experimental and is an output node: it exports data rather than passing it on for further processing. The dataset is stored in a folder you specify and split into shard files, which keeps extensive datasets manageable.
Save Training Dataset Input Parameters:
latents
This parameter accepts a list of latent dictionaries produced by the MakeTrainingDataset node. Latents are the encoded representations of your training data, and they are what the node writes to disk. The list has no fixed minimum or maximum length; its size depends on your dataset.
conditioning
The conditioning parameter takes a list of conditioning lists produced by the MakeTrainingDataset node. Conditioning data supplies the context that guides the model during training. Like latents, this list has no fixed length limits; its size depends on your dataset and training requirements.
folder_name
This string parameter specifies the name of the folder where the dataset will be saved within the output directory. The default value is "training_dataset". This allows you to organize your datasets effectively, making it easier to locate and manage them later. You can customize this name to suit your project needs.
shard_size
Shard size sets the number of samples written to each shard file. The default is 1000, the minimum is 1, and the maximum is 100000. This is an advanced parameter that controls how the dataset is divided into smaller, more manageable files. The shard size affects how many files are produced and how much data each file holds, which matters most when loading and processing large datasets.
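As a rough illustration of how shard_size divides a dataset, the sketch below splits a list of samples into consecutive groups. It is not the node's actual implementation: split_into_shards is a hypothetical helper, the samples are placeholder integers, and the on-disk shard file format is not covered.

```python
def split_into_shards(samples, shard_size=1000):
    """Group samples into consecutive lists of at most shard_size items.

    Hypothetical helper for illustration only; the real node may
    organize and name its shards differently.
    """
    return [
        samples[start:start + shard_size]
        for start in range(0, len(samples), shard_size)
    ]


# Placeholder data standing in for 2500 encoded samples.
samples = list(range(2500))
shards = split_into_shards(samples, shard_size=1000)
shard_lengths = [len(shard) for shard in shards]
print(shard_lengths)
```

With 2500 samples and the default shard_size of 1000, this gives three shards, the last one holding the remaining 500 samples.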
Save Training Dataset Output Parameters:
This node does not produce any direct output parameters. Its primary function is to save the input data to disk, so the results are not returned as outputs but are instead stored in the specified directory for future use.
Save Training Dataset Usage Tips:
- Ensure that the folder_name is unique or descriptive enough to avoid overwriting existing datasets and to make retrieval easier.
- Adjust the shard_size based on your system's capabilities and the size of your dataset to optimize performance. Smaller shard sizes can make data loading faster but may increase the number of files.
Save Training Dataset Common Errors and Solutions:
"Folder already exists"
- Explanation: This error occurs when the specified folder_name already exists in the output directory, which could lead to data being overwritten.
- Solution: Choose a different folder_name or ensure that the existing folder is backed up or cleared if you intend to overwrite it.
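One way to avoid a name clash is to check for the folder before running the workflow and pick an unused name. The sketch below is an assumption-laden example, not part of the node: pick_free_folder is a hypothetical helper, and a temporary directory stands in for your actual output directory.

```python
import tempfile
from pathlib import Path


def pick_free_folder(output_dir, folder_name="training_dataset"):
    """Return a path under output_dir that does not exist yet.

    Hypothetical helper: appends _1, _2, ... to folder_name until
    an unused name is found.
    """
    candidate = Path(output_dir) / folder_name
    suffix = 1
    while candidate.exists():
        candidate = Path(output_dir) / f"{folder_name}_{suffix}"
        suffix += 1
    return candidate


# A temporary directory stands in for the real output directory.
with tempfile.TemporaryDirectory() as output_dir:
    # Simulate a dataset folder left over from an earlier run.
    (Path(output_dir) / "training_dataset").mkdir()
    free_name = pick_free_folder(output_dir).name
    print(free_name)
```

The returned name can then be entered as the folder_name value, leaving the existing dataset untouched.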
"Invalid shard size"
- Explanation: This error is triggered when the shard_size is set outside the allowed range of 1 to 100000.
- Solution: Adjust the shard_size to fall within the specified range to ensure proper dataset sharding.
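If you set shard_size from a script rather than the node UI, a small range check can catch a bad value early. This is a minimal sketch using the limits documented above; check_shard_size is a hypothetical helper, and its error messages are not the node's own.

```python
MIN_SHARD_SIZE = 1
MAX_SHARD_SIZE = 100000


def check_shard_size(shard_size):
    """Return shard_size unchanged if it is an integer in the documented range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(shard_size, bool) or not isinstance(shard_size, int):
        raise TypeError("shard_size must be an integer")
    if not MIN_SHARD_SIZE <= shard_size <= MAX_SHARD_SIZE:
        raise ValueError(
            f"shard_size must be between {MIN_SHARD_SIZE} and {MAX_SHARD_SIZE}"
        )
    return shard_size


checked = check_shard_size(1000)
print(checked)
```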
