File Loader Crawl Batch (CRT):
The FileLoaderCrawlBatch node loads a batch of text files from a specified directory in a single operation. You specify how many files to load, where in the directory to start, and whether to search subfolders. This automates file selection and content extraction when you work with large volumes of text data. The seed parameter acts as a batch offset, so the same seed selects the same files as long as the directory contents are unchanged, which matters for tasks that require deterministic outputs. The node can also filter files by extension and cap the number of words read from each file, which makes it a flexible way to manage text data in creative AI projects.
File Loader Crawl Batch (CRT) Input Parameters:
folder_path
This parameter specifies the path to the folder containing the text files you wish to load. It is crucial for directing the node to the correct location in your file system. The default value is an empty string, and it is important to ensure that the path provided is valid and points to a directory, as the node will not function correctly otherwise.
batch_count
This parameter determines the number of files to load in each batch. It allows you to control the volume of data processed at once, with a minimum value of 1 and a maximum of 64. The default is set to 1, enabling you to start with a single file and scale up as needed.
seed
The seed parameter acts as a batch offset, influencing the starting point for file selection within the directory. It ensures that the same set of files is selected consistently across different runs, provided the directory contents remain unchanged. The default value is 0, which starts the selection from the first file.
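The offset behavior described above can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the function name select_batch is hypothetical, and it assumes the files are sorted by name, that the start index is seed * batch_count, and that selection wraps around at the end of the list.

```python
def select_batch(files, batch_count, seed):
    """Pick batch_count files from a sorted listing, offset by seed.

    Assumptions (not confirmed by the node's source): files are sorted by
    name, the start index is seed * batch_count, and the selection wraps
    around when it runs past the end of the list.
    """
    if not files or batch_count < 1:
        return []
    ordered = sorted(files)
    start = (seed * batch_count) % len(ordered)
    return [ordered[(start + i) % len(ordered)] for i in range(batch_count)]
```

Under these assumptions, the same seed returns the same files as long as the listing does not change, and incrementing the seed moves on to the next group of files.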
file_extension
This parameter filters the files based on their extension, allowing you to specify the type of files to be loaded, such as .txt. The default value is .txt, and it is essential to ensure that the extension matches the files you intend to process.
max_words
This parameter limits the number of words extracted from each file, providing control over the amount of text data processed. A value of 0 indicates no limit, allowing the entire content of the file to be loaded. The default is 0, which is suitable for scenarios where full file content is required.
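As a rough sketch of the word limit, the helper below keeps at most max_words whitespace-separated words and treats 0 as "no limit". The name limit_words is hypothetical, and how the node itself splits words is an assumption.

```python
def limit_words(text, max_words):
    """Return text cut to at most max_words words; 0 means no limit.

    Assumption: a "word" is any run of non-whitespace characters.
    """
    if max_words <= 0:
        return text
    words = text.split()
    if len(words) <= max_words:
        return text  # short enough already, keep the original spacing
    return " ".join(words[:max_words])
```

Note that in this sketch a truncated result is rejoined with single spaces, so the original line breaks are lost only when truncation actually happens.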
crawl_subfolders
This boolean parameter determines whether the node should include files located in subfolders of the specified directory. The default value is False, meaning only files in the top-level directory are considered. Setting it to True enables a more comprehensive search, which can be useful for deeply nested file structures.
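The combined effect of file_extension and crawl_subfolders might look like the sketch below, written with the standard pathlib module. The function name list_files, the case-insensitive extension match, and the sorted return order are assumptions made for illustration.

```python
from pathlib import Path


def list_files(folder_path, file_extension=".txt", crawl_subfolders=False):
    """List files under folder_path that match file_extension.

    Assumptions: matching ignores case, and results are sorted so that
    the listing is stable between runs.
    """
    root = Path(folder_path)
    # rglob walks the whole tree; glob only looks at the top level.
    candidates = root.rglob("*") if crawl_subfolders else root.glob("*")
    wanted = file_extension.lower()
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() == wanted
    )
```

With crawl_subfolders left at False, a file such as sub/c.txt is skipped; setting it to True includes it.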
File Loader Crawl Batch (CRT) Output Parameters:
text_output_1, text_output_2, ..., text_output_n
These outputs contain the text content of the files loaded in the batch. Each output corresponds to a file, and the content is limited by the max_words parameter if specified. These outputs are crucial for accessing the actual data within the files, enabling further processing or analysis.
file_name_1, file_name_2, ..., file_name_n
These outputs provide the names of the files that were loaded in the batch. They are essential for identifying which files correspond to the text outputs, allowing you to track and manage the data effectively.
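To illustrate how the text outputs and file name outputs line up by position, here is a small sketch that reads a list of paths into two parallel lists. The function name read_batch is hypothetical, and whether the node reports bare file names or full paths, and which encoding it uses, are assumptions.

```python
from pathlib import Path


def read_batch(paths):
    """Read each path and return (texts, names) as parallel lists.

    Assumptions: files are read as UTF-8 with undecodable bytes replaced,
    and each name is the file name without its directory.
    """
    texts, names = [], []
    for path in paths:
        p = Path(path)
        texts.append(p.read_text(encoding="utf-8", errors="replace"))
        names.append(p.name)
    return texts, names
```

In this sketch, texts[i] always holds the content of the file named names[i], which mirrors how text_output_1 pairs with file_name_1.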
File Loader Crawl Batch (CRT) Usage Tips:
- Ensure folder_path is correctly set to avoid errors related to directory access.
- Use the batch_count parameter to manage memory usage and processing time, especially when dealing with large datasets.
- Use the seed parameter to maintain consistency across different runs, which is useful for reproducible experiments.
- Set crawl_subfolders to True if your files are organized in a nested directory structure to ensure all relevant files are included.
File Loader Crawl Batch (CRT) Common Errors and Solutions:
❌ Error: Folder '<folder_path>' not found or is not a directory.
- Explanation: This error occurs when the specified folder path is invalid or does not point to a directory.
- Solution: Verify that folder_path is correct and that it points to an existing directory.
❌ Warning: No files with extension '<file_extension>' found.
- Explanation: This warning indicates that no files matching the specified extension were found in the directory.
- Solution: Check that file_extension is correct and matches the files you intend to load. Ensure that the directory contains files with the specified extension.
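If you want to catch both conditions before running the node, a pre-check along these lines may help. It is only a sketch: the function name check_inputs is hypothetical, the message text is copied from the entries above, and whether the node itself prints, returns, or raises these messages is not stated here. The check looks only at the top-level folder.

```python
from pathlib import Path


def check_inputs(folder_path, file_extension=".txt"):
    """Return a message for the two documented problems, or None if fine.

    Assumption: only the top-level folder is inspected; subfolders are
    ignored in this sketch.
    """
    root = Path(folder_path)
    # An empty string would resolve to the current directory, so reject it.
    if not folder_path or not root.is_dir():
        return f"❌ Error: Folder '{folder_path}' not found or is not a directory."
    has_match = any(
        p.is_file() and p.suffix == file_extension for p in root.iterdir()
    )
    if not has_match:
        return f"❌ Warning: No files with extension '{file_extension}' found."
    return None
```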
