Text Cleanup:
The TextCleanup node applies a configurable series of cleaning operations to text so that it is ready for further processing or analysis. It removes unwanted characters, normalizes formats, and enforces consistency, which is particularly useful for text generated by language models, where stray formatting and inconsistencies are common. Available operations include trimming whitespace, converting unicode characters to ASCII, and normalizing text case.
Text Cleanup Input Parameters:
text
This parameter is the input text string that you want to clean. It is required. The text can be any string that needs cleaning and normalization.
operations
This parameter specifies which cleaning operations to apply to the text. It is a comma-separated string of operation names, such as trim, unicode, and newlines. The default value is trim,unicode. Each operation performs one cleaning task, such as removing whitespace or converting unicode characters to ASCII.
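As a rough illustration of how a comma-separated operations string could be parsed and applied, here is a minimal sketch. The function names, the `OPERATIONS` table, and the exact behaviour of each operation are assumptions made for this example, not the node's actual implementation.

```python
import unicodedata


def _trim(text: str) -> str:
    # Remove leading and trailing whitespace.
    return text.strip()


def _unicode_to_ascii(text: str) -> str:
    # NFKD splits accented letters into a base letter plus a combining mark;
    # encoding to ASCII with errors="ignore" then drops anything non-ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def _newlines(text: str) -> str:
    # Assumed behaviour: replace every line break with a single space.
    return " ".join(text.splitlines())


# Hypothetical lookup table covering only the names mentioned in the text.
OPERATIONS = {"trim": _trim, "unicode": _unicode_to_ascii, "newlines": _newlines}


def apply_operations(text: str, operations: str = "trim,unicode") -> str:
    # Split on commas, tolerate spaces around names, skip empty entries.
    names = [name.strip() for name in operations.split(",") if name.strip()]
    for name in names:
        text = OPERATIONS[name](text)  # applied in the order listed (assumption)
    return text
```

In this sketch, characters that have no ASCII decomposition are simply dropped by the unicode step; the real node may map them differently.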
custom_replacements
This optional parameter allows you to define custom text replacements. It is a string where you can specify pairs of text to be replaced and their replacements. This parameter is useful for applying specific text transformations that are not covered by the standard operations.
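The exact syntax of the replacement string is not given here. Purely as an illustration, the sketch below assumes pairs written as `old=>new` and separated by semicolons; both separators and both function names are hypothetical.

```python
from typing import List, Tuple


def parse_replacements(spec: str, pair_sep: str = ";", arrow: str = "=>") -> List[Tuple[str, str]]:
    # Assumed format: "old=>new;old2=>new2". The real node may use another syntax.
    pairs = []
    for chunk in spec.split(pair_sep):
        if arrow not in chunk:
            continue  # skip empty or malformed entries
        old, new = chunk.split(arrow, 1)
        if old:  # an empty search string would match everywhere
            pairs.append((old, new))
    return pairs


def apply_replacements(text: str, spec: str) -> str:
    # Apply each pair in the order it was written.
    for old, new in parse_replacements(spec):
        text = text.replace(old, new)
    return text
```

With this assumed format, `apply_replacements("colour and flavour", "colour=>color;flavour=>flavor")` would rewrite both words in one pass.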
max_length
This optional parameter sets the maximum length for the cleaned text. It is an integer value with a default of 0, which means no length restriction. If a maximum length is specified, the text will be truncated to this length after all cleaning operations are applied. This is useful for ensuring that the text does not exceed a certain size, which can be important for certain applications or systems.
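The truncation rule described above can be sketched in a few lines. The function name is hypothetical, and treating negative values like 0 is an assumption of this example.

```python
def truncate(text: str, max_length: int = 0) -> str:
    # 0 means "no length restriction"; negative values are treated the same
    # way here, which is an assumption rather than documented behaviour.
    if max_length <= 0:
        return text
    return text[:max_length]


# Truncation runs last, after the cleaning operations have been applied.
cleaned = "  hello world  ".strip()
shortened = truncate(cleaned, 5)
```

Because truncation is applied after cleaning, whitespace removed by earlier operations does not count toward the limit.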
Text Cleanup Output Parameters:
cleaned_text
This output parameter provides the text after all specified cleaning operations have been applied. It is the main result of the node, representing the cleaned and normalized version of the input text.
original_text
This output parameter returns the original input text before any cleaning operations were applied. It allows you to compare the cleaned text with the original to understand the changes made.
char_count
This output parameter indicates the number of characters in the cleaned text. It provides a quick way to assess the length of the cleaned text, which can be useful for ensuring it meets any length requirements.
operations_applied
This output parameter lists the operations that were actually applied to the text. It provides a summary of the cleaning process, allowing you to verify which operations were executed and understand their impact on the text.
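To show how the four outputs relate to one another, the following sketch bundles them into one result. The `CleanupResult` type, the `run_cleanup` function, the single-entry operation table, and the comma-separated format of operations_applied are all assumptions for illustration.

```python
from typing import Callable, Dict, NamedTuple


class CleanupResult(NamedTuple):
    cleaned_text: str
    original_text: str
    char_count: int
    operations_applied: str  # assumed format: comma-separated names


# Tiny stand-in table; only "trim" is taken from the text above.
DEMO_OPERATIONS: Dict[str, Callable[[str], str]] = {"trim": str.strip}


def run_cleanup(text: str, operations: str = "trim", max_length: int = 0) -> CleanupResult:
    cleaned = text
    applied = []
    for name in (part.strip() for part in operations.split(",")):
        if not name:
            continue  # ignore empty entries such as a trailing comma
        cleaned = DEMO_OPERATIONS[name](cleaned)
        applied.append(name)
    if max_length > 0:
        cleaned = cleaned[:max_length]
    # char_count is measured on the cleaned text, not the original.
    return CleanupResult(cleaned, text, len(cleaned), ",".join(applied))
```

Comparing `original_text` with `cleaned_text` on the returned value is a quick way to see what the chosen operations changed.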
Text Cleanup Usage Tips:
- Use the operations parameter to tailor the cleaning process to your specific needs. For example, if you need to remove all newlines, include newlines in the operations list.
- If you have specific text patterns that need to be replaced, use the custom_replacements parameter to define these transformations.
- Set the max_length parameter if you need to ensure that the cleaned text does not exceed a certain number of characters, which can be important for systems with input size limitations.
Text Cleanup Common Errors and Solutions:
InvalidOperationError
- Explanation: This error occurs when an unrecognized operation is specified in the operations parameter.
- Solution: Ensure that all operations listed in the operations parameter are valid and supported by the node. Refer to the documentation for a list of supported operations.
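One way to catch this problem before running the node is to validate the operations string up front. The sketch below is a stand-in: the exception class mirrors the error name above but is defined locally, and `SUPPORTED` contains only the operation names mentioned in this document, so the real list is likely longer.

```python
from typing import List


class InvalidOperationError(ValueError):
    """Local stand-in for the error named above; the node's class may differ."""


# Only the names mentioned in this document; not the node's full list.
SUPPORTED = {"trim", "unicode", "newlines"}


def validate_operations(operations: str) -> List[str]:
    # Normalize the comma-separated string into a list of names.
    names = [name.strip() for name in operations.split(",") if name.strip()]
    unknown = [name for name in names if name not in SUPPORTED]
    if unknown:
        raise InvalidOperationError(
            "Unsupported operation(s): " + ", ".join(unknown)
        )
    return names
```

Reporting every unknown name at once, rather than stopping at the first, makes a typo in a long operations list easier to spot.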
TypeError: text must be a string
- Explanation: This error occurs when the input text is not provided as a string.
- Solution: Ensure that the input text is a valid string. If you are passing a different data type, convert it to a string before using the node.
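If the upstream value may not be a string, a small conversion helper can be applied before the text reaches the node. This is a sketch under stated assumptions: the helper name is hypothetical, and the choices to decode bytes as UTF-8 and to map None to an empty string are this example's own.

```python
def ensure_text(value: object) -> str:
    # Pass strings through untouched.
    if isinstance(value, str):
        return value
    # Decode raw bytes as UTF-8, replacing undecodable sequences (assumption).
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-8", errors="replace")
    # Treat a missing value as empty text (assumption).
    if value is None:
        return ""
    # Fall back to the value's standard string form.
    return str(value)
```

For numbers and most other objects this simply returns what `str()` would produce, so check that the result is meaningful text before cleaning it.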
