Generate text masks from images using OCR and adaptive thresholding for isolating text regions, ideal for AI artists.
The polymath_text_mask node generates a mask from the text in an image using Optical Character Recognition (OCR) and adaptive thresholding. It is particularly useful for AI artists who need to isolate text regions in images for further processing or analysis. Because it relies on EasyOCR, the node can detect text in various languages, which makes it suitable for international applications. Adaptive thresholding adjusts to the image's lighting conditions so that the mask follows the text areas closely. Typical uses include creating text overlays, enhancing text visibility, and preparing images for text-based machine learning models.
The image parameter is the input image from which the text mask will be generated. It should be provided in a tensor format that the node can process. The image is converted to grayscale for further processing, ensuring that the text detection is not affected by color variations.
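As a rough illustration of the grayscale step, the sketch below converts an RGB image stored as nested lists of floats in [0, 1] into 8-bit grayscale values. The helper name to_grayscale and the luma weights (0.299, 0.587, 0.114) are assumptions made for this example; the description above does not say how the node performs the conversion.

```python
def to_grayscale(image):
    """Convert an H x W x 3 image of floats in [0, 1] to H x W ints in [0, 255].

    Hypothetical helper: the weights are the common luma coefficients,
    not necessarily what the node uses internally.
    """
    gray = []
    for row in image:
        gray_row = []
        for r, g, b in row:
            luma = 0.299 * r + 0.587 * g + 0.114 * b
            # Scale to 8 bits and clamp to the valid range.
            gray_row.append(min(255, max(0, round(luma * 255))))
        gray.append(gray_row)
    return gray


# A 2 x 2 test image: white, black, pure red, pure blue.
pixels = [
    [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
]
gray_pixels = to_grayscale(pixels)
```

Pure red and pure blue map to different gray levels here, which is why a weighted sum is usually preferred over a plain channel average.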
The language_name parameter specifies the language of the text to be detected in the image. This is crucial for the OCR process, as it determines the language model used by EasyOCR. If the language is not specified, the default is English ('en').
The ocr_confidence_threshold parameter sets the minimum confidence level for text detection by the OCR. Text regions with a confidence score below this threshold are ignored, so that only reliable text detections are included in the mask. This parameter helps reduce false positives and can be adjusted based on the quality of the input image.
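The filtering described above can be sketched as follows. EasyOCR's readtext returns a list of (bounding box, text, confidence) tuples by default; the function name filter_detections and the sample detections are made up for this example, and the node's own filtering code may differ.

```python
def filter_detections(detections, confidence_threshold):
    """Keep only detections whose confidence is at or above the threshold.

    Each detection is assumed to be an EasyOCR-style tuple:
    (bounding_box, text, confidence).
    """
    return [d for d in detections if d[2] >= confidence_threshold]


# Illustrative detections, not real OCR output.
sample_detections = [
    ([[10, 10], [90, 10], [90, 30], [10, 30]], "SALE", 0.92),
    ([[12, 50], [60, 50], [60, 70], [12, 70]], "5O%", 0.41),
    ([[15, 90], [80, 90], [80, 110], [15, 110]], "today", 0.67),
]

kept = filter_detections(sample_detections, confidence_threshold=0.5)
kept_texts = [text for _, text, _ in kept]
```

Raising the threshold removes the low-confidence "5O%" detection; lowering it below 0.41 would keep all three.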
The use_gpu parameter indicates whether to use GPU acceleration for the OCR process. Enabling this option can significantly speed up text detection, especially for large images or when processing multiple images. However, it requires a compatible GPU and the necessary software setup.
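For orientation, here is a minimal sketch of how a language code and a GPU flag are typically passed to EasyOCR. easyocr.Reader accepts a list of language codes and a gpu argument; the wrapper make_reader and its parameter names are hypothetical, and the node may construct its reader differently.

```python
def make_reader(language_code="en", use_gpu=False):
    """Build an EasyOCR reader for one language (hypothetical wrapper).

    easyocr is imported lazily so this module can be loaded even when
    the package is not installed.
    """
    import easyocr

    # Reader takes a list of language codes; gpu=False forces CPU mode.
    return easyocr.Reader([language_code], gpu=use_gpu)
```

Calling make_reader("en", use_gpu=True) on a machine without a usable GPU setup generally results in EasyOCR falling back to the CPU, which is slower but still functional.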
The threshold_block_size parameter defines the size of the neighborhood area used for adaptive thresholding. It must be an odd number; if an even number is provided, it is incremented by one. This parameter affects the sensitivity of the thresholding process, with larger values leading to smoother masks.
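The even-to-odd adjustment described above amounts to a one-line check. The function name normalize_block_size is an assumption for this sketch.

```python
def normalize_block_size(block_size):
    """Return block_size unchanged if odd, otherwise block_size + 1."""
    return block_size + 1 if block_size % 2 == 0 else block_size


adjusted_even = normalize_block_size(10)
adjusted_odd = normalize_block_size(11)
```

An odd size is needed so that the neighborhood has a single center pixel.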
The threshold_c parameter is a constant subtracted from the mean or weighted mean in the adaptive thresholding process. It fine-tunes the thresholding sensitivity, allowing for better control over the mask's accuracy in different lighting conditions.
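To make the roles of the block size and the constant concrete, the following pure-Python sketch implements the mean variant of adaptive thresholding: a pixel is set to 255 when it is brighter than its local mean minus c, and to 0 otherwise. This is an illustration only. The function name, the edge-replication border handling, and the polarity (dark text becomes 0) are assumptions; the node presumably relies on an optimized library routine and may invert the result.

```python
def adaptive_mean_threshold(gray, block_size, c):
    """Mean adaptive threshold over an H x W grid of ints in [0, 255].

    block_size must be odd. Pixels outside the image are taken from the
    nearest edge pixel (an assumed border rule for this sketch).
    """
    height, width = len(gray), len(gray[0])
    radius = block_size // 2
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny = min(max(y + dy, 0), height - 1)
                    nx = min(max(x + dx, 0), width - 1)
                    total += gray[ny][nx]
            local_mean = total / (block_size * block_size)
            # The constant c shifts the local threshold down (or up if negative).
            row.append(255 if gray[y][x] > local_mean - c else 0)
        mask.append(row)
    return mask


# One dark "text" pixel on a bright background.
gray_image = [
    [200, 200, 200],
    [200, 50, 200],
    [200, 200, 200],
]
text_mask = adaptive_mean_threshold(gray_image, block_size=3, c=5)
```

Because each pixel is compared against its own neighborhood rather than a global value, the same settings can work across unevenly lit regions of an image.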
The output_mask is the resulting mask generated from the input image, highlighting the detected text regions. This mask is returned as a tensor, which can be used for further processing or analysis. The mask provides a binary representation of the text areas, making it easy to integrate with other image processing workflows.
Adjust the ocr_confidence_threshold to filter out unreliable text detections, especially in noisy images.
Use the language_name parameter to specify the correct language for OCR, as this can significantly impact detection accuracy.
Enable use_gpu if you have a compatible GPU to speed up the OCR process, especially for batch processing.
If EasyOCR is not installed, install it with pip install easyocr. Check your system's GPU compatibility if you are attempting to use GPU acceleration.