ComfyUI Node: Image Deduplication

Class Name: ImageDeduplication
Category: dataset/image
Author: ComfyAnonymous (Account age: 763 days)
Extension: ComfyUI
Latest Updated: 2026-05-13
Github Stars: 112.77K

Image Deduplication Description

Streamline an image dataset by identifying and removing duplicate or similar images, using perceptual hashing with an adjustable similarity threshold.

Image Deduplication:

The Image Deduplication node removes duplicate or very similar images from an image dataset. It uses perceptual hashing, a technique that reduces each image to a compact fingerprint of its visual content so that images can be compared efficiently. The similarity threshold controls how sensitive the comparison is: at a high threshold only nearly identical images are flagged as duplicates, and lowering it flags looser matches as well. This is particularly useful for AI artists and developers who work with large datasets, because removing redundant images keeps the collection clean and diverse and may improve machine learning models trained on it. The node processes the entire dataset as a group, so the comparison covers all images in the dataset.
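
The page does not say which perceptual hash the node uses. As an illustration only, the sketch below implements an average hash, one common perceptual hash, in plain Python on a grid of grayscale values. The function names `average_hash` and `hamming_distance` and the 8x8 hash size are assumptions made for this example, not the node's actual code.

```python
def average_hash(pixels, hash_size=8):
    """Average hash of a 2-D grid of grayscale values (illustrative sketch).

    The grid is downsampled to hash_size x hash_size by block averaging,
    then each cell contributes one bit: 1 if it is brighter than the mean.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(hash_size):
        top = row * height // hash_size
        bottom = max((row + 1) * height // hash_size, top + 1)
        for col in range(hash_size):
            left = col * width // hash_size
            right = max((col + 1) * width // hash_size, left + 1)
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# A horizontal gradient, a slightly brighter copy, and a mirrored copy.
gradient = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[value + 5 for value in row] for row in gradient]
mirrored = [row[::-1] for row in gradient]

print(hamming_distance(average_hash(gradient), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(gradient), average_hash(mirrored)))  # 64
```

A uniform brightness shift leaves every bit unchanged, while mirroring the gradient flips all 64 bits, which is the behavior a perceptual hash is meant to provide: small visual changes give small distances, large changes give large ones.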

Image Deduplication Input Parameters:

similarity_threshold

The similarity_threshold parameter sets how similar two images must be to count as duplicates. It is a float from 0.0 to 1.0; images are treated as duplicates when their similarity score is at or above the threshold, so a higher value is a stricter criterion. For instance, with a threshold of 0.95, images with a similarity score of 95% or higher are considered duplicates, and the duplicates are removed from the dataset. The default value is 0.95, which is generally suitable for most applications, but you can adjust it to your needs. Lowering the threshold flags more images as duplicates, while raising it makes deduplication more conservative, so fewer images are removed.
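
The page does not define how the similarity score is computed. Assuming, purely for illustration, that it is the fraction of matching bits between two 64-bit hashes, the threshold translates into a maximum number of differing bits. The helpers `similarity` and `is_duplicate` below are hypothetical names for this sketch.

```python
def similarity(hash_a, hash_b, num_bits=64):
    """Fraction of bit positions on which two hashes agree (assumed metric)."""
    differing = bin(hash_a ^ hash_b).count("1")
    return 1.0 - differing / num_bits


def is_duplicate(hash_a, hash_b, threshold=0.95, num_bits=64):
    """Duplicates are pairs whose similarity is at or above the threshold."""
    return similarity(hash_a, hash_b, num_bits) >= threshold


base = 0xFFFF0000FFFF0000
three_bits_off = base ^ 0b111    # similarity 1 - 3/64 = 0.953125
four_bits_off = base ^ 0b1111    # similarity 1 - 4/64 = 0.9375

print(is_duplicate(base, three_bits_off))                # True
print(is_duplicate(base, four_bits_off))                 # False
print(is_duplicate(base, four_bits_off, threshold=0.9))  # True
```

Under this assumption the default of 0.95 tolerates at most 3 differing bits out of 64, and lowering the threshold to 0.9 raises that tolerance to 6 bits.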

Image Deduplication Output Parameters:

unique_images

The unique_images output is the list of images that remain after duplicates have been removed. Each image in the list is distinct from the others under the specified similarity threshold. Use this output wherever a diverse, non-redundant dataset matters, for example as the input to training or analysis steps.
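
One way such a filtered list could be produced is a greedy pass that keeps an image only if it is not too similar to any image already kept. This is a sketch under that assumption: the function name `deduplicate`, the bit-fraction similarity, and the keep-the-first policy are illustrative choices, not confirmed details of the node.

```python
def deduplicate(hashes, threshold=0.95, num_bits=64):
    """Return indices of hashes to keep, keeping the first of each
    near-duplicate group (illustrative greedy strategy)."""
    kept = []
    for index, candidate in enumerate(hashes):
        is_dup = any(
            1.0 - bin(candidate ^ hashes[k]).count("1") / num_bits >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(index)
    return kept


hashes = [
    0x0000000000000000,
    0x0000000000000001,  # 1 bit away from the first hash
    0xFFFFFFFFFFFFFFFF,
    0xFFFFFFFFFFFFFFFE,  # 1 bit away from the third hash
    0x0F0F0F0F0F0F0F0F,  # 32 bits away from both groups
]
print(deduplicate(hashes))  # [0, 2, 4]
```

The returned indices can then be used to select the corresponding images, which preserves the original order of the dataset.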

Image Deduplication Usage Tips:

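  • Adjust the similarity_threshold based on the diversity of your dataset. In a highly varied dataset, distinct images tend to score far apart, so a lower threshold can catch looser near-duplicates with less risk of removing distinct images. In a dataset of visually similar images, keep the threshold high.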
  • Use this node as a preprocessing step before training machine learning models to ensure that your dataset is free from redundant images, which can skew model performance.
  • Regularly run the deduplication process on updated datasets to maintain their quality and relevance.
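
To choose a threshold, it can help to see how many images would survive at several candidate values before committing to one. The sketch below does this on precomputed hashes. It assumes a bit-fraction similarity and a greedy keep-the-first strategy, and `count_unique` and `sweep_thresholds` are hypothetical helper names.

```python
def count_unique(hashes, threshold, num_bits=64):
    """Count hashes that survive a greedy keep-the-first deduplication."""
    kept = []
    for candidate in hashes:
        if all(1.0 - bin(candidate ^ k).count("1") / num_bits < threshold
               for k in kept):
            kept.append(candidate)
    return len(kept)


def sweep_thresholds(hashes, thresholds, num_bits=64):
    """Map each candidate threshold to the number of surviving hashes."""
    return {t: count_unique(hashes, t, num_bits) for t in thresholds}


# Hashes with 0, 1, 4, 8 and 64 bits set, respectively.
sample = [0b0, 0b1, 0b1111, 0xFF, 0xFFFFFFFFFFFFFFFF]
print(sweep_thresholds(sample, [1.0, 0.95, 0.9, 0.85]))
# {1.0: 5, 0.95: 4, 0.9: 3, 0.85: 2}
```

A sharp drop in the surviving count between two adjacent thresholds suggests that the lower value has started to merge images that may be genuinely different, so it is worth inspecting what was removed at that point.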

Image Deduplication Common Errors and Solutions:

"Image list is empty"

  • Explanation: This error occurs when the input list of images is empty, meaning there are no images to process for deduplication.
  • Solution: Ensure that you provide a non-empty list of images to the node. Check the data loading process to confirm that images are being correctly loaded into the list.

"Invalid similarity threshold"

  • Explanation: This error arises when the similarity_threshold is set outside the valid range of 0.0 to 1.0.
  • Solution: Verify that the similarity_threshold is within the specified range. Adjust the value to be between 0.0 and 1.0 to ensure proper functioning of the node.
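
Both conditions can be checked before the images reach the node, for example in a script that prepares the dataset. The sketch below is a hypothetical pre-check: `validate_inputs` is not part of ComfyUI, and it simply reuses the two error messages listed above.

```python
def validate_inputs(images, similarity_threshold):
    """Raise ValueError for the two failure cases described above.

    Hypothetical helper for illustration; not the node's own validation.
    """
    if len(images) == 0:
        raise ValueError("Image list is empty")
    if not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError("Invalid similarity threshold")


# Placeholder items stand in for real images in this example.
validate_inputs(["image_a", "image_b"], 0.95)  # passes silently

try:
    validate_inputs(["image_a"], 1.5)
except ValueError as error:
    print(error)  # Invalid similarity threshold
```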
