SimilarImageLeakage

class SimilarImageLeakage

Check for images in the training set that are similar to images in the test set.

Parameters
n_top_show: int, default: 10

Number of images to show, sorted by the similarity score between them.

hash_size: int, default: 8

Size of the hashed image. The algorithm hashes each image to a hash_size*hash_size binary image. Increasing this value improves the accuracy of the algorithm, but also increases its time and memory requirements.

similarity_threshold: float, default: 0.1

Similarity threshold in the range (0,1). The similarity score is the ratio of pixels that differ between the two hashed images. If the similarity score is below the threshold, the images are considered similar. Note: The threshold is scaled so that setting it to 1 detects as similar all images that differ from each other in up to half their pixels. Random images differ from each other by half their pixels on average, so at a value of 1 they are detected as similar about half the time. For example, with an 8x8 hash, setting the threshold to 1 means that all images with up to 32 differing pixels are considered similar.
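The arithmetic in the note above can be made concrete with a short sketch. The helper name `max_differing_pixels` is hypothetical and not part of the library. How the check rounds a fractional limit is not stated here, so the sketch returns the unrounded value.

```python
def max_differing_pixels(hash_size: int, similarity_threshold: float) -> float:
    """Largest number of differing hash pixels still counted as similar.

    Hypothetical helper: a threshold of 1 maps to half of the
    hash_size * hash_size pixels, as described in the note above.
    """
    total_pixels = hash_size * hash_size
    return similarity_threshold * total_pixels / 2


# 8x8 hash, threshold 1: up to 32 of the 64 hash pixels may differ.
print(max_differing_pixels(8, 1.0))
# Default settings (hash_size=8, similarity_threshold=0.1): unrounded limit.
print(max_differing_pixels(8, 0.1))
```

Under this reading, the default settings tolerate only a handful of differing hash pixels, so only near-duplicate images would be flagged.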

__init__(n_top_show: int = 10, hash_size: int = 8, similarity_threshold: float = 0.1, **kwargs)
__new__(*args, **kwargs)

Methods

SimilarImageLeakage.add_condition(name, ...)

Add new condition function to the check.

SimilarImageLeakage.add_condition_similar_images_less_or_equal([...])

Add condition - number of similar images is less than or equal to the threshold.

SimilarImageLeakage.clean_conditions()

Remove all conditions from this check instance.

SimilarImageLeakage.compute(context)

Find similar images by comparing image hashes between train and test.

SimilarImageLeakage.conditions_decision(result)

Run conditions on given result.

SimilarImageLeakage.config([...])

Return check configuration (conditions' configuration not yet supported).

SimilarImageLeakage.from_config(conf[, ...])

Return check object from a CheckConfig object.

SimilarImageLeakage.from_json(conf[, ...])

Deserialize check instance from JSON string.

SimilarImageLeakage.initialize_run(context)

Initialize the run by initializing the lists of image hashes.

SimilarImageLeakage.metadata([with_doc_link])

Return check metadata.

SimilarImageLeakage.name()

Name of class in split camel case.

SimilarImageLeakage.params([show_defaults])

Return parameters to show when printing the check.

SimilarImageLeakage.remove_condition(index)

Remove given condition by index.

SimilarImageLeakage.run(train_dataset, ...)

Run check.

SimilarImageLeakage.to_json([indent, ...])

Serialize check instance to JSON string.

SimilarImageLeakage.update(context, batch, ...)

Calculate image hashes for train and test.

Examples
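To illustrate the idea behind `update` and `compute`, the following minimal sketch hashes small grayscale images and flags train/test pairs whose hashes differ in few pixels. It is not the library's implementation. This page does not specify which hash function the check uses, so a simple average hash stands in for it. Images are plain nested lists whose sides are assumed to be multiples of `hash_size`, and the names `average_hash`, `hamming_distance`, and `find_similar_pairs` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def average_hash(image: Image, hash_size: int = 8) -> List[int]:
    """Reduce the image to hash_size x hash_size blocks and threshold at the mean.

    Assumes the image height and width are multiples of hash_size.
    """
    block_h = len(image) // hash_size
    block_w = len(image[0]) // hash_size
    block_means = []
    for by in range(hash_size):
        for bx in range(hash_size):
            block = [
                image[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            block_means.append(sum(block) / len(block))
    overall_mean = sum(block_means) / len(block_means)
    # One bit per block: 1 if the block is brighter than the average block.
    return [1 if mean > overall_mean else 0 for mean in block_means]


def hamming_distance(hash_a: List[int], hash_b: List[int]) -> int:
    """Count the positions where the two binary hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def find_similar_pairs(
    train: List[Image],
    test: List[Image],
    hash_size: int = 8,
    similarity_threshold: float = 0.1,
) -> List[Tuple[int, int, int]]:
    """Return (train_index, test_index, distance) for pairs within the limit."""
    # A threshold of 1 corresponds to half of the hash pixels differing.
    limit = similarity_threshold * hash_size * hash_size / 2
    train_hashes = [average_hash(img, hash_size) for img in train]
    test_hashes = [average_hash(img, hash_size) for img in test]
    pairs = []
    for i, train_hash in enumerate(train_hashes):
        for j, test_hash in enumerate(test_hashes):
            distance = hamming_distance(train_hash, test_hash)
            if distance <= limit:
                pairs.append((i, j, distance))
    # Most similar pairs first.
    return sorted(pairs, key=lambda pair: pair[2])


# Toy 8x8 images: a horizontal gradient, a slightly brighter copy, and a mirrored copy.
gradient = [[float(x) for x in range(8)] for _ in range(8)]
brighter = [[value + 0.5 for value in row] for row in gradient]
mirrored = [list(reversed(row)) for row in gradient]

# The brighter copy hashes identically to the gradient; the mirrored copy does not.
print(find_similar_pairs([gradient], [brighter, mirrored]))  # prints [(0, 0, 0)]
```

The nested loop compares every train hash against every test hash. That keeps the sketch easy to follow, but the cost grows with the product of the two dataset sizes.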