Classification Model Validation Tutorial#

In this tutorial, you will learn how to validate your classification model using deepchecks test suites. You can read more about the different checks and suites for computer vision use cases in the examples section.

A classification model is typically used to classify an image into one of a number of classes. Although there are multi-label use cases, in which the model classifies an image into multiple classes, most use cases require the model to assign each image a single class. Currently, deepchecks supports only single-label classification (either binary or multi-class).
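Concretely, a single-label prediction is just the argmax over the per-class scores. A minimal numpy sketch (the logits below are made up for illustration; they are not produced by the tutorial's model):

```python
import numpy as np

def softmax(logits):
    # Stable softmax: subtract the row max before exponentiating
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Made-up logits for a batch of 3 images over 2 classes (0=ants, 1=bees)
logits = np.array([[2.0, -1.0],
                   [0.5, 1.5],
                   [3.0, 3.1]])
probs = softmax(logits)          # each row sums to 1
preds = probs.argmax(axis=1)     # one label per image
print(preds)  # [0 1 1]
```

This probabilities-then-argmax convention is also the prediction format deepchecks expects from `infer_on_batch`, as we will see below.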

Defining the data and model#

import os
import urllib.request
import zipfile

import albumentations as A
import cv2
import matplotlib.pyplot as plt
import numpy as np
import PIL.Image
import torch
import torchvision
from albumentations.pytorch import ToTensorV2
from torch import nn
from torchvision import datasets, models, transforms
from torchvision.datasets import ImageFolder

import deepchecks
from deepchecks.vision.classification_data import ClassificationData

Downloading the dataset#

The data is available for download from the PyTorch website. We will download and extract it to the current directory.

url = 'https://download.pytorch.org/tutorial/hymenoptera_data.zip'
urllib.request.urlretrieve(url, 'hymenoptera_data.zip')

with zipfile.ZipFile('hymenoptera_data.zip', 'r') as zip_ref:
    zip_ref.extractall('.')

Load Data#

We will use the torchvision and torch.utils.data packages to load the data. The model we are building will learn to classify ants and bees. We have about 120 training images each for ants and bees, and 75 validation images for each class. This dataset is a very small subset of ImageNet.

class AntsBeesDataset(ImageFolder):
    """ImageFolder subclass that yields OpenCV-style (numpy) images,
    making the dataset compatible with albumentations transforms.
    """

    def __getitem__(self, index: int):
        """Override __getitem__ to be compatible with albumentations.

        Args:
            index (int): Index

        Returns:
            tuple: (sample, target) where target is class_index of the target class.
        """
        path, target = self.samples[index]
        sample = self.loader(path)
        sample = self.get_cv2_image(sample)
        if self.transforms is not None:
            transformed = self.transforms(image=sample, target=target)
            sample, target = transformed["image"], transformed["target"]
        else:
            if self.transform is not None:
                sample = self.transform(image=sample)['image']
            if self.target_transform is not None:
                target = self.target_transform(target)

        return sample, target

    def get_cv2_image(self, image):
        if isinstance(image, PIL.Image.Image):
            image_np = np.array(image).astype('uint8')
            return image_np
        elif isinstance(image, np.ndarray):
            return image
        else:
            raise RuntimeError("Only PIL.Image and CV2 loaders currently supported!")
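The get_cv2_image helper converts PIL images into the uint8 numpy arrays of shape (height, width, channels) that albumentations expects. A quick standalone sketch of that conversion (the solid-color image here is synthetic, for illustration only):

```python
import numpy as np
import PIL.Image

def get_cv2_image(image):
    # Mirrors the dataset helper above: PIL -> uint8 numpy array; numpy passes through
    if isinstance(image, PIL.Image.Image):
        return np.array(image).astype('uint8')
    elif isinstance(image, np.ndarray):
        return image
    raise RuntimeError("Only PIL.Image and CV2 loaders currently supported!")

pil_img = PIL.Image.new('RGB', (32, 24), color=(255, 0, 0))  # width=32, height=24
arr = get_cv2_image(pil_img)
print(arr.shape, arr.dtype)  # (24, 32, 3) uint8 -- numpy is (height, width, channels)
```

Note that PIL sizes are (width, height) while the resulting array is (height, width, channels).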


data_dir = 'hymenoptera_data'
# Just normalization for validation
data_transforms = A.Compose([
    A.Resize(height=256, width=256),
    A.CenterCrop(height=224, width=224),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])
train_dataset = AntsBeesDataset(root=os.path.join(data_dir,'train'))
train_dataset.transforms = data_transforms

val_dataset = AntsBeesDataset(root=os.path.join(data_dir,'val'))
val_dataset.transforms = data_transforms

dataloaders = {
    'train':torch.utils.data.DataLoader(train_dataset, batch_size=4,
                                                shuffle=True),
    'val': torch.utils.data.DataLoader(val_dataset, batch_size=4,
                                                shuffle=True)
}

class_names = ['ants', 'bees']

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Visualize a Few Images#

Let's visualize a few training images to understand the transformations applied to the data.

def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

imshow(out, title=[class_names[x] for x in classes])
Out:

['bees', 'ants', 'bees', 'bees']

(Figure "Ants and Bees": a grid of the four batch images, titled with their labels.)

Downloading a pre-trained model#

Now, we will download a pre-trained model from torchvision that was trained on the ImageNet dataset.

model = torchvision.models.resnet18(pretrained=True)
num_ftrs = model.fc.in_features
# We have only 2 classes
model.fc = nn.Linear(num_ftrs, 2)
model = model.to(device)
_ = model.eval()

Out:

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /home/runner/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth

100%|##########| 44.7M/44.7M [00:01<00:00, 31.0MB/s]
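As a quick sanity check on the replaced head: resnet18's final fc layer takes 512 input features, so the new layer maps a batch of pooled feature vectors to one score per class. A standalone sketch (using a bare nn.Linear with random inputs rather than the full model, to avoid the weight download):

```python
import torch
from torch import nn

# resnet18's penultimate output has 512 features (num_ftrs above),
# so the replacement head maps 512 -> 2 classes.
num_ftrs = 512
head = nn.Linear(num_ftrs, 2)
features = torch.randn(4, num_ftrs)  # a batch of 4 pooled feature vectors
out = head(features)
print(out.shape)  # torch.Size([4, 2]) -- one score per class per image
```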

Validating the Model with Deepchecks#

Now that we have the training data, the validation data, and the model, we can validate the model with deepchecks test suites.

Visualize the data loader and the model outputs#

First we’ll make sure we are familiar with the data loader and the model outputs.

batch = next(iter(dataloaders['train']))

print("Batch type is: ", type(batch))
print("First element is: ", type(batch[0]), "with len of ", len(batch[0]))
print("Example output of an image shape from the dataloader ", batch[0][0].shape)
print("Image values", batch[0][0])
print("-"*80)

print("Second element is: ", type(batch[1]), "with len of ", len(batch[1]))
print("Example output of a label shape from the dataloader ", batch[1][0].shape)
print("Label values", batch[1][0])

Out:

Batch type is:  <class 'list'>
First element is:  <class 'torch.Tensor'> with len of  4
Example output of an image shape from the dataloader  torch.Size([3, 224, 224])
Image values tensor([[[ 0.07406,  0.02269,  0.00557,  ...,  0.65631,  0.77618,  0.86180],
         [ 0.02269, -0.02868, -0.06293,  ...,  0.82755,  0.75905,  0.82755],
         [ 0.03982,  0.07406, -0.04581,  ...,  0.72481,  0.79330,  0.86180],
         ...,
         [-0.49105, -0.38830,  0.15969,  ...,  0.63918,  0.09119,  0.10831],
         [-0.31980,  0.00557, -0.42255,  ...,  0.21106, -0.49105,  0.86180],
         [-0.54243, -0.18281, -0.18281,  ...,  0.46793, -0.66230,  0.89605]],

        [[ 0.22269,  0.22269,  0.25770,  ...,  0.92297,  0.92297,  1.01050],
         [ 0.18768,  0.11765,  0.15266,  ...,  0.94048,  0.95798,  1.01050],
         [ 0.15266,  0.15266,  0.18768,  ...,  0.88796,  0.94048,  1.02801],
         ...,
         [-0.32003, -0.17997,  0.36275,  ...,  0.87045,  0.32773,  0.27521],
         [ 0.01261,  0.22269, -0.39006,  ...,  0.45028, -0.44258,  0.90546],
         [-0.49510, -0.07493, -0.02241,  ...,  0.66036, -0.51261,  1.08053]],

        [[ 0.46135,  0.51364,  0.56593,  ...,  1.28052,  1.26309,  1.36767],
         [ 0.51364,  0.49621,  0.51364,  ...,  1.24566,  1.29795,  1.35024],
         [ 0.44392,  0.49621,  0.56593,  ...,  1.19338,  1.28052,  1.36767],
         ...,
         [ 0.07791,  0.46135,  0.80993,  ...,  1.24566,  0.86222,  0.72279],
         [ 0.42649,  0.26963, -0.42754,  ...,  0.74022, -0.09638,  0.84479],
         [-0.06153,  0.30449,  0.09534,  ...,  1.03651, -0.16610,  1.24566]]])
--------------------------------------------------------------------------------
Second element is:  <class 'torch.Tensor'> with len of  4
Example output of a label shape from the dataloader  torch.Size([])
Label values tensor(0)

Implementing the ClassificationData class#

The first step is to implement a class that enables deepchecks to interact with your model and data. The appropriate class to implement should be selected according to your model's task type. In this tutorial, we will implement the classification task type by implementing a class that inherits from the deepchecks.vision.classification_data.ClassificationData class.

# The goal of this class is to make sure the outputs of the model and of the dataloader are in the correct format.
# To learn more about the expected format please visit the API reference for the
# :class:`deepchecks.vision.classification_data.ClassificationData` class.

class AntsBeesData(ClassificationData):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def batch_to_images(self, batch):
        """
        Convert a batch of data to images in the expected format. The expected format is an iterable of cv2 images,
        where each image is a numpy array of shape (height, width, channels). The numbers in the array should be in the
        range [0, 255]
        """
        inp = batch[0].detach().numpy().transpose((0, 2, 3, 1))
        mean = [0.485, 0.456, 0.406]
        std = [0.229, 0.224, 0.225]
        inp = std * inp + mean
        inp = np.clip(inp, 0, 1)
        return inp*255

    def batch_to_labels(self, batch):
        """
        Convert a batch of data to labels in the expected format. The expected format is a tensor of shape (N,),
        where N is the number of samples. Each element is an integer representing the class index.
        """
        return batch[1]

    def infer_on_batch(self, batch, model, device):
        """
        Returns the predictions for a batch of data. The expected format is a tensor of shape (N, n_classes),
        where N is the number of samples. Each element is an array of length n_classes that represent the probability of
        each class.
        """
        logits = model.to(device)(batch[0].to(device))
        return nn.Softmax(dim=1)(logits)
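The denormalization in batch_to_images simply inverts the Normalize transform and rescales to [0, 255]. A small numpy sketch of that round trip (random data standing in for a real batch):

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

rng = np.random.default_rng(0)
original = rng.random((4, 8, 8, 3))        # fake NHWC batch with values in [0, 1]
normalized = (original - mean) / std       # what Normalize produces

# The inverse transform used in batch_to_images
denorm = np.clip(std * normalized + mean, 0, 1)
images = denorm * 255                      # deepchecks expects values in [0, 255]

print(np.allclose(denorm, original))  # True
```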

After defining the task class, we can validate it by running the following code:

LABEL_MAP = {
    0: 'ants',
    1: 'bees'
}
training_data = AntsBeesData(data_loader=dataloaders["train"], label_map=LABEL_MAP)
val_data = AntsBeesData(data_loader=dataloaders["val"], label_map=LABEL_MAP)

training_data.validate_format(model)
val_data.validate_format(model)

Out:

Deepchecks will try to validate the extractors given...
Structure validation
--------------------
Label formatter: Pass!
Prediction formatter: Pass!
Image formatter: Pass!

Content validation
------------------
For validating the content within the structure you have to manually observe the classes, image, label and prediction.
Examples of classes observed in the batch's labels: [[1], [1], [0], [1]]
Visual images & label & prediction: should open in a new window
*******************************************************************************
This machine does not support GUI
The formatted image was saved in:
/home/runner/work/deepchecks/deepchecks/docs/source/user-guide/vision/tutorials/deepchecks_formatted_image (2).jpg
Visual example of an image. Label class 1 Prediction class 1
validate_extractors can be set to skip the image saving or change the save path
*******************************************************************************
Deepchecks will try to validate the extractors given...
Structure validation
--------------------
Label formatter: Pass!
Prediction formatter: Pass!
Image formatter: Pass!

Content validation
------------------
For validating the content within the structure you have to manually observe the classes, image, label and prediction.
Examples of classes observed in the batch's labels: [[0], [1], [0], [1]]
Visual images & label & prediction: should open in a new window
*******************************************************************************
This machine does not support GUI
The formatted image was saved in:
/home/runner/work/deepchecks/deepchecks/docs/source/user-guide/vision/tutorials/deepchecks_formatted_image (3).jpg
Visual example of an image. Label class 0 Prediction class 1
validate_extractors can be set to skip the image saving or change the save path
*******************************************************************************


Running Deepchecks’ full suite on our data and model!#

Now that we have defined the task class, we can validate the model with the full deepchecks suite. This can be done with just a few lines of code:

from deepchecks.vision.suites import full_suite

suite = full_suite()
result = suite.run(training_data, val_data, model, device=device)

Out:

Validating Input: 100%|#| 1/1 [00:00<00:00,  4.66 /s]
Ingesting Batches - Train Dataset: 61/61 Batch
Computing Single Dataset Checks - Train Dataset: 6/6 Check (Mean Average Precision Report, Mean Average Recall Report, Confusion Matrix Report, Image Segment Performance, Image Property Outliers, Label Property Outliers)
Ingesting Batches - Test Dataset: 39/39 Batch
Computing Single Dataset Checks - Test Dataset: 6/6 Check
Computing Checks: 11/11 Check (Class Performance, Simple Model Comparison, Model Error Analysis, Similar Image Leakage, Heatmap Comparison, Train Test Label Drift, Train Test Prediction Drift, Image Property Drift, Image Dataset Drift, Simple Feature Contribution, New Labels)
Calculating permutation feature importance. Expected to finish in 1 seconds

Observing the results#

The results can be saved as an HTML file with the following code:

result.save_as_html('output.html')

Out:

'output (1).html'

Or, if working inside a notebook, the output can be displayed directly by simply printing the result object:

result

Full Suite

The suite is composed of various checks, such as Image Dataset Drift, Similar Image Leakage, and Image Property Drift.
Each check may contain conditions (which will result in pass / fail / warning / error) as well as other outputs such as plots or tables.
Suites, checks and conditions can all be modified. Read more about custom suites.


Conditions Summary

Failed conditions:
- Simple Model Comparison: Model performance gain over simple model is not less than 10%. Found metrics with gain below threshold: {'F1': {0: '-17.38%'}}
- Image Segment Performance - Test Dataset: No segment with ratio between score to mean less than 80%. Properties with failed segments: Brightness: {'Range': '[0.51, 0.57)', 'Metric': 'Precision', 'Ratio': 0.76}
- Similar Image Leakage: Number of similar images between train and test is not greater than 0. Number of similar images between train and test datasets: 1

Passed conditions:
- Class Performance: Train-Test scores relative degradation is not greater than 0.1
- Train Test Prediction Drift: PSI <= 0.15 and Earth Mover's Distance <= 0.075 for prediction drift
- Image Property Drift: Earth Mover's Distance <= 0.1 for image properties drift
- Simple Feature Contribution: Train-Test properties' Predictive Power Score difference is not greater than 0.2
- New Labels: Percentage of new labels in the test set not above 0.5%
- Image Segment Performance - Train Dataset: No segment with ratio between score to mean less than 80%
- Train Test Label Drift: PSI <= 0.15 and Earth Mover's Distance <= 0.075 for label drift

Check With Conditions Output

Class Performance

Summarize given metrics on a dataset and model.

Conditions Summary
Status | Condition | More Info
Pass | Train-Test scores relative degradation is not greater than 0.1 |
Additional Outputs


Train Test Prediction Drift

Calculate prediction drift between train dataset and test dataset, using statistical measures.

Conditions Summary
Status | Condition | More Info
Pass | PSI <= 0.15 and Earth Mover's Distance <= 0.075 for prediction drift |
Additional Outputs
The Drift score is a measure of the difference between two distributions. In this check, drift is measured for the distribution of the following prediction properties: ['Samples Per Class'].
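The PSI threshold used in this condition can be illustrated with a short sketch. The function below is a textbook Population Stability Index over two discrete distributions, and the class frequencies are invented for illustration; this is not deepchecks' internal implementation.

```python
import math

def psi(expected, actual):
    """Population Stability Index between two discrete distributions.

    Both arguments are lists of category proportions summing to 1.
    This is the standard textbook formula, not deepchecks' exact code.
    """
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

# Toy class balance for a binary ants/bees split (illustrative numbers).
train_dist = [0.55, 0.45]
test_dist = [0.60, 0.40]

score = psi(train_dist, test_dist)
print(f"PSI = {score:.4f}")  # a small value, well under the 0.15 threshold
```

A PSI near zero means the two class distributions are nearly identical, which is why the condition above passes for this dataset.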


Image Property Drift

Calculate drift between train dataset and test dataset per image property, using statistical measures.

Conditions Summary
Status | Condition | More Info
Pass | Earth Mover's Distance <= 0.1 for image properties drift |
Additional Outputs
The Drift score is a measure of the difference between two distributions. In this check, drift is measured for the distribution of the following image properties: ['Area', 'Aspect Ratio', 'Brightness', 'Mean Blue Relative Intensity', 'Mean Green Relative Intensity', 'Mean Red Relative Intensity', 'RMS Contrast'].
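For continuous properties, the Earth Mover's Distance can be sketched in its simplified 1-D form: the sum of absolute differences between the two cumulative distributions over shared histogram bins. The bin frequencies below are invented, and deepchecks additionally normalizes the score by the value range, so this is only an illustration of the idea.

```python
def emd_1d(p, q):
    """Simplified 1-D Earth Mover's Distance between two histograms
    defined over the same bins: sum of |CDF_p - CDF_q| per bin."""
    total, cum = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum += pi - qi
        total += abs(cum)
    return total

# Toy brightness histograms (illustrative bin frequencies, not real data).
train_hist = [0.2, 0.3, 0.5]
test_hist = [0.3, 0.3, 0.4]
drift = emd_1d(train_hist, test_hist)
```

Here `drift` comes out to 0.2: the more "mass" that has to move between bins to turn one histogram into the other, the larger the score.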


Simple Feature Contribution

Return the Predictive Power Score of image properties, in order to estimate their ability to predict the label.

Conditions Summary
Status | Condition | More Info
Pass | Train-Test properties' Predictive Power Score difference is not greater than 0.2 |
Additional Outputs
The Predictive Power Score (PPS) is used to estimate the ability of an image property (such as brightness) to predict the label by itself. (Read more about Predictive Power Score)
In the graph above, we should suspect problems in our data if:
1. Train dataset PPS values are high:
A high PPS (close to 1) can mean that there's a bias in the dataset, as a single property can predict the label successfully using simple classic ML algorithms.
2. Large difference between train and test PPS (train PPS is larger):
An even stronger indication of dataset bias: an image property that was predictive in train but not in test suggests a bias in train that does not carry over to new data.
3. Large difference between test and train PPS (test PPS is larger):
An anomalous value that could indicate drift in the test dataset, causing a coincidental correlation with the target label.
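The PPS normalization behind this check can be sketched as follows. This is a generic PPS-style formula (how much of the gap between a naive baseline and a perfect score the property closes); the function name and all accuracy numbers are invented for illustration and are not the ppscore library's or deepchecks' exact computation.

```python
def pps_like(model_score, naive_score):
    """PPS-style score: the share of the baseline-to-perfect gap closed
    by a model that predicts the label from a single property,
    clipped to the range [0, 1]."""
    if naive_score >= 1.0:
        return 0.0
    return max(0.0, (model_score - naive_score) / (1.0 - naive_score))

# Hypothetical accuracies of predicting the label from brightness alone,
# against a naive majority-class baseline of 0.5.
train_pps = pps_like(model_score=0.80, naive_score=0.50)  # 0.6
test_pps = pps_like(model_score=0.65, naive_score=0.50)   # 0.3
gap = train_pps - test_pps
```

With these toy numbers the train/test PPS gap is 0.3, which would exceed the 0.2 threshold and trigger scenario 2 above.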


Simple Model Comparison

Compare given model score to simple model score (according to given model type).

Conditions Summary
Status | Condition | More Info
Fail | Model performance gain over simple model is not less than 10% | Found metrics with gain below threshold: {'F1': {0: '-17.38%'}}
Additional Outputs
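The "gain" in this condition can be understood with a small sketch. One common formulation (assumed here, not verified against deepchecks' source) is the share of the baseline-to-perfect gap that the model closes; the F1 values below are invented and do not reproduce the tutorial's -17.38% figure.

```python
def performance_gain(model_score, simple_score, perfect_score=1.0):
    """Relative gain of the model over a simple baseline:
    the fraction of the baseline-to-perfect gap the model closes.
    Negative when the model underperforms the baseline."""
    return (model_score - simple_score) / (perfect_score - simple_score)

# Invented F1 values: the trained model scores below the simple baseline,
# producing a negative gain, as in the failed condition above.
gain = performance_gain(model_score=0.38, simple_score=0.46)
```

A negative gain, as seen here for the F1 metric on one class, means the trained model is doing worse than a trivial baseline on that class, which is a strong signal worth investigating.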


Image Segment Performance - Test Dataset

Segment the data by various properties of the image, and compare the performance of the segments.

Conditions Summary
Status | Condition | More Info
Fail | No segment with ratio between score to mean less than 80% | Properties with failed segments: Brightness: {'Range': '[0.51, 0.57)', 'Metric': 'Precision', 'Ratio': 0.76}
Additional Outputs
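The idea behind the failed condition can be sketched as comparing each segment's score to the mean over all segments. The segment names and precision values below are hypothetical, and deepchecks' actual segmentation and scoring logic is more involved.

```python
def weak_segments(segment_scores, threshold=0.8):
    """Return segments whose score falls below `threshold` of the mean
    segment score, mapped to their score-to-mean ratio."""
    mean_score = sum(segment_scores.values()) / len(segment_scores)
    return {
        name: score / mean_score
        for name, score in segment_scores.items()
        if score / mean_score < threshold
    }

# Hypothetical per-segment precision for three brightness ranges.
scores = {"[0.20, 0.51)": 0.93, "[0.51, 0.57)": 0.65, "[0.57, 0.74)": 0.94}
failed = weak_segments(scores)
```

With these toy numbers, only the middle brightness range falls below 80% of the mean score, mirroring the kind of failure reported above for the test dataset.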


Image Segment Performance - Train Dataset

Segment the data by various properties of the image, and compare the performance of the segments.

Conditions Summary
Status | Condition | More Info
Pass | No segment with ratio between score to mean less than 80% |
Additional Outputs


Similar Image Leakage

Check for images in training that are similar to images in test.

Conditions Summary
Status | Condition | More Info
Fail | Number of similar images between train and test is not greater than 0 | Number of similar images between train and test datasets: 1
Additional Outputs

Similar Images

Total number of test samples with similar images in train: 1

Samples

[Train / test image pair]
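The core idea of detecting near-duplicate images can be sketched with a tiny average-hash comparison. Here, small grids of grayscale values stand in for downscaled images; the function names and pixel values are invented, and deepchecks' actual similarity method differs.

```python
def average_hash(pixels):
    """Hash a small grayscale grid: True where a pixel >= the grid mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return tuple(v >= mean for v in flat)

def hamming(h1, h2):
    """Number of positions where two hashes differ."""
    return sum(a != b for a, b in zip(h1, h2))

# Two nearly identical 4x4 "images" and one with inverted structure (toy values).
img_a = [[10, 200, 30, 220], [15, 210, 25, 215],
         [12, 205, 35, 225], [11, 198, 28, 218]]
img_b = [[12, 198, 32, 221], [16, 208, 27, 214],
         [13, 204, 36, 224], [10, 199, 29, 217]]
img_c = [[200, 10, 220, 30], [210, 15, 215, 25],
         [205, 12, 225, 35], [198, 11, 218, 28]]

close = hamming(average_hash(img_a), average_hash(img_b))  # 0 bits differ
far = hamming(average_hash(img_a), average_hash(img_c))    # all 16 bits differ
```

A distance of zero between a train image and a test image, as with `img_a` and `img_b`, is the kind of signal that makes this check flag potential leakage.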


Train Test Label Drift

Calculate label drift between train dataset and test dataset, using statistical measures.

Conditions Summary
Status | Condition | More Info
Pass | PSI <= 0.15 and Earth Mover's Distance <= 0.075 for label drift |
Additional Outputs
The Drift score is a measure of the difference between two distributions. In this check, drift is measured for the distribution of the following label properties: ['Samples Per Class'].


Check Without Conditions Output

Image Property Outliers - Test Dataset

Find outlier images with respect to the given properties.

Additional Outputs

Property "Aspect Ratio"

No outliers found.

Property "Area"

No outliers found.

Property "Brightness"

Total number of outliers: 1
Non-outliers range: 0.2 to 0.74
Brightness: 0.19 — [outlier image]
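The non-outlier range reported above is typically derived from the interquartile range. The sketch below uses the standard 1.5×IQR rule with invented brightness values; deepchecks' exact method and multiplier may differ.

```python
import statistics

def non_outlier_range(values, scale=1.5):
    """Standard IQR rule: values outside [Q1 - scale*IQR, Q3 + scale*IQR]
    are treated as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - scale * iqr, q3 + scale * iqr

# Invented brightness values with one unusually dark image.
brightness = [0.40, 0.42, 0.44, 0.46, 0.48, 0.50, 0.52, 0.19]
lo, hi = non_outlier_range(brightness)
outliers = [v for v in brightness if v < lo or v > hi]
```

With these toy values, only the 0.19 sample falls below the computed lower bound, analogous to the single dark outlier flagged for Brightness above.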

Property "RMS Contrast"

No outliers found.

Property "Mean Red Relative Intensity"

Total number of outliers: 8
Non-outliers range: 0.22 to 0.55
Mean Red Relative Intensity: 0.15, 0.21, 0.61, 0.63, 0.66, 0.67, 0.7 — [outlier images]