Test Your Deepchecks Vision Data Class#

Data classes are used to transform the structure of your data into the structure deepchecks expects. To help ensure they work as intended, deepchecks provides a built-in helper function for validating them. This guide demonstrates, step by step, how to use this helper function to implement a data class for your own data.

Structure:

- Load data and model
- Create simple DetectionData object
- Running the extractors validation
- Understand validation results
- The end result

Load data and model#

In the first step, we load the DataLoader and our model:

from deepchecks.vision.datasets.detection.coco import load_dataset, load_model

data_loader = load_dataset(train=False, batch_size=1000, object_type='DataLoader')
model = load_model()
Downloading https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt to yolov5s.pt...

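Before writing the data class, it helps to look at what a raw batch actually contains. The short sketch below is not part of deepchecks; it simply pulls one batch from the DataLoader and prints the types and sizes of its elements, so we know exactly what the extractor functions will receive.

batch = next(iter(data_loader))  # one raw batch, exactly as the extractors will receive it
print(type(batch), len(batch))        # expected: a pair with images at index 0 and labels at index 1
print(type(batch[0]), len(batch[0]))  # per-image data (PIL images in this COCO loader)
print(type(batch[1]), len(batch[1]))  # per-image label tensors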

Create simple DetectionData object#

In the second step, since this is an object detection task, we subclass DetectionData and override its extraction functions with naive implementations. We know that our DataLoader's and model's outputs are not in the format deepchecks expects, so when we validate them on our data we will see in the results that the functions we overrode do not pass, and we will then implement correct versions.

from deepchecks.vision.detection_data import DetectionData
import torch


class CocoDetectionData(DetectionData):
    def batch_to_images(self, batch):
        return batch[0]

    def batch_to_labels(self, batch):
        return [torch.round(x) for x in batch[1]]

    def infer_on_batch(self, batch, model, device):
        return model.to(device)(batch[0])

Running the extractors validation#

Now we will import the validation function and run the extractors on our data. The function prints the validation results for us. At the end, once all of your extractors are valid, the output should look like the one shown in The end result section below.

from deepchecks.vision.utils.validation import validate_extractors

validate_extractors(CocoDetectionData(data_loader), model)
Deepchecks will try to validate the extractors given...
Structure validation
--------------------
Label formatter: Pass!
Prediction formatter: Check requires detection predictions to be a sequence with an entry for each sample
Image formatter: Fail! The data inside the iterable must be a numpy array.

Content validation
------------------
For validating the content within the structure you have to manually observe the classes, image, label and prediction.
Examples of classes observed in the batch's labels: [[597, 253, 107, 26, 76, 247, 193, 149, 0, 545, 223, 218, 20, 174, 19, 512], [431, 181, 112, 215, 219, 563, 209, 546, 35, 224], [76, 73], [180, 2], [60, 0]]
Visual images & label & prediction: Unable to show due to invalid image formatter.

Understand validation results#

Looking at the result, we can see that it is separated into two parts.

The first part concerns the structure we expect to get. This validation is automatic, since it is purely technical and does not check content correctness. For example, in the validation above we see that the label extractor passes, meaning the labels are provided in the expected format.

The second part concerns the content, which cannot be validated automatically and requires your attention. Here you visually inspect the data output by the formatters and verify that it is correct. In the validation above we see a list of classes that doesn't make much sense: it contains class_ids with values ranging from 0 to 596, while the COCO dataset has only 80 classes.
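
To double-check this observation programmatically, we can pull a single batch and look at the values in the column deepchecks reads the class id from (the first column of each label tensor). This is a small ad-hoc sketch, not part of the deepchecks API:

data = CocoDetectionData(data_loader)
batch = next(iter(data_loader))
labels = data.batch_to_labels(batch)

# Deepchecks reads the class id from the first column of each label tensor.
class_column = torch.cat([t[:, 0] for t in labels if len(t) > 0])
print('class id range:', int(class_column.min()), '-', int(class_column.max()))
# With the naive extractor above, the first column still holds a bbox coordinate,
# so the printed maximum is far above 79 (the highest COCO class id).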

For the next step we’ll fix the label extractor and then validate again:

class CocoDetectionData(DetectionData):
    def batch_to_labels(self, batch):
        # Translate labels to deepchecks format.
        # Originally the label_id was at the last position of the tensor while Deepchecks expects it
        # to be at the first position.
        formatted_labels = []
        for tensor in batch[1]:
            tensor = torch.index_select(tensor, 1, torch.LongTensor([4, 0, 1, 2, 3])) if len(tensor) > 0 else tensor
            formatted_labels.append(tensor)
        return formatted_labels

    def batch_to_images(self, batch):
        return batch[0]

    def infer_on_batch(self, batch, model, device):
        return model.to(device)(batch[0])


validate_extractors(CocoDetectionData(data_loader), model)
Deepchecks will try to validate the extractors given...
Structure validation
--------------------
Label formatter: Pass!
Prediction formatter: Check requires detection predictions to be a sequence with an entry for each sample
Image formatter: Fail! The data inside the iterable must be a numpy array.

Content validation
------------------
For validating the content within the structure you have to manually observe the classes, image, label and prediction.
Examples of classes observed in the batch's labels: [[32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 34, 35, 0, 0, 0, 0], [32, 0, 0, 34, 35, 35, 0, 0, 0, 0], [71, 45], [15, 2], [38, 0]]
Visual images & label & prediction: Unable to show due to invalid image formatter.

Now we can see in the content section that our classes are indeed as we expect them to be: values between 0 and 79. Next, we can continue and fix the prediction extractor:

class CocoDetectionData(DetectionData):
    def infer_on_batch(self, batch, model, device):
        # Convert from yolo Detections object to a list (per image) of tensors of shape [N, 6]
        return_list = []
        predictions = model.to(device)(batch[0])
        for single_image_tensor in predictions.pred:
            return_list.append(single_image_tensor)
        return return_list

    # using the same label extractor
    def batch_to_labels(self, batch):
        formatted_labels = []
        for tensor in batch[1]:
            tensor = torch.index_select(tensor, 1, torch.LongTensor([4, 0, 1, 2, 3])) if len(tensor) > 0 else tensor
            formatted_labels.append(tensor)
        return formatted_labels

    def batch_to_images(self, batch):
        return batch[0]


validate_extractors(CocoDetectionData(data_loader), model)
Deepchecks will try to validate the extractors given...
Structure validation
--------------------
Label formatter: Pass!
Prediction formatter: Pass!
Image formatter: Fail! The data inside the iterable must be a numpy array.

Content validation
------------------
For validating the content within the structure you have to manually observe the classes, image, label and prediction.
Examples of classes observed in the batch's labels: [[32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 34, 35, 0, 0, 0, 0], [32, 0, 0, 34, 35, 35, 0, 0, 0, 0], [71, 45], [15, 2], [38, 0]]
Visual images & label & prediction: Unable to show due to invalid image formatter.

Our prediction formatter now also has a valid structure. But in order to really validate it, we also need a visual check, and for that we need the image extractor to work.

import numpy as np


class CocoDetectionData(DetectionData):
    def batch_to_images(self, batch):
        # Yolo works on PIL and ImageFormatter expects images as numpy arrays
        return [np.array(x) for x in batch[0]]

    # using the same prediction extractor
    def infer_on_batch(self, batch, model, device):
        # Convert from yolo Detections object to a list (per image) of tensors of shape [N, 6]
        return_list = []
        predictions = model.to(device)(batch[0])
        for single_image_tensor in predictions.pred:
            return_list.append(single_image_tensor)
        return return_list

    # using the same label extractor
    def batch_to_labels(self, batch):
        formatted_labels = []
        for tensor in batch[1]:
            tensor = torch.index_select(tensor, 1, torch.LongTensor([4, 0, 1, 2, 3])) if len(tensor) > 0 else tensor
            formatted_labels.append(tensor)
        return formatted_labels


validate_extractors(CocoDetectionData(data_loader), model)
Deepchecks will try to validate the extractors given...
Structure validation
--------------------
Label formatter: Pass!
Prediction formatter: Pass!
Image formatter: Pass!

Content validation
------------------
For validating the content within the structure you have to manually observe the classes, image, label and prediction.
Examples of classes observed in the batch's labels: [[32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 34, 35, 0, 0, 0, 0], [32, 0, 0, 34, 35, 35, 0, 0, 0, 0], [71, 45], [15, 2], [38, 0]]
Visual images & label & prediction: should open in a new window
*******************************************************************************
This machine does not support GUI
The formatted image was saved in:
/home/runner/work/deepchecks/deepchecks/docs/source/user-guide/vision/tutorials/deepchecks_formatted_image.jpg
Visual examples of an image with prediction and label data. Label is red, prediction is blue, and deepchecks loves you.
validate_extractors can be set to skip the image saving or change the save path
*******************************************************************************

Now that the image extractor is valid, the label and prediction are displayed visually. When we look at the label we see it is correct, but when we look at the bounding box predictions something seems broken.

We need to fix the prediction extractor so that predictions are returned in [x, y, w, h, confidence, class] format.

class CocoDetectionData(DetectionData):
    def infer_on_batch(self, batch, model, device):
        # Convert from yolo Detections object to List (per image) of Tensors of the shape [N, 6] with each row being
        # [x, y, w, h, confidence, class] for each bbox in the image.
        return_list = []
        predictions = model.to(device)(batch[0])

        # yolo Detections objects have List[torch.Tensor] xyxy output in .pred
        for single_image_tensor in predictions.pred:
            pred_modified = torch.clone(single_image_tensor)
            pred_modified[:, 2] = pred_modified[:, 2] - pred_modified[:, 0]  # w = x_right - x_left
            pred_modified[:, 3] = pred_modified[:, 3] - pred_modified[:, 1]  # h = y_bottom - y_top
            return_list.append(pred_modified)

        return return_list

    # using the same label extractor
    def batch_to_labels(self, batch):
        formatted_labels = []
        for tensor in batch[1]:
            tensor = torch.index_select(tensor, 1, torch.LongTensor([4, 0, 1, 2, 3])) if len(tensor) > 0 else tensor
            formatted_labels.append(tensor)
        return formatted_labels

    # using the same image extractor
    def batch_to_images(self, batch):
        return [np.array(x) for x in batch[0]]

The end result#

validate_extractors(CocoDetectionData(data_loader), model)
Deepchecks will try to validate the extractors given...
Structure validation
--------------------
Label formatter: Pass!
Prediction formatter: Pass!
Image formatter: Pass!

Content validation
------------------
For validating the content within the structure you have to manually observe the classes, image, label and prediction.
Examples of classes observed in the batch's labels: [[32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 34, 35, 0, 0, 0, 0], [32, 0, 0, 34, 35, 35, 0, 0, 0, 0], [71, 45], [15, 2], [38, 0]]
Visual images & label & prediction: should open in a new window
*******************************************************************************
This machine does not support GUI
The formatted image was saved in:
/home/runner/work/deepchecks/deepchecks/docs/source/user-guide/vision/tutorials/deepchecks_formatted_image (1).jpg
Visual examples of an image with prediction and label data. Label is red, prediction is blue, and deepchecks loves you.
validate_extractors can be set to skip the image saving or change the save path
*******************************************************************************
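
Once all of the extractors pass validation, the data class can be used with any deepchecks vision check or suite. As a possible next step, the sketch below runs the default full suite with our validated data class; it assumes the full_suite entry point from deepchecks.vision.suites and that a training split can be loaded the same way as the test split above.

from deepchecks.vision.suites import full_suite

train_loader = load_dataset(train=True, batch_size=64, object_type='DataLoader')

suite = full_suite()
result = suite.run(train_dataset=CocoDetectionData(train_loader),
                   test_dataset=CocoDetectionData(data_loader),
                   model=model)
result.save_as_html('full_suite_result.html')  # save the report; in a notebook the result can be displayed directly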

Total running time of the script: ( 1 minutes 24.248 seconds)
