The Object Detection Data Class#

The DetectionData class is a data class designed for object detection tasks. It is a subclass of the VisionData class and helps deepchecks load and interact with object detection data in a well-defined format for detection related checks.

For more info, please visit the API reference page: DetectionData

Accepted Image Format#

All checks in deepchecks require images in the same format. They use the batch_to_images() function in order to get the images in the correct format. For more info on the accepted formats, please visit the VisionData User Guide.
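In practice this means batch_to_images() should return an iterable of images, each an (H, W, C) numpy array with values in the [0, 255] range. As a minimal sketch, assuming the loader already yields un-normalized (N, C, H, W) tensors in that range:

# A minimal sketch, assuming batch[0] is an un-normalized (N, C, H, W)
# tensor with pixel values already in [0, 255]:
def batch_to_images(self, batch):
    return batch[0].detach().cpu().numpy().transpose((0, 2, 3, 1))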

Accepted Label Format#

Deepchecks’ checks use the batch_to_labels() function in order to get the labels in the correct format. The accepted label format is a list of length N containing tensors of shape (B, 5), where N is the number of samples within a batch, B is the number of bounding boxes in the sample, and each bounding box is represented by 5 values: (class_id, x_min, y_min, w, h).

x_min and y_min are the coordinates (in pixels) of the top left corner of the bounding box, w and h are the width and height of the bounding box (in pixels), and class_id is the class id of the label.

For example, for a sample with 2 bounding boxes, the label format may be: tensor([[1, 8.4, 50.2, 100, 100], [5, 26.4, 10.1, 20, 40]]).
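For a full batch, batch_to_labels() then returns one such tensor per sample. For example, a batch of two samples (values here are illustrative) might map to:

[
    torch.tensor([[1, 8.4, 50.2, 100, 100],    # first sample: two boxes
                  [5, 26.4, 10.1, 20, 40]]),
    torch.tensor([[3, 12.0, 5.5, 64, 128]]),   # second sample: one box
]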

Accepted Prediction Format#

Deepchecks’ checks use the infer_on_batch() function in order to get the predictions of the model in the correct format. The accepted prediction format is a list of length N containing tensors of shape (B, 6), where N is the number of images, B is the number of bounding boxes detected in the sample and each bounding box is represented by 6 values: [x_min, y_min, w, h, confidence, class_id].

x_min, y_min, w and h represent the bounding box location as above, confidence is the confidence score the model assigned to the bounding box, and class_id is the class id predicted by the model.

For example, for a sample with 2 bounding boxes, the prediction format may be: tensor([[8.4, 50.2, 100, 100, 0.9, 1], [26.4, 10.1, 20, 40, 0.8, 5]]).
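For a full batch, infer_on_batch() then returns one such tensor per image. For example, a batch of two images (values here are illustrative) might map to:

[
    torch.tensor([[8.4, 50.2, 100, 100, 0.9, 1],   # first image: two detections
                  [26.4, 10.1, 20, 40, 0.8, 5]]),
    torch.tensor([[12.0, 5.5, 64, 128, 0.6, 3]]),  # second image: one detection
]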

Example#

Assuming we have implemented a torch DataLoader whose underlying __getitem__ method returns a tuple of the form: (images, bboxes). images is a tensor of shape (N, C, H, W) in which the images' pixel values are normalized to the [0, 1] range based on the mean and std of the ImageNet dataset. bboxes is a tensor of shape (N, B, 5) in which each box is given in the format: (class_id, x_min, y_min, x_max, y_max). Additionally, we are using YOLO as the model.

from deepchecks.vision import DetectionData
import torch
import numpy as np

class MyDetectionTaskData(DetectionData):
    """A deepchecks data digestion class for object detection related checks."""

    def batch_to_images(self, batch):
        """Convert a batch of images to a list of PIL images.

        Parameters
        ----------
        batch : torch.Tensor
            The batch of images to convert.

        Returns
        -------
        list
            A list of PIL images.
        """

# Assuming batch[0] is a batch of (N, C, H, W) images, we convert it to (N, H, W, C).
        imgs = batch[0].detach().cpu().numpy().transpose((0, 2, 3, 1))

        # The images are normalized to [0, 1] range based on the mean and std of the ImageNet dataset, so we need to
        # convert them back to [0, 255] range.
        mean = [0.485, 0.456, 0.406]
        std = [0.229, 0.224, 0.225]
        imgs = std * imgs + mean
        imgs = np.clip(imgs, 0, 1)
        imgs *= 255

        return imgs

    def batch_to_labels(self, batch):
        """Convert a batch bounding boxes to the required format.

        Parameters
        ----------
        batch : tuple
            The batch of data, containing images and bounding boxes.

        Returns
        -------
        List
            A list of size N containing tensors of shape (B,5).
        """

# Each bbox in the labels arrives as (class_id, x_min, y_min, x_max, y_max);
        # convert it to (class_id, x_min, y_min, w, h).
        bboxes = []
        for bboxes_single_image in batch[1]:
            formatted_bboxes = [torch.cat((bbox[0:1], bbox[1:3], bbox[3:5] - bbox[1:3]), dim=0)
                                for bbox in bboxes_single_image]
            # Append an empty (0, 5) tensor for images without bounding boxes,
            # so the returned list stays aligned with the images in the batch.
            if len(formatted_bboxes) != 0:
                bboxes.append(torch.stack(formatted_bboxes))
            else:
                bboxes.append(torch.empty(0, 5))
        return bboxes

    def infer_on_batch(self, batch, model, device):
        """Get the predictions of the model on a batch of images.

        Parameters
        ----------
        batch : tuple
            The batch of data, containing images and bounding boxes.
        model : torch.nn.Module
            The model to use for inference.
        device : torch.device
            The device to use for inference.

        Returns
        -------
        List
            A list of size N containing tensors of shape (B,6).
        """

        return_list = []
predictions = model.to(device)(batch[0].to(device))

# YOLO Detections objects expose a List[torch.Tensor(B, 6)] via .pred, where each bbox is
        # (x_min, y_min, x_max, y_max, confidence, class_id).
        for single_image_tensor in predictions.pred:
            pred_modified = torch.clone(single_image_tensor)
            pred_modified[:, 2] = pred_modified[:, 2] - pred_modified[:, 0]
            pred_modified[:, 3] = pred_modified[:, 3] - pred_modified[:, 1]
            return_list.append(pred_modified)

        return return_list

# Now, in order to test the class, we can create an instance of it:
data = MyDetectionTaskData(your_dataloader)

# And validate the implementation:
data.validate_format(your_model)
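
# Once validate_format passes, the instance can be passed to detection checks and suites.
# A minimal sketch, assuming the built-in full_suite fits your use case (suite contents
# may vary between deepchecks versions):
from deepchecks.vision.suites import full_suite

result = full_suite().run(train_dataset=data, model=your_model)
result.save_as_html('report.html')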