.. _detection_data_class: =============================== The Object Detection Data Class =============================== The DetectionData is a :doc:`data class ` designed for object detection tasks. It is a subclass of the :class:`~deepchecks.vision.VisionData` class and is used to help deepchecks load and interact with object detection data using a well defined format. detection related checks. For more info, please visit the API reference page: :class:`~deepchecks.vision.DetectionData` Accepted Image Format --------------------- All checks in deepchecks require images in the same format. They use the :func:`~deepchecks.vision.VisionData.batch_to_images` function in order to get the images in the correct format. For more info on the accepted formats, please visit the :doc:`VisionData User Guide `. Accepted Label Format --------------------- Deepchecks' checks use the :func:`~deepchecks.vision.DetectionData.batch_to_labels` function in order to get the labels in the correct format. The accepted label format for is a a list of length N containing tensors of shape (B, 5), where N is the number of samples within a batch, B is the number of bounding boxes in the sample and each bounding box is represented by 5 values: ``(class_id, x_min, y_min, w, h)``. x_min and y_min are the coordinates (in pixels) of the **top left corner** of the bounding box, w and h are the width and height of the bounding box (in pixels) and class_id is the class id of the prediction. For example, for a sample with 2 bounding boxes, the label format may be: ``tensor([[1, 8.4, 50.2, 100, 100], [5, 26.4, 10.1, 20, 40]])``. Accepted Prediction Format -------------------------- Deepchecks' checks use the :func:`~deepchecks.vision.DetectionData.infer_on_batch` function in order to get the predictions of the model in the correct format. The accepted prediction format is a list of length N containing tensors of shape (B, 6), where N is the number of images, B is the number of bounding boxes detected in the sample and each bounding box is represented by 6 values: ``[x_min, y_min, w, h, confidence, class_id]``. x_min,y_min,w and h represent the bounding box location as above, confidence is the confidence score given by the model to bounding box and class_id is the class id predicted by the model. For example, for a sample with 2 bounding boxes, the prediction format may be: ``tensor([[8.4, 50.2, 100, 100, 0.9, 1], [26.4, 10.1, 20, 40, 0.8, 5]])``. Example -------- Assuming we have implemented a torch DataLoader whose underlying __getitem__ method returns a tuple of the form: ``(images, bboxes)``. ``image`` is a tensor of shape (N, C, H, W) in which the images pixel values are normalized to [0, 1] range based on the mean and std of the ImageNet dataset. ``bboxes`` is a tensor of shape (N, B, 5) in which each box arrives in the format: ``(class_id, x_min, y_min, x_max, y_max)``. Additionally, we are using Yolo as a model. .. code-block:: python from deepchecks.vision import DetectionData import torch.nn.functional as F import torch import numpy as np class MyDetectionTaskData(DetectionData) """A deepchecks data digestion class for object detection related checks.""" def batch_to_images(self, batch): """Convert a batch of images to a list of PIL images. Parameters ---------- batch : torch.Tensor The batch of images to convert. Returns ------- list A list of PIL images. """ # Assuming batch[0] is a batch of (N, C, H, W) images, we convert it to (N, H, W, C)/ imgs = batch[0].detach().numpy().transpose((0, 2, 3, 1)) # The images are normalized to [0, 1] range based on the mean and std of the ImageNet dataset, so we need to # convert them back to [0, 255] range. mean = [0.485, 0.456, 0.406] std = [0.229, 0.224, 0.225] imgs = std * imgs + mean imgs = np.clip(imgs, 0, 1) imgs *= 255 return imgs def batch_to_labels(self, batch): """Convert a batch bounding boxes to the required format. Parameters ---------- batch : tuple The batch of data, containing images and bounding boxes. Returns ------- List A list of size N containing tensors of shape (B,5). """ # each bbox in the labels is (class_id, x, y, x, y). convert to (class_id, x, y, w, h) bboxes = [] for bboxes_single_image in batch[1]: formatted_bboxes = [torch.cat((bbox[0], bbox[1:3], bbox[4:] - bbox[1:3]), dim=0) for bbox in bboxes_single_image] if len(formatted_bboxes) != 0: bboxes.append(torch.stack(formatted_bboxes)) return bboxes def infer_on_batch(self, batch, model, device): """Get the predictions of the model on a batch of images. Parameters ---------- batch : tuple The batch of data, containing images and bounding boxes. model : torch.nn.Module The model to use for inference. device : torch.device The device to use for inference. Returns ------- List A list of size N containing tensors of shape (B,6). """ return_list = [] predictions = model.to(device)(batch[0]) # yolo Detections objects have List[torch.Tensor(B,6)] output where each bbox is #(x_min, y_min, x_max, y_max, confidence, class_id). for single_image_tensor in predictions.pred: pred_modified = torch.clone(single_image_tensor) pred_modified[:, 2] = pred_modified[:, 2] - pred_modified[:, 0] pred_modified[:, 3] = pred_modified[:, 3] - pred_modified[:, 1] return_list.append(pred_modified) return return_list # Now, in order to test the class, we can create an instance of it: data = MyDetectionTaskData(your_dataloader) # And validate the implementation: data.validate_format(your_model)