The Object Detection Data Class#
The DetectionData class is a data class designed for object detection tasks.
It is a subclass of the VisionData class and is used to help deepchecks load and interact with object detection data in the well-defined format required by detection-related checks.
For more info, please visit the API reference page: DetectionData
Accepted Image Format#
All checks in deepchecks require images in the same format. They use the batch_to_images() function in order to get the images in the correct format. For more info on the accepted formats, please visit the VisionData User Guide.
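As a quick illustration, and assuming the accepted format is an iterable of per-image numpy arrays in (H, W, C) layout with pixel values in the [0, 255] range (the VisionData User Guide is the authoritative reference), a valid batch of images might look like this minimal sketch:
import numpy as np

# Hypothetical batch of two 64x64 RGB images: each image is an (H, W, C)
# array with pixel values in the [0, 255] range.
images = [
    np.random.randint(0, 256, size=(64, 64, 3)),
    np.random.randint(0, 256, size=(64, 64, 3)),
]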
Accepted Label Format#
Deepchecks’ checks use the batch_to_labels() function in order to get the labels in the correct format.
The accepted label format is a list of length N containing tensors of shape (B, 5), where N is the number of samples within a batch, B is the number of bounding boxes in the sample, and each bounding box is represented by 5 values: (class_id, x_min, y_min, w, h).
x_min and y_min are the coordinates (in pixels) of the top left corner of the bounding box, w and h are the width and height of the bounding box (in pixels), and class_id is the class id of the label.
For example, for a sample with 2 bounding boxes, the label format may be:
tensor([[1, 8.4, 50.2, 100, 100], [5, 26.4, 10.1, 20, 40]])
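To make the expected structure concrete, here is a minimal, hypothetical labels list for a batch of two samples, the first with two bounding boxes and the second with one:
import torch

labels = [
    # Sample 1: two boxes, each (class_id, x_min, y_min, w, h).
    torch.tensor([[1, 8.4, 50.2, 100, 100],
                  [5, 26.4, 10.1, 20, 40]]),
    # Sample 2: a single box.
    torch.tensor([[3, 12.0, 5.5, 30, 60]]),
]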
Accepted Prediction Format#
Deepchecks’ checks use the infer_on_batch() function in order to get the predictions of the model in the correct format.
The accepted prediction format is a list of length N containing tensors of shape (B, 6), where N is the number of images, B is the number of bounding boxes detected in the sample, and each bounding box is represented by 6 values: [x_min, y_min, w, h, confidence, class_id].
x_min, y_min, w and h represent the bounding box location as above, confidence is the confidence score the model assigned to the bounding box, and class_id is the class id predicted by the model.
For example, for a sample with 2 bounding boxes, the prediction format may be:
tensor([[8.4, 50.2, 100, 100, 0.9, 1], [26.4, 10.1, 20, 40, 0.8, 5]])
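Analogously, a minimal, hypothetical predictions list for a batch of two images might look like:
import torch

predictions = [
    # Image 1: two detections, each (x_min, y_min, w, h, confidence, class_id).
    torch.tensor([[8.4, 50.2, 100, 100, 0.9, 1],
                  [26.4, 10.1, 20, 40, 0.8, 5]]),
    # Image 2: a single detection.
    torch.tensor([[12.0, 5.5, 30, 60, 0.75, 3]]),
]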
Example#
Assuming we have implemented a torch DataLoader whose underlying __getitem__ method returns a tuple of the form (images, bboxes): images is a tensor of shape (N, C, H, W) in which the image pixel values are normalized to the [0, 1] range based on the mean and std of the ImageNet dataset, and bboxes is a tensor of shape (N, B, 5) in which each box arrives in the format (class_id, x_min, y_min, x_max, y_max). Additionally, we are using YOLO as a model, whose Detections output exposes the per-image predictions via its pred attribute.
from deepchecks.vision import DetectionData
import torch
import numpy as np


class MyDetectionTaskData(DetectionData):
    """A deepchecks data digestion class for object detection related checks."""

    def batch_to_images(self, batch):
        """Convert a batch of images to the accepted format.

        Parameters
        ----------
        batch : tuple
            The batch of data, containing images and bounding boxes.

        Returns
        -------
        np.ndarray
            The images as an (N, H, W, C) array with pixel values in the [0, 255] range.
        """
        # Assuming batch[0] is a batch of (N, C, H, W) images, we convert it to (N, H, W, C).
        imgs = batch[0].detach().cpu().numpy().transpose((0, 2, 3, 1))
        # The images are normalized to [0, 1] range based on the mean and std of the
        # ImageNet dataset, so we need to convert them back to [0, 255] range.
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        imgs = std * imgs + mean
        imgs = np.clip(imgs, 0, 1)
        imgs *= 255
        return imgs

    def batch_to_labels(self, batch):
        """Convert a batch of bounding boxes to the required format.

        Parameters
        ----------
        batch : tuple
            The batch of data, containing images and bounding boxes.

        Returns
        -------
        List
            A list of size N containing tensors of shape (B, 5).
        """
        # Each bbox in the labels arrives as (class_id, x_min, y_min, x_max, y_max);
        # convert it to (class_id, x_min, y_min, w, h).
        bboxes = []
        for bboxes_single_image in batch[1]:
            formatted_bboxes = [torch.cat((bbox[:1], bbox[1:3], bbox[3:] - bbox[1:3]), dim=0)
                                for bbox in bboxes_single_image]
            if len(formatted_bboxes) != 0:
                bboxes.append(torch.stack(formatted_bboxes))
            else:
                # Keep the list aligned with the number of samples, even when an
                # image has no bounding boxes.
                bboxes.append(torch.empty((0, 5)))
        return bboxes

    def infer_on_batch(self, batch, model, device):
        """Get the predictions of the model on a batch of images.

        Parameters
        ----------
        batch : tuple
            The batch of data, containing images and bounding boxes.
        model : torch.nn.Module
            The model to use for inference.
        device : torch.device
            The device to use for inference.

        Returns
        -------
        List
            A list of size N containing tensors of shape (B, 6).
        """
        return_list = []
        predictions = model.to(device)(batch[0])
        # YOLO Detections objects expose a List[torch.Tensor(B, 6)] via pred, where each
        # bbox is (x_min, y_min, x_max, y_max, confidence, class_id).
        for single_image_tensor in predictions.pred:
            pred_modified = torch.clone(single_image_tensor)
            # Convert (x_max, y_max) to (w, h).
            pred_modified[:, 2] = pred_modified[:, 2] - pred_modified[:, 0]
            pred_modified[:, 3] = pred_modified[:, 3] - pred_modified[:, 1]
            return_list.append(pred_modified)
        return return_list


# Now, in order to test the class, we can create an instance of it:
data = MyDetectionTaskData(your_dataloader)

# And validate the implementation:
data.validate_format(your_model)
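Once validate_format() passes, the instance can be handed to deepchecks’ vision checks and suites. As a hedged sketch, assuming the full_suite entry point of deepchecks.vision and hypothetical your_train_dataloader / your_test_dataloader objects wrapped the same way:
from deepchecks.vision.suites import full_suite

# Hypothetical: two DetectionData instances wrapping train and test DataLoaders.
train_data = MyDetectionTaskData(your_train_dataloader)
test_data = MyDetectionTaskData(your_test_dataloader)

suite = full_suite()
result = suite.run(train_data, test_data, your_model)
result.show()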