.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "user-guide/vision/auto_tutorials/plot_detection_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_user-guide_vision_auto_tutorials_plot_detection_tutorial.py:

.. _vision_detection_tutorial:

==========================
Object Detection Tutorial
==========================

In this tutorial, you will learn how to validate your **object detection model** using deepchecks test suites.
You can read more about the different checks and suites for computer vision use cases at the
:doc:`examples section `.

If you just want to see the output of this tutorial, jump to the :ref:`observing the results ` section.

An object detection task usually consists of two parts:

- Object Localization, where the model predicts the location of an object in the image,
- Object Classification, where the model predicts the class of the detected object.

The common output of an object detection model is a list of bounding boxes around the objects, and their classes.

.. code-block:: bash

    # Before we start, if you don't have deepchecks vision package installed yet, run:
    import sys
    !{sys.executable} -m pip install "deepchecks[vision]" --quiet --upgrade # --user

    # or install using pip from your python environment

.. GENERATED FROM PYTHON SOURCE LINES 32-49

Defining the data and model
===========================

.. note::
    In this tutorial, we use PyTorch to create the dataset and model. To see how this can be done using
    TensorFlow or other frameworks, please visit the :ref:`creating VisionData guide `.

Load Data
~~~~~~~~~

The model in this tutorial is used to detect tomatoes in images.
The model is trained on a dataset consisting of 895 images of tomatoes, with bounding box annotations provided
in PASCAL VOC format. All annotations belong to a single class: tomato.

.. note::
    The dataset is available at the following link:
    https://www.kaggle.com/andrewmvd/tomato-detection

    We thank the authors of the dataset for providing the dataset.

.. GENERATED FROM PYTHON SOURCE LINES 49-126

.. code-block:: default

    import os
    import numpy as np
    import torch
    from torch.utils.data import DataLoader, Dataset
    import albumentations as A
    from albumentations.pytorch import ToTensorV2
    from PIL import Image
    import xml.etree.ElementTree as ET
    import urllib.request
    import zipfile

    # Download and extract the dataset
    url = 'https://figshare.com/ndownloader/files/34488599'
    urllib.request.urlretrieve(url, 'tomato-detection.zip')

    with zipfile.ZipFile('tomato-detection.zip', 'r') as zip_ref:
        zip_ref.extractall('.')

    class TomatoDataset(Dataset):
        def __init__(self, root, transforms):
            self.root = root
            self.transforms = transforms
            self.images = list(sorted(os.listdir(os.path.join(root, 'images'))))
            self.annotations = list(sorted(os.listdir(os.path.join(root, 'annotations'))))

        def __getitem__(self, idx):
            img_path = os.path.join(self.root, "images", self.images[idx])
            ann_path = os.path.join(self.root, "annotations", self.annotations[idx])
            img = Image.open(img_path).convert("RGB")
            bboxes, labels = [], []
            # Parse the PASCAL VOC annotation file, skipping objects marked as difficult
            with open(ann_path, 'r') as f:
                root = ET.parse(f).getroot()

                for obj in root.iter('object'):
                    difficult = obj.find('difficult').text
                    if int(difficult) == 1:
                        continue
                    cls_id = 1
                    xmlbox = obj.find('bndbox')
                    b = [float(xmlbox.find('xmin').text), float(xmlbox.find('ymin').text),
                         float(xmlbox.find('xmax').text), float(xmlbox.find('ymax').text)]
                    bboxes.append(b)
                    labels.append(cls_id)

            bboxes = torch.as_tensor(np.array(bboxes), dtype=torch.float32)
            labels = torch.as_tensor(np.array(labels), dtype=torch.int64)

            if self.transforms is not None:
                res = self.transforms(image=np.array(img), bboxes=bboxes, class_labels=labels)
            else:
                # Without transforms, pass the raw image and annotations through unchanged
                res = {'image': np.array(img), 'bboxes': bboxes.tolist(), 'class_labels': list(labels)}

            target = {
                'boxes': [torch.Tensor(x) for x in res['bboxes']],
                'labels': res['class_labels']
            }

            img = res['image']

            return img, target

        def __len__(self):
            return len(self.images)

    data_transforms = A.Compose([
        A.Resize(height=256, width=256),
        A.CenterCrop(height=224, width=224),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        ToTensorV2(),
    ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['class_labels']))

    dataset = TomatoDataset(root=os.path.join(os.path.curdir, 'tomato-detection/data'),
                            transforms=data_transforms)
    train_dataset, test_dataset = torch.utils.data.random_split(dataset,
                                                                [int(len(dataset)*0.9), len(dataset)-int(len(dataset)*0.9)],
                                                                generator=torch.Generator().manual_seed(42))
    # Note: random_split returns Subset wrappers, so this assignment does not change the
    # transforms applied by the underlying TomatoDataset.
    test_dataset.transforms = A.Compose([ToTensorV2()])

.. GENERATED FROM PYTHON SOURCE LINES 127-130

Visualize the dataset
~~~~~~~~~~~~~~~~~~~~~
Let's see what our data looks like.

.. GENERATED FROM PYTHON SOURCE LINES 130-137

.. code-block:: default

    print(f'Number of training images: {len(train_dataset)}')
    print(f'Number of test images: {len(test_dataset)}')
    print(f'Example output of an image shape: {train_dataset[0][0].shape}')
    print(f'Example output of a label: {train_dataset[0][1]}')

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Number of training images: 805
    Number of test images: 90
    Example output of an image shape: torch.Size([3, 224, 224])
    Example output of a label: {'boxes': [tensor([ 0.00000, 75.13600, 39.68000, 165.75999]), tensor([ 0.00000, 0.00000, 94.08000, 93.56800])], 'labels': [tensor(1), tensor(1)]}

.. GENERATED FROM PYTHON SOURCE LINES 138-146

Downloading a Pre-trained Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this tutorial, we will download a pre-trained SSDlite model with a MobileNetV3 Large backbone
from the official PyTorch repository. For more details, please refer to the
`official documentation `_.

After downloading the model, we will adapt it to our particular classes.
We will do it by replacing the pre-trained head with a new one that matches our needs.

.. GENERATED FROM PYTHON SOURCE LINES 146-163

.. code-block:: default

    from functools import partial
    from torch import nn
    import torchvision
    from torchvision.models.detection import _utils as det_utils
    from torchvision.models.detection.ssdlite import SSDLiteClassificationHead

    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
    model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=True)

    in_channels = det_utils.retrieve_out_channels(model.backbone, (320, 320))
    num_anchors = model.anchor_generator.num_anchors_per_location()
    norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.03)

    # Replace the classification head with one for 2 classes (background and tomato)
    model.head.classification_head = SSDLiteClassificationHead(in_channels, num_anchors, 2, norm_layer)
    _ = model.to(device)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Downloading: "https://download.pytorch.org/models/ssdlite320_mobilenet_v3_large_coco-a79551df.pth" to /home/runner/.cache/torch/hub/checkpoints/ssdlite320_mobilenet_v3_large_coco-a79551df.pth
    0%| | 0.00/13.4M [00:00

For PyTorch, we will use our DataLoader, but we'll create a new collate function for it that transforms the batch
to the correct format. Then, we'll create a :class:`deepchecks.vision.vision_data.vision_data.VisionData` object
that will hold the data loader.

To learn more about the expected format, please visit the :doc:`supported tasks and formats guide `.

First, we will create some functions that transform our batch to the correct format of images, labels and predictions:

.. GENERATED FROM PYTHON SOURCE LINES 190-257

.. code-block:: default

    def get_untransformed_images(original_images):
        """
        Convert a batch of data to images in the expected format. The expected format is an iterable of images,
        where each image is a numpy array of shape (height, width, channels). The numbers in the array should be in the
        range [0, 255] in a uint8 format.
        """
        inp = torch.stack(list(original_images)).cpu().detach().numpy().transpose((0, 2, 3, 1))
        mean = [0.485, 0.456, 0.406]
        std = [0.229, 0.224, 0.225]
        # Un-normalize the images
        inp = std * inp + mean
        inp = np.clip(inp, 0, 1)
        return inp * 255

    def transform_labels_to_cxywh(original_labels):
        """
        Convert a batch of data to labels in the expected format. The expected format is an iterator of arrays, each array
        corresponding to a sample. Each array element is in a shape of [B, 5], where B is the number of bboxes
        in the image, and each bounding box is in the structure of [class_id, x, y, w, h].
        """
        label = []
        for annotation in original_labels:
            if len(annotation["boxes"]):
                bbox = torch.stack(annotation["boxes"])
                # Convert the Pascal VOC xyxy format to xywh format
                bbox[:, 2:] = bbox[:, 2:] - bbox[:, :2]
                # The label shape is [class_id, x, y, w, h]
                label.append(
                    torch.concat([torch.stack(annotation["labels"]).reshape((-1, 1)), bbox], dim=1)
                )
            else:
                # If it's an empty image, we need to add an empty label
                label.append(torch.tensor([]))
        return label

    def infer_on_images(original_images):
        """
        Returns the predictions for a batch of data. The expected format is an iterator of arrays, each array
        corresponding to a sample. Each array element is in a shape of [B, 6], where B is the number of bboxes in the
        predictions, and each bounding box is in the structure of [x, y, w, h, score, class_id].

        Note that model and device here are global variables, and are defined in the previous code block, as the collate
        function cannot receive other arguments than the batch.
        """
        nm_thrs = 0.2
        score_thrs = 0.7
        imgs = list(img.to(device) for img in original_images)
        # Switch to inference mode; torchvision detection models expect targets in training mode
        model.eval()
        # Getting the predictions of the model on the batch
        with torch.no_grad():
            preds = model(imgs)
        processed_pred = []
        for pred in preds:
            # Performing non-maximum suppression on the detections
            keep_boxes = torchvision.ops.nms(pred['boxes'], pred['scores'], nm_thrs)
            score_filter = pred['scores'][keep_boxes] > score_thrs

            # get the filtered result
            test_boxes = pred['boxes'][keep_boxes][score_filter].reshape((-1, 4))
            test_boxes[:, 2:] = test_boxes[:, 2:] - test_boxes[:, :2]  # xyxy to xywh
            test_labels = pred['labels'][keep_boxes][score_filter]
            test_scores = pred['scores'][keep_boxes][score_filter]

            processed_pred.append(
                torch.concat([test_boxes, test_scores.reshape((-1, 1)), test_labels.reshape((-1, 1))], dim=1))
        return processed_pred

.. GENERATED FROM PYTHON SOURCE LINES 258-261

Now we'll create the collate function that will be used by the DataLoader.
In PyTorch, the collate function can transform the output batch to any custom format,
and we'll use it to transform the batch to the format the checks expect.

.. GENERATED FROM PYTHON SOURCE LINES 261-273

.. code-block:: default

    from deepchecks.vision.vision_data import BatchOutputFormat

    def deepchecks_collate_fn(batch) -> BatchOutputFormat:
        """Return a batch of images, labels and predictions in the deepchecks format."""
        # batch received as iterable of tuples of (image, label) and transformed to tuple of iterables of images and labels:
        batch = tuple(zip(*batch))
        images = get_untransformed_images(batch[0])
        labels = transform_labels_to_cxywh(batch[1])
        predictions = infer_on_images(batch[0])
        return BatchOutputFormat(images=images, labels=labels, predictions=predictions)

.. GENERATED FROM PYTHON SOURCE LINES 274-276

We have a single label here, which is the tomato class.
The label_map is a dictionary that maps the class id to the class name, for display purposes.

.. GENERATED FROM PYTHON SOURCE LINES 276-281

.. code-block:: default

    LABEL_MAP = {
        1: 'Tomato'
    }

.. GENERATED FROM PYTHON SOURCE LINES 282-284

Now that we have our updated collate function, we can recreate the dataloader in the deepchecks format,
and use it to create a VisionData object:

.. GENERATED FROM PYTHON SOURCE LINES 284-293

.. code-block:: default

    from deepchecks.vision.vision_data import VisionData

    train_loader = DataLoader(train_dataset, batch_size=64, collate_fn=deepchecks_collate_fn)
    test_loader = DataLoader(test_dataset, batch_size=64, collate_fn=deepchecks_collate_fn)

    training_data = VisionData(batch_loader=train_loader, task_type='object_detection', label_map=LABEL_MAP)
    test_data = VisionData(batch_loader=test_loader, task_type='object_detection', label_map=LABEL_MAP)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)

.. GENERATED FROM PYTHON SOURCE LINES 294-300

Making sure our data is in the correct format:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The VisionData object automatically validates your data format and will alert you if there is a problem.
However, you can also manually view your images and labels to make sure they are in the correct format by using
the ``head`` function to conveniently visualize your data:

.. GENERATED FROM PYTHON SOURCE LINES 300-303

.. code-block:: default

    training_data.head()

.. GENERATED FROM PYTHON SOURCE LINES 304-308

Running Deepchecks' suite on our data and model!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now that we have defined the VisionData objects, we can validate the model with the deepchecks model evaluation
suite. This takes just a few lines of code:

.. GENERATED FROM PYTHON SOURCE LINES 308-314

.. code-block:: default

    from deepchecks.vision.suites import model_evaluation

    suite = model_evaluation()
    result = suite.run(training_data, test_data)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Processing Batches:Train: | | 0/1 [Time: 00:00]
    Processing Batches:Train: |#####| 1/1 [Time: 00:44]
    Processing Batches:Train: |#####| 1/1 [Time: 00:44]
    Computing Single Dataset Checks Train: | | 0/4 [Time: 00:00]
    Computing Single Dataset Checks Train: |#2 | 1/4 [Time: 00:00, Check=Mean Average Precision Report]
    Computing Single Dataset Checks Train: |##5 | 2/4 [Time: 00:00, Check=Mean Average Recall Report]
    Computing Single Dataset Checks Train: |#####| 4/4 [Time: 00:02, Check=Weak Segments Performance]
    Computing Single Dataset Checks Train: |#####| 4/4 [Time: 00:02, Check=Weak Segments Performance]
    Processing Batches:Test: | | 0/1 [Time: 00:00]
    Processing Batches:Test: |#####| 1/1 [Time: 00:04]
    Processing Batches:Test: |#####| 1/1 [Time: 00:04]
    Computing Single Dataset Checks Test: | | 0/4 [Time: 00:00]
    Computing Single Dataset Checks Test: |###7 | 3/4 [Time: 00:00, Check=Confusion Matrix Report]
    Computing Single Dataset Checks Test: |#####| 4/4 [Time: 00:02, Check=Weak Segments Performance]
    Computing Train Test Checks: | | 0/2 [Time: 00:00]
    Computing Train Test Checks: | | 0/2 [Time: 00:00, Check=Class Performance]
    Computing Train Test Checks: |##5 | 1/2 [Time: 00:00, Check=Class Performance]
    Computing Train Test Checks: |##5 | 1/2 [Time: 00:00, Check=Prediction Drift]
    Computing Train Test Checks: |#####| 2/2 [Time: 00:00, Check=Prediction Drift]
    Computing Train Test Checks: |#####| 2/2 [Time: 00:00, Check=Prediction Drift]

.. GENERATED FROM PYTHON SOURCE LINES 315-320

We also have suites for
:func:`data integrity `
(validating a single dataset) and
:func:`train test validation `
(validating the dataset split).

.. GENERATED FROM PYTHON SOURCE LINES 322-327

.. _observing_the_result:

Observing the results:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The results can be saved as an HTML file with the following code:

.. GENERATED FROM PYTHON SOURCE LINES 327-330

.. code-block:: default

    result.save_as_html('output.html')

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    'output (3).html'

.. GENERATED FROM PYTHON SOURCE LINES 331-332

Or, if working inside a notebook, the output can be displayed directly by simply printing the result object:

.. GENERATED FROM PYTHON SOURCE LINES 332-335

.. code-block:: default

    result

.. raw:: html

    Model Evaluation Suite


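The "Mean Average Precision Report" in the suite results depends on an IoU (intersection over union) threshold:
a predicted box typically counts as a match only when its IoU with a ground-truth box reaches that threshold.
The following is a minimal sketch of that idea, not the deepchecks implementation. The helper ``iou_xywh`` and the
example boxes are hypothetical, and the boxes are assumed to be in the ``[x, y, w, h]`` format used in this tutorial,
with ``x, y`` as the top-left corner:

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two boxes given as [x, y, w, h].

    Hypothetical helper for illustration only; (x, y) is the top-left corner.
    """
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh

    # Width and height of the overlapping region (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


# Made-up boxes: the prediction is shifted right by half a box width
ground_truth = [0.0, 0.0, 100.0, 100.0]
prediction = [50.0, 0.0, 100.0, 100.0]

iou = iou_xywh(ground_truth, prediction)
print(f"IoU = {iou:.3f}")
print("match at threshold 0.5:", iou >= 0.5)
print("match at threshold 0.3:", iou >= 0.3)
```

In this made-up case the same prediction is rejected at a threshold of 0.5 but accepted at 0.3, which illustrates
why a lower IoU threshold can raise the measured precision without the model itself improving.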
.. GENERATED FROM PYTHON SOURCE LINES 336-342

We can see that our model does not perform well, as shown by the "Class Performance" check under the
"Didn't Pass" section of the suite results. This is because the model was trained on a different dataset and was
never trained to detect tomatoes. Lowering the IoU threshold could have improved the results somewhat
(as can be seen in the "Mean Average Precision Report" check), but the overall precision would still be low.
Under the "Passed" section, we can see that our drift checks have passed, which means that the
distribution of the predictions on the training and test data is similar, so the problem is not prediction drift
but the model itself.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 1 minutes 15.011 seconds)

.. _sphx_glr_download_user-guide_vision_auto_tutorials_plot_detection_tutorial.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_detection_tutorial.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_detection_tutorial.ipynb `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_