.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "user-guide/vision/auto_tutorials/plot_custom_task_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_user-guide_vision_auto_tutorials_plot_custom_task_tutorial.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_user-guide_vision_auto_tutorials_plot_custom_task_tutorial.py:

====================
Custom Task Tutorial
====================

Computer vision is an umbrella term for a wide spectrum of objectives that models are trained for. These objectives
reflect on the structure of the data and the possible actions on it.

The first step before running any Deepchecks checks is to create an implementation of
:class:`VisionData <deepchecks.vision.VisionData>`. Each implementation represents and standardizes a computer vision
task, and allows running more complex checks that relate to the given task's characteristics. There are default base
classes for a few known tasks, like classification, object detection, and semantic segmentation; however, not all
tasks have a base implementation, meaning you will have to create your own task.

When creating your own task you will be limited to running checks that are agnostic to the specific task type. For
example, performance checks that use IoU work only on object detection tasks, since they need to know the exact
bounding box format in order to run, while other checks that use :doc:`/user-guide/vision/vision_properties` or custom
metrics are agnostic to the task type.

In this guide we will implement a custom instance segmentation task and run checks on it. Note that instance
segmentation is different from semantic segmentation, which is currently supported in Deepchecks.

1. `Defining the Data <#defining-the-data>`__
2. `Implement Custom Task <#implement-custom-task>`__
3. `Implement Custom Properties <#implement-custom-properties>`__
4. `Implement Custom Metric <#implement-custom-metric>`__

.. GENERATED FROM PYTHON SOURCE LINES 30-34

Defining the Data
=================

First we will define a `PyTorch Dataset <https://pytorch.org/tutorials/beginner/basics/data_tutorial.html>`_
for the COCO-128 segmentation task. This part represents your own code, and is not yet Deepchecks related.

.. GENERATED FROM PYTHON SOURCE LINES 34-193

.. code-block:: default


    import contextlib
    import os
    import typing as t
    from pathlib import Path

    import albumentations as A
    import matplotlib.pyplot as plt
    import numpy as np
    import torch
    import torchvision.transforms.functional as F
    from albumentations.pytorch.transforms import ToTensorV2
    from PIL import Image, ImageDraw
    from torch.utils.data import DataLoader
    from torchvision.datasets import VisionDataset
    from torchvision.datasets.utils import download_and_extract_archive
    from torchvision.utils import draw_segmentation_masks


    class CocoInstanceSegmentationDataset(VisionDataset):
        """A PyTorch ``VisionDataset`` that represents the COCO128-segments dataset.

        Parameters
        ----------
        root : str
            Path to the root directory of the dataset.
        name : str
            Name of the dataset.
        train : bool
            if `True` train dataset, otherwise test dataset
        transforms : Callable, optional
            A function/transform that takes in a PIL image and returns a transformed version.
            E.g. ``transforms.RandomCrop``
        """

        TRAIN_FRACTION = 0.5

        def __init__(
                self,
                root: str,
                name: str,
                train: bool = True,
                transforms: t.Optional[t.Callable] = None,
        ) -> None:
            super().__init__(root, transforms=transforms)

            self.train = train
            self.root = Path(root).absolute()
            self.images_dir = Path(root) / 'images' / name
            self.labels_dir = Path(root) / 'labels' / name

            images: t.List[Path] = sorted(self.images_dir.glob('./*.jpg'))
            labels: t.List[t.Optional[Path]] = []

            for image in images:
                label = self.labels_dir / f'{image.stem}.txt'
                labels.append(label if label.exists() else None)

            assert len(images) != 0, 'Did not find folder with images or it was empty'
            assert not all(l is None for l in labels), 'Did not find folder with labels or it was empty'

            train_len = int(self.TRAIN_FRACTION * len(images))

            if self.train is True:
                self.images = images[0:train_len]
                self.labels = labels[0:train_len]
            else:
                self.images = images[train_len:]
                self.labels = labels[train_len:]

        def __getitem__(self, idx: int) -> t.Tuple[torch.Tensor, t.List[int], torch.Tensor]:
            """Get the image, classes and masks at the given index."""
            image = Image.open(str(self.images[idx]))
            label_file = self.labels[idx]

            masks = []
            classes = []
            if label_file is not None:
                for label_str in label_file.open('r').read().strip().splitlines():
                    label = np.array(label_str.split(), dtype=np.float32)
                    class_id = int(label[0])
                    # Transform normalized coordinates to un-normalized
                    coordinates = (label[1:].reshape(-1, 2) * np.array([image.width, image.height])).reshape(-1).tolist()
                    # Create mask image
                    mask = Image.new('L', (image.width, image.height), 0)
                    ImageDraw.Draw(mask).polygon(coordinates, outline=1, fill=1)
                    # Add to list
                    masks.append(np.array(mask, dtype=bool))
                    classes.append(class_id)

            if self.transforms is not None:
                # Albumentations accepts images as numpy
                transformed = self.transforms(image=np.array(image), masks=masks)
                image = transformed['image']
                masks = transformed['masks']
                # Transform masks to tensor of (num_masks, H, W)
                if masks:
                    if isinstance(masks[0], np.ndarray):
                        masks = [torch.from_numpy(m) for m in masks]
                    masks = torch.stack(masks)
                else:
                    # Placeholder for samples without any masks
                    masks = torch.empty((0, 3))

            return image, classes, masks

        def __len__(self):
            return len(self.images)

        @classmethod
        def load_or_download(cls, root: Path, train: bool) -> 'CocoInstanceSegmentationDataset':
            """Load the dataset from disk, downloading and extracting it first if needed."""
            extract_dir = root / 'coco128segments'
            coco_dir = root / 'coco128segments' / 'coco128-seg'
            folder = 'train2017'

            if not coco_dir.exists():
                url = 'https://ultralytics.com/assets/coco128-segments.zip'

                with open(os.devnull, 'w', encoding='utf8') as f, contextlib.redirect_stdout(f):
                    download_and_extract_archive(
                        url,
                        download_root=str(root),
                        extract_root=str(extract_dir)
                    )

                try:
                    # remove coco128's README.txt so that it does not come in docs
                    os.remove("coco128/README.txt")
                except OSError:
                    pass
            return CocoInstanceSegmentationDataset(coco_dir, folder, train=train,
                                                   transforms=A.Compose([ToTensorV2()]))


    # Download and load the datasets
    curr_dir = Path('.')
    train_ds = CocoInstanceSegmentationDataset.load_or_download(curr_dir, train=True)
    test_ds = CocoInstanceSegmentationDataset.load_or_download(curr_dir, train=False)


    def batch_collate(batch):
        """Get a list of samples from `CocoInstanceSegmentationDataset` and combine them into a batch."""
        images, classes, masks = zip(*batch)
        return list(images), list(classes), list(masks)


    # Create DataLoaders
    train_data_loader = DataLoader(
        dataset=train_ds,
        batch_size=32,
        shuffle=False,
        collate_fn=batch_collate
    )

    test_data_loader = DataLoader(
        dataset=test_ds,
        batch_size=32,
        shuffle=False,
        collate_fn=batch_collate
    )
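
The imports above include ``matplotlib``, ``torchvision.transforms.functional`` and ``draw_segmentation_masks``,
which are handy for sanity-checking the data we just loaded. A minimal sketch of such a check (not part of the
generated output; it assumes the first training sample contains at least one mask) could look like this:

.. code-block:: default


    # Overlay the instance masks of the first training sample on its image.
    # `ToTensorV2` keeps the uint8 dtype, which `draw_segmentation_masks` expects.
    image, classes, masks = train_ds[0]
    drawn_image = draw_segmentation_masks(image, masks=masks, alpha=0.6)
    plt.imshow(F.to_pil_image(drawn_image))
    plt.axis('off')
    plt.title(f'Classes in sample: {classes}')
    plt.show()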
.. GENERATED FROM PYTHON SOURCE LINES 193-204

Implement Custom Task
=====================

To run Deepchecks on our data we wrap it in an implementation of
:class:`VisionData <deepchecks.vision.VisionData>`. Since instance segmentation has no built-in base class, we
subclass ``VisionData`` directly and implement the methods Deepchecks uses to consume our batches.

.. GENERATED FROM PYTHON SOURCE LINES 204-246

.. code-block:: default


    from typing import List, Sequence

    from deepchecks.vision import VisionData


    class MyCustomInstanceSegmentationData(VisionData):
        """Represent our custom instance segmentation task."""

        def get_classes(self, batch_labels) -> List[List[int]]:
            """Return per label a list of classes (by id) in it."""
            # The input `batch_labels` is the result of the `batch_to_labels` function.
            return [x[0] for x in batch_labels]

        def batch_to_labels(self, batch):
            """Extract from the batch only the labels. Must return an iterable with an element per image."""
            _, classes, masks = batch
            return [(classes[idx], masks[idx]) for idx in range(len(classes))]

        def infer_on_batch(self, batch, model, device):
            """Infer on a batch of images. Must return an iterable with an element per image."""
            predictions = model.to(device)(batch[0])
            return predictions

        def batch_to_images(self, batch) -> Sequence[np.ndarray]:
            """Convert the batch to a list of images as (H, W, C) 3D numpy array per image."""
            return [tensor.numpy().transpose((1, 2, 0)) for tensor in batch[0]]

.. GENERATED FROM PYTHON SOURCE LINES 246-248

Now we are able to run checks that use only the image data, since it's in the standard Deepchecks format.
Let's run the PropertyLabelCorrelationChange check with our task:

.. GENERATED FROM PYTHON SOURCE LINES 248-258

.. code-block:: default


    from deepchecks.vision.checks import PropertyLabelCorrelationChange

    # Create our task with the `DataLoader`s we defined before.
    train_task = MyCustomInstanceSegmentationData(train_data_loader)
    test_task = MyCustomInstanceSegmentationData(test_data_loader)

    result = PropertyLabelCorrelationChange().run(train_task, test_task)
    result.show()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Validating Input: |     | 0/1 [Time: 00:00]
    Validating Input: |#####| 1/1 [Time: 00:00]
    Ingesting Batches - Train Dataset: |     | 0/2 [Time: 00:00]
    Ingesting Batches - Train Dataset: |##5  | 1/2 [Time: 00:01]
    Ingesting Batches - Train Dataset: |#####| 2/2 [Time: 00:02]
    Ingesting Batches - Train Dataset: |#####| 2/2 [Time: 00:02]
    Ingesting Batches - Test Dataset: |     | 0/2 [Time: 00:00]
    Ingesting Batches - Test Dataset: |##5  | 1/2 [Time: 00:01]
    Ingesting Batches - Test Dataset: |#####| 2/2 [Time: 00:02]
    Ingesting Batches - Test Dataset: |#####| 2/2 [Time: 00:02]
    Computing Check: |     | 0/1 [Time: 00:00]
    Computing Check: |#####| 1/1 [Time: 00:03]
    Computing Check: |#####| 1/1 [Time: 00:03]

.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    Feature Label Correlation Change
    </div>
    <br />
    <br />
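
Besides the interactive display, each check result exposes the raw values it computed, which is useful when a custom
task is part of an automated pipeline. A minimal sketch, assuming Deepchecks' standard ``CheckResult`` API (the exact
structure of ``result.value`` varies from check to check):

.. code-block:: default


    # Inspect the raw values behind the display, and export the report to a file.
    print(result.value)
    result.save_as_html('property_label_correlation_change.html')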
.. GENERATED FROM PYTHON SOURCE LINES 259-267

Now, in order to run more checks, we'll need to define custom properties or metrics.

Implement Custom Properties
===========================

In order to run checks that use label or prediction properties we'll have to implement custom
:doc:`properties </user-guide/vision/vision_properties>`. We'll write label properties and run a label drift check.

.. GENERATED FROM PYTHON SOURCE LINES 267-298

.. code-block:: default


    from deepchecks.vision.checks import TrainTestLabelDrift


    # The labels object is the result of the `batch_to_labels` function we defined earlier. Each property should
    # return a list with a value (or a list of values) per sample.
    def number_of_detections(labels) -> List[int]:
        """Return a list containing the number of detections per sample in batch."""
        all_masks = [x[1] for x in labels]
        return [sample_masks.shape[0] for sample_masks in all_masks]


    def classes_in_labels(labels) -> List[List[int]]:
        """Return a list containing the classes per sample in batch."""
        classes = [x[0] for x in labels]
        return classes


    # We will pass this object as a parameter to checks that use label properties
    label_properties = [
        {'name': '# Detections per Label', 'method': number_of_detections, 'output_type': 'categorical'},
        {'name': 'Classes in Labels', 'method': classes_in_labels, 'output_type': 'class_id'}
    ]

    result = TrainTestLabelDrift(label_properties=label_properties).run(train_task, test_task)
    result.show()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Validating Input: |     | 0/1 [Time: 00:00]
    Validating Input: |#####| 1/1 [Time: 00:00]
    Ingesting Batches - Train Dataset: |     | 0/2 [Time: 00:00]
    Ingesting Batches - Train Dataset: |##5  | 1/2 [Time: 00:00]
    Ingesting Batches - Train Dataset: |#####| 2/2 [Time: 00:00]
    Ingesting Batches - Train Dataset: |#####| 2/2 [Time: 00:00]
    Ingesting Batches - Test Dataset: |     | 0/2 [Time: 00:00]
    Ingesting Batches - Test Dataset: |##5  | 1/2 [Time: 00:00]
    Ingesting Batches - Test Dataset: |#####| 2/2 [Time: 00:00]
    Ingesting Batches - Test Dataset: |#####| 2/2 [Time: 00:00]
    Computing Check: |     | 0/1 [Time: 00:00]
    Computing Check: |#####| 1/1 [Time: 00:00]

.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    Train Test Label Drift
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 299-306

Implement Custom Metric
=======================

Some checks test the model performance and therefore require a metric in order to run. When using a custom task you
will also have to create a custom metric for those checks to work, since the built-in metrics don't know how to
handle your data structure. The metrics need to conform to the API of
`pytorch-ignite <https://pytorch.org/ignite/>`_.
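
As an illustration, here is a minimal sketch of what such a metric could look like. Everything below is hypothetical:
the metric name is ours, and we assume ``update`` receives ``(predictions, labels)`` batch pairs in the same
per-sample ``(classes, masks)`` format produced by the ``infer_on_batch`` and ``batch_to_labels`` methods above.

.. code-block:: default


    from ignite.metrics import Metric


    class MeanMaskIoU(Metric):
        """Hypothetical metric: IoU between the union of the predicted masks and
        the union of the ground-truth masks, averaged over samples."""

        def reset(self):
            # Called once at the start of each evaluation run.
            self._iou_sum = 0.0
            self._num_samples = 0

        def update(self, output):
            # `output` is assumed to be a (predictions, labels) pair, where each element
            # holds a (classes, masks) tuple per sample, with masks of shape (num_masks, H, W).
            predictions, labels = output
            for (_, pred_masks), (_, label_masks) in zip(predictions, labels):
                pred_union = pred_masks.any(dim=0)
                label_union = label_masks.any(dim=0)
                intersection = (pred_union & label_union).sum().item()
                union = (pred_union | label_union).sum().item()
                self._iou_sum += intersection / union if union > 0 else 1.0
                self._num_samples += 1

        def compute(self):
            # Called once at the end of each evaluation run to produce the final value.
            return self._iou_sum / self._num_samples if self._num_samples else 0.0

An instance of such a metric can then be passed to performance checks that accept user-supplied metrics; the exact
argument name depends on your Deepchecks version, so consult the metrics guide in the documentation.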

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 11.140 seconds)


.. _sphx_glr_download_user-guide_vision_auto_tutorials_plot_custom_task_tutorial.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_custom_task_tutorial.py <plot_custom_task_tutorial.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_custom_task_tutorial.ipynb <plot_custom_task_tutorial.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_