deepchecks.vision
Package for vision functionality.
Modules
- Module importing all vision checks.
- Module containing all prebuilt vision suites.
- Module containing datasets and models for vision tasks.
- Package for vision utilities.
Classes
- class VisionData
VisionData represents a base task in deepchecks. It wraps a PyTorch DataLoader together with model-related metadata.
The VisionData class contains additional data and general methods intended for easily accessing metadata relevant for validating computer vision ML models.
- Parameters
- data_loader : DataLoader
PyTorch DataLoader object. If your data loader is using IterableDataset please see note below.
- num_classes : int, optional
Number of classes in the dataset. If not provided, will be inferred from the dataset.
- label_map : Dict[int, str], optional
A dictionary mapping class ids to their names.
- transform_field : str, default: 'transforms'
Name of the transforms field in the dataset, which holds the transformations of both data and label.
- Attributes
classes_indices
Return dict of classes as keys, and list of corresponding indices (in Dataset) of samples that include this class (in the label).
data_dimension
Return how many dimensions the image data have.
data_loader
Return the data loader.
has_images
Return True if the data loader has images.
has_labels
Return True if the data loader has labels.
n_of_samples_per_class
Return a dictionary containing the number of samples per class.
num_classes
Return the number of classes in the dataset.
num_samples
Return the number of samples in the dataset.
original_num_samples
Return the number of samples in the original dataset.
task_type
Return the task type: classification, object_detection or other.
transform_field
Return the name of the transforms field.
Methods
Assert the image formatter defined is valid.
Assert the label formatter defined is valid.
batch_of_index(*indices)
Return batch samples of the given batch indices.
batch_to_images(batch)
Transform a batch of data to images in the accepted format.
batch_to_labels(batch)
Transform a batch of data to labels.
copy([n_samples, shuffle, random_state])
Create a new copy of this object, with the data loader and dataset also copied, and altered by the given parameters.
from_dataset(data[, batch_size, shuffle, ...])
Create VisionData instance from a Dataset instance.
get_augmented_dataset(aug)
Return a copy of the vision data object with the augmentation in the start of it.
get_classes(batch_labels)
Get a labels batch and return classes inside it.
Return transforms handler created from the transform field.
infer_on_batch(batch, model, device)
Infer on a batch of data.
Initialize the cache of the classes' metadata info.
Return whether the vision data is running on a sample of the data.
label_id_to_name(class_id)
Return the name of the class with the given id.
to_batch(*samples)
Use the defined collate_fn to transform a few data items to batch format.
to_dataset_index(*batch_indices)
Return for the given batch_index the sample index in the dataset object.
update_cache(batch)
Get labels and update the classes' metadata info.
validate_format(model[, device])
Validate the correctness of the data class implementation according to the expected format.
validate_get_classes(batch)
Validate that the get_classes function returns data in the correct format.
validate_image_data(batch)
Validate that the data is in the required format.
validate_infered_batch_predictions(batch_predictions)
Validate the inferred predictions from the batch.
validate_label(batch)
Validate a batch of labels.
validate_prediction(batch, model, device)
Validate the prediction.
validate_shared_label(other)
Verify presence of shared labels.
- __init__(data_loader: DataLoader, num_classes: Optional[int] = None, label_map: Optional[Dict[int, str]] = None, transform_field: Optional[str] = 'transforms')
- abstract batch_to_images(batch) -> Sequence[ndarray]
Transform a batch of data to images in the accepted format.
- Parameters
- batch : torch.Tensor
Batch of data to transform to images.
- Returns
- Sequence[np.ndarray]
List of images in the accepted format. Each image in the iterable must be a [H, W, C] 3D numpy array. See notes for more details. batch_to_images must be implemented in a subclass.
Notes
Each image in the iterable must be a [H, W, C] 3D numpy array. The first dimension must be the image height (y axis), the second the image width (x axis), and the third the number of channels. The numbers in the array should be in the range [0, 255]. Color images should be in RGB format and have 3 channels, while grayscale images should have 1 channel.
Examples
>>> import numpy as np
...
...
... def batch_to_images(self, batch):
...     # Converts a batch of normalized images to RGB images with range [0, 255]
...     inp = batch[0].detach().numpy().transpose((0, 2, 3, 1))
...     mean = [0.485, 0.456, 0.406]
...     std = [0.229, 0.224, 0.225]
...     inp = std * inp + mean
...     inp = np.clip(inp, 0, 1)
...     return inp * 255
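To make the arithmetic of the example above concrete, the following sketch applies the same denormalization to a single RGB pixel in plain Python. The helper name `denormalize_pixel` is hypothetical, and the mean and std values are simply the ones used in the example; they only apply if the data was normalized with those statistics.

```python
# Illustrative only: mirrors the arithmetic of the example above for one RGB pixel.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def denormalize_pixel(pixel):
    """Map a normalized (R, G, B) pixel back to the [0, 255] range."""
    restored = []
    for value, mean, std in zip(pixel, MEAN, STD):
        unnormalized = std * value + mean           # undo (x - mean) / std
        clipped = min(max(unnormalized, 0.0), 1.0)  # keep the value within [0, 1]
        restored.append(clipped * 255)              # scale to [0, 255]
    return restored


# A normalized value of 0 corresponds to the channel mean.
pixel_out = denormalize_pixel((0.0, 0.0, 0.0))
```

Values far outside the normalized range are clipped, so the result always stays within the [0, 255] range required by the notes.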
- abstract batch_to_labels(batch) -> Union[List[Tensor], Tensor]
Transform a batch of data to labels.
- property classes_indices: Dict[int, List[int]]
Return dict of classes as keys, and list of corresponding indices (in Dataset) of samples that include this class (in the label).
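The shape of such a mapping can be sketched in plain Python. This is an illustration under assumptions, not the deepchecks implementation: the helper `build_classes_indices` and the sample data are hypothetical, and a sample is assumed to be listed once per class even if the class appears several times in its label.

```python
from collections import defaultdict


def build_classes_indices(classes_per_sample):
    """Map each class id to the dataset indices of the samples that contain it."""
    classes_indices = defaultdict(list)
    for sample_index, classes in enumerate(classes_per_sample):
        for class_id in set(classes):  # assumption: list a sample once per class
            classes_indices[class_id].append(sample_index)
    return dict(classes_indices)


# Hypothetical per-sample classes, e.g. from an object detection dataset.
sample_classes = [[0, 1], [1], [2, 2], [0]]
indices = build_classes_indices(sample_classes)

# A per-class sample count, similar in spirit to n_of_samples_per_class.
samples_per_class = {class_id: len(idx) for class_id, idx in indices.items()}
```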
- copy(n_samples: Optional[int] = None, shuffle: bool = False, random_state: Optional[int] = None) -> VD
Create a new copy of this object, with the data loader and dataset also copied, and altered by the given parameters.
- Parameters
- n_samples : int, default: None
Take only this number of samples into the copied DataLoader. The samples chosen are affected by random_state (a fixed random state returns consistent samples).
- shuffle : bool, default: False
Whether to shuffle the sample order. The shuffle is affected by random_state (a fixed random state returns a consistent order).
- random_state : int, default: None
Random state used for the pseudo-random actions (sampling and shuffling).
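The interaction between n_samples, shuffle and random_state can be sketched with Python's standard `random` module. The helper `sample_indices` is hypothetical and only illustrates why a fixed seed gives reproducible results; the actual sampling logic inside copy may differ.

```python
import random


def sample_indices(n_total, n_samples=None, shuffle=False, random_state=None):
    """Pick dataset indices reproducibly for a given random_state."""
    rng = random.Random(random_state)  # a seeded generator gives repeatable choices
    indices = list(range(n_total))
    if n_samples is not None and n_samples < n_total:
        # Sample without replacement, then restore the original order.
        indices = sorted(rng.sample(indices, n_samples))
    if shuffle:
        rng.shuffle(indices)
    return indices


first = sample_indices(100, n_samples=10, random_state=42)
second = sample_indices(100, n_samples=10, random_state=42)
```

With the same random_state, `first` and `second` contain the same indices in the same order.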
- property data_dimension
Return how many dimensions the image data have.
- property data_loader: torch.utils.data.dataloader.DataLoader
Return the data loader.
- classmethod from_dataset(data: Dataset, batch_size: int = 64, shuffle: bool = True, num_workers: int = 0, pin_memory: bool = True, collate_fn: Optional[Callable] = None, num_classes: Optional[int] = None, label_map: Optional[Dict[int, str]] = None, transform_field: Optional[str] = 'transforms') -> VD
Create VisionData instance from a Dataset instance.
- Parameters
- data : Dataset
Instance of a Dataset.
- batch_size : int, default: 64
How many samples per batch to load.
- shuffle : bool, default: True
Set to True to have the data reshuffled at every epoch.
- num_workers : int, default: 0
How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process.
- pin_memory : bool, default: True
If True, the data loader will copy Tensors into CUDA pinned memory before returning them.
- collate_fn : Optional[Callable]
Merges a list of samples to form a mini-batch of Tensor(s).
- num_classes : Optional[int], default: None
Number of classes in the dataset. If not provided, will be inferred from the dataset.
- label_map : Optional[Dict[int, str]], default: None
A dictionary mapping class ids to their names.
- transform_field : Optional[str], default: 'transforms'
Name of the transforms field in the dataset, which holds the transformations of both data and label.
- Returns
- VisionData
- get_augmented_dataset(aug) -> VD
Return a copy of the vision data object with the augmentation in the start of it.
- abstract get_classes(batch_labels: Union[List[Tensor], Tensor]) -> List[List[int]]
Get a labels batch and return classes inside it.
- property has_images: bool
Return True if the data loader has images.
- property has_labels: bool
Return True if the data loader has labels.
- abstract infer_on_batch(batch, model, device) -> Union[List[Tensor], Tensor]
Infer on a batch of data.
- property n_of_samples_per_class: Dict[Any, int]
Return a dictionary containing the number of samples per class.
- property num_classes: int
Return the number of classes in the dataset.
- property num_samples: int
Return the number of samples in the dataset.
- property original_num_samples: int
Return the number of samples in the original dataset.
- property task_type: deepchecks.vision.task_type.TaskType
Return the task type: classification, object_detection or other.
- to_batch(*samples)
Use the defined collate_fn to transform a few data items to batch format.
- to_dataset_index(*batch_indices)
Return for the given batch_index the sample index in the dataset object.
- property transform_field: str
Return the name of the transforms field.
- validate_format(model, device=None)
Validate the correctness of the data class implementation according to the expected format.
- Parameters
- model : Model
Model to validate the data class implementation against.
- device
Device to run the model on.
- validate_get_classes(batch)
Validate that the get_classes function returns data in the correct format.
- Parameters
- batch
- Raises
- ValidationError
If the classes data doesn’t fit the format after being transformed.
- validate_image_data(batch)
Validate that the data is in the required format.
The validation is done on the first element of the batch.
- Parameters
- batch
- Raises
- DeepchecksValueError
If the batch data doesn’t fit the format after being transformed by self().
- static validate_infered_batch_predictions(batch_predictions)
Validate the inferred predictions from the batch.
- validate_prediction(batch, model, device)
Validate the prediction.
- Parameters
- batch : t.Any
Batch from DataLoader.
- model : t.Any
- device : torch.device
- Raises
- ValidationError
If the predictions format is invalid (depends on the validate_infered_batch_predictions implementation).
- DeepchecksNotImplementedError
If infer_on_batch is not implemented.
- validate_shared_label(other)
Verify presence of shared labels.
Validates whether the two datasets share the same label shape.
- Parameters
- other : VisionData
The dataset to compare with. Expected to be of Dataset type.
- Raises
- DeepchecksValueError
If the datasets don’t have the same label.
- class ClassificationData
The ClassificationData class is used to load and preprocess data for a classification task.
It is a subclass of the VisionData class. The ClassificationData class contains additional data and general methods intended for easily accessing metadata relevant for validating computer vision classification ML models.
- Attributes
classes_indices
Return dict of classes as keys, and list of corresponding indices (in Dataset) of samples that include this class (in the label).
data_dimension
Return how many dimensions the image data have.
data_loader
Return the data loader.
has_images
Return True if the data loader has images.
has_labels
Return True if the data loader has labels.
n_of_samples_per_class
Return a dictionary containing the number of samples per class.
num_classes
Return the number of classes in the dataset.
num_samples
Return the number of samples in the dataset.
original_num_samples
Return the number of samples in the original dataset.
task_type
Return the task type (classification).
transform_field
Return the name of the transforms field.
Methods
Assert the image formatter defined is valid.
Assert the label formatter defined is valid.
batch_of_index(*indices)
Return batch samples of the given batch indices.
batch_to_images(batch)
Transform a batch of data to images in the accepted format.
batch_to_labels(batch)
Extract the labels from a batch of data.
copy([n_samples, shuffle, random_state])
Create a new copy of this object, with the data loader and dataset also copied, and altered by the given parameters.
from_dataset(data[, batch_size, shuffle, ...])
Create VisionData instance from a Dataset instance.
get_augmented_dataset(aug)
Return a copy of the vision data object with the augmentation in the start of it.
get_classes(batch_labels)
Get a labels batch and return classes inside it.
Return transforms handler created from the transform field.
infer_on_batch(batch, model, device)
Return the predictions of the model on a batch of data.
Initialize the cache of the classes' metadata info.
Return whether the vision data is running on a sample of the data.
label_id_to_name(class_id)
Return the name of the class with the given id.
to_batch(*samples)
Use the defined collate_fn to transform a few data items to batch format.
to_dataset_index(*batch_indices)
Return for the given batch_index the sample index in the dataset object.
update_cache(batch)
Get labels and update the classes' metadata info.
validate_format(model[, device])
Validate the correctness of the data class implementation according to the expected format.
validate_get_classes(batch)
Validate that the get_classes function returns data in the correct format.
validate_image_data(batch)
Validate that the data is in the required format.
validate_infered_batch_predictions(...[, ...])
Validate the inferred predictions from the batch.
validate_label(batch)
Validate the label.
validate_prediction(batch, model, device)
Validate the prediction.
validate_shared_label(other)
Verify presence of shared labels.
- __init__(data_loader: DataLoader, num_classes: Optional[int] = None, label_map: Optional[Dict[int, str]] = None, transform_field: Optional[str] = 'transforms')
- abstract batch_to_images(batch) -> Sequence[ndarray]
Transform a batch of data to images in the accepted format.
- Parameters
- batch : torch.Tensor
Batch of data to transform to images.
- Returns
- Sequence[np.ndarray]
List of images in the accepted format. Each image in the iterable must be a [H, W, C] 3D numpy array. See notes for more details. batch_to_images must be implemented in a subclass.
Notes
Each image in the iterable must be a [H, W, C] 3D numpy array. The first dimension must be the image height (y axis), the second the image width (x axis), and the third the number of channels. The numbers in the array should be in the range [0, 255]. Color images should be in RGB format and have 3 channels, while grayscale images should have 1 channel.
Examples
>>> import numpy as np
...
...
... def batch_to_images(self, batch):
...     # Converts a batch of normalized images to RGB images with range [0, 255]
...     inp = batch[0].detach().numpy().transpose((0, 2, 3, 1))
...     mean = [0.485, 0.456, 0.406]
...     std = [0.229, 0.224, 0.225]
...     inp = std * inp + mean
...     inp = np.clip(inp, 0, 1)
...     return inp * 255
- abstract batch_to_labels(batch) -> Tensor
Extract the labels from a batch of data.
- Parameters
- batch : torch.Tensor
The batch of data.
- Returns
- torch.Tensor
The labels extracted from the batch. The labels should be in a tensor format of shape (N,), where N is the number of samples in the batch. See the notes for more info.
Notes
The accepted label format for classification is a tensor of shape (N,), where N is the number of samples. Each element is an integer representing the class index.
Examples
>>> def batch_to_labels(self, batch):
...     return batch[1]
- property classes_indices: Dict[int, List[int]]
Return dict of classes as keys, and list of corresponding indices (in Dataset) of samples that include this class (in the label).
- copy(n_samples: Optional[int] = None, shuffle: bool = False, random_state: Optional[int] = None) -> VD
Create a new copy of this object, with the data loader and dataset also copied, and altered by the given parameters.
- Parameters
- n_samples : int, default: None
Take only this number of samples into the copied DataLoader. The samples chosen are affected by random_state (a fixed random state returns consistent samples).
- shuffle : bool, default: False
Whether to shuffle the sample order. The shuffle is affected by random_state (a fixed random state returns a consistent order).
- random_state : int, default: None
Random state used for the pseudo-random actions (sampling and shuffling).
- property data_dimension
Return how many dimensions the image data have.
- property data_loader: torch.utils.data.dataloader.DataLoader
Return the data loader.
- classmethod from_dataset(data: Dataset, batch_size: int = 64, shuffle: bool = True, num_workers: int = 0, pin_memory: bool = True, collate_fn: Optional[Callable] = None, num_classes: Optional[int] = None, label_map: Optional[Dict[int, str]] = None, transform_field: Optional[str] = 'transforms') -> VD
Create VisionData instance from a Dataset instance.
- Parameters
- data : Dataset
Instance of a Dataset.
- batch_size : int, default: 64
How many samples per batch to load.
- shuffle : bool, default: True
Set to True to have the data reshuffled at every epoch.
- num_workers : int, default: 0
How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process.
- pin_memory : bool, default: True
If True, the data loader will copy Tensors into CUDA pinned memory before returning them.
- collate_fn : Optional[Callable]
Merges a list of samples to form a mini-batch of Tensor(s).
- num_classes : Optional[int], default: None
Number of classes in the dataset. If not provided, will be inferred from the dataset.
- label_map : Optional[Dict[int, str]], default: None
A dictionary mapping class ids to their names.
- transform_field : Optional[str], default: 'transforms'
Name of the transforms field in the dataset, which holds the transformations of both data and label.
- Returns
- VisionData
- get_augmented_dataset(aug) -> VD
Return a copy of the vision data object with the augmentation in the start of it.
- get_classes(batch_labels: Union[List[Tensor], Tensor])
Get a labels batch and return classes inside it.
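For classification, each sample carries a single class index, so one plausible way to produce the List[List[int]] shape declared by the base class is to wrap every label in a one-element list. The sketch below assumes exactly that; the function name is hypothetical, it works on plain Python sequences instead of tensors, and it is not taken from the deepchecks source.

```python
def get_classes_for_classification(batch_labels):
    """Return, for every sample, the list of class ids present in its label."""
    # Assumption: one class index per sample, so each inner list has one element.
    return [[int(label)] for label in batch_labels]


classes = get_classes_for_classification([3, 0, 3])
```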
- property has_images: bool
Return True if the data loader has images.
- property has_labels: bool
Return True if the data loader has labels.
- abstract infer_on_batch(batch, model, device) -> Tensor
Return the predictions of the model on a batch of data.
- Parameters
- batch : torch.Tensor
The batch of data.
- model : torch.nn.Module
The model to use for inference.
- device : torch.device
The device to use for inference.
- Returns
- torch.Tensor
The predictions of the model on the batch. The predictions should be a tensor of shape (N, n_classes), where N is the number of samples in the batch, holding one value per class for each sample. See the notes for more info.
Notes
The accepted prediction format for classification is a tensor of shape (N, n_classes), where N is the number of samples. Each element is an array of length n_classes that represents the probability of each class.
Examples
>>> import torch.nn.functional as F
...
...
... def infer_on_batch(self, batch, model, device):
...     logits = model.to(device)(batch[0].to(device))
...     return F.softmax(logits, dim=1)
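The softmax step in the example above is what turns raw model outputs into per-class probabilities. A minimal pure-Python sketch of that step, using hypothetical logits in place of a real model, shows the resulting (N, n_classes) layout:

```python
import math


def softmax(logits):
    """Convert one row of logits into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for N = 2 samples and n_classes = 3.
batch_logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
batch_probs = [softmax(row) for row in batch_logits]
```

Each row of `batch_probs` has one probability per class, and the class with the largest logit receives the largest probability.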
- property n_of_samples_per_class: Dict[Any, int]
Return a dictionary containing the number of samples per class.
- property num_classes: int
Return the number of classes in the dataset.
- property num_samples: int
Return the number of samples in the dataset.
- property original_num_samples: int
Return the number of samples in the original dataset.
- property task_type: deepchecks.vision.task_type.TaskType
Return the task type (classification).
- to_batch(*samples)
Use the defined collate_fn to transform a few data items to batch format.
- to_dataset_index(*batch_indices)
Return for the given batch_index the sample index in the dataset object.
- property transform_field: str
Return the name of the transforms field.
- validate_format(model, device=None)
Validate the correctness of the data class implementation according to the expected format.
- Parameters
- model : Model
Model to validate the data class implementation against.
- device
Device to run the model on.
- validate_get_classes(batch)
Validate that the get_classes function returns data in the correct format.
- Parameters
- batch
- Raises
- ValidationError
If the classes data doesn’t fit the format after being transformed.
- validate_image_data(batch)
Validate that the data is in the required format.
The validation is done on the first element of the batch.
- Parameters
- batch
- Raises
- DeepchecksValueError
If the batch data doesn’t fit the format after being transformed by self().
- static validate_infered_batch_predictions(batch_predictions, n_classes: Optional[int] = None, eps: float = 0.001)
Validate the inferred predictions from the batch.
- Parameters
- batch_predictions : t.Any
The inferred predictions from the batch.
- n_classes : int, default: None
Number of classes.
- eps : float, default: 1e-3
Epsilon value to be used in the validation.
- Raises
- ValidationError
If the predictions format is invalid.
- DeepchecksNotImplementedError
If infer_on_batch is not implemented.
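The reference does not spell out what eps is compared against. One plausible reading, used here purely as an assumption, is that each prediction row must have n_classes entries and sum to 1 within a tolerance of eps. The function below is a hypothetical sketch of such a check on plain Python lists, with ValueError standing in for the library's ValidationError.

```python
def check_classification_predictions(batch_predictions, n_classes=None, eps=1e-3):
    """Check that every row looks like a probability vector over the classes."""
    for row_index, row in enumerate(batch_predictions):
        if n_classes is not None and len(row) != n_classes:
            raise ValueError(
                f"row {row_index} has {len(row)} entries, expected {n_classes}"
            )
        # Assumption: eps is the allowed deviation of the row sum from 1.
        if abs(sum(row) - 1.0) > eps:
            raise ValueError(f"row {row_index} does not sum to 1 within {eps}")
    return True


is_valid = check_classification_predictions([[0.2, 0.8], [0.5, 0.5]], n_classes=2)
```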
- validate_prediction(batch, model, device)
Validate the prediction.
- Parameters
- batch : t.Any
Batch from DataLoader.
- model : t.Any
- device : torch.device
- Raises
- ValidationError
If the predictions format is invalid (depends on the validate_infered_batch_predictions implementation).
- DeepchecksNotImplementedError
If infer_on_batch is not implemented.
- validate_shared_label(other)
Verify presence of shared labels.
Validates whether the two datasets share the same label shape.
- Parameters
- other : VisionData
The dataset to compare with. Expected to be of Dataset type.
- Raises
- DeepchecksValueError
If the datasets don’t have the same label.
- class DetectionData
The DetectionData class is used to load and preprocess data for an object detection task.
It is a subclass of the VisionData class. The DetectionData class contains additional data and general methods intended for easily accessing metadata relevant for validating computer vision object detection ML models.
- Attributes
classes_indices
Return dict of classes as keys, and list of corresponding indices (in Dataset) of samples that include this class (in the label).
data_dimension
Return how many dimensions the image data have.
data_loader
Return the data loader.
has_images
Return True if the data loader has images.
has_labels
Return True if the data loader has labels.
n_of_samples_per_class
Return a dictionary containing the number of samples per class.
num_classes
Return the number of classes in the dataset.
num_samples
Return the number of samples in the dataset.
original_num_samples
Return the number of samples in the original dataset.
task_type
Return the task type (object_detection).
transform_field
Return the name of the transforms field.
Methods
Assert the image formatter defined is valid.
Assert the label formatter defined is valid.
batch_of_index(*indices)
Return batch samples of the given batch indices.
batch_to_images(batch)
Transform a batch of data to images in the accepted format.
batch_to_labels(batch)
Extract the labels from a batch of data.
copy([n_samples, shuffle, random_state])
Create a new copy of this object, with the data loader and dataset also copied, and altered by the given parameters.
from_dataset(data[, batch_size, shuffle, ...])
Create VisionData instance from a Dataset instance.
get_augmented_dataset(aug)
Return a copy of the vision data object with the augmentation in the start of it.
get_classes(batch_labels)
Get a labels batch and return classes inside it.
Return transforms handler created from the transform field.
infer_on_batch(batch, model, device)
Return the predictions of the model on a batch of data.
Initialize the cache of the classes' metadata info.
Return whether the vision data is running on a sample of the data.
label_id_to_name(class_id)
Return the name of the class with the given id.
to_batch(*samples)
Use the defined collate_fn to transform a few data items to batch format.
to_dataset_index(*batch_indices)
Return for the given batch_index the sample index in the dataset object.
update_cache(batch)
Get labels and update the classes' metadata info.
validate_format(model[, device])
Validate the correctness of the data class implementation according to the expected format.
validate_get_classes(batch)
Validate that the get_classes function returns data in the correct format.
validate_image_data(batch)
Validate that the data is in the required format.
validate_infered_batch_predictions(batch_predictions)
Validate the inferred predictions from the batch.
validate_label(batch)
Validate the label.
validate_prediction(batch, model, device)
Validate the prediction.
validate_shared_label(other)
Verify presence of shared labels.
- __init__(data_loader: DataLoader, num_classes: Optional[int] = None, label_map: Optional[Dict[int, str]] = None, transform_field: Optional[str] = 'transforms')
- abstract batch_to_images(batch) -> Sequence[ndarray]
Transform a batch of data to images in the accepted format.
- Parameters
- batch : torch.Tensor
Batch of data to transform to images.
- Returns
- Sequence[np.ndarray]
List of images in the accepted format. Each image in the iterable must be a [H, W, C] 3D numpy array. See notes for more details. batch_to_images must be implemented in a subclass.
Notes
Each image in the iterable must be a [H, W, C] 3D numpy array. The first dimension must be the image height (y axis), the second the image width (x axis), and the third the number of channels. The numbers in the array should be in the range [0, 255]. Color images should be in RGB format and have 3 channels, while grayscale images should have 1 channel.
Examples
>>> import numpy as np
...
...
... def batch_to_images(self, batch):
...     # Converts a batch of normalized images to RGB images with range [0, 255]
...     inp = batch[0].detach().numpy().transpose((0, 2, 3, 1))
...     mean = [0.485, 0.456, 0.406]
...     std = [0.229, 0.224, 0.225]
...     inp = std * inp + mean
...     inp = np.clip(inp, 0, 1)
...     return inp * 255
- abstract batch_to_labels(batch) -> List[Tensor]
Extract the labels from a batch of data.
- Parameters
- batch : torch.Tensor
The batch of data.
- Returns
- List[torch.Tensor]
The labels extracted from the batch. The labels should be a list of length N containing tensors of shape (B, 5), where N is the number of samples, B is the number of bounding boxes in the sample and each bounding box is represented by 5 values. See the notes for more info.
Notes
The accepted label format for object detection is a list of length N containing tensors of shape (B, 5), where N is the number of samples, B is the number of bounding boxes in the sample and each bounding box is represented by 5 values: (class_id, x, y, w, h). x and y are the coordinates (in pixels) of the upper left corner of the bounding box, w and h are the width and height of the bounding box (in pixels) and class_id is the class id of the label.
Examples
>>> import torch
...
...
... def batch_to_labels(self, batch):
...     # each bbox in the labels is (class_id, x1, y1, x2, y2). convert to (class_id, x, y, w, h)
...     return [torch.stack(
...         [torch.cat((bbox[:1], bbox[1:3], bbox[3:] - bbox[1:3]), dim=0)
...          for bbox in image])
...         for image in batch[1]]
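The corner-to-size conversion behind the example above can be checked by hand on a single box. The sketch below uses plain tuples instead of tensors; the helper name `xyxy_label_to_xywh` and the sample boxes are hypothetical.

```python
def xyxy_label_to_xywh(label):
    """Convert (class_id, x1, y1, x2, y2) to (class_id, x, y, w, h)."""
    class_id, x1, y1, x2, y2 = label
    # Width and height are the differences between the two corners.
    return (class_id, x1, y1, x2 - x1, y2 - y1)


# Hypothetical labels for one image: two boxes in corner format.
image_labels = [(1, 10, 20, 50, 80), (0, 0, 0, 5, 5)]
converted = [xyxy_label_to_xywh(label) for label in image_labels]
```

The upper left corner and the class id are unchanged; only the last two values are replaced by the box width and height.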
- property classes_indices: Dict[int, List[int]]
Return dict of classes as keys, and list of corresponding indices (in Dataset) of samples that include this class (in the label).
- copy(n_samples: Optional[int] = None, shuffle: bool = False, random_state: Optional[int] = None) -> VD
Create a new copy of this object, with the data loader and dataset also copied, and altered by the given parameters.
- Parameters
- n_samples : int, default: None
Take only this number of samples into the copied DataLoader. The samples chosen are affected by random_state (a fixed random state returns consistent samples).
- shuffle : bool, default: False
Whether to shuffle the sample order. The shuffle is affected by random_state (a fixed random state returns a consistent order).
- random_state : int, default: None
Random state used for the pseudo-random actions (sampling and shuffling).
- property data_dimension
Return how many dimensions the image data have.
- property data_loader: torch.utils.data.dataloader.DataLoader
Return the data loader.
- classmethod from_dataset(data: Dataset, batch_size: int = 64, shuffle: bool = True, num_workers: int = 0, pin_memory: bool = True, collate_fn: Optional[Callable] = None, num_classes: Optional[int] = None, label_map: Optional[Dict[int, str]] = None, transform_field: Optional[str] = 'transforms') -> VD
Create VisionData instance from a Dataset instance.
- Parameters
- data : Dataset
Instance of a Dataset.
- batch_size : int, default: 64
How many samples per batch to load.
- shuffle : bool, default: True
Set to True to have the data reshuffled at every epoch.
- num_workers : int, default: 0
How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process.
- pin_memory : bool, default: True
If True, the data loader will copy Tensors into CUDA pinned memory before returning them.
- collate_fn : Optional[Callable]
Merges a list of samples to form a mini-batch of Tensor(s).
- num_classes : Optional[int], default: None
Number of classes in the dataset. If not provided, will be inferred from the dataset.
- label_map : Optional[Dict[int, str]], default: None
A dictionary mapping class ids to their names.
- transform_field : Optional[str], default: 'transforms'
Name of the transforms field in the dataset, which holds the transformations of both data and label.
- Returns
- VisionData
- get_augmented_dataset(aug) -> VD
Return a copy of the vision data object with the augmentation in the start of it.
- property has_images: bool
Return True if the data loader has images.
- property has_labels: bool
Return True if the data loader has labels.
- abstract infer_on_batch(batch, model, device) -> Sequence[Tensor]
Return the predictions of the model on a batch of data.
- Parameters
- batch : torch.Tensor
The batch of data.
- model : torch.nn.Module
The model to use for inference.
- device : torch.device
The device to use for inference.
- Returns
- Sequence[torch.Tensor]
The predictions of the model on the batch. The predictions should be in a sequence of length N containing tensors of shape (B, 6), where N is the number of images, B is the number of bounding boxes detected in the sample and each bounding box is represented by 6 values. See the notes for more info.
Notes
The accepted prediction format is a list of length N containing tensors of shape (B, 6), where N is the number of images, B is the number of bounding boxes detected in the sample and each bounding box is represented by 6 values: [x, y, w, h, confidence, class_id]. x and y are the coordinates (in pixels) of the upper left corner of the bounding box, w and h are the width and height of the bounding box (in pixels), confidence is the confidence of the model and class_id is the class id.
Examples
>>> import torch
...
...
... def infer_on_batch(self, batch, model, device):
...     # Converts a yolo prediction batch to the accepted xywh format
...     return_list = []
...
...     predictions = model(batch[0])
...     # yolo Detections objects have List[torch.Tensor] xyxy output in .pred
...     for single_image_tensor in predictions.pred:
...         pred_modified = torch.clone(single_image_tensor)
...         pred_modified[:, 2] = pred_modified[:, 2] - pred_modified[:, 0]
...         pred_modified[:, 3] = pred_modified[:, 3] - pred_modified[:, 1]
...         return_list.append(pred_modified)
...
...     return return_list
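Note that the prediction layout [x, y, w, h, confidence, class_id] places class_id last, whereas the label layout (class_id, x, y, w, h) places it first. The hypothetical helper below, written on plain tuples, makes the difference explicit by rearranging one prediction row into the label layout and dropping the confidence.

```python
def prediction_to_label_layout(prediction):
    """Rearrange [x, y, w, h, confidence, class_id] into (class_id, x, y, w, h)."""
    x, y, w, h, _confidence, class_id = prediction
    # The confidence has no counterpart in the label layout, so it is dropped.
    return (int(class_id), x, y, w, h)


# Hypothetical prediction row for a single detected box.
row = (10.0, 20.0, 40.0, 60.0, 0.9, 1.0)
as_label = prediction_to_label_layout(row)
```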
- property n_of_samples_per_class: Dict[Any, int]
Return a dictionary containing the number of samples per class.
- property num_classes: int
Return the number of classes in the dataset.
- property num_samples: int
Return the number of samples in the dataset.
- property original_num_samples: int
Return the number of samples in the original dataset.
- property task_type: deepchecks.vision.task_type.TaskType
Return the task type (object_detection).
- to_batch(*samples)
Use the defined collate_fn to transform a few data items to batch format.
- to_dataset_index(*batch_indices)
Return for the given batch_index the sample index in the dataset object.
- property transform_field: str
Return the name of the transforms field.
- validate_format(model, device=None)[source]#
Validate the correctness of the data class implementation according to the expected format.
- Parameters
- modelModel
Model to validate the data class implementation against.
- device
Device to run the model on.
- validate_get_classes(batch)[source]#
Validate that the get_classes function returns data in the correct format.
- Parameters
- batch
- Raises
- ValidationError
If the classes data doesn’t fit the format after being transformed.
- validate_image_data(batch)[source]#
Validate that the data is in the required format.
The validation is done on the first element of the batch.
- Parameters
- batch
- Raises
- DeepchecksValueError
If the batch data doesn’t fit the format after being transformed by self().
- static validate_infered_batch_predictions(batch_predictions)[source]#
Validate the inferred predictions from the batch.
- Parameters
- batch_predictionst.Any
The inferred predictions from the batch.
- Raises
- ValidationError
If predictions format is invalid
- DeepchecksNotImplementedError
If infer_on_batch not implemented
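To make the format rule concrete, here is a minimal stand-in for the kind of check this method describes, written against nested Python lists rather than tensors. It is not the library's implementation: the function name check_detection_predictions and the use of ValueError (in place of ValidationError) are assumptions for illustration. The six-value row layout follows the prediction format described for object detection.

```python
from typing import Any


def check_detection_predictions(batch_predictions: Any) -> None:
    """Raise ValueError unless batch_predictions is a list holding one
    sequence of 6-value rows per image."""
    if not isinstance(batch_predictions, list):
        raise ValueError("predictions must be a list with one entry per image")
    for image_index, boxes in enumerate(batch_predictions):
        for row in boxes:
            # Each row is [x, y, w, h, confidence, class_id].
            if len(row) != 6:
                raise ValueError(
                    f"image {image_index}: expected 6 values per box, got {len(row)}"
                )


valid = [[[10, 20, 100, 200, 0.9, 3]], []]  # second image has no detections
check_detection_predictions(valid)  # passes silently

try:
    check_detection_predictions([[[10, 20, 100, 200, 0.9]]])  # class_id missing
    rejected = False
except ValueError:
    rejected = True
```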
- validate_label(batch)[source]#
Validate the label.
- Parameters
- batch
- Raises
- DeepchecksValueError
If labels format is invalid
- DeepchecksNotImplementedError
If batch_to_labels not implemented
- validate_prediction(batch, model, device)[source]#
Validate the prediction.
- Parameters
- batcht.Any
Batch from DataLoader
- modelt.Any
- devicetorch.Device
- Raises
- ValidationError
If predictions format is invalid (depends on validate_infered_batch_predictions implementations)
- DeepchecksNotImplementedError
If infer_on_batch not implemented
Verify presence of shared labels.
Validates whether the 2 datasets share the same label shape
- Parameters
- otherVisionData
Dataset to compare against. Expected to be a VisionData object.
- Raises
- DeepchecksValueError
If the datasets don’t have the same label.
- class Context[source]#
Contains all the data + properties the user has passed to a check/suite, and validates it seamlessly.
- Parameters
- trainOptional[VisionData] , default: None
VisionData object, representing data a neural network was fitted on
- testOptional[VisionData] , default: None
VisionData object, representing data a neural network predicts on
- modelOptional[nn.Module] , default: None
pytorch neural network module instance
- model_name: str , default: ‘’
The name of the model
- scorersOptional[Mapping[str, Metric]] , default: None
dict of scorers names to a Metric
- scorers_per_classOptional[Mapping[str, Metric]] , default: None
dict of scorers for classification without averaging of the classes. See the scikit-learn docs: https://scikit-learn.org/stable/modules/model_evaluation.html#from-binary-to-multiclass-and-multilabel
- deviceUnion[str, torch.device], default: ‘cpu’
processing unit for use
- random_stateint
A seed to set for pseudo-random functions
- n_samplesOptional[int], default: None
number of samples
- with_displaybool , default: True
flag that determines if checks will calculate display (redundant in some checks).
- train_predictions: Optional[Dict[int, Union[Sequence[torch.Tensor], torch.Tensor]]] , default None
Dictionary of the model prediction over the train dataset (keys are the indexes).
- test_predictions: Optional[Dict[int, Union[Sequence[torch.Tensor], torch.Tensor]]] , default None
Dictionary of the model prediction over the test dataset (keys are the indexes).
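The two prediction dictionaries are keyed by sample index. A minimal sketch of assembling such a dictionary from per-batch outputs follows. It uses plain lists in place of tensors, and both the helper name collect_predictions and the assumption that indices follow data-loader order are illustrative rather than taken from the library.

```python
from typing import Dict, Iterable, List, Sequence


def collect_predictions(
    batches: Iterable[Sequence[List[float]]],
) -> Dict[int, List[float]]:
    """Flatten per-batch predictions into {sample_index: prediction}."""
    predictions: Dict[int, List[float]] = {}
    for batch in batches:
        for prediction in batch:
            # Assumption: samples are numbered consecutively in loading order.
            predictions[len(predictions)] = prediction
    return predictions


# Two batches of class-probability vectors (stand-ins for torch tensors).
batched_outputs = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4]],
]
train_predictions = collect_predictions(batched_outputs)
```

If the data loader shuffles its samples, the consecutive numbering assumed here would no longer line up with dataset indices, so the keys would need to come from the dataset itself.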
- Attributes
device
Return device specified by the user.
model
Return & validate model if model exists, otherwise raise error.
model_name
Return model name.
static_predictions
Return the static_predictions.
static_properties
Return the static_properties.
test
Return test if exists, otherwise raise error.
train
Return train if exists, otherwise raise error.
with_display
Return the with_display flag.
Methods
add_is_sampled_footnote
(result[, kind])Get footnote to display when the datasets are sampled.
assert_predictions_valid
([kind])Assert that, for the given DatasetKind, the model and the dataset's infer_on_batch return predictions in the right format.
assert_task_type
(*expected_types)Assert task_type matching given types.
finalize_check_result
(check_result, check)Run final processing on a check result which includes validation and conditions processing.
get_data_by_kind
(kind)Return the relevant VisionData by given kind.
Return whether a test dataset is defined.
- __init__(train: Optional[VisionData] = None, test: Optional[VisionData] = None, model: Optional[Module] = None, model_name: str = '', scorers: Optional[Mapping[str, Metric]] = None, scorers_per_class: Optional[Mapping[str, Metric]] = None, device: Optional[Union[str, device]] = None, random_state: int = 42, n_samples: Optional[int] = None, with_display: bool = True, train_predictions: Optional[Dict[int, Union[Sequence[Tensor], Tensor]]] = None, test_predictions: Optional[Dict[int, Union[Sequence[Tensor], Tensor]]] = None, train_properties: Optional[Dict[int, Dict[PropertiesInputType, Dict[str, Any]]]] = None, test_properties: Optional[Dict[int, Dict[PropertiesInputType, Dict[str, Any]]]] = None)[source]#
- add_is_sampled_footnote(result: Union[CheckResult, SuiteResult], kind: Optional[DatasetKind] = None)[source]#
Get footnote to display when the datasets are sampled.
- assert_predictions_valid(kind: Optional[DatasetKind] = None)[source]#
Assert that, for the given DatasetKind, the model and the dataset's infer_on_batch return predictions in the right format.
- property device: torch.device#
Return device specified by the user.
- finalize_check_result(check_result, check)[source]#
Run final processing on a check result which includes validation and conditions processing.
- property model: torch.nn.modules.module.Module#
Return & validate model if model exists, otherwise raise error.
- property model_name#
Return model name.
- property static_predictions: Dict#
Return the static_predictions.
- property static_properties: Dict#
Return the static_properties.
- property test: VisionData#
Return test if exists, otherwise raise error.
- property train: VisionData#
Return train if exists, otherwise raise error.
- property with_display: bool#
Return the with_display flag.
- class SingleDatasetCheck[source]#
Parent class for checks that only use one dataset.
Methods
add_condition
(name, condition_func, **params)Add new condition function to the check.
Remove all conditions from this check instance.
compute
(context, dataset_kind)Compute final check result based on accumulated internal state.
conditions_decision
(result)Run conditions on given result.
config
()Return check configuration (conditions' configuration not yet supported).
alias of
Context
from_config
(conf)Return check object from a CheckConfig object.
initialize_run
(context, dataset_kind)Initialize run before starting updating on batches.
metadata
([with_doc_link])Return check metadata.
name
()Name of class in split camel case.
params
([show_defaults])Return parameters to show when printing the check.
remove_condition
(index)Remove given condition by index.
run
(dataset[, model, model_name, scorers, ...])Run check.
update
(context, batch, dataset_kind)Update internal check state with given batch.
- add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#
Add new condition function to the check.
- Parameters
- namestr
Name of the condition. Should explain the condition's action and parameters.
- condition_funcCallable[[Any], Union[ConditionResult, bool]]
Function that receives the value of the check and returns a ConditionResult or a boolean.
- paramsdict
Additional parameters to pass when calling the condition function.
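A condition function receives the check's value and returns a boolean or a condition result object, and any extra keyword arguments given to add_condition are passed through when the function is called. The stand-in below illustrates only that binding of **params. TinyCheck is a hypothetical class, not the deepchecks implementation, and the threshold parameter is invented for the example.

```python
from typing import Any, Callable, Dict, List, Tuple


class TinyCheck:
    """Hypothetical stand-in that stores conditions together with their params."""

    def __init__(self) -> None:
        self._conditions: List[Tuple[str, Callable[..., bool], Dict[str, Any]]] = []

    def add_condition(
        self, name: str, condition_func: Callable[..., bool], **params: Any
    ) -> "TinyCheck":
        # Keep the extra keyword arguments so they can be passed back later.
        self._conditions.append((name, condition_func, params))
        return self

    def conditions_decision(self, value: Any) -> Dict[str, bool]:
        # Call each condition with the check value plus its stored params.
        return {
            name: func(value, **params) for name, func, params in self._conditions
        }


def not_greater_than(value: float, threshold: float) -> bool:
    return value <= threshold


check = TinyCheck().add_condition(
    "Drift is not greater than 0.2", not_greater_than, threshold=0.2
)
decisions = check.conditions_decision(0.15)
```

Naming the condition after its action and parameter, as the name description recommends, keeps the resulting decision readable without inspecting the function.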
- compute(context: Context, dataset_kind: DatasetKind) CheckResult [source]#
Compute final check result based on accumulated internal state.
- conditions_decision(result: CheckResult) List[ConditionResult] [source]#
Run conditions on given result.
- config() CheckConfig [source]#
Return check configuration (conditions’ configuration not yet supported).
- Returns
- CheckConfig
Includes the check's class name, params, and module name.
- static from_config(conf: CheckConfig) BaseCheck [source]#
Return check object from a CheckConfig object.
- Parameters
- confCheckConfig
the CheckConfig object
- Returns
- BaseCheck
the check class object from given config
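The config and from_config pair amounts to a serialize-and-rebuild round trip: the configuration records the class name, parameters and module name, and from_config uses them to build an equivalent object. The sketch below shows that pattern with importlib on a standard-library class. The dictionary keys module_name, class_name and params are assumptions for illustration and may not match the actual CheckConfig layout.

```python
import importlib
from fractions import Fraction
from typing import Any, Dict


def to_config(obj: Any, params: Dict[str, Any]) -> Dict[str, Any]:
    """Record where the class lives and how the object was parameterized."""
    cls = type(obj)
    return {
        "module_name": cls.__module__,
        "class_name": cls.__name__,
        "params": params,
    }


def from_config(conf: Dict[str, Any]) -> Any:
    """Import the module, look up the class, and call it with the saved params."""
    module = importlib.import_module(conf["module_name"])
    cls = getattr(module, conf["class_name"])
    return cls(**conf["params"])


original = Fraction(numerator=3, denominator=4)
conf = to_config(original, {"numerator": 3, "denominator": 4})
rebuilt = from_config(conf)
```

Because only the class location and parameters are stored, anything attached after construction (conditions, in the case of a check) would not survive this round trip, which is consistent with the note that conditions' configuration is not yet supported.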
- initialize_run(context: Context, dataset_kind: DatasetKind)[source]#
Initialize run before starting updating on batches. Optional.
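Taken together, initialize_run, update and compute suggest an accumulate-then-summarize lifecycle: state is reset once, updated per batch, and reduced to a result at the end. The following sketch mimics that flow with a hypothetical MeanBrightness class operating on lists of pixel values. The class, the driver function and the returned float are illustrative assumptions; a real check receives a Context and returns a CheckResult.

```python
from typing import Iterable, List


class MeanBrightness:
    """Hypothetical check-like object that averages pixel values over batches."""

    def initialize_run(self) -> None:
        # Reset the accumulators before any batch is seen.
        self._total = 0.0
        self._count = 0

    def update(self, batch: List[List[float]]) -> None:
        # Fold one batch of flattened images into the running totals.
        for image in batch:
            self._total += sum(image)
            self._count += len(image)

    def compute(self) -> float:
        # Reduce the accumulated state to the final value.
        return self._total / self._count if self._count else 0.0


def run(check: MeanBrightness, batches: Iterable[List[List[float]]]) -> float:
    check.initialize_run()
    for batch in batches:
        check.update(batch)
    return check.compute()


result = run(MeanBrightness(), [[[0.0, 1.0], [1.0, 1.0]], [[0.5, 0.5]]])
```

Keeping only running totals means memory use does not grow with the number of batches, which is the usual reason for splitting a computation into update and compute steps.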
- metadata(with_doc_link: bool = False) CheckMetadata [source]#
Return check metadata.
- Parameters
- with_doc_linkbool, default False
Whether to include the doc link in the summary.
- Returns
- Dict[str, Any]
- params(show_defaults: bool = False) Dict [source]#
Return parameters to show when printing the check.
- remove_condition(index: int)[source]#
Remove given condition by index.
- Parameters
- indexint
Index of the condition to remove.
- run(dataset: VisionData, model: Optional[Module] = None, model_name: str = '', scorers: Optional[Mapping[str, Metric]] = None, scorers_per_class: Optional[Mapping[str, Metric]] = None, device: Optional[Union[str, device]] = None, random_state: int = 42, n_samples: Optional[int] = 10000, with_display: bool = True, train_predictions: Optional[Dict[int, Union[Sequence[Tensor], Tensor]]] = None, test_predictions: Optional[Dict[int, Union[Sequence[Tensor], Tensor]]] = None, train_properties: Optional[Dict[int, Dict[PropertiesInputType, Dict[str, Any]]]] = None, test_properties: Optional[Dict[int, Dict[PropertiesInputType, Dict[str, Any]]]] = None) CheckResult [source]#
Run check.
- Parameters
- dataset: VisionData
VisionData object to process
- model: Optional[nn.Module] , default None
pytorch neural network module instance
- model_name: str , default: ‘’
The name of the model
- scorersOptional[Mapping[str, Metric]] , default: None
dict of scorers names to a Metric
- scorers_per_classOptional[Mapping[str, Metric]] , default: None
dict of scorers for classification without averaging of the classes. See the scikit-learn docs: https://scikit-learn.org/stable/modules/model_evaluation.html#from-binary-to-multiclass-and-multilabel
- deviceUnion[str, torch.device], default: ‘cpu’
processing unit for use
- random_stateint
A seed to set for pseudo-random functions
- n_samplesOptional[int], default: 10000
number of samples
- with_displaybool , default: True
flag that determines if checks will calculate display (redundant in some checks).
- train_predictions: Optional[Dict[int, Union[Sequence[torch.Tensor], torch.Tensor]]] , default None
Dictionary of the model prediction over the train dataset (keys are the indexes).
- test_predictions: Optional[Dict[int, Union[Sequence[torch.Tensor], torch.Tensor]]] , default None
Dictionary of the model prediction over the test dataset (keys are the indexes).
- class TrainTestCheck[source]#
Parent class for checks that compare two datasets.
The class checks the train dataset and the test dataset used for model training and testing.
Methods
add_condition
(name, condition_func, **params)Add new condition function to the check.
Remove all conditions from this check instance.
compute
(context)Compute final check result based on accumulated internal state.
conditions_decision
(result)Run conditions on given result.
config
()Return check configuration (conditions' configuration not yet supported).
alias of
Context
from_config
(conf)Return check object from a CheckConfig object.
initialize_run
(context)Initialize run before starting updating on batches.
metadata
([with_doc_link])Return check metadata.
name
()Name of class in split camel case.
params
([show_defaults])Return parameters to show when printing the check.
remove_condition
(index)Remove given condition by index.
run
(train_dataset, test_dataset[, model, ...])Run check.
update
(context, batch, dataset_kind)Update internal check state with given batch for either train or test.
- add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#
Add new condition function to the check.
- Parameters
- namestr
Name of the condition. Should explain the condition's action and parameters.
- condition_funcCallable[[Any], Union[ConditionResult, bool]]
Function that receives the value of the check and returns a ConditionResult or a boolean.
- paramsdict
Additional parameters to pass when calling the condition function.
- compute(context: Context) CheckResult [source]#
Compute final check result based on accumulated internal state.
- conditions_decision(result: CheckResult) List[ConditionResult] [source]#
Run conditions on given result.
- config() CheckConfig [source]#
Return check configuration (conditions’ configuration not yet supported).
- Returns
- CheckConfig
Includes the check's class name, params, and module name.
- static from_config(conf: CheckConfig) BaseCheck [source]#
Return check object from a CheckConfig object.
- Parameters
- confCheckConfig
the CheckConfig object
- Returns
- BaseCheck
the check class object from given config
- initialize_run(context: Context)[source]#
Initialize run before starting updating on batches. Optional.
- metadata(with_doc_link: bool = False) CheckMetadata [source]#
Return check metadata.
- Parameters
- with_doc_linkbool, default False
Whether to include the doc link in the summary.
- Returns
- Dict[str, Any]
- params(show_defaults: bool = False) Dict [source]#
Return parameters to show when printing the check.
- remove_condition(index: int)[source]#
Remove given condition by index.
- Parameters
- indexint
Index of the condition to remove.
- run(train_dataset: VisionData, test_dataset: VisionData, model: Optional[Module] = None, model_name: str = '', scorers: Optional[Mapping[str, Metric]] = None, scorers_per_class: Optional[Mapping[str, Metric]] = None, device: Optional[Union[str, device]] = None, random_state: int = 42, n_samples: Optional[int] = 10000, with_display: bool = True, train_predictions: Optional[Dict[int, Union[Sequence[Tensor], Tensor]]] = None, test_predictions: Optional[Dict[int, Union[Sequence[Tensor], Tensor]]] = None, train_properties: Optional[Dict[int, Dict[PropertiesInputType, Dict[str, Any]]]] = None, test_properties: Optional[Dict[int, Dict[PropertiesInputType, Dict[str, Any]]]] = None) CheckResult [source]#
Run check.
- Parameters
- train_dataset: VisionData
VisionData object, representing data a neural network was fitted on
- test_dataset: VisionData
VisionData object, representing data a neural network predicts on
- model: Optional[nn.Module] , default None
pytorch neural network module instance
- model_name: str , default: ‘’
The name of the model
- scorersOptional[Mapping[str, Metric]] , default: None
dict of scorers names to a Metric
- scorers_per_classOptional[Mapping[str, Metric]] , default: None
dict of scorers for classification without averaging of the classes. See the scikit-learn docs: https://scikit-learn.org/stable/modules/model_evaluation.html#from-binary-to-multiclass-and-multilabel
- deviceUnion[str, torch.device], default: ‘cpu’
processing unit for use
- random_stateint
A seed to set for pseudo-random functions
- n_samplesOptional[int], default: 10000
number of samples
- with_displaybool , default: True
flag that determines if checks will calculate display (redundant in some checks).
- train_predictions: Optional[Dict[int, Union[Sequence[torch.Tensor], torch.Tensor]]] , default None
Dictionary of the model prediction over the train dataset (keys are the indexes).
- test_predictions: Optional[Dict[int, Union[Sequence[torch.Tensor], torch.Tensor]]] , default None
Dictionary of the model prediction over the test dataset (keys are the indexes).
- class ModelOnlyCheck[source]#
Parent class for checks that only use a model and no datasets.
Methods
add_condition
(name, condition_func, **params)Add new condition function to the check.
Remove all conditions from this check instance.
compute
(context)Compute final check result.
conditions_decision
(result)Run conditions on given result.
config
()Return check configuration (conditions' configuration not yet supported).
alias of
Context
from_config
(conf)Return check object from a CheckConfig object.
initialize_run
(context)Initialize run before starting updating on batches.
metadata
([with_doc_link])Return check metadata.
name
()Name of class in split camel case.
params
([show_defaults])Return parameters to show when printing the check.
remove_condition
(index)Remove given condition by index.
run
(model[, model_name, scorers, ...])Run check.
- add_condition(name: str, condition_func: Callable[[Any], Union[ConditionResult, bool]], **params)[source]#
Add new condition function to the check.
- Parameters
- namestr
Name of the condition. Should explain the condition's action and parameters.
- condition_funcCallable[[Any], Union[ConditionResult, bool]]
Function that receives the value of the check and returns a ConditionResult or a boolean.
- paramsdict
Additional parameters to pass when calling the condition function.
- compute(context: Context) CheckResult [source]#
Compute final check result.
- conditions_decision(result: CheckResult) List[ConditionResult] [source]#
Run conditions on given result.
- config() CheckConfig [source]#
Return check configuration (conditions’ configuration not yet supported).
- Returns
- CheckConfig
Includes the check's class name, params, and module name.
- static from_config(conf: CheckConfig) BaseCheck [source]#
Return check object from a CheckConfig object.
- Parameters
- confCheckConfig
the CheckConfig object
- Returns
- BaseCheck
the check class object from given config
- initialize_run(context: Context)[source]#
Initialize run before starting updating on batches. Optional.
- metadata(with_doc_link: bool = False) CheckMetadata [source]#
Return check metadata.
- Parameters
- with_doc_linkbool, default False
Whether to include the doc link in the summary.
- Returns
- Dict[str, Any]
- params(show_defaults: bool = False) Dict [source]#
Return parameters to show when printing the check.
- remove_condition(index: int)[source]#
Remove given condition by index.
- Parameters
- indexint
Index of the condition to remove.
- run(model: Module, model_name: str = '', scorers: Optional[Mapping[str, Metric]] = None, scorers_per_class: Optional[Mapping[str, Metric]] = None, device: Optional[Union[str, device]] = None, random_state: int = 42, n_samples: Optional[int] = None, with_display: bool = True, train_predictions: Optional[Dict[int, Union[Sequence[Tensor], Tensor]]] = None, test_predictions: Optional[Dict[int, Union[Sequence[Tensor], Tensor]]] = None, train_properties: Optional[Dict[int, Dict[PropertiesInputType, Dict[str, Any]]]] = None, test_properties: Optional[Dict[int, Dict[PropertiesInputType, Dict[str, Any]]]] = None) CheckResult [source]#
Run check.
- Parameters
- model: nn.Module
pytorch neural network module instance
- model_name: str , default: ‘’
The name of the model
- scorersOptional[Mapping[str, Metric]] , default: None
dict of scorers names to a Metric
- scorers_per_classOptional[Mapping[str, Metric]] , default: None
dict of scorers for classification without averaging of the classes. See the scikit-learn docs: https://scikit-learn.org/stable/modules/model_evaluation.html#from-binary-to-multiclass-and-multilabel
- deviceUnion[str, torch.device], default: ‘cpu’
processing unit for use
- random_stateint
A seed to set for pseudo-random functions
- n_samplesOptional[int], default: None
number of samples
- with_displaybool , default: True
flag that determines if checks will calculate display (redundant in some checks).
- train_predictions: Optional[Dict[int, Union[Sequence[torch.Tensor], torch.Tensor]]] , default None
Dictionary of the model prediction over the train dataset (keys are the indexes).
- test_predictions: Optional[Dict[int, Union[Sequence[torch.Tensor], torch.Tensor]]] , default None
Dictionary of the model prediction over the test dataset (keys are the indexes).
- class Suite[source]#
Suite to run checks of types: TrainTestCheck, SingleDatasetCheck, ModelOnlyCheck.
Methods
add
(check)Add a check or a suite to current suite.
config
()Return suite configuration (checks' conditions' configuration not yet supported).
from_config
(conf)Return suite object from a SuiteConfig object.
remove
(index)Remove a check by given index.
run
([train_dataset, test_dataset, model, ...])Run all checks.
Return tuple of supported check types of this suite.
- add(check: Union[BaseCheck, BaseSuite])[source]#
Add a check or a suite to current suite.
- Parameters
- checkBaseCheck
A check or suite to add.
- config() SuiteConfig [source]#
Return suite configuration (checks’ conditions’ configuration not yet supported).
- Returns
- SuiteConfig
includes the suite name, and list of check configs.
- static from_config(conf: SuiteConfig) BaseSuite [source]#
Return suite object from a SuiteConfig object.
- Parameters
- confSuiteConfig
the SuiteConfig object
- Returns
- BaseSuite
the suite class object from given config
- remove(index: int)[source]#
Remove a check by given index.
- Parameters
- indexint
Index of check to remove.
- run(train_dataset: Optional[VisionData] = None, test_dataset: Optional[VisionData] = None, model: Optional[Module] = None, scorers: Optional[Mapping[str, Metric]] = None, scorers_per_class: Optional[Mapping[str, Metric]] = None, device: Optional[Union[str, device]] = None, random_state: int = 42, with_display: bool = True, n_samples: Optional[int] = None, train_predictions: Optional[Dict[int, Union[Sequence[Tensor], Tensor]]] = None, test_predictions: Optional[Dict[int, Union[Sequence[Tensor], Tensor]]] = None, model_name: str = '') SuiteResult [source]#
Run all checks.
- Parameters
- train_dataset: Optional[VisionData] , default None
object, representing data an estimator was fitted on
- test_datasetOptional[VisionData] , default None
object, representing data an estimator predicts on
- modelnn.Module , default None
pytorch neural network module instance
- model_name: str , default: ‘’
The name of the model
- scorersOptional[Mapping[str, Metric]] , default: None
dict of scorers names to a Metric
- scorers_per_classOptional[Mapping[str, Metric]] , default: None
dict of scorers for classification without averaging of the classes. See the scikit-learn docs: https://scikit-learn.org/stable/modules/model_evaluation.html#from-binary-to-multiclass-and-multilabel
- deviceUnion[str, torch.device], default: ‘cpu’
processing unit for use
- random_stateint
A seed to set for pseudo-random functions
- n_samplesOptional[int], default: None
number of samples
- with_displaybool , default: True
flag that determines if checks will calculate display (redundant in some checks).
- train_predictions: Optional[Dict[int, Union[Sequence[torch.Tensor], torch.Tensor]]] , default None
Dictionary of the model prediction over the train dataset (keys are the indexes).
- test_predictions: Optional[Dict[int, Union[Sequence[torch.Tensor], torch.Tensor]]] , default None
Dictionary of the model prediction over the test dataset (keys are the indexes).
- Returns
- SuiteResult
All results by all initialized checks