New Labels#

This notebook provides an overview of using and understanding the New Labels check.

Structure:

How the check works
Run the Check
Observe the check's output
Define a condition

How the check works#

In this check we count the frequency of each class id in the test set, then check which of them do not appear in the training set. Note that by default this check runs on a sample of the dataset, so class ids that are merely rare in the train set may be missed by the sample and mistakenly reported as new labels in the test set.
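To make this concrete, the comparison boils down to the following logic. This is a minimal sketch over plain lists of class ids, illustrating the idea rather than deepchecks' internal implementation:

from collections import Counter

# Hypothetical flat lists of class ids, one entry per label in each dataset.
train_class_ids = [0, 0, 1, 2, 2, 2]
test_class_ids = [0, 1, 3, 3, 4]

train_classes = set(train_class_ids)
test_counts = Counter(test_class_ids)

# Keep only test class ids that never appear in the (possibly sampled) train set.
new_labels = {cid: n for cid, n in test_counts.items() if cid not in train_classes}

print(new_labels)                 # {3: 2, 4: 1}
print(sum(test_counts.values()))  # 5 -- total test labels, used later for the ratio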

Run the Check#

Note

In this example, we use the pytorch version of the coco dataset and model. In order to run this example using tensorflow, please change the import statement to:

from deepchecks.vision.datasets.detection import coco_tensorflow as coco

from deepchecks.vision.datasets.detection import coco_torch as coco
from deepchecks.vision.checks import NewLabels

coco_train = coco.load_dataset(train=True, object_type='VisionData', shuffle=False)
coco_test = coco.load_dataset(train=False, object_type='VisionData', shuffle=False)

result = NewLabels().run(coco_train, coco_test)
result
Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt to yolov5s.pt...

100%|██████████| 14.1M/14.1M [00:00<00:00, 269MB/s]

You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.

Processing Train Batches:
|█████| 1/1 [Time: 00:00]

Processing Test Batches:
|█████| 1/1 [Time: 00:00]

Computing Check:
|█████| 1/1 [Time: 00:00]
New Labels


To display the results in an IDE like PyCharm, you can use the following code:

#  result.show_in_window()

The result will be displayed in a new window.
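If you prefer a standalone report instead, the result object can also be exported to an HTML file via deepchecks' CheckResult API (the file name here is just an example):

#  result.save_as_html('new_labels_check.html')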

Observe the check’s output#

The check searches for new labels in the test set. The value output is a dictionary containing the number of appearances of each newly found class_id, in addition to the total number of labels in the test set for comparison purposes.

result.value
{'new_labels': {'donut': 14, 'tennis racket': 7, 'boat': 6, 'cat': 4, 'laptop': 3, 'mouse': 2, 'toilet': 2, 'bear': 1}, 'all_labels_count': 387}
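For example, the new-label ratio that the condition in the next section relies on can be computed directly from this dictionary:

new_label_count = sum(result.value['new_labels'].values())  # 14+7+6+4+3+2+2+1 = 39
ratio = new_label_count / result.value['all_labels_count']  # 39 / 387
print(f'{ratio:.1%}')  # 10.1%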

Define a condition#

The check has a default condition that can be added. The condition verifies that the ratio of new labels out of the total number of labels in the test set is smaller than a given threshold. If the check is run with the default sampling mechanism, we recommend setting the condition threshold to a small percentage rather than to 0.

check = NewLabels().add_condition_new_label_ratio_less_or_equal(0.05)
check.run(coco_train, coco_test)
Processing Train Batches:
|█████| 1/1 [Time: 00:00]

Processing Test Batches:
|█████| 1/1 [Time: 00:00]

Computing Check:
|█████| 1/1 [Time: 00:00]
New Labels


In this case the condition fails: roughly 10% of the test set labels do not appear in the training set, exceeding the 0.05 threshold.
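To act on this outcome programmatically, e.g. in a CI pipeline, the returned result reports whether all attached conditions passed. A minimal sketch, assuming the standard deepchecks CheckResult API:

result = check.run(coco_train, coco_test)
if not result.passed_conditions():
    raise ValueError('New-label ratio in the test set exceeded the allowed threshold')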

Total running time of the script: (0 minutes 2.566 seconds)
