train_test_validation

train_test_validation(label_properties: Optional[List[Dict[str, Any]]] = None, image_properties: Optional[List[Dict[str, Any]]] = None, **kwargs) → Suite

Suite for validating correctness of train-test split, including distribution, integrity and leakage checks.

List of Checks:

Check Example	API Reference
New Labels	`NewLabels`
Heatmap Comparison	`HeatmapComparison`
Label Drift	`LabelDrift`
Image Property Drift	`ImagePropertyDrift`
Image Dataset Drift	`ImageDatasetDrift`
Property Label Correlation Change	`PropertyLabelCorrelationChange`

Parameters

label_properties : List[Dict[str, Any]], default: None

List of properties. Replaces the default deepchecks properties. Each property is a dictionary with the keys 'name' (str), 'method' (Callable) and 'output_type' (str), representing attributes of said method. 'output_type' must be one of:

'numerical' - for continuous ordinal outputs.
'categorical' - for discrete, non-ordinal outputs. These can still be numbers, but these numbers do not have inherent value.
'class_id' - for properties that return the class_id. This is used because these properties are later matched with the VisionData.label_map, if one was given.

For more on image / label properties, see the guide about Vision Properties.
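
As a rough illustration of the dictionary format described above, the sketch below defines one label property and a small sanity check of its keys. The names `count_labels` and `validate_property` are hypothetical, and the assumption that the method receives a batch holding one collection of labels per image and returns one value per image is not taken from this reference; the Vision Properties guide is the place to confirm the exact batch format.

```python
from typing import Any, Dict, List, Sequence

# The three output types listed above for label properties.
ALLOWED_OUTPUT_TYPES = {"numerical", "categorical", "class_id"}


def count_labels(batch: Sequence[Sequence[Any]]) -> List[int]:
    """Hypothetical property: number of label entries per image.

    Assumes each item of the batch is the collection of labels for one image.
    """
    return [len(labels) for labels in batch]


label_properties: List[Dict[str, Any]] = [
    {"name": "Labels per image", "method": count_labels, "output_type": "numerical"},
]


def validate_property(prop: Dict[str, Any]) -> None:
    """Raise ValueError if a property dict does not follow the documented shape."""
    missing = {"name", "method", "output_type"} - prop.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not callable(prop["method"]):
        raise ValueError("'method' must be callable")
    if prop["output_type"] not in ALLOWED_OUTPUT_TYPES:
        raise ValueError(f"unsupported output_type: {prop['output_type']!r}")


# Illustrative local check only; this is not a deepchecks function.
for prop in label_properties:
    validate_property(prop)
```

A list built this way could then be passed as `train_test_validation(label_properties=label_properties)`.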

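image_properties : List[Dict[str, Any]], default: None

List of properties. Replaces the default deepchecks properties. Each property is a dictionary with the same keys as described for label_properties. 'output_type' must be one of: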

'numerical' - for continuous ordinal outputs.
'categorical' - for discrete, non-ordinal outputs. These can still be numbers, but these numbers do not have inherent value.

For more on image / label properties, see the guide about Vision Properties.

**kwargs : dict

Additional arguments to pass to the checks.

Returns

Suite: A Suite for validating correctness of train-test split, including distribution, integrity and leakage checks.

See also

Image Classification Tutorial
Object Detection Tutorial
Semantic Segmentation Tutorial

Examples

>>> from deepchecks.vision.suites import train_test_validation
>>> suite = train_test_validation()
>>> train_data, test_data = ...
>>> result = suite.run(train_data, test_data, max_samples=800)
>>> result.show()

run(self, train_dataset: Optional[VisionData] = None, test_dataset: Optional[VisionData] = None, random_state: int = 42, with_display: bool = True, max_samples: Optional[int] = None, run_single_dataset: Optional[str] = None) → SuiteResult

Run all checks.

Parameters

train_dataset: Optional[VisionData], default: None. VisionData object representing the data the model was fitted on.
test_dataset: Optional[VisionData], default: None. VisionData object representing the data the model predicts on.
random_state: int, default: 42. A seed to set for pseudo-random functions.
with_display: bool, default: True. Flag that determines whether checks calculate their display (redundant in some checks).
max_samples: Optional[int], default: None. Each check runs on a number of samples equal to the minimum of the check's n_samples parameter and this parameter. If this argument is None, the number of samples for each check is determined by its n_samples argument alone.
run_single_dataset: Optional[str], default: None. 'Train' or 'Test' to run on a single dataset, or None to run on both train and test.
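
The max_samples rule above amounts to taking a minimum. The helper below is only a sketch of that rule as stated in this reference: `effective_samples` is a hypothetical name, not part of deepchecks, and it assumes each check's n_samples is a plain integer.

```python
from typing import Optional


def effective_samples(n_samples: int, max_samples: Optional[int] = None) -> int:
    """Number of samples a check would use under the documented rule.

    n_samples: the check's own sample limit (assumed to be an int).
    max_samples: the suite-level cap passed to run(), or None for no cap.
    """
    if max_samples is None:
        return n_samples
    return min(n_samples, max_samples)
```

Under this reading, a check configured with n_samples=10000 and run with max_samples=800 would use 800 samples, while a check with n_samples=500 would still use 500.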

Returns

SuiteResult: All results by all initialized checks
