class MixedNulls

Search for various types of null values, including string representations of null.
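To make the idea concrete, here is a minimal, standard-library-only sketch of how distinct null representations in a single column might be counted. It is not the check's implementation. The helper names null_kind and count_null_types, the default string list DEFAULT_NULL_STRINGS, and the choice to extend (rather than replace) that list with null_string_list are all hypothetical; the null_string_list and check_nan arguments only mirror the parameters described below.

```python
import math
from collections import Counter
from typing import Iterable, Optional, Set

# Hypothetical default list of string null representations (compared case-insensitively).
DEFAULT_NULL_STRINGS = ("", "null", "none", "nan", "n/a")


def null_kind(value, null_strings: Set[str], check_nan: bool) -> Optional[str]:
    """Return a label for the null representation of value, or None if it is not null-like."""
    if value is None:
        return "None"
    if check_nan and isinstance(value, float) and math.isnan(value):
        return "NaN"
    if isinstance(value, str) and value.strip().lower() in null_strings:
        # Keep the exact spelling so that "NULL" and "null" count as different types.
        return repr(value)
    return None


def count_null_types(values: Iterable, null_string_list: Optional[Iterable[str]] = None,
                     check_nan: bool = True) -> Counter:
    """Count how often each distinct null representation appears in a column."""
    # Assumption: user-supplied strings extend the defaults instead of replacing them.
    null_strings = {s.lower() for s in DEFAULT_NULL_STRINGS}
    null_strings |= {s.lower() for s in (null_string_list or ())}
    counts: Counter = Counter()
    for value in values:
        kind = null_kind(value, null_strings, check_nan)
        if kind is not None:
            counts[kind] += 1
    return counts


column = ["a", None, "NULL", float("nan"), "null", "b", None]
print(count_null_types(column))
# Four distinct null types: None (twice), 'NULL', NaN and 'null'.
```

In this example "NULL" and "null" are reported as two different representations, which is the kind of mix the check is meant to surface.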

null_string_list: Iterable[str], default: None

List of strings to be considered alternative representations of null.

check_nan: bool, default: True

Whether to also check for NaN values, in addition to the null list.

columns: Union[Hashable, List[Hashable]], default: None

Columns to check. If none are given, all columns except the ignored ones are checked.

ignore_columns: Union[Hashable, List[Hashable]], default: None

Columns to ignore. If none are given, the columns to check are determined by the columns parameter.

n_top_columns: int, default: 10

Number of columns to show, ordered by feature importance (date, index and label columns are shown first).

aggregation_method: Optional[str], default: ‘max’

Argument for the reduce_output functionality; decides how to aggregate the vector of per-feature scores into a single aggregated score. For all methods the aggregated score is between 0 and 1. Possible values are:

‘l3_weighted’: L3 norm over the per-feature scores vector, weighted by feature importance, specifically sum(FI * PER_FEATURE_SCORES^3)^(1/3). This method takes feature importance into account yet puts more weight on the per-feature scores, and is recommended for most cases.
‘l5_weighted’: Similar to ‘l3_weighted’, but with the L5 norm. Puts even more emphasis on the per-feature scores, specifically on the largest ones, so the result is closer to the maximum among the per-feature scores.
‘weighted’: Weighted mean of the per-feature scores, based on feature importance.
‘max’: Maximum of all the per-feature scores. This is the default for this check.
None: No aggregation; return a dict with a per-feature score for each feature.

n_samples: int, default: 10_000_000

Number of samples to use for this check.

random_state: int, default: 42

Random seed for all check internals.
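The interaction between columns and ignore_columns can be sketched as follows. This is a hypothetical helper (select_columns), not the library's code; in particular, rejecting calls that set both arguments is an assumption of this sketch.

```python
from collections.abc import Hashable
from typing import List, Optional, Sequence, Union

ColumnSpec = Optional[Union[Hashable, List[Hashable]]]


def _as_list(spec: ColumnSpec) -> Optional[List[Hashable]]:
    """Normalize a single column name or a list of names into a list (None stays None)."""
    if spec is None:
        return None
    return list(spec) if isinstance(spec, list) else [spec]


def select_columns(all_columns: Sequence[Hashable], columns: ColumnSpec = None,
                   ignore_columns: ColumnSpec = None) -> List[Hashable]:
    """Return the columns to check, preserving the dataset's column order."""
    include = _as_list(columns)
    exclude = _as_list(ignore_columns)
    # Assumption: the two arguments are mutually exclusive.
    if include is not None and exclude is not None:
        raise ValueError("pass either columns or ignore_columns, not both")
    if include is not None:
        return [c for c in all_columns if c in include]
    # No explicit columns: check everything except the ignored ones.
    excluded = set(exclude or [])
    return [c for c in all_columns if c not in excluded]


print(select_columns(["id", "age", "city"], ignore_columns="id"))  # ['age', 'city']
```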
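The display ordering described for n_top_columns can be approximated as below. The helper top_columns and its special_columns argument (standing in for the date, index and label columns) are hypothetical, and treating a missing importance value as 0 is an assumption of this sketch.

```python
from collections.abc import Hashable
from typing import Dict, List, Optional, Sequence


def top_columns(columns: Sequence[Hashable],
                feature_importance: Optional[Dict[Hashable, float]] = None,
                special_columns: Sequence[Hashable] = (),
                n_top_columns: int = 10) -> List[Hashable]:
    """Order columns for display: special columns first, then by descending importance."""
    importance = feature_importance or {}
    # Special columns (e.g. date, index, label) keep the order given in special_columns.
    specials = [c for c in special_columns if c in columns]
    rest = [c for c in columns if c not in specials]
    # list.sort is stable even with reverse=True, so ties keep their original order.
    rest.sort(key=lambda c: importance.get(c, 0.0), reverse=True)
    return (specials + rest)[:n_top_columns]


cols = ["date", "a", "b", "c", "label"]
fi = {"a": 0.2, "b": 0.5, "c": 0.3}
print(top_columns(cols, fi, special_columns=["date", "label"], n_top_columns=3))
# ['date', 'label', 'b']
```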
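The aggregation_method options can be expressed as a small function. This is a sketch under stated assumptions: per-feature scores lie in [0, 1], feature importance is normalized to sum to 1 (equal weights are used when none is supplied), and the name aggregate_scores is hypothetical.

```python
from collections.abc import Hashable
from typing import Dict, Optional, Union


def aggregate_scores(scores: Dict[Hashable, float],
                     feature_importance: Optional[Dict[Hashable, float]] = None,
                     method: Optional[str] = "max") -> Union[float, Dict[Hashable, float]]:
    """Reduce per-feature scores in [0, 1] to a single score."""
    if method is None:
        return dict(scores)  # no aggregation: one score per feature
    if method == "max":
        return max(scores.values())
    # Normalize importance so the weights sum to 1; use equal weights if none is given.
    fi = feature_importance or {name: 1.0 for name in scores}
    total = sum(fi[name] for name in scores)
    weights = {name: fi[name] / total for name in scores}
    if method == "weighted":
        return sum(weights[name] * scores[name] for name in scores)
    exponents = {"l3_weighted": 3, "l5_weighted": 5}
    if method not in exponents:
        raise ValueError(f"unknown aggregation method: {method!r}")
    p = exponents[method]
    # Weighted L_p norm: (sum(w * s**p)) ** (1/p)
    return sum(weights[name] * scores[name] ** p for name in scores) ** (1 / p)
```

For two features with equal importance and scores 0 and 1, the methods give 0.5 (weighted), about 0.794 (l3_weighted), about 0.871 (l5_weighted) and 1.0 (max), which shows how higher norms move the result toward the maximum.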
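A reproducible way to cap the number of rows with n_samples and random_state is sketched below using only the standard library. The helper sample_rows is hypothetical; the check presumably samples its dataset with its own utilities, which may differ in detail.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_rows(rows: Sequence[T], n_samples: int = 10_000_000, random_state: int = 42) -> List[T]:
    """Return at most n_samples rows, chosen reproducibly from random_state."""
    if len(rows) <= n_samples:
        return list(rows)  # small enough: use every row, no randomness involved
    # A local generator keeps the global random state untouched.
    rng = random.Random(random_state)
    return rng.sample(rows, n_samples)


rows = list(range(100))
print(sample_rows(rows, n_samples=5, random_state=7) == sample_rows(rows, n_samples=5, random_state=7))  # True
```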

__init__(null_string_list: Optional[Iterable[str]] = None, check_nan: bool = True, columns: Optional[Union[Hashable, List[Hashable]]] = None, ignore_columns: Optional[Union[Hashable, List[Hashable]]] = None, n_top_columns: int = 10, aggregation_method: Optional[str] = 'max', n_samples: int = 10000000, random_state: int = 42, **kwargs)
__new__(*args, **kwargs)


MixedNulls.add_condition(name, ...)

Add a new condition function to the check.


Add condition: require the number of different null values in each column to be less than or equal to a threshold.


Remove all conditions from this check instance.


Run conditions on the given result.

MixedNulls.config([include_version, ...])

Return check configuration (conditions' configuration not yet supported).


Return an aggregated score based on the defined aggregation method.

MixedNulls.from_config(conf[, version_unmatch])

Return check object from a CheckConfig object.

MixedNulls.from_json(conf[, version_unmatch])

Deserialize check instance from JSON string.


Return True if a greater reduce_output value is better for this check.


Return check metadata.


Name of the class in split camel case.


Return parameters to show when printing the check.


Return an aggregated score based on the defined aggregation method.


Remove a condition by its index.

MixedNulls.run(dataset[, model, ...])

Run check.

MixedNulls.run_logic(context, dataset_kind)

Run check.

MixedNulls.to_json([indent, ...])

Serialize check instance to JSON string.
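The condition that requires each column's number of different null values to be at most a threshold comes down to a simple comparison. In this sketch the function name different_nulls_condition, the parameter max_allowed_null_types and its default of 1 are hypothetical; the input is a mapping from column name to the number of distinct null representations found in that column.

```python
from collections.abc import Hashable
from typing import Dict, List, Tuple


def different_nulls_condition(null_types_per_column: Dict[Hashable, int],
                              max_allowed_null_types: int = 1) -> Tuple[bool, List[Hashable]]:
    """Return (passed, failing_columns) for a 'different nulls <= threshold' condition."""
    # A column fails when it mixes more null representations than allowed.
    failing = [col for col, n_types in null_types_per_column.items()
               if n_types > max_allowed_null_types]
    return len(failing) == 0, failing


print(different_nulls_condition({"a": 0, "b": 1, "c": 3}))  # (False, ['c'])
```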