class StringLengthOutOfBounds

Detect strings whose length is much longer or shorter than the identified “normal” string lengths.

columns : Union[Hashable, List[Hashable]], default: None

Columns to check. If none are given, all columns except the ignored ones are checked.

ignore_columns : Union[Hashable, List[Hashable]], default: None

Columns to ignore. If none are given, the columns to check are determined by the columns parameter.

num_percentiles : int, default: 1000

Number of percentile values to retrieve for the lengths of the samples in the string column. Affects the resolution of the string lengths used to detect outliers.

inner_quantile_range : int, default: 94

The upper percentile (an integer in [0, 100]) that defines the inner percentile range. For example, 98 gives the range 2%-98%.
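As a rough illustration of how inner_quantile_range maps to a pair of percentiles, the sketch below assumes the range is symmetric, as the 98 example suggests. The helpers `inner_percentiles` and `length_at_percentile` are hypothetical names, not part of the library, and the nearest-rank percentile used here is only one possible definition; the check's own computation may differ.

```python
import math
from typing import List, Tuple


def inner_percentiles(inner_quantile_range: int) -> Tuple[int, int]:
    """Return the (lower, upper) percentiles of a symmetric inner range (assumed)."""
    if not 0 <= inner_quantile_range <= 100:
        raise ValueError("inner_quantile_range must be in [0, 100]")
    other = 100 - inner_quantile_range
    return min(other, inner_quantile_range), max(other, inner_quantile_range)


def length_at_percentile(lengths: List[int], pct: int) -> int:
    """Nearest-rank percentile of a list of string lengths (illustrative only)."""
    ordered = sorted(lengths)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample values, used only to exercise the helpers.
lengths = [len(s) for s in ["ab", "abc", "abcd", "abcde", "a" * 40]]
lower_pct, upper_pct = inner_percentiles(94)
lower_len = length_at_percentile(lengths, lower_pct)
upper_len = length_at_percentile(lengths, upper_pct)
```

With the default of 94, the helper yields the percentile pair 6 and 94 under this symmetric reading.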

outlier_factor : int, default: 4

Strings are defined as outliers if their length is outlier_factor times greater or smaller than the values inside the inner quantile range.
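One literal reading of this rule can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: the function `is_length_outlier` and its `lower`/`upper` arguments (the string lengths at the edges of the inner quantile range) are hypothetical, and the actual check may compute its limits differently.

```python
def is_length_outlier(length: int, lower: int, upper: int, outlier_factor: int = 4) -> bool:
    """Flag a length that is outlier_factor times above `upper` or below `lower`.

    Assumed interpretation of the parameter description, for illustration only.
    """
    too_long = length > upper * outlier_factor
    too_short = length * outlier_factor < lower
    return too_long or too_short


# Hypothetical inner range of lengths 5..20 with the default factor of 4.
flags = [is_length_outlier(n, lower=5, upper=20) for n in (1, 12, 50, 100)]
```

Under this reading, a larger outlier_factor makes the check more tolerant, since a string has to be further from the inner range before it is flagged.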

min_length_difference : int, default: 5

The minimum difference in length for a string to be considered an outlier.

min_length_ratio_difference : float, default: 0.5

Used to calculate the minimum length difference for a string to be considered an outlier (calculated as this value times the average of the normal lengths).
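The ratio-based threshold can be sketched as below. The description only states that the ratio is multiplied by the average of the normal lengths; combining it with min_length_difference by taking the larger of the two is an assumption made here for illustration, and `min_required_difference` is a hypothetical helper, not a library function.

```python
from typing import List


def min_required_difference(
    normal_lengths: List[int],
    min_length_difference: int = 5,
    min_length_ratio_difference: float = 0.5,
) -> float:
    """Smallest length gap that would count as an outlier (assumed combination)."""
    average_normal = sum(normal_lengths) / len(normal_lengths)
    ratio_based = min_length_ratio_difference * average_normal
    # Assumption: the stricter (larger) of the two thresholds applies.
    return max(min_length_difference, ratio_based)


# Hypothetical normal lengths with an average of 20.
threshold = min_required_difference([10, 20, 30])
```

For short strings the absolute threshold dominates, while for long strings the ratio-based one does, which is presumably why both parameters exist.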

min_unique_value_ratio : float, default: 0.01


min_unique_values : int, default: 100

Minimum number of unique values a column must have for string length outliers to be calculated.

n_top_columns : int, default: 10

Number of columns to show, ordered by feature importance (date, index, and label come first).

outlier_length_to_show : int, default: 50

Maximum length of an outlier to show in the results. A longer outlier is trimmed and ‘…’ is appended.
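The trimming behaviour described here amounts to something like the sketch below. `trim_outlier` is a hypothetical helper, and whether the library counts the ellipsis toward the limit is not stated, so this version assumes it does not.

```python
def trim_outlier(value: str, outlier_length_to_show: int = 50) -> str:
    """Cut a long outlier string to the display limit and mark the cut with an ellipsis."""
    if len(value) <= outlier_length_to_show:
        return value
    # Assumption: the ellipsis is added on top of the kept characters.
    return value[:outlier_length_to_show] + "…"


shown = trim_outlier("x" * 60)
```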

samples_per_range_to_show : int, default: 3

Number of outlier samples to show in results per outlier range found.

n_samples : int, default: 10_000_000

Number of samples to use for this check.

random_state : int, default: 42

Random seed for all check internals.

__init__(columns: Optional[Union[Hashable, List[Hashable]]] = None, ignore_columns: Optional[Union[Hashable, List[Hashable]]] = None, num_percentiles: int = 1000, inner_quantile_range: int = 94, outlier_factor: int = 4, min_length_difference: int = 5, min_length_ratio_difference: float = 0.5, min_unique_value_ratio: float = 0.01, min_unique_values: int = 100, n_top_columns: int = 10, outlier_length_to_show: int = 50, samples_per_range_to_show: int = 3, n_samples: int = 10000000, random_state: int = 42, **kwargs)
__new__(*args, **kwargs)


StringLengthOutOfBounds.add_condition(name, ...)

Add new condition function to the check.


Add condition - require column's number of string length outliers to be less than or equal to the threshold.


Add condition - require column's ratio of string length outliers to be less than or equal to the threshold.


Remove all conditions from this check instance.


Run conditions on given result.


Return check configuration (conditions' configuration not yet supported).

StringLengthOutOfBounds.from_config(conf[, ...])

Return check object from a CheckConfig object.

StringLengthOutOfBounds.from_json(conf[, ...])

Deserialize check instance from JSON string.


Return check metadata.

Name of class in split camel case.


Return parameters to show when printing the check.


Remove given condition by index.

Run check.

StringLengthOutOfBounds.run_logic(context, ...)

Run check.


Serialize check instance to JSON string.