calculate_builtin_properties

calculate_builtin_properties(raw_text: Sequence[str], include_properties: Optional[List[str]] = None, ignore_properties: Optional[List[str]] = None, include_long_calculation_properties: bool = False, ignore_non_english_samples_for_english_properties: bool = True, device: Optional[str] = None, models_storage: Optional[Union[Path, str]] = None, batch_size: Optional[int] = 16, cache_models: bool = False, quantize_models: bool = True) → Tuple[Dict[str, List[float]], Dict[str, str]]

Calculate properties on provided text samples.

Parameters

raw_text (Sequence[str]): The text samples to calculate the properties for.
include_properties (List[str], default None): The properties to calculate. If None, all default properties are calculated. Cannot be used together with the ignore_properties parameter.
    Available properties: [‘Text Length’, ‘Average Word Length’, ‘Max Word Length’, ‘% Special Characters’, ‘% Punctuation’, ‘Language’, ‘Sentiment’, ‘Subjectivity’, ‘Toxicity’, ‘Fluency’, ‘Formality’, ‘Lexical Density’, ‘Unique Noun Count’, ‘Reading Ease’, ‘Average Words Per Sentence’, ‘URLs Count’, ‘Unique URLs Count’, ‘Email Address Count’, ‘Unique Email Address Count’, ‘Unique Syllables Count’, ‘Reading Time’, ‘Sentences Count’, ‘Average Syllable Length’]
    Default properties: [‘Text Length’, ‘Average Word Length’, ‘Max Word Length’, ‘% Special Characters’, ‘% Punctuation’, ‘Language’, ‘Sentiment’, ‘Subjectivity’, ‘Toxicity’, ‘Fluency’, ‘Formality’, ‘Lexical Density’, ‘Unique Noun Count’, ‘Reading Ease’, ‘Average Words Per Sentence’]
    To calculate all the default properties, leave both include_properties and ignore_properties as None. If you pass either one, only the properties in that list are calculated or ignored, respectively.
    Note that the properties [‘Toxicity’, ‘Fluency’, ‘Formality’, ‘Language’, ‘Unique Noun Count’] may take a long time to calculate. If include_long_calculation_properties is False, these properties will be ignored, even if they are in the include_properties parameter.
ignore_properties (List[str], default None): The properties to exclude from the list of default properties. If None, no properties are ignored and all the default properties are calculated. Cannot be used together with the include_properties parameter.
include_long_calculation_properties (bool, default False): Whether to include properties that may take a long time to calculate. If False, these properties will be ignored, unless they are specified in the include_properties parameter explicitly.
ignore_non_english_samples_for_english_properties (bool, default True): Whether to skip samples that are not in English when calculating English-only properties. If False, English-only properties are calculated for non-English samples as well. This parameter has no effect on properties that are not English-only. English-only properties WILL NOT work properly on non-English samples, so set this to False only when you are sure that all the samples are in English.
device (str, default None): The device to use for the calculation. If None, the default device will be used.
models_storage (Union[str, pathlib.Path, None], default None): A directory to store the models in. If not provided, models are stored in DEEPCHECKS_LIB_PATH/nlp/.nlp-models. If the directory already contains the relevant resources, they are not downloaded again.
batch_size (int, default 16): The batch size.
cache_models (bool, default False): If True, the models are kept in CPU RAM. This speeds up the calculation but uses more memory. If device is not CPU, the models are moved from CPU RAM to the relevant device before calculation.
quantize_models (bool, default True): If True, the models are quantized to reduce their size and speed up the calculation. Requires the accelerate and bitsandbytes libraries to be installed, as well as an available GPU.
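
The parameter rules above can be illustrated with a minimal sketch. The helpers `build_property_kwargs` and `compute_properties` are hypothetical names introduced here for illustration, and the import path `deepchecks.nlp.utils` is an assumption that is not stated on this page. The sketch only checks the documented constraint that include_properties and ignore_properties cannot be combined, then forwards the arguments.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_property_kwargs(
    include_properties: Optional[List[str]] = None,
    ignore_properties: Optional[List[str]] = None,
    include_long_calculation_properties: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: validate the documented constraint before calling the library."""
    # The documentation states that the two lists cannot be used together.
    if include_properties is not None and ignore_properties is not None:
        raise ValueError(
            "include_properties and ignore_properties cannot be used together"
        )
    kwargs: Dict[str, Any] = {
        "include_long_calculation_properties": include_long_calculation_properties
    }
    if include_properties is not None:
        kwargs["include_properties"] = list(include_properties)
    if ignore_properties is not None:
        kwargs["ignore_properties"] = list(ignore_properties)
    return kwargs


def compute_properties(texts: Sequence[str], **kwargs: Any):
    """Hypothetical wrapper around calculate_builtin_properties."""
    # Imported lazily so the sketch loads even when deepchecks is not installed.
    # The import path is an assumption, not taken from this page.
    from deepchecks.nlp.utils import calculate_builtin_properties

    return calculate_builtin_properties(texts, **kwargs)


# Request two fast properties explicitly; nothing is computed until
# compute_properties(texts, **kwargs) is called.
kwargs = build_property_kwargs(
    include_properties=["Text Length", "Average Word Length"]
)
```

Calling `compute_properties(["Hello world", "Second sample"], **kwargs)` would then return the two dictionaries described under Returns, provided the library is installed.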

Returns

Dict[str, List[float]]: A dictionary with the property name as key and a list of the property values for each text as value.
Dict[str, str]: A dictionary with the property name as key and the property’s type as value.
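
Because the first dictionary is keyed by property name and holds one value per text, it is sometimes convenient to pivot it into one record per sample. The following is a minimal sketch of that reshaping. The helpers `to_rows` and `names_of_type` are hypothetical, the sample dictionaries are hand-written stand-ins rather than real library output, and the type label 'numeric' is an assumption about the values the second dictionary contains.

```python
from typing import Dict, List


def to_rows(properties: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Pivot {property: [value per text]} into [{property: value} per text]."""
    names = list(properties)
    # All lists are expected to have one entry per text sample.
    n_samples = len(next(iter(properties.values()), []))
    return [{name: properties[name][i] for name in names} for i in range(n_samples)]


def names_of_type(types: Dict[str, str], wanted: str) -> List[str]:
    """Return the property names whose declared type equals `wanted`."""
    return [name for name, kind in types.items() if kind == wanted]


# Hand-written stand-ins shaped like the documented return value
# (illustrative numbers, not produced by the library).
properties = {"Text Length": [11.0, 5.0], "Max Word Length": [5.0, 5.0]}
types = {"Text Length": "numeric", "Max Word Length": "numeric"}  # assumed labels

rows = to_rows(properties)
numeric_names = names_of_type(types, "numeric")
```

Each element of `rows` then corresponds to one entry of raw_text, in the same order.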
