.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "checks_gallery/tabular/integrity/plot_string_length_out_of_bounds.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_checks_gallery_tabular_integrity_plot_string_length_out_of_bounds.py:

String Length Out Of Bounds
***************************

.. GENERATED FROM PYTHON SOURCE LINES 8-14

.. code-block:: default

    import pandas as pd

    from deepchecks.tabular.checks.integrity.string_length_out_of_bounds import \
        StringLengthOutOfBounds

.. GENERATED FROM PYTHON SOURCE LINES 15-28

.. code-block:: default

    # col1: mostly 7- and 9-character strings, plus one very short and one very long value
    col1 = ["aaaaa33", "aaaaaaa33"]*40
    col1.append("a")
    col1.append("aaaaaadsfasdfasdf")

    # col2: only 1- and 3-character strings
    col2 = ["b", "abc"]*41

    # col3: single-character strings, plus two very long values
    col3 = ["a"]*80
    col3.append("a"*100)
    col3.append("a"*200)
    # col1 and col3 contain outliers, col2 does not
    df = pd.DataFrame({"col1": col1, "col2": col2, "col3": col3})

.. GENERATED FROM PYTHON SOURCE LINES 29-32

.. code-block:: default

    StringLengthOutOfBounds(min_unique_value_ratio=0.01).run(df)

.. raw:: html

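    <h4>String Length Out Of Bounds</h4>
    <p>Detect strings with length that is much longer/shorter than the identified "normal" string lengths.</p>
    <h5>Additional Outputs</h5>
    <p>* showing only the top 10 columns, you can change it using n_top_columns param</p>
    <table>
      <thead>
        <tr>
          <th>Column Name</th>
          <th>Range of Detected Normal String Lengths</th>
          <th>Range of Detected Outlier String Lengths</th>
          <th>Number of Outlier Samples</th>
          <th>Example Samples</th>
        </tr>
      </thead>
      <tbody>
        <tr><td>col1</td><td>7 - 9</td><td>1 - 1</td><td>1</td><td>['a']</td></tr>
        <tr><td>col1</td><td>7 - 9</td><td>17 - 17</td><td>1</td><td>['aaaaaadsfasdfasdf']</td></tr>
        <tr><td>col3</td><td>1 - 1</td><td>100 - 200</td><td>2</td><td>['aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...']</td></tr>
      </tbody>
    </table>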
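
The ranges reported above can be sanity-checked by hand. The following is a minimal sketch that flags length outliers with a plain interquartile-range (IQR) rule. It is an assumption chosen for illustration, not the algorithm that ``StringLengthOutOfBounds`` uses internally, and the helper name ``length_outliers`` and its ``factor`` parameter are hypothetical. On this small dataset it happens to single out the same values as the check.

```python
from statistics import quantiles


def length_outliers(values, factor=1.5):
    """Return ((min_normal, max_normal), outlier_values) using an IQR rule.

    Illustrative only: this is not how deepchecks computes its ranges.
    """
    lengths = sorted(len(v) for v in values)
    # Quartiles of the length distribution (inclusive method, linear interpolation).
    q1, _, q3 = quantiles(lengths, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    # Lengths inside [lower, upper] are treated as "normal".
    normal = [n for n in lengths if lower <= n <= upper]
    # Distinct values whose length falls outside the bounds, shortest first.
    outliers = sorted({v for v in values if not lower <= len(v) <= upper}, key=len)
    return (min(normal), max(normal)), outliers


# Same data as in the example above, built without pandas.
col1 = ["aaaaa33", "aaaaaaa33"] * 40 + ["a", "aaaaaadsfasdfasdf"]
col2 = ["b", "abc"] * 41
col3 = ["a"] * 80 + ["a" * 100, "a" * 200]

results = {
    name: length_outliers(col)
    for name, col in {"col1": col1, "col2": col2, "col3": col3}.items()
}
for name, (normal, outliers) in results.items():
    print(name, "normal lengths:", normal, "outlier lengths:", [len(o) for o in outliers])
```

With this rule, ``col1`` and ``col3`` each yield two outlying values and ``col2`` yields none, which agrees with the table. A different ``factor`` or a different dataset could easily diverge from what the check reports.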


.. GENERATED FROM PYTHON SOURCE LINES 33-39

.. code-block:: default

    col = ["a","a","a","a","a","a","a","a","a","a","a","a","a","ab","ab","ab","ab","ab","ab",
           "ab"]*1000
    col.append("basdbadsbaaaaaaaaaa")
    col.append("basdbadsbaaaaaaaaaaa")
    df = pd.DataFrame({"col1":col})
    StringLengthOutOfBounds(num_percentiles=1000, min_unique_values=3).run(df)

.. raw:: html

    <h4>String Length Out Of Bounds</h4>
    <p>Detect strings with length that is much longer/shorter than the identified "normal" string lengths.</p>
    <h5>Additional Outputs</h5>
    <p>* showing only the top 10 columns, you can change it using n_top_columns param</p>
    <table>
      <thead>
        <tr>
          <th>Column Name</th>
          <th>Range of Detected Normal String Lengths</th>
          <th>Range of Detected Outlier String Lengths</th>
          <th>Number of Outlier Samples</th>
          <th>Example Samples</th>
        </tr>
      </thead>
      <tbody>
        <tr><td>col1</td><td>1 - 2</td><td>19 - 20</td><td>2</td><td>['basdbadsbaaaaaaaaaa', 'basdbadsbaaaaaaaaaaa']</td></tr>
      </tbody>
    </table>
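
To see why only two of the roughly twenty thousand values are reported here, it can help to tabulate the string lengths directly. The snippet below is a small standard-library sketch, independent of deepchecks, that rebuilds the same column and counts how many values have each length. The variable name ``length_counts`` is only illustrative.

```python
from collections import Counter

# Rebuild the column from the example: 13 x "a" and 7 x "ab", repeated 1000 times,
# followed by the two long strings.
col = (["a"] * 13 + ["ab"] * 7) * 1000
col.append("basdbadsbaaaaaaaaaa")
col.append("basdbadsbaaaaaaaaaaa")

# Count how many values have each string length.
length_counts = Counter(len(s) for s in col)

for length, count in sorted(length_counts.items()):
    share = count / len(col)
    print(f"length {length:>2}: {count:>6} values ({share:.4%})")
```

Lengths 1 and 2 account for all but two values, so the two long strings are each a single occurrence among 20002 rows. That is presumably why the example raises ``num_percentiles`` to 1000: a coarser percentile grid might not isolate such a tiny tail.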


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.187 seconds)

.. _sphx_glr_download_checks_gallery_tabular_integrity_plot_string_length_out_of_bounds.py:

.. only :: html

  .. container:: sphx-glr-footer
     :class: sphx-glr-footer-example

     .. container:: sphx-glr-download sphx-glr-download-python

        :download:`Download Python source code: plot_string_length_out_of_bounds.py `

     .. container:: sphx-glr-download sphx-glr-download-jupyter

        :download:`Download Jupyter notebook: plot_string_length_out_of_bounds.ipynb `

.. only:: html

  .. rst-class:: sphx-glr-signature

     `Gallery generated by Sphinx-Gallery `_