.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "checks_gallery/tabular/methodology/plot_date_train_test_leakage_duplicates.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_checks_gallery_tabular_methodology_plot_date_train_test_leakage_duplicates.py:


Date Train Validation Leakage Duplicates
****************************************

.. GENERATED FROM PYTHON SOURCE LINES 8-22

.. code-block:: default


    from datetime import datetime

    import pandas as pd

    from deepchecks.tabular import Dataset, Suite
    from deepchecks.tabular.checks.methodology import \
        DateTrainTestLeakageDuplicates


    def dataset_from_dict(d: dict, datetime_name: str = None) -> Dataset:
        dataframe = pd.DataFrame(data=d)
        return Dataset(dataframe, datetime_name=datetime_name)


.. GENERATED FROM PYTHON SOURCE LINES 23-25

Synthetic example with date leakage
===================================

.. GENERATED FROM PYTHON SOURCE LINES 25-58

.. code-block:: default


    train_ds = dataset_from_dict({'col1': [
        datetime(2021, 10, 1, 0, 0),
        datetime(2021, 10, 1, 0, 0),
        datetime(2021, 10, 1, 0, 0),
        datetime(2021, 10, 2, 0, 0),
        datetime(2021, 10, 2, 0, 0),
        datetime(2021, 10, 2, 0, 0),
        datetime(2021, 10, 3, 0, 0),
        datetime(2021, 10, 3, 0, 0),
        datetime(2021, 10, 3, 0, 0),
        datetime(2021, 10, 4, 0, 0),
        datetime(2021, 10, 4, 0, 0),
        datetime(2021, 10, 4, 0, 0),
        datetime(2021, 10, 5, 0, 0),
        datetime(2021, 10, 5, 0, 0)
    ]}, 'col1')
    test_ds = dataset_from_dict({'col1': [
        datetime(2021, 9, 4, 0, 0),
        datetime(2021, 10, 4, 0, 0),
        datetime(2021, 10, 5, 0, 0),
        datetime(2021, 10, 6, 0, 0),
        datetime(2021, 10, 6, 0, 0),
        datetime(2021, 10, 7, 0, 0),
        datetime(2021, 10, 7, 0, 0),
        datetime(2021, 10, 8, 0, 0),
        datetime(2021, 10, 8, 0, 0),
        datetime(2021, 10, 9, 0, 0),
        datetime(2021, 10, 9, 0, 0)
    ]}, 'col1')

    DateTrainTestLeakageDuplicates(n_to_show=3).run(train_dataset=train_ds, test_dataset=test_ds)


Date Train-Test Leakage (duplicates)

Check if test dates are present in train data.

Additional Outputs

18.18% of test data dates appear in training data

Sample of test dates in train: ``['2021/10/05 00:00:00.000000 ', '2021/10/04 00:00:00.000000 ']``
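
The reported 18.18% is consistent with counting the test rows whose date also occurs in the train data: 2 of the 11 test rows. A minimal sketch of that arithmetic using only the standard library follows. The helper ``leaked_date_ratio`` is a hypothetical name introduced for illustration and is not part of deepchecks, and the check's internal computation may differ.

```python
from datetime import datetime


def leaked_date_ratio(train_dates, test_dates):
    """Return (ratio, leaked_dates) for test rows whose date also occurs in train.

    Hypothetical helper for illustration only; not the deepchecks implementation.
    """
    train_set = set(train_dates)
    leaked_rows = [d for d in test_dates if d in train_set]
    ratio = len(leaked_rows) / len(test_dates) if test_dates else 0.0
    return ratio, sorted(set(leaked_rows))


# Same dates as the synthetic example above, written compactly.
train_dates = [datetime(2021, 10, day)
               for day in (1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)]
test_dates = [datetime(2021, 9, 4)] + [
    datetime(2021, 10, day) for day in (4, 5, 6, 6, 7, 7, 8, 8, 9, 9)
]

ratio, leaked = leaked_date_ratio(train_dates, test_dates)
print(f"{ratio:.2%}")  # 18.18%
print(leaked)          # the two overlapping dates: 2021-10-04 and 2021-10-05
```

Only two distinct dates overlap, which is why the sample in the output lists two entries even though ``n_to_show=3`` was requested.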


.. GENERATED FROM PYTHON SOURCE LINES 59-61

Synthetic example without date leakage
======================================

.. GENERATED FROM PYTHON SOURCE LINES 61-80

.. code-block:: default


    train_ds = dataset_from_dict({'col1': [
        datetime(2021, 10, 3, 0, 0),
        datetime(2021, 10, 3, 0, 0),
        datetime(2021, 10, 4, 0, 0),
        datetime(2021, 10, 4, 0, 0),
        datetime(2021, 10, 4, 0, 0),
        datetime(2021, 10, 5, 0, 0),
        datetime(2021, 10, 5, 0, 0)
    ]}, 'col1')
    test_ds = dataset_from_dict({'col1': [
        datetime(2021, 11, 4, 0, 0),
        datetime(2021, 11, 4, 0, 0),
        datetime(2021, 11, 5, 0, 0),
        datetime(2021, 11, 6, 0, 0),
    ]}, 'col1')

    DateTrainTestLeakageDuplicates().run(train_dataset=train_ds, test_dataset=test_ds)


Date Train-Test Leakage (duplicates)

Check if test dates are present in train data.

Additional Outputs

Nothing to display
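
This example produces no findings because every test date falls after the latest train date. One way to obtain such a split is to partition rows around a cutoff date, sketched below with the standard library only. The function ``split_by_cutoff`` and the sample rows are hypothetical, introduced here for illustration, and are not part of deepchecks.

```python
from datetime import datetime


def split_by_cutoff(rows, cutoff, date_key="col1"):
    """Put rows dated strictly before `cutoff` in train and the rest in test."""
    train = [row for row in rows if row[date_key] < cutoff]
    test = [row for row in rows if row[date_key] >= cutoff]
    return train, test


# Hypothetical rows; only the datetime column matters for this sketch.
rows = [{"col1": datetime(2021, 10, day)} for day in (3, 4, 4, 5, 6, 6, 7)]
train_rows, test_rows = split_by_cutoff(rows, cutoff=datetime(2021, 10, 6))

# A date is either before the cutoff or not, so it cannot land on both sides.
overlap = {r["col1"] for r in train_rows} & {r["col1"] for r in test_rows}
print(len(train_rows), len(test_rows), overlap)  # 4 3 set()
```

A random row-level split gives no such guarantee, since rows that share a date can end up on both sides.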



.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.013 seconds)