.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "checks_gallery/tabular/distribution/plot_train_test_prediction_drift.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_checks_gallery_tabular_distribution_plot_train_test_prediction_drift.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_checks_gallery_tabular_distribution_plot_train_test_prediction_drift.py:


Train Test Prediction Drift
***************************

This notebooks provides an overview for using and understanding the tabular prediction drift check.

**Structure:**

* `What is prediction drift? <#what-is-prediction-drift>`__
* `Generate Data <#generate-data>`__
* `Build Model <#build-model>`__
* `Run check <#run-check>`__

What Is Prediction Drift?
===========================
The term drift (and all it's derivatives) is used to describe any change in the data compared
to the data the model was trained on. Prediction drift refers to the case in which a change
in the data (data/feature drift) has happened and as a result, the distribution of the
models' prediction has changed.

Calculating prediction drift is especially useful in cases
in which labels are not available for the test dataset, and so a drift in the predictions
is our only indication that a changed has happened in the data that actually affects model
predictions. If labels are available, it's also recommended to run the `Label Drift Check
</examples/tabular/checks/distribution/examples/plot_train_test_label_drift.html>`__.

There are two main causes for prediction drift:

* A change in the sample population. In this case, the underline phenomenon we're trying
  to predict behaves the same, but we're not getting the same types of samples. For example,
  Iris Virginica stops growing and is not being predicted by the model trained to classify Iris species.
* Concept drift, which means that the underline relation between the data and
  the label has changed.
  For example, we're trying to predict income based on food spending, but ongoing inflation effect prices.
  It's important to note that concept drift won't necessarily result in prediction drift, unless it affects features that
  are of high importance to the model.

How Does the TrainTestPredictionDrift Check Work?
=================================================
There are many methods to detect drift, that usually include statistical methods
that aim to measure difference between 2 distributions.
We experimented with various approaches and found that for detecting drift between 2
one-dimensional distributions, the following 2 methods give the best results:

* For regression problems, the `Population Stability Index (PSI) <https://www.lexjansen.com/wuss/2017/47_Final_Paper_PDF.pdf>`__
* For classification problems, the `Wasserstein Distance (Earth Mover's Distance) <https://en.wikipedia.org/wiki/Wasserstein_metric>`__

.. GENERATED FROM PYTHON SOURCE LINES 52-59

.. code-block:: default


    from sklearn.preprocessing import LabelEncoder

    from deepchecks.tabular.checks import TrainTestPredictionDrift
    from deepchecks.tabular.datasets.classification import adult


.. GENERATED FROM PYTHON SOURCE LINES 60-62

Generate data
=============

.. GENERATED FROM PYTHON SOURCE LINES 62-69

.. code-block:: default


    label_name = 'income'
    train_ds, test_ds = adult.load_data()
    encoder = LabelEncoder()
    train_ds.data[label_name] = encoder.fit_transform(train_ds.data[label_name])
    test_ds.data[label_name] = encoder.transform(test_ds.data[label_name])


.. GENERATED FROM PYTHON SOURCE LINES 70-71

Introducing drift:

.. GENERATED FROM PYTHON SOURCE LINES 71-76

.. code-block:: default


    test_ds.data['education-num'] = 13
    test_ds.data['education'] = ' Bachelors'


.. GENERATED FROM PYTHON SOURCE LINES 77-79

Build Model
===========

.. GENERATED FROM PYTHON SOURCE LINES 79-87

.. code-block:: default


    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OrdinalEncoder


.. GENERATED FROM PYTHON SOURCE LINES 88-106

.. code-block:: default


    numeric_transformer = SimpleImputer()
    categorical_transformer = Pipeline(
        steps=[("imputer", SimpleImputer(strategy="most_frequent")), ("encoder", OrdinalEncoder())]
    )

    train_ds.features
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", numeric_transformer, train_ds.numerical_features),
            ("cat", categorical_transformer, train_ds.cat_features),
        ]
    )

    model = Pipeline(steps=[("preprocessing", preprocessor), ("model", RandomForestClassifier(max_depth=5, n_jobs=-1))])
    model = model.fit(train_ds.data[train_ds.features], train_ds.data[train_ds.label_name])


.. GENERATED FROM PYTHON SOURCE LINES 107-109

Run check
=========

.. GENERATED FROM PYTHON SOURCE LINES 109-113

.. code-block:: default


    check = TrainTestPredictionDrift()
    result = check.run(train_dataset=train_ds, test_dataset=test_ds, model=model)
    result


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">

    <script
        src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js"
        integrity="sha512-c3Nl8+7g4LMSTdrm621y7kf9v3SDPnhxLNhcjFJbKECVnmZHTdo+IRO05sNLTH/D3vA6u1X32ehoLC7WFVdheg=="
        crossorigin="anonymous"
        referrerpolicy="no-referrer">
    </script>

    <script type="text/javascript">window.PlotlyConfig = {MathJaxConfig: 'local'};</script>
    <script type="text/javascript">if (window.MathJax) {MathJax.Hub.Config({SVG: {font: "STIX-Web"}});}</script>
    <script type="text/javascript">
        if (typeof require !== 'undefined') {
            require.undef("plotly");
            requirejs.config({
                paths: {'plotly': ['https://cdn.plot.ly/plotly-2.8.3.min']}
            });
            require(
                ['plotly'],
                function(Plotly) {
                    window._Plotly = Plotly;
                    window.Plotly = Plotly;
                    console.log('Loaded plotly successfully');
                },
                function() {console.log('Failed to load plotly')}
            );
        } else {
            console.log('requirejs is not present');
        }
    </script>
    <h4><b>Train Test Prediction Drift</b></h4><p>    Calculate prediction drift between train dataset and test dataset, using statistical measures.</p><h5><b>Additional Outputs</b></h5><div><span>
                The Drift score is a measure for the difference between two distributions, in this check - the test
                and train distributions.<br> The check shows the drift score and distributions for the predictions.
            </span></div><div>                            <div id="3c1c80a2-bf31-47a6-b12a-fa99e35d5f24" class="plotly-graph-div" style="height:400px; width:700px;"></div>            <script type="text/javascript">                require(["plotly"], function(Plotly) {                    window.PLOTLYENV=window.PLOTLYENV || {};                                    if (document.getElementById("3c1c80a2-bf31-47a6-b12a-fa99e35d5f24")) {                    Plotly.newPlot(                        "3c1c80a2-bf31-47a6-b12a-fa99e35d5f24",                        [{"base":0,"marker":{"color":"#01B8AA"},"offsetgroup":"0","orientation":"h","showlegend":false,"x":[0.1],"y":["Drift Score"],"type":"bar","xaxis":"x","yaxis":"y"},{"base":0.1,"marker":{"color":"#F2C80F"},"offsetgroup":"0","orientation":"h","showlegend":false,"x":[0.1],"y":["Drift Score"],"type":"bar","xaxis":"x","yaxis":"y"},{"base":0.2,"marker":{"color":"#FE9666"},"offsetgroup":"0","orientation":"h","showlegend":false,"x":[0.06142231522093017],"y":["Drift Score"],"type":"bar","xaxis":"x","yaxis":"y"},{"marker":{"color":"#00008b"},"name":"Train Dataset","x":[0,1],"y":[0.8498817603881945,0.15011823961180554],"type":"bar","xaxis":"x2","yaxis":"y2"},{"marker":{"color":"#69b3a2"},"name":"Test Dataset","x":[0,1],"y":[0.6312880044223328,0.3687119955776672],"type":"bar","xaxis":"x2","yaxis":"y2"}],                        {"template":{"data":{"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"choropleth":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"choropleth"}],"contour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"contour"}],"contourcarpet":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"contourcarpet"}],"heatmap":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"heatmap"}],"heatmapgl":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"heatmapgl"}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"histogram2d":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2d"}],"histogram2dcontour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2dcontour"}],"mesh3d":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"mesh3d"}],"parcoords":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"parcoords"}],"pie":[{"automargin":true,"type":"pie"}],"scatter":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatter"}],"scatter3d":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatter3d"}],"scattercarpet":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattercarpet"}],"scattergeo":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergeo"}],"scattergl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergl"}],"scattermapbox":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermapbox"}],"scatterpolar":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolar"}],"scatterpolargl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolargl"}],"scatterternary":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterternary"}],"surface":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"surface"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}]},"layout":{"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"autotypenumbers":"strict","coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]],"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]},"colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"geo":{"bgcolor":"white","lakecolor":"white","landcolor":"#E5ECF6","showlakes":true,"showland":true,"subunitcolor":"white"},"hoverlabel":{"align":"left"},"hovermode":"closest","mapbox":{"style":"light"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"bgcolor":"#E5ECF6","radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","gridwidth":2,"linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white"},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","gridwidth":2,"linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white"},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","gridwidth":2,"linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white"}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"ternary":{"aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"bgcolor":"#E5ECF6","caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"title":{"x":0.05},"xaxis":{"automargin":true,"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","zerolinewidth":2},"yaxis":{"automargin":true,"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","zerolinewidth":2}}},"xaxis":{"anchor":"y","domain":[0.0,1.0],"showgrid":false,"gridcolor":"black","linecolor":"black","range":[0,0.4],"dtick":0.05,"fixedrange":true},"yaxis":{"anchor":"x","domain":[0.9200000000000002,1.0],"showgrid":false,"showline":false,"showticklabels":false,"zeroline":false,"color":"black","fixedrange":true},"xaxis2":{"anchor":"y2","domain":[0.0,1.0],"type":"category"},"yaxis2":{"anchor":"x2","domain":[0.0,0.7200000000000001],"fixedrange":true,"range":[0,1],"title":{"text":"Frequency"}},"annotations":[{"font":{"size":16},"showarrow":false,"text":"Drift Score (PSI)","x":0.5,"xanchor":"center","xref":"paper","y":1.0000000000000002,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Distribution Plot","x":0.5,"xanchor":"center","xref":"paper","y":0.7200000000000001,"yanchor":"bottom","yref":"paper"}],"legend":{"title":{"text":"Legend"},"yanchor":"top","y":0.6},"title":{"text":"model predictions","x":0.5,"xanchor":"center"},"width":700,"height":400,"bargroupgap":0},                        {"responsive": true}                    ).then(function(){
                            
    var gd = document.getElementById('3c1c80a2-bf31-47a6-b12a-fa99e35d5f24');
    var x = new MutationObserver(function (mutations, observer) {{
            var display = window.getComputedStyle(gd).display;
            if (!display || display === 'none') {{
                console.log([gd, 'removed!']);
                Plotly.purge(gd);
                observer.disconnect();
            }}
    }});

    // Listen for the removal of the full notebook cells
    var notebookContainer = gd.closest('#notebook-container');
    if (notebookContainer) {{
        x.observe(notebookContainer, {childList: true});
    }}

    // Listen for the clearing of the current output cell
    var outputEl = gd.closest('.output');
    if (outputEl) {{
        x.observe(outputEl, {childList: true});
    }}

                            })                };                });            </script>        </div>
    </div>
    <br />
    <br />


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  4.451 seconds)


.. _sphx_glr_download_checks_gallery_tabular_distribution_plot_train_test_prediction_drift.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_train_test_prediction_drift.py <plot_train_test_prediction_drift.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_train_test_prediction_drift.ipynb <plot_train_test_prediction_drift.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_