load_data

load_data(data_format: str = 'Dataset', as_train_test: bool = True) → Union[Tuple, Dataset, DataFrame]

Load and return the Avocado dataset (regression).

The Avocado dataset contains historical data on avocado prices and sales volume in multiple US markets (source: https://www.kaggle.com/neuromusic/avocado-prices).

This dataset is licensed under the Open Data Commons Open Database License (ODbL) v1.0 (https://opendatacommons.org/licenses/odbl/1-0/).

The typical ML task for this dataset is to build a model that predicts the average price of avocados.
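Any model built for this task is usually compared against a trivial baseline first. The sketch below is a minimal, standard-library-only illustration of that idea: it predicts the mean training AveragePrice for every row and scores it with mean absolute error. The rows are invented toy values, NOT data from the Avocado dataset, and the helper names (`fit_mean_baseline`, `mean_absolute_error`) are hypothetical; this is not the toy model that deepchecks ships.

```python
from statistics import mean
from typing import Callable, List, Tuple

Row = Tuple[str, float]  # (type, AveragePrice)

# Invented toy rows for illustration only; NOT values from the Avocado dataset.
TRAIN: List[Row] = [("conventional", 1.10), ("conventional", 1.30),
                    ("organic", 1.60), ("organic", 1.80)]
TEST: List[Row] = [("conventional", 1.20), ("organic", 1.90)]


def fit_mean_baseline(rows: List[Row]) -> Callable[[Row], float]:
    """Return a predictor that ignores its input and outputs the mean training price."""
    avg = mean(price for _, price in rows)
    return lambda _row: avg


def mean_absolute_error(predict: Callable[[Row], float], rows: List[Row]) -> float:
    """Average absolute gap between the predicted and the true AveragePrice."""
    return mean(abs(predict(row) - row[1]) for row in rows)


baseline = fit_mean_baseline(TRAIN)
print(round(mean_absolute_error(baseline, TEST), 2))  # prints 0.35
```

A real model is only worth keeping if it beats this kind of constant predictor on the test split.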

Dataset Shape:

Property         Value
Samples Total    18.2K
Dimensionality   14
Features         real, string
Targets          real, 0.44 to 3.25
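The documented target range can serve as a quick sanity check after loading. A minimal standard-library sketch, assuming the AveragePrice values are available as a plain iterable of floats (for example, a list built from the DataFrame column); the constant and function names are hypothetical.

```python
from typing import Iterable, Tuple

# Documented target range for AveragePrice, taken from the Dataset Shape table.
TARGET_MIN = 0.44
TARGET_MAX = 3.25


def target_bounds(prices: Iterable[float]) -> Tuple[float, float]:
    """Return (min, max) of the target values; raise on empty input."""
    values = list(prices)
    if not values:
        raise ValueError("no target values")
    return min(values), max(values)


def within_documented_range(prices: Iterable[float]) -> bool:
    """True if every price lies inside the documented 0.44 to 3.25 range."""
    low, high = target_bounds(prices)
    return TARGET_MIN <= low and high <= TARGET_MAX


print(within_documented_range([0.44, 1.50, 3.25]))  # prints True
```

A False result after loading would suggest the data differs from the version described here.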

Description:

Column name    Column role   Description
Date           Datetime      The date of the observation
Total Volume   Feature       Total number of avocados sold
4046           Feature       Total number of avocados with PLU 4046 (small avocados) sold
4225           Feature       Total number of avocados with PLU 4225 (large avocados) sold
4770           Feature       Total number of avocados with PLU 4770 (xlarge avocados) sold
Total Bags     Feature
Small Bags     Feature
Large Bags     Feature
XLarge Bags    Feature
type           Feature       Conventional or organic
year           Feature
region         Feature       The city or region of the observation
AveragePrice   Label         The average price of a single avocado
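The column roles in the table can be turned into the lists a modelling pipeline needs. A minimal sketch that transcribes the table into a dictionary and derives the feature, label, and datetime columns from it. The names `COLUMN_ROLES` and `columns_with_role` are hypothetical, and treating only `type` and `region` as string features is an assumption based on their descriptions. Lists like these are the kind of input a deepchecks `Dataset` takes for its label and categorical features; verify the exact parameter names against the deepchecks `Dataset` reference.

```python
from typing import Dict, List

# Column roles transcribed from the description table above.
COLUMN_ROLES: Dict[str, str] = {
    "Date": "Datetime",
    "Total Volume": "Feature",
    "4046": "Feature",
    "4225": "Feature",
    "4770": "Feature",
    "Total Bags": "Feature",
    "Small Bags": "Feature",
    "Large Bags": "Feature",
    "XLarge Bags": "Feature",
    "type": "Feature",
    "year": "Feature",
    "region": "Feature",
    "AveragePrice": "Label",
}


def columns_with_role(role: str) -> List[str]:
    """Return the column names that have the given role, in table order."""
    return [name for name, r in COLUMN_ROLES.items() if r == role]


FEATURES = columns_with_role("Feature")
LABEL = columns_with_role("Label")[0]
DATETIME = columns_with_role("Datetime")[0]
# Assumption: among the features, only these hold strings rather than numbers.
STRING_FEATURES = ["type", "region"]

print(len(FEATURES), LABEL, DATETIME)  # prints 11 AveragePrice Date
```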

Parameters
data_format : str, default: 'Dataset'

The format of the returned value. Can be 'Dataset' or 'Dataframe'. 'Dataset' returns the data as a deepchecks Dataset object; 'Dataframe' returns it as a pandas DataFrame object.

as_train_test : bool, default: True

If True, the returned data is split into train and test sets exactly as they were used to train the toy model: the first return value is the train data and the second is the test data. To get this model, call the load_fitted_model() function. If False, a single object is returned.
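Reusing the exact training split matters because the toy model can only be scored fairly on rows it never saw. This page does not say how deepchecks makes the split, so the sketch below only illustrates the general idea of a reproducible split: a shuffle driven by a private, fixed-seed random generator. The function `seeded_split`, the 0.25 test fraction, and the seed 42 are all hypothetical choices, not the values deepchecks uses.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def seeded_split(
    rows: Sequence[T], test_fraction: float = 0.25, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of rows with a fixed seed and cut it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # private RNG: global random state is untouched
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


first = seeded_split(range(100))
second = seeded_split(range(100))
print(first == second)  # prints True: same seed, same partition on every call
```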

Returns
dataset : Union[deepchecks.Dataset, pd.DataFrame]

The data object, in the format selected by the data_format parameter. Returned if as_train_test = False.

train_data, test_data : Tuple[Union[deepchecks.Dataset, pd.DataFrame], Union[deepchecks.Dataset, pd.DataFrame]]

Returned if as_train_test = True: a tuple of two objects holding the train and test splits of the dataset.
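Because the return type depends on as_train_test, callers must unpack a tuple in one case and use a single object in the other. The sketch below shows how that contract can be expressed precisely with `typing.overload` and `Literal`, using a stub that mimics the documented return shapes without loading any data. Everything here is hypothetical: `load_data_stub` and `FakeData` stand in for the real function and for a deepchecks Dataset or pandas DataFrame, and the case-sensitive data_format check is an assumption.

```python
from typing import Literal, Tuple, Union, overload

ALLOWED_FORMATS = ("Dataset", "Dataframe")


class FakeData:
    """Hypothetical stand-in for a deepchecks Dataset or a pandas DataFrame."""

    def __init__(self, split: str, data_format: str) -> None:
        self.split = split
        self.data_format = data_format


@overload
def load_data_stub(data_format: str = ..., as_train_test: Literal[True] = ...) -> Tuple[FakeData, FakeData]: ...
@overload
def load_data_stub(data_format: str = ..., *, as_train_test: Literal[False]) -> FakeData: ...


def load_data_stub(
    data_format: str = "Dataset", as_train_test: bool = True
) -> Union[Tuple[FakeData, FakeData], FakeData]:
    """Mimic the documented return shapes; data_format is checked case-sensitively."""
    if data_format not in ALLOWED_FORMATS:
        raise ValueError(f"data_format must be one of {ALLOWED_FORMATS}, got {data_format!r}")
    if as_train_test:
        return FakeData("train", data_format), FakeData("test", data_format)
    return FakeData("full", data_format)


train, test = load_data_stub()                           # tuple when as_train_test=True
full = load_data_stub("Dataframe", as_train_test=False)  # single object otherwise
```

Passing as_train_test by keyword lets a type checker pick the right overload, so unpacking mistakes are caught before runtime.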