load_data
- load_data(data_format: str = 'Dataset', as_train_test: bool = True) -> Union[Tuple, Dataset, DataFrame]
Loads and returns the Avocado dataset (regression).
The avocado dataset contains historical data on avocado prices and sales volume in multiple US markets (https://www.kaggle.com/neuromusic/avocado-prices).
This dataset is licensed under the Open Data Commons Open Database License (ODbL) v1.0 (https://opendatacommons.org/licenses/odbl/1-0/).
The typical ML task for this dataset is to build a model that predicts the average price of avocados.
- Dataset Shape:

| Property | Value |
|---|---|
| Samples Total | 18.2K |
| Dimensionality | 14 |
| Features | real, string |
| Targets | real 0.44 - 3.25 |
- Description:

| Column name | Column Role | Description |
|---|---|---|
| Date | Datetime | The date of the observation |
| Total Volume | Feature | Total number of avocados sold |
| 4046 | Feature | Total number of avocados with PLU 4046 (small avocados) sold |
| 4225 | Feature | Total number of avocados with PLU 4225 (large avocados) sold |
| 4770 | Feature | Total number of avocados with PLU 4770 (xlarge avocados) sold |
| Total Bags | Feature | |
| Small Bags | Feature | |
| Large Bags | Feature | |
| XLarge Bags | Feature | |
| type | Feature | Conventional or organic |
| year | Feature | |
| region | Feature | The city or region of the observation |
| AveragePrice | Label | The average price of a single avocado |
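The column roles in the description table above can be turned into feature and label selectors. The following is a minimal sketch: the `COLUMN_ROLES` mapping and the `columns_with_role` helper are hypothetical names introduced here for illustration and are not part of the library. It assumes that the column names in the loaded data match the table exactly.

```python
from typing import List

# Column roles copied from the description table (name -> role).
COLUMN_ROLES = {
    "Date": "Datetime",
    "Total Volume": "Feature",
    "4046": "Feature",
    "4225": "Feature",
    "4770": "Feature",
    "Total Bags": "Feature",
    "Small Bags": "Feature",
    "Large Bags": "Feature",
    "XLarge Bags": "Feature",
    "type": "Feature",
    "year": "Feature",
    "region": "Feature",
    "AveragePrice": "Label",
}


def columns_with_role(role: str) -> List[str]:
    """Return the column names whose role matches `role`, in table order."""
    return [name for name, r in COLUMN_ROLES.items() if r == role]


feature_columns = columns_with_role("Feature")
label_column = columns_with_role("Label")[0]
```

With a pandas DataFrame `df` returned by the loader, `df[feature_columns]` and `df[label_column]` would then select the model inputs and the regression target, provided the column names match.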
- Parameters
- data_format : str, default: 'Dataset'
The format of the returned value, either 'Dataset' or 'Dataframe'. 'Dataset' returns the data as a Dataset object; 'Dataframe' returns it as a pandas DataFrame object.
- as_train_test : bool, default: True
If True, the returned data is split into train and test sets exactly as it was when the toy model was trained: the first return value is the train data and the second is the test data. To get that model, call the load_fitted_model() function. If False, a single object is returned.
- Returns
- dataset : Union[deepchecks.Dataset, pd.DataFrame]
The data object, in the format given by the data_format parameter. Returned when as_train_test = False.
- train_data, test_data : Tuple[Union[deepchecks.Dataset, pd.DataFrame], Union[deepchecks.Dataset, pd.DataFrame]]
Returned as a tuple if as_train_test = True: two objects holding the dataset split into train and test sets.
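The calling convention described above can be sketched as a small wrapper. `load_train_test` is a hypothetical helper written for this example, not a library function. It takes the loader as an argument because the import path of `load_data` depends on the installed version and is not stated here. The sketch only checks `data_format` against the two documented values and unpacks the documented (train, test) pair.

```python
from typing import Any, Callable, Tuple

# The two values listed for the data_format parameter.
VALID_FORMATS = ("Dataset", "Dataframe")


def load_train_test(load_data: Callable[..., Any],
                    data_format: str = "Dataframe") -> Tuple[Any, Any]:
    """Call a load_data-style function and unpack its (train, test) pair.

    Hypothetical helper for illustration; `load_data` is passed in by the
    caller rather than imported here.
    """
    if data_format not in VALID_FORMATS:
        raise ValueError(
            f"data_format must be one of {VALID_FORMATS}, got {data_format!r}"
        )
    # With as_train_test=True the documented return value is a tuple:
    # train data first, test data second.
    train, test = load_data(data_format=data_format, as_train_test=True)
    return train, test
```

Assuming the loader has been imported as `load_data`, `train_df, test_df = load_train_test(load_data)` would yield two pandas DataFrames. Calling `load_data(as_train_test=False)` directly returns a single object, so no unpacking is needed in that case.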