Dataset.train_test_split

Dataset.train_test_split(train_size: Optional[Union[int, float]] = None, test_size: Union[int, float] = 0.25, random_state: int = 42, shuffle: bool = True, stratify: Union[List, Series, ndarray, bool] = False) → Tuple[TDataset, TDataset]

Split dataset into random train and test datasets.

Parameters

train_size (t.Union[int, float, None], default: None): If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
test_size (t.Union[int, float], default: 0.25): If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
random_state (int, default: 42): The random state to use for shuffling.
shuffle (bool, default: True): Whether to shuffle the data before splitting.
stratify (t.Union[t.List, pd.Series, np.ndarray, bool], default: False): If True, data is split in a stratified fashion, using the dataset's class labels. If array-like, data is split in a stratified fashion, using the given values as class labels.
Returns

Dataset: Dataset containing train split data.
Dataset: Dataset containing test split data.
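
Example

The way train_size, test_size, random_state and shuffle interact can be sketched in plain Python. This is an illustration only, not the library's implementation: sketch_split and resolve_count are hypothetical helpers, the rounding of fractional sizes (up for test, down for train) is an assumption, and stratify is left out.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple, TypeVar, Union

T = TypeVar("T")
Size = Union[int, float]


def resolve_count(size: Size, n_samples: int, round_up: bool) -> int:
    """Turn a proportion (float) or an absolute count (int) into a row count."""
    if isinstance(size, float):
        if not 0.0 < size < 1.0:
            raise ValueError("float sizes must be between 0.0 and 1.0")
        scaled = size * n_samples
        # Assumed rounding: ceil for the test split, floor for the train split.
        return math.ceil(scaled) if round_up else math.floor(scaled)
    if size < 0:
        raise ValueError("int sizes must be non-negative")
    return size


def sketch_split(
    rows: Sequence[T],
    train_size: Optional[Size] = None,
    test_size: Size = 0.25,
    random_state: int = 42,
    shuffle: bool = True,
) -> Tuple[List[T], List[T]]:
    """Hypothetical stand-in for the split logic, operating on a plain sequence."""
    n = len(rows)
    n_test = resolve_count(test_size, n, round_up=True)
    if train_size is None:
        # train_size=None means "everything that is not in the test split".
        n_train = n - n_test
    else:
        n_train = resolve_count(train_size, n, round_up=False)
    if n_train + n_test > n:
        raise ValueError("train and test sizes exceed the number of samples")

    indices = list(range(n))
    if shuffle:
        # A seeded generator makes the shuffle reproducible across calls.
        random.Random(random_state).shuffle(indices)

    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:n_train + n_test]]
    return train, test


rows = list(range(20))
train_rows, test_rows = sketch_split(rows)  # defaults: 25% test, shuffled, seed 42
ordered_train, ordered_test = sketch_split(rows, shuffle=False)
small_train, small_test = sketch_split(rows, train_size=10, test_size=4)
```

With the defaults, 20 rows yield 15 train rows and 5 test rows. Passing both train_size and test_size as ints may leave some rows in neither split, as in the last call.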
