load_data

load_data(data_format: str = 'TextData', include_properties: bool = True, include_embeddings: bool = False) → Tuple[Union[TextData, DataFrame], Union[TextData, DataFrame]]

Load and return the SCIERC Abstract NER dataset (token classification).

Parameters
data_format : str, default: 'TextData'

The format of the returned value, either 'TextData' or 'Dict'. 'TextData' returns the data as a TextData object. 'Dict' returns the data as a dict of tokenized texts and IOB NER labels.

include_properties : bool, default: True

If True, the returned data will include properties of the texts. Incompatible with data_format='Dict'.

include_embeddings : bool, default: False

If True, the returned data will include embeddings of the texts. Incompatible with data_format='Dict'.

Returns
train, test : Tuple[Union[TextData, Dict], Union[TextData, Dict]]

A tuple of two objects holding the train and test splits of the dataset.
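Examples

With data_format='Dict', each split is described above as a dict of tokenized texts and IOB NER labels. The sketch below shows one way such parallel token and label lists could be turned into entity spans. The key names 'text' and 'label', the helper iob_to_spans, and the sample sentence with its entity types are all assumptions made for illustration; they are not taken from the dataset or confirmed by this page, so check the keys of the returned dict before relying on them.

```python
from typing import List, Tuple


def iob_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (entity_text, entity_type) pairs.

    Hypothetical helper, not part of the library.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None
    for token, label in zip(tokens, labels):
        starts_entity = label.startswith("B-") or (
            # A stray I- tag whose type differs from the open entity
            # is treated as the start of a new entity.
            label.startswith("I-") and label[2:] != current_type
        )
        if starts_entity:
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-"):
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" (outside) closes any open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hand-written sample in the ASSUMED layout; in practice it would come from
# something like: train, test = load_data(data_format='Dict',
#     include_properties=False, include_embeddings=False)
sample = {
    "text": [["We", "propose", "a", "neural", "network", "for", "parsing", "."]],
    "label": [["O", "O", "O", "B-Method", "I-Method", "O", "B-Task", "O"]],
}

spans = iob_to_spans(sample["text"][0], sample["label"][0])
print(spans)
```

Grouping at the span level like this is mainly useful for eyeballing the labels; token-level tags are usually what a token classification model consumes directly.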