load_data

load_data(data_format: str = 'TextData', include_properties: bool = True, include_embeddings: bool = False) → Tuple[Union[TextData, DataFrame], Union[TextData, DataFrame]]

Loads and returns the SCIERC Abstract NER dataset (token classification).

Parameters

data_format (str, default: 'TextData'): Format of the returned value, either 'TextData' or 'Dict'. 'TextData' returns the data as a TextData object; 'Dict' returns the data as a dict of tokenized texts and IOB NER labels.
include_properties (bool, default: True): If True, the returned data includes properties of the texts. Incompatible with data_format='Dict'.
include_embeddings (bool, default: False): If True, the returned data includes embeddings of the texts. Incompatible with data_format='Dict'.
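
To make the "tokenized texts and IOB NER labels" wording concrete, here is a minimal sketch of how IOB tags line up with tokens and how they can be collapsed into entity spans. The sample sentence, its tags, and the helper name iob_to_spans are illustrative assumptions; they are not taken from the dataset or from the library.

```python
from typing import List, Tuple


def iob_to_spans(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse IOB tags into (entity_type, start, end) spans, end exclusive."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(labels):
        # An I- tag of the same type simply extends the open span.
        if tag.startswith("I-") and current == tag[2:]:
            continue
        # Any other tag closes the span that is currently open, if any.
        if current is not None:
            spans.append((current, start, i))
            start, current = None, None
        # B- opens a new span; a stray I- is treated as a span start too.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, current = i, tag[2:]
    if current is not None:
        spans.append((current, start, len(labels)))
    return spans


# Hypothetical sample in the tokens-plus-tags shape described above.
tokens = ["We", "train", "a", "neural", "network", "on", "ImageNet", "."]
labels = ["O", "O", "O", "B-Method", "I-Method", "O", "B-Material", "O"]

spans = iob_to_spans(labels)
for entity_type, begin, end in spans:
    print(entity_type, tokens[begin:end])
```

Each token carries exactly one tag: B- marks the first token of an entity, I- a continuation, and O a token outside any entity.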

Returns

train, test (Tuple[Union[TextData, Dict], Union[TextData, Dict]]): Tuple of two objects holding the train and test splits of the dataset.
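
A typical call might look like the sketch below. The import path deepchecks.nlp.datasets.token_classification.scierc_ner is an assumption based on the module name shown on this page, and the helpers loader_kwargs and load_scierc_splits are hypothetical wrappers, not part of the library. The wrapper turns properties and embeddings off whenever a non-TextData format is requested, since the parameter descriptions mark them as incompatible with it.

```python
def loader_kwargs(data_format: str = "TextData", include_embeddings: bool = False) -> dict:
    """Build keyword arguments for load_data, keeping the options compatible."""
    if data_format not in ("TextData", "Dict"):
        raise ValueError("data_format must be 'TextData' or 'Dict'")
    as_text_data = data_format == "TextData"
    return {
        "data_format": data_format,
        # Properties and embeddings are only requested for TextData output.
        "include_properties": as_text_data,
        "include_embeddings": as_text_data and include_embeddings,
    }


def load_scierc_splits(data_format: str = "TextData"):
    """Return the (train, test) pair; requires deepchecks and may download data."""
    # Assumed import path, imported lazily so the helper above works without it.
    from deepchecks.nlp.datasets.token_classification import scierc_ner

    return scierc_ner.load_data(**loader_kwargs(data_format))


print(loader_kwargs("Dict"))
```

Calling train, test = load_scierc_splits() would then return the two splits in the requested format.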