【Posted】: 2020-05-21 16:00:05
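【Question】:
I'm new to Prodigy, spaCy, and working on the command line. I want to use Prodigy to annotate my data for an NER model, and then use spaCy in Python to build the model.
Prodigy stores its annotations in a SQLite database. spaCy expects a different format, which I don't know the name of: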
TRAIN_DATA = [
    (
        "Horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("Do they bite?", {"entities": []}),
    (
        "horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("horses pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    (
        "they pretend to care about your feelings, those horses",
        {"entities": [(48, 54, LABEL)]},
    ),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]
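How do I convert from one to the other? It seems like this should be easy, but I can't find it documented anywhere.
I have no trouble loading the dataset; the conversion is the only part I'm stuck on.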
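For illustration, here is a minimal sketch of the kind of conversion being asked about. It assumes each Prodigy NER record is a dict with a "text" key, a "spans" list whose items carry "start", "end", and "label", and an "answer" key, which is the shape Prodigy typically exports as JSONL (for example via `prodigy db-out`). The function names `prodigy_to_spacy` and `load_jsonl`, the sample records, and the "ANIMAL" label are hypothetical and not part of either library.

```python
import json


def prodigy_to_spacy(examples):
    """Convert Prodigy-style NER records to spaCy-style (text, annotations) tuples.

    Assumes each record has "text", an optional "spans" list with
    "start"/"end"/"label" character offsets, and an optional "answer".
    """
    train_data = []
    for eg in examples:
        # Keep only accepted annotations; records without an "answer"
        # key are treated as accepted (an assumption of this sketch).
        if eg.get("answer", "accept") != "accept":
            continue
        entities = [
            (span["start"], span["end"], span["label"])
            for span in eg.get("spans", [])
        ]
        train_data.append((eg["text"], {"entities": entities}))
    return train_data


def load_jsonl(path):
    """Read one JSON object per non-empty line from a JSONL file."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records standing in for an exported Prodigy dataset.
examples = [
    {"text": "Do they bite?", "spans": [], "answer": "accept"},
    {
        "text": "horses?",
        "spans": [{"start": 0, "end": 6, "label": "ANIMAL"}],
        "answer": "accept",
    },
    {"text": "ignored example", "spans": [], "answer": "reject"},
]

TRAIN_DATA = prodigy_to_spacy(examples)
print(TRAIN_DATA)
```

If the annotations have been exported to a file, `load_jsonl` can supply the records instead of the inline list. Rejected records are dropped here because they do not describe correct entities; whether that is the right choice depends on how the data was annotated.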
【Discussion】:
Tags: python sqlite nlp spacy named-entity-recognition