【Posted】: 2020-01-23 11:05:38
【Question】:
My database contains many passages like the following:
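KP Snacks Ltd recalls certain date codes of 4 variants of McCoy’s multi bag crisps. KP Snacks Ltd has undertaken a precautionary recall of the products listed below as a very small number of these bags of crisps may contain small pieces of plastic.
Should I split the text into sentences first, or feed the whole passage (both sentences) to the model as a single training example?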
# Option 1: both sentences kept together in one training example
TRAIN_DATA_1 = [
("KP Snacks Ltd recalls certain date codes of 4 variants of McCoy’s multi bag crisps. KP Snacks Ltd has undertaken a precautionary recall of the products listed below as a very small number of these bags of crisps may contain small pieces of plastic.", {"entities": []}),
("I like London and Berlin.", {"entities": []}),
]
# Option 2: one training example per sentence
TRAIN_DATA_2 = [
("KP Snacks Ltd recalls certain date codes of 4 variants of McCoy’s multi bag crisps.", {"entities": []}),
("KP Snacks Ltd has undertaken a precautionary recall of the products listed below as a very small number of these bags of crisps may contain small pieces of plastic.", {"entities": []}),
("I like London and Berlin.", {"entities": []}),
]
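For reference, TRAIN_DATA_2 can be derived mechanically from TRAIN_DATA_1. The sketch below assumes the `(start, end, label)` entity-offset format used in the lists above, and uses a naive regex splitter (break after `.`, `!` or `?` followed by whitespace) as a stand-in for a proper sentence segmenter such as spaCy's sentencizer. The helper name `split_examples` is hypothetical, and entities that straddle a sentence boundary are simply dropped.

```python
import re

# Hypothetical naive splitter: a boundary is whitespace that follows ".", "!" or "?".
# It will mis-split abbreviations such as "Dr. Smith"; a real segmenter is more robust.
SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_examples(train_data):
    """Split each (text, annotations) pair into one pair per sentence,
    re-basing entity offsets so they are relative to the new sentence."""
    result = []
    for text, annotations in train_data:
        entities = annotations.get("entities", [])
        start = 0
        # Append None so the text after the last boundary is also emitted.
        for match in list(SENT_BOUNDARY.finditer(text)) + [None]:
            end = match.start() if match else len(text)
            sentence = text[start:end]
            # Keep only entities lying entirely inside this sentence;
            # entities crossing a boundary are dropped.
            sent_ents = [
                (s - start, e - start, label)
                for s, e, label in entities
                if s >= start and e <= end
            ]
            if sentence:
                result.append((sentence, {"entities": sent_ents}))
            start = match.end() if match else len(text)
    return result
```

Applied to TRAIN_DATA_1, this yields the same three examples as TRAIN_DATA_2; with non-empty annotations, the shifted offsets keep each entity pointing at the same characters in its sentence.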
In short: which is correct, TRAIN_DATA_1 or TRAIN_DATA_2, and why?
【Discussion】: