Text Error while trying to train the data

【Question title】: Text Error while trying to train the data
【Posted】: 2020-02-03 10:02:29
【Question description】:

I get KeyError: 'text' while trying to read and feed CSV data with BasicClassificationDatasetReader from DeepPavlov.

from deeppavlov import dataset_readers

dat = dataset_readers.basic_classification_reader.BasicClassificationDatasetReader()
# raw string so that the backslashes in the Windows path are not read as escapes
l = dat.read(r"C:\Users\Anna\Desktop\NLP\test", url=None, format='csv', sep=',', header=1)

TypeError                                 Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   4380             try:
-> 4381                 return libindex.get_value_box(s, key)
   4382             except IndexError:

pandas/_libs/index.pyx in pandas._libs.index.get_value_box()

pandas/_libs/index.pyx in pandas._libs.index.get_value_at()

pandas/_libs/util.pxd in pandas._libs.util.get_value_at()

pandas/_libs/util.pxd in pandas._libs.util.validate_indexer()

TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
in <module>
      2
      3 dat = dataset_readers.basic_classification_reader.BasicClassificationDatasetReader()
----> 4 l=dat.read("C:\Users\Anna\Desktop\NLP\test", url=None, format = 'csv', sep=',', header = 1, names = [ 'x','y'])

~\Anaconda3\lib\site-packages\deeppavlov\dataset_readers\basic_classification_reader.py in read(self, data_path, url, format, class_sep, *args, **kwargs)
    100             if class_sep is None:
    101                 # each sample is a tuple ("text", "label")
--> 102                 data[data_type] = [(row[x], str(row[y])) for _, row in df.iterrows()]
    103             else:
    104                 # each sample is a tuple ("text", ["label", "label", ...])

~\Anaconda3\lib\site-packages\deeppavlov\dataset_readers\basic_classification_reader.py in <listcomp>(.0)
    100             if class_sep is None:
    101                 # each sample is a tuple ("text", "label")
--> 102                 data[data_type] = [(row[x], str(row[y])) for _, row in df.iterrows()]
    103             else:
    104                 # each sample is a tuple ("text", ["label", "label", ...])

~\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    866 [...]
    867         try:
--> 868             result = self.index.get_value(self, key)
    869
    870             if not is_scalar(result):

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   4387 [...]
   4388                 else:
-> 4389                     raise e1
   4390             except Exception:  # pragma: no cover
   4391 [...]

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   4373         try:
   4374 [...]
-> 4375                                           tz=getattr(series.dtype, 'tz', None))
   4376         except KeyError as e1:
   4377             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'text'

from deeppavlov import train_model, configs

I would expect the data to be fed in without errors. Right now the data has two columns, value and label, and 1600 rows.

【Comments】:

Tags: nlp stanford-nlp training-data

【Solution 1】:

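There are undocumented initialization parameters x='text' and y='labels', which name the headers of the x and y data columns. The error occurs because pandas cannot find a text header in your data.
Also keep in mind that you are passing header=1 and row numbering starts at 0, so the first line of your CSV file will be skipped.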
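
Both points can be reproduced with pandas alone. The sketch below is not DeepPavlov code: load_pairs is a hypothetical helper that copies only the row[x] / row[y] lookup visible at line 102 of the traceback, and CSV_TEXT is made-up data using the value and label column names from the question.

```python
import io

import pandas as pd

# Made-up stand-in for the training file: one header line, then data rows.
CSV_TEXT = "value,label\nfirst sentence,pos\nsecond sentence,neg\n"


def load_pairs(csv_text, x="text", y="labels", **read_csv_options):
    """Hypothetical helper mimicking the reader's lookup of row[x] and row[y]."""
    df = pd.read_csv(io.StringIO(csv_text), **read_csv_options)
    return [(row[x], str(row[y])) for _, row in df.iterrows()]


# header=1 takes the second line as the header, so the first line is dropped
# and the columns end up named after the first data row.
skipped = pd.read_csv(io.StringIO(CSV_TEXT), header=1)
print(list(skipped.columns))

# With the default x="text" the column does not exist, so the lookup fails.
try:
    load_pairs(CSV_TEXT, header=0)
except KeyError as err:
    print("KeyError:", err)

# Naming the actual columns (and reading the header from line 0) works.
pairs = load_pairs(CSV_TEXT, x="value", y="label", header=0)
print(pairs)
```

If the reader forwards x and y the way this answer describes, passing x='value', y='label' and header=0 to dat.read(...) should be the equivalent fix. I have not checked that against every DeepPavlov version.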

【Comments】: