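【Posted】: 2020-03-11 20:15:02
【Question】:
As part of a project, I need to train a multi-label text classifier in Python. I am following a guide, but because I am inexperienced with Python, I am having trouble understanding the part of the code that verifies that the validation labels are in the same range as the training labels. It is also the part that throws the error.
The code I am trying to understand is this (more specifically, the first two lines confuse me):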
num_classes = max(np.array(train_labels)) + 1
missing_classes = [i for i in range(num_classes) if i not in train_labels]
if len(missing_classes):
    raise ValueError('Missing samples with label value(s) '
                     '{missing_classes}. Please make sure you have '
                     'at least one sample for every label value '
                     'in the range(0, {max_class})'.format(
                         missing_classes=missing_classes,
                         max_class=num_classes - 1))
if num_classes <= 1:
    raise ValueError('Invalid number of labels: {num_classes}.'
                     'Please make sure there are at least two classes '
                     'of samples'.format(num_classes=num_classes))
unexpected_labels = [v for v in test_labels if v not in range(num_classes)]
if len(unexpected_labels):
    raise ValueError('Unexpected label values found in the test set:'
                     ' {unexpected_labels}. Please make sure that the '
                     'labels in the validation set are in the same range '
                     'as training labels.'.format(
                         unexpected_labels=unexpected_labels))
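To make the first two lines concrete, here is a minimal sketch of what they compute. It assumes `train_labels` is a flat sequence of integer class ids; the sample values are made up for illustration.

```python
import numpy as np

# Hypothetical labels: a flat sequence of integer class ids, one per sample.
train_labels = [0, 2, 3, 0, 2]

# The largest id is 3, so the check assumes ids 0..3, i.e. 4 classes.
num_classes = max(np.array(train_labels)) + 1

# Every id in range(num_classes) that never occurs in the training labels.
missing_classes = [i for i in range(num_classes) if i not in train_labels]

print(num_classes)      # 4
print(missing_classes)  # [1]
```

In this toy case the check would raise the first `ValueError`, because no training sample carries label 1.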
And this is the error it gives me:
num_classes = max(np.array(train_labels))
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
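The same message can be reproduced with numpy alone whenever the built-in `max()` is applied to a 2-D array. The sketch below assumes the labels form a one-hot matrix; the matrix itself is invented for illustration.

```python
import numpy as np

# Hypothetical 2-D label matrix: one one-hot row per sample.
one_hot = np.array([[1, 0, 0, 0],
                    [0, 0, 1, 0],
                    [0, 0, 0, 1]])

try:
    # The built-in max() iterates over rows and compares them with `>`,
    # which yields a boolean array rather than a single True/False.
    max(one_hot)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

On a flat 1-D array of integers the same `max()` call compares scalars and works as the guide expects.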
In case it matters, the code that runs before this block is:
lb = preprocessing.LabelBinarizer()
train_labels = lb.fit_transform(train_df['label'])
train_labels = np.squeeze(train_labels)
print(lb.classes_)
test_labels = lb.transform(test_df['label'])
test_labels = np.squeeze(test_labels)
which gives me this output:
[67 68 69 70]
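Because `lb.classes_` lists four classes, `LabelBinarizer` returns a 2-D indicator matrix with one column per class, not a flat vector of ids. The numpy-only sketch below assumes that layout and recovers integer ids with `argmax`. The matrix is built by hand as a stand-in for the real output, and the sample labels are invented.

```python
import numpy as np

classes = np.array([67, 68, 69, 70])         # same values as lb.classes_ above
raw_labels = np.array([67, 70, 68, 67, 69])  # hypothetical sample labels

# Hand-built stand-in for the indicator matrix: one row per sample,
# one column per class, 1 where the sample belongs to that class.
indicator = (raw_labels[:, None] == classes[None, :]).astype(int)
print(indicator.shape)  # (5, 4)

# The column index of the 1 in each row gives an integer id in 0..3.
label_ids = indicator.argmax(axis=1)
print(label_ids.tolist())  # [0, 3, 1, 0, 2]

num_classes = int(label_ids.max()) + 1
print(num_classes)  # 4
```

With ids in this form, the range check from the guide has scalar values to compare. `sklearn.preprocessing.LabelEncoder` may be another way to obtain such ids directly, if that matches what the guide intends.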
Any help understanding this better would be appreciated.
【Comments】:
Tags: python numpy error-handling preprocessor text-classification