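【Posted】: 2020-05-10 03:23:03
【Problem description】:
In short, I am trying to apply the same feature selection to the test data that I applied to the training set, but the test set does not have exactly the same shape.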
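from sklearn.feature_selection import SelectPercentile, chi2

def get_important_features(X_train, Y_train, X_test):
    '''
    Keep the top 75% of features, ranked by chi2 on the training set.

    :param X_train: features of training set of type scipy.sparse.csr_matrix
    :param Y_train: labels of training set (array-like; a numpy.ndarray when loaded with load_svmlight_file)
    :param X_test: features of test set of type scipy.sparse.csr_matrix
    :return: tuple (X_new_train, X_new_test) restricted to the selected features
    '''
    select_percentile = SelectPercentile(chi2, percentile=75)
    print(X_train.shape)
    print(X_test.shape)
    # Fit the selector on the training data only, then reduce the training matrix
    X_new_train = select_percentile.fit_transform(X_train, Y_train)
    # print(select_percentile.get_support(indices=True))
    # Raises ValueError when X_test has a different number of columns than X_train
    X_new_test = select_percentile.transform(X_test)
    return X_new_train, X_new_test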
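The training set has shape (836, 3188) and the test set has shape (633, 3187). As you can see, the two shapes differ, but I only care about the features that chi2 selects on the training set. Because of that mismatch, X_new_test = select_percentile.transform(X_test) raises ValueError: X has a different shape than during fitting. Is there a way to extract those features from X_test without calling transform(X_test)?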
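One way to do what is asked here, sketched under the assumption that column i means the same feature in both matrices and the test matrix is only missing trailing columns: pad the test matrix to the training width, then index it with the columns reported by get_support(indices=True). The helper name select_like_train and the toy data are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.feature_selection import SelectPercentile, chi2


def select_like_train(selector, X_test, n_train_features):
    """Apply a fitted selector's column choice to a test matrix of another width.

    Hypothetical helper: assumes column i is the same feature in train and test.
    """
    X_test = csr_matrix(X_test, copy=True)  # copy so the caller's matrix is untouched
    # resize works in place: it pads missing trailing columns with zeros
    # (or drops extra trailing columns) so the width matches the training set
    X_test.resize((X_test.shape[0], n_train_features))
    kept = selector.get_support(indices=True)  # column indices chosen during fit
    return X_test[:, kept]


# Toy data standing in for the real matrices: 8 training columns, 7 test columns
rng = np.random.default_rng(0)
X_train = csr_matrix(rng.integers(0, 5, size=(20, 8)).astype(float))
y_train = np.array([0, 1] * 10)
X_test = csr_matrix(rng.integers(0, 5, size=(6, 7)).astype(float))

selector = SelectPercentile(chi2, percentile=75).fit(X_train, y_train)
X_new_train = selector.transform(X_train)
X_new_test = select_like_train(selector, X_test, X_train.shape[1])
print(X_new_train.shape, X_new_test.shape)
```

Once the widths match, selector.transform(X_test) would work as well; the manual indexing is shown only because the question asks for a route that avoids transform.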
Note: the inputs are csr matrices rather than data frames, because I load the values from libsvm-format files:
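from sklearn.datasets import load_svmlight_file

# load_svmlight_file returns a (features, labels) pair for each file
train = load_svmlight_file(train_file_name)
X_train = train[0]
Y_train = train[1]
test = load_svmlight_file(test_file_name)
X_test = test[0]
Y_test = test[1]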
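A likely cause of the mismatch, assuming the test file simply never mentions the highest feature index, is that load_svmlight_file infers the width of each file separately. The sketch below passes the training width through the n_features parameter when loading the test file; the file names and contents are invented for the demonstration.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_file

# Made-up libsvm content: the training file uses feature indices up to 4,
# while the test file only goes up to 3
TRAIN_TEXT = "1 1:1 2:3 4:2\n0 2:1 3:5\n"
TEST_TEXT = "1 1:2 3:1\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    train_file_name = os.path.join(tmp_dir, "train.libsvm")
    test_file_name = os.path.join(tmp_dir, "test.libsvm")
    with open(train_file_name, "w") as f:
        f.write(TRAIN_TEXT)
    with open(test_file_name, "w") as f:
        f.write(TEST_TEXT)

    X_train, Y_train = load_svmlight_file(train_file_name)
    # Width inferred from the test file alone: one column short
    X_test_inferred, _ = load_svmlight_file(test_file_name)
    # Width forced to match the training matrix
    X_test, Y_test = load_svmlight_file(test_file_name, n_features=X_train.shape[1])

print(X_train.shape, X_test_inferred.shape, X_test.shape)
```

Note that load_svmlight_file raises a ValueError if the file contains a feature index beyond n_features, so this only helps when the test file is the narrower one. sklearn.datasets.load_svmlight_files([train_file_name, test_file_name]) is an alternative that loads both files with a shared number of features.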
【Discussion】:
Tags: python-3.x scikit-learn scipy