[Posted]: 2020-06-21 10:50:12
[Question]:
1. A CSV containing the data (i.e. text descriptions) together with the classification labels
import pandas as pd

df = pd.read_csv('./output/csv_sanitized_16_.csv', dtype=str)
X = df['description_plus']
y = df['category_id']
2. A CSV containing the unseen data (i.e. text descriptions) whose labels need to be predicted
df_2 = pd.read_csv('./output/csv_sanitized_2.csv', dtype=str)
X2 = df_2['description_plus']
The cross-validation function that operates on the training data above (item #1):
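from sklearn import preprocessing, svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

def cross_val():
    cv = KFold(n_splits=20)
    vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5,
                                 stop_words='english')
    # Turn the raw text into a sparse TF-IDF matrix
    X_train = vectorizer.fit_transform(X)
    # with_mean=False lets StandardScaler work on the sparse matrix
    clf = make_pipeline(preprocessing.StandardScaler(with_mean=False), svm.SVC(C=1))
    scores = cross_val_score(clf, X_train, y, cv=cv)
    print(scores)
    print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

cross_val()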
How do I pass the unseen data (item #2) to the cross-validation function, and how do I predict its labels?
[Discussion]:
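One possible way to approach this, sketched under a few assumptions: `cross_val_score` only estimates how well a model generalizes. It fits clones of the estimator on each fold and does not return a fitted model, so the unseen data is not passed to it. To get labels for the unseen descriptions, the usual pattern is to fit the same model once on all of the labelled data and then call `predict` on the new text. The helper names `build_model`, `evaluate` and `fit_and_predict` below are hypothetical, and moving `TfidfVectorizer` into the pipeline is a change from the question's code, made so that the vectorizer is re-fitted on the training part of every fold and can transform raw unseen text at prediction time.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_model():
    """Hypothetical helper: raw text -> TF-IDF -> scaling -> SVM."""
    return make_pipeline(
        # Same vectorizer settings as in the question
        TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words='english'),
        # with_mean=False is required for sparse input
        StandardScaler(with_mean=False),
        SVC(C=1),
    )


def evaluate(X, y, n_splits=20):
    """Estimate accuracy with K-fold cross-validation on the labelled data."""
    cv = KFold(n_splits=n_splits)
    # The pipeline receives raw text, so each fold fits its own vectorizer
    return cross_val_score(build_model(), X, y, cv=cv)


def fit_and_predict(X, y, X_new):
    """Fit on all labelled data, then predict labels for unseen text."""
    model = build_model()
    model.fit(X, y)
    return model.predict(X_new)
```

With the variables from the question, this would be called as `scores = evaluate(X, y)` to get the per-fold accuracies and `df_2['predicted_category_id'] = fit_and_predict(X, y, X2)` to label the unseen rows, where `predicted_category_id` is only an illustrative column name.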
Tags: python-3.x pandas scikit-learn sklearn-pandas