【Posted】: 2021-02-02 17:42:02
【Question】:
I'm having trouble doing a 70-30 train/test split with sklearn. I get an error on this line:
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
The error is:
Found input variables with inconsistent numbers of samples
Context:
from imblearn.over_sampling import SMOTE
sm = SMOTE(k_neighbors = 1)
X = data.drop('cluster',axis=1)
y = data['cluster']
X_smote, y_smote= sm.fit_sample(X,y)
data_bal = pd.DataFrame(columns=X.columns.values, data=X_smote)
data_bal['cluster']=y_smote
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
y_train.value_counts().plot(kind='bar')
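Edit:
I solved the error. I just had to change stratify=y to stratify=y_smote, so that the stratification labels have the same number of samples as the resampled data being split.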
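For anyone hitting the same message, here is a minimal sketch of why the fix works. It is not the original data: the arrays are synthetic, and a naive duplication of minority rows stands in for SMOTE so the example only needs numpy and sklearn. The names X_res, y_res and error_message are made up for illustration. The point is only that the array passed to stratify must have the same length as the arrays being split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 90 samples of class 0, 10 of class 1.
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# Stand-in for SMOTE (assumption: any oversampler changes the sample count).
# Duplicate 80 randomly chosen minority rows so both classes have 90 samples.
minority = np.flatnonzero(y == 1)
extra = rng.choice(minority, size=80, replace=True)
X_res = np.vstack([X, X[extra]])        # 180 rows
y_res = np.concatenate([y, y[extra]])   # 180 labels

# Passing the ORIGINAL labels (100) to stratify while splitting the
# resampled arrays (180) reproduces the error from the question.
error_message = None
try:
    train_test_split(X_res, y_res, test_size=0.3, stratify=y)
except ValueError as exc:
    error_message = str(exc)

# Stratifying on the resampled labels works, because the lengths match.
X_train, X_test, y_train, y_test = train_test_split(
    X_res, y_res, test_size=0.3, stratify=y_res, random_state=0
)
```

With the real code from the question, the equivalent change is stratify=y_smote, since X_smote and y_smote are what is being split.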
【Comments】:
-
stackoverflow.com/questions/30813044/… I think this is the same problem
-
Hi, I tried that solution but it still doesn't work, thanks
Tags: python data-analysis sklearn-pandas train-test-split