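【Posted】: 2020-10-05 22:24:30
【Question】:
I have a dataframe and want to split its rows into separate arrays based on their labels, but I'm not sure how to filter it by index. I'm also not sure whether what I have below is done correctly:
Example dataset (df):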
Cancer_Type | Variable | Data Split | Target
Cancer1     | 43       | Train      | Good
Cancer5     | 34       | Train      | Bad
Cancer2     | 34       | Test       | Good
Cancer3     | 23       | Test       | Bad
Cancer4     | 25       | Test       | Good
Could something like this work?
#initial split into train/test data
train = df['split'] == 'train'
print("train")
print(train)
test = df['split'] == 'test'
print("valid")
print(test)
X_test = test.values[-1, :-1]
y_test = test.values[-1, -1]
# Get the remaining dataset
X = train.values[:-1, :-1]
y = train.values[:-1, -1]
print("X")
#print(type(X))
#print(X)
print("y")
#print(type(y))
#print(y)
# Split the remaining dataset into train and calibration sets.
X_train, X_cal, y_train, y_cal = train_test_split(X, y)
print(X_train.shape, y_train.shape)
print(X_cal.shape, y_cal.shape)
Ideally the split would be done by row.
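For reference, a minimal sketch of one way a row-wise split like this could be done with a boolean mask. It assumes the column is named "Data Split" and holds "Train"/"Test" exactly as in the example table (the snippet above uses 'split' and 'train', which would not match that table), and that "Variable" is the only feature. The names feature_cols, train_df and test_df, and the test_size and random_state values, are illustrative choices for the toy data, not part of the original code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame mirroring the example table in the question.
df = pd.DataFrame({
    "Cancer_Type": ["Cancer1", "Cancer5", "Cancer2", "Cancer3", "Cancer4"],
    "Variable": [43, 34, 34, 23, 25],
    "Data Split": ["Train", "Train", "Test", "Test", "Test"],
    "Target": ["Good", "Bad", "Good", "Bad", "Good"],
})

# The comparison alone yields a boolean Series; indexing df with it
# keeps only the rows where the mask is True.
train_df = df[df["Data Split"] == "Train"]
test_df = df[df["Data Split"] == "Test"]

# Assumption: "Variable" is the only feature column, "Target" is the label.
feature_cols = ["Variable"]

X_test = test_df[feature_cols].to_numpy()
y_test = test_df["Target"].to_numpy()

X = train_df[feature_cols].to_numpy()
y = train_df["Target"].to_numpy()

# Split the training rows into proper-training and calibration sets.
# test_size=0.5 is chosen only because the toy data has two training rows.
X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.5, random_state=0
)
print(X_train.shape, y_train.shape)
print(X_cal.shape, y_cal.shape)
```

Selecting columns by name rather than by position (`[:-1, :-1]`) avoids accidentally dropping the last row or pulling the "Data Split" column into the features.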
【Discussion】:
Tags: python arrays pandas numpy scikit-learn