【Posted】: 2019-07-25 06:37:53
【Question】:
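Good day,
I am trying to split my data into training, validation and test sets without using scikit-learn.
I want to split the data into the following sets:
- Training data: 0.7 (70%)
- Validation data: 0.2 (20%)
- Test data: 0.1 (10%)
However, when I try to split the data, I get the following error: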
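The three fractions translate into two integer cut points along the rows: everything before the first cut is training data, everything between the cuts is validation data, and the rest is test data. A minimal sketch of that arithmetic, where `split_points` is a hypothetical helper name and using `round()` instead of `int()` is an assumption made so that a product landing just below a whole number does not drop a row:

```python
def split_points(n, train_frac=0.7, valid_frac=0.2):
    """Return integer cut points (train_end, valid_end) for n rows.

    Rows [0, train_end) are training, [train_end, valid_end) validation
    and [valid_end, n) test, so the test set receives whatever is left.
    """
    # round() instead of int(): a product such as n * 0.7 can land a hair
    # below a whole number because of floating-point error.
    train_end = round(n * train_frac)
    # min() keeps the second cut point inside the data for very small n.
    valid_end = min(train_end + round(n * valid_frac), n)
    return train_end, valid_end


print(split_points(1000))  # (700, 900): 700 train, 200 validation, 100 test rows
```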
TypeError: Level type mismatch: 6.0
I need help understanding what I am doing wrong here. The sample data x_data is a DataFrame and the target data y_data is a pandas Series. Here is the code I tried:
def train_valid_test(x_data, y_data, train_split, valid_split, test_split):
    """ Parameters
    x_data: the input data
    y_data: target values
    train_split: the portion used for training data
    valid_split: the portion used for validating data
    test_split: the portion used for testing data
    """
    # setting sizes to split the data into training validating and testing samples accordingly
    train_size = float(len(all_x)*0.7)
    valid_size = float(len(all_x)*0.2)
    test_size = float(len(x_prime)*0.1)
    # Creating Training and Validation sets
    x_train, x_prime = x_data[:valid_size], x_data[valid_size:]
    y_train, y_prime = y_data[:valid_size], y_data[valid_size:]
    # Creating test sets
    x_valid, x_test = x_prime[:test_size], x_prime[test_size:]
    y_valid, y_test = y_prime[:test_size], y_prime[test_size:]
    # Return the samples
    return X_train, X_valid, X_test, y_train, y_valid, y_test
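The float slice bounds (`float(len(...) * 0.7)` and similar) are the likely cause of the TypeError, but the code also has other problems: `all_x` is never defined, `x_prime` is used before it is assigned, the `train_split`/`valid_split`/`test_split` parameters are ignored in favour of hard-coded values, the training slice ends at `valid_size` instead of `train_size`, and the returned `X_train`, `X_valid`, `X_test` do not match the lowercase names assigned earlier. One way the function could be corrected, as a sketch assuming pandas and NumPy are available: compute integer cut points from the parameters and slice both objects by position with `.iloc` so that row i of x stays paired with value i of y. The `shuffle` and `seed` parameters are hypothetical additions (off by default) that apply one random permutation to both objects before slicing.

```python
import numpy as np
import pandas as pd


def train_valid_test(x_data, y_data, train_split=0.7, valid_split=0.2,
                     test_split=0.1, shuffle=False, seed=None):
    """Split a DataFrame and a Series into train/validation/test parts.

    Returns x_train, x_valid, x_test, y_train, y_valid, y_test.
    """
    if not np.isclose(train_split + valid_split + test_split, 1.0):
        raise ValueError("train_split + valid_split + test_split must equal 1")
    if len(x_data) != len(y_data):
        raise ValueError("x_data and y_data must have the same number of rows")

    n = len(x_data)
    # Slice bounds must be integers; float bounds are the likely cause
    # of the TypeError in the question.
    train_end = round(n * train_split)
    valid_end = min(train_end + round(n * valid_split), n)

    if shuffle:
        # Hypothetical option: apply the same random permutation to x and y.
        order = np.random.default_rng(seed).permutation(n)
        x_data, y_data = x_data.iloc[order], y_data.iloc[order]

    # .iloc slices by position, so the i-th row of x stays paired with
    # the i-th value of y regardless of the index labels.
    x_train, y_train = x_data.iloc[:train_end], y_data.iloc[:train_end]
    x_valid = x_data.iloc[train_end:valid_end]
    y_valid = y_data.iloc[train_end:valid_end]
    x_test, y_test = x_data.iloc[valid_end:], y_data.iloc[valid_end:]
    return x_train, x_valid, x_test, y_train, y_valid, y_test


# Usage with toy data: 1000 rows split 700 / 200 / 100.
x = pd.DataFrame({"feature": range(1000)})
y = pd.Series(range(1000), name="target")
parts = train_valid_test(x, y)
print([len(p) for p in parts])  # [700, 200, 100, 700, 200, 100]
```

Using `.iloc` rather than plain `[]` slicing makes the positional intent explicit; with a non-default index (for example after filtering rows), label-based and positional slicing can otherwise disagree.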
【Discussion】:
Tags: python pandas machine-learning cross-validation