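【Posted】: 2019-11-04 20:51:02
【Question】:
I want to split the dataset by rows, keeping all columns, in an 80:20 ratio, where 80% is training data and 20% is test data. I can get the 80% portion, but I cannot get the 20% portion.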
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
city_attributes = pd.read_csv('./input/city_attributes.csv')
humidity = pd.read_csv('./input/humidity.csv')
pressure = pd.read_csv('./input/pressure.csv')
temperature = pd.read_csv('./input/temperature.csv')
weather_description = pd.read_csv('./input/weather_description.csv')
wind_direction = pd.read_csv('./input/wind_direction.csv')
wind_speed = pd.read_csv('./input/wind_speed.csv')
# we can reshape these using pd.melt
humidity = pd.melt(humidity, id_vars = ['datetime'], value_name = 'humidity', var_name = 'City')
pressure = pd.melt(pressure, id_vars = ['datetime'], value_name = 'pressure', var_name = 'City')
temperature = pd.melt(temperature, id_vars = ['datetime'], value_name = 'temperature', var_name = 'City')
weather_description = pd.melt(weather_description, id_vars = ['datetime'], value_name = 'weather_description', var_name = 'City')
wind_direction = pd.melt(wind_direction, id_vars = ['datetime'], value_name = 'wind_direction', var_name = 'City')
wind_speed = pd.melt(wind_speed, id_vars = ['datetime'], value_name = 'wind_speed', var_name = 'City')
# combine all of the dataframes created above
weather = pd.concat([humidity, pressure, temperature, wind_direction, wind_speed, weather_description], axis = 1)
weather = weather.loc[:,~weather.columns.duplicated()] # indexing: every row, only the columns that aren't duplicates
# now we can merge this with the city attributes
weather = pd.merge(city_attributes,weather, on = 'City')
weather = weather.dropna()
first = pd.DataFrame()
rest = pd.DataFrame()
total_size = weather.shape[0]
train_size = 1277055
test_size = 319264
if len(weather) > train_size:
    first = weather[:1277055]
    rest = weather[319264:]
print(rest)
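Note that the slice `weather[319264:]` above starts at row 319264 and runs to the end of the frame, so it returns roughly 80% of the rows again rather than the last 20%. A minimal sketch of a positional split follows, assuming a small synthetic DataFrame in place of the real weather data; the `split_rows` helper and the `demo` frame are hypothetical names introduced only for illustration.

```python
import numpy as np
import pandas as pd


def split_rows(df, train_fraction=0.8):
    """Split df by position: the first rows go to train, the remainder to test."""
    train_size = int(len(df) * train_fraction)
    train = df.iloc[:train_size]
    test = df.iloc[train_size:]  # start where the training slice ends
    return train, test


# Synthetic stand-in for the merged weather table (an assumption, not the real data).
demo = pd.DataFrame({
    "humidity": np.arange(10, dtype=float),
    "temperature": np.arange(10, dtype=float) + 280.0,
})

train, test = split_rows(demo)
print(train.shape, test.shape)
```

Deriving `train_size` from `len(df)` instead of hard-coding row counts keeps the two slices complementary even if `dropna()` changes the number of rows.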
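【Comments】:
- What error or unexpected result are you getting? You imported train_test_split but never use it. That function should do exactly what you need.
- With train_test_split the data can only be split by columns, not by rows; I have tested it.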
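For reference, scikit-learn's `train_test_split` partitions its input along the first axis, that is, by rows, and every column is kept in both parts. A minimal sketch on a synthetic DataFrame is shown here; the data and column values are made up for illustration. Passing `shuffle=False` keeps the original row order, which may matter for time-ordered weather readings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the weather table (an assumption, not the real data).
demo = pd.DataFrame({
    "humidity": np.arange(10, dtype=float),
    "pressure": np.arange(10, dtype=float) + 1000.0,
    "temperature": np.arange(10, dtype=float) + 280.0,
})

# test_size=0.2 reserves the last 20% of rows for testing when shuffle=False.
train_df, test_df = train_test_split(demo, test_size=0.2, shuffle=False)
print(train_df.shape, test_df.shape)
```

Both resulting frames keep all three columns; only the number of rows differs.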
Tags: python-3.x machine-learning scikit-learn spyder