【Posted】: 2021-05-25 16:20:12
【Question】:
I have a DataFrame (about 28,000+ rows, 61 columns).
The original columns are:
data.columns
Index(['count_id', 'count_date', 'location_id', 'lanes', 'is_oneway',
'location', 'lng', 'lat', 'centreline_type', 'centreline_id', 'px',
'time_start', 'time_end', 'sb_cars_r', 'sb_cars_t', 'sb_cars_l',
'nb_cars_r', 'nb_cars_t', 'nb_cars_l', 'wb_cars_r', 'wb_cars_t',
'wb_cars_l', 'eb_cars_r', 'eb_cars_t', 'eb_cars_l', 'sb_truck_r',
'sb_truck_t', 'sb_truck_l', 'nb_truck_r', 'nb_truck_t', 'nb_truck_l',
'wb_truck_r', 'wb_truck_t', 'wb_truck_l', 'eb_truck_r', 'eb_truck_t',
'eb_truck_l', 'sb_bus_r', 'sb_bus_t', 'sb_bus_l', 'nb_bus_r',
'nb_bus_t', 'nb_bus_l', 'wb_bus_r', 'wb_bus_t', 'wb_bus_l', 'eb_bus_r',
'eb_bus_t', 'eb_bus_l', 'nx_peds', 'sx_peds', 'ex_peds', 'wx_peds',
'nx_bike', 'sx_bike', 'ex_bike', 'wx_bike', 'nx_other', 'sx_other',
'ex_other', 'wx_other'],
dtype='object')
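I am trying to create a new DataFrame that contains only the columns I need.
First I create a dictionary keyed by the new column headers: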
row_dict = {
    'location_id': 0,
    'year': 0,
    'month': 0,
    'day': 0,
    'time_start_hour': 0,
    'time_start_min': 0,
    'time_end_hour': 0,
    'time_end_min': 0,
    'num_lanes': 0,
    'is_oneway': 0,
    'is_weekend': 0,
    'is_holiday': 0,
    'nx': 0,
    'sx': 0,
    'ex': 0,
    'wx': 0,
    'nb_r': 0,
    'nb_t': 0,
    'nb_l': 0,
    'sb_r': 0,
    'sb_t': 0,
    'sb_l': 0,
    'eb_r': 0,
    'eb_t': 0,
    'eb_l': 0,
    'wb_r': 0,
    'wb_t': 0,
    'wb_l': 0
}
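Then I create an empty list in which I will store each row: data_list = []
Then I iterate over the original DataFrame and put the relevant information into my dictionary. I append the dictionary to the list, and finally I convert the list to a DataFrame: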
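from datetime import datetime
import holidays
import pandas as pd

def getTime(time):
    # Keep the clock part after the space and drop anything after a '-'
    time = time.split(' ')[1].split('-')[0]
    hour, minute, _ = time.split(':')
    return float(hour), float(minute)

def isWeekend(date):
    # weekday() returns 5 for Saturday and 6 for Sunday
    return datetime.strptime(date, '%Y-%m-%d').weekday() > 4

def isHoliday(date):
    return datetime.strptime(date, '%Y-%m-%d') in holidays.CA()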
for index, row in data.iterrows():
    row_dict['location_id'] = row['location_id']
    row_dict['year'], row_dict['month'], row_dict['day'] = row['count_date'].split('-')
    row_dict['time_start_hour'], row_dict['time_start_min'] = getTime(row['time_start'])
    row_dict['time_end_hour'], row_dict['time_end_min'] = getTime(row['time_end'])
    row_dict['num_lanes'] = row['lanes']
    row_dict['is_oneway'] = row['is_oneway']
    row_dict['is_weekend'] = isWeekend(row['count_date'])
    row_dict['is_holiday'] = isHoliday(row['count_date'])
    row_dict['nx'] = float(row['nx_peds']) + float(row['nx_bike']) + float(row['nx_other'])
    row_dict['sx'] = float(row['sx_peds']) + float(row['sx_bike']) + float(row['sx_other'])
    row_dict['ex'] = float(row['ex_peds']) + float(row['ex_bike']) + float(row['ex_other'])
    row_dict['wx'] = float(row['wx_peds']) + float(row['wx_bike']) + float(row['wx_other'])
    row_dict['nb_r'] = float(row['nb_cars_r']) + float(row['nb_truck_r']) + float(row['nb_bus_r'])
    row_dict['nb_t'] = float(row['nb_cars_t']) + float(row['nb_truck_t']) + float(row['nb_bus_t'])
    row_dict['nb_l'] = float(row['nb_cars_l']) + float(row['nb_truck_l']) + float(row['nb_bus_l'])
    row_dict['sb_r'] = float(row['sb_cars_r']) + float(row['sb_truck_r']) + float(row['sb_bus_r'])
    row_dict['sb_t'] = float(row['sb_cars_t']) + float(row['sb_truck_t']) + float(row['sb_bus_t'])
    row_dict['sb_l'] = float(row['sb_cars_l']) + float(row['sb_truck_l']) + float(row['sb_bus_l'])
    row_dict['eb_r'] = float(row['eb_cars_r']) + float(row['eb_truck_r']) + float(row['eb_bus_r'])
    row_dict['eb_t'] = float(row['eb_cars_t']) + float(row['eb_truck_t']) + float(row['eb_bus_t'])
    row_dict['eb_l'] = float(row['eb_cars_l']) + float(row['eb_truck_l']) + float(row['eb_bus_l'])
    row_dict['wb_r'] = float(row['wb_cars_r']) + float(row['wb_truck_r']) + float(row['wb_bus_r'])
    row_dict['wb_t'] = float(row['wb_cars_t']) + float(row['wb_truck_t']) + float(row['wb_bus_t'])
    row_dict['wb_l'] = float(row['wb_cars_l']) + float(row['wb_truck_l']) + float(row['wb_bus_l'])
    data_list.append(row_dict)
finalData = pd.DataFrame(data_list)
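However, when I do this and look at the resulting DataFrame, I see only one row repeated 28,000+ times. But when I simply print the rows using iterrows, everything prints correctly:
for index, row in data.iterrows():
    print(row['location_id'])
Am I doing something wrong, or am I not using this function as intended?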
【Discussion】:
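A likely cause, judging from the code in the question: the loop mutates and appends the same dictionary object every time, so the list ends up holding many references to one dict, and all of them show the values written on the last pass. The sketch below reproduces this on a small stand-in frame and shows one possible fix. The three-row `data` frame and the names `shared`, `buggy_rows`, `fixed_rows`, and `new_row` are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical stand-in for the original `data` frame (3 rows instead of 28000+).
data = pd.DataFrame({"location_id": [10, 20, 30], "lanes": [2, 4, 6]})

# Pattern from the question: one dict is created once, then mutated and
# appended on every pass, so the list holds three references to the same dict.
shared = {"location_id": 0, "num_lanes": 0}
buggy_rows = []
for _, row in data.iterrows():
    shared["location_id"] = row["location_id"]
    shared["num_lanes"] = row["lanes"]
    buggy_rows.append(shared)

# Possible fix: build a fresh dict inside the loop for every row.
fixed_rows = []
for _, row in data.iterrows():
    new_row = {"location_id": row["location_id"], "num_lanes": row["lanes"]}
    fixed_rows.append(new_row)

buggy_df = pd.DataFrame(buggy_rows)
fixed_df = pd.DataFrame(fixed_rows)

print(buggy_df["location_id"].tolist())  # every entry equals the last row's value
print(fixed_df["location_id"].tolist())  # one distinct entry per original row
```

In the code from the question, the smallest change along these lines would probably be `data_list.append(row_dict.copy())`, or moving the `row_dict = {...}` definition inside the loop.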