【Question title】: pandas iterrows into a dictionary and creating a new dataframe not working
【Posted】: 2021-05-25 16:20:12
【Question description】:

I have a dataframe (roughly 28,000+ rows, 61 columns).

The original data columns are:

data.columns
Index(['count_id', 'count_date', 'location_id', 'lanes', 'is_oneway',
       'location', 'lng', 'lat', 'centreline_type', 'centreline_id', 'px',
       'time_start', 'time_end', 'sb_cars_r', 'sb_cars_t', 'sb_cars_l',
       'nb_cars_r', 'nb_cars_t', 'nb_cars_l', 'wb_cars_r', 'wb_cars_t',
       'wb_cars_l', 'eb_cars_r', 'eb_cars_t', 'eb_cars_l', 'sb_truck_r',
       'sb_truck_t', 'sb_truck_l', 'nb_truck_r', 'nb_truck_t', 'nb_truck_l',
       'wb_truck_r', 'wb_truck_t', 'wb_truck_l', 'eb_truck_r', 'eb_truck_t',
       'eb_truck_l', 'sb_bus_r', 'sb_bus_t', 'sb_bus_l', 'nb_bus_r',
       'nb_bus_t', 'nb_bus_l', 'wb_bus_r', 'wb_bus_t', 'wb_bus_l', 'eb_bus_r',
       'eb_bus_t', 'eb_bus_l', 'nx_peds', 'sx_peds', 'ex_peds', 'wx_peds',
       'nx_bike', 'sx_bike', 'ex_bike', 'wx_bike', 'nx_other', 'sx_other',
       'ex_other', 'wx_other'],
      dtype='object')

I am trying to create a new dataframe that contains only the columns I need.

First I create a dictionary keyed by the column headers:

row_dict = {
    'location_id': 0,
    'year': 0,
    'month': 0,
    'day': 0,
    'time_start_hour': 0,
    'time_start_min': 0,
    'time_end_hour': 0,
    'time_end_min': 0,
    'num_lanes': 0,
    'is_oneway': 0,
    'is_weekend': 0,
    'is_holiday': 0,
    'nx': 0,
    'sx': 0,
    'ex': 0,
    'wx': 0,
    'nb_r': 0,
    'nb_t': 0,
    'nb_l': 0,
    'sb_r': 0,
    'sb_t': 0,
    'sb_l': 0,
    'eb_r': 0,
    'eb_t': 0,
    'eb_l': 0,
    'wb_r': 0,
    'wb_t': 0,
    'wb_l': 0
}

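Then I create an empty list in which I will store each row: data_list = []

Then I iterate over the original dataframe and put the relevant information into my dictionary. I append the dictionary to the list. Finally I convert the list to a dataframe: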

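from datetime import datetime

import holidays
import pandas as pd

def getTime(time):
    # Keep the clock part after the space and drop anything after a '-'
    time = time.split(' ')[1].split('-')[0]
    hour, minute, _ = time.split(':')
    return float(hour), float(minute)

def isWeekend(date):
    # weekday() returns 5 for Saturday and 6 for Sunday
    return datetime.strptime(date, '%Y-%m-%d').weekday() > 4

def isHoliday(date):
    return datetime.strptime(date, '%Y-%m-%d') in holidays.CA()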
for index, row in data.iterrows():
    row_dict['location_id'] = row['location_id']
    row_dict['year'], row_dict['month'], row_dict['day'] = row['count_date'].split('-')
    row_dict['time_start_hour'], row_dict['time_start_min'] = getTime(row['time_start'])
    row_dict['time_end_hour'], row_dict['time_end_min'] = getTime(row['time_end'])
    row_dict['num_lanes'] = row['lanes']
    row_dict['is_oneway'] = row['is_oneway']
    row_dict['is_weekend'] = isWeekend(row['count_date'])
    row_dict['is_holiday'] = isHoliday(row['count_date'])

    row_dict['nx'] = float(row['nx_peds']) + float(row['nx_bike']) + float(row['nx_other'])
    row_dict['sx'] = float(row['sx_peds']) + float(row['sx_bike']) + float(row['sx_other'])
    row_dict['ex'] = float(row['ex_peds']) + float(row['ex_bike']) + float(row['ex_other'])
    row_dict['wx'] = float(row['wx_peds']) + float(row['wx_bike']) + float(row['wx_other'])

    row_dict['nb_r'] = float(row['nb_cars_r']) + float(row['nb_truck_r']) + float(row['nb_bus_r'])
    row_dict['nb_t'] = float(row['nb_cars_t']) + float(row['nb_truck_t']) + float(row['nb_bus_t'])
    row_dict['nb_l'] = float(row['nb_cars_l']) + float(row['nb_truck_l']) + float(row['nb_bus_l'])
    
    row_dict['sb_r'] = float(row['sb_cars_r']) + float(row['sb_truck_r']) + float(row['sb_bus_r'])
    row_dict['sb_t'] = float(row['sb_cars_t']) + float(row['sb_truck_t']) + float(row['sb_bus_t'])
    row_dict['sb_l'] = float(row['sb_cars_l']) + float(row['sb_truck_l']) + float(row['sb_bus_l'])
    
    row_dict['eb_r'] = float(row['eb_cars_r']) + float(row['eb_truck_r']) + float(row['eb_bus_r'])
    row_dict['eb_t'] = float(row['eb_cars_t']) + float(row['eb_truck_t']) + float(row['eb_bus_t'])
    row_dict['eb_l'] = float(row['eb_cars_l']) + float(row['eb_truck_l']) + float(row['eb_bus_l'])
    
    row_dict['wb_r'] = float(row['wb_cars_r']) + float(row['wb_truck_r']) + float(row['wb_bus_r'])
    row_dict['wb_t'] = float(row['wb_cars_t']) + float(row['wb_truck_t']) + float(row['wb_bus_t'])
    row_dict['wb_l'] = float(row['wb_cars_l']) + float(row['wb_truck_l']) + float(row['wb_bus_l'])
    data_list.append(row_dict)
finalData = pd.DataFrame(data_list)

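However, when I do this and look at the dataframe, I see only one row repeated 28,000+ times. But when I just print the rows using iterrows, everything prints correctly: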

for index, row in data.iterrows():
    print(row['location_id'])

Am I doing something wrong, or am I not using the function as intended?

【Question comments】:

    Tags: pandas dataframe csv


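    【Solution 1】:

    I figured it out: I needed to copy the dictionary before appending it. The loop kept appending the same row_dict object, so every entry in the list pointed to one dictionary holding the last row's values. The fix is: data_list.append(row_dict.copy())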
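To make the cause concrete, here is a minimal sketch of the same pattern without pandas. The names `rows`, `shared`, `aliased`, and `fixed` are made up for illustration and do not come from the question. It shows that appending the dictionary itself stores a reference to one object, while appending a copy stores a snapshot of the current values.

```python
# Hypothetical stand-in for the rows produced by data.iterrows()
rows = [{"location_id": 1}, {"location_id": 2}, {"location_id": 3}]

shared = {"location_id": 0}

# Buggy pattern: the same dict object is mutated and appended each time.
aliased = []
for row in rows:
    shared["location_id"] = row["location_id"]
    aliased.append(shared)  # stores a reference, not a snapshot

# Fixed pattern: append a shallow copy of the current state.
fixed = []
for row in rows:
    shared["location_id"] = row["location_id"]
    fixed.append(shared.copy())

aliased_ids = [d["location_id"] for d in aliased]
fixed_ids = [d["location_id"] for d in fixed]
print(aliased_ids)  # [3, 3, 3]
print(fixed_ids)    # [1, 2, 3]
```

A shallow copy is enough here because all the values are plain numbers and strings. Creating a fresh dictionary inside the loop body would likely work just as well.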

    【Comments】:
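As a side note, the row loop could probably be avoided altogether by building the new columns directly from the existing ones. The following is only a sketch under assumptions: the two-row sample frame is invented, it covers just a few of the question's columns, and it leaves out the time and holiday fields. It assumes `count_date` is a `YYYY-MM-DD` string, as the question's `strptime` format suggests.

```python
import pandas as pd

# Hypothetical two-row sample using a few of the question's column names.
data = pd.DataFrame({
    "location_id": [10, 11],
    "count_date": ["2021-05-22", "2021-05-25"],  # a Saturday and a Tuesday
    "nx_peds": [1, 4], "nx_bike": [2, 5], "nx_other": [3, 6],
    "nb_cars_r": [7, 1], "nb_truck_r": [8, 1], "nb_bus_r": [9, 1],
})

# Parse the dates once for the whole column.
dates = pd.to_datetime(data["count_date"], format="%Y-%m-%d")

final = pd.DataFrame({
    "location_id": data["location_id"],
    "year": dates.dt.year,
    "month": dates.dt.month,
    "day": dates.dt.day,
    "is_weekend": dates.dt.dayofweek > 4,  # 5 = Saturday, 6 = Sunday
    # Row-wise sums replace the per-row float(...) + float(...) additions.
    "nx": data[["nx_peds", "nx_bike", "nx_other"]].astype(float).sum(axis=1),
    "nb_r": data[["nb_cars_r", "nb_truck_r", "nb_bus_r"]].astype(float).sum(axis=1),
})
print(final)
```

Two behaviours differ from the original loop. Here `year`, `month`, and `day` are integers rather than the strings that `split('-')` returns, and `sum(axis=1)` skips missing values by default, whereas adding floats propagates NaN.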
