【Question title】: How to split a column of dictionaries into 2 columns
【Posted】: 2020-06-28 21:57:00
【Question】:

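What is the best way to split the column below into a dataframe in which one column holds each country's name and two more columns hold the data from the history column?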

From this dataframe:

+-----------------------------------------+----------------------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------------+
| coordinates                             | country                          | country_code   | history                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |   latest | province                     |
|-----------------------------------------+----------------------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------------|
| {'lat': '15', 'long': '101'}            | Thailand                         | TH             | {'1/22/20': 0, '1/23/20': 0, '1/24/20': 0, '1/25/20': 0, '1/26/20': 0, '1/27/20': 0, '1/28/20': 0, '1/29/20': 0, '1/30/20': 0, '1/31/20': 0, '2/1/20': 0, '2/10/20': 0, '2/11/20': 0, '2/12/20': 0, '2/13/20': 0, '2/14/20': 0, '2/15/20': 0, '2/16/20': 0, '2/17/20': 0, '2/18/20': 0, '2/19/20': 0, '2/2/20': 0, '2/20/20': 0, '2/21/20': 0, '2/22/20': 0, '2/23/20': 0, '2/24/20': 0, '2/25/20': 0, '2/26/20': 0, '2/27/20': 0, '2/28/20': 0, '2/29/20': 0, '2/3/20': 0, '2/4/20': 0, '2/5/20': 0, '2/6/20': 0, '2/7/20': 0, '2/8/20': 0, '2/9/20': 0, '3/1/20': 1, '3/10/20': 1, '3/11/20': 1, '3/12/20': 1, '3/13/20': 1, '3/14/20': 1, '3/15/20': 1, '3/16/20': 1, '3/2/20': 1, '3/3/20': 1, '3/4/20': 1, '3/5/20': 1, '3/6/20': 1, '3/7/20': 1, '3/8/20': 1, '3/9/20': 1}                                                                                                                                            |        1 |                              |
| {'lat': '36', 'long': '138'}            | Japan                            | JP             | {'1/22/20': 0, '1/23/20': 0, '1/24/20': 0, '1/25/20': 0, '1/26/20': 0, '1/27/20': 0, '1/28/20': 0, '1/29/20': 0, '1/30/20': 0, '1/31/20': 0, '2/1/20': 0, '2/10/20': 0, '2/11/20': 0, '2/12/20': 0, '2/13/20': 1, '2/14/20': 1, '2/15/20': 1, '2/16/20': 1, '2/17/20': 1, '2/18/20': 1, '2/19/20': 1, '2/2/20': 0, '2/20/20': 1, '2/21/20': 1, '2/22/20': 1, '2/23/20': 1, '2/24/20': 1, '2/25/20': 1, '2/26/20': 2, '2/27/20': 4, '2/28/20': 4, '2/29/20': 5, '2/3/20': 0, '2/4/20': 0, '2/5/20': 0, '2/6/20': 0, '2/7/20': 0, '2/8/20': 0, '2/9/20': 0, '3/1/20': 6, '3/10/20': 10, '3/11/20': 15, '3/12/20': 16, '3/13/20': 19, '3/14/20': 22, '3/15/20': 22, '3/16/20': 27, '3/2/20': 6, '3/3/20': 6, '3/4/20': 6, '3/5/20': 6, '3/6/20': 6, '3/7/20': 6, '3/8/20': 6, '3/9/20': 10}                                                                                                                                    |       27 |                              

Into this:

 country  days    values
Thailand  1/2/22     0
Thailand  2/2/22     0
Thailand  2/2/22     0
....
Sweden    3/4/55     0
Sweden    3/4/55     0

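【Comments】:

  • Please don't post images.
  • Why? I don't mind showing it.
  • We use pd.read_clipboard to copy your dataframe so that we can help you.
  • Just updated, thanks for your attention.
  • Sorry, let's continue tomorrow; I'm logging off after 11 hours of work...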

Tags: python-3.x pandas dataframe


【Solution 1】:

IIUC,

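import pandas as pd

# Expand each history dict into one column per date, indexed by country,
# then melt the date columns into long format.
new_df = (pd.DataFrame(df['history'].tolist(),
                       index = df['country'])
             .reset_index()
             .melt('country',var_name = 'days',value_name='values')
             .sort_values('country'))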

Or:

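import numpy as np
# Flatten every (day, value) pair; repeat each country once per dict entry.
# Note: np.concatenate returns one homogeneous array, so with string keys
# the values come back as strings too.
pd.DataFrame(data = np.concatenate([[(k, v) for k, v in d.items()] 
                                    for d in df['history']]),
             columns = ['days','values'],
            index = df['country'].repeat(df['history'].str.len())).reset_index()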

Example

print(df)
  country  country_code       history
0       A             0  {1: 0, 2: 0}
1       B             1  {1: 0, 2: 0}
2       C             2  {1: 0, 2: 0}

new_df = (pd.DataFrame(df['history'].tolist(),
                       index = df['country'])
             .reset_index()
             .melt('country',var_name = 'days',value_name='values')
             .sort_values('country'))
print(new_df)
  country days  values
0       A    1       0
3       A    2       0
1       B    1       0
4       B    2       0
2       C    1       0
5       C    2       0

The second approach is probably the better choice; on this small sample it ran roughly three times faster:

%%timeit
pd.DataFrame(data = np.concatenate([[(k,v) for k,v in d.items()] 
                                    for d in df['history']]),
             columns = ['days','values'],
            index = df['country'].repeat(df['history'].str.len())).reset_index()
1.71 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
new_df = (pd.DataFrame(df['history'].tolist(),
                       index = df['country'])
             .reset_index()
             .melt('country',var_name = 'days')
             .sort_values('country'))
new_df
5.01 ms ± 272 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
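
Two details are worth checking on the real data. The date keys are strings, so they sort alphabetically ('2/10/20' before '2/2/20'), and the NumPy variant stores the counts as strings once the keys are strings. Below is a minimal sketch, not part of the original answer, that builds the rows in plain Python so the counts stay integers, then parses the dates for a chronological sort. The helper name history_to_long is hypothetical, and the sample uses only a few entries copied from the question's table.

```python
import pandas as pd

# A few entries copied from the question's sample; keys are 'm/d/yy' strings.
df = pd.DataFrame({
    "country": ["Thailand", "Japan"],
    "history": [
        {"2/10/20": 0, "2/2/20": 0, "3/1/20": 1},
        {"2/10/20": 0, "2/2/20": 0, "3/1/20": 6},
    ],
})


def history_to_long(frame):
    """Hypothetical helper: one output row per (country, day) pair."""
    rows = [
        (country, day, value)
        for country, history in zip(frame["country"], frame["history"])
        for day, value in history.items()
    ]
    out = pd.DataFrame(rows, columns=["country", "days", "values"])
    # Parse the date strings so sorting is chronological, not alphabetical.
    out["days"] = pd.to_datetime(out["days"], format="%m/%d/%y")
    return out.sort_values(["country", "days"], ignore_index=True)


long_df = history_to_long(df)
print(long_df)
```

This trades some speed for predictable dtypes; if the timings above matter more, casting the values column back with astype(int) after the NumPy variant is another option.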

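【Comments】:

  • TypeError: 'str' object cannot be interpreted as an integer, then KeyError 'history' from "newdf = pd.DataFrame(data=np.conc ....."; this was with the simple (fastest) approach.
  • Sorry, my mistake; in the end the only error is "TypeError: list indices must be integers or slices, not str".
  • Thanks, my data just needed to be converted to a dataframe first, sorry. You saved my life, great ;)
  • By the way, how can I add the province column? I forgot to mention it.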
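
The province follow-up was left unanswered in the thread. One possible way, sketched here as an assumption rather than the answerer's own method, is to keep province as a second identifier column when melting. The sample values are copied from the question's table, where province is empty for both rows.

```python
import pandas as pd

# Two rows modelled on the question's frame; province is empty there.
df = pd.DataFrame({
    "country": ["Thailand", "Japan"],
    "province": ["", ""],
    "history": [
        {"3/15/20": 1, "3/16/20": 1},
        {"3/15/20": 22, "3/16/20": 27},
    ],
})

id_cols = ["country", "province"]

# One column per date, aligned with the original row index.
wide = pd.DataFrame(df["history"].tolist(), index=df.index)

# Attach the identifier columns, then melt the date columns into long format.
with_province = (
    df[id_cols]
    .join(wide)
    .melt(id_vars=id_cols, var_name="days", value_name="values")
    .sort_values(id_cols, ignore_index=True)
)
print(with_province)
```

Any other column to carry along (country_code, latest) can be added to id_cols in the same way.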