【Question title】: Extract an element from a dictionary inside a list in a DataFrame
【Posted】: 2021-05-10 08:50:06
【Question description】:

Suppose we have a DataFrame in the following format:

col1
[{'overall_prop': '0.812'}, {'overall_prop': '0.125'}, {'overall_prop': '0.062'}]
{}

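The raw data is in JSON format. I want to extract the value of 'overall_prop' from the first element of the list in each row. This is what I tried in order to get the first element: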

 df['col1'].str[0]

That works fine. Then, to extract 'overall_prop':

df['col1'].str[0].map(lambda x: x.get('overall_prop'))    

But it complains:

{AttributeError}'float' object has no attribute 'get'

because {} (a Python dict object) has become NaN.

Then I tried this:

df['col1'].where(df['col1'].notna(), lambda x: [{}]).str[0].map(lambda x: x.get('overall_prop'))

But this time:

{TypeError}argument of type 'NoneType' is not iterable

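In short, I am looking for a way to extract an element from a dictionary inside a list that can also handle empty values.

【Comments】:

  • Have you tried df['col1'].str.map(lambda x: x[0]['overall_prop'])?
  • @JoeFerndz: StringMethods has no map.
  • My bad. I should have paid more attention. You can't use map on str. Use df.col1.apply(lambda x: x[0]['overall_prop']) instead. See the answer below.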

Tags: python json pandas dataframe dictionary


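【Solution 1】:

EDIT Ver 1: col1 is a list of dictionaries, and x[0] has overall_prop

You can do it like this. Use df.col1.apply(lambda x: x[0]['overall_prop']) to take the first element of the list and then read the overall_prop value from that dictionary.

The assumption here is that every row in col1 is a list whose first element is a dictionary containing the key overall_prop.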

import pandas as pd
df = pd.DataFrame({'col1':[[{'overall_prop': '0.001'},
                            {'overall_prop': '0.002'},
                            {'overall_prop': '0.003'}],
                           [{'overall_prop': '0.004'},
                            {'overall_prop': '0.005'},
                            {'overall_prop': '0.006'}],
                           [{'overall_prop': '0.007'},
                            {'overall_prop': '0.008'},
                            {'overall_prop': '0.009'}],
                           [{'overall_prop': '0.010'},
                            {'overall_prop': '0.011'},
                            {'overall_prop': '0.012'}],
                           [{'overall_prop': '0.013'},
                            {'overall_prop': '0.014'},
                            {'overall_prop': '0.015'}]]})

print(df)

df['overall_prop'] = df['col1'].apply(lambda x: x[0]['overall_prop'])
print(df)

The output will be:

                                                col1 overall_prop
0  [{'overall_prop': '0.001'}, {'overall_prop': '...        0.001
1  [{'overall_prop': '0.004'}, {'overall_prop': '...        0.004
2  [{'overall_prop': '0.007'}, {'overall_prop': '...        0.007
3  [{'overall_prop': '0.010'}, {'overall_prop': '...        0.010
4  [{'overall_prop': '0.013'}, {'overall_prop': '...        0.013
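
Note that the extracted values are still strings, because that is how they are stored in the dictionaries. A minimal sketch, assuming numeric values are wanted, converts them with pd.to_numeric. The column name overall_prop_num is made up for this illustration and is not part of the original answer.

```python
import pandas as pd

# Small sample in the same shape as the data above
df = pd.DataFrame({'col1': [[{'overall_prop': '0.001'}, {'overall_prop': '0.002'}],
                            [{'overall_prop': '0.004'}, {'overall_prop': '0.005'}]]})

# Extract the string value from the first dictionary of each list
df['overall_prop'] = df['col1'].apply(lambda x: x[0]['overall_prop'])

# Hypothetical extra column: convert the strings to floats;
# errors='coerce' turns anything unparsable into NaN
df['overall_prop_num'] = pd.to_numeric(df['overall_prop'], errors='coerce')
print(df.dtypes)
```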

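EDIT Ver 2: col1 is a list of dictionaries, and some lists contain an empty dictionary

If the first dictionary in a row may lack the overall_prop key, you can use this instead.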

df = pd.DataFrame({'col1':[[{'overall_prop': '0.001'},
                            {'overall_prop': '0.002'},
                            {'overall_prop': '0.003'}],
                           [{}],
                           [{'incorrect_key': '0.004'},
                            {'overall_prop': '0.005'},
                            {'overall_prop': '0.006'}],
                           [{'overall_prop': '0.007'},
                            {'overall_prop': '0.008'},
                            {'overall_prop': '0.009'}],
                           [{'overall_prop': '0.010'},
                            {'overall_prop': '0.011'},
                            {'overall_prop': '0.012'}],
                           [{'overall_prop': '0.013'},
                            {'overall_prop': '0.014'},
                            {'overall_prop': '0.015'}]]})

import numpy as np

df['overall_prop'] = df['col1'].apply(lambda x: x[0]['overall_prop'] if 'overall_prop' in x[0] else np.nan)

The output will be:

                                                col1 overall_prop
0  [{'overall_prop': '0.001'}, {'overall_prop': '...        0.001
1                                               [{}]          NaN
2  [{'incorrect_key': '0.004'}, {'overall_prop': ...          NaN
3  [{'overall_prop': '0.007'}, {'overall_prop': '...        0.007
4  [{'overall_prop': '0.010'}, {'overall_prop': '...        0.010
5  [{'overall_prop': '0.013'}, {'overall_prop': '...        0.013
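
The same check can probably be written more compactly with dict.get, which returns a default when the key is missing. This is only a sketch of an alternative spelling and, like Ver 2, it still assumes every row is a non-empty list whose first element is a dictionary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [[{'overall_prop': '0.001'}, {'overall_prop': '0.002'}],
                            [{}],
                            [{'incorrect_key': '0.004'}, {'overall_prop': '0.005'}]]})

# dict.get falls back to NaN when the first dictionary has no 'overall_prop' key
df['overall_prop'] = df['col1'].apply(lambda x: x[0].get('overall_prop', np.nan))
print(df)
```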

EDIT Ver 3: col1 has different types of data

df = pd.DataFrame({'col1':[[{'overall_prop': '0.001'},
                            {'overall_prop': '0.002'},
                            {'overall_prop': '0.003'}],
                           [{}],
                           {'bad':'0.999'},
                           {},
                           'just a bad string',
                           250,
                           35.25,
                           True,
                           False,
                           (10,20),
                           [{'incorrect_key': '0.004'},
                            {'overall_prop': '0.005'},
                            {'overall_prop': '0.006'}],
                           [{'overall_prop': '0.007'},
                            {'overall_prop': '0.008'},
                            {'overall_prop': '0.009'}],
                           [{'overall_prop': '0.010'},
                            {'overall_prop': '0.011'},
                            {'overall_prop': '0.012'}],
                           [{'overall_prop': '0.013'},
                            {'overall_prop': '0.014'},
                            {'overall_prop': '0.015'}]]})

def prop_check(x):
    # Require a non-empty list whose first element is a dict containing the key
    if isinstance(x, list) and x and isinstance(x[0], dict) and 'overall_prop' in x[0]:
        return x[0]['overall_prop']
    return np.nan

df['overall_prop'] = df['col1'].apply(prop_check)
print(df)

The output will be:

                                                 col1 overall_prop
0   [{'overall_prop': '0.001'}, {'overall_prop': '...        0.001
1                                                [{}]          NaN
2                                    {'bad': '0.999'}          NaN
3                                                  {}          NaN
4                                   just a bad string          NaN
5                                                 250          NaN
6                                               35.25          NaN
7                                                True          NaN
8                                               False          NaN
9                                            (10, 20)          NaN
10  [{'incorrect_key': '0.004'}, {'overall_prop': ...          NaN
11  [{'overall_prop': '0.007'}, {'overall_prop': '...        0.007
12  [{'overall_prop': '0.010'}, {'overall_prop': '...        0.010
13  [{'overall_prop': '0.013'}, {'overall_prop': '...        0.013
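
For the situation described in the question, where a row holds a bare {} rather than a list, a sketch along the same lines is shown below. The helper name first_overall_prop and its key parameter are invented for this example; it also covers empty lists and NaN rows, which the sample data above does not contain.

```python
import numpy as np
import pandas as pd

def first_overall_prop(x, key='overall_prop'):
    """Return x[0][key] if x is a non-empty list whose first item is a dict, else NaN."""
    if isinstance(x, list) and x and isinstance(x[0], dict):
        return x[0].get(key, np.nan)
    return np.nan

# dtype=object keeps the mixed lists, dicts and NaN exactly as given
df = pd.DataFrame({'col1': pd.Series([
    [{'overall_prop': '0.812'}, {'overall_prop': '0.125'}],
    {},                            # bare empty dict, as in the question
    [],                            # empty list
    np.nan,                        # missing value
    [{'incorrect_key': '0.004'}],  # first dict lacks the key
], dtype=object)})

df['overall_prop'] = df['col1'].apply(first_overall_prop)
print(df)
```

Only the first row yields a value here; every other row falls through to NaN instead of raising.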

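【Comments】:

  • As I said in the description, the problem is that an empty dictionary object in one of the rows breaks everything.
  • I've addressed that as well. See the updated code at the bottom of my answer.
  • Thanks for the updated solution, but it still complains: {TypeError}'float' object is not subscriptable
  • I think the reason is that in your example the empty dict is inside a list. In my case, the empty dict is not inside a list.
  • Got it. I now explicitly check whether the value is a list, whether the first element of the list is a dictionary, and whether that dictionary has the key overall_prop. See if that solves it.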