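Extract values from column of dictionaries using pandas: answers

【Question title】: Extract values from column of dictionaries using pandas
【Posted】: 2019-05-18 12:02:20
【Question description】:

I am trying to extract the names from the dictionaries below: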

df = df[[x.get('Name') for x in df['Contact']]]

Here is what my DataFrame looks like:

import pandas as pd

data = [{'emp_id': 101,
  'name': {'Name': 'Kevin',
   'attributes': {'type': 'Contact',
    'url': '/services/data/v38.0/sobjects/Contact/00985300000bt4HEG4'}}},
 {'emp_id': 102,
  'name': {'Name': 'Scott',
   'attributes': {'type': 'Contact',
    'url': '/services/data/v38.0/sobjects/Contact/00985300000yr5UTR9'}}}]

df = pd.DataFrame(data)
df

   emp_id                                               name
0     101  {'Name': 'Kevin', 'attributes': {'type': 'Cont...
1     102  {'Name': 'Scott', 'attributes': {'type': 'Cont...

I get an error:

AttributeError: 'NoneType' object has no attribute 'get'

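【Question comments】:

What is your question?
@njzk2, I am trying to extract the value corresponding to 'Name'.
If this is a DataFrame, please provide a minimal reproducible example.
@coldspeed, sorry, I have updated my original post with what the DataFrame looks like.
What is the result of your current code?

Tags: python python-3.x pandas dictionary dataframe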

【Solution 1】:

If there are no NaNs, use json_normalize.

pd.io.json.json_normalize(df.name.tolist())['Name']

0    Kevin
1    Scott
Name: Name, dtype: object
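
For readers on pandas 1.0 or later, the same function is exposed at the top level as pd.json_normalize, and the pd.io.json path is deprecated. The following is a minimal sketch of the no-NaN case under that assumption; the sample rows are a shortened stand-in for the question's data, not the asker's real records.

```python
import pandas as pd

# Shortened stand-in for the question's data (the 'url' field is omitted).
data = [
    {"emp_id": 101, "name": {"Name": "Kevin", "attributes": {"type": "Contact"}}},
    {"emp_id": 102, "name": {"Name": "Scott", "attributes": {"type": "Contact"}}},
]
df = pd.DataFrame(data)

# json_normalize flattens each dict into columns; nested keys are joined
# with a dot, e.g. "attributes.type".
flat = pd.json_normalize(df["name"].tolist())
names = flat["Name"]
print(names.tolist())
```

Because the list passed in has one dict per row and no gaps, the resulting default index lines up with df's default index.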

If there are NaNs, you need to drop them first. However, it is easy to preserve the index.

df

   emp_id                                               name
0   101.0  {'Name': 'Kevin', 'attributes': {'type': 'Cont...
1   102.0                                                NaN
2   103.0  {'Name': 'Scott', 'attributes': {'type': 'Cont...

idx = df.index[df.name.notna()]
names = pd.io.json.json_normalize(df.name.dropna().tolist())['Name']  
names.index = idx

names

0    Kevin
2    Scott
Name: Name, dtype: object
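
The NaN-handling steps above, including writing the result back to the DataFrame, can be sketched as follows. This assumes pandas 1.0 or later (pd.json_normalize); the three-row frame and the column name new_col are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a missing dictionary in the middle row.
df = pd.DataFrame({
    "emp_id": [101, 102, 103],
    "name": [{"Name": "Kevin"}, np.nan, {"Name": "Scott"}],
})

# notna() is False for both NaN and None, so either kind of gap is skipped.
mask = df["name"].notna()

# Normalize only the rows that hold a dict, then restore their row labels.
names = pd.json_normalize(df.loc[mask, "name"].tolist())["Name"]
names.index = df.index[mask]

# Assignment aligns on the index, so the skipped row is filled with NaN.
df["new_col"] = names
print(df[["emp_id", "new_col"]])
```

Setting the index on names before assigning is what avoids a length mismatch: the Series is shorter than the DataFrame, but its labels tell pandas which rows the values belong to.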

【Discussion】:

Thanks for the reply. The dictionary format in my source looks different from the edit that was made. My data looks like {'attributes': {'type': 'Contact', 'url': '/services/data/v38.0/sobjects/Contact/00985300000bt4HEG4'}, 'Name': 'Kevin'} .. When trying the approach above, I get the error "AttributeError: 'NoneType' object has no attribute 'values'"
@scottmartin I hope you tried the solution after the "If there are NaNs..." part?
If I may ask for a bit more help: I am trying to add this to the existing DataFrame by doing the following: idx = df.index[df.name.notna()] df['names'] = pd.io.json.json_normalize(name().tolist())['Name'] But when I try df['names'].index = idx, I get the error ValueError: Length mismatch
@scottmartin You just need to do idx = df.index[df.name.notna()]; names = pd.io.json.json_normalize(df.name.dropna().tolist())['Name']; names.index = idx; df['new_col'] = names.

【Solution 2】:

Use apply, and use tolist to turn the result into a list:

print(df['name'].apply(lambda x: x.get('Name')).tolist())

Output:

['Kevin', 'Scott']

If you want a Series rather than a list, use:

print(df['name'].apply(lambda x: x.get('Name')))

Output:

0    Kevin
1    Scott
Name: name, dtype: object

Update:

print(df['name'].apply(lambda x: x['attributes'].get('Name')).tolist())
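
The AttributeError reported in the question suggests that some rows hold None rather than a dictionary, which is an assumption about the asker's data rather than something the sample shows. Under that assumption, a hedged variant of the apply approach checks the type before calling .get; the helper name get_name and the sample frame are hypothetical.

```python
import pandas as pd

# Illustrative frame: the middle row holds None instead of a dict.
df = pd.DataFrame({
    "emp_id": [101, 102, 103],
    "name": [{"Name": "Kevin"}, None, {"Name": "Scott"}],
})

def get_name(value):
    """Return value['Name'] for dicts, and None for anything else."""
    return value.get("Name") if isinstance(value, dict) else None

# 'Name' sits at the top level of each dict, so no lookup through
# 'attributes' is needed.
names = df["name"].apply(get_name)
print(names)
```

The isinstance check also covers NaN, which is a float and would fail on .get in the same way as None.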

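【Discussion】:

Thanks for the reply. The dictionary format in my source looks different from the edit that was made. My data looks like {'attributes': {'type': 'Contact', 'url': '/services/data/v38.0/sobjects/Contact/00985300000bt4HEG4'}, 'Name': 'Kevin'}
I am still getting the error "AttributeError: 'NoneType' object has no attribute 'get'"
@scottmartin How about now?
It returns "TypeError: 'NoneType' object is not subscriptable".. Could you please help..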

【Solution 3】:

Try the following line:

names = [name.get('Name') for name in df['name']]

【Discussion】:

Thanks for the reply, it raises an error: AttributeError: 'NoneType' object has no attribute 'get'
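
If the column does contain None entries, as the error here suggests, the list comprehension can be guarded in the same spirit. This is a sketch on made-up rows, not the asker's actual data.

```python
import pandas as pd

# Made-up rows; the second entry is None to mimic a missing contact.
df = pd.DataFrame({
    "emp_id": [101, 102, 103],
    "name": [{"Name": "Kevin"}, None, {"Name": "Scott"}],
})

# Keep one output element per row so the list stays aligned with df.
names = [d.get("Name") if isinstance(d, dict) else None for d in df["name"]]
print(names)  # ['Kevin', None, 'Scott']
```

Since the list has exactly one element per row, it can be assigned directly as a new column, for example df['names'] = names.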