Pandas KeyError: “['value'] not in index” (answer)

【Question title】: Pandas KeyError: “['value'] not in index”
【Posted】: 2018-04-30 07:19:05
【Question】:

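I'm having trouble with indexing in a Pandas dataframe. What I'm trying to do is load data from a JSON file, create a Pandas dataframe, select specific fields from that dataframe, and send them to my database.

Here is a link to the contents of the JSON file so you can see which fields actually exist: https://pastebin.com/Bzatkg4L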

import pandas as pd
from pandas.io import sql
import MySQLdb
from sqlalchemy import create_engine

# Open and read the text file where all the Tweets are
with open('US_tweets.json') as f:
    tweets = f.readlines()

# Convert the list of Tweets into a structured dataframe
df = pd.DataFrame(tweets)
# Attributes needed should be here
df = df[['created_at', 'screen_name', 'id', 'country_code', 'full_name', 'lang', 'text']]

# To create connection and write table into MySQL
engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                       .format(user="blah",
                               pw="blah",
                               db="blah"))

df.to_sql(con=engine, name='US_tweets_Table', if_exists='replace', flavor='mysql')

Thanks for your help!

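【Question discussion】:

Is your original dataframe constructed correctly? Specifically, which columns actually exist in that dataframe?
@Evan I think you may be right. How would I create columns for the dataframe? Correct me if I'm wrong, but it sounds like you're saying I should create columns in the dataframe that correspond to the attributes in the JSON file. Once those columns exist, can I add the attributes to them?
The error occurs because the columns you're trying to reference are not in the index, i.e. they don't exist in the first df you created. They exist in the objects in the JSON file, but pandas doesn't create a column for every nested object in the JSON, only for the top-level keys.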
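A minimal sketch of the difference, using two hypothetical, heavily trimmed tweet lines in place of the real file: building a DataFrame from the raw strings returned by readlines() gives a single column labelled 0, so none of the tweet fields can be selected, while decoding each line with json.loads first gives one column per top-level key.

```python
import json

import pandas as pd

# Two hypothetical, heavily trimmed tweets in JSON Lines format,
# as f.readlines() would return them (each line still ends with "\n")
lines = [
    '{"created_at": "Mon Apr 30 07:19:05 +0000 2018", "id": 1, "lang": "en", "text": "hi"}\n',
    '{"created_at": "Mon Apr 30 07:20:00 +0000 2018", "id": 2, "lang": "es", "text": "hola"}\n',
]

# What the question does: every raw string lands in one column labelled 0
raw_df = pd.DataFrame(lines)
print(list(raw_df.columns))  # [0]

# Selecting tweet fields from raw_df therefore fails
try:
    raw_df[["created_at", "text"]]
except KeyError as err:
    print("KeyError:", err)

# Decoding each line first gives one column per top-level key
df = pd.DataFrame([json.loads(line) for line in lines])
print(list(df.columns))  # ['created_at', 'id', 'lang', 'text']
```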

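Tags: python json pandas

【Solution 1】:

Pandas doesn't map every object in the JSON file to its own column in the dataframe; only the top-level keys become columns. Your sample file yields 24 columns: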

import pandas as pd

with open('tweets.json') as f:
    df = pd.read_json(f, lines=True)
print(df.columns)

Index(['contributors', 'coordinates', 'created_at', 'entities',
       'favorite_count', 'favorited', 'geo', 'id', 'id_str',
       'in_reply_to_screen_name', 'in_reply_to_status_id',
       'in_reply_to_status_id_str', 'in_reply_to_user_id',
       'in_reply_to_user_id_str', 'is_quote_status', 'lang', 'metadata',
       'place', 'retweet_count', 'retweeted', 'source', 'text', 'truncated',
       'user'],
      dtype='object')
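To see what read_json(..., lines=True) does with nested objects, here is a sketch on a hypothetical two-line sample wrapped in io.StringIO (real tweets have many more keys). Top-level keys become columns, while user and place stay as Python dicts (or null) inside single cells:

```python
from io import StringIO

import pandas as pd

# Hypothetical two-tweet sample in JSON Lines format (one object per line)
sample = (
    '{"id": 1, "text": "hi", "user": {"screen_name": "alice"}, '
    '"place": {"full_name": "Austin, TX", "country_code": "US"}}\n'
    '{"id": 2, "text": "hola", "user": {"screen_name": "bob"}, "place": null}\n'
)

# lines=True parses each line as a separate record
df = pd.read_json(StringIO(sample), lines=True)
print(list(df.columns))  # ['id', 'text', 'user', 'place']

# Nested objects are not expanded into columns; the cell holds a plain dict
print(type(df.loc[0, "user"]))  # <class 'dict'>
print(df.loc[0, "user"]["screen_name"])  # alice
```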

To dig further into the nested JSON data I found the following approach, though I suspect a more elegant one exists: How do I access embedded json objects in a Pandas DataFrame?
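One more compact option, not used in this answer, is pd.json_normalize (a top-level function since pandas 1.0; older releases have pandas.io.json.json_normalize). It flattens nested dicts into dotted column names such as user.screen_name. A sketch on hypothetical, already-decoded records (in practice they would come from json.loads on each line of the file):

```python
import pandas as pd

# Hypothetical records shaped like trimmed tweets (already decoded from JSON)
records = [
    {"id": 1, "user": {"screen_name": "alice"},
     "place": {"full_name": "Austin, TX", "country_code": "US"}},
    {"id": 2, "user": {"screen_name": "bob"},
     "place": {"full_name": "Toronto, Ontario", "country_code": "CA"}},
]

# Nested keys become dotted column names, e.g. 'user.screen_name'
flat = pd.json_normalize(records)
print(sorted(flat.columns))

# Select the nested fields of interest and give them plain names
subset = flat[["id", "user.screen_name", "place.full_name"]].rename(
    columns={"user.screen_name": "screen_name", "place.full_name": "full_name"}
)
print(subset)
```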

For example, df['entities'].apply(pd.Series)['urls'].apply(pd.Series)[0].apply(pd.Series)['indices'][0][0] returns 117, the start offset of the first URL in the first tweet's text.
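The chained apply(pd.Series) calls expand every nesting level into a new DataFrame just to read one number. For a single value, indexing into the dict stored in the cell walks the same path directly. A sketch with a hypothetical, trimmed entities object (the offsets are invented, not the 117 from the real file):

```python
import pandas as pd

# Hypothetical, trimmed 'entities' value for one tweet; the offsets are invented
entities = {"urls": [{"url": "https://t.co/example", "indices": [10, 33]}]}
df = pd.DataFrame({"id": [1], "entities": [entities]})

# Row 0 -> 'entities' dict -> 'urls' list -> first URL object -> 'indices' -> first offset
start = df.at[0, "entities"]["urls"][0]["indices"][0]
print(start)  # 10
```

In the original chain, the first [0] selects the first URL object (a column label after the list is expanded), the next [0] picks row 0, and the last [0] picks the first offset.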

To access full_name and copy it into df, try: df['full_name'] = df['place'].apply(pd.Series)['full_name']. For the sample tweet, the new column shows 0    Austin, TX.
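Tweets without location information have place set to null, so a column extracted from place needs to tolerate missing values. A sketch using a hypothetical helper, get_key, on a hypothetical two-row place column:

```python
import pandas as pd


def get_key(obj, key):
    """Return obj[key] if obj is a dict, else None (hypothetical helper)."""
    if isinstance(obj, dict):
        return obj.get(key)
    return None


# Hypothetical 'place' column: one geotagged tweet, one without a place
df = pd.DataFrame({"place": [{"full_name": "Austin, TX", "country_code": "US"}, None]})

# args= passes the key as the second positional argument to get_key
df["full_name"] = df["place"].apply(get_key, args=("full_name",))
df["country_code"] = df["place"].apply(get_key, args=("country_code",))
print(df[["full_name", "country_code"]])
```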

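【Discussion】:

Hey Evan, that's a really nice solution, but when I try to access other attributes (such as 'text' and 'id') the same way, I get an error. Why did you apply df['place'] to get 'full_name'? I tried it without 'place' and got the same error as with the other attributes.
UPDATE OK, the following attributes can be accessed easily with print(df['attribute_here']): text, created_at, id and lang. Only screen_name and country_code come up empty.
UPDATE 2 OK, so I figured out how to print screen_name. I didn't understand why you went through 'place' to get 'full_name' until I looked at the JSON file. The 'user' attribute contains 'screen_name', which is why the same approach works there. Great, I'll try importing into the database now. Thanks Evan!
Glad you figured it out. I'm not very familiar with JSON, so this was a good learning opportunity for me as well.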
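Putting the pieces together, here is a sketch of the question's original goal: take the top-level fields directly, pull screen_name from user and full_name/country_code from place, then write the result with to_sql. Assumptions: the two tweets and the place_field helper are hypothetical, and an in-memory SQLite connection stands in for the MySQL engine so the sketch runs without a server. With the SQLAlchemy engine from the question only con changes; the flavor argument is left out because current pandas releases no longer accept it.

```python
import sqlite3
from io import StringIO

import pandas as pd

# Hypothetical two-tweet sample in JSON Lines format
sample = (
    '{"created_at": "Mon Apr 30 07:19:05 +0000 2018", "id": 1, "lang": "en", "text": "hi", '
    '"user": {"screen_name": "alice"}, '
    '"place": {"full_name": "Austin, TX", "country_code": "US"}}\n'
    '{"created_at": "Mon Apr 30 07:20:00 +0000 2018", "id": 2, "lang": "es", "text": "hola", '
    '"user": {"screen_name": "bob"}, "place": null}\n'
)

# convert_dates=False keeps created_at as the original string
tweets = pd.read_json(StringIO(sample), lines=True, convert_dates=False)


def place_field(place, key):
    """Return place[key], or None when the tweet has no place (hypothetical helper)."""
    return place.get(key) if isinstance(place, dict) else None


# Top-level fields can be selected directly
out = tweets[["created_at", "id", "lang", "text"]].copy()

# Nested fields: screen_name lives under 'user', location fields under 'place'
out["screen_name"] = tweets["user"].map(lambda user: user["screen_name"])
out["full_name"] = tweets["place"].map(lambda p: place_field(p, "full_name"))
out["country_code"] = tweets["place"].map(lambda p: place_field(p, "country_code"))

# Same column order as the question's selection
out = out[["created_at", "screen_name", "id", "country_code", "full_name", "lang", "text"]]

# In-memory SQLite stands in for the MySQL engine (assumption, so this runs anywhere)
conn = sqlite3.connect(":memory:")
out.to_sql(name="US_tweets_Table", con=conn, if_exists="replace", index=False)
print(pd.read_sql("SELECT screen_name, full_name FROM US_tweets_Table", conn))
```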