Python: list of dictionaries to a pandas DataFrame (Twitter API)

【Question title】: Python: list of dictionary to pd dataframe (Twitter API)
【Posted】: 2021-09-06 15:00:21
【Question】:

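I collected tweet data using the Twitter API academic track. One of the columns holds, for each tweet, a list of dictionaries with the unique IDs of the tweets it references.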

No	Referenced_tweets
1	[{'type': 'replied_to', 'id': '1212086431889313792'}]
2	[{'type': 'quoted', 'id': '1345063319540002817'}, {'type': 'replied_to', 'id': '1345066320761655296'}]
3	[{'type': 'retweeted', 'id': '1344718164974833667'}, {'type': 'replied_to', 'id': '1211798476062908422'}]

I would like to transform the data into the following shape.

No	replied_to	quoted	retweeted
1	1212086431889313792
2	1345066320761655296	1345063319540002817
3	1211798476062908422		1344718164974833667

If I use "json_normalize", I get an error message (TypeError: string indices must be integers). How can I do this in Python?

【Comments】:

Can you provide the raw JSON response for this example?

Tags: python list api dictionary twitter

【Solution 1】:

Here is one way to do it (let me know if you need the code explained):

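import pandas as pd

# df is the DataFrame from the question, with a 'Referenced_tweets' column.

def f(l):
    # Start with every expected type blank so each row ends up with the same keys.
    a = {'replied_to': '', 'quoted': '', 'retweeted': ''}
    x = pd.DataFrame(l)                # one row per referenced tweet, columns 'type' and 'id'
    x = x.set_index('type')
    x = x.T                            # a single 'id' row with one column per type
    x = x.reset_index(drop=True)
    x = x.to_dict(orient='records')    # list holding one {type: id} dictionary
    a.update(x[0])
    return a

df['Referenced_tweets_2'] = [f(k) for k in df['Referenced_tweets']]

result = pd.DataFrame.from_dict(df['Referenced_tweets_2'].to_list())

print(result)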

Output:

            replied_to               quoted            retweeted
0  1212086431889313792
1  1345066320761655296  1345063319540002817
2  1211798476062908422                       1344718164974833667
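
Since the answer offers an explanation on request, here is a rough walk-through of what f does to a single cell, using row 2 from the question. The names cell, record and row are only illustrative and do not appear in the answer; this is a sketch of the intermediate steps, not a replacement for the code above.

```python
import pandas as pd

# One cell of the Referenced_tweets column (row 2 in the question).
cell = [
    {"type": "quoted", "id": "1345063319540002817"},
    {"type": "replied_to", "id": "1345066320761655296"},
]

x = pd.DataFrame(cell)        # two rows, columns 'type' and 'id'
x = x.set_index("type").T     # one row labelled 'id', columns 'quoted' and 'replied_to'

# to_dict(orient="records") yields one dictionary per row, so element 0 is {type: id}.
record = x.reset_index(drop=True).to_dict(orient="records")[0]

# Merge into the blank template so that missing types stay as empty strings.
row = {"replied_to": "", "quoted": "", "retweeted": ""}
row.update(record)
print(row)
```

The blank template is what keeps every row at the same three keys, so the final DataFrame always has the replied_to, quoted and retweeted columns even when a tweet has only one kind of reference.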

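【Discussion】:

Thank you very much. Without your help this would have been much harder to solve than I expected. What if I have empty rows? In that case I get the error message "DataFrame constructor not properly called!"
Just drop the rows that contain NULL, since doing so does not affect the result: df=df[pd.notna(df.Referenced_tweets)]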
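
If the empty rows need to stay in the table, for example to keep the No column aligned with the rest of the data, one possible variation is to build each row with plain dictionary operations and leave missing cells blank. The sketch below assumes that empty cells are None or NaN and that every other cell is already a Python list of dictionaries. The helper name to_row is hypothetical, and the sample frame reuses IDs from the question with one empty row added for illustration.

```python
import pandas as pd

# Sample data: IDs taken from the question, plus one empty cell (assumed to be None).
df = pd.DataFrame({
    "No": [1, 2, 3],
    "Referenced_tweets": [
        [{"type": "replied_to", "id": "1212086431889313792"}],
        None,  # a tweet that references nothing
        [{"type": "retweeted", "id": "1344718164974833667"},
         {"type": "replied_to", "id": "1211798476062908422"}],
    ],
})

def to_row(cell):
    """Hypothetical helper: map one cell to a dict with the three fixed keys."""
    row = {"replied_to": "", "quoted": "", "retweeted": ""}
    if isinstance(cell, list):  # None/NaN cells are skipped and stay blank
        for ref in cell:
            row[ref["type"]] = ref["id"]
    return row

# Reuse df's index so the new columns line up with the original rows.
result = pd.DataFrame([to_row(c) for c in df["Referenced_tweets"]], index=df.index)
result.insert(0, "No", df["No"])
print(result)
```

Unlike the notna filter, this keeps one output row per input row; which behavior is preferable depends on whether tweets without references matter for the later analysis.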