How to flatten nested JSON or lists inside a DataFrame? Answers

【Question title】: How to turn nested json or list inside dataframe?
【Posted】: 2021-01-09 04:48:38
【Question】:

I have a set of responses from Elasticsearch aggregation queries. A response looks like this:

'aggregations': {'group': {'doc_count_error_upper_bound': 0,
   'sum_other_doc_count': 0,
   'buckets': [{'key': 1365,
     'doc_count': 518,
     'group_docs': {'hits': {'total': {'value': 518, 'relation': 'eq'},
       'max_score': None,
       'hits': [{'_index': 'mdata',
         '_type': 'ter',
         '_id': 'n1X04XYBlaUrIoJskq9q',
         '_score': None,
         '_source': {'hId': 1365,
          'Id': 5348,
          'type': 'data'},
         'sort': [1610108665027]}]}}},
    {'key': 1372,
     'doc_count': 517,
     'group_docs': {'hits': {'total': {'value': 517, 'relation': 'eq'},
       'max_score': None,
       'hits': [{'_index': 'mdata',
         '_type': 'ter',
         '_id': 'qFUw4nYBlaUrIoJs6rdz',
         '_score': None,
         '_source': {'hId': 1372,
          'Id': 5348,
          'type': 'data'},
         'sort': [1610112617581]}]}}},
    {'key': 1392,
     'doc_count': 491,
     'group_docs': {'hits': {'total': {'value': 491, 'relation': 'eq'},
       'max_score': None,
       'hits': [{'_index': 'mdata',
         '_type': 'ter',
         '_id': '8VXR4XYBlaUrIoJsYKrS',
         '_score': None,
         '_source': {'hId': 1392,
          'Id': 5348,
          'type': 'data'},
         'sort': [1610106358393]}]}}}]},
  'bucketcount': {'count': 3,
   'min': 491.0,
   'max': 518.0,
   'avg': 508.6666666666667,
   'sum': 1526.0}}}

So I tried to get a DataFrame using

df= pd.json_normalize(result['aggregations']['group']['buckets'])

key doc_count   group_docs.hits.total.value group_docs.hits.total.relation  group_docs.hits.max_score   group_docs.hits.hits
0   1365    518 518 eq  None    [{'_index': 'mdata', '_type': 'ter', '_...
1   1372    517 517 eq  None    [{'_index': 'mdata', '_type': 'ter', '_...
2   1392    491 491 eq  None    [{'_index': 'mdata', '_type': 'ter', '_...

I also tried the methods from a linked answer.

Using forreal = pd.DataFrame(result.get('group_docs.hits.hits')) did not work for me; it returns an empty DataFrame,

and

works_data = pd.json_normalize(df, record_path='group_docs.hits.hits') raises the error "TypeError: string indices must be integers".

One slow approach I tried is

df= pd.json_normalize(result['aggregations']['group']['buckets'])
df_1 = (df.hits[0]['hits'])

and then appending the DataFrames. This is slow for me because I have many DataFrames to concatenate or append. Is there a better way?

【Question comments】:

Tags: python pandas dataframe elasticsearch

【Solution 1】:

You did not say exactly what you are trying to achieve. The following fully expands the sample JSON in your question:

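import pandas as pd

# `result` is the Elasticsearch response dict from the question
pd.json_normalize(
    pd.json_normalize(result['aggregations']['group']['buckets'])
    .explode("group_docs.hits.hits")    # one row per hit
    .to_dict(orient="records")          # back to a list of dicts
).explode("group_docs.hits.hits.sort")  # one row per sort value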
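As a possible alternative, `record_path` can be applied to the raw list of bucket dicts instead of the already-flattened DataFrame. The TypeError in the question probably arises because `json_normalize` was handed a DataFrame rather than a list of dicts. The sketch below is not part of the original answer; the `buckets` list is a trimmed-down stand-in for `result['aggregations']['group']['buckets']`, and the `sort` list is left unexpanded.

```python
import pandas as pd

# Assumed sample: a shortened copy of the buckets from the question.
buckets = [
    {
        "key": 1365,
        "doc_count": 518,
        "group_docs": {"hits": {"hits": [
            {"_id": "n1X04XYBlaUrIoJskq9q",
             "_source": {"hId": 1365, "Id": 5348, "type": "data"},
             "sort": [1610108665027]},
        ]}},
    },
    {
        "key": 1372,
        "doc_count": 517,
        "group_docs": {"hits": {"hits": [
            {"_id": "qFUw4nYBlaUrIoJs6rdz",
             "_source": {"hId": 1372, "Id": 5348, "type": "data"},
             "sort": [1610112617581]},
        ]}},
    },
]

# record_path walks down to the inner list of hits (one row per hit);
# meta copies bucket-level fields onto each of those rows.
hits = pd.json_normalize(
    buckets,
    record_path=["group_docs", "hits", "hits"],
    meta=["key", "doc_count"],
)
print(hits)
```

With this form the bucket-level columns keep their short names (`key`, `doc_count`) and the hit fields are not prefixed with `group_docs.hits.hits`.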

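【Comments】:

Sorry, I want to flatten the whole JSON into one DataFrame.
Thanks! You solved my problem. Could you explain how it works? I struggled with this for an hour and could not find it here: pandas.pydata.org/pandas-docs/stable/reference/api/…
There are 3 nested lists: 1. aggregations.group.buckets 2. aggregations.group.buckets.group_docs.hits.hits 3. aggregations.group.buckets.group_docs.hits.hits.sort. List 1 is handled directly by json_normalize(). List 2 is handled by calling explode() on the result of 1. For list 3, the result of 1 and 2 is converted back to JSON (a list of dicts), json_normalize() is applied again to expand the dicts, and a final explode() handles sort.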
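The steps described in the comment above can be written out one at a time to see what each call does. This is a minimal sketch, assuming a trimmed-down copy of the response from the question; the intermediate names `step1`, `step2`, `step3` and `flat` are only for illustration.

```python
import pandas as pd

# Assumed sample: a shortened stand-in for the Elasticsearch response.
result = {
    "aggregations": {"group": {"buckets": [
        {
            "key": 1365,
            "doc_count": 518,
            "group_docs": {"hits": {"hits": [
                {"_id": "n1X04XYBlaUrIoJskq9q",
                 "_source": {"hId": 1365, "Id": 5348, "type": "data"},
                 "sort": [1610108665027]},
            ]}},
        },
        {
            "key": 1372,
            "doc_count": 517,
            "group_docs": {"hits": {"hits": [
                {"_id": "qFUw4nYBlaUrIoJs6rdz",
                 "_source": {"hId": 1372, "Id": 5348, "type": "data"},
                 "sort": [1610112617581]},
            ]}},
        },
    ]}}
}

# Step 1: flatten the bucket dicts. Nested dicts become dotted columns,
# but the inner list of hits stays as a list inside a single column.
step1 = pd.json_normalize(result["aggregations"]["group"]["buckets"])

# Step 2: one row per hit; each cell in the column is now a single dict.
step2 = step1.explode("group_docs.hits.hits")

# Step 3: convert back to a list of dicts and normalize again, so the
# keys of each hit dict become their own dotted columns.
step3 = pd.json_normalize(step2.to_dict(orient="records"))

# Step 4: the sort column still holds lists, so explode it as well.
flat = step3.explode("group_docs.hits.hits.sort")

print(flat.columns.tolist())
```

Printing each intermediate frame shows the nesting being peeled away one level per call.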