【Question title】: Convert CSV Data to Nested JSON
【Posted】: 2022-01-07 00:44:00
【Question description】:

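My task is to use Python to convert data from a CSV file into a nested JSON file for web use. I tried the Python code in this article. In the desired output, each member_id appears only once in the JSON file, and each tag_name appears only once under a given member_id. The problem is that when I group by member_id alone, tag_name 'm1' shows up multiple times under 'abc123'. If I group by both member_id and tag_name, 'abc123' appears twice, once for tag 'm1' and once for 'm2'. I have been googling for a while, but most solutions only handle a single level of nesting (not sure I am using the right term). Please let me know if there is any way to do this.

Sample code: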

import json
import pandas as pd
df = pd.read_csv('../detail.csv', sep=',', header=0
                 , index_col=False
                 , dtype = {'member_id':str,'tag_name':str,'detail_name':str,'detail_value':str} )
group = df.groupby(['member_id','tag_name'])

finalList, finalDict = [], {}
for key, value in group:
    dictionary, dictionary1, dictList, dictList1 = {}, {}, [], []
    j = group.get_group(key).reset_index(drop=True)
    dictionary['member_id'] = j.at[0,'member_id']
    dictionary1['tag_name'] = j.at[0,'tag_name']
    
    for i in j.index:
        anotherDict = {}
        anotherDict['detail_name'] = j.at[i,'detail_name']
        anotherDict['detail_value'] = j.at[i,'detail_value']
        dictList1.append(anotherDict.copy())
        dictionary1['detail'] = dictList1 
     
    dictList.append(dictionary1)
    dictionary['tag'] = dictList
    finalList.append(dictionary)

json.dumps(finalList,ensure_ascii = False)

detail.csv:

member_id, tag_name, detail_name, detail_value
-------------------------------------------------------
abc123, m1, Service_A, 20
abc123, m1, Service_B, 20
abc123, m2, Service_C, 10
xyz456, m3, Service A, 5
xyz456, m3, Service A, 10

Desired output JSON:

{   "member_id": "abc123",
    "tag":[ {"tag_name": "m1",
            "detail":[{ "detail_name": "Service_A",
                        "detail_value": "20"},
                    {   "detail_name": "Service_B",
                        "detail_value": "20"}]},
            {"tag_name": "m2",
            "detail":[{ "detail_name": "Service_C",
                        "detail_value": "10"}]}]},
{   "member_id": "xyz456",
    "tag":[{"tag_name": "m3",
            "detail":[{ "detail_name": "Service_A",
                        "detail_value": "5"},
                      { "detail_name": "Service_A",
                        "detail_value": "10"}]}]}

【Comments】:

  • Please share your current code.
  • @balderman Added.

Tags: python json python-3.x


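【Solution 1】:

I am not aware of a pandas function that does this directly. You are also introducing keys (tag, detail) that are not part of the original DataFrame, so a generic solution seems hard to implement.

However, if you have no more columns than the ones in the question, you can iterate over the DataFrame, grouping one column at a time: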

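import json

# df is the DataFrame from the question; for mean() below to be meaningful,
# detail_value has to be read as a number rather than as str
result = []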

for member_id, member_df in df.groupby('member_id'):
    member_dict = {'member_id': member_id}
    member_dict['tag'] = []
    for tag_name, tag_df in member_df.groupby('tag_name'):
        tag_dict = {'tag_name': tag_name}
        tag_dict['detail'] = []
        for detail_name, detail_df in tag_df.groupby('detail_name'):
            detail_dict = {'detail_name': detail_name}
            detail_dict['detail_value'] = detail_df.detail_value.mean() # should be only one value, taking 'mean' just in case
            tag_dict['detail'].append(detail_dict)
        member_dict['tag'].append(tag_dict)
    result.append(member_dict)

print(json.dumps(result, indent=4))

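Output (apparently produced before the second xyz456 row was added to the question; with both rows, the Service A entry would show the mean, 7.5):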

[
    {
        "member_id": "abc123",
        "tag": [
            {
                "tag_name": "m1",
                "detail": [
                    {
                        "detail_name": "Service_A",
                        "detail_value": 20.0
                    },
                    {
                        "detail_name": "Service_B",
                        "detail_value": 20.0
                    }
                ]
            },
            {
                "tag_name": "m2",
                "detail": [
                    {
                        "detail_name": "Service_C",
                        "detail_value": 10.0
                    }
                ]
            }
        ]
    },
    {
        "member_id": "xyz456",
        "tag": [
            {
                "tag_name": "m3",
                "detail": [
                    {
                        "detail_name": "Service A",
                        "detail_value": 5.0
                    }
                ]
            }
        ]
    }
]

Edit: if you do not need the detail names within each list to be unique, the code gets even shorter:

result = []

for member_id, member_df in df.groupby('member_id'):
    member_dict = {'member_id': member_id}
    member_dict['tag'] = []
    for tag_name, tag_df in member_df.groupby('tag_name'):
        tag_dict = {'tag_name': tag_name}
        tag_dict['detail'] = tag_df[['detail_name', 'detail_value']].to_dict(orient='records')
        member_dict['tag'].append(tag_dict)
    result.append(member_dict)

print(json.dumps(result, indent=4))
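
For anyone who wants to try the edited version end to end, here is a minimal self-contained sketch. It makes a few assumptions that are not in the answer: the sample rows are inlined with io.StringIO instead of being read from detail.csv, the spaces after the commas are removed, "Service_A" is spelled as in the desired output, every column is read as str as in the question, and the function name nest_members is purely illustrative.

```python
import io
import json

import pandas as pd

# Sample rows from the question, inlined so the sketch runs without detail.csv.
CSV_TEXT = """member_id,tag_name,detail_name,detail_value
abc123,m1,Service_A,20
abc123,m1,Service_B,20
abc123,m2,Service_C,10
xyz456,m3,Service_A,5
xyz456,m3,Service_A,10
"""


def nest_members(df):
    """Group by member_id, then by tag_name, keeping every detail row."""
    result = []
    # sort=False keeps the groups in order of first appearance in the CSV.
    for member_id, member_df in df.groupby('member_id', sort=False):
        tags = []
        for tag_name, tag_df in member_df.groupby('tag_name', sort=False):
            details = tag_df[['detail_name', 'detail_value']].to_dict(orient='records')
            tags.append({'tag_name': tag_name, 'detail': details})
        result.append({'member_id': member_id, 'tag': tags})
    return result


# dtype=str mirrors the question, so detail_value stays a string in the JSON.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)
nested = nest_members(df)
print(json.dumps(nested, indent=4, ensure_ascii=False))
```

Because nothing is aggregated here, both xyz456 / m3 rows survive as separate entries in the detail list, which is what the desired output in the question asks for.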

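【Comments】:

  • Tranbi, is there any way to improve performance? The CSV may have ten million rows.
  • I can't think of a way that would improve performance much. But do you really need such a huge JSON? You could save each member_dict to its own JSON file. That would be easier on memory.
  • Understood. One more thing: if I have a new row like 'xyz456, m3, Service A, 10' and I want the rows to show up in separate dictionaries, like Service A and Service B under (abc123, m1), how should I modify the code?
  • What do you mean? xyz456 is already in a different dictionary.
  • Got it. Check my updated answer!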
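
As a rough illustration of the "one member at a time" idea from the comments, the sketch below streams the CSV with the standard library instead of pandas and emits one JSON document per member. This is a different technique from the answer's code, not the answerer's own. It assumes the rows are already sorted by member_id (itertools.groupby only merges adjacent rows), it writes JSON Lines to a single stream rather than one file per member, and the name iter_members plus the in-memory io.StringIO objects are stand-ins for real files.

```python
import csv
import io
import json
from itertools import groupby
from operator import itemgetter


def iter_members(rows):
    """Yield one nested dict per member_id from rows sorted by member_id."""
    for member_id, member_rows in groupby(rows, key=itemgetter('member_id')):
        tags = {}  # tag_name -> list of detail dicts, in first-seen order
        for row in member_rows:
            tags.setdefault(row['tag_name'], []).append(
                {'detail_name': row['detail_name'],
                 'detail_value': row['detail_value']})
        yield {'member_id': member_id,
               'tag': [{'tag_name': name, 'detail': details}
                       for name, details in tags.items()]}


# Stand-in for open('detail.csv', newline=''); rows must be sorted by member_id.
source = io.StringIO(
    "member_id,tag_name,detail_name,detail_value\n"
    "abc123,m1,Service_A,20\n"
    "abc123,m1,Service_B,20\n"
    "abc123,m2,Service_C,10\n"
    "xyz456,m3,Service_A,5\n"
    "xyz456,m3,Service_A,10\n"
)
out = io.StringIO()  # stand-in for an output file opened for writing

for member in iter_members(csv.DictReader(source)):
    # One JSON document per line, so the full result list is never held in memory.
    out.write(json.dumps(member, ensure_ascii=False) + '\n')

lines = out.getvalue().splitlines()
print(lines[0])
```

Only the rows of the current member are held in memory at any time. Writing each member to its own file instead would only change the body of the final loop.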