【Question Title】: Convert CSV Data to Nested JSON in Python
【Posted】: 2018-12-08 06:46:41
【Question】:

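I need to convert a CSV data file into nested JSON in Python for use in an application. My current Python code below works for one client/account document, but it does not produce a JSON dump covering all of the clients in the CSV file.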

The Python code below should give you an idea of what I am trying to achieve. If there is an existing solution, please let me know.

Sample Python code:

import json
from collections import OrderedDict

import pandas as pd

# Read every column as text so IDs such as "00001" keep their leading zeros
df = pd.read_csv('cust.csv', dtype={
        "ClientID" : str,
        "ClientName" : str,
        "AcctID" : str,
        "AcctNbr" : str,
        "AcctTyp" : str
    })

results = []

# One entry per (ClientID, ClientName) group, with that client's accounts under "subset"
for (ClientID, ClientName), bag in df.groupby(["ClientID", "ClientName"]):
    contents_df = bag.drop(["ClientID", "ClientName"], axis=1)
    subset = [OrderedDict(row) for _, row in contents_df.iterrows()]
    results.append(OrderedDict([("ClientID", ClientID), ("ClientName", ClientName), ("subset", subset)]))

# Note: results[0] is only the first client's document
print(json.dumps(results[0], indent=4))

with open('ExpectedJsonFile.json', 'w') as outfile:
    outfile.write(json.dumps(results[0], indent=4))
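The loop above already produces one entry per (ClientID, ClientName) group; the dump then serializes only `results[0]`. A minimal sketch, assuming a few rows of the sample input CSV are inlined with `io.StringIO` in place of `cust.csv`, that serializes the whole list instead (for a file, `json.dump(results, outfile, indent=4)` writes the same text into an open file):

```python
import io
import json
from collections import OrderedDict

import pandas as pd

# A few rows of the sample input, inlined instead of reading cust.csv (assumption)
CSV_TEXT = """ClientID,ClientName,AcctID,AcctNbr,AcctTyp
00001,John George,812001,812001095,DDA
00001,John George,813002,813002096,SAV
00024,Richard Polado,512987,512987085,ML
00345,John Cruze,1230,123001567,SAV
"""

# dtype=str keeps IDs such as "00001" as text
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

results = []
for (client_id, client_name), bag in df.groupby(["ClientID", "ClientName"]):
    contents_df = bag.drop(["ClientID", "ClientName"], axis=1)
    subset = [OrderedDict(row) for _, row in contents_df.iterrows()]
    results.append(OrderedDict([("ClientID", client_id),
                                ("ClientName", client_name),
                                ("subset", subset)]))

# Serialize every client's document, not just results[0]
payload = json.dumps(results, indent=4)
print(payload)
```

On Python 3.7+ a plain `dict` preserves insertion order too, so `OrderedDict` is optional here.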

Sample input CSV:

ClientID,ClientName,AcctID,AcctNbr,AcctTyp
----------------------------------------------------------
00001,John George,812001,812001095,DDA
00001,John George,813002,813002096,SAV
00001,John George,814003,814003097,AFS
00024,Richard Polado,512987,512987085,ML
00024,Richard Polado,512983,512983086,IL
00345,John Cruze,1230,123001567,SAV
00345,John Cruze,5145,514502096,CD
00345,John Cruze,7890,7890033527,SGD

Desired output JSON:

[
   {
      "clientId": "00001",
      "ClientName": "John George",
      "subset": [
         {
            "AcctID": 812001,
            "AcctNbr": "812001095",
            "AcctTyp": "DDA"
         },
         {
            "AcctID": 813002,
            "AcctNbr": "813002096",
            "AcctTyp": "SAV"
         },
         {
            "AcctID": 814003,
            "AcctNbr": "814003097",
            "AcctTyp": "AFS"
         }
      ]
   },
   {
      "clientId": "00024",
      "ClientName": "Richard Polado",
      "subset": [
         {
            "AcctID": 512987,
            "AcctNbr": "512987085",
            "AcctTyp": "ML"
         },
         {
            "AcctID": 512983,
            "AcctNbr": "512983086",
            "AcctTyp": "IL"
         }
      ]
   }
]

These documents should continue to be created for the thousands of other clients.
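JSON (RFC 8259) has no numeric literal with leading zeros and does not allow trailing commas, so IDs such as 00001 can only survive as strings. A minimal check with Python's `json` module; the helper `is_valid_json` is hypothetical:

```python
import json


def is_valid_json(text):
    """Return True if json.loads accepts the text (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json('{"clientId": 00001}'))    # False: leading zeros
print(is_valid_json('{"AcctTyp": "DDA",}'))    # False: trailing comma
print(is_valid_json('{"clientId": "00001"}'))  # True

# json.dumps quotes string IDs, so the leading zeros are preserved
print(json.dumps({"clientId": "00001", "AcctID": 812001}))
# {"clientId": "00001", "AcctID": 812001}
```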

【Question Discussion】:

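  • What do you mean by "somehow fails"? What specific problem are you having? What does it not do?
  • It only creates one document, for the first ClientID: { "clientId":00001, "ClientName":"John George", "subset":[ { "AcctID":812001, "AcctNbr":"812001095", "AcctTyp":"DDA", }, { "AcctID":813002, "AcctNbr":"813002096", "AcctTyp":"SAV", }, { "AcctID":814003, "AcctNbr":"814003097", "AcctTyp":"AFS", } ] }
  • Sounds like a debugging problem. Write some code to check whether it reaches the file-writing part on the second pass through the loop. Does it not get there? Or is your file write failing because it reuses the same file name?
  • I'm not familiar with Python, but doesn't the loop need to be indented? It looks like yours isn't.
  • @ShivKumar Did my solution work?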

Tags: python json pandas


【Solution 1】:

This solution groups by each ('ClientID', 'ClientName') pair.

Your DataFrame:

import pandas as pd

df = pd.DataFrame([['00001','John George','812001','812001095','DDA'],
['00001','John George','813002','813002096','SAV'],
['00001','John George','814003','814003097','AFS'],
['00024','Richard Polado','512987','512987085','ML'],
['00024','Richard Polado','512983','512983086','IL'],
['00345','John Cruze','1230','123001567','SAV'],
['00345','John Cruze','5145','514502096','CD'],
['00345','John Cruze','7890','7890033527','SGD']])

df.columns = ['ClientID','ClientName','AcctID','AcctNbr','AcctTyp'] 

Now:

import json

finalList = []
grouped = df.groupby(['ClientID', 'ClientName'])
for key, value in grouped:

    dictionary = {}

    # Rows for this client, re-indexed from 0
    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['ClientID'] = j.at[0, 'ClientID']
    dictionary['ClientName'] = j.at[0, 'ClientName']

    dictList = []
    # anotherDict is created once per client, outside the inner loop, so every
    # append below adds the same dict object, which ends up holding the last row
    anotherDict = {}
    for i in j.index:

        anotherDict['AcctID'] = j.at[i, 'AcctID']
        anotherDict['AcctNbr'] = j.at[i, 'AcctNbr']
        anotherDict['AcctTyp'] = j.at[i, 'AcctTyp']

        dictList.append(anotherDict)

    dictionary['subset'] = dictList

    finalList.append(dictionary)

json.dumps(finalList)

Gives:

'[
   {"ClientID": "00001", 
    "ClientName": "John George", 
    "subset": 
            [{"AcctID": "814003", 
              "AcctNbr": "814003097", 
              "AcctTyp": "AFS"}, 

             {"AcctID": "814003", 
              "AcctNbr": "814003097", 
              "AcctTyp": "AFS"}, 

             {"AcctID": "814003", 
              "AcctNbr": "814003097", 
              "AcctTyp": "AFS"}]

   }, 

  {
   "ClientID": "00024", 
   "ClientName": "Richard Polado", 
   "subset": 
            [{"AcctID": "512983", 
              "AcctNbr": "512983086", 
              "AcctTyp": "IL"}, 

             {"AcctID": "512983", 
              "AcctNbr": "512983086", 
              "AcctTyp": "IL"}]
   }, 

  {
   "ClientID": "00345", 
   "ClientName": "John Cruze", 
   "subset": 
            [{"AcctID": "7890", 
              "AcctNbr": "7890033527", 
              "AcctTyp": "SGD"}, 

             {"AcctID": "7890", 
              "AcctNbr": "7890033527", 
              "AcctTyp": "SGD"}, 

             {"AcctID": "7890", 
              "AcctNbr": "7890033527", 
              "AcctTyp": "SGD"}]
   }

]'

Is this what you want?
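The repeated entries in the output above come from appending one shared `anotherDict` per client. A minimal sketch of an alternative, assuming the same DataFrame (only the first five rows here), that lets pandas build a fresh dict per row with `DataFrame.to_dict('records')`; this is a swapped-in technique, not the answer's code:

```python
import json

import pandas as pd

# Same construction as in the answer, first five rows only
df = pd.DataFrame(
    [['00001', 'John George', '812001', '812001095', 'DDA'],
     ['00001', 'John George', '813002', '813002096', 'SAV'],
     ['00001', 'John George', '814003', '814003097', 'AFS'],
     ['00024', 'Richard Polado', '512987', '512987085', 'ML'],
     ['00024', 'Richard Polado', '512983', '512983086', 'IL']],
    columns=['ClientID', 'ClientName', 'AcctID', 'AcctNbr', 'AcctTyp'])

finalList = []
for (client_id, client_name), group in df.groupby(['ClientID', 'ClientName']):
    # to_dict('records') returns a new dict for every row, so nothing is shared
    subset = group[['AcctID', 'AcctNbr', 'AcctTyp']].to_dict('records')
    finalList.append({'ClientID': client_id,
                      'ClientName': client_name,
                      'subset': subset})

print(json.dumps(finalList, indent=2))
```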

【Discussion】:

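  • Hi Abhishek, one correction: I have now looked at more of the data, and it actually does not work as expected. With more records, the inner dictionaries all repeat the same value.
  • Hi @imperialgendarme, it works for me... but could you help me with a small question? What if I want to use a different name for the subset? How do I insert my own information into the subset? Thanks in advance for your help!
  • Comment by Mubeen Ghafoor: another_dict = {} should be inside the for loop.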
【Solution 2】:

Use dictList.append(anotherDict.copy()); otherwise every element of the list is the same dict object.

More details on this issue: Create List of Dictionary Python
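A minimal, pandas-free sketch of the difference (the helpers `build_shared` and `build_copied` are hypothetical): appending the same dict stores one object several times, while appending a copy stores a snapshot of each iteration. Creating `anotherDict = {}` inside the loop, as suggested in the comments on Solution 1, has the same effect as copying.

```python
def build_shared(values):
    """Append the same dict each time (the bug in Solution 1)."""
    row = {}
    out = []
    for v in values:
        row['AcctTyp'] = v
        out.append(row)          # same object every time
    return out


def build_copied(values):
    """Append a shallow copy each time (Solution 2)."""
    row = {}
    out = []
    for v in values:
        row['AcctTyp'] = v
        out.append(row.copy())   # snapshot of the current contents
    return out


types = ['DDA', 'SAV', 'AFS']
print(build_shared(types))  # [{'AcctTyp': 'AFS'}, {'AcctTyp': 'AFS'}, {'AcctTyp': 'AFS'}]
print(build_copied(types))  # [{'AcctTyp': 'DDA'}, {'AcctTyp': 'SAV'}, {'AcctTyp': 'AFS'}]
```

`dict.copy()` is shallow, which is enough here because the values are plain strings.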

【Discussion】:
