【Question title】:Converting a csv file to json with a specific json format (Python)
【Posted】:2016-11-06 15:40:57
【Question description】:

How can I convert a csv file to json like this:
csv = headers in row 1, with their values below
json = [{"key1":"value1",...},{"key1":"value2",...}...]

Here is my csv file:

$ cat -v head_data.csv
"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description"
"2016-05-30","686","230","63979","Samsung SM-G935FD ","Samsung SM-G935FD","$29.95 Carryover Plan (1GB)"
"2016-05-30","533","970","171631866","Apple iPhone 6 (A1586)","iPhone 6 (A1586)","$69.95 Plan"
"2016-05-30","191","610","145713","Samsung GT-I9195","Samsung GT-I9195","$29.95 Plan"
"2016-05-30","660","660","2994742","Samsung SM-N920I","Samsung SM-N920I","GOVERNMENT TIER 2 PLAN"
"2016-05-30","182","970","37799939","Samsung SM-J200Y","Samsung SM-J200Y","PREPAY PLUS - $0 -"
"2016-05-30","993","360","14096114","Samsung SM-A300Y","Samsung SM-A300Y","$39.95 Carryover Plan"
"2016-05-30","894","730","9851177","Samsung GT-N7105","Samsung GT-N7105","PREPAY STD - $0 - #2"
"2016-05-30","600","070","18420650","Apple iPhone 5C (A1529)","Apple iPhone 5C (A1529)","PREPAY PLUS - $0 -"
"2016-05-30","234","000","1769661","Galaxy S7 SM-G930F ","Galaxy S7 SM-G930F","$39.95 Plan"

Here is my script:

$ cat csv_to_json.py

#!/usr/bin/python

#from here
#https://stackoverflow.com/a/7550352/2392358

import csv, json
csvreader = csv.reader(open('head_data.csv', 'rb'), delimiter='\t',
quotechar='"')
data = []
for row in csvreader:
    r = []
    for field in row:
        if field == '': field = None
        else: field = unicode(field, 'ISO-8859-1')
        r.append(field)
    data.append(r)
jsonStruct = {
    'header': data[0],
    'data': data[1:]
}
open('head_data.json', 'wb').write(json.dumps(jsonStruct))

Running my script, and its output:

$ python csv_to_json.py


$ cat -v head_data.json
{"header": ["Rec Open Date,\"MSISDN\",\"IMEI\",\"Data Volume (Bytes)\",\"Device Manufacturer\",\"Device Model\",\"Product Description\""], "data": [["2016-05-30,\"686\",\"230\",\"63979\",\"Samsung SM-G935FD \",\"Samsung SM-G935FD\",\"$29.95 Carryover Plan (1GB)\""], ["2016-05-30,\"533\",\"970\",\"171631866\",\"Apple iPhone 6 (A1586)\",\"iPhone 6 (A1586)\",\"$69.95 Plan\""], ["2016-05-30,\"191\",\"610\",\"145713\",\"Samsung GT-I9195\",\"Samsung GT-I9195\",\"$29.95 Plan\""], ["2016-05-30,\"660\",\"660\",\"2994742\",\"Samsung SM-N920I\",\"Samsung SM-N920I\",\"GOVERNMENT TIER 2 PLAN\""], ["2016-05-30,\"182\",\"970\",\"37799939\",\"Samsung SM-J200Y\",\"Samsung SM-J200Y\",\"PREPAY PLUS - $0 -\""], ["2016-05-30,\"993\",\"360\",\"14096114\",\"Samsung SM-A300Y\",\"Samsung SM-A300Y\",\"$39.95 Carryover Plan\""], ["2016-05-30,\"894\",\"730\",\"9851177\",\"Samsung GT-N7105\",\"Samsung GT-N7105\",\"PREPAY STD - $0 - #2\""], ["2016-05-30,\"600\",\"070\",\"18420650\",\"Apple iPhone 5C (A1529)\",\"Apple iPhone 5C (A1529)\",\"PREPAY PLUS - $0 -\""], ["2016-05-30,\"234\",\"000\",\"1769661\",\"Galaxy S7 SM-G930F \",\"Galaxy S7 SM-G930F\",\"$39.95 Plan\""]]}

Can I modify the code slightly so that I get output like this:

[{"Rec Open Date":"2016-07-03","MSISDN":540,"IMEI":990,"Data Volume (Bytes)":36671453,"Device Manufacturer":"HUAWEI Technologies Co Ltd","Device Model":"H1512","Product Description":"PREPAY PLUS - $0 -"},
{"Rec Open Date":"2016-07-03","MSISDN":334,"IMEI":340,"Data Volume (Bytes)":129835114,"Device Manufacturer":"Apple Inc","Device Model":"Apple iPhone S (A1530)","Product Description":"$29.95 Plan"},
{"Rec Open Date":"2016-07-03","MSISDN":133,"IMEI":870,"Data Volume (Bytes)":42213030,"Device Manufacturer":"Apple Inc","Device Model":"Apple iPhone 6 Plus (A1524)","Product Description":"$49.95 Plan"}]

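Related questions: here, here

EDIT1 - Found this here, but it does the conversion in the browser, and I think it uses js.

EDIT2 - Based on the answers below, this is what I wanted

Here is the file I want to convert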

$ cat -v head_data.csv
"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description"
"2016-05-30","686","230","63979","Samsung SM-G935FD ","Samsung,A, SM-G935FD","$29.95 Carryover Plan (1GB)"
"2016-05-30","533","970","171631866","Apple iPhone 6 (A1586)","iPhone 6 (A1586)","$69.95 Plan"
"2016-05-30","191","610","145713","Samsung GT-I9195","Samsung GT-I9195","$29.95 Plan"
"2016-05-30","660","660","2994742","Samsung SM-N920I","Samsung SM-N920I","GOVERNMENT TIER 2 PLAN"
"2016-05-30","182","970","37799939","Samsung SM-J200Y","Samsung SM-J200Y","PREPAY PLUS - $0 -"
"2016-05-30","993","360","14096114","Samsung SM-A300Y","Samsung SM-A300Y","$39.95 Carryover Plan"
"2016-05-30","894","730","9851177","Samsung GT-N7105","Samsung GT-N7105","PREPAY STD - $0 - #2"
"2016-05-30","600","070","18420650","Apple iPhone 5C (A1529)","Apple iPhone 5C (A1529)","PREPAY PLUS - $0 -"
"2016-05-30","234","000","1769661","Galaxy S7 SM-G930F ","Galaxy S7 SM-G930F","$39.95 Plan"

Here is the script:

$ cat -v csv_to_json2.py
#!/usr/bin/python

#from here
#https://stackoverflow.com/a/38193687/2392358

import csv
import json
from collections import OrderedDict

dR=csv.DictReader(open("head_data.csv"))
oD=[ OrderedDict(
         sorted(dct.iteritems(),
                key=lambda item:dR.fieldnames.index(item[0])))
     for dct in dR ]

#print to terminal
print json.dumps(oD)

#write to file
#json.dump(oD,"head_op.json")
open('head_op.json', 'wb').write(json.dumps(oD))

Running the script:

$ python csv_to_json2.py
[{"Rec Open Date": "2016-05-30", "MSISDN": "686", "IMEI": "230", "Data Volume (Bytes)": "63979", "Device Manufacturer": "Samsung SM-G935FD ", "Device Model": "Samsung,A, SM-G935FD", "Product Description": "$29.95 Carryover Plan (1GB)"}, {"Rec Open Date": "2016-05-30", "MSISDN": "533", "IMEI": "970", "Data Volume (Bytes)": "171631866", "Device Manufacturer": "Apple iPhone 6 (A1586)", "Device Model": "iPhone 6 (A1586)", "Product Description": "$69.95 Plan"}, {"Rec Open Date": "2016-05-30", "MSISDN": "191", "IMEI": "610", "Data Volume (Bytes)": "145713", "Device Manufacturer": "Samsung GT-I9195", "Device Model": "Samsung GT-I9195", "Product Description": "$29.95 Plan"}, {"Rec Open Date": "2016-05-30", "MSISDN": "660", "IMEI": "660", "Data Volume (Bytes)": "2994742", "Device Manufacturer": "Samsung SM-N920I", "Device Model": "Samsung SM-N920I", "Product Description": "GOVERNMENT TIER 2 PLAN"}, {"Rec Open Date": "2016-05-30", "MSISDN": "182", "IMEI": "970", "Data Volume (Bytes)": "37799939", "Device Manufacturer": "Samsung SM-J200Y", "Device Model": "Samsung SM-J200Y", "Product Description": "PREPAY PLUS - $0 -"}, {"Rec Open Date": "2016-05-30", "MSISDN": "993", "IMEI": "360", "Data Volume (Bytes)": "14096114", "Device Manufacturer": "Samsung SM-A300Y", "Device Model": "Samsung SM-A300Y", "Product Description": "$39.95 Carryover Plan"}, {"Rec Open Date": "2016-05-30", "MSISDN": "894", "IMEI": "730", "Data Volume (Bytes)": "9851177", "Device Manufacturer": "Samsung GT-N7105", "Device Model": "Samsung GT-N7105", "Product Description": "PREPAY STD - $0 - #2"}, {"Rec Open Date": "2016-05-30", "MSISDN": "600", "IMEI": "070", "Data Volume (Bytes)": "18420650", "Device Manufacturer": "Apple iPhone 5C (A1529)", "Device Model": "Apple iPhone 5C (A1529)", "Product Description": "PREPAY PLUS - $0 -"}, {"Rec Open Date": "2016-05-30", "MSISDN": "234", "IMEI": "000", "Data Volume (Bytes)": "1769661", "Device Manufacturer": "Galaxy S7 SM-G930F ", "Device Model": "Galaxy S7 SM-G930F", 
"Product Description": "$39.95 Plan"}]

Here is the output:

$ cat -v head_op.json
[{"Rec Open Date": "2016-05-30", "MSISDN": "686", "IMEI": "230", "Data Volume (Bytes)": "63979", "Device Manufacturer": "Samsung SM-G935FD ", "Device Model": "Samsung,A, SM-G935FD", "Product Description": "$29.95 Carryover Plan (1GB)"}, {"Rec Open Date": "2016-05-30", "MSISDN": "533", "IMEI": "970", "Data Volume (Bytes)": "171631866", "Device Manufacturer": "Apple iPhone 6 (A1586)", "Device Model": "iPhone 6 (A1586)", "Product Description": "$69.95 Plan"}, {"Rec Open Date": "2016-05-30", "MSISDN": "191", "IMEI": "610", "Data Volume (Bytes)": "145713", "Device Manufacturer": "Samsung GT-I9195", "Device Model": "Samsung GT-I9195", "Product Description": "$29.95 Plan"}, {"Rec Open Date": "2016-05-30", "MSISDN": "660", "IMEI": "660", "Data Volume (Bytes)": "2994742", "Device Manufacturer": "Samsung SM-N920I", "Device Model": "Samsung SM-N920I", "Product Description": "GOVERNMENT TIER 2 PLAN"}, {"Rec Open Date": "2016-05-30", "MSISDN": "182", "IMEI": "970", "Data Volume (Bytes)": "37799939", "Device Manufacturer": "Samsung SM-J200Y", "Device Model": "Samsung SM-J200Y", "Product Description": "PREPAY PLUS - $0 -"}, {"Rec Open Date": "2016-05-30", "MSISDN": "993", "IMEI": "360", "Data Volume (Bytes)": "14096114", "Device Manufacturer": "Samsung SM-A300Y", "Device Model": "Samsung SM-A300Y", "Product Description": "$39.95 Carryover Plan"}, {"Rec Open Date": "2016-05-30", "MSISDN": "894", "IMEI": "730", "Data Volume (Bytes)": "9851177", "Device Manufacturer": "Samsung GT-N7105", "Device Model": "Samsung GT-N7105", "Product Description": "PREPAY STD - $0 - #2"}, {"Rec Open Date": "2016-05-30", "MSISDN": "600", "IMEI": "070", "Data Volume (Bytes)": "18420650", "Device Manufacturer": "Apple iPhone 5C (A1529)", "Device Model": "Apple iPhone 5C (A1529)", "Product Description": "PREPAY PLUS - $0 -"}, {"Rec Open Date": "2016-05-30", "MSISDN": "234", "IMEI": "000", "Data Volume (Bytes)": "1769661", "Device Manufacturer": "Galaxy S7 SM-G930F ", "Device Model": "Galaxy S7 SM-G930F", 
"Product Description": "$39.95 Plan"}]

【Comments on the question】:

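  • I would rather define a class with specific, properly named members for each field in a row, and then serialize it to JSON using the default JSON serialization.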
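
A minimal sketch of what the comment above might look like in practice, assuming Python 3.7+ and its dataclasses module. The class name UsageRecord and its field names are hypothetical, chosen only for this illustration, and the sample row is read from an in-memory string rather than from head_data.csv.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class UsageRecord:
    # Hypothetical field names chosen for this sketch, not taken from the question
    rec_open_date: str
    msisdn: str
    imei: str
    data_volume_bytes: int
    device_manufacturer: str
    device_model: str
    product_description: str


# One-row excerpt standing in for head_data.csv
sample = io.StringIO(
    '"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)",'
    '"Device Manufacturer","Device Model","Product Description"\n'
    '"2016-05-30","686","230","63979","Samsung SM-G935FD ",'
    '"Samsung SM-G935FD","$29.95 Carryover Plan (1GB)"\n'
)

reader = csv.reader(sample)
next(reader)  # skip the header row; the class supplies the names instead

records = [
    UsageRecord(date, msisdn, imei, int(volume), maker, model, plan)
    for date, msisdn, imei, volume, maker, model, plan in reader
]

# asdict turns each instance into a plain dict that the json module can serialize
text = json.dumps([asdict(r) for r in records])
print(text)
```

Note that the JSON keys then follow the class attribute names rather than the csv headers, which may or may not be what is wanted.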

Tags: python json csv


【Solution 1】:

If you don't care about the order of the keys, do this:

import csv
import json
json.dumps(list(csv.DictReader(open("file.csv"))))

See the pretty printing section of the manual for more options, or do

json.dumps(list(csv.DictReader(open("file.csv")))).replace("}, ","},\n")

to get the expected output.
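
As a hedged illustration of the pretty-printing option mentioned above: json.dumps accepts an indent argument, which lays the structure out over several lines without touching the contents of any string. The data here is a made-up two-row excerpt read from an in-memory string instead of file.csv.

```python
import csv
import io
import json

# Hypothetical two-row excerpt standing in for file.csv
sample = io.StringIO(
    '"Rec Open Date","MSISDN","IMEI"\n'
    '"2016-05-30","686","230"\n'
    '"2016-05-30","533","970"\n'
)

rows = list(csv.DictReader(sample))

# indent=2 asks the json module to pretty-print the structure
pretty = json.dumps(rows, indent=2)
print(pretty)
```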


If you want the output ordered, you can order the keys with an OrderedDict:

import csv
import json
from collections import OrderedDict

dR=csv.DictReader(open("/tmp/ah.csv"))
oD=[ OrderedDict(
         sorted(dct.items(),  # .items() works on Python 2 and 3; .iteritems() is Python 2 only
                key=lambda item:dR.fieldnames.index(item[0])))
     for dct in dR ]
json.dumps(oD)

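【Discussion】:

  • tks, but I can't piece all of this together. How do I write it to an output file?
  • The output of json.dumps() is a string. Save it with file.write(). There is an example in the last line of the script in your question.
  • tks, I'd like the key order to be the same as in the csv, is that possible?
  • C: there is no need for a list comprehension here, list(csv.DictReader(...)) is enough. D: don't use .replace("}, ","},\n"); if a string value contains "}, ", a newline will be inserted into that string, which corrupts the data.
  • A custom JSONEncoder is probably the right way to handle B & D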
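
Regarding point D in the comments above, one way to get one record per line without a string replace is to serialize each record separately and join the pieces. This is only a sketch, not the custom JSONEncoder the last comment suggests; the sample data is hypothetical and deliberately contains "}, " inside a value.

```python
import csv
import io
import json

# Hypothetical sample; the first data row deliberately contains "}, " in a value
sample = io.StringIO(
    '"Device Model","Product Description"\n'
    '"Samsung SM-G935FD","odd}, value"\n'
    '"iPhone 6 (A1586)","$69.95 Plan"\n'
)

rows = list(csv.DictReader(sample))

# Serialize each record on its own, then join the records with ",\n".
# Newlines are only ever added between records, never inside a string.
text = "[" + ",\n".join(json.dumps(row) for row in rows) + "]"
print(text)
```

The result is still a single valid JSON array, so json.loads(text) gives back the original rows.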
【Solution 2】:

If you want to keep the order of the keys, don't use csv.DictReader, since it overcomplicates things; just record the header and then zip it with each row:

import csv
from collections import OrderedDict
reader = csv.reader(open("text.csv"))

header = next(reader)

data = [OrderedDict(zip(header,fields)) for fields in reader]

Then you can write it to a file with this:

import json

with open("new.json","w") as f:
    json.dump(data, f)
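
The desired output in the question shows MSISDN, IMEI and Data Volume (Bytes) as numbers, while the approach above leaves every value as a string. The following is a sketch of one way to add that conversion, assuming Python 3.7+ (where a plain dict keeps insertion order). The helper to_number is hypothetical, and its rule of leaving values with a leading zero (such as "070") as strings is an assumption, not something the question specifies.

```python
import csv
import io
import json

# Hypothetical excerpt standing in for text.csv
sample = io.StringIO(
    '"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)"\n'
    '"2016-05-30","686","230","63979"\n'
    '"2016-05-30","600","070","18420650"\n'
)


def to_number(value):
    """Return int(value) for plain digit strings, otherwise the value unchanged.

    Assumption: strings with a leading zero (such as "070") are identifiers,
    so they stay strings rather than losing the zero.
    """
    if value.isdecimal() and (value == "0" or not value.startswith("0")):
        return int(value)
    return value


reader = csv.reader(sample)
header = next(reader)

# On Python 3.7+ a plain dict keeps insertion order, so keys follow the header
data = [dict(zip(header, map(to_number, fields))) for fields in reader]

text = json.dumps(data)
print(text)
```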

【Discussion】:

    【Solution 3】:

    Another solution, from the command line:

    1. Install the dependencies:
    pip install pyexcel-cli pyexcel-text
    
    2. Run the following command to convert the csv to json
    pyexcel transcode --name-columns-by-row 0 --output-file-type json example.csv -
    

    Output:

    {"example.csv": [{"Data Volume (Bytes)": 63979, "Device Manufacturer": "Samsung SM-G935FD ", "Device Model": "Samsung SM-G935FD", "IMEI": 230, "MSISDN": 686, "Product Description": "$29.95 Carryover Plan (1GB)", "Rec Open Date": "2016-05-30"}, {"Data Volume (Bytes)": 171631866, "Device Manufacturer": "Apple iPhone 6 (A1586)", "Device Model": "iPhone 6 (A1586)", "IMEI": 970, "MSISDN": 533, "Product Description": "$69.95 Plan", "Rec Open Date": "2016-05-30"}, {"Data Volume (Bytes)": 145713, "Device Manufacturer": "Samsung GT-I9195", "Device Model": "Samsung GT-I9195", "IMEI": 610, "MSISDN": 191, "Product Description": "$29.95 Plan", "Rec Open Date": "2016-05-30"}, {"Data Volume (Bytes)": 2994742, "Device Manufacturer": "Samsung SM-N920I", "Device Model": "Samsung SM-N920I", "IMEI": 660, "MSISDN": 660, "Product Description": "GOVERNMENT TIER 2 PLAN", "Rec Open Date": "2016-05-30"}, {"Data Volume (Bytes)": 37799939, "Device Manufacturer": "Samsung SM-J200Y", "Device Model": "Samsung SM-J200Y", "IMEI": 970, "MSISDN": 182, "Product Description": "PREPAY PLUS - $0 -", "Rec Open Date": "2016-05-30"}, {"Data Volume (Bytes)": 14096114, "Device Manufacturer": "Samsung SM-A300Y", "Device Model": "Samsung SM-A300Y", "IMEI": 360, "MSISDN": 993, "Product Description": "$39.95 Carryover Plan", "Rec Open Date": "2016-05-30"}, {"Data Volume (Bytes)": 9851177, "Device Manufacturer": "Samsung GT-N7105", "Device Model": "Samsung GT-N7105", "IMEI": 730, "MSISDN": 894, "Product Description": "PREPAY STD - $0 - #2", "Rec Open Date": "2016-05-30"}, {"Data Volume (Bytes)": 18420650, "Device Manufacturer": "Apple iPhone 5C (A1529)", "Device Model": "Apple iPhone 5C (A1529)", "IMEI": "070", "MSISDN": 600, "Product Description": "PREPAY PLUS - $0 -", "Rec Open Date": "2016-05-30"}, {"Data Volume (Bytes)": 1769661, "Device Manufacturer": "Galaxy S7 SM-G930F ", "Device Model": "Galaxy S7 SM-G930F", "IMEI": "000", "MSISDN": 234, "Product Description": "$39.95 Plan", "Rec Open Date": 
"2016-05-30"}]}
    

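    【Discussion】:

    • Tried this solution, but did not get the same output the author shows.
    【Solution 4】:

    Using the pandas library was the easiest for me.

    1. Install the dependency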
    pip install pandas
    
    2. Create your csv-to-json script (let's call it csv2json.py)
    import sys
    import pandas as pd
    
    data_frame = pd.read_csv(sys.argv[1])
    data_frame.to_json(sys.argv[1].replace('.csv', '.json'), orient='records', indent=2)
    
    3. Run the csv2json.py script on the example.csv input file
    python csv2json.py example.csv
    
    4. Your json has been generated in the example.json file

    Example:

    Input (example.csv):

    "Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description"
    "2016-05-30","686","230","63979","Samsung SM-G935FD ","Samsung SM-G935FD","$29.95 Carryover Plan (1GB)"
    "2016-05-30","533","970","171631866","Apple iPhone 6 (A1586)","iPhone 6 (A1586)","$69.95 Plan"
    "2016-05-30","191","610","145713","Samsung GT-I9195","Samsung GT-I9195","$29.95 Plan"
    "2016-05-30","660","660","2994742","Samsung SM-N920I","Samsung SM-N920I","GOVERNMENT TIER 2 PLAN"
    "2016-05-30","182","970","37799939","Samsung SM-J200Y","Samsung SM-J200Y","PREPAY PLUS - $0 -"
    "2016-05-30","993","360","14096114","Samsung SM-A300Y","Samsung SM-A300Y","$39.95 Carryover Plan"
    "2016-05-30","894","730","9851177","Samsung GT-N7105","Samsung GT-N7105","PREPAY STD - $0 - #2"
    "2016-05-30","600","070","18420650","Apple iPhone 5C (A1529)","Apple iPhone 5C (A1529)","PREPAY PLUS - $0 -"
    "2016-05-30","234","000","1769661","Galaxy S7 SM-G930F ","Galaxy S7 SM-G930F","$39.95 Plan"
    

    Output (example.json):

    [
      {
        "Rec Open Date":"2016-05-30",
        "MSISDN":686,
        "IMEI":230,
        "Data Volume (Bytes)":63979,
        "Device Manufacturer":"Samsung SM-G935FD ",
        "Device Model":"Samsung SM-G935FD",
        "Product Description":"$29.95 Carryover Plan (1GB)"
      },
      {
        "Rec Open Date":"2016-05-30",
        "MSISDN":533,
        "IMEI":970,
        "Data Volume (Bytes)":171631866,
        "Device Manufacturer":"Apple iPhone 6 (A1586)",
        "Device Model":"iPhone 6 (A1586)",
        "Product Description":"$69.95 Plan"
      },
      {
        "Rec Open Date":"2016-05-30",
        "MSISDN":191,
        "IMEI":610,
        "Data Volume (Bytes)":145713,
        "Device Manufacturer":"Samsung GT-I9195",
        "Device Model":"Samsung GT-I9195",
        "Product Description":"$29.95 Plan"
      },
      {
        "Rec Open Date":"2016-05-30",
        "MSISDN":660,
        "IMEI":660,
        "Data Volume (Bytes)":2994742,
        "Device Manufacturer":"Samsung SM-N920I",
        "Device Model":"Samsung SM-N920I",
        "Product Description":"GOVERNMENT TIER 2 PLAN"
      },
      {
        "Rec Open Date":"2016-05-30",
        "MSISDN":182,
        "IMEI":970,
        "Data Volume (Bytes)":37799939,
        "Device Manufacturer":"Samsung SM-J200Y",
        "Device Model":"Samsung SM-J200Y",
        "Product Description":"PREPAY PLUS - $0 -"
      },
      {
        "Rec Open Date":"2016-05-30",
        "MSISDN":993,
        "IMEI":360,
        "Data Volume (Bytes)":14096114,
        "Device Manufacturer":"Samsung SM-A300Y",
        "Device Model":"Samsung SM-A300Y",
        "Product Description":"$39.95 Carryover Plan"
      },
      {
        "Rec Open Date":"2016-05-30",
        "MSISDN":894,
        "IMEI":730,
        "Data Volume (Bytes)":9851177,
        "Device Manufacturer":"Samsung GT-N7105",
        "Device Model":"Samsung GT-N7105",
        "Product Description":"PREPAY STD - $0 - #2"
      },
      {
        "Rec Open Date":"2016-05-30",
        "MSISDN":600,
        "IMEI":70,
        "Data Volume (Bytes)":18420650,
        "Device Manufacturer":"Apple iPhone 5C (A1529)",
        "Device Model":"Apple iPhone 5C (A1529)",
        "Product Description":"PREPAY PLUS - $0 -"
      },
      {
        "Rec Open Date":"2016-05-30",
        "MSISDN":234,
        "IMEI":0,
        "Data Volume (Bytes)":1769661,
        "Device Manufacturer":"Galaxy S7 SM-G930F ",
        "Device Model":"Galaxy S7 SM-G930F",
        "Product Description":"$39.95 Plan"
      }
    ]
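
In the output above, the IMEI values "070" and "000" come out as 70 and 0, because read_csv infers a numeric type for that column. If such values are identifiers rather than quantities, one option is to pass a dtype for that column. The sketch below assumes IMEI is the only column that needs this, and reads a made-up excerpt from an in-memory string instead of example.csv.

```python
import io
import json

import pandas as pd

# Hypothetical excerpt standing in for example.csv
sample = io.StringIO(
    '"MSISDN","IMEI","Data Volume (Bytes)"\n'
    '"600","070","18420650"\n'
    '"234","000","1769661"\n'
)

# Reading IMEI as str keeps the leading zeros; other columns are still inferred
data_frame = pd.read_csv(sample, dtype={"IMEI": str})

# With no path argument, to_json returns the JSON text as a string
text = data_frame.to_json(orient="records")
records = json.loads(text)
print(records)
```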
    

    【Discussion】:
