【Question title】: Convert CSV to a nested JSON while formatting values for specific keys to numeric/int/float
【Posted】: 2019-11-26 08:25:31
【Question】:

I'm trying to convert a CSV file to nested JSON. Here is my CSV, with the first row as the column headers:

CLID,District, attribute,value
C001,Tebuslik, Name,Philip
C001,Tebuslik,Age,34
C002,Hontenlo,Name,Jane
C002,Hontenlo,Age,23

The output I want is a nested JSON in which the value for the Age key is a number rather than a string:

[
    {
        "CLID": "C001",
        "District": "Tebuslik",
        "attributes": [
            {
                "attribute": "Name",
                "value": "Philip"
            },
            {
                "attribute": "Age",
                "value": 34
            }
        ]
    },
    {
        "CLID": "C002",
        "District": "Hontenlo",
        "attributes": [
            {
                "attribute": "Name",
                "value": "Jane"
            },
            {
                "attribute": "Age",
                "value": 23
            }
        ]
    }
]

In my CSV, all keys share the same column (attribute), and the value can be a string or a number depending on the attribute.
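Because the type depends on the attribute name rather than on the column, one option is to look up a converter per attribute. A minimal sketch, assuming a hypothetical `CONVERTERS` mapping in which only `Age` is converted (to `int`) and every other attribute stays a string; the sample CSV above is inlined with `io.StringIO` so the snippet runs without `teis.csv`:

```python
import csv
import io
import json
from itertools import groupby

# Hypothetical mapping: attribute name -> function that converts its value.
# Attributes not listed here are left as strings.
CONVERTERS = {"Age": int}

SAMPLE = """CLID,District, attribute,value
C001,Tebuslik, Name,Philip
C001,Tebuslik,Age,34
C002,Hontenlo,Name,Jane
C002,Hontenlo,Age,23
"""


def convert(attribute, value):
    """Apply the converter registered for this attribute, if any."""
    func = CONVERTERS.get(attribute)
    return func(value) if func else value


def nest(rows):
    """Group consecutive rows by (CLID, District) into the nested structure."""
    groups = []
    for (clid, district), items in groupby(rows, key=lambda row: (row["CLID"], row["District"])):
        groups.append({
            "CLID": clid,
            "District": district,
            "attributes": [
                {"attribute": item["attribute"],
                 "value": convert(item["attribute"], item["value"])}
                for item in items
            ],
        })
    return groups


# skipinitialspace=True strips the blanks in " attribute" and " Name".
rows = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
result = nest(rows)
print(json.dumps(result, indent=4))
```

Here the attribute name, not the look of the value, decides the type, so a Name that happened to consist only of digits would still stay a string.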

Here is my Python script, which gets me halfway there:

from csv import DictReader
from itertools import groupby
import json

with open('teis.csv') as csvfile:
    # skipinitialspace strips the blanks after commas (e.g. " attribute", " Name")
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]

    groups = []

    # groupby merges consecutive rows that share the same (CLID, District)
    for k, g in groupby(data, lambda row: (row['CLID'], row['District'])):
        groups.append({
            "CLID": k[0],
            "District": k[1],
            # every remaining column (attribute, value) is kept as-is, i.e. as a string
            "attributes": [{key: v for key, v in d.items() if key not in ['CLID', 'District']} for d in g]
        })

print(json.dumps(groups, indent=4))
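The script relies on `itertools.groupby`, which only merges *consecutive* rows with equal keys. That works for this sample because each CLID's rows are adjacent; if a real file were not ordered that way, sorting by the same key first would avoid duplicate groups. A small illustration with made-up rows:

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"CLID": "C001", "attribute": "Name", "value": "Philip"},
    {"CLID": "C002", "attribute": "Name", "value": "Jane"},
    {"CLID": "C001", "attribute": "Age", "value": "34"},
]

key = itemgetter("CLID")

# Without sorting, C001 shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby(rows, key=key)]
print(unsorted_keys)  # ['C001', 'C002', 'C001']

# Sorting by the same key first makes every CLID appear exactly once.
sorted_keys = [k for k, _ in groupby(sorted(rows, key=key), key=key)]
print(sorted_keys)  # ['C001', 'C002']
```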

However, this is the output I get, where the numeric Age values are quoted:

[
    {
        "CLID": "C001",
        "District": "Tebuslik",
        "attributes": [
            {
                "attribute": "Name",
                "value": "Philip"
            },
            {
                "attribute": "Age",
                "value": "34"
            }
        ]
    },
    {
        "CLID": "C002",
        "District": "Hontenlo",
        "attributes": [
            {
                "attribute": "Name",
                "value": "Jane"
            },
            {
                "attribute": "Age",
                "value": "23"
            }
        ]
    }
]
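The quotes appear because the `csv` module does no type inference: every field comes back as a `str`, and `json.dumps` serializes a `str` as a JSON string. A quick check with a one-row sample:

```python
import csv
import io
import json

reader = csv.DictReader(io.StringIO("attribute,value\nAge,34\n"))
row = next(reader)

print(type(row["value"]))  # <class 'str'>
print(json.dumps(row))     # {"attribute": "Age", "value": "34"}
```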

【Discussion】:

    Tags: python json python-3.x csvtojson


    【Answer 1】:

    Use str.isdigit to check whether the string consists only of digits, and if so convert it with int.

    For example:

    from csv import DictReader
    from itertools import groupby
    import json

    with open('teis.csv') as csvfile:
        r = DictReader(csvfile, skipinitialspace=True)
        data = [dict(d) for d in r]

        groups = []

        for k, g in groupby(data, lambda row: (row['CLID'], row['District'])):
            groups.append({
                "CLID": k[0],
                "District": k[1],
                # Updated: strings made only of digits (e.g. "34") become ints
                "attributes": [{key: int(v) if v.isdigit() else v
                                for key, v in d.items() if key not in ['CLID', 'District']}
                               for d in g]
            })

    print(json.dumps(groups, indent=4))
    
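One thing to keep in mind with the isdigit rule: it only converts strings made entirely of digit characters, so negative numbers, decimals and empty values stay strings. The helper name `to_int_if_digits` below is just for illustration:

```python
def to_int_if_digits(value):
    """The answer's rule: convert only when every character is a digit."""
    return int(value) if value.isdigit() else value


results = {s: to_int_if_digits(s) for s in ["34", "-5", "3.5", ""]}
print(results)  # {'34': 34, '-5': '-5', '3.5': '3.5', '': ''}
```

Note also that `str.isdigit` returns True for some characters, such as superscript digits, that `int()` rejects; `str.isdecimal` is the stricter check if the data might contain them.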

    【Comments】:

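    • You saved my day. I didn't know I could use a clever string method like this inside the for block. I suppose that if I extend my CSV, I could check for date values as well.
    • How can I handle float values so that they are not quoted either?
    • You can call float() and catch the error with try/except. @Aleu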
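The try/except suggestion in the last comment can be sketched as a helper that tries `int` first, then `float`, and falls back to the original string. `parse_number` is a hypothetical name. Be aware that `float()` also accepts text such as "nan" or "inf", which `json.dumps` would emit as the non-standard `NaN`/`Infinity`, so you may want to exclude those if such words can appear in the value column:

```python
def parse_number(value):
    """Return an int or float if `value` parses as one, else the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass  # not this type; try the next one
    return value


print([parse_number(v) for v in ["34", "-5", "3.5", "Philip", "1e3"]])
# [34, -5, 3.5, 'Philip', 1000.0]
```

Inside the answer's comprehension, this would replace `int(v) if v.isdigit() else v` with `parse_number(v)`.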