Converting JSON to CSV in normalized form

【Question title】: Json to csv in normalized form
【Posted】: 2016-01-25 01:28:11
【Question description】:

JSON format:

[{"studios": [{"studioId": 539}, {"studioId": 540}], 
"id": 843, 
"title": "In the Mood for Love",
"crew": [{"personId": 12453, "department": "Directing", "job": "Director"}, {"personId": 12454, "department": "Sound", "job": "Music"}, {"personId": 12455, "department": "Sound", "job": "Original Music Composer"}, {"personId": 1357, "department": "Camera", "job": "Director of Photography"}, {"personId": 12453, "department": "Writing", "job": "Screenplay"}, {"personId": 12453, "department": "Production", "job": "Producer"}, {"personId": 21909, "department": "Production", "job": "Executive Producer"}, {"personId": 45818, "department": "Editing", "job": "Editor"}, {"personId": 232804, "department": "Camera", "job": "Director of Photography"}, {"personId": 12667, "department": "Camera", "job": "Director of Photography"}],
"releases": [{"releasedate": "2000-11-08", "country": "FR"}, {"releasedate": "2000-05-22", "country": "US"}]
"genres": ["Drama", "Romance"], 
"releasedate": "2000-05-22", 
"cast": [{"personId": 1337, "character": "Chow Mo-Wan", "order": 0}, {"personId": 1338, "character": "Su Li-Zhen", "order": 1}, {"personId": 12674, "character": "Ah Ping", "order": 2}, {"personId": 12462, "character": "Mrs. Suen", "order": 4}, {"personId": 12463, "character": "Mr. Ho", "order": 5}, {"personId": 12464, "character": "", "order": 6}, {"personId": 12465, "character": "", "order": 7}, {"personId": 12466, "character": "Mr. Chan", "order": 8}, {"personId": 12467, "character": "The Amah", "order": 9}, {"personId": 12468, "character": "", "order": 10}, {"personId": 12469, "character": "", "order": 11}, {"personId": 12470, "character": "Mrs. Chow", "order": 12}], 
"userrating": 7.6}]

I am trying to convert this to a .csv file, but I get an error. I want the CSV normalized to 1NF so that I can load it directly into a database.

My code:

import json
import csv
with open("data3.json") as json_file, open("data3.csv", "w",encoding='utf-8') as csv_file:
    csv_file = csv.writer(csv_file)
    a = json.load(json_file)
    csv_file.writerow(["StudiosId", "Id", "Title","personId","Department","Job","ReleaseDate","PosterLink","Genres","Cast","Runtime"])
    for item in a:
        csv_file.writerow([item["studios"], item["id"], item["title"],item["crew"][0],item["crew"][1],item["crew"][2],item["poster"],item["genres"],item["releasedate"],item["cast"],item["runtime"]])

Error:

Traceback (most recent call last):
File "<stdin>", line 7, in <module>
IndexError: list index out of range

【Question comments】:

The problem is that you are using Python 2, where open() does not accept the encoding parameter (it is new in Python 3). Install Python 3 or see stackoverflow.com/questions/10971033/….
I also tried with Python 3.5.0. It gives the following error: Traceback (most recent call last): File "", line 7, in TypeError: list indices must be integers or slices, not str

Tags: python json csv normalization

【Solution 1】:

Your JSON is malformed: a comma is missing after the "releases" array. The code itself looks fine. Try this:

[{"studios": [{"studioId": 539}, {"studioId": 540}], 
"id": 843, 
"title": "In the Mood for Love",
"crew": [{"personId": 12453, "department": "Directing", "job": "Director"}, {"personId": 12454, "department": "Sound", "job": "Music"}, {"personId": 12455, "department": "Sound", "job": "Original Music Composer"}, {"personId": 1357, "department": "Camera", "job": "Director of Photography"}, {"personId": 12453, "department": "Writing", "job": "Screenplay"}, {"personId": 12453, "department": "Production", "job": "Producer"}, {"personId": 21909, "department": "Production", "job": "Executive Producer"}, {"personId": 45818, "department": "Editing", "job": "Editor"}, {"personId": 232804, "department": "Camera", "job": "Director of Photography"}, {"personId": 12667, "department": "Camera", "job": "Director of Photography"}],
"releases": [{"releasedate": "2000-11-08", "country": "FR"}, {"releasedate": "2000-05-22", "country": "US"}],
"genres": ["Drama", "Romance"], 
"releasedate": "2000-05-22", 
"cast": [{"personId": 1337, "character": "Chow Mo-Wan", "order": 0}, {"personId": 1338, "character": "Su Li-Zhen", "order": 1}, {"personId": 12674, "character": "Ah Ping", "order": 2}, {"personId": 12462, "character": "Mrs. Suen", "order": 4}, {"personId": 12463, "character": "Mr. Ho", "order": 5}, {"personId": 12464, "character": "", "order": 6}, {"personId": 12465, "character": "", "order": 7}, {"personId": 12466, "character": "Mr. Chan", "order": 8}, {"personId": 12467, "character": "The Amah", "order": 9}, {"personId": 12468, "character": "", "order": 10}, {"personId": 12469, "character": "", "order": 11}, {"personId": 12470, "character": "Mrs. Chow", "order": 12}], 
"userrating": 7.6}]

The sample you provided also has no poster or runtime key.

Edit: updated.
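Both problems above (the missing comma and the absent keys) can be detected before any CSV is written. The following is only a minimal sketch, assuming Python 3.5 or later, where `json.JSONDecodeError` is available. The helper names `describe_json_error` and `missing_keys` and the shortened `broken` string are made up for illustration and are not part of the original post.

```python
import json

# Shortened stand-in for the question's data: the comma after the
# "releases" array is missing, as in the original file.
broken = '[{"id": 843, "releases": [{"country": "FR"}]\n"genres": ["Drama"]}]'


def describe_json_error(text):
    """Return None if text parses as JSON, else a 'line, column: message' string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:  # available since Python 3.5
        return "line {}, column {}: {}".format(exc.lineno, exc.colno, exc.msg)
    return None


def missing_keys(record, wanted):
    """List the keys from wanted that the record does not contain."""
    return [key for key in wanted if key not in record]


print(describe_json_error(broken))
print(missing_keys({"id": 843, "title": "In the Mood for Love"},
                   ["id", "title", "poster", "runtime"]))
```

If a key such as poster is optional, `item.get("poster", "")` returns a default instead of raising KeyError the way `item["poster"]` does.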

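import csv
import json

# newline="" keeps the csv module from writing blank rows on Windows (Python 3)
with open("data.json", "r") as f, open("data3.csv", "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    movies = json.load(f)
    # Header columns match the row below; poster and runtime are not in the sample
    writer.writerow(["StudiosId", "Id", "Title", "personId", "Department", "Job", "Genres", "ReleaseDate", "Cast"])
    for item in movies:
        # Only the first crew member is written; studios, genres and cast stay as nested lists
        first = item["crew"][0]
        writer.writerow([item["studios"], item["id"], item["title"], first["personId"], first["department"], first["job"], item["genres"], item["releasedate"], item["cast"]])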
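The snippet above writes only the first crew member and leaves studios, genres and cast as lists inside a single cell, which is not 1NF. One possible way to get there, sketched under assumptions, is to split each repeating group into its own table keyed by the movie id and write one CSV per table. The table names, column names and the `normalize` and `write_tables` helpers below are invented for illustration; the sample is a trimmed copy of the question's record.

```python
import csv
import os

# Hypothetical table layout: one CSV per repeating group, keyed by movieId.
HEADERS = {
    "movies": ["movieId", "title", "releaseDate", "userRating"],
    "studios": ["movieId", "studioId"],
    "genres": ["movieId", "genre"],
    "releases": ["movieId", "country", "releaseDate"],
    "crew": ["movieId", "personId", "department", "job"],
    "cast": ["movieId", "personId", "character", "castOrder"],
}


def normalize(movies):
    """Split nested movie records into flat row lists, one list per table."""
    tables = {name: [] for name in HEADERS}
    for m in movies:
        mid = m["id"]
        tables["movies"].append(
            [mid, m["title"], m.get("releasedate", ""), m.get("userrating", "")])
        for s in m.get("studios", []):
            tables["studios"].append([mid, s["studioId"]])
        for g in m.get("genres", []):
            tables["genres"].append([mid, g])
        for r in m.get("releases", []):
            tables["releases"].append([mid, r["country"], r["releasedate"]])
        for c in m.get("crew", []):
            tables["crew"].append([mid, c["personId"], c["department"], c["job"]])
        for c in m.get("cast", []):
            tables["cast"].append([mid, c["personId"], c["character"], c["order"]])
    return tables


def write_tables(tables, directory="."):
    """Write each table to <directory>/<name>.csv with a header row."""
    for name, rows in tables.items():
        path = os.path.join(directory, name + ".csv")
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(HEADERS[name])
            writer.writerows(rows)


# Trimmed copy of the question's record, used only to exercise normalize().
sample = [{
    "studios": [{"studioId": 539}, {"studioId": 540}],
    "id": 843,
    "title": "In the Mood for Love",
    "crew": [
        {"personId": 12453, "department": "Directing", "job": "Director"},
        {"personId": 12454, "department": "Sound", "job": "Music"},
    ],
    "releases": [{"releasedate": "2000-11-08", "country": "FR"}],
    "genres": ["Drama", "Romance"],
    "releasedate": "2000-05-22",
    "cast": [{"personId": 1337, "character": "Chow Mo-Wan", "order": 0}],
    "userrating": 7.6,
}]

tables = normalize(sample)
print({name: len(rows) for name, rows in tables.items()})
```

With real data, `write_tables(normalize(json.load(f)))` would produce movies.csv, crew.csv, cast.csv and so on, each with atomic values per cell.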

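【Comments】:

It cannot separate "crew" and "cast", because they hold multiple values.
@Blank Updated. This works fine for me, but I cannot say more unless a more complete example is posted.
I want the person data in 3 different tables: first personId, second Job, and finally Department. Right now it only splits out the director, music, screenplay and so on.
@Blank Then you will need to rethink the approach you are using, and that is unrelated to the question asked. Why not go straight into a database? It would save you a lot of trouble.
I am using this approach because I feel that importing a csv file into any database would be much simpler.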
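To illustrate the "go straight into a database" suggestion from the comments, here is a rough sketch that loads the parsed JSON into SQLite with the standard-library sqlite3 module and skips the CSV step. The schema, the table and column names and the `load_movies` helper are assumptions made for this example, not something from the thread, and the sample is a trimmed copy of the question's record. Splitting department and job further into lookup tables, as the asker describes, would follow the same pattern.

```python
import sqlite3

# Hypothetical schema: one row per movie, per crew credit, per cast credit.
SCHEMA = """
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, releasedate TEXT, userrating REAL);
CREATE TABLE crew (movie_id INTEGER, person_id INTEGER, department TEXT, job TEXT);
CREATE TABLE cast_member (movie_id INTEGER, person_id INTEGER, role_name TEXT, cast_order INTEGER);
"""


def load_movies(conn, movies):
    """Create the tables and insert one row per movie, crew credit and cast credit."""
    conn.executescript(SCHEMA)
    for m in movies:
        conn.execute(
            "INSERT INTO movie VALUES (?, ?, ?, ?)",
            (m["id"], m["title"], m.get("releasedate"), m.get("userrating")))
        conn.executemany(
            "INSERT INTO crew VALUES (?, ?, ?, ?)",
            [(m["id"], c["personId"], c["department"], c["job"])
             for c in m.get("crew", [])])
        conn.executemany(
            "INSERT INTO cast_member VALUES (?, ?, ?, ?)",
            [(m["id"], c["personId"], c["character"], c["order"])
             for c in m.get("cast", [])])
    conn.commit()


# Trimmed copy of the question's record.
sample = [{
    "id": 843,
    "title": "In the Mood for Love",
    "releasedate": "2000-05-22",
    "userrating": 7.6,
    "crew": [
        {"personId": 12453, "department": "Directing", "job": "Director"},
        {"personId": 12454, "department": "Sound", "job": "Music"},
        {"personId": 12453, "department": "Writing", "job": "Screenplay"},
    ],
    "cast": [{"personId": 1337, "character": "Chow Mo-Wan", "order": 0}],
}]

conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
load_movies(conn, sample)
jobs = [row[0] for row in conn.execute(
    "SELECT job FROM crew WHERE person_id = ? ORDER BY job", (12453,))]
print(jobs)
```

Replacing ":memory:" with a file path would persist the tables; other databases would need their own driver and possibly different SQL types.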