【Question title】: JSON to CSV in normalized form
【Posted】: 2016-01-25 01:28:11
【Question】:

JSON format:

[{"studios": [{"studioId": 539}, {"studioId": 540}], 
"id": 843, 
"title": "In the Mood for Love",
"crew": [{"personId": 12453, "department": "Directing", "job": "Director"}, {"personId": 12454, "department": "Sound", "job": "Music"}, {"personId": 12455, "department": "Sound", "job": "Original Music Composer"}, {"personId": 1357, "department": "Camera", "job": "Director of Photography"}, {"personId": 12453, "department": "Writing", "job": "Screenplay"}, {"personId": 12453, "department": "Production", "job": "Producer"}, {"personId": 21909, "department": "Production", "job": "Executive Producer"}, {"personId": 45818, "department": "Editing", "job": "Editor"}, {"personId": 232804, "department": "Camera", "job": "Director of Photography"}, {"personId": 12667, "department": "Camera", "job": "Director of Photography"}],
"releases": [{"releasedate": "2000-11-08", "country": "FR"}, {"releasedate": "2000-05-22", "country": "US"}]
"genres": ["Drama", "Romance"], 
"releasedate": "2000-05-22", 
"cast": [{"personId": 1337, "character": "Chow Mo-Wan", "order": 0}, {"personId": 1338, "character": "Su Li-Zhen", "order": 1}, {"personId": 12674, "character": "Ah Ping", "order": 2}, {"personId": 12462, "character": "Mrs. Suen", "order": 4}, {"personId": 12463, "character": "Mr. Ho", "order": 5}, {"personId": 12464, "character": "", "order": 6}, {"personId": 12465, "character": "", "order": 7}, {"personId": 12466, "character": "Mr. Chan", "order": 8}, {"personId": 12467, "character": "The Amah", "order": 9}, {"personId": 12468, "character": "", "order": 10}, {"personId": 12469, "character": "", "order": 11}, {"personId": 12470, "character": "Mrs. Chow", "order": 12}], 
"userrating": 7.6}]

I am trying to convert this to a .csv file, but I am getting an error. I want the CSV file normalized to 1NF so that I can load it directly into a database.

My code:

import json
import csv
with open("data3.json") as json_file, open("data3.csv", "w",encoding='utf-8') as csv_file:
    csv_file = csv.writer(csv_file)
    a = json.load(json_file)
    csv_file.writerow(["StudiosId", "Id", "Title","personId","Department","Job","ReleaseDate","PosterLink","Genres","Cast","Runtime"])
    for item in a:
        csv_file.writerow([item["studios"], item["id"], item["title"],item["crew"][0],item["crew"][1],item["crew"][2],item["poster"],item["genres"],item["releasedate"],item["cast"],item["runtime"]])

Error:

Traceback (most recent call last):
File "<stdin>", line 7, in <module>
IndexError: list index out of range

【Comments】:

  • The problem is that you are using Python 2, which does not support the encoding parameter (it is new in Python 3). Install Python 3 or see stackoverflow.com/questions/10971033/…
  • I also tried Python 3.5.0. It gives the following error: Traceback (most recent call last): File "<stdin>", line 7, in <module> TypeError: list indices must be integers or slices, not str

Tags: python json csv normalization


【Solution 1】:

Your JSON is malformed: a comma is missing after the "releases" array. The code otherwise looks fine. Try this:

[{"studios": [{"studioId": 539}, {"studioId": 540}], 
"id": 843, 
"title": "In the Mood for Love",
"crew": [{"personId": 12453, "department": "Directing", "job": "Director"}, {"personId": 12454, "department": "Sound", "job": "Music"}, {"personId": 12455, "department": "Sound", "job": "Original Music Composer"}, {"personId": 1357, "department": "Camera", "job": "Director of Photography"}, {"personId": 12453, "department": "Writing", "job": "Screenplay"}, {"personId": 12453, "department": "Production", "job": "Producer"}, {"personId": 21909, "department": "Production", "job": "Executive Producer"}, {"personId": 45818, "department": "Editing", "job": "Editor"}, {"personId": 232804, "department": "Camera", "job": "Director of Photography"}, {"personId": 12667, "department": "Camera", "job": "Director of Photography"}],
"releases": [{"releasedate": "2000-11-08", "country": "FR"}, {"releasedate": "2000-05-22", "country": "US"}],
"genres": ["Drama", "Romance"], 
"releasedate": "2000-05-22", 
"cast": [{"personId": 1337, "character": "Chow Mo-Wan", "order": 0}, {"personId": 1338, "character": "Su Li-Zhen", "order": 1}, {"personId": 12674, "character": "Ah Ping", "order": 2}, {"personId": 12462, "character": "Mrs. Suen", "order": 4}, {"personId": 12463, "character": "Mr. Ho", "order": 5}, {"personId": 12464, "character": "", "order": 6}, {"personId": 12465, "character": "", "order": 7}, {"personId": 12466, "character": "Mr. Chan", "order": 8}, {"personId": 12467, "character": "The Amah", "order": 9}, {"personId": 12468, "character": "", "order": 10}, {"personId": 12469, "character": "", "order": 11}, {"personId": 12470, "character": "Mrs. Chow", "order": 12}], 
"userrating": 7.6}]

The sample you provided also has no "poster" or "runtime" keys.
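Both problems can be checked before any CSV is written. The following is a minimal sketch, assuming Python 3.5 or later (where `json.JSONDecodeError` is available). The `broken` string is a shortened stand-in for the question's data, and `describe_json_error` is a hypothetical helper name, not part of the original code.

```python
import json

# Shortened stand-in for the question's data, with the same missing comma
# after the "releases" array.
broken = '''[{"id": 843,
"releases": [{"releasedate": "2000-05-22", "country": "US"}]
"genres": ["Drama", "Romance"]}]'''


def describe_json_error(text):
    """Return None if text parses as JSON, else a short description of the error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno/colno point at the place where the parser gave up.
        return "line {}, column {}: {}".format(exc.lineno, exc.colno, exc.msg)
    return None


print(describe_json_error(broken))

# dict.get returns a default instead of raising KeyError for absent keys
# such as "poster" and "runtime".
movie = {"id": 843, "title": "In the Mood for Love"}
row = [movie["id"], movie["title"], movie.get("poster", ""), movie.get("runtime", "")]
print(row)
```

Using `.get` with an empty-string default keeps the column count stable when a record lacks a key. Whether an empty cell is acceptable depends on the target database.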

Edit: updated.

import csv
import json

with open('data.json', 'r') as f, open("data3.csv", "w") as csv_file:
    writer = csv.writer(csv_file)
    movies = json.load(f)
    # The header lists exactly the nine values written per row, in the same order.
    # "poster" and "runtime" are left out because the sample data has no such keys.
    writer.writerow(["StudiosId", "Id", "Title", "personId", "Department", "Job", "Genres", "ReleaseDate", "Cast"])
    for item in movies:
        first_crew = item["crew"][0]  # only the first crew member is written
        writer.writerow([item["studios"], item["id"], item["title"],
                         first_crew["personId"], first_crew["department"], first_crew["job"],
                         item["genres"], item["releasedate"], item["cast"]])

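【Comments】:

  • This cannot tell "crew" and "cast" apart, because they have multiple values.
  • @Blank Updated. This works fine for me; if it does not work for you, please post a more complete example.
  • I want the person data in 3 different tables: first personId, then Job, and finally Department. Right now it only separates out the director, music, screenplay, and so on.
  • @Blank Then you will need to rethink the approach you are using; that is unrelated to the question that was asked. Why not go straight to the database? It would save you a lot of trouble.
  • I am using this approach because I feel that importing a CSV file into any database would be much simpler.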
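For the 1NF goal raised in the question and comments, one possible approach is to write each nested list to its own CSV file, with one row per element, keyed by the movie id. The sketch below is an illustration only, assuming Python 3. The table names, column names, and the `normalize` and `write_tables` helpers are hypothetical, and `sample` is a trimmed copy of the question's record.

```python
import csv
import os
import tempfile

# Trimmed copy of the record from the question.
sample = [{
    "studios": [{"studioId": 539}, {"studioId": 540}],
    "id": 843,
    "title": "In the Mood for Love",
    "crew": [{"personId": 12453, "department": "Directing", "job": "Director"},
             {"personId": 12454, "department": "Sound", "job": "Music"}],
    "releases": [{"releasedate": "2000-11-08", "country": "FR"},
                 {"releasedate": "2000-05-22", "country": "US"}],
    "genres": ["Drama", "Romance"],
    "releasedate": "2000-05-22",
    "cast": [{"personId": 1337, "character": "Chow Mo-Wan", "order": 0}],
    "userrating": 7.6,
}]


def normalize(movies):
    """Split nested movie records into flat tables; each table is a list of rows."""
    tables = {
        "movies": [["id", "title", "releasedate", "userrating"]],
        "movie_studios": [["movie_id", "studio_id"]],
        "movie_genres": [["movie_id", "genre"]],
        "movie_crew": [["movie_id", "person_id", "department", "job"]],
        "movie_cast": [["movie_id", "person_id", "character", "order"]],
        "movie_releases": [["movie_id", "country", "releasedate"]],
    }
    for m in movies:
        mid = m["id"]
        tables["movies"].append(
            [mid, m["title"], m.get("releasedate", ""), m.get("userrating", "")])
        # Every repeating group becomes one row per element, so each cell is atomic.
        for s in m.get("studios", []):
            tables["movie_studios"].append([mid, s["studioId"]])
        for g in m.get("genres", []):
            tables["movie_genres"].append([mid, g])
        for c in m.get("crew", []):
            tables["movie_crew"].append([mid, c["personId"], c["department"], c["job"]])
        for c in m.get("cast", []):
            tables["movie_cast"].append([mid, c["personId"], c["character"], c["order"]])
        for r in m.get("releases", []):
            tables["movie_releases"].append([mid, r["country"], r["releasedate"]])
    return tables


def write_tables(tables, out_dir):
    """Write each table to <out_dir>/<table name>.csv."""
    for name, rows in tables.items():
        path = os.path.join(out_dir, name + ".csv")
        with open(path, "w", encoding="utf-8", newline="") as fh:
            csv.writer(fh).writerows(rows)


tables = normalize(sample)
out_dir = tempfile.mkdtemp()  # throwaway directory, used only for this demo
write_tables(tables, out_dir)
print(sorted(os.listdir(out_dir)))
```

Each resulting file maps onto one database table, and the `movie_id` column can serve as a foreign key back to `movies.csv`. This keeps crew and cast separate instead of packing lists into a single cell.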