Writing a nested dictionary to csv: answers

【Question title】: Writing Nested Dictionary to csv
【Posted】: 2018-10-27 02:49:03
【Question description】:

I have a dictionary:

dic = {"Location1":{"a":1,"b":2,"c":3},"Location2":{"a":4,"b":5,"c":6}}

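I want to turn this dictionary into a csv table in which the top-level keys form the leftmost column, the sub-keys form the header row, and each subsequent row is filled with the sub-key values: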

Location    a   b   c
Location1   1   2   3
Location2   4   5   6

I have done this successfully with the following script:

import csv

dic = {"Location1":{"a":1,"b":2,"c":3},"Location2":{"a":4,"b":5,"c":6}}
fields = ["Location","a","b","c"]

with open(r"C:\Users\tyler.cowan\Desktop\tabulated.csv", "w", newline='') as f:
    w = csv.DictWriter(f, extrasaction='ignore', fieldnames = fields)
    w.writeheader()
    for k in dic:
        w.writerow({field: dic[k].get(field) or k for field in fields})

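Oddly, when I adapted this test case to a real use case, my location keys ended up spread into other columns. My first thought was that I must have botched building the dictionary, but on inspection I get a dictionary in exactly the same format, just with more key-value pairs. Yet the output looks like this: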

Location    a   b   c   d           e   f   g   h
Location1   1   2   3   Location1   7   8   9   10
Location2   4   5   6   Location2   2   3   4   5

Here is my full script:

# -*- coding: utf-8 -*-

import os
import csv


def pretty(d, indent=0):
    # prettify dict for visual inspection
    for key, value in d.items():
        print('\t' * indent + str(key))
        if isinstance(value, dict):
            pretty(value, indent+1)
        else:
            if value == "":
                print("fubar")
            print('\t' * (indent+1) + str(value))



inFolder = "Folder"
dirList = os.listdir(inFolder)

#print(dirList)
fields = [ 'Lat-Long']
allData = {}
for file in dirList:
    fname, ext = os.path.splitext(file)
    if fname not in fields:
        fields.append(fname)

    #handle .dat in this block
    if ext.lower() == ".dat":
        #print("found dat ext: " + str(ext))
        with open(os.path.join(inFolder,file), "r") as f:
            for row in f:
                try:
                    row1 = row.split(" ")
                    if str(row1[0])+"-"+str(row1[1]) not in allData:
                        allData[str(row1[0])+"-"+str(row1[1])] = {}
                    else:
                        allData[str(row1[0])+"-"+str(row1[1])][fname] = row1[2]

                except IndexError:
                    row2 = row.split("\t")
                    if str(row2[0])+"-"+str(row2[1]) not in allData:
                        allData[str(row2[0])+"-"+str(row2[1])] = {}
                    else:
                        allData[str(row2[0])+"-"+str(row2[1])][fname] = "NA"

    elif ext.lower() == ".csv":
        with open(os.path.join(inFolder,file), "r") as f:
            for row in f:
                row1 = row.split(",")
                if str(row1[0])+"-"+str(row1[1]) not in allData:
                    allData[str(row1[0])+"-"+str(row1[1])] = {}
                else:
                    allData[str(row1[0])+"-"+str(row1[1])][fname] = row1[2]



pretty(allData)

with open("testBS.csv", "w", newline='') as f:
    w = csv.DictWriter(f, extrasaction='ignore', fieldnames = fields)
    w.writeheader()
    for k in allData:
        w.writerow({field: allData[k].get(field) or k for field in fields})

The input data looks like this:

"example.dat"

32.1    101.3   65
32.1    101.3   66
32.1    101.3   67
32.1    101.3   68
32.1    101.3   69
32.1    101.3   70
32.1    101.3   71

I would like to work out how to diagnose and fix this behavior, because I cannot see what differs between the test case and the real case.

【Question comments】:

I recommend pandas, if you have it.
I have pandas and will fall back to it, but I would like to understand the vanilla solution.

Tags: python csv dictionary

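【Solution 1】:

One possibility is to build a csv header that contains the location column plus the complete list of all sub-dictionary keys. That way, every sub-dictionary's values are written under their proper "key" column: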

import csv
dic = {"Location1":{"a":1,"b":2,"c":3},"Location2":{"a":4,"b":5,"c":6}, "Location3":{'e':7,'f':8, 'g':9, 'h':10}, "Location4":{'e': 2, 'f': 3, 'g': 4, 'h': 5}}
header = sorted(set(i for b in map(dict.keys, dic.values()) for i in b))
with open('filename.csv', 'w', newline="") as f:
  write = csv.writer(f)
  write.writerow(['location', *header])
  for a, b in dic.items():
     write.writerow([a]+[b.get(i, '') for i in header])

Output:

location,a,b,c,e,f,g,h
Location1,1,2,3,,,,
Location2,4,5,6,,,,
Location3,,,,7,8,9,10
Location4,,,,2,3,4,5
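Since the question was written around csv.DictWriter, the same union-of-keys idea can probably be kept in that style. The following is only a sketch under assumptions: the helper name write_table, the shortened sample data, and the use of io.StringIO in place of a real file are illustrative and not part of the original answer. It relies on DictWriter's restval parameter, which is the value written for any field missing from a row.

```python
import csv
import io

# Shortened, illustrative sample: the two rows have no sub-keys in common.
dic = {
    "Location1": {"a": 1, "b": 2, "c": 3},
    "Location3": {"e": 7, "f": 8},
}

# Union of all sub-keys, sorted for a stable column order.
sub_keys = sorted({key for sub in dic.values() for key in sub})
fields = ["Location", *sub_keys]


def write_table(data, fieldnames, f):
    """Write one row per top-level key; missing sub-keys become ''."""
    # restval is what DictWriter emits for fields absent from a row dict.
    w = csv.DictWriter(f, fieldnames=fieldnames, restval="")
    w.writeheader()
    for location, sub in data.items():
        # The key goes only into the first column, never into the others.
        w.writerow({fieldnames[0]: location, **sub})


buf = io.StringIO()  # stands in for open("filename.csv", "w", newline="")
write_table(dic, fields, buf)
print(buf.getvalue())
```

With this shape the location string cannot leak into a data column, because the row dict is built from the key once and from the sub-dictionary for everything else.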

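【Discussion】:

For clarity, you should also consider looping and calling writerow.
@cᴏʟᴅsᴘᴇᴇᴅ That is much cleaner. Please see my recent edit.
@cᴏʟᴅsᴘᴇᴇᴅ Thanks for the suggestion!
Yes, but I am not, @cᴏʟᴅsᴘᴇᴇᴅ
@TylerCowan The last line builds a list containing the location string followed by the values for each header column. dict.get looks up the value stored under the given key, but can return an optional default if the key does not exist. In this case, the line iterates over all header keys and returns the corresponding value if it is present in that sub-dictionary, or an empty string otherwise.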
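The dict.get behavior described in the last comment also seems to account for the symptom in the question. The script there uses `allData[k].get(field) or k`, so any field that is missing (or holds a falsy value such as 0 or an empty string) falls back to the row key, not just the first column. A small sketch, with made-up sample values and the hypothetical names leaky and fixed, that contrasts the two patterns:

```python
fields = ["Location", "a", "b", "c", "d"]
row_key = "Location1"
sub = {"a": 1, "b": 2, "c": 0}  # "d" is missing and "c" is a falsy 0

# Pattern from the question: every missing OR falsy value becomes the key.
leaky = {field: sub.get(field) or row_key for field in fields}

# Key only in the first column; explicit default for everything else.
fixed = {
    field: row_key if field == "Location" else sub.get(field, "")
    for field in fields
}

print(leaky)  # "c" and "d" both hold 'Location1'
print(fixed)  # "c" keeps 0 and "d" is an empty string
```

In the posted loader, the first row seen for each Lat-Long key only creates the empty sub-dictionary and does not store that row's value, so keys that appear once in a file end up without that file's field. Gaps of that kind would then be filled with the key by the `or k` fallback.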

【Solution 2】:

You can do this with pandas.

import pandas as pd
dic = {"Location1":{"a":1,"b":2,"c":3},"Location2":{"a":4,"b":5,"c":6}, "Location3":{'e':7,'f':8, 'g':9, 'h':10}, "Location4":{'e': 2, 'f': 3, 'g': 4, 'h': 5}}
pd.DataFrame.from_dict(dic, orient='index').to_csv('temp.csv')

Output:

,a,b,c,e,f,g,h
Location1,1.0,2.0,3.0,,,,
Location2,4.0,5.0,6.0,,,,
Location3,,,,7.0,8.0,9.0,10.0
Location4,,,,2.0,3.0,4.0,5.0

【Discussion】: