【Question title】: Create a nested dictionary from two CSV files
【Posted】: 2017-07-25 01:57:17
【Question】:

I have two CSV files.
file1.csv:

ID,map1,map2  
a,x1,x2  
b,y1,  
c,z1,z2  

file2.csv:

ID,map1Val1,map1Val2,map2Val1
a,a1,a2,l1
b,b1,b2,
c,c1,c2,n1

I want the output to look like this:

{'ID': {'map1':['map1Val1','map1Val2'], 'map2':'map2Val1'},'a': {'x1':['a1','a2'], 'x2':'l1'},'b': {'y1':['b1','b2']},'c': {'z1':['c1','c2'], 'z2':'n1'},}  

I can't think of a way to build this. So far I only have code that creates a dictionary from a single CSV file:

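import csv

new_data_dict = {}
with open("file1.csv", newline="") as map_file:
    # csv.reader yields each row as a list, so row[0], row[1], ... work
    # (a DictReader row would have to be indexed by column name instead)
    mapping = csv.reader(map_file, delimiter=",")
    for row in mapping:
        # add one entry per row instead of replacing the whole dict each time;
        # the inner value is a set, with empty cells left out
        new_data_dict[row[0]] = {v.strip() for v in row[1:] if v.strip()}
print(new_data_dict)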

Output:

{'ID': {'map1', 'map2'}, 'a': {'x1', 'x2'}, 'b': {'y1'}, 'c': {'z1', 'z2'}}

【Discussion】:

    Tags: python csv dictionary nested key-value


    【Answer 1】:

    Here is a more dynamic solution that lets you configure up front which columns from file1 map to which columns from file2:

    import csv
    
    # which file2 columns feed each file1 column
    column_map = {'map1': ['map1Val1', 'map1Val2'],
                  'map2': ['map2Val1']}
    
    joined_data = dict()
    joined_data['ID'] = column_map
    
    with open("file1.csv", newline="") as f1, open("file2.csv", newline="") as f2:
        key_list = list(csv.DictReader(f1))
        value_list = list(csv.DictReader(f2))
    
    for kl, vl in zip(key_list, value_list):
        inner = {}
        for key, value_cols in column_map.items():
            if kl[key]:  # skip empty cells, e.g. map2 in row b
                inner[kl[key]] = [vl[el] for el in value_cols]
    
        joined_data[kl['ID']] = inner
    

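    csv.DictReader maps each row to a dict whose keys are (by default) taken from the first line of the file. The two DictReader objects are turned into lists and walked in parallel with zip, so the n-th row of file1 is paired with the n-th row of file2. Using column_map as a guide, we then build a new inner dict whose keys come from the file1 row (e.g. x1, x2) and whose values are lists of the matching cells from the file2 row (e.g. ['a1', 'a2'] and ['l1']).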
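
To see what `csv.DictReader` hands back, here is a minimal sketch that reads the question's sample data from in-memory strings. Using `io.StringIO` in place of the real `file1.csv`/`file2.csv` is an assumption made only so the snippet runs on its own; the `FILE1`/`FILE2` names are illustrative.

```python
import csv
import io

# In-memory stand-ins for file1.csv and file2.csv (sample data from the question).
FILE1 = "ID,map1,map2\na,x1,x2\nb,y1,\nc,z1,z2\n"
FILE2 = "ID,map1Val1,map1Val2,map2Val1\na,a1,a2,l1\nb,b1,b2,\nc,c1,c2,n1\n"

# DictReader uses the first line as the keys for every following row.
key_rows = list(csv.DictReader(io.StringIO(FILE1)))
value_rows = list(csv.DictReader(io.StringIO(FILE2)))

print(dict(key_rows[0]))  # {'ID': 'a', 'map1': 'x1', 'map2': 'x2'}
print(dict(key_rows[1]))  # {'ID': 'b', 'map1': 'y1', 'map2': ''}

# zip pairs the first data row of file1 with the first data row of file2, and so on.
for kl, vl in zip(key_rows, value_rows):
    print(kl['ID'], vl['ID'], kl['map1'], vl['map1Val1'])
```

Note that the empty `map2` cell in row b comes back as an empty string, which is why the answer's `if kl[key]:` check is enough to skip it.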

    Edit

    For a fully dynamic solution, you can build column_map on the fly by comparing the column headers of file1 with those of file2:

    import csv
    from collections import defaultdict
    
    joined_data = dict()
    column_map = defaultdict(list)
    
    with open("file1.csv", newline="") as f1, open("file2.csv", newline="") as f2:
        key_headers = next(f1).strip().split(',')
        value_headers = next(f2).strip().split(',')
    
        # attach each file2 column to the file1 column whose name it starts with
        for k in key_headers[1:]:
            for v in value_headers[1:]:
                if v.startswith(k):
                    column_map[k].append(v)
        joined_data['ID'] = dict(column_map)
    
        # the header lines were already consumed, so pass them in as fieldnames
        key_list = list(csv.DictReader(f1, fieldnames=key_headers))
        value_list = list(csv.DictReader(f2, fieldnames=value_headers))
    
    for kl, vl in zip(key_list, value_list):
        inner = {}
        for key, value_cols in column_map.items():
            if kl[key]:
                inner[kl[key]] = [vl[el] for el in value_cols]
    
        joined_data[kl['ID']] = inner
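
One caveat with matching headers by `v.startswith(k)`: if file1 ever had both `map1` and `map10`, then `map10Val1` would also be attached to `map1`. Below is a sketch of a stricter matcher, assuming (hypothetically) that every file2 header follows the `<file1 column>Val<number>` pattern seen in the sample; `build_column_map` is an illustrative name, not part of either answer.

```python
import re
from collections import defaultdict


def build_column_map(key_headers, value_headers):
    """Map each file1 column to the file2 columns named '<column>Val<digits>'.

    Assumes the 'map1Val1' naming pattern from the sample data; unlike a bare
    startswith() check, 'map10Val1' is not attributed to 'map1'.
    """
    column_map = defaultdict(list)
    for k in key_headers[1:]:  # skip the ID column
        pattern = re.compile(re.escape(k) + r"Val\d+")
        for v in value_headers[1:]:
            if pattern.fullmatch(v):
                column_map[k].append(v)
    return dict(column_map)


print(build_column_map(["ID", "map1", "map2"],
                       ["ID", "map1Val1", "map1Val2", "map2Val1"]))
# {'map1': ['map1Val1', 'map1Val2'], 'map2': ['map2Val1']}

print(build_column_map(["ID", "map1", "map10"],
                       ["ID", "map1Val1", "map10Val1"]))
# {'map1': ['map1Val1'], 'map10': ['map10Val1']}
```

The trade-off is that the naming convention is now hard-coded in the regex, so headers that do not follow it are silently left unmapped.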
    

    【Discussion】:

      【Answer 2】:

      You can use zip to pair up the rows of the two CSV files:

      >>> list(zip([1,2,3], [4,5,6]))   # assume 1, 2, 3 /  4, 5, 6 as row values
      [(1, 4), (2, 5), (3, 6)]
      

      import csv
      
      new_data_dict = {}
      with open('file1.csv', newline='') as f1, open('file2.csv', newline='') as f2:
          reader1, reader2 = csv.reader(f1), csv.reader(f2)
          for row1, row2 in zip(reader1, reader2):
              id_, map1, map2 = row1
              new_data_dict[id_] = {map1: row2[1:3]}  # map1 -> [map1Val1, map1Val2]
              map2 = map2.strip()
              if map2:  # put map2 only if map2 key exists
                  new_data_dict[id_][map2] = row2[3]
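
Both answers pair rows by position with `zip`, which assumes the two files list the same IDs in the same order; `zip` also stops silently at the shorter file. If that assumption might not hold, one option is to index file2 by its ID column and look rows up by key. A minimal sketch, with in-memory data (file2 shuffled and missing row b, purely for illustration) standing in for the real files:

```python
import csv
import io

# In-memory stand-ins for the two files; file2's rows are deliberately reordered
# and row "b" is missing to show what a positional zip would get wrong.
FILE1 = "ID,map1,map2\na,x1,x2\nb,y1,\nc,z1,z2\n"
FILE2 = "ID,map1Val1,map1Val2,map2Val1\nc,c1,c2,n1\na,a1,a2,l1\n"

rows1 = list(csv.reader(io.StringIO(FILE1)))
# Index file2 by its first column so lookups no longer depend on row order.
rows2 = {row[0]: row for row in csv.reader(io.StringIO(FILE2))}

new_data_dict = {}
for id_, map1, map2 in rows1:
    row2 = rows2.get(id_)
    if row2 is None:  # no matching ID in file2: skip instead of mis-pairing
        continue
    new_data_dict[id_] = {map1: row2[1:3]}
    map2 = map2.strip()
    if map2:
        new_data_dict[id_][map2] = row2[3]

print(new_data_dict)
```

Whether a missing ID should be skipped, reported, or treated as an error depends on your data; skipping is just the simplest choice for the sketch.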
      

      new_data_dict becomes:

      {'ID': {'map1': ['map1Val1', 'map1Val2'], 'map2': 'map2Val1'},
       'a': {'x1': ['a1', 'a2'], 'x2': 'l1'},
       'b': {'y1': ['b1', 'b2']},
       'c': {'z1': ['c1', 'c2'], 'z2': 'n1'}}
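
The hard-coded version above yields the exact shape asked for, with a plain string for `map2`, whereas the dynamic approach in Answer 1 always stores a list (e.g. `'x2': ['l1']`). If you want the dynamic mapping but the question's shape, one option is to unwrap one-element lists afterwards. A minimal sketch; `collapse_singletons` is a hypothetical helper name, and the `joined_data` literal is what Answer 1 produces for the sample files.

```python
def collapse_singletons(joined_data):
    """Return a copy where every one-element value list is replaced by its element."""
    return {
        outer_key: {
            k: v[0] if isinstance(v, list) and len(v) == 1 else v
            for k, v in inner.items()
        }
        for outer_key, inner in joined_data.items()
    }


# Shape produced by the dynamic approach: single values are still wrapped in lists.
joined_data = {
    'ID': {'map1': ['map1Val1', 'map1Val2'], 'map2': ['map2Val1']},
    'a': {'x1': ['a1', 'a2'], 'x2': ['l1']},
    'b': {'y1': ['b1', 'b2']},
    'c': {'z1': ['c1', 'c2'], 'z2': ['n1']},
}
print(collapse_singletons(joined_data))
```

Keeping lists everywhere is arguably easier to consume downstream, since callers never have to check whether a value is a string or a list; collapsing only matters if you need the exact format from the question.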
      

      【Discussion】:

      • Thanks for your answer, but I prefer @waterboy5281's answer because it is more dynamic.