【Question title】:Nesting dictionaries in python with a csv and I need to decrement or increment a date
【Posted】:2018-08-17 12:02:01
【Question】:

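I'm excited because I got some tips from a friend. I am trying to create a dictionary of dictionaries using a loop. The csv I am working with has dates from 2008 to 2014, and I am using the date as the key. The csv looks like this:
year,title_field,value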

2014,Total Housing Units,49109
2014,Vacant Housing Units,2814
2014,Occupied Housing Units,46295
2013,Total Housing Units,47888
2013,Vacant Housing Units,4215
2013,Occupied Housing Units,43673
2012,Total Housing Units,45121
2012,Vacant Housing Units,3013
2012,Occupied Housing Units,42108
2011,Total Housing Units,44917
2011,Vacant Housing Units,4213
2011,Occupied Housing Units,40704
2010,Total Housing Units,44642
2010,Vacant Housing Units,3635
2010,Occupied Housing Units,41007
2009,Total Housing Units,39499
2009,Vacant Housing Units,3583
2009,Occupied Housing Units,35916
2008,Total Housing Units,41194
2008,Vacant Housing Units,4483
2008,Occupied Housing Units,36711

Here is my code:

import csv

denton_housing = {}
filename = 'denton_housing.csv'
key = 2014

with open(filename, 'r', encoding='utf8', newline='') as f:
    for row in csv.DictReader(f, delimiter=','):
        while key not in denton_housing:
            denton_housing[key] = {}
            denton_housing[key][row['title_field']] = int(row['value'])
            key-1

When I print it out, I get:

{2014: {'Total Housing Units': 49109}}

Which is great! But I need this:

{2014: {'Total Housing Units': 49109}, {'Vacant Housing Units': 2814}, \    {'Occupied Housing Units': 46295}}

More importantly, I need it to loop through 2013, 2012, 2011, 2010, 2009 and 2008 and do the same thing, but it stops there.

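【Comments】:

  • The data structure you show as the desired result is not valid Python. What you probably want is a dictionary whose values are lists of dictionaries. That would look like {2014: [{'Total Housing Units': 49109}, {'Vacant Housing Units': 2814}], 2015: [...]}. Note the extra square brackets. Small change, big difference!
  • Do you want a dict that maps 2014 to a tuple of three single-key dicts, as you've shown, or a dict that maps 2014 to a dict with three keys, which would probably be more useful?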

Tags: python loops csv dictionary


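【Solution 1】:

You can use itertools.groupby to create a dictionary in which each key is a year and each value is a dictionary holding Total Housing Units, Vacant Housing Units, and Occupied Housing Units: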

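import csv
import itertools

with open('filename.csv', newline='') as f:
    rows = list(csv.reader(f))[1:]  # skip the header row
    # groupby only merges adjacent rows, so sort by year first
    data = [[int(a), b, int(c)] for a, b, c in sorted(rows, key=lambda x: int(x[0]))]

# each group of rows becomes {title_field: value} for that year
final_data = {a: dict(i[1:] for i in b) for a, b in itertools.groupby(data, key=lambda x: x[0])}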

Output:

{2008: {'Total Housing Units': 41194, 'Vacant Housing Units': 4483, 'Occupied Housing Units': 36711}, 2009: {'Total Housing Units': 39499, 'Vacant Housing Units': 3583, 'Occupied Housing Units': 35916}, 2010: {'Total Housing Units': 44642, 'Vacant Housing Units': 3635, 'Occupied Housing Units': 41007}, 2011: {'Total Housing Units': 44917, 'Vacant Housing Units': 4213, 'Occupied Housing Units': 40704}, 2012: {'Total Housing Units': 45121, 'Vacant Housing Units': 3013, 'Occupied Housing Units': 42108}, 2013: {'Total Housing Units': 47888, 'Vacant Housing Units': 4215, 'Occupied Housing Units': 43673}, 2014: {'Total Housing Units': 49109, 'Vacant Housing Units': 2814, 'Occupied Housing Units': 46295}}
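One detail worth spelling out: itertools.groupby only merges adjacent items that share a key, which is why the rows are sorted by year before grouping. The following is a minimal sketch, not part of the original answer. The inline CSV_TEXT sample (a shuffled subset of the question's data) and the group_by_year helper are assumptions made for illustration.

```python
import csv
import io
import itertools

# Hypothetical inline sample: a subset of the question's rows,
# deliberately interleaved so that equal years are not adjacent.
CSV_TEXT = """year,title_field,value
2014,Total Housing Units,49109
2013,Total Housing Units,47888
2014,Vacant Housing Units,2814
2013,Vacant Housing Units,4215
"""


def group_by_year(text, presort):
    """Return a list of (year, {title_field: value}) groups (illustrative helper)."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    rows = [[int(year), title, int(value)] for year, title, value in reader]
    if presort:
        rows.sort(key=lambda r: r[0])  # stable sort, keeps row order within a year
    return [
        (year, dict(r[1:] for r in group))
        for year, group in itertools.groupby(rows, key=lambda r: r[0])
    ]


unsorted_groups = group_by_year(CSV_TEXT, presort=False)
sorted_groups = group_by_year(CSV_TEXT, presort=True)

print(len(unsorted_groups))  # 4 groups: every year change starts a new group
print(len(sorted_groups))    # 2 groups: one per year
```

Without the sort, building a dict from the groups would silently keep only the last fragment for each year, so the sort is doing real work here.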

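【Comments】:

  • When I bring up final_data, I seem to get this: ValueError: invalid literal for int() with base 10: 'year'
  • @ArchivistG There may be an additional column in your csv file. I tested this exactly against the data you posted in the question. Can you clarify how many columns your csv file has?
  • Only 3: 'year', 'title_field', and 'value'
  • @ArchivistG Ah, I just realized that the file contains a header row. Please see my recent edit.
  • That was my fault. I didn't include the header. Ugh. Sorry.
【Solution 2】: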
In[2]: import csv
  ...: from collections import defaultdict
  ...: 
  ...: denton_housing = defaultdict(dict)
  ...: filename = 'denton_housing.csv'
  ...: key = 2014
  ...: 
  ...: with open(filename, 'r', encoding='utf8', newline='') as f:
  ...:     for row in csv.DictReader(f):
  ...:         denton_housing[row['year']].update({
  ...:             row['title_field']: int(row['value'])
  ...:         })
  ...: 
In[3]: import json
In[4]: print(json.dumps(denton_housing, indent=4))
{
    "2014": {
        "Total Housing Units": 49109,
        "Vacant Housing Units": 2814,
        "Occupied Housing Units": 46295
    },
    "2013": {
        "Total Housing Units": 47888,
        "Vacant Housing Units": 4215,
        "Occupied Housing Units": 43673
    },
    "2012": {
        "Total Housing Units": 45121,
        "Vacant Housing Units": 3013,
        "Occupied Housing Units": 42108
    },
    "2011": {
        "Total Housing Units": 44917,
        "Vacant Housing Units": 4213,
        "Occupied Housing Units": 40704
    },
    "2010": {
        "Total Housing Units": 44642,
        "Vacant Housing Units": 3635,
        "Occupied Housing Units": 41007
    },
    "2009": {
        "Total Housing Units": 39499,
        "Vacant Housing Units": 3583,
        "Occupied Housing Units": 35916
    },
    "2008": {
        "Total Housing Units": 41194,
        "Vacant Housing Units": 4483,
        "Occupied Housing Units": 36711
    }
}
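Note that the years in this output are strings ("2014"), because csv.DictReader yields every field as text. If integer years are wanted, as in the question, a variant along these lines may work. This is a sketch rather than the answerer's code: the inline CSV_TEXT sample and the load_housing name are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical inline sample standing in for denton_housing.csv.
CSV_TEXT = """year,title_field,value
2014,Total Housing Units,49109
2014,Vacant Housing Units,2814
2013,Total Housing Units,47888
"""


def load_housing(f):
    """Build {year: {title_field: value}} with integer years (illustrative helper)."""
    housing = defaultdict(dict)  # missing years start out as an empty dict
    for row in csv.DictReader(f):
        housing[int(row['year'])][row['title_field']] = int(row['value'])
    return dict(housing)  # plain dict, so later lookups of missing years raise KeyError


denton_housing = load_housing(io.StringIO(CSV_TEXT))
print(denton_housing[2014])
```

With a real file, the same function could be called as load_housing(f) inside the with open(...) block shown above.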

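【Comments】:

    【Solution 3】:

    The trick here is that you don't want a dict in each value; you want a list of dicts. (Actually, a tuple, if you want the exact output you specified, but I think you'll be happy with a list.)

    So, instead of this: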

    denton_housing[key] = {}
    

    ... do this:

    denton_housing[key] = []
    

    Now, instead of this:

    denton_housing[key][row['title_field']] = int(row['value'])
    

    ... do this:

    new_dict = {}
    new_dict[row['title_field']] = int(row['value'])
    denton_housing[key].append(new_dict)
    

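    Also, always do that second part, not only when the key not in check is true. (Also, you probably want if, not while.)

    You can simplify things from here, and build a better data structure (you have three single-key dicts; wouldn't one dict with multiple keys be better?), and so on. But hopefully this is easy enough to follow that you can get unstuck and go wild from there.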
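Putting the steps above together, the question's loop might end up looking like the sketch below. It is an illustration under stated assumptions, not the answerer's exact code: the data comes from a hypothetical inline CSV_TEXT string instead of denton_housing.csv, and the year is read from each row's year column rather than from a hand-decremented counter. (In the question, key-1 computes a value and discards it; key -= 1 would be needed to actually change key.)

```python
import csv
import io

# Hypothetical inline sample standing in for denton_housing.csv.
CSV_TEXT = """year,title_field,value
2014,Total Housing Units,49109
2014,Vacant Housing Units,2814
2014,Occupied Housing Units,46295
2013,Total Housing Units,47888
"""

denton_housing = {}
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    key = int(row['year'])          # take the year from the row itself
    if key not in denton_housing:   # "if", not "while"
        denton_housing[key] = []    # a list of dicts, not a dict
    # Always append, not only when the key was just created.
    new_dict = {row['title_field']: int(row['value'])}
    denton_housing[key].append(new_dict)

# The "better data structure" hinted at above: one multi-key dict per year.
merged = {
    year: {k: v for d in dicts for k, v in d.items()}
    for year, dicts in denton_housing.items()
}

print(denton_housing[2014])
print(merged[2014])
```

The list-of-dicts form matches what the question asked for, while the merged form allows direct lookups such as merged[2014]['Vacant Housing Units'].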

    【Comments】:
