【Question title】: Creating a dictionary of dictionaries (or a tuple?) from CSV in Python
【Posted】: 2015-02-17 06:56:03
【Question】:

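I want to read a CSV file and build a Python object that stores a large dataset. The data is in a CSV file (with a header row), and the first two entries of each row are X, Y coordinates. Later in the program I will sort and manipulate the data for each X, Y coordinate.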

Here is some sample data:

x, y, field1, field2, field3
1, 2, 10, 20, 30
1, 2, 20, 30, 40
7, 4, 2, 49, 39

I think the object I want to build from this looks like the following:

Location, Values

(1,2) => {field1=10,field2=20,field3=30},{field1=20,field2=30,field3=40}
(7,4) => {field1=2,field2=49,field3=39}

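Is this a dictionary of dictionaries with tuple keys? I have searched online for an example of this but could not find one. Does it make sense to handle the data this way?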

So far I have been trying to get the data into a single dictionary, but I am running into trouble. The code below only prints the headers:

import csv
import sys

dict={}

with open('data.csv') as file:
    data = csv.reader(file)
    headers = next(data)[0:]
    length = len(headers)
    for row in data:
        for i in range(length):
            dict[headers[i]]=row[i]

for x in dict:
    print(x)

【Comments】:

  • What you have would be a dictionary with lists of dictionaries as values. For example: {(1,2):[{'field1':10,'field2':20},{'field1':20,'field2':30}], (7,4):[{'field1':2,'field2':49}]}
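
To make the shape described in the comment above concrete, here is a minimal sketch. It assumes the sample rows have already been parsed into integers; the names `rows` and `grouped` are hypothetical and not part of the original thread.

```python
# Hypothetical in-memory copy of the sample rows from the question.
rows = [
    (1, 2, 10, 20, 30),
    (1, 2, 20, 30, 40),
    (7, 4, 2, 49, 39),
]

grouped = {}
for x, y, f1, f2, f3 in rows:
    # setdefault creates the empty list the first time a coordinate is seen,
    # so every (x, y) key maps to a list of per-row dictionaries.
    grouped.setdefault((x, y), []).append(
        {"field1": f1, "field2": f2, "field3": f3}
    )

print(grouped[(1, 2)])       # two dictionaries, one per matching row
print(len(grouped[(7, 4)]))  # a single dictionary for this coordinate
```

Tuples work as dictionary keys because they are hashable, which is why `(x, y)` can be used directly.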

Tags: python csv dictionary tuples


【Solution 1】:
import csv

# let's create a class to hold the data in each line
class Capsule:
    def __init__(self, x, y, f1, f2, f3):
        self.x = x
        self.y = y
        self.field1 = f1
        self.field2 = f2
        self.field3 = f3

# let's read the file
with open('/path/to/file') as infile:
    infile.readline()
    capsules = []
    for x, y, f1, f2, f3 in csv.reader(infile):
        capsules.append(Capsule(x,y,f1,f2,f3))


# done reading all data
# let's sort the list by x,y coordinates
capsules.sort(key=lambda c : (c.x, c.y))

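Using a list like this is handy for sorting and similar operations. However, if you are interested in looking up the objects at a particular set of coordinates, you are better off using a dictionary: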

with open('/path/to/file') as infile:
    infile.readline()
    capsules = {}
    for x, y, f1, f2, f3 in csv.reader(infile):
        if (x,y) not in capsules:
            capsules[(x,y)] = []
        capsules[(x,y)].append(Capsule(x,y,f1,f2,f3))

# sort by x,y coordinates:
sortedCapsules = [capsules[k] for k in sorted(capsules)]
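
One caveat worth illustrating: `csv.reader` yields strings, so the keys above sort lexicographically (for example `'10'` before `'7'`). The sketch below is one possible way to get numeric ordering, assuming every field is an integer. It uses `io.StringIO` as a stand-in for the real file, and the `(10, 1)` row is a hypothetical addition to the question's sample data.

```python
import csv
import io

# Stand-in for the real file: the question's sample rows plus one
# hypothetical row (10, 1) to show the difference in sort order.
sample = io.StringIO(
    "x, y, field1, field2, field3\n"
    "1, 2, 10, 20, 30\n"
    "1, 2, 20, 30, 40\n"
    "7, 4, 2, 49, 39\n"
    "10, 1, 5, 6, 7\n"
)

# skipinitialspace drops the blank that follows each comma in the sample.
reader = csv.reader(sample, skipinitialspace=True)
next(reader)  # skip the header row

by_coord = {}
for row in reader:
    # Convert every field to int so coordinates compare numerically.
    x, y, f1, f2, f3 = (int(value) for value in row)
    by_coord.setdefault((x, y), []).append((f1, f2, f3))

# Keys are tuples of ints, so (10, 1) sorts after (7, 4).
sorted_keys = sorted(by_coord)
print(sorted_keys)
```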

【Comments】:

    【Solution 2】:

    Assuming the structure of your CSV is known and fixed:

    import csv
    import sys
    from collections import defaultdict
    
    HEADERS = ["x", "y", "field1", "field2", "field3"]
    
    def read_data(source):
        data = defaultdict(list)
        reader = csv.DictReader(source, fieldnames=HEADERS)
        next(reader) # skip headers
        for row in reader:
            # this will at once build the key tuple
            # and remove the "x" and "y" keys from the 
            # row dict
            key = row.pop("x"), row.pop("y")
            data[key].append(row)
        return data
    
    with open('data.csv') as source:
        data = read_data(source)
    
    print(data)
    

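    As a side note: don't use dict or file as variable names, especially at the top level, because they shadow the builtin dict and file types (file is a builtin in Python 2).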
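
A small sketch of what shadowing looks like in practice. The function name `shadowing_demo` is hypothetical, and the rebinding is confined to a function so the builtin stays usable elsewhere.

```python
def shadowing_demo():
    dict = {}  # rebinding the name hides the builtin inside this function
    try:
        # The name now refers to a dict instance, which is not callable.
        dict(a=1)
    except TypeError as exc:
        return str(exc)


message = shadowing_demo()
print(message)

# Outside the function the builtin is untouched.
print(dict(a=1))
```

At module level the same rebinding would affect every later use of `dict` in the file, which is why the advice matters most there.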

    【Comments】:

    • Thanks for the tip, I changed the variable names accordingly!
    【Solution 3】:

    I think this code will help:

    import csv
    import sys
    
    with open('data.csv') as file:
        data = csv.reader(file)
        headers = next(data)[0:]
        length = len(headers)
    
        res = dict()
        for row in data:
    
            fields = dict()
            for i in range(2,length):
                fields[headers[i]]=int(row[i])
            res[(int(row[0]),int(row[1]))] = fields
    
    for x in res:
        print(x, res[x])
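
Note that the code above assigns one dictionary per coordinate, so a later row with the same (x, y) replaces the earlier one. If every row should be kept, as in the structure sketched in the question, one possible variant collects the rows in a list. This is a sketch under the assumption that all fields are integers, with `io.StringIO` standing in for `data.csv`.

```python
import csv
import io

# Stand-in for data.csv, using the sample rows from the question.
sample = io.StringIO(
    "x, y, field1, field2, field3\n"
    "1, 2, 10, 20, 30\n"
    "1, 2, 20, 30, 40\n"
    "7, 4, 2, 49, 39\n"
)

reader = csv.reader(sample, skipinitialspace=True)
headers = next(reader)

res = {}
for row in reader:
    # Pair each non-coordinate header with its value in this row.
    fields = {name: int(value) for name, value in zip(headers[2:], row[2:])}
    key = (int(row[0]), int(row[1]))
    # Append instead of assigning, so duplicate coordinates keep every row.
    res.setdefault(key, []).append(fields)

for key in sorted(res):
    print(key, res[key])
```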
    

    【Comments】:
