Creating a dictionary of dictionaries (or a tuple?) from CSV in Python: answers

【Question title】: Creating a dictionary of dictionaries (or a tuple?) from CSV in Python
【Posted】: 2015-02-17 06:56:03
【Question】:

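I'm looking to read a CSV file and create an object in Python to store a large dataset. The data is in a CSV file (with a header row), and the first two entries in each row are the X, Y coordinates. Later in the program I will sort and manipulate the data for each X, Y coordinate.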

Here is some sample data:

x, y, field1, field2, field3
1, 2, 10, 20, 30
1, 2, 20, 30, 40
7, 4, 2, 49, 39

I think the object I want to create from this looks like the following:

Position, Values

(1,2) => {field1=10,field2=20,field3=30},{field1=20,field2=30,field3=40}
(7,4) => {field1=2,field2=49,field3=39}

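Is this a dictionary of dictionaries with tuple keys? I've searched online for an example of this but couldn't find one. Does it make sense to handle the data this way?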

So far I've been trying to get the data into a single dictionary, but I'm running into trouble. The code below prints only the headers:

import csv
import sys

dict={}

with open('data.csv') as file:
    data = csv.reader(file)
    headers = next(data)[0:]
    length = len(headers)
    for row in data:
        for i in range(length):
            dict[headers[i]]=row[i]

for x in dict:
    print x

【Comments】:

What you'd end up with is a dictionary whose values are lists of dictionaries, for example {(1,2):[{'field1':10,'field2':20},{'field1':20,'field2':30}], (7,4):[{'field1':2,'field2':49}]}
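To make the structure described in this comment concrete, here is a minimal sketch written as a literal that mirrors the sample data. The name `grid` is hypothetical and the values are typed in by hand rather than read from a file. It also shows that iterating over a dict directly yields only its keys, which may be why the loop in the question prints just the headers.

```python
# Hypothetical literal mirroring the sample data from the question.
grid = {
    (1, 2): [
        {"field1": 10, "field2": 20, "field3": 30},
        {"field1": 20, "field2": 30, "field3": 40},
    ],
    (7, 4): [
        {"field1": 2, "field2": 49, "field3": 39},
    ],
}

# Iterating over a dict yields only its keys ...
keys = sorted(grid)

# ... so use .items() to get each key together with its list of rows.
row_counts = {position: len(rows) for position, rows in grid.items()}

# A tuple key is looked up like any other key.
first_at_1_2 = grid[(1, 2)][0]["field1"]
```

Tuples work as keys here because they are immutable and hashable; a list such as `[1, 2]` could not be used the same way.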

Tags: python csv dictionary tuples

【Solution 1】:

import csv

# let's create a class to hold the data in each line
class Capsule:
    def __init__(self, x, y, f1, f2, f3):
        self.x = x
        self.y = y
        self.field1 = f1
        self.field2 = f2
        self.field3 = f3

# let's read the file
with open('/path/to/file') as infile:
    infile.readline()
    capsules = []
    for x, y, f1, f2, f3 in csv.reader(infile):
        # csv.reader yields strings; convert the coordinates so the sort below is numeric
        capsules.append(Capsule(int(x), int(y), f1, f2, f3))


# done reading all data
# let's sort the list by x,y coordinates
capsules.sort(key=lambda c : (c.x, c.y))

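Using a list this way makes sorting and similar operations easy. However, if you want to look up the objects at a particular coordinate, a dictionary is the better choice: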

with open('/path/to/file') as infile:
    infile.readline()
    capsules = {}
    for x, y, f1, f2, f3 in csv.reader(infile):
        key = (int(x), int(y))  # numeric key so sorted() orders by value, not by text
        if key not in capsules:
            capsules[key] = []
        capsules[key].append(Capsule(key[0], key[1], f1, f2, f3))

# sort by x,y coordinates:
sortedCapsules = [capsules[k] for k in sorted(capsules)]
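As a rough sketch of the lookup this dictionary makes possible, the example below groups the question's sample rows by coordinate and then sorts the rows found at one position. It is not the answer author's code: `Row`, `SAMPLE`, and `group_by_position` are hypothetical names, a `namedtuple` stands in for the `Capsule` class, the data is read from an in-memory string instead of a file, and every column is assumed to hold an integer.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for the Capsule class above.
Row = namedtuple("Row", ["x", "y", "field1", "field2", "field3"])

# The sample data from the question, held in memory for the demo.
SAMPLE = """x, y, field1, field2, field3
1, 2, 10, 20, 30
1, 2, 20, 30, 40
7, 4, 2, 49, 39
"""

def group_by_position(source):
    """Return {(x, y): [Row, ...]} built from an open CSV source."""
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.reader(source, skipinitialspace=True)
    next(reader)  # skip the header row
    grouped = {}
    for record in reader:
        row = Row(*(int(value) for value in record))  # assumes integer columns
        grouped.setdefault((row.x, row.y), []).append(row)
    return grouped

grouped = group_by_position(io.StringIO(SAMPLE))

# Direct lookup of everything recorded at one coordinate.
at_1_2 = grouped[(1, 2)]

# Sort the rows at that coordinate by one of their fields.
by_field1_desc = sorted(at_1_2, key=lambda row: row.field1, reverse=True)
```

`setdefault` replaces the explicit `if key not in ...` check; either form should behave the same here.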

【Comments】:

【Solution 2】:

Assuming your CSV structure is known and fixed:

import csv
import sys
from collections import defaultdict

HEADERS = ["x", "y", "field1", "field2", "field3"]

def read_data(source):
    data = defaultdict(list)
    reader = csv.DictReader(source, fieldnames=HEADERS)
    next(reader) # skip headers
    for row in reader:
        # this will at once build the key tuple
        # and remove the "x" and "y" keys from the 
        # row dict
        key = row.pop("x"), row.pop("y")
        data[key].append(row)
    return data

with open('data.csv') as source:
    data = read_data(source)

print(data)

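As a side note: don't use dict or file as variable names, especially at the top level, because doing so shadows the built-in dict and file types (file is a built-in only in Python 2).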
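A small sketch of what that shadowing looks like in practice. The function `shadow_demo` is a hypothetical name used only for illustration, and the shadowing is kept inside a function so the rest of the module is unaffected.

```python
import builtins

def shadow_demo():
    """Shadow the built-in dict locally and report what goes wrong."""
    dict = {}  # this local name now hides the built-in type inside the function
    try:
        dict(a=1)  # attempts to call the empty dict instance, not the type
    except TypeError as error:
        return str(error)
    return None

message = shadow_demo()

# The built-in is still reachable through the builtins module.
restored = builtins.dict(a=1)
```

At module level the same assignment would hide `dict` for all code that runs afterwards in that module, which is why the warning singles out top-level names.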

【Comments】:

Thanks for the tip, I changed the variable names accordingly!

【Solution 3】:

I think this code will help:

import csv
import sys

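with open('data.csv') as source:
    data = csv.reader(source)
    # strip the blank that follows each comma in the header row
    headers = [h.strip() for h in next(data)]
    length = len(headers)

    res = dict()
    for row in data:
        # build the per-row dict from every column after x and y
        fields = dict()
        for i in range(2, length):
            fields[headers[i]] = int(row[i])
        # append rather than assign, so rows sharing a coordinate are all kept
        res.setdefault((int(row[0]), int(row[1])), []).append(fields)

for x in res:
    print(x, res[x])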

【Comments】: