【Question title】: Adding rows from CSV to new file
【Posted】: 2020-08-29 21:05:40
【Question】:

Hello, I want to read a document from a CSV file and sum up the numbers in it.

For example, my CSV looks like this:

Date,Customer number,Customer,Project number,Project,Worked time
2020,2020010,Apple,12345,Buying laptops,1,00
2020,2020010,Apple,12345,Buying laptops,4,00
2020,2020010,Apple,12345,Buying laptops,3,00
2020,2020010,Nokia,98738,Buying phones,1,00
2020,2020010,Nokia,98738,Buying phones,4,00
2020,2020010,Apple,12345,Buying laptops,3,00

I want to output this to a CSV file and have the script total the worked time per customer, like this:

Apple, 11
Nokia, 5

This is all I have so far:

 
import csv

results = []
with open('Time_export.csv') as File:
    reader = csv.DictReader(File)
    for row in reader:
        results.append(row)
    print(results)

I'm new to this :) I've been trying to Google it but can't figure it out :( Any ideas?

【Question discussion】:

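Your header has 6 columns, but your rows have 7, so things don't line up here. Is the worked time "1,00" (note the comma)? If so, that's not a valid CSV file; the comma should be escaped. This makes it harder for a CSV reader to get the columns right.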
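
To make the column mismatch concrete, here is a minimal sketch using only the standard library. It parses the header and one data row copied from the question; the in-memory `sample` string is an assumption standing in for the real file. `csv.DictReader` stores any surplus fields in a list under its `restkey`, which defaults to `None`.

```python
import csv
import io

# Header and one data row copied from the question's sample.
sample = (
    "Date,Customer number,Customer,Project number,Project,Worked time\n"
    "2020,2020010,Apple,12345,Buying laptops,1,00\n"
)

reader = csv.DictReader(io.StringIO(sample))
row = next(reader)

print(len(reader.fieldnames))  # number of header columns
print(row["Worked time"])      # only the part before the decimal comma
print(row[None])               # surplus fields, kept under restkey (None by default)
```

So `row["Worked time"]` holds only the whole-number part, and the digits after the comma end up in the surplus list.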

Tags: python sum summary

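【Solution 1】:

I find collections.defaultdict useful for this kind of thing. It automatically creates new key/value pairs as needed. In this case the default factory is int, so a missing key starts at 0.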

import csv
import collections

with open('Time_export.csv') as File:
    results = collections.defaultdict(int)
    reader = csv.DictReader(File)
    for row in reader:
        results[row['Customer']] += int(row['Worked time'])

for name, num in sorted(results.items()):
    print(f"{name}: {num}")
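
The question also asks for the totals to go to a CSV file rather than the screen. A minimal sketch of that step, assuming the totals are already in a dictionary like `results` above: `write_totals` is a hypothetical helper name, and the `io.StringIO` buffer stands in for a real output file (for example one opened with `open('summary.csv', 'w', newline='')`, where the file name is also an assumption).

```python
import csv
import io

def write_totals(totals, out_file):
    """Write one "customer,total" row per customer, sorted by name."""
    writer = csv.writer(out_file, lineterminator="\n")
    for name, total in sorted(totals.items()):
        writer.writerow([name, total])

# Totals for the question's sample data, as computed by the loop above.
totals = {"Apple": 11, "Nokia": 5}

buffer = io.StringIO()  # stand-in for a real file opened with newline=''
write_totals(totals, buffer)
print(buffer.getvalue())
```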

【Discussion】:

【Solution 2】:

Use a dictionary to store the customer names and totals:

import csv

data = '''
Date,Customer number,Customer,Project number,Project,Worked time
2020,2020010,Apple,12345,Buying laptops,1,00
2020,2020010,Apple,12345,Buying laptops,4,00
2020,2020010,Apple,12345,Buying laptops,3,00
2020,2020010,Nokia,98738,Buying phones,1,00
2020,2020010,Nokia,98738,Buying phones,4,00
2020,2020010,Apple,12345,Buying laptops,3,00
'''.strip()

with open('Time_export.csv','w') as f: f.write(data)  # write test file

################################

cust = {}  # customer totals

with open('Time_export.csv') as File:
    reader = csv.DictReader(File)
    for row in reader:
        if row['Customer'] in cust:
            cust[row['Customer']] += int(row['Worked time'])
        else:
            cust[row['Customer']] = int(row['Worked time'])

print(cust)

Output:

{'Apple': 11, 'Nokia': 5}

If you want to try Pandas, the code gets shorter:

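import pandas
# index_col=False: do not use the first column as the index
# (the data rows have one more field than the header)
df = pandas.read_csv('Time_export.csv', index_col=False)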
df['Worked time'] = df['Worked time'].astype(int)
gb = df.groupby('Customer')["Worked time"].sum().reset_index()
print(gb.to_string(index=False))

Output:

Customer  Worked time
   Apple           11
   Nokia            5

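【Discussion】:

【Solution 3】:

pandas is a powerful library for working with tables. It is hard to learn but worth the effort. Your data uses a comma in the "Worked time" column, which makes it invalid CSV. If you change the comma to "." or escape it properly, you can do the job in a few lines of code.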

import pandas as pd
df = pd.read_csv('Time_export.csv')
sums = df.groupby("Customer")["Worked time"].sum()

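This groups by customer, drops every column except "Worked time", and then sums each group. The result is a Series object that behaves much like a dictionary: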

>>> print(sums)
Customer
Apple    11.0
Nokia     5.0
Name: Worked time, dtype: float64
>>> for name, val in sorted(sums.items()):
...     print(name, val)
... 
Apple 11.0
Nokia 5.0
>>> print(sums["Apple"])
11.0
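
The pandas code above assumes the decimal comma has already been fixed. One possible way to do that repair with the standard library is sketched below. It assumes the stray comma only ever appears in the last column; `repair_rows` and its `n_columns` parameter are hypothetical names, and the `io.StringIO` objects stand in for the real input and output files.

```python
import csv
import io

def repair_rows(lines, n_columns=6):
    """Yield rows, rejoining a split "1,00" tail into a single "1.00" field."""
    for row in csv.reader(lines):
        if len(row) == n_columns + 1:
            # The decimal comma split the last value in two: rejoin it with a dot.
            row = row[:n_columns - 1] + [f"{row[-2]}.{row[-1]}"]
        yield row

# Header and two data rows copied from the question's sample.
raw = io.StringIO(
    "Date,Customer number,Customer,Project number,Project,Worked time\n"
    "2020,2020010,Apple,12345,Buying laptops,1,00\n"
    "2020,2020010,Nokia,98738,Buying phones,4,00\n"
)

fixed = io.StringIO()  # stand-in for a real output file
csv.writer(fixed, lineterminator="\n").writerows(repair_rows(raw))
print(fixed.getvalue())
```

Rows that already have the expected number of fields, such as the header, pass through unchanged.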

【Discussion】: