【Question title】:Adding rows from CSV to new file
【Posted】:2020-08-29 21:05:40
【Question】:

Hello, I would like to summarize a document read from a CSV file by adding up its numbers.

For example, my CSV looks like this:

Date,Customer number,Customer,Project number,Project,Worked time
2020,2020010,Apple,12345,Buying laptops,1,00
2020,2020010,Apple,12345,Buying laptops,4,00
2020,2020010,Apple,12345,Buying laptops,3,00
2020,2020010,Nokia,98738,Buying phones,1,00
2020,2020010,Nokia,98738,Buying phones,4,00
2020,2020010,Apple,12345,Buying laptops,3,00

I want to output this to a CSV file, with the script totalling the worked time per customer, like this:

Apple,11
Nokia,5

This is all I have so far:

 
import csv

results = []
with open('Time_export.csv') as File:
    reader = csv.DictReader(File)
    for row in reader:
        results.append(row)
print(results)

I am new to this :) I have been trying to Google it but can't figure it out :( Any ideas?

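【Comments】:

  • Your header has 6 columns, but your rows have 7, so things don't line up. Is the worked time "1,00" (note the comma)? If so, this is not a valid CSV file; the comma should be escaped. That makes it harder to get the right columns out of the csv reader.

Tags: python sum summary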
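
To make the comment concrete, here is a small sketch of how `csv.DictReader` parses one of the rows as posted, and how the same row parses once the value is quoted. The inline sample strings stand in for the real file, and the variable names are illustrative only.

```python
import csv
import io

header = "Date,Customer number,Customer,Project number,Project,Worked time\n"

# Row as posted: the unquoted comma in 1,00 produces a seventh field.
unquoted = header + "2020,2020010,Apple,12345,Buying laptops,1,00\n"
row_unquoted = next(csv.DictReader(io.StringIO(unquoted)))
print(row_unquoted["Worked time"])  # the part before the comma
print(row_unquoted[None])           # extra fields are collected under restkey (None by default)

# Same row with the value quoted: the comma stays inside a single field.
quoted = header + '2020,2020010,Apple,12345,Buying laptops,"1,00"\n'
row_quoted = next(csv.DictReader(io.StringIO(quoted)))
print(row_quoted["Worked time"])    # the full value, comma included
```

With the unquoted row, only the digits before the comma end up under `Worked time`, so anything after the comma is silently left out of a sum unless it is handled explicitly.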


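【Answer 1】:

I find collections.defaultdict useful for this kind of thing. It creates new key/value pairs automatically as needed. Here the default factory is int, so a missing key starts at 0.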

import csv
import collections

with open('Time_export.csv') as File:
    results = collections.defaultdict(int)
    reader = csv.DictReader(File)
    for row in reader:
        results[row['Customer']] += int(row['Worked time'])

for name, num in sorted(results.items()):
    print(f"{name}: {num}")
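
The question also asks for the totals to be written to a new CSV file, which the snippet above does not do. A minimal sketch using `csv.writer`, assuming the totals are already in a dict; the output filename `Time_summary.csv` and the hard-coded `totals` are placeholders, not part of the original answer.

```python
import csv

# Example totals, matching what the loop above produces for the sample data.
totals = {"Apple": 11, "Nokia": 5}

# The csv module documentation recommends newline="" for files passed to csv.writer.
with open("Time_summary.csv", "w", newline="") as out_file:
    writer = csv.writer(out_file)
    writer.writerow(["Customer", "Worked time"])  # header row
    for name, num in sorted(totals.items()):
        writer.writerow([name, num])
```

In the real script, `totals` would simply be the `results` defaultdict built while reading `Time_export.csv`.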

【Comments】:

    【Answer 2】:

    Use a dictionary to store the customer names and totals:

    import csv
    
    data = '''
    Date,Customer number,Customer,Project number,Project,Worked time
    2020,2020010,Apple,12345,Buying laptops,1,00
    2020,2020010,Apple,12345,Buying laptops,4,00
    2020,2020010,Apple,12345,Buying laptops,3,00
    2020,2020010,Nokia,98738,Buying phones,1,00
    2020,2020010,Nokia,98738,Buying phones,4,00
    2020,2020010,Apple,12345,Buying laptops,3,00
    '''.strip()
    
    with open('Time_export.csv','w') as f: f.write(data)  # write test file
    
    ################################
    
    cust = {}  # customer totals
    
    with open('Time_export.csv') as File:
        reader = csv.DictReader(File)
        for row in reader:
            if row['Customer'] in cust:
                cust[row['Customer']] += int(row['Worked time'])
            else:
                cust[row['Customer']] = int(row['Worked time'])
            
        print (cust)
    

    Output

    {'Apple': 11, 'Nokia': 5}
    

    If you want to try pandas, the code gets shorter:

    import pandas
    df = pandas.read_csv('Time_export.csv', index_col=False )
    df['Worked time'] = df['Worked time'].astype(int)
    gb = df.groupby('Customer')["Worked time"].sum().reset_index()
    print(gb.to_string(index=False))
    

    Output

    Customer  Worked time
       Apple           11
       Nokia            5
    

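    【Comments】:

      【Answer 3】:

      pandas is a powerful library for working with tabular data. It has a steep learning curve, but it is worth the effort. Your data uses a comma in the "Worked time" column, which makes it invalid CSV. If you change the comma to "." or escape it properly, you can do the job in a few lines of code.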

      import pandas as pd
      df = pd.read_csv('Time_export.csv')
      sums = df.groupby("Customer")["Worked time"].sum()
      

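      This groups by customer, drops every column except "Worked time", and then sums each group. The result is a Series object that behaves much like a dictionary: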

      >>> print(sums)
      Customer
      Apple    11.0
      Nokia     5.0
      Name: Worked time, dtype: float64
      >>> for name, val in sorted(sums.items()):
      ...     print(name, val)
      ... 
      Apple 11.0
      Nokia 5.0
      >>> print(sums["Apple"])
      11.0
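
      If editing the export by hand is not practical, the change of "," to "." can be scripted. A minimal sketch using only the standard library, assuming every affected data row has exactly one extra field because "Worked time" uses a decimal comma; the inline sample and the `repair` helper are illustrative and not part of the original answer.

```python
import csv
import io

raw = """Date,Customer number,Customer,Project number,Project,Worked time
2020,2020010,Apple,12345,Buying laptops,1,00
2020,2020010,Nokia,98738,Buying phones,4,00
"""

def repair(text):
    """Rejoin a decimal-comma value that was split into two fields, using '.' instead."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for fields in reader:
        if len(fields) == len(header) + 1:
            # Assumption: the last two fields are the integer and fractional parts.
            fields = fields[:-2] + [f"{fields[-2]}.{fields[-1]}"]
        writer.writerow(fields)
    return out.getvalue()

fixed = repair(raw)
print(fixed)
```

      The repaired text has six fields per row, so it can be saved to a file or wrapped in `io.StringIO` and read with any of the approaches in this thread.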
      

      【Comments】:
