【Question Title】: Aggregating CSV Records Using Python
【Posted】: 2015-10-31 14:53:24
【Question】:

My CSV file looks like this:

Mike,6
Mike,5
Bill,3
Bill,1
Sally,4
Sally,2

I want to modify it so that the counts are summed by name, like this:

Mike,11
Bill,4
Sally,6

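【Comments】:

  • Robert, thanks for the idea. I have already put some work into this, but I thought my attempt might confuse others ;) My solution is below. Ideally, each result row would be a separate dictionary. If you have any ideas on how to do that, please let me know. Thanks.

Tags: python csv group-by aggregate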


【Solution 1】:

Take a look at the pandas library.

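import pandas as pd
# the file has no header row, so supply the column names while reading
df = pd.read_csv('data.csv', header=None, names=['name', 'count'])
df_grouped = df.groupby('name').sum()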

More details here:

http://pandas.pydata.org/pandas-docs/stable/groupby.html

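【Comments】:

  • pandas is probably overkill for such a simple scenario.
  • Thanks for the pandas idea. I posted my pandas code below. I would like to put each result row above into a separate dictionary. If you have any ideas on how to do that, please let me know. Thanks.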
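
For readers who want to avoid pandas, here is a minimal standard-library sketch that also produces one dictionary per result row, as the second comment asks. The helper name `sum_counts` and the in-memory sample are assumptions made for illustration; they do not come from the thread.

```python
import csv
import io

def sum_counts(lines):
    """Sum the second column per name; return one dict per name, in first-seen order."""
    totals = {}  # name -> running total (dicts keep insertion order in Python 3.7+)
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        name, count = row[0].strip(), int(row[1])
        totals[name] = totals.get(name, 0) + count
    return [{'name': name, 'count': total} for name, total in totals.items()]

# In-memory stand-in for the sample file shown in the question
sample = io.StringIO("Mike,6\nMike,5\nBill,3\nBill,1\nSally,4\nSally,2\n")
records = sum_counts(sample)
for record in records:
    print(record)
```

With a real file, the same function should work on an open handle, for example `with open('names.csv', newline='') as f: records = sum_counts(f)`.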
【Solution 2】:

def records_from_file(fname, column_names):
    with open(fname, 'r') as input_handler:
        for line in input_handler:
            line = line.strip('\n')  # strip out the newline
            if not line:  # skip blank lines
                continue
            values = line.split(',')
            x = {}  # this creates x as an empty dictionary
            for i in range(len(column_names)):
                x[column_names[i]] = values[i]  # add each key and value to the dictionary
            yield x

record_stream = records_from_file('names.csv', ['name', 'count'])

class Object:                   # Object to store unique data
    def __init__(self, name, count):
        self.name = name
        self.count = count

rownum = 0    # Number of rows processed so far
records = []  # List to store objects (renamed so it does not shadow the built-in list)

def checkList(name, count):
    for obj in records:  # Iterate through the list
        if obj.name == name:  # Check whether the name already exists
            obj.count += count  # If it does, add to its count and return
            return
    newObject = Object(name, count)  # Otherwise create a new object with the name and count
    records.append(newObject)  # Add it to the list

for record in record_stream:  # Iterate through all the rows
    name = record['name']  # Store name
    count = int(record['count'])  # Store count as an integer
    checkList(name, count)
    rownum += 1

for each in records:  # Print out the result
    print(each.name, each.count)

【Comments】:

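【Solution 3】:
import pandas as pd

# The file has no header row, so name the columns while reading;
# otherwise the first record would be consumed as the header
df = pd.read_csv('names.csv', header=None, names=['name', 'count'])
df_grouped = df.groupby('name').sum()
print(df_grouped)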
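
To get each result row as a separate dictionary, as requested in the comments, one option is to keep `name` as a regular column and convert the grouped frame with `to_dict`. This is a sketch rather than the poster's code: the in-memory sample stands in for `names.csv`, and `sort=False` is an added assumption so that names keep the order in which they first appear.

```python
import io
import pandas as pd

# In-memory stand-in for names.csv (no header row)
sample = io.StringIO("Mike,6\nMike,5\nBill,3\nBill,1\nSally,4\nSally,2\n")
df = pd.read_csv(sample, header=None, names=['name', 'count'])

# as_index=False keeps 'name' as a column instead of moving it into the index
grouped = df.groupby('name', sort=False, as_index=False)['count'].sum()

# orient='records' yields a list with one dict per row
records = grouped.to_dict(orient='records')
for record in records:
    print(record)
```

Each element of `records` is a dictionary with the keys `name` and `count`.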
    

【Comments】:
