[Question title]: Python dictionary sum
[Posted]: 2014-11-16 11:42:02
[Question description]:

Hi everyone, I have a question: how can I sum the values for the same IP address in a dictionary? I have an input file that looks like this:

IP           , Byte
10.180.176.61,3669
10.164.134.193,882
10.164.132.209,4168
10.120.81.141,4297
10.180.176.61,100

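My approach is to open the file and parse each IP address together with the number after the comma, so that I can sum all the bytes for each IP address. That way I would get a result like: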

IP 10.180.176.61 , 3769

My code is as follows:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re,sys, os
from collections import defaultdict

f     = open('splited/small_file_1000000.csv','r')
o     = open('gotovo1.csv','w')

list_of_dictionaries = {}

for line in f:
    if re.search(r'\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}.*',line):
        line_ip = re.findall(r'\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}',line)[0]
        line_by = re.findall(r'\,\d+',line)[0]
        line_b = re.sub(r'\,','',line_by)

        list_of_dictionaries['IP']  = line_ip
        list_of_dictionaries['VAL'] = int(line_b)


c = defaultdict(int)
for d in list_of_dictionaries:
    c[d['IP']] += d['VAL']

print c

Any ideas would be great.

[Discussion]:

    Tags: python regex parsing csv dictionary


    [Solution 1]:

    If your file looks like the sample you provided, you don't need regular expressions to parse it. Just split each line on the comma:

    totals = {}
    with open('splited/small_file_1000000.csv', 'r') as f:
        header = f.readline()  # skip the "IP , Byte" header line
        for line in f:
            if not line.strip():  # ignore blank lines
                continue
            ip, nbytes = line.split(',')
            # dict.get returns 0 the first time an IP is seen (has_key no longer exists in Python 3)
            totals[ip.strip()] = totals.get(ip.strip(), 0) + int(nbytes)
    print(totals)
    OUT: {'10.180.176.61': 3769, '10.164.134.193': 882, '10.164.132.209': 4168, '10.120.81.141': 4297}
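For comparison, the question's own code fails for two reasons: list_of_dictionaries is a single dict whose 'IP' and 'VAL' keys are overwritten on every line, so it only ever holds the last row, and iterating over it yields the key strings 'IP' and 'VAL', so d['IP'] raises TypeError. The unescaped "." in the pattern also matches any character, not just a dot. Below is a minimal sketch that keeps the regex approach but adds each line's bytes straight into a defaultdict(int), assuming every data line is exactly IP,Byte with no spaces; the helper name sum_by_ip is hypothetical.

```python
import re
from collections import defaultdict

# Dots escaped so they match a literal "." rather than any character;
# group 1 is the IP address, group 2 the byte count after the comma.
LINE_RE = re.compile(r'^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}),(\d+)')


def sum_by_ip(lines):
    """Hypothetical helper: add up the byte counts per IP address."""
    totals = defaultdict(int)  # an unseen IP starts at 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # the header and blank lines simply don't match
            ip, nbytes = m.groups()
            totals[ip] += int(nbytes)
    return totals


sample = [
    "IP           , Byte",
    "10.180.176.61,3669",
    "10.164.134.193,882",
    "10.164.132.209,4168",
    "10.120.81.141,4297",
    "10.180.176.61,100",
]
print(dict(sum_by_ip(sample)))
# {'10.180.176.61': 3769, '10.164.134.193': 882, '10.164.132.209': 4168, '10.120.81.141': 4297}
```

With a real file, passing the open file object instead of the sample list works the same way, since iterating a file yields its lines.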
    


      [Solution 2]:

      Use the csv module to read your file and collections.Counter to sum up the total for each IP address:

      from collections import Counter
      import csv


      def read_csv(fn):
          with open(fn, 'r', newline='') as csvfile:
              reader = csv.reader(csvfile, delimiter=',')
              next(reader)    # Skip header (reader.next() is Python 2 only)
              for row in reader:
                  if not row:  # csv.reader yields [] for blank lines
                      continue
                  ip, nbytes = row
                  yield ip.strip(), int(nbytes)


      totals = Counter()
      for ip, nbytes in read_csv('data.txt'):
          totals[ip] += nbytes

      print(totals)
      

      Output:

      Counter({'10.120.81.141': 4297, '10.164.132.209': 4168, '10.180.176.61': 3769, '10.164.134.193': 882})
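The question also opens gotovo1.csv for writing but never writes anything to it. A minimal sketch of that last step, assuming the desired output keeps the input's IP,Byte layout with one row per address; the function name write_totals is hypothetical. It uses csv.writer and Counter.most_common() so the largest totals come first, and writes to an in-memory buffer for the demo.

```python
import csv
import io
from collections import Counter


def write_totals(totals, out):
    """Hypothetical helper: write per-IP byte totals to a text file object as CSV, largest first."""
    writer = csv.writer(out)
    writer.writerow(['IP', 'Byte'])  # header row mirroring the input file
    for ip, total in totals.most_common():
        writer.writerow([ip, total])


totals = Counter({'10.180.176.61': 3769, '10.164.134.193': 882,
                  '10.164.132.209': 4168, '10.120.81.141': 4297})

# In-memory demo; for the real file you would use:
#     with open('gotovo1.csv', 'w', newline='') as out:
#         write_totals(totals, out)
buf = io.StringIO()
write_totals(totals, buf)
print(buf.getvalue())
```

Taking a file object rather than a path keeps the helper easy to try out, and newline='' is what the csv documentation recommends when opening a file for csv.writer.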
      

