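【Question title】: Convert log file into json file using python
【Posted】: 2019-02-14 11:22:35
【Question】:

I am new to Python. I am trying to convert a log file into a JSON file with a Python script. I created a main file and a del6 file; together they should parse the log file and write the result to a new JSON file. When I run the script, I get the following error: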

Traceback (most recent call last):
  File "main.py", line 23, in <module>
    main()
  File "main.py", line 14, in main
    print toJson(sys.argv[2])
  File "/home/paulsteven/BEAT/apache/del6.py", line 46, in toJson
    entries = readfile(file)
  File "/home/paulsteven/BEAT/apache/del6.py", line 21, in readfile
    filecontent[index] = line2dict(line)
  File "/home/paulsteven/BEAT/apache/del6.py", line 39, in line2dict
    res = m.groupdict()
AttributeError: 'NoneType' object has no attribute 'groupdict'

I tried the approach from this link (log to json), but it did not give me a working solution. Is there a way to fix this?

Here is my sample log file:

February 14 2019, 15:38:47      172.217.160.132     www.google.com      up      tcp-tcp@ www.google.com     172.217.160.132
February 14 2019, 15:38:47      104.28.4.86     www.smackcoders.com     up      tcp-tcp@ www.smackcoders.com        104.28.4.86     

The output should look like this:

{"1": {"timestamp": "February 14 2019, 15:38:47", "monitorip": "172.217.160.132", "monitorhost": "www.google.com", "monitorstatus": "up", "monitorid": "tcp-tcp@ www.google.com", "resolveip": "172.217.160.132"}, "2": {"timestamp": "February 14 2019, 15:38:47", "monitorip": "104.28.4.86", "monitorhost": "www.smackcoders.com", "monitorstatus": "up", "monitorid": "tcp-tcp@ www.smackcoders.com", "resolveip": "104.28.4.86"}}

Here is the main Python code:

import sys
from del6 import *

def main():
    if len(sys.argv) < 3:
        print "Incorrect Syntax. Usage: python main.py -f <filename>"
        sys.exit(2)
    elif sys.argv[1] != "-f":
        print "Invalid switch '"+sys.argv[1]+"'"
        sys.exit(2)
    elif os.path.isfile(sys.argv[2]) == False:
        print "File does not exist"
        sys.exit(2)
    print toJson(sys.argv[2])
    text_file = open("tcp.json", "a+")
    text_file.write(toJson(sys.argv[2]))
    text_file.write("\n")
    text_file.close()



if __name__ == "__main__":
    main()

Here is my del6 code:

import fileinput
import re
import os
try: import simplejson as json
except ImportError: import json

#read input file and return entries' Dict Object
def readfile(file):
    filecontent = {}
    index = 0
    #check necessary file size checking
    statinfo = os.stat(file)

    #just a guestimate. I believe a single entry contains atleast 150 chars
    if statinfo.st_size < 150:
        print "Not a valid access_log file. It does not have enough data"
    else:
        for line in fileinput.input(file):
            index = index+1
            if line != "\n": #don't read newlines
                filecontent[index] = line2dict(line)

        fileinput.close()
    return filecontent

#gets a line of string from Log and convert it into Dict Object
def line2dict(line):
    #Snippet, thanks to http://www.seehuhn.de/blog/52
    parts = [
    r'(?P<timestamp>\S+)',                  
    r'(?P<monitorip>\S+)',               
    r'(?P<monitorhost>\S+)',                
    r'(?P<monitorstatus>\S+)',              
    r'"(?P<monitorid>\S+)"',              
    r'(?P<resolveip>\S+)',             
]
    pattern = re.compile(r'\s+'.join(parts)+r'\s*\Z')
    m = pattern.match(line)
    res = m.groupdict()
    return res

#to get jSon of entire Log
#returns JSON object
def toJson(file):
    #get dict object for each entry
    entries = readfile(file)
    return json.JSONEncoder().encode(entries)

【Comments】:

  • Please show an example of the output format you want.

Tags: python json python-3.x


【Solution 1】:

The columns appear to be separated by double tabs. Based on that:

i = 1
result = {}
with open('log.txt') as f:
    for line in f:
        line = line.rstrip('\n')  # keep the newline out of the last field
        if not line:
            continue  # skip blank lines
        r = line.split('\t\t')
        result[i] = {'timestamp': r[0], 'monitorip': r[1], 'monitorhost': r[2], 'monitorstatus': r[3], 'monitorid': r[4], 'resolveip': r[5]}
        i += 1
print(result)

Output:

{1: {'timestamp': 'February 14 2019, 15:38:47', 'monitorip': '172.217.160.132', 'monitorhost': 'www.google.com', 'monitorstatus': 'up', 'monitorid': 'tcp-tcp@ www.google.com', 'resolveip': '172.217.160.132'}, 2: {'timestamp': 'February 14 2019, 15:38:47', 'monitorip': '104.28.4.86', 'monitorhost': 'www.smackcoders.com', 'monitorstatus': 'up', 'monitorid': 'tcp-tcp@ www.smackcoders.com', 'resolveip': '104.28.4.86'}}

Or, if you prefer a more natural list of dicts:

result = []
with open('log.txt') as f:
    for line in f:
        line = line.rstrip('\n')  # keep the newline out of the last field
        if not line:
            continue  # skip blank lines
        r = line.split('\t\t')
        result.append({'timestamp': r[0], 'monitorip': r[1], 'monitorhost': r[2], 'monitorstatus': r[3], 'monitorid': r[4], 'resolveip': r[5]})
print(result)

Output:

[{'timestamp': 'February 14 2019, 15:38:47', 'monitorip': '172.217.160.132', 'monitorhost': 'www.google.com', 'monitorstatus': 'up', 'monitorid': 'tcp-tcp@ www.google.com', 'resolveip': '172.217.160.132'}, {'timestamp': 'February 14 2019, 15:38:47', 'monitorip': '104.28.4.86', 'monitorhost': 'www.smackcoders.com', 'monitorstatus': 'up', 'monitorid': 'tcp-tcp@ www.smackcoders.com', 'resolveip': '104.28.4.86'}]
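
The sample log as pasted in the question looks space-padded rather than tab-separated, so splitting on '\t\t' may not fit every copy of the file. Here is a minimal sketch that splits on any run of two or more whitespace characters instead. It assumes no field ever contains two consecutive spaces, and parse_line is a hypothetical helper name, not something from the question:

```python
import re

FIELD_NAMES = ['timestamp', 'monitorip', 'monitorhost',
               'monitorstatus', 'monitorid', 'resolveip']


def parse_line(line):
    """Split one log line on runs of two or more whitespace characters."""
    fields = re.split(r'\s{2,}', line.strip())
    if len(fields) != len(FIELD_NAMES):
        return None  # wrong number of columns: treat the line as malformed
    return dict(zip(FIELD_NAMES, fields))


# The two sample lines from the question, including the trailing padding.
sample = [
    "February 14 2019, 15:38:47      172.217.160.132     www.google.com      up      tcp-tcp@ www.google.com     172.217.160.132\n",
    "February 14 2019, 15:38:47      104.28.4.86     www.smackcoders.com     up      tcp-tcp@ www.smackcoders.com        104.28.4.86     \n",
]
entries = [parse_line(line) for line in sample]
for entry in entries:
    print(entry)
```

Single spaces inside a field (as in the timestamp or in "tcp-tcp@ www.google.com") are left alone, while a double tab still counts as a separator. A file that uses a single tab between columns would need a different pattern.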

【Comments】:

    【Solution 2】:

    Thanks for the answer. To save the result to a JSON file:

    import json
    
    
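    i = 1
    result = {}
    with open('tcp.log') as f:
        for line in f:
            line = line.rstrip('\n')  # keep the newline out of the last field
            if not line:
                continue  # skip blank lines
            r = line.split('\t\t')
            result[i] = {'timestamp': r[0], 'monitorip': r[1], 'monitorhost': r[2], 'monitorstatus': r[3], 'monitorid': r[4], 'resolveip': r[5]}
            i += 1
    print(result)
    # 'w' overwrites data.json on each run, so the file always holds a single JSON document
    with open('data.json', 'w') as fp:
        json.dump(result, fp)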
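
One detail worth knowing when dumping a dict keyed by integers: JSON object keys are always strings, so the json module writes the key 1 as "1". That happens to match the format requested in the question, but the keys come back as strings when the file is loaded again. A small sketch with made-up placeholder data:

```python
import json

# Placeholder data keyed by integers, shaped like the result dict above.
result = {1: {'monitorstatus': 'up'}, 2: {'monitorstatus': 'up'}}

text = json.dumps(result)   # integer keys are converted to strings
loaded = json.loads(text)   # ...and stay strings after loading

print(text)          # {"1": {"monitorstatus": "up"}, "2": {"monitorstatus": "up"}}
print(list(loaded))  # ['1', '2']
```

If integer keys matter after loading, they have to be converted back explicitly, or the entries can be stored as a list instead.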
    

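    【Comments】:

      【Solution 3】:

      Here is a more generic way to solve the problem. The function "log_lines_to_json" handles any text file whose fields are separated by "field_delimiter" and whose field names are given in "field_names".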

      FIELD_NAMES = ['timestamp', 'monitorip', 'monitorhost', 'monitorstatus', 'monitorid', 'resolveip']
      FIELD_DELIMITER = '\t\t'
      
      
      def log_lines_to_json(log_file, field_names, field_delimiter):
          result = []
          with open(log_file) as f:
              for line in f:
                  line = line.rstrip('\n')  # keep the newline out of the last field
                  if not line:
                      continue  # skip blank lines
                  fields = line.split(field_delimiter)
                  result.append({field_name: fields[idx] for idx, field_name in enumerate(field_names)})
          return result
      
      
      entries = log_lines_to_json('log.txt', FIELD_NAMES, FIELD_DELIMITER)
      for entry in entries:
          print(entry)
      

      Output:

      {'timestamp': 'February 14 2019, 15:38:47', 'monitorip': '172.217.160.132', 'monitorhost': 'www.google.com', 'monitorstatus': 'up', 'monitorid': 'tcp-tcp@ www.google.com', 'resolveip': '172.217.160.132'}
      {'timestamp': 'February 14 2019, 15:38:47', 'monitorip': '104.28.4.86', 'monitorhost': 'www.smackcoders.com', 'monitorstatus': 'up', 'monitorid': 'tcp-tcp@ www.smackcoders.com', 'resolveip': '104.28.4.86'}
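
As for the AttributeError in the question: re.match returns None when the pattern does not match, and calling groupdict() on None raises exactly that error. The pattern in the question cannot match these lines, because the timestamp contains spaces (so \S+ captures only "February" and the remaining groups shift) and monitorid is expected inside double quotes that the log does not have. Below is a hedged sketch of a pattern fitted to the two sample lines only. It assumes the timestamp always looks like "Month D YYYY, HH:MM:SS" and that monitorid is always two tokens separated by one space; LINE_RE is a name introduced here for illustration:

```python
import re

# Pattern fitted to the sample lines in the question (an assumption, not a spec).
LINE_RE = re.compile(
    r'(?P<timestamp>\w+ \d{1,2} \d{4}, \d{2}:\d{2}:\d{2})\s+'
    r'(?P<monitorip>\S+)\s+'
    r'(?P<monitorhost>\S+)\s+'
    r'(?P<monitorstatus>\S+)\s+'
    r'(?P<monitorid>\S+ \S+)\s+'
    r'(?P<resolveip>\S+)\s*\Z'
)


def line2dict(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # guard: avoids calling groupdict() on None
    return m.groupdict()


good = line2dict(
    "February 14 2019, 15:38:47      172.217.160.132     www.google.com"
    "      up      tcp-tcp@ www.google.com     172.217.160.132\n"
)
bad = line2dict("garbage")
print(good)
print(bad)
```

The caller can then skip or log the lines for which line2dict returns None instead of crashing on the first one.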
      

      【Comments】:
