【Question title】: Read an output log file and print all unique file paths using a bash/python script
【Posted】: 2019-04-17 03:33:43
【Question】:

From the output log file below, I want to print all unique file paths (for example /AWS Cloud/Test/DEMO/Service/DEV) using a bash/python script.

OS platform: Linux

This is the output log file (output.log):

/AWS Cloud/Test/DEMO/Service/DEV:    google.service.instance = https://aoodev.com (ms: azure_mico, cs: docker_telco)
/AWS Cloud/Test/DEMO/Service/QA1:    yahoo.service.instance = aoodit.com (ms: yahoo_mico, cs: yahoo_telco)
/AWS Cloud/Test/Blender/Service/QA1:    google.service.instance = aoodev.com (ms: azure_mico, cs: google_telco)
/AWS Cloud/Test/DEMO/Service/QA1:    yahoo.service.instance = aoodqa.com
/Azure Cloud/Test/DEMO/Service/DEV:    google.service.instance = aoodev.com
/Azure Cloud/Test/DEMO/Service/QA1:    https://yahoo.service.instance = aoodit.com
/Azure Cloud/Test/DEMO/Service/DEV:    google.service.instance = aoodev.com

Expected output:

azure_micro docker_telco /AWS Cloud/Test/DEMO/Service/DEV
yahoo_mico yahoo_telco /AWS Cloud/Test/DEMO/Service/QA1
azure_micro google_telco /AWS Cloud/Test/Blender/Service/QA1
/Azure Cloud/Test/DEMO/Service/DEV
/Azure Cloud/Test/DEMO/Service/DIT

【Comments】:

  • This one-liner should do it from a bash terminal: awk -F: {'print $1'} FILENAME |sed 's/^\|$/"/g' | sort | uniq | sed ' s/"//g'
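
For readers who prefer Python, the same idea as the one-liner (take the text before the first colon, then sort and de-duplicate) can be sketched as follows. This is a minimal sketch, not the commenter's code: the function name `unique_paths` and the in-memory `sample` list are assumptions made for illustration, and the sample lines are copied from the log in the question.

```python
def unique_paths(lines):
    """Return the sorted, de-duplicated text before the first colon of each line."""
    paths = set()
    for line in lines:
        # partition() cuts at the first colon only, so "https://" later in the line is harmless
        path, sep, _rest = line.partition(":")
        if sep and path.strip():  # Ignore lines without a colon or with an empty prefix
            paths.add(path.strip())
    return sorted(paths)


# Sample lines copied from output.log in the question
sample = [
    "/AWS Cloud/Test/DEMO/Service/DEV:    google.service.instance = https://aoodev.com (ms: azure_mico, cs: docker_telco)",
    "/AWS Cloud/Test/DEMO/Service/QA1:    yahoo.service.instance = aoodit.com (ms: yahoo_mico, cs: yahoo_telco)",
    "/AWS Cloud/Test/DEMO/Service/QA1:    yahoo.service.instance = aoodqa.com",
    "/Azure Cloud/Test/DEMO/Service/DEV:    google.service.instance = aoodev.com",
]

for path in unique_paths(sample):
    print(path)
```

To run it against a real file, the list could be replaced by an open file object, since iterating over a file yields its lines.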

Tags: python linux python-3.x bash


【Solution 1】:

You need a regular expression and Python's re module.

This should do it:

import re

paths = []  # Collected output entries, in order of first appearance
# Capture the path (everything before the first colon) and the ms/cs values
regex = re.compile(r'^(/[^:]+):.*\(ms: (.+), cs: (.+)\)$')

with open("logs.txt") as file:  # Open your log file
    for line in file:
        line = line.strip()
        if not line:
            continue  # Skip blank lines
        match = regex.match(line)
        if match:  # The line has ms and cs parameters
            path, ms, cs = match.groups()
            paths.append(ms + " " + cs + " " + path)
        else:
            paths.append(line.split(":")[0])  # Path only, without the trailing colon

paths = list(dict.fromkeys(paths))  # Drop duplicates while keeping first-seen order

print(paths)

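【Comments】:

  • We had the same idea, except that you could do the same thing using r':\t' as the split delimiter. It looks like some of the sites have https://.
  • I am keying on the first :, right after the AWS path, so there should be no problem.
  • Thanks @ggrelet, could you take another look at the question? I have updated it with some changes to refine the expected output.
  • I edited my answer, @itgeek. If you like it, please upvote it and mark it as the accepted answer.
  • @ggrelet I am hitting this error: result = re.findall(regex, line)[0] IndexError: list index out of range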
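
The IndexError in the last comment is what happens when re.findall returns an empty list and the code still indexes it with [0]. One plausible trigger is a line that contains the substring "cs" but has no "(ms: ..., cs: ...)" part. The sketch below guards against that by checking the match object first. The pattern, the helper name `format_line`, and the `tricky` line are assumptions for illustration, not taken from the real log.

```python
import re

# Pattern in the spirit of the answer, with a path group that stops at the first colon
PATTERN = re.compile(r"^(/[^:]+):.*\(ms: (.+), cs: (.+)\)$")


def format_line(line):
    """Return 'ms cs path' when the line carries ms/cs values, else just the path."""
    line = line.strip()
    match = PATTERN.search(line)
    if match is None:
        # No match: fall back to the text before the first colon instead of indexing
        return line.partition(":")[0]
    path, ms, cs = match.groups()
    return f"{ms} {cs} {path}"


# Hypothetical line: contains "cs" (inside "docs") but no "(ms: ..., cs: ...)" part
tricky = "/AWS Cloud/Test/docs/DEV:    google.service.instance = aoodev.com"
print(PATTERN.findall(tricky))  # empty list, so indexing it with [0] would raise IndexError
print(format_line(tricky))
```

Testing the match object rather than searching the line for "cs" keeps the decision and the extraction tied to the same pattern, so the two cannot disagree.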
【Solution 2】:

Would this work:

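# Use the built-in open(); os.open() returns a raw file descriptor with no readlines()
with open('path/to/log', mode='r') as fh:
    file_ = fh.readlines()

def parse_paths(file_):
    directories_list = []
    for line in file_:
        line = line.strip()
        if not line:
            continue  # Skip blank lines
        # Cut at the first colon only, so a later "https://" does not break the unpacking
        path, _, message = line.partition(':')
        directories_list.append(path)
    return directories_list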
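
Two details of the split call are worth checking before relying on it. The small sketch below, using one line copied from the log in the question, shows that r':\t' is a raw string (colon, backslash, letter t) that never matches a real tab, and that str.partition is one way to cut at the first colon only. The variable names are illustrative assumptions.

```python
line = "/Azure Cloud/Test/DEMO/Service/QA1:    https://yahoo.service.instance = aoodit.com"

# r':\t' is three literal characters (':', '\\', 't'), so nothing in the line matches it
parts = line.split(r':\t')
print(len(parts))  # a one-element list, so "path, message = ..." would raise ValueError

# partition() always returns three parts and only cuts at the first colon,
# which leaves the "https://" in the message untouched
path, _sep, message = line.partition(':')
print(path)
print(message.strip())
```

If the real log separates the path from the message with a tab, the non-raw string ':\t' would be the delimiter to try instead.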

【Comments】:

【Solution 3】:
#!/usr/bin/python3

# Open the log file read-only; the with block closes it afterwards
with open('file.log', 'r') as logFile:
    # Get a list with every line of the log file
    logLines = logFile.read().splitlines()

seen = set()  # Paths already printed, so each one appears only once

for line in logLines:
    # Split the line at the colons: element [0] is the path, the rest is everything after it
    path = line.split(':')[0]
    if path and path not in seen:
        seen.add(path)
        print(path)
    

【Comments】:
