Python script to grep strings from an input file and output them in CSV format

【Question title】: Python script to grep strings from input file and output in csv format
【Posted】: 2019-11-15 04:00:49
【Question】:

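I am writing a Python script to grep strings from a file and write the output to a CSV file in a specific format (the target layout is shown further down).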


Input file (result_EPFT_config_device):

Hostname SIM-MPL-LTE-PE-RTR-134
loopback 22.13.7.34
lpts punt excessive-flow-trap
penalty-rate arp 10
penalty-rate icmp 50
penalty-rate igmp 50
penalty-rate ip 100
exclude interface Bundle-Ether6
exclude interface Bundle-Ether8
exclude interface Bundle-Ether15
exclude interface Bundle-Ether16
exclude interface Bundle-Ether53
exclude interface TenGigE0/0/1/1
exclude interface TenGigE0/1/1/0
exclude interface Bundle-Ether6.2
exclude interface Bundle-Ether6.4
exclude interface Bundle-Ether8.2
exclude interface Bundle-Ether8.4
exclude interface Bundle-Ether16.2
exclude interface Bundle-Ether16.4
exclude interface Bundle-Ether53.2
exclude interface TenGigE0/0/1/3.100
exclude interface TenGigE0/0/1/3.102
exclude interface TenGigE0/0/1/3.103
exclude interface TenGigE0/0/1/3.104
exclude interface TenGigE0/1/1/0.100
exclude interface GigabitEthernet0/0/0/1
exclude interface GigabitEthernet0/0/0/6
exclude interface GigabitEthernet0/0/0/9
dampening.
non-subscriber-interfaces
report-threshold 10

Below is the Python script I have so far. It can only grep the strings and print them:

import sys
import telnetlib
import os
import subprocess
import re
import csv

fh = open("result_EPFT_config_device", "r")
fh1 = open("testingAjay", "w+")
line = fh.readlines()
for lines in line:
        if re.search("(lpts punt excessive-flow-trap)", lines):
                m =  (lines.split(' '))
                print m[0], m[1], m[2]
        if re.search("(penalty-rate arp)", lines):
                n =  (lines.split(' '))
                print n[0], n[1], n[2]
        if re.search("(penalty-rate icmp)", lines):
                a =  (lines.split(' '))
                print a[0], a[1], a[2]
        if re.search("(penalty-rate igmp)", lines):
                b =  (lines.split(' '))
                print b[0], b[1], b[2]
        if re.search("(penalty-rate ip)", lines):
                c =  (lines.split(' '))
                print c[0], c[1], c[2]
        if re.search("(dampening)", lines):
                c =  (lines.split(' '))
                print c[0]
        if re.search("(non-subscriber-interfaces)", lines):
                c =  (lines.split('-'))
                print c[0], c[1], c[2]
        if re.search("(report-threshold 10)", lines):
                c =  (lines.split(' '))
                print c[0], c[1]

Output of my script:

lpts punt excessive-flow-trap
penalty-rate arp 10
penalty-rate icmp 50
penalty-rate igmp 50
penalty-rate ip 100
dampening.
non subscriber interfaces

report-threshold 10

Now I want to put the output in a CSV file, as shown below:


Hostname|loopback|lpts punt excessive-flow-trap|penalty-rate arp|penalty-rate icmp|penalty-rate igmp|penalty-rate ip|dampening|non-subscriber-interfaces|report-threshold
SIM-MPL-LTE-PE-RTR-134|1.1.1.1|yes|10|50|50|100|Yes|Yes|10
NDL-MPL-PE-RTR-195|2.2.2.2|No|No|No|20|50|NO|20Yes

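As shown above, the column "lpts punt excessive-flow-trap" must be marked YES if that line is present in the input file, and NO otherwise. The "dampening" and "non-subscriber-interfaces" columns need the same logic.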

Can you help me produce the required output in the CSV format shown above?

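【Comments】:

Hey, mind if I answer in python3?
You should create a list or dictionary for all the data in a row, fill it using your code, and write the row to the csv when you reach the next Hostname.
It looks like you can test for it without a regular expression, i.e. if lines.startswith('lpts punt excessive-flow-trap'):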
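
The suggestion in the comments above (collect one row per device and emit it when the next Hostname line appears) could look roughly like the following Python 3 sketch. It assumes that several devices are concatenated in one input and that each block starts with a `Hostname` line; the function name `rows_from_lines` and the reduced set of fields are made up for illustration only.

```python
def rows_from_lines(lines):
    """Yield one dict per device; a new 'Hostname' line starts a new row."""
    row = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Hostname"):
            if row is not None:
                yield row  # the previous device is complete
            row = {
                "Hostname": line.split()[1],
                "lpts punt excessive-flow-trap": "NO",  # default until seen
            }
        elif row is None:
            continue  # ignore anything before the first Hostname
        elif line.startswith("lpts punt excessive-flow-trap"):
            row["lpts punt excessive-flow-trap"] = "YES"
        elif line.startswith("penalty-rate arp"):
            row["penalty-rate arp"] = line.split()[2]
    if row is not None:
        yield row  # flush the last device


# Hypothetical two-device input, shortened from the sample in the question
sample = [
    "Hostname SIM-MPL-LTE-PE-RTR-134\n",
    "lpts punt excessive-flow-trap\n",
    "penalty-rate arp 10\n",
    "Hostname NDL-MPL-PE-RTR-195\n",
]
rows = list(rows_from_lines(sample))
```

Each yielded dict can then be passed to `csv.DictWriter.writerow`; the remaining fields would be handled with further `elif` branches of the same shape.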

Tags: python scripting

【Answer 1】:

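Here we go! As the comment above says, you can just use "startswith" instead of regular expressions to match the lines.

I used python3 here instead of python2.

If you run it with "python3 main.py" from the directory it lives in, it searches the "inputs" subdirectory for all the files to parse.

For each file we build a dictionary containing the relevant fields and load their values, and we add these dictionaries to a list. Finally, we write the header to the csv, then loop over the rows and write the values. You could probably write the rows while reading the files, but I find it mentally clearer to keep parsing and output separate.

I changed your "for lines in line" to "for line in lines", since you want to iterate over each line in lines.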

import os
import csv

def parseFile(fileName):

    # We are using a dictionary to store info for each file
    data = dict()

    # Set all Yes/Nos to NO by default
    data["lpts punt excessive-flow-trap"] = "NO"
    data["dampening"] = "NO"
    data["non-subscriber-interfaces"] = "NO"

    # Read all lines up front; the with-block closes the file afterwards
    with open(fileName, "r") as fh:
        lines = fh.readlines()
    for line in lines:

        # We need this so we don't end up with newline characters in our CSV
        line = line.rstrip("\n")

        # No regular expressions needed: these settings occupy the whole line
        # Do YES/NO first
        if line == "lpts punt excessive-flow-trap":
            data["lpts punt excessive-flow-trap"] = "YES"
            continue

        # The sample input spells this line "dampening." (trailing period),
        # so match on the prefix; the key must equal the CSV field name
        if line.startswith("dampening"):
            data["dampening"] = "YES"
            continue

        if line == "non-subscriber-interfaces":
            data["non-subscriber-interfaces"] = "YES"
            continue

        # Now do the rest: the value is the last word on the line
        if line.startswith("Hostname"):
            splitted = line.split(' ')
            data["Hostname"] = splitted[1]
            continue

        if line.startswith("loopback"):
            splitted = line.split(' ')
            data["loopback"] = splitted[1]
            continue

        if line.startswith("penalty-rate arp"):
            splitted = line.split(' ')
            data["penalty-rate arp"] = splitted[2]
            continue

        if line.startswith("penalty-rate icmp"):
            splitted = line.split(' ')
            data["penalty-rate icmp"] = splitted[2]
            continue

        if line.startswith("penalty-rate igmp"):
            splitted = line.split(' ')
            data["penalty-rate igmp"] = splitted[2]
            continue

        if line.startswith("penalty-rate ip"):
            splitted = line.split(' ')
            data["penalty-rate ip"] = splitted[2]
            continue

        if line.startswith("report-threshold"):
            splitted = line.split(' ')
            data["report-threshold"] = splitted[1]
            continue

    return data


if __name__ == "__main__":
    inputsDirectory = "inputs"
    path = os.path.abspath(inputsDirectory)
    fileList = [os.path.join(path, x) for x in os.listdir(inputsDirectory)]
    print(fileList)

    # Load Each File and Build Dictionary
    csvRows = []
    for file in fileList:
        newRow = parseFile(file)
        csvRows.append(newRow)

    print(csvRows)

    # Output CSV using dictionaries for each file
    outputFile = "output.csv"
    with open(outputFile, 'w') as csvfile:
        fieldnames = ["Hostname",
                      "loopback",
                      "lpts punt excessive-flow-trap",
                      "penalty-rate arp",
                      "penalty-rate icmp",
                      "penalty-rate igmp",
                      "penalty-rate ip",
                      "dampening",
                      "non-subscriber-interfaces",
                      "report-threshold"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()
        for row in csvRows:
            writer.writerow(row)
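
The sample output in the question separates columns with `|` and shows `No` for settings that are absent, whereas the code above writes commas and leaves missing fields empty. One way to get closer to that layout, as a hedged sketch, is to use the `delimiter` and `restval` options of `csv.DictWriter`. The shortened field list, the hard-coded rows, and the in-memory `io.StringIO` target are only there to keep the example small.

```python
import csv
import io

# Shortened field list for illustration; the answer above uses ten fields
fieldnames = ["Hostname", "penalty-rate arp", "dampening"]
rows = [
    {"Hostname": "SIM-MPL-LTE-PE-RTR-134", "penalty-rate arp": "10", "dampening": "YES"},
    {"Hostname": "NDL-MPL-PE-RTR-195", "dampening": "NO"},  # no arp line was found
]

buffer = io.StringIO()  # stands in for the real output file
writer = csv.DictWriter(
    buffer,
    fieldnames=fieldnames,
    delimiter="|",   # column separator used in the question's sample
    restval="No",    # written for any field missing from a row dict
)
writer.writeheader()
writer.writerows(rows)

output = buffer.getvalue()
print(output)
```

Whether `No` is the right filler for a missing numeric setting is a judgment call; `restval` simply applies one value to every absent key.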

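【Comments】:

@Researcher... thanks a lot, it works. I just replaced newline='' with wb.
Awesome, I've updated the answer to remove it. I had copied some csv code and didn't notice the newline override :)
@Researcher... could you tell me how to do this if I have multiple input files and want to generate multiple output files in another directory from them? For example, given the input files A.txt, B.txt and C.txt, I need the matching output files A.csv, B.csv and C.csv in another folder.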
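
One possible approach to this follow-up, sketched under assumptions in Python 3: loop over the `.txt` files with `pathlib` and write one CSV per input, reusing the input file's stem for the output name. The names `convert_directory`, `parse_file` and `FIELDNAMES` are hypothetical; `parse_file` is a cut-down stand-in for `parseFile` from the answer, and only two fields are handled to keep the sketch short.

```python
import csv
from pathlib import Path

FIELDNAMES = ["Hostname", "loopback"]  # shortened; use the full list in practice


def parse_file(path):
    """Cut-down stand-in for parseFile from the answer above."""
    data = {}
    with open(path, "r") as fh:
        for line in fh:
            parts = line.split()
            if len(parts) >= 2 and parts[0] in FIELDNAMES:
                data[parts[0]] = parts[1]
    return data


def convert_directory(input_dir, output_dir):
    """Write one <name>.csv into output_dir for every *.txt in input_dir."""
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(input_dir).glob("*.txt")):
        dst = out_root / (src.stem + ".csv")  # A.txt -> A.csv
        # newline="" is what the Python 3 csv docs recommend for output files
        with open(dst, "w", newline="") as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
            writer.writeheader()
            writer.writerow(parse_file(src))
        written.append(dst)
    return written
```

A call such as `convert_directory("inputs", "outputs")` would then produce `outputs/A.csv` for `inputs/A.txt`, and so on. Under Python 2 the `open` call would need the `wb` mode mentioned in the comment above instead of `newline=""`.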