【Question title】: How to properly parse a file in Python and save its contents to a dictionary
【Posted】: 2021-07-27 21:30:16
【Question】:

I have the following file, summary1:

---
Project: pgm1
Last-Status: success
summary:  102 passed, 88 warnings in 26.11s
---
Project: pgm2
Last-Status: failed
summary:  1 failed, 316 passed, 204 warnings in 2919.10s
---
Project: pgm3
Last-Status: success
summary:  400 passed, 40 skipped, 1 xfailed in 3.17s
---

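I need to parse its contents and then, in a loop, create a dictionary with predefined default values and fill the corresponding keys with the values parsed from the file, so that the result looks like this: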

entry =  {
              "{#STATUS}": 0,
              "{#DESCRIPTION}": "pgm2",
              "{#PASSED}": 316,
              "{#FAILED}": 1,
              "{#WARNING}": 204,
              "{#SKIPPED}": 0,
              "{#XFAILED}": 0,
              "{#DURATION}": 2919.10
}... 

And so on for every data section in the file. However, I cannot get the correct value for the "{#DURATION}" field. Instead, this is what I get:

 {
  "{#PASSED}": "316",
  "{#FAILED}": "1",
  "{#DURATION}": "10",
  "{#XFAILED}": 0,
  "{#SKIPPED}": 0,
  "{#STATUS}": 0,
  "{#DESCRIPTION}": "pgm2",
  "{#WARNING}": 204
 }

Here is my code:

import re


def break_text(lst_text):
    desc = re.findall(r": (.*)", lst_text[1])
    status = re.findall(r": (.*)", lst_text[2])
    summary = re.findall(r"\d+ \w+", lst_text[3])
    return desc, status, summary


def create_dict(lst):
    desc = ' '.join([str(elem) for elem in lst[0]])
    status = ' '.join([str(elem) for elem in lst[1]])
    if status == "success":
        status = 1
    else:
        status = 0

    entry = {
        "{#STATUS}": status,
        "{#DESCRIPTION}": desc,
        "{#PASSED}": 0,
        "{#FAILED}": 0,
        "{#WARNING}": 0,
        "{#SKIPPED}": 0,
        "{#XFAILED}": 0,
        "{#DURATION}": 0,
    }
    dict_temp = {
        "passed": "{#PASSED}",
        "failed": "{#FAILED}",
        "warnings": "{#WARNING}",
        "skipped": "{#SKIPPED}",
        "xfailed": "{#XFAILED}",
        "seconds": "{#DURATION}",
    }
    for i in lst[2]:
        v, k = i.split()
        entry[dict_temp[k]] = v
    return entry


with open("/tmp/summary1", "r") as file:
    file = file.read().splitlines()

data_list = []

for i in range(0, len(file), 4):  # Read 4 lines of the file each time
    text = file[i: i + 4]
    if len(text) <= 1:
        continue
    res_tmp = break_text(text)
    res = create_dict(res_tmp)
    data_list.append(res)

new_dict = dict()
new_dict["data"] = data_list

print(new_dict)

Any suggestions on how to get the correct value for the "{#DURATION}" field, and on what is wrong with my code?

【Comments】:

    Tags: python json file dictionary parsing


    【Solution 1】:

    You can try this example to parse the text:

    import re
    from pprint import pprint
    
    txt = """
    ---
    Project: pgm1
    Last-Status: success
    summary:  102 passed, 88 warnings in 26.11s
    ---
    Project: pgm2
    Last-Status: failed
    summary:  1 failed, 316 passed, 204 warnings in 2919.10s
    ---
    Project: pgm3
    Last-Status: success
    summary:  400 passed, 40 skipped, 1 xfailed in 3.17s
    ---
    """
    
    project = re.findall(r"Project: (.*)", txt)
    last_status = re.findall(r"Last-Status: (.*)", txt)
    summary = re.findall(r"summary:(.*)", txt)
    
    entries = []
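    for p, l, s in zip(project, last_status, summary):
        # Map each word to its count, e.g. {"passed": "316", "failed": "1"}.
        # The duration ("2919.10s") has no space before "s", so it is not matched here.
        d = dict([(b, a) for a, b in re.findall(r"([\d.]+)\s+([a-z]+)", s)])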
        entries.append(
            {
                "{#STATUS}": int(l == "success"),
                "{#DESCRIPTION}": p,
                "{#PASSED}": int(d.get("passed", 0)),
                "{#FAILED}": int(d.get("failed", 0)),
                "{#WARNING}": int(d.get("warnings", 0)),
                "{#SKIPPED}": int(d.get("skipped", 0)),
                "{#XFAILED}": int(d.get("xfailed", 0)),
                "{#DURATION}": float(re.search(r"([\d.]+)s\s*$", s).group(1)),
            }
        )
    
    
    pprint(entries)
    

    Prints:

    [{'{#DESCRIPTION}': 'pgm1',
      '{#DURATION}': 26.11,
      '{#FAILED}': 0,
      '{#PASSED}': 102,
      '{#SKIPPED}': 0,
      '{#STATUS}': 1,
      '{#WARNING}': 88,
      '{#XFAILED}': 0},
     {'{#DESCRIPTION}': 'pgm2',
      '{#DURATION}': 2919.1,
      '{#FAILED}': 1,
      '{#PASSED}': 316,
      '{#SKIPPED}': 0,
      '{#STATUS}': 0,
      '{#WARNING}': 204,
      '{#XFAILED}': 0},
     {'{#DESCRIPTION}': 'pgm3',
      '{#DURATION}': 3.17,
      '{#FAILED}': 0,
      '{#PASSED}': 400,
      '{#SKIPPED}': 40,
      '{#STATUS}': 1,
      '{#WARNING}': 0,
      '{#XFAILED}': 1}]
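As a variation on the example above, the file can also be parsed one block at a time, so that a block with a missing field cannot shift the pairing that `zip` relies on. The following is only a minimal sketch under assumptions: the names `SAMPLE`, `KEY_MAP` and `parse_blocks` are made up for illustration, and it assumes that blocks are separated by lines containing only `---` and that the duration always appears as `in <number>s` at the end of the summary line.

```python
import re

# Shortened sample in the same layout as the file from the question.
SAMPLE = """\
---
Project: pgm1
Last-Status: success
summary:  102 passed, 88 warnings in 26.11s
---
Project: pgm2
Last-Status: failed
summary:  1 failed, 316 passed, 204 warnings in 2919.10s
---
"""

# Hypothetical mapping from summary words to the output keys.
KEY_MAP = {
    "passed": "{#PASSED}",
    "failed": "{#FAILED}",
    "warnings": "{#WARNING}",
    "skipped": "{#SKIPPED}",
    "xfailed": "{#XFAILED}",
}


def parse_blocks(text):
    """Return one dictionary per '---'-separated block that has a Project line."""
    entries = []
    for block in re.split(r"^---\s*$", text, flags=re.MULTILINE):
        # Turn "name: value" lines into a small lookup table for this block.
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                name, value = line.split(":", 1)
                fields[name.strip()] = value.strip()
        if "Project" not in fields:
            continue  # empty chunk before the first or after the last separator

        summary = fields.get("summary", "")
        entry = {
            "{#STATUS}": int(fields.get("Last-Status") == "success"),
            "{#DESCRIPTION}": fields["Project"],
            **{key: 0 for key in KEY_MAP.values()},
            "{#DURATION}": 0.0,
        }
        # Integer counts such as "316 passed"; unknown words are ignored.
        for count, word in re.findall(r"(\d+) ([a-z]+)", summary):
            if word in KEY_MAP:
                entry[KEY_MAP[word]] = int(count)
        # Trailing duration such as "in 2919.10s".
        duration = re.search(r"in ([\d.]+)s\s*$", summary)
        if duration:
            entry["{#DURATION}"] = float(duration.group(1))
        entries.append(entry)
    return entries


if __name__ == "__main__":
    from pprint import pprint
    pprint(parse_blocks(SAMPLE))
```

To read the real file instead of the sample, the text could be loaded with `open("/tmp/summary1").read()` and passed to `parse_blocks`.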
    

    【Comments】:

      【Solution 2】:
      summary = re.findall(r"\d+ \w+", lst_text[3])
      

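      The regex that extracts the summary does not account for the decimal point, and it also requires a space between the number and the word. I'm not sure how to correct the regex itself, but here is a workaround that should give you the desired output for now.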

      summary = re.findall(r"\d+ \w+", lst_text[3])
      summary += [re.findall(r"\d+\.\d+", lst_text[3])[0] + " seconds"]
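The two limitations described above can also be addressed in the pattern itself. The sketch below is one possible way, not the answerer's method: it allows an optional fractional part and optional whitespace before the word, then renames the bare `s` unit to `seconds` so that it matches the `"seconds"` key in the question's `dict_temp`. The variable names `line`, `original`, `pairs` and `summary` are only for illustration.

```python
import re

line = "summary:  1 failed, 316 passed, 204 warnings in 2919.10s"

# Original pattern: digits, exactly one space, then a word.
# "2919.10s" has a dot and no space, so the duration is not captured.
original = re.findall(r"\d+ \w+", line)

# Sketch: optional fractional part, optional whitespace before the word.
pairs = re.findall(r"(\d+(?:\.\d+)?)\s*([a-z]+)", line)

# Rebuild "value key" strings; the bare "s" unit is renamed to "seconds"
# so that it matches the "seconds" key in dict_temp.
summary = [f"{num} {'seconds' if word == 's' else word}" for num, word in pairs]

print(original)
print(summary)
```

The values are still strings at this point, so `create_dict` would need `int(...)` or `float(...)` conversions if numeric values are required.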
      

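      【Comments】:

      • If I add that line of code, I get the following error: File "python4.py", line 36, in break_text summary += [re.findall(r"\d+\.\d+", lst_text[3])[0] + " seconds"] IndexError: list index out of range
      • That means that, in this case, it could not find a duration matching the regex. Sorry, I don't know much about regular expressions. I would recheck the regex you use to find the summary so that it also covers floating-point numbers.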
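The `IndexError` discussed in these comments comes from indexing `[0]` into an empty `re.findall` result. A small guard avoids the crash when a line contains no duration, for example if the line passed in is not the summary line. This is only a sketch; the helper name `find_duration` is hypothetical, and it assumes the duration is a number followed by `s` at the end of the line.

```python
import re


def find_duration(summary_line):
    """Return the trailing duration in seconds as a float, or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)s\s*$", summary_line)
    return float(match.group(1)) if match else None


print(find_duration("summary:  102 passed, 88 warnings in 26.11s"))
print(find_duration("Project: pgm1"))
```

Returning `None` leaves it to the caller to decide whether a missing duration should fall back to a default such as `0`.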