【Question Title】: Extract critical numbers from a mixed log file
【Posted】: 2015-09-10 15:42:10
【Question Description】:

I have a log file that contains many snippets like this:

Align set A and merge into set B ...
    setA, 4 images , image size 146 X 131
    setA, image 1, shape center shift (7, -9) compared to image center
    setA, image 2, shape center shift (8, -10) compared to image center
    setA, image 3, shape center shift (6, -9) compared to image center
    setA, image 4, shape center shift (6, -8) compared to image center
    final set B, image size 143 X 129
Write set B ...

Now I want to extract the numbers in this snippet into a table:

| | width_A | height_A | shift_x | shift_y | width_B | height_B |
| --- | --- | --- | --- | --- | --- | --- |
| A1 | 146 | 131 | 7 | -9 | 143 | 129 |
| A2 | 146 | 131 | 8 | -10 | 143 | 129 |
| A3 | 146 | 131 | 6 | -9 | 143 | 129 |
| A4 | 146 | 131 | 6 | -8 | 143 | 129 |

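I imagine the program splitting into two parts:

  1. Text processing: read the text into a dictionary data, e.g. data['A1']['shift_x'] = 7
  2. Use pandas to convert the dictionary into a DataFrame: df = pd.DataFrame(data)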
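
As a rough illustration of step 2, the sketch below flattens a nested dictionary of the shape described in step 1 into table rows using only the standard library. The `data` literal is a hand-written stand-in holding two rows from the sample, and `to_rows` and `COLUMNS` are hypothetical names introduced here. One point worth knowing for the pandas route: `pd.DataFrame(data)` on a nested dictionary uses the outer keys ('A1', 'A2', ...) as columns, so the layout in the table above corresponds to its transpose, or to `pd.DataFrame.from_dict(data, orient='index')`.

```python
# Hand-written stand-in for the dictionary that step 1 would produce
# (values taken from the sample snippet; only two rows shown).
data = {
    'A1': {'width_A': 146, 'height_A': 131, 'shift_x': 7, 'shift_y': -9,
           'width_B': 143, 'height_B': 129},
    'A2': {'width_A': 146, 'height_A': 131, 'shift_x': 8, 'shift_y': -10,
           'width_B': 143, 'height_B': 129},
}

# Column order of the desired table.
COLUMNS = ['width_A', 'height_A', 'shift_x', 'shift_y', 'width_B', 'height_B']


def to_rows(data, columns):
    """Flatten {row_key: {column: value}} into a header row plus one row per key."""
    rows = [[''] + list(columns)]
    for key, record in data.items():
        rows.append([key] + [record[col] for col in columns])
    return rows


rows = to_rows(data, COLUMNS)
for row in rows:
    print(' | '.join(str(cell) for cell in row))
```

Each outer key becomes one row, which is the orientation the table in the question asks for.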

However, I am not familiar with text processing in Python.

Does anyone have a good solution for this? Python is preferred. Thanks in advance.

【Question Discussion】:

    Tags: python text-processing


    【Solution 1】:

    I finally found the answer myself:

    import re

    log_file = 'merge.log'  # placeholder path; point this at your own log file

    # Map a tuple of attribute names to the pattern whose capture groups supply them.
    # The 'title' and 'gauge_no' patterns target lines that do not appear in the
    # excerpt quoted in the question.
    # Width/height order follows the question's table (146 X 131 -> width_A, height_A).
    regexp = {
        ('title', ): re.compile(r'Merge (.*) into set B.*\n'),
        ('nimages', 'width_A', 'height_A'): re.compile(r'\s+setA, (\d+) images , image size (\d+) X (\d+).*\n'),
        ('image_no', 'shift_x', 'shift_y'): re.compile(r'\s+setA, image (\d+), shape center shift \((-?\d+), (-?\d+)\) compared to image center.*\n'),
        ('gauge_no', ): re.compile(r'Write gauge (\d+), set B.*'),
    }

    with open(log_file) as f:
        for line in f:
            print(line)  # echo the line being tested
            for keys, pattern in regexp.items():
                m = pattern.match(line)
                if m:
                    # Walk the attribute names alongside their capture groups.
                    for groupn, attr in enumerate(keys):
                        # m.group(0) is the whole match, so captures start at group 1.
                        print(str(groupn) + ' ' + attr + ' ' + m.group(groupn + 1))
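
The loop in the answer prints each captured value. As a possible next step, the sketch below shows one way the same regex-per-line idea could fill the nested dictionary described in the question. It is an illustration rather than the author's code: `SAMPLE_LOG`, `parse_log`, and the three pattern names are hypothetical, the text is the sample from the question instead of a real file, and it assumes that `146 X 131` means width then height, as in the question's table.

```python
import re

# Sample text copied from the question; a real log would be read from a file.
SAMPLE_LOG = """\
Align set A and merge into set B ...
    setA, 4 images , image size 146 X 131
    setA, image 1, shape center shift (7, -9) compared to image center
    setA, image 2, shape center shift (8, -10) compared to image center
    setA, image 3, shape center shift (6, -9) compared to image center
    setA, image 4, shape center shift (6, -8) compared to image center
    final set B, image size 143 X 129
Write set B ...
"""

SIZE_A = re.compile(r'\s*setA, (\d+) images , image size (\d+) X (\d+)')
SHIFT = re.compile(r'\s*setA, image (\d+), shape center shift \((-?\d+), (-?\d+)\)')
SIZE_B = re.compile(r'\s*final set B, image size (\d+) X (\d+)')


def parse_log(lines):
    """Return {'A1': {'width_A': ..., 'shift_x': ..., ...}, ...}."""
    data = {}
    pending = []  # row keys of the current snippet that still lack set B's size
    width_a = height_a = None
    for line in lines:
        m = SIZE_A.match(line)
        if m:
            # A new snippet starts: remember set A's size for the rows that follow.
            width_a, height_a = int(m.group(2)), int(m.group(3))
            pending = []
            continue
        m = SHIFT.match(line)
        if m:
            key = 'A' + m.group(1)
            data[key] = {'width_A': width_a, 'height_A': height_a,
                         'shift_x': int(m.group(2)), 'shift_y': int(m.group(3))}
            pending.append(key)
            continue
        m = SIZE_B.match(line)
        if m:
            # Set B's size is reported last, so back-fill it into the pending rows.
            for key in pending:
                data[key]['width_B'] = int(m.group(1))
                data[key]['height_B'] = int(m.group(2))
            pending = []
    return data


data = parse_log(SAMPLE_LOG.splitlines())
print(data['A1'])
```

Because the log is said to contain many such snippets, keys like 'A1' would collide from one snippet to the next. A fuller version would need some per-snippet identifier in the key, which the sample does not show.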
    

    References

    1. A question I had not noticed before asking mine: Extracting info from large structured text files
    2. Regular expression cheat table
