Regex to find text followed by space or ( (answer)

【Question title】: regex to find text followed by space or (
【Posted】: 2019-02-05 00:02:31
【Question description】:

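I am trying to extract some words (listed under the expected output) from the sample.log below. I am having trouble extracting the last expected output, xuvs. The code extracts every expected output except the last one. I am trying to work out how to write a regex that means "find text followed by a space or (". Any pointers to other approaches are much appreciated.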

sample.log

  for (i=0; i< models; i = i+1) begin:modelgen

 model_ip model_inst
     (
      .model_powerdown(model_powerdown),
      .mcg(model_powerdown),
      .lambda(_lambda[i])
      );
  assign fnl_verifier_lock = (tx_ready & rx_ready) ? &verifier_lock :1'b0;

native_my_ip native_my_inst
 (
  .tx_analogreset(tx_analogreset),
 //.unused_tx_parallel_data({1536{1'b0}})

  );

// END Section I

resync
 #(
   .INIT_VALUE (1)
   ) inst_reset_sync
   (
.clk    (tx_coreclkin),
.reset  (!tx_ready), // tx_digitalreset from reset
.d      (1'b0),
.q      (srst_tx_common  )
);

har HA2  (fs, ha, lf, c);                  

#need to extract xuvs
xuvs or1(fcarry_out, half_carry_2, half_carry_1);

Expected output

model_ip
native_my_ip
resync
har
xuvs

code.py

import re

# Read the whole log into one string so a match can span line breaks
with open("sample.log", "r") as input_file:
    lines = input_file.read()

# Pattern as posted in the question: it misses "xuvs or1(" because \s+\( needs whitespace before "("
for m in re.finditer(r'^\s*([a-zA-Z_0-9]+)\s+([a-zA-Z_0-9]+\s+\(|#\()', lines, re.MULTILINE):
    print(m.group(1))

【Question comments】:

Change \s+\( to \s*\(
See regex101.com/r/svdM2P/1. Also, consider ^\s*(\w+)\s+(\w+|#)\s*\(; see this regex demo (Python demo).
If my answer works for you, please consider upvoting/accepting it.
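
The first comment's suggestion can be checked in isolation. The following is a minimal sketch, assuming only the two instantiation lines from sample.log; the helper name first_word is made up for illustration and is not part of the original post.

```python
import re

# Pattern from the question, and the same pattern with \s+\( relaxed to \s*\(
original = r'^\s*([a-zA-Z_0-9]+)\s+([a-zA-Z_0-9]+\s+\(|#\()'
relaxed = r'^\s*([a-zA-Z_0-9]+)\s+([a-zA-Z_0-9]+\s*\(|#\()'

spaced = "har HA2  (fs, ha, lf, c);"                          # whitespace before "("
tight = "xuvs or1(fcarry_out, half_carry_2, half_carry_1);"  # no whitespace before "("


def first_word(pattern, text):
    """Return group 1 of the first match, or None if nothing matches (hypothetical helper)."""
    m = re.search(pattern, text, re.MULTILINE)
    return m.group(1) if m else None


print(first_word(original, spaced))  # har
print(first_word(original, tight))   # None
print(first_word(relaxed, spaced))   # har
print(first_word(relaxed, tight))    # xuvs
```

The only difference between the two patterns is the quantifier before \(: + requires at least one whitespace character, while * also accepts none.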

Tags: python regex

【Solution 1】:

You need to match any optional whitespace before the (:

^\s*(\w+)\s+(\w+|#)\s*\(
                   ^^^

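See the regex demo. [a-zA-Z0-9_] can be shortened to \w (in Python 3, \w also matches non-ASCII letters and digits by default; if you need it to match only ASCII letters and digits, compile the pattern with the re.ASCII flag).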
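
To illustrate the re.ASCII remark, here is a small sketch run under Python 3. The input string is invented for the example and does not come from sample.log.

```python
import re

# Made-up input: the first word contains a non-ASCII letter (U+00E9, "e" with acute accent)
text = "caf\u00e9_1 model_ip"

# Python 3 str patterns: \w is Unicode-aware by default
unicode_words = re.findall(r'\w+', text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_]
ascii_words = re.findall(r'\w+', text, re.ASCII)

print(unicode_words)  # the accented word is kept whole
print(ascii_words)    # the accented word is split at the non-ASCII letter
```

For identifiers in a Verilog-style log like the one in the question, the two modes should give the same result as long as the file contains only ASCII names.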

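Details

^ - start of a line (because re.MULTILINE is used)
\s* - 0+ whitespace characters
(\w+) - Group 1: one or more letters, digits, or _
\s+ - 1+ whitespace characters
(\w+|#) - Group 2: either one or more letters, digits, or _, or a single #
\s* - 0+ whitespace characters
\( - a ( character.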
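
The breakdown above can also be written directly into the pattern with re.VERBOSE. This is a sketch rather than the answer's own code: the sample string is a trimmed copy of sample.log, and the # alternative has to be escaped because an unescaped # starts a comment in verbose mode.

```python
import re

# Trimmed copy of sample.log from the question (assumption: enough lines to cover each case)
sample = """\
 model_ip model_inst
     (
      .model_powerdown(model_powerdown),
      );
  assign fnl_verifier_lock = (tx_ready & rx_ready) ? &verifier_lock :1'b0;
resync
 #(
   .INIT_VALUE (1)
   ) inst_reset_sync
har HA2  (fs, ha, lf, c);
xuvs or1(fcarry_out, half_carry_2, half_carry_1);
"""

pattern = re.compile(r"""
    ^\s*        # start of a line, then optional whitespace
    (\w+)       # group 1: the name to extract
    \s+         # at least one whitespace character (may include a newline)
    (\w+|\#)    # group 2: an instance name, or a literal hash sign
    \s*         # optional whitespace before the parenthesis
    \(          # a literal opening parenthesis
""", re.MULTILINE | re.VERBOSE)

names = [m.group(1) for m in pattern.finditer(sample)]
print(names)  # ['model_ip', 'resync', 'har', 'xuvs']
```

Because \s also matches newlines, the pattern handles instantiations where the instance name and the ( sit on different lines, as in the model_ip and resync cases.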

Python demo:

# Assumes "import re" and that lines holds the full text of sample.log, as in code.py
for m in re.finditer(r'^\s*(\w+)\s+(\w+|#)\s*\(', lines, re.MULTILINE):
    print(m.group(1))

Output:

model_ip
native_my_ip
resync
har
xuvs

【Discussion】: