【Question title】: JS Regex match between same pairs of characters
【Posted】: 2021-05-10 08:24:51
【Question】:

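I am currently trying to convert wikitext tables to HTML. (Parsoid is not an option.)

Tables are written in the format below. I would like to use regex on the code for speed, but I need a way to capture the text between recurring delimiters.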

{| class=\"wikitable\"
|-
|'''Ruler'''
|'''Stopwatch'''
|'''Magnifying Glass'''
|-
|[[File:Ruler30cmDiagonal.png|center|200px]]
|[[File:Stopwatch.png|center|200px]]
|[[File:MagnifyingGlass.png|center|200px]]
|-
|A ruler is a piece of '''equipment''' used to measure length.
|A scientist came '''equip''' with a [[stopwatch]].
|A magnifying glass is a useful piece of '''equipment''' for looking at very small things.
|}

From the table above I need to match the text between the "|-" substrings, with the last match ending at "|}".

So the matches would be

|'''Ruler'''
|'''Stopwatch'''
|'''Magnifying Glass'''

|A ruler is a piece of '''equipment''' used to measure length.
|A scientist came '''equip''' with a [[stopwatch]].
|A magnifying glass is a useful piece of '''equipment''' for looking at very small things.

|[[File:Ruler30cmDiagonal.png|center|200px]]
|[[File:Stopwatch.png|center|200px]]
|[[File:MagnifyingGlass.png|center|200px]]

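As you can see, this is complicated by the lack of a closing "|" character; the matching would need to be done on character pairs. (I will also need to match on '\n|' in later match/replace calls.)

Having spent hours on this, I know I need a lookahead and a lookbehind (with an or for |- and }). I thought /((?=(\|\-))[.]*)(?!(\|\-|\|\}))/mg was the most likely candidate, but no joy.

Any suggestions?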

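【Comments】:

  • If you are trying to build a parser, I always advise against using regex directly. There are useful tools online, such as PEG.js, that can guide you in implementing a simple parser from a basic grammar. Trying to parse everything with regex is a huge and worthless effort. If you are lucky, since wikitext tables are publicly documented, you may find an implementation that is already done.
  • Maybe (?<=\|-\n).*?(?=\s*\|[-}]) regex101.com/r/uIvAN4/1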
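
The lookaround pattern suggested in the last comment can be tried out roughly as follows. This is a minimal sketch, not code from the thread: it assumes "\n" line endings and an engine that supports lookbehind and the s (dotAll) flag (ES2018). The g and s flags are an assumption as well, since the comment does not state them, and the names wikitext, rowPattern and rowBlocks are made up for illustration. For comparison, in the pattern from the question [.] matches only a literal dot, and (?=...) is a lookahead rather than a lookbehind.

```javascript
// Shortened sample of the wikitext from the question ("\n" line endings assumed).
const wikitext = [
  '{| class="wikitable"',
  '|-',
  "|'''Ruler'''",
  "|'''Stopwatch'''",
  '|-',
  '|[[File:Ruler30cmDiagonal.png|center|200px]]',
  '|[[File:Stopwatch.png|center|200px]]',
  '|}',
].join('\n');

// (?<=\|-\n)      lookbehind: the match starts right after a "|-" row marker
// .*?             lazy body; the s flag lets "." cross line breaks
// (?=\s*\|[-}])   lookahead: stop before the next "|-" or the closing "|}"
const rowPattern = /(?<=\|-\n).*?(?=\s*\|[-}])/gs;

// Collect the full text of every match (one entry per table row).
const rowBlocks = Array.from(wikitext.matchAll(rowPattern), (m) => m[0]);
console.log(rowBlocks);
```

Each entry of rowBlocks still begins with "|", so a later split on '\n|' would need to strip that first character from the first cell.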

Tags: javascript regex regex-lookarounds regex-greedy regexp-replace


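【Solution 1】:

I think regex is a very good fit for this task. As an added bonus, it is much faster than a Lex and Yacc approach. This code uses several regexes to render the wikitext as HTML: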

let input = `{| class=\"wikitable\"
|-
|'''Ruler'''
|'''Stopwatch'''
|'''Magnifying Glass'''
|-
|[[File:Ruler30cmDiagonal.png|center|200px]]
|[[File:Stopwatch.png|center|200px]]
|[[File:MagnifyingGlass.png|center|200px]]
|-
|A ruler is a piece of '''equipment''' used to measure length.
|A scientist came '''equip''' with a [[stopwatch]].
|A magnifying glass is a useful piece of '''equipment''' for looking at very small things.
|}`;

let classAttr = '';
let html = '<table>\n  ' + input
  .split(/[\r\n]+\|[\-\}]/)
  .filter((row, idx) => {
    if(idx === 0) {
      // class row on first line
      let m = row.match(/class=.?"([a-zA-Z_\- ]+)/);
      if(m) {
        // save the table class attribute for later use
        classAttr = ' class="' + m[1] + '"';
      }
      return false;
    } else if(row.length) {
      return true;
    }
    return false; // remove empty rows
  })
  .map((row) => {
    row = row
      .split(/[\r\n]+\|/)
      .filter((row, idx) => {
        if(idx === 0) {
          return false; // remove first empty item, not a cell
        }
        return true;
      })
      .map((cell) => {
        cell = '\n    <td> '
          + cell // do additional cell rendering as needed
          + ' </td>';
        return cell;
      })
      .join('');
    return '<tr>' + row + '\n  </tr>';
  })
  .join('\n  ') + '\n</table>';
// insert the table class attribute (if any)
html = html.replace(/(?<=<table)/, classAttr);

console.log(html);

Result:

<table class="wikitable">
  <tr>
    <td> '''Ruler''' </td>
    <td> '''Stopwatch''' </td>
    <td> '''Magnifying Glass''' </td>
  </tr>
  <tr>
    <td> [[File:Ruler30cmDiagonal.png|center|200px]] </td>
    <td> [[File:Stopwatch.png|center|200px]] </td>
    <td> [[File:MagnifyingGlass.png|center|200px]] </td>
  </tr>
  <tr>
    <td> A ruler is a piece of '''equipment''' used to measure length. </td>
    <td> A scientist came '''equip''' with a [[stopwatch]]. </td>
    <td> A magnifying glass is a useful piece of '''equipment''' for looking at very small things. </td>
  </tr>
</table>

See the "// do additional cell rendering as needed" comment: that is where you can handle other rendering concerns, such as bold text and links.
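
As an illustration of that extra rendering step, here is a minimal sketch of a cell renderer. The helper name renderCell is hypothetical and not part of the answer above, and the way file names and link targets are mapped to src and href values is purely an assumption; a real wiki would resolve them to proper URLs.

```javascript
// Hypothetical helper (not from the original answer): renders a few inline
// wikitext constructs inside a single table cell.
function renderCell(cell) {
  return cell
    // [[File:Name.png|center|200px]] -> <img>; only the file name is used,
    // the remaining options (alignment, size) are ignored in this sketch.
    .replace(/\[\[File:([^|\]]+)(?:\|[^\]]*)?\]\]/g, '<img src="$1" alt="$1">')
    // [[target|label]] or [[target]] -> <a>; using the bare target as the
    // href is an assumption made for illustration.
    .replace(/\[\[([^|\]]+)(?:\|([^\]]+))?\]\]/g,
      (match, target, label) =>
        '<a href="' + target + '">' + (label || target) + '</a>')
    // '''bold''' -> <b>bold</b>
    .replace(/'''(.+?)'''/g, '<b>$1</b>');
}

console.log(renderCell("A scientist came '''equip''' with a [[stopwatch]]."));
console.log(renderCell('[[File:Stopwatch.png|center|200px]]'));
```

In the code above it would slot in where the comment sits, for example by writing `+ renderCell(cell)` in place of `+ cell`. The sketch does not escape HTML special characters, so untrusted input would need that added.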

【Comments】:

  • @LeosSire: Any questions? Does this meet your needs?
  • @LeosSire: Care to accept the answer? Help goes both ways.