JavaScript regex to replace text not in HTML attributes [duplicate]: answers

【Question title】: Javascript Regex to replace text NOT in html attributes [duplicate]
【Posted】: 2011-08-19 18:44:29
【Question description】:

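I would like a JavaScript regex that wraps each word from a given list in a given start tag (e.g. <span>) and end tag (i.e. </span>), but only if the word is actually "visible text" on the page, and not inside an HTML attribute (such as a link's title attribute) or inside a <script></script> block.

I've created a JS Fiddle with the basic setup: http://jsfiddle.net/4YCR6/1/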

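【Comments】:

As others have said, using regular expressions to process HTML is usually not the best idea. But in some cases it is simply the easiest way. Try this: updated jsfiddle, also on rubular
See stackoverflow.com/questions/3241169/…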

Tags: javascript regex

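【Solution 1】:

HTML is too complex to parse reliably with regular expressions.

If you want to do this client-side, you can create a document fragment and/or a disconnected DOM node (neither of which is displayed anywhere), initialize it with your HTML string, then walk the resulting DOM tree and process the text nodes. (Or use a library to help you do that, although it's actually quite easy.)

Here's a DOM walking example. It is slightly simpler than your problem because it only updates the text and doesn't add new elements to the structure (wrapping parts of the text in spans involves updating the structure), but it should get you going. The notes at the end describe what you'll need to change.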

var html =
    "<p>This is a test.</p>" +
    "<form><input type='text' value='test value'></form>" +
    "<p class='testing test'>Testing here too</p>";
var frag = document.createDocumentFragment();
var body = document.createElement('body');
var node, next;

// Turn the HTML string into a DOM tree
body.innerHTML = html;

// Walk the dom looking for the given text in text nodes
walk(body);

// Insert the result into the current document via a fragment
node = body.firstChild;
while (node) {
  next = node.nextSibling;
  frag.appendChild(node);
  node = next;
}
document.body.appendChild(frag);

// Our walker function
function walk(node) {
  var child, next;

  switch (node.nodeType) {
    case 1:  // Element
    case 9:  // Document
    case 11: // Document fragment
      child = node.firstChild;
      while (child) {
        next = child.nextSibling;
        walk(child);
        child = next;
      }
      break;
    case 3: // Text node
      handleText(node);
      break;
  }
}

function handleText(textNode) {
  textNode.nodeValue = textNode.nodeValue.replace(/test/gi, "TEST");
}

Live example

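The changes you'll need to make are in handleText. Specifically, rather than updating nodeValue, you'll need to:

Find the index of the start of each word within the nodeValue string.
Use Text#splitText to split the text node into up to three text nodes (the part before the matching text, the part that is the matching text, and the part after the matching text).
Use document.createElement to create the new span (this is literally just span = document.createElement('span')).
Use Node#insertBefore to insert the new span in front of the third text node (the one containing the text after the match). If you didn't need to create a third node because the match was at the end of the text node, that's fine too: just pass null as refChild (provided the text node is its parent's last child; otherwise pass the matched node's nextSibling).
Use Node#appendChild to move the second text node (the one with the matching text) into the span. (No need to remove it from its parent first; appendChild does that for you.)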
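
The steps above could be put together roughly as follows. This is only a sketch, not the answerer's code: the function name wrapMatches and the "highlight" class are made up for illustration, and it assumes the text node is attached to a parent. It always inserts the span before the matched node's nextSibling, which is the "after" node when there is one and works whether or not the text node is its parent's last child.

```javascript
// Hypothetical replacement for handleText: wraps every match of `regex`
// found in `textNode` in a <span class="highlight">.
function wrapMatches(textNode, regex) {
  var doc = textNode.ownerDocument;
  // Work on a non-global copy so exec() always searches from the start
  // of whatever string it is given
  var re = new RegExp(regex.source, regex.ignoreCase ? "i" : "");
  var match, matchNode, rest, span;

  while (textNode && (match = re.exec(textNode.nodeValue))) {
    if (match[0] === "") break; // an empty match would loop forever

    // Split off everything from the start of the match onwards
    matchNode = textNode.splitText(match.index);

    // Split again if there is text after the match
    rest = matchNode.nodeValue.length > match[0].length
      ? matchNode.splitText(match[0].length)
      : null;

    // Create the span and place it right after the matched text node
    span = doc.createElement("span");
    span.className = "highlight"; // illustrative class name
    matchNode.parentNode.insertBefore(span, matchNode.nextSibling);

    // Move the matched text into the span (appendChild detaches it first)
    span.appendChild(matchNode);

    // Continue searching in the text that followed the match
    textNode = rest;
  }
}
```

In the walker above, handleText(node) would then become something like wrapMatches(node, /test/i). Because the walker stores next before visiting each child, the spans added here are not visited again.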

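【Comments】:

Fun fact: nearly five years later, this code was used in the Drumpfinator Chrome extension, connected with John Oliver's Last Week Tonight. Hilarious!
You spotted that too? Oh wait, this is your answer? Were you consulted, or did you look at the extension like I did?
@brace110: A very nice young lady who had looked at the source code sent me an email. :-)
Using it for a similar thing of my own, with proper attribution of course :)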

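【Solution 2】:

T.J. Crowder's answer is correct. I've gone a bit further on the code side: here's a complete example that works in all major browsers. I've posted variations of this code on Stack Overflow before (here and here, for example), and have made it nice and generic so that I (or anyone else) don't have to change it much to reuse it.

jsFiddle example: http://jsfiddle.net/7Vf5J/38/

Code: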

// Reusable generic function
function surroundInElement(el, regex, surrounderCreateFunc) {
    // script and style elements are left alone
    // (tagName is uppercase in HTML documents, hence the "i" flag)
    if (!/^(script|style)$/i.test(el.tagName)) {
        var child = el.lastChild;
        while (child) {
            if (child.nodeType == 1) {
                surroundInElement(child, regex, surrounderCreateFunc);
            } else if (child.nodeType == 3) {
                surroundMatchingText(child, regex, surrounderCreateFunc);
            }
            child = child.previousSibling;
        }
    }
}

// Reusable generic function
function surroundMatchingText(textNode, regex, surrounderCreateFunc) {
    var parent = textNode.parentNode;
    var result, surroundingNode, matchedTextNode, matchLength, matchedText;
    while ( textNode && (result = regex.exec(textNode.data)) ) {
        matchedTextNode = textNode.splitText(result.index);
        matchedText = result[0];
        matchLength = matchedText.length;
        textNode = (matchedTextNode.length > matchLength) ?
            matchedTextNode.splitText(matchLength) : null;
        // Ensure searching starts at the beginning of the text node
        regex.lastIndex = 0;
        surroundingNode = surrounderCreateFunc(matchedTextNode.cloneNode(true));
        parent.insertBefore(surroundingNode, matchedTextNode);
        parent.removeChild(matchedTextNode);
    }
}

// This function does the surrounding for every matched piece of text
// and can be customized  to do what you like
function createSpan(matchedTextNode) {
    var el = document.createElement("span");
    el.style.color = "red";
    el.appendChild(matchedTextNode);
    return el;
}

// The main function
function wrapWords(container, words) {
    // Replace the words one at a time to ensure "test2" gets matched
    for (var i = 0, len = words.length; i < len; ++i) {
        surroundInElement(container, new RegExp(words[i], "g"), createSpan);
    }
}

wrapWords(document.getElementById("container"), ["test2", "test"]);
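
wrapWords passes each word straight to new RegExp, so a word containing regex metacharacters (".", "+", "(" and so on) would be interpreted as a pattern rather than as literal text. As a hedged alternative, not part of the code above, the words could be escaped and combined into one alternation, with longer words first so that "test2" is tried before its prefix "test". The helper names escapeRegExp and buildWordRegex are hypothetical.

```javascript
// Hypothetical helper: escape regex metacharacters so the word matches literally
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one regex matching any of the words.
// Longer words come first so a word is not shadowed by its own prefix.
function buildWordRegex(words, ignoreCase) {
  var sorted = words.slice().sort(function (a, b) {
    return b.length - a.length;
  });
  var pattern = sorted.map(escapeRegExp).join("|");
  return new RegExp(pattern, ignoreCase ? "gi" : "g");
}

// Possible use with the functions above (in a browser):
// surroundInElement(container, buildWordRegex(["test", "test2"]), createSpan);
```

A single pass with a combined regex also means already-wrapped text is not searched again for a shorter word, which the one-word-at-a-time loop would do.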

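【Comments】:

This is exactly what I was looking for. How can I make it ignore case completely?
@MikeMellor: Change new RegExp(words[i], "g") to new RegExp(words[i], "gi").
Gosh, that was easy. I really should learn regex. Thanks Tim
@MikeMellor: Everyone should learn regex :)
@TimDown: Thanks for the code. However, it must be noted that it has a bug: it skips some matches. To fix it, regex.lastIndex = 0; has to be added after the textNode = ... line in surroundMatchingText.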
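
The skipped matches described in the last comment come from how a regex with the "g" flag remembers lastIndex between exec calls, even when the next call is given a different string. A minimal sketch that reproduces the effect without any DOM (the function name findInEach is made up for illustration):

```javascript
// Runs regex.exec() once on each string and records whether it matched.
// With resetEachTime, lastIndex is set back to 0 before every call.
function findInEach(regex, strings, resetEachTime) {
  var found = [];
  for (var i = 0; i < strings.length; i++) {
    if (resetEachTime) {
      regex.lastIndex = 0;
    }
    found.push(regex.exec(strings[i]) !== null);
  }
  return found;
}

// Without the reset, the second search starts at index 6 (where the first
// match ended) and misses the "test" at the start of the second string.
var withoutReset = findInEach(/test/g, ["a test here", "test again"], false);
// withoutReset is [true, false]

var withReset = findInEach(/test/g, ["a test here", "test again"], true);
// withReset is [true, true]
```

In surroundMatchingText the same thing happens because textNode is replaced by the remainder after each match, so the next exec call sees a new, shorter string while lastIndex still points into the old one.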