Replacing &nbsp; from javascript dom text node (answers)

【Question title】: Replacing &nbsp; from javascript dom text node
【Posted】: 2010-12-02 12:31:40
【Question】:

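I am processing XHTML with JavaScript. I get the text content of a div node by concatenating the nodeValue of all of its child nodes whose nodeType == Node.TEXT_NODE.

The resulting string sometimes contains a non-breaking space entity. How do I replace it with a regular space character?

My div looks like this...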

<div><b>Expires On</b> Sep 30, 2009 06:30&nbsp;AM</div>
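A minimal sketch of the text extraction described above, assuming element is a DOM element or any object with a childNodes list whose items expose nodeType and nodeValue. The function name directText is hypothetical, and the value 3 for Node.TEXT_NODE is written out so the sketch also runs outside a browser.

```javascript
// Node.TEXT_NODE is 3 in the DOM standard; spelled out here so the sketch
// also works with mock nodes where the global Node object is unavailable.
const TEXT_NODE = 3;

// Hypothetical helper: concatenate the nodeValue of the direct text-node
// children of `element`, skipping element children such as <b>.
function directText(element) {
  let text = "";
  const children = element.childNodes;
  for (let i = 0; i < children.length; i++) {
    if (children[i].nodeType === TEXT_NODE) {
      text += children[i].nodeValue;
    }
  }
  return text;
}
```

For the div above, the text node after </b> holds the decoded non-breaking space character (U+00A0), not the literal text "&nbsp;", as the asker's hex-editor check further down confirms.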

The following suggestions I found online did not work:

var cleanText = text.replace(/^\xa0*([^\xa0]*)\xa0*$/g,"");


var cleanText = replaceHtmlEntities(text);

var replaceHtmlEntites = (function() {
  var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
  var translate = {
    "nbsp": " ",
    "amp" : "&",
    "quot": "\"",
    "lt"  : "<",
    "gt"  : ">"
  };
  return function(s) {
    return ( s.replace(translate_re, function(match, entity) {
      return translate[entity];
    }) );
  }
})();

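Any suggestions?

【Question comments】:

"&nnbsp;": that isn't actually in your data... is it?
I made a typo in the post: if I type &nbsp;, Stack Overflow converts the entity into an actual space in the post preview.
Hey, it looks like there's a typo in the function name. See the edit to my answer.

Tags: javascript regex html-entities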

【Solution 1】:

I think that when you define a function with "var foo = function() {...};", the function is only defined after that line. In other words, try this:

var replaceHtmlEntities = (function() {
  var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
  var translate = {
    "nbsp": " ",
    "amp" : "&",
    "quot": "\"",
    "lt"  : "<",
    "gt"  : ">"
  };
  return function(s) {
    return ( s.replace(translate_re, function(match, entity) {
      return translate[entity];
    }) );
  }
})();

var cleanText = text.replace(/^\xa0*([^\xa0]*)\xa0*$/g,"");
cleanText = replaceHtmlEntities(text);

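EDIT: Also, only use "var" the first time you declare a variable (you used it twice on the cleanText variable).

EDIT 2: The problem is the spelling of the function name. You have "var replaceHtmlEntites =". It should be "var replaceHtmlEntities =".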
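The hoisting behaviour this answer describes can be checked directly. The sketch below (all function names hypothetical) contrasts a function declaration, a var function expression, and a misspelled name. Since the asker reports the function was already defined before use, the misspelling pointed out in the second edit is the more likely culprit.

```javascript
// 1. A function declaration is hoisted together with its body,
//    so it can be called before the line where it appears.
const beforeDeclaration = declared();
function declared() {
  return "declared works";
}

// 2. A var function expression only hoists the variable, which stays
//    undefined until the assignment runs, so calling it early throws.
let earlyCallError = null;
try {
  expressed();
} catch (e) {
  earlyCallError = e; // TypeError: expressed is not a function
}
var expressed = function () {
  return "expressed works";
};
const afterAssignment = expressed();

// 3. A misspelled name is a separate identifier. Only the misspelled
//    "replaceHtmlEntites" exists here, so calling the correctly spelled
//    name throws a ReferenceError.
var replaceHtmlEntites = function (s) {
  return s;
};
let typoError = null;
try {
  replaceHtmlEntities("&nbsp;");
} catch (e) {
  typoError = e; // ReferenceError: replaceHtmlEntities is not defined
}
```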

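【Comments】:

Yes, in my script the function comes before the place where it is used; I just forgot to do that when posting here. But it still didn't work.

【Solution 2】: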

If you only need to replace &nbsp;, you can use a much simpler regex:

var textWithNBSpaceReplaced = originalText.replace(/&nbsp;/g, ' ');

Also, there is a typo in your div example: it says &nnbsp; instead of &nbsp;.
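This regex only helps when the string being cleaned is markup (for example, something read from innerHTML) rather than a text node's nodeValue. In the sketch below the text-node value is hard-coded as an assumption so no DOM is needed; it shows why the same regex finds nothing to replace in the asker's case.

```javascript
// Markup still contains the literal entity text "&nbsp;".
const markup = "<b>Expires On</b> Sep 30, 2009 06:30&nbsp;AM";
const fromMarkup = markup.replace(/&nbsp;/g, " ");

// A text node's nodeValue holds the decoded character U+00A0 instead,
// so the literal-entity regex leaves it unchanged.
const textNodeValue = " Sep 30, 2009 06:30\u00a0AM";
const fromTextNode = textNodeValue.replace(/&nbsp;/g, " ");
```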

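【Comments】:

How does this interact with strings in CDATA blocks (since this is XHTML)?
It doesn't really cover that case. If you need to go that far, a regex is probably the wrong solution.
I made a typo in the post: if I type &nbsp;, Stack Overflow converts the entity into an actual space in the post preview.
When I inspect the variable in Firebug I don't see any &nbsp; in it: the string looks like a valid date. Pasting the value into a hex editor with UTF-8 encoding shows that the nbsp has been replaced by the two bytes C2 A0, which is the UTF-8 encoding of the single character U+00A0.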
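The hex-editor observation can be reproduced in code: U+00A0 is a single character in a JavaScript string, and its UTF-8 encoding is the two bytes C2 A0. This sketch assumes an environment that provides TextEncoder (current browsers, Node.js 11 or later).

```javascript
const nbsp = "\u00a0";

// One UTF-16 code unit inside a JavaScript string...
const codeUnits = nbsp.length;
const codePoint = nbsp.codePointAt(0); // 160, i.e. 0xA0

// ...but two bytes once encoded as UTF-8, which is what a hex editor shows.
const utf8Bytes = Array.from(new TextEncoder().encode(nbsp));
```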

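【Solution 3】:

This is much easier than you're making it. The text node won't contain the literal string "&nbsp;"; it will contain the corresponding character, whose code is 160.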

function replaceNbsps(str) {
  var re = new RegExp(String.fromCharCode(160), "g");
  return str.replace(re, " ");
}

textNode.nodeValue = replaceNbsps(textNode.nodeValue);

Update

Even simpler:

textNode.nodeValue = textNode.nodeValue.replace(/\u00a0/g, " ");
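A small check that the two patterns in this answer do the same thing, on a sample string modelled on the question's div. It also records one caveat: in JavaScript regular expressions the \s class matches U+00A0 as well, so a broader cleanup with /\s+/g would also catch non-breaking spaces, together with tabs and newlines.

```javascript
const sample = "Sep 30, 2009 06:30\u00a0AM";

// Pattern built from the character code, as in the first version of the answer.
const fromCharCode = sample.replace(new RegExp(String.fromCharCode(160), "g"), " ");

// The same character written as a regex escape, as in the update.
const fromEscape = sample.replace(/\u00a0/g, " ");

// Caveat: \s in a JavaScript regex also matches U+00A0.
const nbspIsWhitespace = /\s/.test("\u00a0");
```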

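【Comments】:

Thanks Tim. That worked, and it turned out to be easier than what I was doing :)

【Solution 4】:

The first line is rather muddled. All it needs is: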

var cleanText = text.replace(/\xA0/g,' ');

That should be all you need.
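To see why the question's first attempt failed, here is a sketch comparing it with this one-liner (function names hypothetical, sample strings modelled on the question's div). Because that pattern is anchored with ^ and $ and its replacement is the empty string, it either matches the whole string and deletes it, or matches nothing at all.

```javascript
// The question's first attempt: anchored at both ends, replaced with "".
function questionAttempt(text) {
  return text.replace(/^\xa0*([^\xa0]*)\xa0*$/g, "");
}

// This answer's one-liner: every U+00A0 becomes an ordinary space.
function replaceNbsp(text) {
  return text.replace(/\xA0/g, " ");
}

const withNbsp = "Sep 30, 2009 06:30\u00a0AM";
const withoutNbsp = "Sep 30, 2009 06:30 AM";
```

With an nbsp in the middle, the anchored pattern cannot match, so the string comes back unchanged; without one, the whole string matches and is deleted.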

【Comments】:

This is shorter than the accepted answer. Thanks.

【Solution 5】:

I used this and it worked well:

var cleanText = text.replace(/&amp;nbsp;/g,"");
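This pattern only matches the literal text &amp;nbsp;, which is what &nbsp; looks like after being HTML-escaped a second time, and it deletes the match rather than turning it into a space. The answer does not say what its input looked like, so the double-escaped sample below is an assumption; the sketch also shows that the pattern leaves a single-escaped entity and a decoded U+00A0 untouched. The function name is hypothetical.

```javascript
// Removes the double-escaped form of &nbsp; (no space is inserted).
function removeDoubleEscapedNbsp(text) {
  return text.replace(/&amp;nbsp;/g, "");
}

const doubleEscaped = "06:30&amp;nbsp;AM";
const singleEscaped = "06:30&nbsp;AM";
const decoded = "06:30\u00a0AM"; // what a DOM text node actually holds
```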

【Comments】:

【Solution 6】:

// Define the method first: assigning to String.prototype is an ordinary
// statement, so it is not hoisted and must run before it is called.
String.prototype.replaceHtmlEntites = function() {
  var s = this;
  var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
  var translate = {"nbsp": " ", "amp": "&", "quot": "\"", "lt": "<", "gt": ">"};
  return s.replace(translate_re, function(match, entity) {
    return translate[entity];
  });
};

var text = "&quot;&nbsp;&amp;&lt;&gt;";
text = text.replaceHtmlEntites();

Try this... it worked for me.
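A sketch of the same lookup-table approach written as a plain function rather than a String.prototype extension (names hypothetical). One useful property of doing it in a single replace call is that replacement text is never re-scanned, so &amp;nbsp; correctly becomes &nbsp; instead of being decoded twice, as a chain of separate replace calls would do.

```javascript
const ENTITY_MAP = { nbsp: " ", amp: "&", quot: "\"", lt: "<", gt: ">" };
const ENTITY_RE = /&(nbsp|amp|quot|lt|gt);/g;

// Single pass: each entity is looked up once, and the inserted text
// is not searched again.
function decodeBasicEntities(s) {
  return s.replace(ENTITY_RE, (match, name) => ENTITY_MAP[name]);
}

// For contrast: two chained passes over-decode "&amp;nbsp;",
// because the first pass produces a new "&nbsp;" for the second to find.
function decodeInTwoPasses(s) {
  return s.replace(/&amp;/g, "&").replace(/&nbsp;/g, " ");
}
```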

【Comments】:

【Solution 7】:

If you just want to get rid of these entities, remove everything between the & and the ; that each of them has:

// [^;\s]+ stops at the first ";", so ordinary text between two entities is kept
text = text.replace(/&[^;\s]+;/g, '');

【Comments】:

If there are multiple occurrences, all of the matching strings will be replaced.
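A greedy pattern such as /&.*;/g runs from the first & in the string to the last ;, deleting any ordinary text in between. This sketch contrasts it with a narrower pattern, /&[^;\s]+;/g, which assumes entity names contain no semicolons or whitespace; the sample string is made up for illustration.

```javascript
const sample = "x &lt; y; y &gt; z";

// Greedy: .* keeps going until the LAST ";" in the string.
const greedy = sample.replace(/&.*;/g, "");

// Narrower: each match stops at the first ";" and never crosses whitespace.
const targeted = sample.replace(/&[^;\s]+;/g, "");
```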

【Solution 8】:

Replace didn't work for me... try this code:

str = str.split("&quot;").join('"');
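One possible reason replace seemed not to work (the answer doesn't say): with a string pattern, String.prototype.replace changes only the first occurrence. split/join replaces all of them, as does replaceAll, which assumes an ES2021 environment (Node.js 15 or later, current browsers). The sample string is illustrative.

```javascript
const str = "&quot;quoted&quot; text";

// Replaces every occurrence.
const viaSplitJoin = str.split("&quot;").join('"');
const viaReplaceAll = str.replaceAll("&quot;", '"');

// With a plain string pattern, replace only changes the FIRST occurrence.
const viaReplace = str.replace("&quot;", '"');
```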

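【Comments】:

【Solution 9】:

One way to hack around this is to replace every empty line made up of two or more spaces with a few newlines and a marker. Then, after the markdown is parsed, replace the paragraphs containing that marker with line breaks.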

// replace empty lines (lines containing only two or more spaces) with "EMPTY_LINE"
rawMdText = rawMdText.replace(/\n  +(?=\n)/g, "\n\nEMPTY_LINE\n");
// put <br> at the end of every other line that ends with two or more spaces
rawMdText = rawMdText.replace(/  +\n/g, "<br>\n");

// parse (markdownParse stands in for whatever markdown parser you use)
let mdHtml = markdownParse(rawMdText);

// for any paragraphs that end with a line break (injected above)
// and are followed by multiple empty lines leading to
// another paragraph, condense them into one paragraph
mdHtml = mdHtml.replace(/(<br>\s*<\/p>\s*)(<p>EMPTY_LINE<\/p>\s*)+(<p>)/g, (match) => {
  return match.match(/EMPTY_LINE/g).map(() => "<br>").join("");
});

// for basic newlines, just replace them
mdHtml = mdHtml.replace(/<p>EMPTY_LINE<\/p>/g, "<br>");

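What this does is find every new line that contains only two or more spaces. It uses a lookahead so that the next replacement starts from the right position; without it, two consecutive blank lines would break it.

Markdown will then parse those lines into paragraphs that contain only the marker "EMPTY_LINE", so you can go through the parsed HTML and replace them with line breaks.

As a bonus, the replace function merges any run of these line-break paragraphs into the paragraphs above and below them (if they exist).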
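The lookahead point can be checked on a tiny input with two consecutive whitespace-only lines (the sample string is made up for illustration). Without the lookahead, the first match consumes the newline that the second match needs, so only one marker is inserted.

```javascript
// "a", then two lines containing only two spaces each, then "b".
const input = "a\n  \n  \nb";

// The answer's pattern: the trailing \n is checked but not consumed.
const withLookahead = input.replace(/\n  +(?=\n)/g, "\n\nEMPTY_LINE\n");

// Same idea without the lookahead: the trailing \n is consumed by the match.
const withoutLookahead = input.replace(/\n  +\n/g, "\n\nEMPTY_LINE\n\n");

// Count how many empty lines were turned into markers.
const countMarkers = (s) => (s.match(/EMPTY_LINE/g) || []).length;
```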

In practice, you can use it like this:

A line with spaces at end  
  
  
and empty lines with spaces in between will condense into a multi-line paragraph.

A line with no spaces at end
  
  
and lines with spaces in between will be two paragraphs with extra lines between.

The output would look like this:

<p>
  A line with spaces at end<br>
  <br>
  <br>
  and empty lines with spaces in between will condense into a multi-line paragraph.
</p>

<p>A line with no spaces at end</p>
<br>
<br>
<p>and lines with spaces in between will be two paragraphs with extra lines between.</p>

【Comments】: