Regex to extract first 10 characters of a string

【Question title】: Regex to extract first 10 characters of a string
【Posted】: 2014-05-02 06:27:51
【Question】:

There are two cases.

Case 1:

For example, I have an HTML string:

 var str1="<b>hello how are you</b><h1>how to extract the first 10 characters of a htmlstring without losing the html of the string</h1>";

I have to extract the first 10 characters of the string without losing the HTML. So the expected output is

<b>hello how a<b>...

Case 2:

I have a plain string as follows:

var str1="hello how are you.how to extract the first 10 characters of a htmlstring without losing the html of the string";

I have to extract the first 10 characters of the string. So the expected output is

hello how a...

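I want a single regular expression that works for both cases.

I am new to regular expressions. I have tried a lot, but I do not have any working code that I could post here. Please help.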

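【Comments】:

@aelor The expected output is hello how..
Why do you need a regex for this? If you need 10 characters (as I understand it), isn't substring enough?
Is &quot; 1 character or 6 characters?
Do you want to cut the first 10 characters but keep the element nesting? For example, would the output be <b>hello how a</b>?
@lpiepiora Then I would lose the HTML.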
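
For the plain-string case, the substring suggestion from the comments can be sketched as below. The helper name `firstChars` and the choice to append "..." only when text was actually cut are assumptions made for illustration, not something stated in the question. The last line also shows why the entity question matters: in the raw string, `&quot;` occupies 6 characters.

```javascript
// Hypothetical helper, not part of the original question.
function firstChars(str, n) {
  // Append the ellipsis only when something was actually cut off.
  return str.length > n ? str.substring(0, n) + "..." : str;
}

var plain = "hello how are you.how to extract the first 10 characters";
console.log(firstChars(plain, 10)); // hello how ...

// An HTML entity counts as several characters in the raw string.
console.log("&quot;".length); // 6
```

Note that 10 characters gives "hello how " (with a trailing space), one character less than the "hello how a" shown in the question.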

Tags: javascript html regex

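【Solution 1】:

As per your rephrased question:

Regexp is not a good tool for processing HTML.

The proper way is to parse the DOM. Jack gave an example, but it assumes that the markup you want to keep is the first child of the node you are looking at.

The question I linked above shows that this is not the case. However, Jack's solution can be adapted to arbitrary nesting. I did this by simply counting the characters of each node until the break point is reached, then recursively modifying that final node. Finally, I remove all nodes that come after the required number of characters has been found.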

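// Trims `node` in place so that it keeps only its first `capture` text characters.
function getNodeWithNChars(capture, node)
{
  var len = node.childNodes.length;
  var toRemove = [];
  for (var i = 0; i < len; i++)
  {
    var child = node.childNodes[i];
    if (capture === 0)
    {
      // Quota already used up: mark everything from here on for removal.
      toRemove.push(child);
    }
    else if (child.textContent.length < capture)
    {
      // The whole child fits: keep it and reduce the remaining quota.
      capture = capture - child.textContent.length;
    }
    else
    {
      if (child.childNodes.length === 0)
      {
        // Leaf (typically a text node): cut its text at the break point.
        child.textContent = child.textContent.substring(0, capture);
      }
      else
      {
        // Element with children: recurse. The child is modified in place,
        // so nothing is assigned back (childNodes entries cannot be reassigned).
        getNodeWithNChars(capture, child);
      }
      capture = 0;
    }
  }
  // Remove after the loop so the live childNodes list does not shift while iterating.
  for (i = 0; i < toRemove.length; i++)
  {
    node.removeChild(toRemove[i]);
  }
  return node;
}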

function getNChars(n,str)
{
  var node = document.createElement('div');
  node.innerHTML = str;
  node=getNodeWithNChars(n,node);
  return node.innerHTML;
}

Example call of the functions above:

console.log(getNChars(25,"hello how are <b>you</b> <em>how <b>to extract the</b> first 25 characters of a htmlstring without losing the html of the string</em>"));
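
The functions above need a browser `document`. Where no DOM is available (for example in Node.js), the same idea of counting text characters while copying tags through untouched can be approximated with a small scanner. This is a different technique from the DOM approach in this answer and only a sketch under simplifying assumptions: the input is well-formed, there is no `<` inside text or attribute values, entities such as `&quot;` are counted as raw characters, and void or self-closing elements such as `<br>` do not occur. The name `truncateHtml` is hypothetical.

```javascript
// Hypothetical DOM-free sketch; see the assumptions listed above.
function truncateHtml(html, n) {
  var out = "";
  var open = [];      // names of tags opened but not yet closed
  var remaining = n;  // text characters still allowed
  var i = 0;
  while (i < html.length && remaining > 0) {
    if (html[i] === "<") {
      var end = html.indexOf(">", i);
      if (end === -1) break;               // malformed tag: stop copying
      var tag = html.substring(i, end + 1);
      if (tag[1] === "/") {
        open.pop();                        // closing tag
      } else {
        // Opening tag: remember its name (text before the first space).
        open.push(tag.substring(1, tag.length - 1).split(" ")[0]);
      }
      out += tag;                          // tags do not use up the quota
      i = end + 1;
    } else {
      out += html[i];                      // ordinary text character
      remaining--;
      i++;
    }
  }
  // Close whatever is still open, innermost first.
  while (open.length > 0) {
    out += "</" + open.pop() + ">";
  }
  return out;
}

console.log(truncateHtml("hello how are <b>you</b> <em>how <b>to extract the</b> first 25 characters</em>", 25));
// hello how are <b>you</b> <em>how <b>to </b></em>
```

For real-world markup, a proper parser remains the safer choice, as this answer argues.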

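【Comments】:

Thank you, I really appreciate the effort. I am already done with it, though, since @Taizo Ito pointed out that it cannot be done with a regex. This may still be of use to me somehow; thanks again for your effort.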

【Solution 2】:

Try this:

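var str1="<b>hello how are you</b><h1>how to extract the first 10 characters of a htmlstring without losing the html of the string</h1>";
// $1 is the first tag without its "<" (here "b>"), $2 is the 11 characters that follow it.
var res = str1.replace(/<(.*?\>)(.{11}).*/, '<$1$2</$1');
console.log(res); // <b>hello how a</b>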

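【Comments】:

Inline attributes and values may mess this up...
@aelor But it does not work when I give it input without HTML.
You said your expected output was <b> hello how a <b>, so that is what I gave you.
@aelor Sorry, the input may or may not contain HTML. Please help. I am sorry I did not specify that.
Could you add one or two scenarios with the expected output for each case? That would be very helpful.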
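
The two objections raised in these comments can be reproduced with a short check. The wrapper name `truncateFirstTag` is hypothetical; the pattern is the one from this answer.

```javascript
// The pattern from the answer, wrapped in a hypothetical helper.
function truncateFirstTag(str) {
  return str.replace(/<(.*?>)(.{11}).*/, "<$1$2</$1");
}

// Bare tag: works as intended.
console.log(truncateFirstTag("<b>hello how are you</b>"));
// <b>hello how a</b>

// Tag with an attribute: the attribute is copied into the closing tag.
console.log(truncateFirstTag('<b class="x">hello how are you</b>'));
// <b class="x">hello how a</b class="x">

// No tag at all: the pattern does not match, so nothing is truncated.
console.log(truncateFirstTag("hello how are you"));
// hello how are you
```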

【Solution 3】:

How about this:

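// $1: optional opening tag, $2: up to 10 text characters, $3: optional closing tag.
// Only lowercase letters, digits and spaces count as text here.
var regex = /(<[a-z0-9]+>|)([a-z0-9 ]{0,10})[a-z0-9 ]*(<\/[a-z0-9]+>|).*/;

var str1 = "hello how are you.how to extract the first 10 characters of a htmlstring without losing the html of the string";
console.log(str1.replace(regex, '$1$2$3')); // "hello how " (10 characters, trailing space included)

str1 = "<b>hello how are you</b><h1>how to extract the first 10 characters of a htmlstring without losing the html of the string</h1>";
console.log(str1.replace(regex, '$1$2$3')); // "<b>hello how </b>"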

【Comments】:

Thanks, but how can I change the character count to 50?
And could you help me change the character count to 100?
Just change "{0,10}" to "{0,100}".
var regex = /(<[a-z0-9]+>|)([a-z0-9 ]{0,100})[^\]*(<\/[a-z0-9]+>|).*/;
Thank you, sir, I appreciate your effort... I will also go with your suggestion of an HTML parser... Thank you very much.
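
Rather than editing the literal each time the count changes, the pattern can be built with the `RegExp` constructor. This is a minimal sketch that assumes the same lowercase-alphanumeric character class as the pattern in this answer and a non-negative integer `n`; the name `truncateWithRegex` is hypothetical.

```javascript
// Hypothetical helper: the pattern from this answer with the count as a parameter.
function truncateWithRegex(str, n) {
  var regex = new RegExp(
    "(<[a-z0-9]+>|)" +            // $1: optional opening tag
    "([a-z0-9 ]{0," + n + "})" +  // $2: up to n text characters to keep
    "[a-z0-9 ]*" +                // rest of that text run, discarded
    "(</[a-z0-9]+>|)" +           // $3: optional closing tag
    ".*"                          // everything else, discarded
  );
  return str.replace(regex, "$1$2$3");
}

console.log(truncateWithRegex("hello how are you.how to extract", 10));
// "hello how "
console.log(truncateWithRegex("<b>hello how are you</b><h1>how to extract</h1>", 100));
// "<b>hello how are you</b>"
```

The second call shows a limit the sketch inherits from the pattern: the text run stops at the first character outside `[a-z0-9 ]`, so with a large count it returns only the first element's text, and tags with attributes, uppercase letters or punctuation are not handled.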