【Question title】: Javascript - string.split(regex) keep separators
【Posted】: 2010-11-17 12:14:15
【Question】:

I want to split a string using a regular expression and have the separators / matched text included in the resulting array.

In Java I use:

theString.split("(?<=[!><=}{])|(?=[!><=}{])|(?<= AND )|(?= AND )|(?<= OR )|(?= OR )")

However, JavaScript does not support the lookbehind syntax (?<=...).

For example, I want the string:

"Reason={Existing problem or fault}{Bestaande probleem of vout}{Other}{Ander} and Required!=No and Results >=10 and Results <=25 and Tst>5 and Tst<80 and Info=test this or that and those and Success!=Yes"

to be split into:

Reason,=,{,Existing problem, or ,fault,},{,Bestaande probleem of vout,},{,Other,},{,Ander,}, and ,Required,!,=,No, and ,Results,>,=,10, and ,Results,<,=,25, and ,Tst,>,5, and ,Tst,<,80, and ,Info,=,test this, or ,that, and ,those, and ,Success,!,=,Yes

A sample of what I have:

var thestr = "Reason={Existing problem or fault}{Bestaande probleem of vout}{Other}{Ander} and Required!=No and Results >=10 and Results <=25 and Tst>5 and Tst<80 and Info=test this or that and those and Success!=Yes";

document.write("::SPLIT::<br>");
var patt1=new RegExp(/([!><=}{])|( AND )|( OR ) /gi);

var x = thestr.split(patt1);
// This splits correctly, but doesn't include the separators / matched characters
document.write("length="+x.length+"<br>");
for (c=0;c<x.length;c++) {
    document.write(c+" - "+ x[c]+" |");
}

document.write("<br><br>::MATCH::<br>");

var y = thestr.match(patt1);

// This shows the matched characters, but how do I combine the info from split and match?
document.write("length="+y.length+"<br>");
for (d=0;d<y.length;d++) {
    document.write(d+" - "+ y[d]+" |");
}

document.write("<br><br>::INCLUDE SEPARATORS::<br>");
var patt2=new RegExp(/(?![!><=}{])|(?=[!><=}{])|(?! AND )|(?= AND )|(?! OR )|(?= OR ) /gi);
// This puts everything in the array, but each character is a separate array element.
// Not what I wanted to achieve.
var bits = thestr.split(patt2);
document.write("length="+bits.length+"<br>");
for (r=0;r<bits.length;r++) {
    document.write(r+" - "+ bits[r]+" |");
}

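【Question comments】:

  • So you basically want to split on `or`, `and`, and essentially between any two characters, except between alphanumeric characters or spaces?

Tags: javascript regex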


【Solution 1】:

If you put the whole pattern into a group, you get the separators as well:

thestr.split(/([!><=}{]| (?:AND|OR) )/)

This returns an array like:

["Reason", "=", "", "{", "Existing problem or fault", "}", "", "{", "Bestaande probleem of vout", "}", "", "{", "Other", "}", "", "{", "Ander", "}", " and Required", "!", "", "=", "No and Results ", ">", "", "=", "10 and Results ", "<", "", "=", "25 and Tst", ">", "5 and Tst", "<", "80 and Info", "=", "test this or that and those and Success", "!", "", "=", "Yes"]

Then you only need to filter out the empty strings and you're done:

thestr.split(/([!><=}{]| (?:AND|OR) )/).filter(Boolean)
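
As a minimal sketch of this approach on a shortened string: the sample text and the names `sample` and `tokens` are made up for illustration, and the code assumes an engine whose split() keeps captured groups as the specification describes. Note that the pattern above is case-sensitive, which is why the lowercase " and " / " or " stay inside the text chunks in the array shown earlier; the sketch adds the `i` flag as an assumption about what was intended.

```javascript
// Shortened, hypothetical sample in the same style as the question's string.
var sample = "Tst>=5 and Info=this or that";

// The whole pattern sits in one capture group, so split() keeps each separator.
// The `i` flag is an addition; without it the lowercase " and " / " or " would not match.
var tokens = sample.split(/([!><=}{]| (?:AND|OR) )/i).filter(Boolean);

console.log(tokens);
// ["Tst", ">", "=", "5", " and ", "Info", "=", "this", " or ", "that"]
```

filter(Boolean) removes every empty string, including the ones that appear between adjacent separators such as ">=".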

Edit: Since Internet Explorer, and possibly other browsers, do not carry the grouped separators into the result array, you can do this instead:

var matches = thestr.split(/(?:[!><=}{]| (?:AND|OR) )/),
    separators = thestr.match(/(?:[!><=}{]| (?:AND|OR) )/g);
for (var i=0; i<separators.length; ++i) {
    matches[i+1] = separators[i];
}

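This essentially splits off the separators from the other parts and then recombines the two.

【Comments】:

  • Unfortunately (for all of us), Internet Explorer supports neither keeping captured groups in the result of split() nor the filter() method, so this is not an out-of-the-box cross-browser solution.
  • +1 for your edit. There is also Steven Levithan's cross-browser split function, which conforms more closely to the specification.
【Solution 2】:

Without going too deep into the structure of your query, I suggest using the replace method with a function as the replacement, which collects the terms into an array: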

function parse(sQuery) {
    var aParsed = [];
    var oReTerms = /.../gim; // placeholder pattern: define it so that it matches every term
    sQuery.replace(oReTerms, function($0, $1, $2, ...) {
        //...
        if ($1) {
            aParsed.push($1); // collect the first captured group
        }
        if ($2) {
            aParsed.push($2); // collect the second captured group
        }
        //...
        return $0; // return what was matched (or any string)
    });
    return aParsed;
}

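I have done this before to parse HTML tags and attributes. I hope the idea is clear. You only need to define the regular expression so that it matches all the terms in your query.

For special cases you can run another replace inside the replacement function.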
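
To make the idea concrete, here is a minimal sketch applied to the separators from the question. The pattern, the function name `parseTerms`, and the token shape `{ type, value }` are assumptions chosen for illustration, not the pattern this answer had in mind. Each alternative has its own capture group, and the callback checks which group matched.

```javascript
// Hypothetical example: collect typed tokens via replace() with a callback.
function parseTerms(query) {
    var parsed = [];
    // Group 1: a single operator character.
    // Group 2: the keywords " and " / " or " (with their surrounding spaces).
    // Group 3: the shortest run of other text that ends right before a keyword,
    //          an operator, or the end of the string.
    var reTerms = /([!><=}{])|( (?:and|or) )|([^!><=}{]+?)(?= (?:and|or) |[!><=}{]|$)/gi;
    query.replace(reTerms, function (match, operator, keyword, text) {
        if (operator) {
            parsed.push({ type: "operator", value: operator });
        } else if (keyword) {
            parsed.push({ type: "keyword", value: keyword });
        } else if (text) {
            parsed.push({ type: "text", value: text });
        }
        return match; // the result of replace() is discarded; it only drives the iteration
    });
    return parsed;
}

var terms = parseTerms("Tst>=5 and Info=this or that");
console.log(terms.map(function (t) { return t.type + ":" + t.value; }));
```

Compared with a plain split, the callback sees which alternative matched, so separators and text can be told apart while they are collected.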

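【Comments】:

【Solution 3】:

I'm not sure how JavaScript behaves when the regex passed to split contains capturing groups. I know that in Python the delimiter becomes part of the result if it is enclosed in capturing parentheses.

Try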

result = subject.split(/( or )|( and )|([^\w\s])\b|(?=[^\w\s])/i);

and see what happens.
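
For reference, a small sketch of what a specification-compliant engine does with several capturing groups: split() inserts the value of every group after each match, and a group that did not take part in the match appears as `undefined`. The sample string and the variable names are invented for illustration, and the pattern is a reduced two-group version of the one above.

```javascript
// Two capture groups; only the second one participates in the match here.
var parts = "a and b".split(/( or )|( and )/i);
console.log(parts); // ["a", undefined, " and ", "b"]

// Dropping the undefined slots (and any empty strings) leaves the usable tokens.
var cleaned = parts.filter(function (p) {
    return p !== undefined && p !== "";
});
console.log(cleaned); // ["a", " and ", "b"]
```

With a pattern that has several alternative groups, some such filtering step is therefore needed before the result can be used.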

【Comments】:

【Solution 4】:

function split2(str, re) {
    // re must have the g flag; without it exec() keeps returning the first match
    if (re.global) {
        // Reset to start of string
        re.lastIndex = 0;
    }
    var result = [];
    var match = re.exec(str);
    var lastEnd = 0;
    while (match != null) {
        if (match.index > lastEnd) {
            // Text between the previous separator and this one
            result.push(str.substring(lastEnd, match.index));
        }
        result.push(match[0]);
        lastEnd = match.index + match[0].length;
        match = re.exec(str);
    }
    result.push(str.substring(lastEnd));
    return result;
}

var thestr = "Reason={Existing problem or fault}{Bestaande probleem of vout}{Other}{Ander} and Required!=No and Results >=10 and Results <=25 and Tst>5 and Tst<80 and Info=test this or that and those and Success!=Yes";

var patt = /[!><=}{]| AND | OR /gi;

split2(thestr, patt);

Output:

["Reason", "=", "{", "Existing problem", " or ", "fault", "}", "{",
"Bestaande probleem of vout", "}", "{", "Other", "}", "{", "Ander", "}", " and ",
"Required", "!", "=", "No", " and ", "Results ", ">", "=", "10", " and ",
"Results ", "<", "=", "25", " and ", "Tst", ">", "5", " and ", "Tst", "<", "80",
" and ", "Info", "=", "test this", " or ", "that", " and ", "those", " and ",
"Success", "!", "=", "Yes"]
    

【Comments】:

【Solution 5】:

The split function in Gumbo's answer above is a good idea, but it doesn't work as written. It should be:

function split(str, regex) {
    var matches    = str.split(regex),
        separators = str.match(regex),
        ret        = [ matches[0] ];
    if (!separators) return ret;
    for (var i = 0; i < separators.length; ++i) {
        ret[2 * i + 1] = separators[i];
        ret[2 * i + 2] = matches[i + 1];
    }
    return ret;
}

split('a,b,c', /,/g); // returns ["a", ",", "b", ",", "c"]
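
A sketch of a slightly more defensive variant of the same interleaving idea follows. The name `splitKeepSeparators` is hypothetical, and the code assumes a regex without capturing groups (they would add extra entries to the split() result) and an engine that provides `RegExp.prototype.flags`. It forces the global flag, because match() only returns every separator for a global regex, and it drops the empty strings that adjacent separators leave behind.

```javascript
// Hypothetical wrapper: interleave split() pieces with match() separators.
function splitKeepSeparators(str, regex) {
    // Build a global copy if the caller passed a non-global regex.
    var flags = regex.flags.indexOf("g") === -1 ? regex.flags + "g" : regex.flags;
    var globalRe = new RegExp(regex.source, flags);

    var pieces = str.split(globalRe);
    var separators = str.match(globalRe) || [];

    var result = [pieces[0]];
    for (var i = 0; i < separators.length; i++) {
        result.push(separators[i], pieces[i + 1]);
    }
    // Adjacent separators such as ">=" produce empty pieces; remove them.
    return result.filter(function (s) { return s !== ""; });
}

console.log(splitKeepSeparators("Tst>=5 and Info=this or that", /[!><=}{]| (?:and|or) /i));
// ["Tst", ">", "=", "5", " and ", "Info", "=", "this", " or ", "that"]
```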
      

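【Comments】:

【Solution 6】:

To support most browsers in use, you can match the string instead of splitting it.

This pattern matches either a run of characters other than the separators <>!{}=, or a single one of those separators.

var rx=/([^<>!{}=]+|[<>!{}=])/g;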
        
var str='Reason={Existing problem or fault}{Bestaande probleem of vout}'+
'{Other}{Ander} and Required!=No and Results >=10 and Results <=25 '+
'and Tst>5 and Tst<80 and Info=test this or that and those and Success!=Yes';


str.match(rx).join('\n')

//returned value:
Reason
=
{
Existing problem or fault
}
{
Bestaande probleem of vout
}
{
Other
}
{
Ander
}
 and Required
!
=
No and Results 
>
=
10 and Results 
<
=
25 and Tst
>
5 and Tst
<
80 and Info
=
test this or that and those and Success
!
=
Yes
        

// I concatenated the string and joined the result for readability
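
As the returned value shows, this pattern leaves " and " / " or " inside the text chunks. One possible extension, offered as a sketch and not as part of the original answer, adds the keywords as their own alternative and makes text runs stop in front of them. The name `rxKeywords` and the shortened sample string are hypothetical.

```javascript
// Alternatives, in order: one operator character, a keyword with its spaces,
// or a run of characters that is neither an operator nor the start of a keyword.
var rxKeywords = /[<>!{}=]| (?:and|or) |(?:(?! (?:and|or) )[^<>!{}=])+/gi;

var sample = "Tst>=5 and Info=this or that";
var matched = sample.match(rxKeywords);

console.log(matched.join("\n"));
```

The negative lookahead is checked before every character of a text run, so a run ends as soon as the next characters spell " and " or " or ".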

【Comments】:
