【Question Title】: Split an array of strings into an array of arrays of strings
【Posted】: 2016-04-01 06:19:24
【Question】:

I'm looking for a way to split this array of strings:

["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can", "parse", "this",
"text", "?", "Without", "any", "errors", "!"]

into groups that each end with a punctuation mark:

[
  ["this", "is", "a", "test", "."],
  ["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"],
  ["Without", "any", "errors", "!"]
]

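Is there a simple way to do this? Or is the most sensible approach to iterate over the array, add each element to a temporary array, and append that temporary array to a container array whenever a punctuation mark is found?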

I was thinking of using slice or map, but I can't tell whether that is possible.

【Comments】:

    Tags: arrays ruby string grouping slice


    【Solution 1】:

    Take a look at Enumerable#slice_after:

    x.slice_after { |e| '.?!'.include?(e) }.to_a
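
For comparison, here is a minimal sketch that places the explicit loop described in the question next to the slice_after one-liner. The names TERMINATORS, group_by_loop and group_by_slice_after are hypothetical, introduced only for illustration. The sketch also swaps the substring test '.?!'.include?(e) for an array membership test, on the assumption that only the exact tokens ".", "?" and "!" should end a group (the substring test would also accept "" or "?!"). Enumerable#slice_after requires Ruby 2.2 or later.

```ruby
# Tokens that close a group. Assumption: only these exact strings count.
TERMINATORS = %w[. ? !].freeze

# Hypothetical helper: the explicit loop described in the question.
def group_by_loop(tokens)
  groups = []
  current = []
  tokens.each do |token|
    current << token
    next unless TERMINATORS.include?(token)

    groups << current # a terminator closes the current group
    current = []
  end
  groups << current unless current.empty? # keep a trailing group that has no terminator
  groups
end

# Hypothetical helper: the same grouping via Enumerable#slice_after.
def group_by_slice_after(tokens)
  tokens.slice_after { |token| TERMINATORS.include?(token) }.to_a
end

x = ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can", "parse",
     "this", "text", "?", "Without", "any", "errors", "!"]

# Both helpers return the three groups shown in the question.
p group_by_loop(x) == group_by_slice_after(x)
```

Both versions do the same work in a single pass; slice_after simply hides the bookkeeping of the temporary array.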
    

    【Discussion】:

      【Solution 2】:

      @ndn has given the best answer to this question, but I'll suggest another approach that may be useful for other problems.

      An array like the one you've given is typically obtained by splitting a string on whitespace or punctuation. For example:

      s = "this is a test. I wonder if I can parse this text? Without any errors!"
      s.scan(/\w+|[.?!]/)
        #=> ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can",
        #    "parse", "this", "text", "?", "Without", "any", "errors", "!"] 
      

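      In such cases you may find it more convenient to manipulate the string directly in some other way. Here, for example, you could first use String#split with a regular expression to break the string s into sentences: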

      r1 = /
           (?<=[.?!]) # positive lookbehind: the split point must be preceded by one of the punctuation characters
           \s*        # match >= 0 whitespace characters, so the spaces between sentences are removed
           /x         # extended/free-spacing regex definition mode
      
      a = s.split(r1)
        #=> ["this is a test.", "I wonder if I can parse this text?",
        #    "Without any errors!"] 
      

      Then split each sentence:

      r2 = /
           \s+       # match >= 1 whitespace characters
           |         # or
           (?=[.?!]) # use a positive lookahead to match a zero-width string
                     # followed by one of the punctuation characters
           /x
      
      b = a.map { |sentence| sentence.split(r2) }
        #=> [["this", "is", "a", "test", "."],
        #    ["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"],
        #    ["Without", "any", "errors", "!"]]
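
The two steps can be wrapped into a single method. This is only a sketch: the names SENTENCE_BREAK, TOKEN_BREAK and sentence_tokens are hypothetical, and the regular expressions are the ones above written without free-spacing mode. It assumes each sentence ends with a single ".", "?" or "!".

```ruby
# Split point: right after a terminator, swallowing any whitespace that follows.
SENTENCE_BREAK = /(?<=[.?!])\s*/
# Split point: a run of whitespace, or the zero-width position just before a terminator.
TOKEN_BREAK = /\s+|(?=[.?!])/

# Hypothetical helper: string -> array of sentences -> array of token arrays.
def sentence_tokens(text)
  text.split(SENTENCE_BREAK).map { |sentence| sentence.split(TOKEN_BREAK) }
end

s = "this is a test. I wonder if I can parse this text? Without any errors!"
groups = sentence_tokens(s)

# The lookahead consumes no characters, so each group keeps its terminator as the last token.
p groups.map(&:last) #=> [".", "?", "!"]
```

Because the second regular expression splits at a zero-width position rather than on the punctuation itself, the terminator is preserved as its own element instead of being discarded.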
      

      【Discussion】:

      • Unfortunately, this solution seems to lose the punctuation in the output.