Append strings to array if found in paragraph using `.match` in Ruby

[Question title]: Append strings to array if found in paragraph using `.match` in Ruby
[Posted]: 2016-12-18 22:30:06
[Question]:

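I'm trying to search a paragraph for each word in an array, and then output a new array containing only the words that were found.

So far I haven't been able to get the output in the format I want.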

paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands.
The four largest are Honshu, Hokkaido, Kyushu and Shikoku, which make up about ninety-seven percent of Japan's land area.
The country is divided into 47 prefectures in eight regions."

words_to_find = %w[ Japan archipelago fishing country ]

words_found = []

words_to_find.each do |w|
    paragraph.match(/#{w}/) ? words_found << w : nil
end

puts words_found

The output I currently get is a vertical list of the words, one per line.

Japan
archipelago
country

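But I would like ['Japan', 'archipelago', 'country'].

I don't have much experience matching text in a paragraph and I'm not sure what I'm doing wrong here. Could anyone give me some guidance?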

[Comments]:

words_found is already what you want. puts prints one element per line.
Ah, thanks. I need to read up on puts vs p.
P.S. You can run p words_found to see what it really is.
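
The difference the comments point to can be seen in a short sketch. The array below is a hypothetical stand-in for the question's `words_found`, and `as_text` is a name introduced only for illustration. `puts` writes each element of an array on its own line, while `p` writes the `inspect` form of the whole array.

```ruby
# Hypothetical stand-in for the words_found array from the question.
words_found = %w[Japan archipelago country]

# puts flattens an array and writes one element per line.
puts words_found

# p writes words_found.inspect, the array-literal form, on a single line.
p words_found

# inspect returns that same representation as a String.
as_text = words_found.inspect
puts as_text
```

The array itself is identical in all three cases; only the way it is rendered to standard output changes.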

Tags: arrays ruby regex

[Solution 1]:

Here are a couple of ways to do that. Both are case-insensitive.

Use a regular expression

r = /
    \b                               # Match a word break
    #{ Regexp.union(words_to_find) } # Match any word in words_to_find
    \b                               # Match a word break
    /xi                              # Free-spacing regex definition mode (x)
                                     # and case-indifferent (i)
  #=> /
  #   \b                             # Match a word break
  #   (?-mix:Japan|archipelago|fishing|country) # Match any word in words_to_find
  #   \b                             # Match a word break
  #   /ix                            # Free-spacing regex definition mode (x)
  #                                  # and case-indifferent (i)

paragraph.scan(r).uniq
  #=> ["Japan", "archipelago", "country"]
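
If only membership matters, the same word-boundary idea can be applied one word at a time. The following is a minimal sketch rather than part of the original answer. It assumes Ruby 2.4 or later for `String#match?`, and it repeats a shortened copy of the paragraph so that it runs on its own. `Regexp.escape` is there in case a search word contains regex metacharacters.

```ruby
# Shortened copy of the question's paragraph so the sketch is self-contained.
paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands. " \
            "The country is divided into 47 prefectures in eight regions."
words_to_find = %w[Japan archipelago fishing country]

# Keep each word that occurs in the paragraph as a whole word, ignoring case.
words_found = words_to_find.select do |w|
  paragraph.match?(/\b#{Regexp.escape(w)}\b/i)
end

p words_found
```

With `select`, the result keeps the spelling and order of `words_to_find`, whereas `scan` returns the text as it appears in the paragraph.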

Intersect two arrays

words_to_find_hash = words_to_find.each_with_object({}) { |w,h| h[w.downcase] = w }
  #=> {"japan"=>"Japan", "archipelago"=>"archipelago", "fishing"=>"fishing",
  #    "country"=>"country"}

words_to_find_hash.values_at(*paragraph.delete(".;:,?'").
                               downcase.
                               split.
                               uniq & words_to_find_hash.keys)
  #=> ["Japan", "archipelago", "country"]
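
A hard-coded list of punctuation characters can miss cases. One possible variation, sketched here under the assumption that a word is simply a run of letters, tokenizes the paragraph with `scan` and a POSIX bracket class instead. The `tokens` name and the shortened paragraph are introduced only for this illustration.

```ruby
# Shortened copy of the question's paragraph so the sketch is self-contained.
paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands. " \
            "The country is divided into 47 prefectures in eight regions."
words_to_find = %w[Japan archipelago fishing country]

# Assumption: a word is a maximal run of letters, so digits and
# punctuation act as separators.
tokens = paragraph.downcase.scan(/[[:alpha:]]+/).uniq

# Keep the original spelling of every word whose lowercase form is a token.
words_found = words_to_find.select { |w| tokens.include?(w.downcase) }

p words_found
```

Here the result follows the order of `words_to_find` rather than the order in which the words appear in the paragraph.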

[Comments]:

[Solution 2]:

This happens because you are using puts to print the array: puts appends "\n" to the end of each element ("word"):

#!/usr/bin/env ruby
def run_me
    paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands.
    the four largest are Honshu, Hokkaido, Kyushu and Shikoku, which make up about ninety-seven percent of Japan's land area.
    the country is divided into 47 prefectures in eight regions."

    words_to_find = %w[ Japan archipelago fishing country ]

    find_words_from_a_text_file paragraph, words_to_find
end

# Takes the word list as a plain array. A splat parameter here would wrap
# the array in another array and break the per-word match below.
def find_words_from_a_text_file(paragraph, words_to_find)
    words_found = []

    words_to_find.each do |w|
        paragraph.match(/#{w}/) ? words_found << w : nil
    end

    # Print each element on its own line with each and puts.
    words_found.each { |x| puts "with enum and puts : : #{x}" }

    # Or use print, which does not add a newline by itself.
    print "with print :", words_found, "\n"

    # Or use p, which prints the inspect form of the array.
    p words_found
end

run_me

Output:

za:ruby_dir za$ ./fooscript.rb 
with enum and puts : : Japan
with enum and puts : : archipelago
with enum and puts : : country
with print :["Japan", "archipelago", "country"]
["Japan", "archipelago", "country"]

[Comments]: