Decompose words into letters with Ruby (answers)

【Question title】: Decompose words into letters with Ruby
【Posted】: 2018-03-02 03:42:17
【Question】:

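My language has compound letters made up of several characters, such as "ty" and "ny", and even "tty" and "nny". I want to write a Ruby method (spell) that tokenizes words into letters according to this alphabet: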

abc=[*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map{|z| [z,"c"]},*"eéuioöüóőúűáía".split(//).map{|z| [z,"v"]}].to_h

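The keys of the resulting hash are the letters of the alphabet, including the multi-character ones. The values mark each letter as a consonant ("c") or a vowel ("v"), because I later want to use this hash to split words into syllables. Compound words are of course out of scope for this method. In those words, a multi-character letter can form by accident across the boundary between the component words.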
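The "c"/"v" values could later feed a syllabification step. The sketch below shows one way to do that by mapping an already-tokenized word to its consonant/vowel pattern. The constant `ABC` and the helper `cv_pattern` are hypothetical names introduced for illustration. The hash itself is the one from the question.

```ruby
# The alphabet hash from the question: multi-character letters first,
# then single consonants (all "c"), then vowels (all "v").
ABC = [*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map { |z| [z, "c"] },
       *"eéuioöüóőúűáía".chars.map { |z| [z, "v"] }].to_h

# Hypothetical helper: turn a list of letters into a string of "c"/"v" markers.
# Hash#fetch raises KeyError for a letter that is not in the alphabet.
def cv_pattern(letters, abc = ABC)
  letters.map { |letter| abc.fetch(letter) }.join
end

puts cv_pattern(["cs", "o", "b", "o", "ly", "ó"]) # prints cvcvcv
```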

Examples:

spell("csobolyó") => [ "cs", "o", "b", "o", "ly", "ó" ]
spell("nyirettyű") => [ "ny", "i", "r", "e", "tty", "ű" ]
spell("dzsesszmuzsikus") => [ "dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s" ]

【Question comments】:

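What have you tried so far? This will be quite complicated, so I think you'll have better luck here if you narrow it down to the specific part you need help with. As it stands, there are many edge cases that people who aren't native speakers of your language (and maybe some who are) won't be able to resolve. For example, if I see "dzs" in a string, it could be ["dzs"], ["d", "zs"], ["dz", "s"] or ["d", "z", "s"]. Without a dictionary (or a lot of knowledge of the language), I don't think we can tell which one is correct.
That's why I ordered the letters in the alphabet: if a letter appears earlier, it should be recognized in preference to the simpler letters it contains. When a word contains "dzs", it should be read as "dzs", not as "d" and "zs". In rare cases this gives a wrong result, but most decompositions will be correct. I just don't know how to do this efficiently. Maybe there is some built-in string tokenizer, or something similar.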
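The sketch below makes the ambiguity concrete. It lists every way "dzs" can be split into letters of the alphabet, trying keys in the order the alphabet lists them. The `segmentations` helper and the reduced key list are hypothetical, for illustration only. In this example, the first split produced is the one the ordering rule prefers.

```ruby
# A reduced alphabet: only the letters that can occur inside "dzs",
# listed in the same relative order as in the question's alphabet.
KEYS = %w{dzs zs dz z d s}

# Hypothetical helper: all ways to split `word` into a sequence of KEYS.
def segmentations(word, keys = KEYS)
  return [[]] if word.empty?

  keys.select { |k| word.start_with?(k) }.flat_map do |k|
    segmentations(word[k.length..-1], keys).map { |rest| [k, *rest] }
  end
end

p segmentations("dzs")
# => [["dzs"], ["dz", "s"], ["d", "zs"], ["d", "z", "s"]]
```

Enumerating every split grows quickly with word length. That is one reason a single greedy pass, as in the answers below, is the practical choice.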

Tags: ruby methods tokenize letters alphabet

【Solution 1】:

You could start by looking at String#scan, which seems to give decent results for your examples:

"csobolyó".scan(Regexp.union(abc.keys))
# => ["cs", "o", "b", "o", "ly", "ó"]
"nyirettyű".scan(Regexp.union(abc.keys))
# => ["ny", "i", "r", "e", "tty", "ű"]
"dzsesszmuzsikus".scan(Regexp.union(abc.keys))
# => ["dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s"]

The last case doesn't match your expected output, but it does match your statement in the comments:

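I ordered the letters in the alphabet: if a letter appears earlier, it should be recognized in preference to the simpler letters it contains. When a word contains "dzs", it should be read as "dzs", not as "d" and "zs".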
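The sketch below wraps this approach into the `spell` method from the question. It assumes the regex is built once from the hash's keys and reused. The constant names `ABC` and `LETTER_RE` are hypothetical. One behavior is worth knowing: String#scan only returns matches. Any character that is not a key of the hash (a digit, an uppercase letter, punctuation) is therefore silently dropped rather than reported.

```ruby
ABC = [*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map { |z| [z, "c"] },
       *"eéuioöüóőúűáía".chars.map { |z| [z, "v"] }].to_h

# Built once. The alternatives appear in the hash's insertion order and the
# regex engine tries them left to right, so "dzs" is attempted before "zs", "dz", "d", ...
LETTER_RE = Regexp.union(ABC.keys)

def spell(word, re = LETTER_RE)
  word.scan(re)
end

p spell("nyirettyű")  # => ["ny", "i", "r", "e", "tty", "ű"]

# Characters outside the alphabet are skipped, not reported:
p spell("b2b")        # => ["b", "b"]

# One way to detect that: the tokens must join back into the original word.
word = "b2b"
warn "unknown characters in #{word}" unless spell(word).join == word
```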

【Discussion】:

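In general, Regexp.union is safer than join("|") because it escapes regex metacharacters. In this case it probably doesn't matter, since we're only dealing with word characters.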
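The sketch below shows the difference. Regexp.union escapes regex metacharacters in its string arguments, while a plain join("|") does not. The pattern strings are made up for illustration. The question's alphabet contains only letters, which is why it makes no difference there.

```ruby
parts = ["a.b", "c"]

joined  = Regexp.new(parts.join("|"))  # /a.b|c/  : "." matches any character
unioned = Regexp.union(parts)          # /a\.b|c/ : "." is escaped, matches a literal dot

p "axb".scan(joined)   # => ["axb"]
p "axb".scan(unioned)  # => []
```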
Ah yes, good point. I don't work with dynamic regexes much and completely forgot that union exists. Updated.
Yes, it works as expected. I had typed the wrong result in the example; it's fixed now.
String#scan together with Regexp.union is the winner. When tokenizing, though, the order of the keys matters, because some of the patterns are prefixes of others.
abc.keys returns the keys in insertion order (Ruby hashes have preserved insertion order since 1.9), so the order of the keys as written is respected.
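The sketch below shows why the key order matters. If the same keys are listed shortest-first, the first alternative to match at each position is a single letter, so "dzs" falls apart. Sorting the keys by descending length is one way to make the result independent of how the hash was written. That sorting step is an assumption, not something taken from the answers here. It amounts to a longest-match rule.

```ruby
keys = %w{d z s dz zs dzs}  # deliberately listed shortest-first

short_first   = Regexp.union(keys)
longest_first = Regexp.union(keys.sort_by { |k| -k.length })

p "dzs".scan(short_first)    # => ["d", "z", "s"]
p "dzs".scan(longest_first)  # => ["dzs"]
```

Ties in sort_by do not matter here, because two different keys of the same length cannot both match at the same position.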

【Solution 2】:

Instead of relying on your ordering, I give longer letters (more characters) priority over shorter ones.

def spell word
  abc=[*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map{|z| [z,"c"]},*"eéuioöüóőúűáía".split(//).map{|z| [z,"v"]}].to_h
  current_position = 0
  maximum_current_position = 2
  maximum_possible_position = word.length
  split_word = []
  while current_position < maximum_possible_position do 
    current_word = set_current_word word, current_position, maximum_current_position
    if abc[current_word] != nil
      current_position, maximum_current_position = update_current_position_and_max_current_position current_position, maximum_current_position
      split_word.push(current_word)
    else
      maximum_current_position = update_max_current_position maximum_current_position
      current_word = set_current_word word, current_position, maximum_current_position
      if abc[current_word] != nil
        current_position, maximum_current_position = update_current_position_and_max_current_position current_position, maximum_current_position
        split_word.push(current_word)
      else
        maximum_current_position = update_max_current_position maximum_current_position
        current_word = set_current_word word, current_position, maximum_current_position
        if abc[current_word] != nil
          current_position, maximum_current_position = update_current_position_and_max_current_position current_position, maximum_current_position          
          split_word.push(current_word)
        else
          puts 'This word cannot be formed in the current language'
          break
        end
      end
    end
  end
  split_word
end

def update_max_current_position max_current_position
    max_current_position = max_current_position - 1
end

def update_current_position_and_max_current_position current_position,max_current_position
    current_position = max_current_position + 1
    max_current_position = current_position + 2
    return current_position, max_current_position
end

def set_current_word word, current_position, max_current_position
  word[current_position..max_current_position]
end

puts "csobolyó => #{spell("csobolyó")}"
puts "nyirettyű => #{spell("nyirettyű")}"
puts "dzsesszmuzsikus => #{spell("dzsesszmuzsikus")}"

Output

csobolyó => ["cs", "o", "b", "o", "ly", "ó"]
nyirettyű => ["ny", "i", "r", "e", "tty", "ű"]
dzsesszmuzsikus => ["dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s"]
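The same longest-first preference can be written more compactly. The sketch below assumes the alphabet hash from the question. `spell_longest` and `MAX_LEN` are hypothetical names. Unlike the code above, it raises an error instead of printing a message when no letter matches.

```ruby
ABC = [*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map { |z| [z, "c"] },
       *"eéuioöüóőúűáía".chars.map { |z| [z, "v"] }].to_h

# Length of the longest letter in the alphabet (3 here, e.g. "tty", "dzs").
MAX_LEN = ABC.keys.map(&:length).max

# Hypothetical helper: greedy longest match. At each position, try the
# longest slice first and shrink it until it is a letter of the alphabet.
def spell_longest(word, abc = ABC, max_len = MAX_LEN)
  letters = []
  pos = 0
  while pos < word.length
    longest = [max_len, word.length - pos].min
    len = longest.downto(1).find { |n| abc.key?(word[pos, n]) }
    raise ArgumentError, "no letter of the alphabet at index #{pos} in #{word.inspect}" unless len

    letters << word[pos, len]
    pos += len
  end
  letters
end

p spell_longest("dzsesszmuzsikus")
# => ["dzs", "e", "ssz", "m", "u", "zs", "i", "k", "u", "s"]
```

Since no letter here is longer than three characters, each position costs at most three hash lookups. word[pos, n] counts characters, not bytes, so accented vowels such as "ő" are one position each.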

【Discussion】:

【Solution 3】:

In the meantime I managed to write a method that works, but it is about 5 times slower than String#scan:

abc=[*%w{tty ccs lly ggy ssz nny dzs zzs sz zs cs gy ny dz ty ly q w r t z p l k j h g f d s x c v b n m y}.map{|z| [z,"c"]},*"eéuioöüóőúűáía".split(//).map{|z| [z,"v"]}].to_h

def spell(w,abc)


    s=w.split(//)
    p=""
    t=[]

    for i in 0..s.size-1 do
      p << s[i]
      if i>=s.size-2 then

       if abc[p]!=nil then
          t.push p
          p=""

       elsif abc[p[0..-2]]!=nil then
          t.push p[0..-2]
          p=p[-1]

       elsif abc[p[0]]!=nil then
          t.push p[0]
          p=p[1..-1]

       end 

      elsif p.size==3 then
       if abc[p]!=nil then
          t.push p
          p=""

       elsif abc[p[0..-2]]!=nil then
          t.push p[0..-2]
          p=p[-1]

       elsif abc[p[0]]!=nil then
          t.push p[0]
          p=p[1..-1]
       end
      end
    end

    if p.size>0 then
        if abc[p]!=nil then
          t.push p
          p=""

       elsif abc[p[0..-2]]!=nil then
          t.push p[0..-2]
          p=p[-1]
      end
    end

    if p.size>0 then
      t.push p
    end
    return t
end

【Discussion】: