【Question title】: R counting words in a string and storing in an array
【Posted】: 2019-07-29 12:25:52
【Question】:

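I wrote a function that counts occurrences of given words in a sentence. It is meant to be extensible, since I may want to track counts for more words as the code grows. The problem is getting the function's output into an array. I can generate the counts as an int array, but the result contains only the final entry instead of a matrix.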

library(stringr)  # provides str_count() and regex()

Words = c("Hero","Dragon","Sword","Level up")
game_description = c("The hero slayed the dragon with his mighty sword",
                    "The protagonist received a level up following a fight", 
                    "The dragon lost his temper and started to level up")

WCounts = sapply(Words, function(x) str_count(if(is.atomic(game_description))
             {game_description} else {" "},regex(x,ignore_case=TRUE)))

Output:

WCounts

  [1,]     0
  [2,]     1
  [3,]     0
  [4,]     1

Expected output, in either of these two layouts:

          [1,] [2,] [3,]
  [1,]     1    0    0
  [2,]     1    0    1
  [3,]     1    0    0
  [4,]     0    1    1


          ["Hero"] ["Dragon"] ["Sword"] ["Level up"]
  [1,]        1        1          1          0
  [2,]        0        0          0          1
  [3,]        0        1          0          1

【Comments】:

  • I cannot reproduce the "WCounts" output you show. Running your code gives exactly the expected output listed at the end of the question.

Tags: r stringr


【Solution 1】:

The same code can be used as is, and it returns the expected output:

WCounts = sapply(Words, function(x) str_count(if(is.atomic(game_description))
             {game_description} else {" "},regex(x,ignore_case=TRUE)))

WCounts
#      Hero Dragon Sword Level up
#[1,]    1      1     1        0 
#[2,]    0      0     0        1
#[3,]    0      1     0        1
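The four values shown in the question (0, 1, 0, 1) are exactly the counts for the third description on its own. One possible explanation, and this is only an assumption because the question does not show it, is that game_description held just the last string when sapply ran. The sketch below compares the two cases; subsetting with `[3]` is a hypothetical stand-in for whatever reduced the input.

```r
library(stringr)

Words <- c("Hero", "Dragon", "Sword", "Level up")
game_description <- c("The hero slayed the dragon with his mighty sword",
                      "The protagonist received a level up following a fight",
                      "The dragon lost his temper and started to level up")

# Full vector: sapply binds one length-3 result per word into a 3 x 4 matrix
full <- sapply(Words, function(x)
  str_count(game_description, regex(x, ignore_case = TRUE)))
dim(full)

# Assumed scenario: only the last description reaches str_count(),
# so each word yields a single number and sapply returns a plain vector
last_only <- sapply(Words, function(x)
  str_count(game_description[3], regex(x, ignore_case = TRUE)))
last_only
# counts are 0, 1, 0, 1 for Hero, Dragon, Sword, Level up
```

If the real code fills game_description inside a loop or reads it row by row, checking `length(game_description)` just before the sapply call would confirm or rule this out.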

Or, a bit more compactly:

library(stringr)    # str_extract_all()
library(qdapTools)  # mtabulate()
mtabulate(str_extract_all(game_description, paste0("(?i)", 
         paste(Words, collapse="|"))))

Or with map from purrr:

library(purrr)
library(stringr)
map(Words, ~ str_count(game_description, regex(.x, ignore_case = TRUE))) %>%
          do.call(cbind, .) 

Or we can use a base R approach with table and regmatches/regexpr:

+(table(stack(setNames(lapply(Words, function(x) 
  regmatches(game_description, regexpr(x, game_description, 
      ignore.case = TRUE))), seq_along(Words)))[2:1]) > 0)
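If actual counts (rather than presence) are wanted without any packages, a simpler base R variant is possible. This is a minimal sketch, not the answer author's method: `count_matches` is a hypothetical helper that uses gregexpr to find every match and lengths to count them, and vapply assembles the per-word results into a named matrix.

```r
Words <- c("Hero", "Dragon", "Sword", "Level up")
game_description <- c("The hero slayed the dragon with his mighty sword",
                      "The protagonist received a level up following a fight",
                      "The dragon lost his temper and started to level up")

# Hypothetical helper: number of case-insensitive matches of `pattern`
# in each element of `text` (0 where there is no match)
count_matches <- function(pattern, text) {
  hits <- gregexpr(pattern, text, ignore.case = TRUE)
  lengths(regmatches(text, hits))
}

# One integer vector per word; vapply checks each has the expected length
WCounts <- vapply(Words, count_matches,
                  integer(length(game_description)),
                  text = game_description)
WCounts
#      Hero Dragon Sword Level up
# [1,]    1      1     1        0
# [2,]    0      0     0        1
# [3,]    0      1     0        1
```

vapply is used instead of sapply so that the result is always a matrix of integers, even when game_description has a single element.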

【Comments】:

    【Solution 2】:

    In base R, we can use sapply and grepl over game_description and Words:

    out <- +(sapply(game_description, function(x) 
                    sapply(Words, grepl, x, ignore.case = TRUE)))
    colnames(out) <- NULL
    out
    
    #         [,1] [,2] [,3]
    #Hero        1    0    0
    #Dragon      1    0    1
    #Sword       1    0    0
    #Level up    0    1    1
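Two details of the grepl approach may matter in practice: it reports presence (0/1) rather than a count, and it puts words in rows. The sketch below illustrates both, using a hypothetical second description in which "dragon" appears twice; t() gives the one-row-per-description layout.

```r
Words <- c("Hero", "Dragon", "Sword", "Level up")

# Hypothetical input: the second sentence mentions "dragon" twice
descriptions <- c("The hero slayed the dragon with his mighty sword",
                  "The dragon fought another dragon")

# Presence/absence matrix, words in rows, built as in the answer above
present <- +(sapply(descriptions, function(x)
  sapply(Words, grepl, x, ignore.case = TRUE)))
colnames(present) <- NULL

# grepl only records that a match exists, so this is 1, not 2
present["Dragon", 2]

# A count for comparison, using gregexpr to find every match
n_dragon <- lengths(regmatches(descriptions[2],
                               gregexpr("dragon", descriptions[2],
                                        ignore.case = TRUE)))
n_dragon

# Transpose: one row per description, one column per word
by_description <- t(present)
by_description
```

Which form is preferable depends on whether repeated mentions should be counted or only flagged.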
    

    【Comments】:
