Counting the first letter of each string and showing how many times it appears, but not in alphabetical order, in R

【Question title】: Counting first letter of string and showing how many times it appears, but not in alphabetical order in R
【Posted】: 2020-08-27 13:05:30
【Question】:

I have written the code below to count how many times the first letter of each code appears in a particular column of a table.

#a test data frame    
test <- data.frame("State" = c("PA", "RI", "SC"), "Code1" = c("EFGG, AFGG", "SSAG", "AFGG, SSAG"))

#code to count method codes
test[] <- lapply(test, as.character)


test_counts <- sapply(strsplit(test$Code1, ",\\s+"), function(x) {
  tab <- table(substr(x, 1, 1)) # Create a table of the first letters
  paste0(names(tab), tab, collapse = ", ") # Paste together the letter w/ the number and collapse them
} )

#example of output
[1] "A1, E1" "S1"     "A1, S1"

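Everything about the current code is perfect, except that I would like R not to output the counts in alphabetical order. I want it to preserve the order in which the codes appear. So this is what I would like the output to look like: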

 [1] "E1, A1", "S1", "A1, S1"

Thanks!!

【Comments】:

Tags: r string count strsplit preserve

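【Solution 1】:

Here is a base R option that uses factor to solve the problem. Setting the factor levels to unique(u) keeps the first letters in their order of first appearance, so table no longer sorts them alphabetically.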

sapply(
  strsplit(test$Code1, ", "),
  function(x) {
    toString(
      do.call(
        paste0,
        rev(stack(table(factor(u<-substr(x, 1, 1),levels = unique(u)))))
      )
    )
  }
)

which gives

[1] "E1, A1" "S1"     "A1, S1"
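
For readers who find the one-liner above dense, the same factor idea can be spelled out step by step. This is only a sketch: the helper name `count_first_letters` is made up for illustration, and the test data frame is rebuilt with stringsAsFactors = FALSE so the block runs on its own.

```r
# Test data from the question, rebuilt so this block is self-contained
test <- data.frame(
  State = c("PA", "RI", "SC"),
  Code1 = c("EFGG, AFGG", "SSAG", "AFGG, SSAG"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count first letters, keeping order of first appearance
count_first_letters <- function(codes) {
  first <- substr(codes, 1, 1)
  # table() orders its result by factor levels; levels = unique(first)
  # fixes those levels to first-appearance order instead of sorted order
  tab <- table(factor(first, levels = unique(first)))
  paste0(names(tab), tab, collapse = ", ")
}

test_counts <- vapply(
  strsplit(test$Code1, ",\\s*"),
  count_first_letters,
  character(1)
)
test_counts
#[1] "E1, A1" "S1"     "A1, S1"
```

Using vapply with character(1) rather than sapply is a design choice here: it guarantees a character vector with one element per row.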

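【Comments】:

【Solution 2】:

Another option, using the tidyverse. We can split 'Code1' with separate_rows, get the count, and then, after arrange on the frequency column, paste the letters and counts together within each group_by group. Note that count sorts its result by the grouping columns, so tied counts come out alphabetically; as the output below shows, the first element is "A1, E1" rather than "E1, A1".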

library(dplyr)
library(tidyr)
test %>% 
    separate_rows(Code1) %>%
    mutate(Code1 = substr(Code1, 1, 1)) %>%
    count(State, Code1) %>% 
    arrange(State, n) %>% 
    unite(Code1, Code1, n, sep="") %>% 
    group_by(State) %>% 
    summarise(Code1 = toString(Code1), .groups = 'drop') %>% 
    pull(Code1)
#[1] "A1, E1" "S1"     "A1, S1"
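
The output above still lists the first row as "A1, E1". One possible tidyverse variant that keeps first-appearance order is sketched below. It assumes dplyr and tidyr are installed; the column names `id` and `letter` and the result name `ordered_counts` are made up for illustration. The idea is to avoid count, which sorts by group, and use add_count followed by distinct, both of which leave row order untouched.

```r
library(dplyr)
library(tidyr)

# Test data from the question, rebuilt so this block is self-contained
test <- data.frame(
  State = c("PA", "RI", "SC"),
  Code1 = c("EFGG, AFGG", "SSAG", "AFGG, SSAG"),
  stringsAsFactors = FALSE
)

ordered_counts <- test %>%
  mutate(id = row_number()) %>%                 # remember the original row
  separate_rows(Code1, sep = ",\\s*") %>%       # one code per row
  mutate(letter = substr(Code1, 1, 1)) %>%
  add_count(id, letter) %>%                     # adds column n, keeps row order
  distinct(id, letter, n) %>%                   # first occurrence of each letter
  group_by(id) %>%
  summarise(Code1 = toString(paste0(letter, n)), .groups = "drop") %>%
  pull(Code1)

ordered_counts
```

Grouping by the row index `id` rather than by State also keeps the result aligned with the input rows if State values repeat or are not sorted.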

【Comments】: