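【Question title】: Counting strings in R
【Posted】: 2021-09-18 03:22:25
【Question description】:

I have a dataset like the one below. I want to group by journal and then count how many times each string occurs. Many thanks.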

SO = c("Journal Of Business", "Journal Of Business", "Journal of Economy")

AU_UN = c("Dartmouth Coll;Wellesley Coll;Wellesley Coll",
          "Georgetown Univ;Fed Reserve Syst",
          "Georgetown Univ;Fed Reserve Syst")

df <- data.frame(SO, AU_UN);df

Expected answer

Journal Of Business      Dartmouth Coll (1);Wellesley Coll (2);  Georgetown Univ (1);Fed Reserve Syst (1)
Journal of Economy       Georgetown Univ (1); Fed Reserve Syst (1)

【Question comments】:

    Tags: r tidyverse tidyr data-manipulation stringr


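    【Solution 1】:

    With base::strsplit() we can extract the "substrings". strsplit() returns a list holding one character vector per row. The new list-column (nested column) can then be unnested with tidyr::unnest(). To get the frequency of each string within each journal, we use dplyr::count().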

    library(tidyverse)
    df %>% 
      mutate(strings  = strsplit(AU_UN, ";")) %>% 
      unnest(strings) %>% 
      count(SO, strings)
    #> # A tibble: 6 x 3
    #>   SO                  strings              n
    #>   <chr>               <chr>            <int>
    #> 1 Journal Of Business Dartmouth Coll       1
    #> 2 Journal Of Business Fed Reserve Syst     1
    #> 3 Journal Of Business Georgetown Univ      1
    #> 4 Journal Of Business Wellesley Coll       2
    #> 5 Journal of Economy  Fed Reserve Syst     1
    #> 6 Journal of Economy  Georgetown Univ      1
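
To make the first step more concrete, here is a minimal base-R sketch of what strsplit() hands back before unnesting. The variable name `parts` is only an illustration and does not appear in the answer above; `fixed = TRUE` is an optional choice that treats ";" as a literal string rather than a regular expression.

```r
# Same AU_UN vector as in the question
AU_UN <- c("Dartmouth Coll;Wellesley Coll;Wellesley Coll",
           "Georgetown Univ;Fed Reserve Syst",
           "Georgetown Univ;Fed Reserve Syst")

# strsplit() returns a list with one character vector per input string
parts <- strsplit(AU_UN, ";", fixed = TRUE)
str(parts)

# lengths() reports how many substrings each row produced
lengths(parts)
#> [1] 3 2 2
```

Each list element becomes several rows once unnest() is applied, which is why three input rows turn into seven rows before counting.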
    

    【Comments】:

      【Solution 2】:

      Use separate_rows() to convert to long format, count the rows, and use summarize() to collapse back to one row per journal.

      library(dplyr)
      library(tidyr)
      
      df %>% 
        separate_rows(AU_UN, sep = ";") %>% 
        count(SO, AU_UN) %>% 
        group_by(SO) %>% 
        summarize(AU_UN = paste(sprintf("%s (%d)", AU_UN, n), collapse=";"), .groups = "drop")
      

      giving:

      # A tibble: 2 x 2
        SO                  AU_UN                                                                         
        <chr>               <chr>                                                                         
      1 Journal Of Business Dartmouth Coll (1);Fed Reserve Syst (1);Georgetown Univ (1);Wellesley Coll (2)
      2 Journal of Economy  Fed Reserve Syst (1);Georgetown Univ (1)                 
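
Note that count() sorts the strings alphabetically, whereas the expected answer in the question lists them in order of first appearance. If that order matters, one possible base-R sketch is shown below. The helper `count_in_order` is a hypothetical name introduced here for illustration, not part of either answer, and the approach (a factor whose levels follow first appearance) is just one way to do it.

```r
# Same data as in the question
SO <- c("Journal Of Business", "Journal Of Business", "Journal of Economy")
AU_UN <- c("Dartmouth Coll;Wellesley Coll;Wellesley Coll",
           "Georgetown Univ;Fed Reserve Syst",
           "Georgetown Univ;Fed Reserve Syst")
df <- data.frame(SO, AU_UN)

# Hypothetical helper: count the substrings of one journal,
# keeping them in order of first appearance
count_in_order <- function(x) {
  parts <- unlist(strsplit(as.character(x), ";", fixed = TRUE))
  # Factor levels follow first appearance, so table() keeps that order
  counts <- table(factor(parts, levels = unique(parts)))
  paste(sprintf("%s (%d)", names(counts), as.integer(counts)), collapse = ";")
}

# One collapsed string per journal
res <- tapply(df$AU_UN, df$SO, count_in_order)
out <- data.frame(SO = names(res), AU_UN = as.vector(res))
out
```

For "Journal Of Business" this yields `Dartmouth Coll (1);Wellesley Coll (2);Georgetown Univ (1);Fed Reserve Syst (1)`, matching the order in the expected answer.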
      

      【Comments】:
