【Question title】:Cross Table / Tabular in R with dplyr
【Posted】:2021-03-16 01:29:57
【Question description】:

My data

Data  FactorA   FactorB  FactorC
D1    Yes       Yes      No
D2    No        No       Yes
D1    Weak No   No       No
D2    No        Yes      No
D1    Weak Yes  No       No
D2    No        No       No
D1    No        No       Yes
D2    Weak Yes  No       No
D1    Weak No   No       Yes
D2    No        No       No

What I want

I would like a table like this:

FactorA   FactorB  FactorC
No              1        2
Weak No         0        1
Weak Yes        0        0
Yes             1        0

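That is, count the pairwise co-occurrences of each level of FactorA with a "Yes" in FactorB and FactorC. Ideally in one go, both overall and grouped by Data.

What I have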

df %>% 
    group_by(Dataset) %>%
    group_by(FactorA, FactorB) %>% 
    summarise(num = n()) %>%
    spread(FactorB, num)

which returns

# A tibble: 4 x 3
# Groups:   FactorA [4]
  FactorA       No   Yes
  <fct>      <int> <int>
1 No          1092    36
2 Weakly No    684    NA
3 Weakly Yes  2388    60
4 Yes         9660   216

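(The numbers in the output come from the real data, not the toy data.)

Question

Is there a neat dplyr-style way to get the table I want for several factors at once, such that I can simply split it by Data later?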

【Comments】:

  • Could you dput some sample data so that it is easier for others to help?

Tags: r dplyr


【Solution 1】:

group_by FactorA and count the Yes values in the FactorB and FactorC columns.

library(dplyr)

df %>%
  group_by(FactorA) %>%
  summarise(across(FactorB:FactorC, ~sum(. == 'Yes')))

#  FactorA  FactorB FactorC
#* <chr>      <int>   <int>
#1 No             1       2
#2 Weak No        0       1
#3 Weak Yes       0       0
#4 Yes            1       0

【Comments】:

  • I think across(.cols = contains("Factor"), ~sum(. == 'Yes')) might be better, in case there are more factors than FactorB and FactorC
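The question also asks for the counts split by Data, which the answer above does not show. A minimal sketch of one way to get both, assuming the toy data from the question (rebuilt here with data.frame() so the block runs on its own) and dplyr 1.0.0 or later for across(). The names overall and by_data are only illustrative.

```r
library(dplyr)

# Toy data from the question (same values as the dput() output in the answers)
df <- data.frame(
  Data    = rep(c("D1", "D2"), 5),
  FactorA = c("Yes", "No", "Weak No", "No", "Weak Yes",
              "No", "No", "Weak Yes", "Weak No", "No"),
  FactorB = c("Yes", "No", "No", "Yes", "No", "No", "No", "No", "No", "No"),
  FactorC = c("No", "Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No")
)

# Overall counts. across() skips grouping columns, so FactorA is not
# summed even though its name contains "Factor".
overall <- df %>%
  group_by(FactorA) %>%
  summarise(across(contains("Factor"), ~ sum(.x == "Yes")))

# The same counts, split by Data
by_data <- df %>%
  group_by(Data, FactorA) %>%
  summarise(across(contains("Factor"), ~ sum(.x == "Yes")), .groups = "drop")
```

Note that by_data only contains the Data/FactorA combinations that actually occur in the data; combinations with no rows (for example D2 with Yes) are simply absent rather than reported as 0.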
【Solution 2】:

Base R

df <- structure(list(
  Data = c("D1", "D2", "D1", "D2", "D1", "D2", "D1", "D2", "D1", "D2"),
  FactorA = c("Yes", "No", "Weak No", "No", "Weak Yes",
              "No", "No", "Weak Yes", "Weak No", "No"),
  FactorB = c("Yes", "No", "No", "Yes", "No", "No", "No", "No", "No", "No"),
  FactorC = c("No", "Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No")
), class = "data.frame", row.names = c(NA, -10L))



sapply(df[3:4], function(x) table(df$FactorA, x)[, 2])
#>          FactorB FactorC
#> No             1       2
#> Weak No        0       1
#> Weak Yes       0       0
#> Yes            1       0
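One caveat with `[, 2]`: it picks the "Yes" column by position, so it relies on every factor column containing both "No" and "Yes". A hedged sketch of a variant that fixes the levels and selects "Yes" by name. The helper count_yes and the extra column FactorD (all "No") are hypothetical and only there to illustrate the edge case.

```r
# Toy data from the question
df <- data.frame(
  Data    = rep(c("D1", "D2"), 5),
  FactorA = c("Yes", "No", "Weak No", "No", "Weak Yes",
              "No", "No", "Weak Yes", "Weak No", "No"),
  FactorB = c("Yes", "No", "No", "Yes", "No", "No", "No", "No", "No", "No"),
  FactorC = c("No", "Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No")
)

# Hypothetical extra column with no "Yes" at all
df$FactorD <- "No"

# Hypothetical helper: cross-tabulate `by` against x with fixed levels,
# then take the "Yes" column by name instead of by position
count_yes <- function(x, by) {
  table(by, factor(x, levels = c("No", "Yes")))[, "Yes"]
}

res <- sapply(df[3:5], count_yes, by = df$FactorA)
```

Here res is a matrix with one row per FactorA level and one column per factor, and FactorD comes out as all zeros rather than raising a subscript error.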

Created on 2021-03-16 by the reprex package (v1.0.0)

【Comments】:

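    【Solution 3】:

    1) Base R Compare every column except the first two with "Yes", then sum the results grouped by the second column. This gives the following one-liner. No packages are used. When I benchmarked it, it ran nearly 3 times faster than the dplyr solution.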

    # column 1 is Data, column 2 is FactorA, rest are other factors
    aggregate(DF[-(1:2)] == "Yes", DF[2], sum)
    

    giving:

       FactorA FactorB FactorC
    1       No       1       2
    2  Weak No       0       1
    3 Weak Yes       0       0
    4      Yes       1       0
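The same one-liner can probably be extended to the "grouped by Data" part of the question by passing both of the first two columns as grouping variables. A minimal sketch, assuming the DF from the Note below (rebuilt here so the block is self-contained); the names overall and by_data are only illustrative.

```r
# Toy data; column 1 is Data, column 2 is FactorA, the rest are other factors
DF <- data.frame(
  Data    = rep(c("D1", "D2"), 5),
  FactorA = c("Yes", "No", "Weak No", "No", "Weak Yes",
              "No", "No", "Weak Yes", "Weak No", "No"),
  FactorB = c("Yes", "No", "No", "Yes", "No", "No", "No", "No", "No", "No"),
  FactorC = c("No", "Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No")
)

# Overall counts, as in the one-liner above
overall <- aggregate(DF[-(1:2)] == "Yes", DF[2], sum)

# Counts grouped by both Data and FactorA
by_data <- aggregate(DF[-(1:2)] == "Yes", DF[1:2], sum)
```

As with the dplyr version, aggregate only returns the Data/FactorA combinations that occur in the data, so by_data has fewer rows than the full 2 x 4 grid.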
    

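    2) collapse A similar approach that may run even faster is to use collap from the collapse package. When I benchmarked it, it ran 19 times faster than the dplyr solution.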

    library(collapse)
    
    collap(+(slt(DF, FactorB:FactorC) == "Yes"), DF$FactorA, fsum)
    

    giving:

       FactorA FactorB FactorC
    1       No       1       2
    2  Weak No       0       1
    3 Weak Yes       0       0
    4      Yes       1       0
    

    Note

    The input in reproducible form:

    DF <- structure(list(Data = c("D1", "D2", "D1", "D2", "D1", "D2", "D1", 
    "D2", "D1", "D2"), FactorA = c("Yes", "No", "Weak No", "No", 
    "Weak Yes", "No", "No", "Weak Yes", "Weak No", "No"), FactorB = c("Yes", 
    "No", "No", "Yes", "No", "No", "No", "No", "No", "No"), FactorC = c("No", 
    "Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No")), class = "data.frame", row.names = c(NA, -10L))
    

    【Comments】:
