【Question title】: How to count instances in a table in R
【Posted】: 2020-04-01 11:10:38
【Question description】:

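I am trying to generate a table that contains a count of each instance in which a variable occurs in a data frame, grouped by the variable in one column.

My table looks like this: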

Infected  Education age    sex    race     Score
       0      missing   35   Female   missing   1371.07
       1      Higher    39   Female   Black     1466.49
       0      Higher    27   Female   Asian     8020.09
       1      A-level   36   Female   Black     398.67
       1      GCSE      32   Male     Other     1312.80

This is the code used to generate it:

 df<-  structure(list(Infected = structure(c(1L, 2L, 1L, 2L, 2L), .Label = c("0", 
    "1"), class = "factor"), Education = structure(c(1L, 4L, 4L, 
    2L, 3L), .Label = c("missing", "A-level", "GCSE", "Higher"), class = "factor"), 
        age = c(35L, 39L, 27L, 36L, 32L), sex = structure(c(3L, 3L, 
        3L, 3L, 2L), .Label = c("Missing_Other", "Male", "Female"
        ), class = "factor"), race = structure(c(1L, 3L, 2L, 3L, 
        4L), .Label = c("missing", "Asian", "Black", "Other", "White"
        ), class = "factor"), Score = c(1371.06994628906, 1466.48999023438, 
        8020.08984375, 398.670013427734, 1312.80004882812)), class = "data.frame", row.names = c(221L, 
    261L, 444L, 561L, 702L))

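I have tried using the dplyr package to count and group the instances, but I am new to R, so I am afraid my code does not give the result I want.

Here is the code I have already tried; I am not sure how to change it to produce the result I want: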

table <-df %>% group_by(Infection) %>% count(sex,Education,age,race,Score)

The output I want looks like this:

                 Infection_1     Infection_0    Infection_All
**ALLSex**                 
Male                 1(0%)         0(0%)            1(20%)
Female               2(40%)         2(40%)           4(80%)
**Education**
Missing              0(0%)          1(20%)           1(20%)
Higher               1(20%)         1(20%)           2(40%)
Alevel               1(20%)         0(0%)            2(20%)
GCSE                 1(20%)         0(0%)            1(20%)
**Race**
Black                2(40%)         0(0%)            2(40%)
Asian                1(20%)         0(0%)            1(20%)
Other                0(0%)          1(20%)           1(20%)
White                0(0%)          0(0%)            0(0%)
Other                1(20%)         0(0%)            1(20%)

【Comments】:

  • Thank you for providing a reproducible example

Tags: r dplyr


【Solution 1】:

You need a few dplyr steps (plus gather and spread from tidyr) to get the desired table. Here is how to get the counts in a tibble.

library(dplyr)
library(tidyr)  # gather() and spread() come from tidyr

df %>% 
  select(-Score, -age) %>%
  gather(key="Category", value="Level", -Infected) %>%
  mutate(Infected = paste("Infected", Infected, sep="_")) %>%
  group_by(Category, Level, Infected) %>%
  count() %>%
  spread(Infected, n, fill = 0) %>%
  mutate(Infected_all = Infected_0 + Infected_1)
# A tibble: 10 x 5
# Groups:   Category, Level [10]
   Category  Level   Infected_0 Infected_1 Infected_all
   <chr>     <chr>        <dbl>      <dbl>        <dbl>
 1 Education A-level          0          1            1
 2 Education GCSE             0          1            1
 3 Education Higher           1          1            2
 4 Education missing          1          0            1
 5 race      Asian            1          0            1
 6 race      Black            0          2            2
 7 race      missing          1          0            1
 8 race      Other            0          1            1
 9 sex       Female           2          2            4
10 sex       Male             0          1            1
Warning message:
attributes are not identical across measure variables;
they will be dropped  
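The warning at the end of the output comes from gather: Education, sex and race are factors with different level sets, so gather drops the factor attributes and returns Level as character. The counts are not affected. As a minimal sketch, assuming a small stand-in data frame d (hypothetical, not the question's df), converting the factor columns to character first avoids the warning:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in: two factor columns whose level sets differ,
# as in the question's df
d <- data.frame(
  Infected = c("0", "1"),
  sex      = factor(c("Female", "Male")),
  race     = factor(c("Black", "Asian")),
  stringsAsFactors = FALSE
)

long <- d %>%
  mutate_if(is.factor, as.character) %>%   # plain character columns stack cleanly
  gather(key = "Category", value = "Level", -Infected)
```

mutate_if is used here only because it also works on older dplyr versions; on recent versions mutate(across(where(is.factor), as.character)) is the equivalent spelling.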

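Here are the steps explained.

Use select to drop the unnecessary columns, then use gather to pivot the remaining columns into rows and set the names of the resulting columns.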

> df %>% 
+   select(-Score, -age) %>%
+   gather(key="Category", value="Level", -Infected)
   Infected  Category   Level
1         0 Education missing
2         1 Education  Higher
3         0 Education  Higher
4         1 Education A-level
5         1 Education    GCSE
6         0       sex  Female
7         1       sex  Female
8         0       sex  Female
9         1       sex  Female
10        1       sex    Male
11        0      race missing
12        1      race   Black
13        0      race   Asian
14        1      race   Black
15        1      race   Other

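Use mutate to replace the values of the Infected column with its name plus its value. These will be used as column names later. Then do the counting you already know.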

> df %>% 
+   select(-Score, -age) %>%
+   gather(key="Category", value="Level", -Infected) %>%
+   mutate(Infected = paste("Infected", Infected, sep="_")) %>%
+   group_by(Category, Level, Infected) %>%
+   count()
# A tibble: 12 x 4
# Groups:   Category, Level, Infected [12]
   Category  Level   Infected       n
   <chr>     <chr>   <chr>      <int>
 1 Education A-level Infected_1     1
 2 Education GCSE    Infected_1     1
 3 Education Higher  Infected_0     1
 4 Education Higher  Infected_1     1
 5 Education missing Infected_0     1
 6 race      Asian   Infected_0     1
 7 race      Black   Infected_1     2
 8 race      missing Infected_0     1
 9 race      Other   Infected_1     1
10 sex       Female  Infected_0     2
11 sex       Female  Infected_1     2
12 sex       Male    Infected_1     1
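As a side note, the group_by followed by count() can probably be shortened: count() accepts the grouping columns directly and then returns an ungrouped result. A small sketch, assuming a hand-made long-format data frame (the name long and its rows are illustrative, taken from the sex rows above):

```r
library(dplyr)

# Illustrative long-format data: the five "sex" rows from the gathered output
long <- data.frame(
  Category = rep("sex", 5),
  Level    = c("Female", "Female", "Female", "Female", "Male"),
  Infected = c("Infected_0", "Infected_1", "Infected_0", "Infected_1", "Infected_1"),
  stringsAsFactors = FALSE
)

# The form used in the answer: result stays grouped by all three columns
grouped <- long %>%
  group_by(Category, Level, Infected) %>%
  count()

# Shorthand: same counts, but the result is not grouped
direct <- long %>%
  count(Category, Level, Infected)
```

Both give the same n column; the difference only matters if later steps depend on the grouping.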

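Use the spread function to turn the rows back into columns, then use mutate to add the Infected_all column.

You can then use other packages such as xtable to format the output.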
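With a current tidyr, the same pipeline can also be written with pivot_longer and pivot_wider in place of gather and spread. The following is a sketch rather than a drop-in replacement: it assumes a stand-in df with plain character columns (rebuilt by hand from the question's table, Score rounded) so that the example is self-contained.

```r
library(dplyr)
library(tidyr)

# Stand-in for the question's df, using character columns instead of factors
df <- data.frame(
  Infected  = c("0", "1", "0", "1", "1"),
  Education = c("missing", "Higher", "Higher", "A-level", "GCSE"),
  age       = c(35L, 39L, 27L, 36L, 32L),
  sex       = c("Female", "Female", "Female", "Female", "Male"),
  race      = c("missing", "Black", "Asian", "Black", "Other"),
  Score     = c(1371.07, 1466.49, 8020.09, 398.67, 1312.80),
  stringsAsFactors = FALSE
)

counts <- df %>%
  select(-Score, -age) %>%
  # stack Education, sex and race into Category/Level pairs
  pivot_longer(-Infected, names_to = "Category", values_to = "Level") %>%
  mutate(Infected = paste("Infected", Infected, sep = "_")) %>%
  count(Category, Level, Infected) %>%
  # one column per Infected value; absent combinations become 0
  pivot_wider(names_from = Infected, values_from = n,
              values_fill = list(n = 0L)) %>%
  mutate(Infected_all = Infected_0 + Infected_1)
```

The order of the Infected_0 and Infected_1 columns may differ from the spread version, since pivot_wider creates columns in order of first appearance by default.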

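【Comments】:

  • Nice answer! FYI: gather/spread are being retired and will at some point be replaced by pivot_longer/pivot_wider.
  • @Emer Thank you very much for your excellent and detailed answer. If I wanted to add a percentage to each cell, could I do that with dplyr? I tried adding %>% summarise(paste0((n/nrow(df))*100, '%')) to the end, which returned an error.
  • @H.B For the percentage, add %>% mutate(Infected_all_pct = paste0(Infected_all / nrow(df) * 100, '%'))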
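
To get a count and a percentage in every cell, in the n(pct%) style of the question's desired output, one option is to apply the same paste0 idea to all three count columns at once. This is only a sketch: it assumes dplyr 1.0.0 or later for across(), hard-codes n_total = 5 in place of nrow(df), and uses a hand-typed excerpt of the counts table (the names counts, n_total and formatted are illustrative).

```r
library(dplyr)

# Hand-typed excerpt of the counts table from the answer
counts <- data.frame(
  Category     = c("race", "race", "sex", "sex"),
  Level        = c("Asian", "Black", "Female", "Male"),
  Infected_0   = c(1, 0, 2, 0),
  Infected_1   = c(0, 2, 2, 1),
  Infected_all = c(1, 2, 4, 1),
  stringsAsFactors = FALSE
)

n_total <- 5  # assumption: stands in for nrow(df) from the question

# Rewrite each count column as "n(pct%)", with pct relative to all rows
formatted <- counts %>%
  mutate(across(starts_with("Infected"),
                ~ paste0(.x, "(", round(.x / n_total * 100), "%)")))
```

The columns become character after this step, so it is best done last, just before handing the table to a formatting package.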