【问题标题】:Pivot, count and group by a certain variable in df with categorical observations使用分类观察按 df 中的某个变量进行透视、计数和分组
【发布时间】:2021-01-02 14:44:16
【问题描述】:

我有这个带有可重现数据的 df:

structure(list(`Loperamida en diarrea` = structure(c(1L, 1L, 
1L, 2L, 4L, 3L, 4L, 1L, 2L, 1L), .Label = c("muy efectiva", "algo efectiva", 
"no efectiva", "no se"), class = c("ordered", "factor")), `Carbón en diarrea` = structure(c(2L, 
2L, 2L, 4L, 4L, 3L, 4L, 3L, 3L, 4L), .Label = c("muy efectiva", 
"algo efectiva", "no efectiva", "no se"), class = c("ordered", 
"factor")), `Bismuto en diarrea` = structure(c(2L, 1L, 2L, 4L, 
3L, 3L, 2L, 2L, 2L, 1L), .Label = c("muy efectiva", "algo efectiva", 
"no efectiva", "no se"), class = c("ordered", "factor")), `Rifaximina en diarrea` = structure(c(2L, 
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L), .Label = c("muy efectiva", 
"algo efectiva", "no efectiva", "no se"), class = c("ordered", 
"factor")), `Otros antibióticos en diarrea` = structure(c(2L, 
1L, 2L, 2L, 2L, 1L, 1L, 3L, 3L, 4L), .Label = c("muy efectiva", 
"algo efectiva", "no efectiva", "no se"), class = c("ordered", 
"factor")), `Probióticos en diarrea` = structure(c(2L, 2L, 2L, 
2L, 2L, 1L, 2L, 3L, 3L, 2L), .Label = c("muy efectiva", "algo efectiva", 
"no efectiva", "no se"), class = c("ordered", "factor")), `Orientación dicotómica` = c("Neurogastro", 
"Neurogastro", "Neurogastro", "No neurogastro", "Neurogastro", 
"Neurogastro", "Neurogastro", "No neurogastro", "No neurogastro", 
"No neurogastro")), row.names = c(NA, 10L), class = "data.frame")

我创建了一个数据框,通过使用以下代码旋转 df 来计算分类观察:

library(tidyverse)
library(janitor)

df %>%
         pivot_longer(cols = everything()) %>%
  count(name, value) %>%
  pivot_wider(names_from = value, values_from = n, values_fill = 0) %>%
        mutate("efectiva" = `algo efectiva` + `muy efectiva`) %>%
                arrange(desc(`efectiva`)) %>%
        select(c(`name`,`efectiva`, `no efectiva`)) %>%
          adorn_percentages("row") %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns()

df$name <- str_remove(df$name, " en diarrea")

结果如下所示:

                name    efectiva no efectiva
             Rifaximina 100.0% (10)    0.0% (0)
            Probióticos  80.0%  (8)   20.0% (2)
                Bismuto  77.8%  (7)   22.2% (2)
             Loperamida  87.5%  (7)   12.5% (1)
     Otros antibióticos  77.8%  (7)   22.2% (2)
                 Carbón  50.0%  (3)   50.0% (3)
 Orientación dicotómica      -  (0)       - (0)

我一直在尝试通过变量 Orientación dicotómica(Neurogastro 与 No Neurogastro) 来分隔列,但我无法对其进行排序。我期望的是这样的:


                             Neurogastro                 No neurogastro 
                 name   efectiva    no efectiva     efectiva    no efectiva
             Rifaximina 98.1% (52)   1.9%  (1)      96.4% (240)  3.6%   (9)
                  Dieta 98.1% (51)   1.9%  (1)      91.6% (229)  8.4%  (21)
            Trimebutina 96.0% (48)   4.0%  (2)      86.3% (214) 13.7%  (34)
          Amitriptilina 97.8% (45)   2.2%  (1)      88.8% (214) 11.2%  (27)
 Trimebutina/simeticona 88.2% (45)  11.8%  (6)      84.0% (205) 16.0%  (39)
       Antiespasmódicos 93.6% (44)   6.4%  (3)      81.4% (184) 18.6%  (42)

有什么建议吗?

【问题讨论】:

  • 什么是df_count?请为adorn_percentagesadorn_pct_formattingadorn_ns 指定库?
  • 您的df 包含 10 行并且示例输出显示超过 10 行?你在这里错过了什么???洛哌丁胺没有行吗?
  • 抱歉耽搁了@AnilGoyal。我ve seen Id 忘记了库,但您已经编辑了问题。装饰函数在 janitor 包中,pivot 在 tidyverse 中。
  • 我在 dput() 中使用了 df 的缩短版本,因此它与我发布的不匹配。我已经编辑了输出,使其与数据匹配。最后一个输出只是一个例子,不要管值。

标签: r dataframe join merge janitor


【解决方案1】:

这不是我想要的,但它非常接近:

## fist I create 2 df's with filter:
library(dplyr)
df1 <- df %>% filter(`Orientación dicotómica` == "Neurogastro")
df2 <- df %>% filter(`Orientación dicotómica` != "Neurogastro")

## then I bind the 2 df's
df3 <- cbind(df1,df2) 

## finally I drop the repeated column and create a new df with renamed columns
df4 <- as.data.frame(as.matrix(df3[-4]) %>%
  list(name = df_tto$name, Neurogastro = df3[, c(2,3)], No_neurogastro = df3[,c(5,6)])) 

df4 <- df4[-(16),-(2:6)]

这可能有一个更好的答案和一个简化的代码,但这就是我所能想到的,而且无论如何,它几乎可以完成这项工作......

【讨论】:

  • (上面的代码是完整的数据集。文章开头的可重现数据是我正在使用的数据的缩写)
【解决方案2】:

编辑虽然有点延迟

library(janitor)
library(tidyverse)

df %>%
  pivot_longer(cols = 1:6) %>%
  count(`Orientación dicotómica`, name, value) %>%
  pivot_wider(id_cols = c(`Orientación dicotómica`, name), names_from = value, 
              values_from = n, values_fill = 0, values_fn = sum) %>%
  mutate("efectiva" = `algo efectiva` + `muy efectiva`) %>%
  select(c(`Orientación dicotómica`,`name`,`efectiva`, `no efectiva`)) %>%
  adorn_percentages("row") %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns() -> out

merge(out %>% filter(`Orientación dicotómica` == 'Neurogastro') %>% select(name, 
                                                                           `Neurogastro efectiva` = efectiva, 
                                                                           `Neurogastro no efectiva` = `no efectiva`), 
      out %>% filter(`Orientación dicotómica` == 'No neurogastro') %>% select(name, 
                                                                           `No Neurogastro efectiva` = efectiva, 
                                                                           `No Neurogastro no efectiva` = `no efectiva`),
      by = "name")

                           name Neurogastro efectiva Neurogastro no efectiva No Neurogastro efectiva No Neurogastro no efectiva
1            Bismuto en diarrea            66.7% (4)               33.3% (2)              100.0% (3)                   0.0% (0)
2             Carbón en diarrea            75.0% (3)               25.0% (1)                0.0% (0)                 100.0% (2)
3         Loperamida en diarrea            75.0% (3)               25.0% (1)              100.0% (4)                   0.0% (0)
4 Otros antibióticos en diarrea           100.0% (6)                0.0% (0)               33.3% (1)                  66.7% (2)
5        Probióticos en diarrea           100.0% (6)                0.0% (0)               50.0% (2)                  50.0% (2)
6         Rifaximina en diarrea           100.0% (6)                0.0% (0)              100.0% (4)                   0.0% (0)

【讨论】:

  • 感谢@AnilGoyal,我正在寻找它。但是,我需要关于二分法的百分比(Neurogastro 为 100%,No Neurogastro 为 100%),正如我在示例中所写(我需要一个比较两组的表格)。
  • 我尝试了另一种方法,在一个中对 Neurogastro 进行 2 df 过滤,在另一个中对 Neurogastro 进行过滤(即:filter(Orientación dicotómica` == "Neurogastro")`,但后来我无法将它们与列顶部的名称合并。
  • 是的,这就是我尝试过的,但我没有在列顶部看到“Neurogastro”“没有 Neurogastro”。
  • 对shiny不太熟悉,但我现在看一下,看看我是否得到结果并将其发布在这里。谢谢你!
  • 修改后的答案应该有效。请检查一下
猜你喜欢
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 2017-06-04
  • 2021-12-19
  • 2023-02-07
  • 2017-06-29
  • 2020-12-05
  • 1970-01-01
相关资源
最近更新 更多