【Question title】: How to turn observations into variables and summarise each value per variable condition
【Posted】: 2021-08-02 21:59:29
【Question】:

I have a dataset with several columns/observations.

See the dataset below:

merchant  Status                Face value
a         processing            10
b         processing            5
c         success               40
d         Transaction declined  30
e         success               32
f         pending               21
g         Transaction declined  23
h         Success               45
i         Transaction declined  66
j         success               76
k         pending               87
l         processing            89

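I want to turn the Status column into variables (one column per status) and summarise the Face value for each merchant.

Is it possible to use the janitor package to summarise values instead of counts?

For example, in the case below I used the janitor package to summarise the counts only.

This is the code I used to summarise the transaction counts with the janitor package: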

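library(janitor)

# Cross-tabulate merchant by Status (counts only), then add totals and row percentages
report_21st %>%
  tabyl(merchant, Status) %>%
  adorn_totals("row") %>%
  adorn_percentages("row") %>%
  adorn_pct_formatting() %>%
  adorn_ns("front")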

This is the result of the code above:

merchant  pending   processing  success     Success   Transaction declined
a         0 (0.0%)  1 (100.0%)  0 (0.0%)    0 (0.0%)  0 (0.0%)
b         0 (0.0%)  1 (100.0%)  0 (0.0%)    0 (0.0%)  0 (0.0%)
c         0 (0.0%)  0 (0.0%)    1 (100.0%)  0 (0.0%)  0 (0.0%)
d         0 (0.0%)  0 (0.0%)    0 (0.0%)    0 (0.0%)  1 (100.0%)

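This time I want to do the same for the value, not the count. Can you suggest an R package that can handle this, or can I use janitor for values as well? If the solution uses tidyverse or dplyr, please provide sample code. Even a single line of code or a short example would be much appreciated.

Also, I am new to R.

Thanks very much.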

【Comments】:

    Tags: r data-science tidyverse janitor


    【Solution 1】:

    You can use pivot_wider instead of tabyl, and then apply the same janitor code.

    library(tidyr)
    library(janitor)
    
    report_21st %>%
      pivot_wider(names_from = Status, values_from = Face.value, values_fill = 0) %>%
      adorn_totals("row") %>%
      adorn_percentages("row") %>%
      adorn_pct_formatting() %>%
      adorn_ns("front")
    
    # merchant   processing      success Transaction declined      pending     Success
    #        a  10 (100.0%)   0   (0.0%)           0   (0.0%)   0   (0.0%)  0   (0.0%)
    #        b   5 (100.0%)   0   (0.0%)           0   (0.0%)   0   (0.0%)  0   (0.0%)
    #        c   0   (0.0%)  40 (100.0%)           0   (0.0%)   0   (0.0%)  0   (0.0%)
    #        d   0   (0.0%)   0   (0.0%)          30 (100.0%)   0   (0.0%)  0   (0.0%)
    #        e   0   (0.0%)  32 (100.0%)           0   (0.0%)   0   (0.0%)  0   (0.0%)
    #        f   0   (0.0%)   0   (0.0%)           0   (0.0%)  21 (100.0%)  0   (0.0%)
    #        g   0   (0.0%)   0   (0.0%)          23 (100.0%)   0   (0.0%)  0   (0.0%)
    #        h   0   (0.0%)   0   (0.0%)           0   (0.0%)   0   (0.0%) 45 (100.0%)
    #        i   0   (0.0%)   0   (0.0%)          66 (100.0%)   0   (0.0%)  0   (0.0%)
    #        j   0   (0.0%)  76 (100.0%)           0   (0.0%)   0   (0.0%)  0   (0.0%)
    #        k   0   (0.0%)   0   (0.0%)           0   (0.0%)  87 (100.0%)  0   (0.0%)
    #        l  89 (100.0%)   0   (0.0%)           0   (0.0%)   0   (0.0%)  0   (0.0%)
    #    Total 104  (19.8%) 148  (28.2%)         119  (22.7%) 108  (20.6%) 45   (8.6%)
    

    Data

    report_21st <- structure(list(merchant = c("a", "b", "c", "d", "e", "f", "g", 
    "h", "i", "j", "k", "l"), Status = c("processing", "processing", 
    "success", "Transaction declined", "success", "pending", "Transaction declined", 
    "Success", "Transaction declined", "success", "pending", "processing"
    ), Face.value = c(10L, 5L, 40L, 30L, 32L, 21L, 23L, 45L, 66L, 
    76L, 87L, 89L)), row.names = c(NA, -12L), class = "data.frame")
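
The output above keeps "success" and "Success" as two separate columns because the labels differ only in case. If they are meant to be the same status, one option is to normalise the case and sum the values with dplyr before pivoting. The following is a minimal sketch under that assumption; the `toy` data frame is a hypothetical sample in the shape of `report_21st`, not the asker's real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample with the same columns as report_21st
toy <- data.frame(
  merchant   = c("c", "e", "h", "h"),
  Status     = c("success", "success", "Success", "pending"),
  Face.value = c(40L, 32L, 45L, 10L)
)

wide <- toy %>%
  mutate(Status = tolower(Status)) %>%        # treat "Success" and "success" as one status
  group_by(merchant, Status) %>%
  summarise(Face.value = sum(Face.value), .groups = "drop") %>%  # total per merchant and status
  pivot_wider(names_from = Status, values_from = Face.value, values_fill = 0)

wide
```

The resulting `wide` table has one row per merchant and one column per status, so it can be passed to the same `adorn_*()` chain as in the answer.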
    

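    【Comments】:

    • Thanks, Ronak Shah. I tried running the code on another dataset, but I got an error. Error: Can't convert  to . Run rlang::last_error() to see where the error occurred. In addition: Warning message: Values are not uniquely identified; output will contain list-cols. * Use values_fn = list to suppress this warning. * Use values_fn = length to identify where the duplicates arise * Use values_fn = {summary_fun} to summarise duplicates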
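
The warning in this comment appears when a merchant/Status combination occurs more than once, so pivot_wider cannot place a single value in each cell. As the warning itself suggests, passing a summary function through `values_fn` collapses the duplicates. The sketch below assumes that summing is the intended summary and that a recent tidyr version is installed (one where `values_fn` accepts a single function); the `dup` data frame is hypothetical, since the commenter's dataset is not shown.

```r
library(tidyr)
library(janitor)

# Hypothetical data: merchant "a" has two "processing" rows
dup <- data.frame(
  merchant   = c("a", "a", "b", "b"),
  Status     = c("processing", "processing", "success", "pending"),
  Face.value = c(10, 5, 40, 21)
)

wide <- dup %>%
  pivot_wider(
    names_from  = Status,
    values_from = Face.value,
    values_fn   = sum,  # sum duplicated merchant/Status pairs instead of building list columns
    values_fill = 0     # combinations that never occur become 0
  )

# Same janitor formatting as in the answer
wide %>%
  adorn_totals("row") %>%
  adorn_percentages("row") %>%
  adorn_pct_formatting() %>%
  adorn_ns("front")
```

Replacing `sum` with `length` shows how many rows fall into each cell, which helps locate the duplicates before deciding how to summarise them.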