【问题标题】:R - Summarize row values, return result as a rowR - 汇总行值,将结果作为一行返回
【发布时间】:2019-04-04 05:12:21
【问题描述】:

我的数据如下所示:

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   19 obs. of  7 variables:
 $ Week Ending  : chr  "5/1/18" "5/1/18" "5/1/18" "5/1/18" ...
 $ Agent        : chr  "telbenja ." "Tomsaint ." "davidlor ." "moniquec 
." ...
 $ Inbound      : int  25 62 44 36 1 22 144 36 28 51 ...
 $ Manual       : int  0 3 4 22 0 0 13 6 2 1 ...
 $ Avg Talk Time: 'hms' num  00:03:29 00:03:20 00:03:51 00:02:37 ...
  ..- attr(*, "units")= chr "secs"
 $ Avg Wrap Time: 'hms' num  00:01:57 00:01:13 00:01:31 00:01:24 ...
  ..- attr(*, "units")= chr "secs"
 $ Avg Hold Time: 'hms' num  00:00:11 00:00:02 00:00:02 00:00:00

这只是一个示例,我有大约 100,000 行。

我最终需要的是拥有一个名为“Average”的“Agent”,其所有其他列中的值只是同一“Week Ending”(日期)内所有其他行的平均值。

我相信对此的解决方案是某种 group_by 和 summarise dplyr 巫术,但是我似乎无法使这项工作用于返回行值,group by 和 summarise 会给我一个全新的列,但这不是什么我想要,我需要为每个日期(“周末”)创建一个新的行条目,其中包含同一日期每一列中值的平均值。

非常感谢您对此提供的任何帮助(完全预料到我的措辞/问题会令人震惊和恐惧,如果您无法入睡,请投反对票)。

dput(head(my_data)) 的结果:

dput(head(response_codes))
structure(list(`Response Code` = structure(c(105L, 72L, 79L, 
159L, 104L, 17L), .Label = c("304001", "312001", "799007", "843001", 
"951001", "1490001", "1490002", "1524002", "1524003", "1620001", 
"1696001", "2297001", "2299001", "2302001", "2305001", "2312001", 
"2314001", "2315001", "2316001", "2317001", "2327001", "2328001", 
"2329001", "2330001", "2333001", "2374001", "2380002", "2415001", 
"2420001", "2428001", "2428004", "2428005", "2428006", "2434001", 
"2435002", "2444002", "2449002", "2457002", "2457003", "2462001", 
"2463001", "2463002", "2478001", "2586010", "2673002", "2677001", 
"2678002", "2682001", "2683002", "2835005", "2938001", "2950001", 
"2974001", "3006001", "3006002", "3007001", "3046001", "3077003", 
"3091001", "3093001", "3093010", "3094003", "3115001", "3115006", 
"3115010", "3116001", "3116003", "3117001", "3117002", "3148001", 
"3214001", "3239001", "3244001", "3245001", "3245002", "3245003", 
"3262001", "3262002", "3273001", "3276001", "3276002", "3276003", 
"3276005", "3276006", "3276012", "3276013", "3276017", "3276019", 
"3276020", "3276021", "3276023", "3276030", "3276036", "3276037", 
"3276038", "3276039", "3276043", "3276044", "3276045", "3276048", 
"3276050", "3289001", "3330001", "3334001", "3334002", "3347001", 
"3348001", "3361001", "3382001", "3383001", "3393001", "3394001", 
"3394002", "3399001", "3403005", "3486003", "3488003", "3491001", 
"3558001", "3584001", "3585002", "3586001", "3588001", "3591001", 
"3677002", "3677003", "3678001", "3678002", "3691003", "3691004", 
"3691005", "3691006", "3691009", "3691010", "3691014", "3692001", 
"3693002", "3694002", "3695002", "3741001", "3743001", "3753001", 
"3753002", "3755001", "3762001", "3765001", "3766001", "3767001", 
"3767002", "3768001", "3769001", "3771001", "3772001", "3792001", 
"3795001", "3797001", "3799001", "3800001", "3810001", "7014001", 
"7371007", "7445001", "9007001", "9009001"), class = "factor"), 
    `Total Recruits` = c(518L, 467L, 345L, 335L, 333L, 224L), 
    `Number of 2nd Purchase (Converts)` = c(217L, 248L, 181L, 
    106L, 218L, 150L), `Total Cms that took a wp on or after their recruitment case` = c(187L, 
    169L, 142L, 104L, 361L, 233L), `Currently Closed Wine Plans` = c(135L, 
    130L, 108L, 79L, 295L, 188L), `Currently Active Wine Plans` = c(52L, 
    39L, 34L, 25L, 66L, 45L), `Upgrade to WP %` = c(36.1, 36.19, 
    41.16, 31.04, 108.41, 104.02), `2nd Purchase Conversion Rate` = c(41.89, 
    53.1, 52.46, 31.64, 65.47, 66.96), `Number of Conti Cases Purchased` = c(232L, 
    208L, 171L, 108L, 449L, 353L), `Number of Distinct WP Customers` = c(94L, 
    101L, 84L, 51L, 193L, 141L)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

【问题讨论】:

  • 如果您可以分享一些实际的示例数据,而不仅仅是其结构,例如通过在您的问题中包含dput(head(your_table)) 的输出。甚至更好的是预期输出的示例以及您尝试过的任何代码。
  • 听起来你想用每列的每周平均值添加行。我假设像 your_data %>% group_by('Week Ending') %>% summarize_if(is.numeric, mean) 这样的东西来获取你的摘要行,然后是像 bind_rows(your_data, summary_table) 这样的东西。没有示例数据,无法更具体地回答。
  • 感谢您的回复!这正是我所追求的,尽管我想在我的原始数据框中为每个每周摘要返回一个行值 - 代理名称为“平均”,所有其他变量只是总结为该周数据的平均值。对于每个星期。再次感谢您对此的帮助!
  • 此数据中是否存在“周末”变量?我不确定你打算如何分组。

标签: r dplyr data.table


【解决方案1】:
library(dplyr)

# 1. Adding fake week and Agent
response_codes <- response_codes %>%
  mutate(fake_week = rep(1:3, each = 2),
         Agent = letters[1:6])

# 2. Make summary by week
summarized <- response_codes %>%
  group_by(fake_week) %>%
  summarise_if(is.numeric, mean) %>%
  mutate(Agent = "Average")

# 3. Combine
combo <- bind_rows(response_codes, summarized)

输出

# Just showing columns 1-3 and 10-12:
> combo[, c(1:3,10:12)]
# A tibble: 9 x 6
  `Response Code` `Total Recruits` `Number of 2nd Purchase (Converts)` `Number of Distinct WP Customers` fake_week Agent  
  <fct>                      <dbl>                               <dbl>                             <dbl>     <int> <chr>  
1 3334002                     518                                 217                               94           1 a      
2 3239001                     467                                 248                              101           1 b      
3 3273001                     345                                 181                               84           2 c      
4 3810001                     335                                 106                               51           2 d      
5 3334001                     333                                 218                              193           3 e      
6 2314001                     224                                 150                              141           3 f      
7 NA                          492.                                232.                              97.5         1 Average
8 NA                          340                                 144.                              67.5         2 Average
9 NA                          278.                                184                              167           3 Average

【讨论】:

  • 不错,谢谢!完全没想过要总结为一个单独的数据框,然后将它们绑定在一起 - 很好的解决方案!
猜你喜欢
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 2020-09-14
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 2016-05-09
相关资源
最近更新 更多