【Question title】: Count specific categories in multiple columns and multiply them by a sum column
【Posted】: 2021-06-02 14:56:48
【Question】:

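I want to count how often each category appears in my data frame.

To do this, I need to count the categories in each row and multiply that count by the sum in column 5.

(I don't need column c4 for my analysis.)

The preferred output is:

Analytics = 131

Ads = 253

Identification = ..

My data looks like this: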

tracker_category <- data.frame = c("Tracker1", "Tracker2", "Tracker3", "Tracker4","Tracker5","Tracker6"), 
c1 = c("Analytics", "Crash", "Location", "Identification", "Analytics", "Ads"), 
c2 = c("Ads", "Analytics", "Location", "Analytics", "Identification", "Ads"), 
c3 = c("Identification", "Analytics", "Ads", "Ads", "Analytics", "Location"),
c4 = c("url1.com","ur2.com","url3.com","url4.com","url5.com","url6.com"),
sum_tracker = c(1,20,100,0,5,76))

【Comments】:

  • table(unlist(tracker_Category[2:3]))?
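  • Hi, in that case it only counts each name once. I need to multiply the count by the associated sum from sum_tracker. For Analytics the calculation is: first row 1*1 + second row 2*20 + third row 0*100, and so on.
  • What exactly are you doing? What are 1*1, 2*20, 0*100? Where do you get 1, 2, 0 from?
  • @Paul Could you check the answer I proposed and whether your total for Analytics is correct? If it is, I have not understood your algorithm yet. But with {tidyr} pivot_longer() you should get a table that lets you build the per-row values you are after, and you can then use group_by() and summarise() to add those values up.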

Tags: r count sum


【Solution 1】:

The following should produce what you are after.
You can convert the data frame to "long" format and then add up the occurrences (your column 5).

Data

Note: to support reproducibility, I corrected your data frame definition.

tracker_category <- data.frame(
id = c("Tracker1", "Tracker2", "Tracker3", "Tracker4","Tracker5","Tracker6"), 
c1 = c("Analytics", "Crash", "Location", "Identification", "Analytics", "Ads"), 
c2 = c("Ads", "Analytics", "Location", "Analytics", "Identification", "Ads"), 
c3 = c("Identification", "Analytics", "Ads", "Ads", "Analytics", "Location"),
c4 = c("url1.com","ur2.com","url3.com","url4.com","url5.com","url6.com"),
sum_tracker = c(1,20,100,0,5,76)
)

Converting to long format

{tidyr} provides the pivot_longer() function for this.

library(dplyr)
library(tidyr)

tracker_category %>% 
  select(-c4) %>%       # remove c4
  pivot_longer( cols = c(c1:c3)           # which cols to use
              , names_to = "action"       # where to store the names
              , values_to = "categories") # and values

This yields:

# A tibble: 18 x 4
   id       sum_tracker action categories    
   <chr>          <dbl> <chr>  <chr>         
 1 Tracker1           1 c1     Analytics     
 2 Tracker1           1 c2     Ads           
 3 Tracker1           1 c3     Identification
 4 Tracker2          20 c1     Crash         
 5 Tracker2          20 c2     Analytics     
 6 Tracker2          20 c3     Analytics     
 7 Tracker3         100 c1     Location      
 8 Tracker3         100 c2     Location      
 9 Tracker3         100 c3     Ads           
10 Tracker4           0 c1     Identification
11 Tracker4           0 c2     Analytics     
12 Tracker4           0 c3     Ads           
13 Tracker5           5 c1     Analytics     
14 Tracker5           5 c2     Identification
15 Tracker5           5 c3     Analytics     
16 Tracker6          76 c1     Ads           
17 Tracker6          76 c2     Ads           
18 Tracker6          76 c3     Location

With the data in that format, you can use {dplyr} to summarise() over your groups:

tracker_category %>% 
   select(-c4) %>% 
   pivot_longer(cols = c(c1:c3), names_to = "action", values_to = "categories") %>% 
#------------- group by your categories
   group_by(categories) %>% 
#------------- and sum over your tracked results, note to use sum and not multiplication as we used a long format
   summarise(total = sum(sum_tracker))

This yields:

# A tibble: 5 x 2
  categories     total
  <chr>          <dbl>
1 Ads              253
2 Analytics         51
3 Crash             20
4 Identification     6
5 Location         276
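
For readers who would rather avoid the tidyverse, the same totals can be reproduced in base R. This is only a minimal sketch and not part of the original answer; the names `cols`, `long`, and `totals` are made up for illustration, and c4 is simply left out of the data frame because it is not needed.

```r
# Same data as in this answer, without the unused c4 column.
tracker_category <- data.frame(
  id = c("Tracker1", "Tracker2", "Tracker3", "Tracker4", "Tracker5", "Tracker6"),
  c1 = c("Analytics", "Crash", "Location", "Identification", "Analytics", "Ads"),
  c2 = c("Ads", "Analytics", "Location", "Analytics", "Identification", "Ads"),
  c3 = c("Identification", "Analytics", "Ads", "Ads", "Analytics", "Location"),
  sum_tracker = c(1, 20, 100, 0, 5, 76),
  stringsAsFactors = FALSE
)

cols <- c("c1", "c2", "c3")  # hypothetical helper: the category columns

# Stack the category columns on top of each other (column by column) and
# repeat sum_tracker once per stacked column, so every category value
# keeps the sum_tracker of the row it came from.
long <- data.frame(
  categories  = unlist(tracker_category[cols], use.names = FALSE),
  sum_tracker = rep(tracker_category$sum_tracker, times = length(cols))
)

# Add up sum_tracker within each category.
totals <- aggregate(sum_tracker ~ categories, data = long, FUN = sum)
print(totals)
```

The rows of `long` come out in a different order than with pivot_longer() (all of c1 first, then c2, then c3), which does not matter for the per-category sums.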

Please check whether your example value of 131 for Analytics is really correct...
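
One way to cross-check this is to apply the rule from the question literally: count how often a category appears in c1 to c3 of each row, multiply that count by the row's sum_tracker, and add up the products. The base R sketch below does that; `weighted_total` is a hypothetical helper name, and it assumes the corrected data frame from this answer (c4 omitted).

```r
tracker_category <- data.frame(
  id = c("Tracker1", "Tracker2", "Tracker3", "Tracker4", "Tracker5", "Tracker6"),
  c1 = c("Analytics", "Crash", "Location", "Identification", "Analytics", "Ads"),
  c2 = c("Ads", "Analytics", "Location", "Analytics", "Identification", "Ads"),
  c3 = c("Identification", "Analytics", "Ads", "Ads", "Analytics", "Location"),
  sum_tracker = c(1, 20, 100, 0, 5, 76),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-row count of `category` in the given columns,
# multiplied by sum_tracker and summed over all rows.
weighted_total <- function(df, category, cols = c("c1", "c2", "c3")) {
  per_row_count <- rowSums(df[cols] == category)  # 1, 2, 0, 1, 2, 0 for "Analytics"
  sum(per_row_count * df$sum_tracker)
}

# Analytics: 1*1 + 2*20 + 0*100 + 1*0 + 2*5 + 0*76 = 51
analytics_total <- weighted_total(tracker_category, "Analytics")
ads_total       <- weighted_total(tracker_category, "Ads")
print(c(Analytics = analytics_total, Ads = ads_total))
```

With this data the count-times-sum rule gives 51 for Analytics and 253 for Ads, the same as the long-format summarise(), so a total of 131 would need different input or a different rule.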

【Comments】:
