【Question title】: Sum of unique combinations of values in columns in R
【Posted】: 2021-04-05 17:36:00
【Question description】:

My data frame is as follows:

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222), 
             Dept = c(101, 101, 101, 102, 102, 103), 
              Emp_Id = c(1, 1, 2, 3, 4, 4),
              weights = c(5,5,2,3,4,5))

Webpage Dept Emp_Id weights
111     101      1       5
111     101      1       5
111     101      2       2
111     102      3       3  
222     102      4       4
222     103      4       5

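For each web page, I want the number of employees who have seen it, expressed as a sum of weights and as a percentage of the total weight. A unique employee is a unique combination of Dept and Emp_Id.

For example, Emp_Id 1, 2 and 3 saw web page 111. The number of employees who saw it is therefore the sum of their weights, i.e. 5+2+3 = 10, and the weight percentage is 0.52 (10/19). Here 19 is the total weight over all unique employees (unique combinations of Dept and Emp_Id).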

Webpage    Number_people_seen    seen_percentage
111                 10            0.52
222                  9            0.47
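
The arithmetic above can be checked with a small base R sketch. It assumes that duplicate rows for the same Webpage/Dept/Emp_Id combination always carry the same weight, so keeping the first occurrence is enough; the names `unique_emp`, `seen` and `seen_share` are illustrative only.

```r
df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept    = c(101, 101, 101, 102, 102, 103),
                 Emp_Id  = c(1, 1, 2, 3, 4, 4),
                 weights = c(5, 5, 2, 3, 4, 5))

# Keep the first row for each Webpage/Dept/Emp_Id combination
# (assumption: repeated rows for one employee have identical weights)
unique_emp <- df[!duplicated(df[, c("Webpage", "Dept", "Emp_Id")]), ]

# Sum of weights per web page
seen <- tapply(unique_emp$weights, unique_emp$Webpage, sum)

# Share of each web page in the overall total of unique-employee weights
seen_share <- seen / sum(seen)

print(seen)
print(round(seen_share, 3))
```

For this data the per-page sums are 10 and 9, and the overall total is 19.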

I tried the following, but I do not know how to get the sum of the weights.

library(dplyr)
df %>% group_by(Webpage) %>% distinct(Dept,Emp_Id)

【Comments】:

    Tags: r


    【Solution 1】:
    df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222), 
                     Dept = c(101, 101, 101, 102, 102, 103), 
                     Emp_Id = c(1, 1, 2, 3, 4, 4),
                     weights = c(5,5,2,3,4,5))
    
    library(tidyverse)
    df %>% 
      group_by(Webpage) %>% 
      distinct(Dept, Emp_Id, .keep_all = TRUE) %>% 
      summarise(Number_people_seen = sum(weights)) %>% 
      mutate(seen_percentage = prop.table(Number_people_seen))
    #> `summarise()` ungrouping output (override with `.groups` argument)
    #> # A tibble: 2 x 3
    #>   Webpage Number_people_seen seen_percentage
    #>     <dbl>              <dbl>           <dbl>
    #> 1     111                 10           0.526
    #> 2     222                  9           0.474
    

    Created on 2021-04-05 by the reprex package (v0.3.0)
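
A short sketch of why the `.keep_all` argument matters in this pipeline may help. It assumes dplyr is installed; the names `without_weights` and `with_weights` are illustrative only. Without `.keep_all`, `distinct()` keeps just the grouping column and the columns named in the call, so `weights` is gone before it can be summed.

```r
library(dplyr)

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept    = c(101, 101, 101, 102, 102, 103),
                 Emp_Id  = c(1, 1, 2, 3, 4, 4),
                 weights = c(5, 5, 2, 3, 4, 5))

# Default behaviour: only Webpage, Dept and Emp_Id survive
without_weights <- df %>%
  group_by(Webpage) %>%
  distinct(Dept, Emp_Id)
print(names(without_weights))

# .keep_all = TRUE keeps the first full row of each combination,
# so the weights column is still available for sum()
with_weights <- df %>%
  group_by(Webpage) %>%
  distinct(Dept, Emp_Id, .keep_all = TRUE)
print(names(with_weights))
```

Note that `.keep_all = TRUE` keeps the first row of each combination, so the result is only meaningful if repeated rows for one employee carry the same weight.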

    【Comments】:

      【Solution 2】:
      # Group by Dept as well, since a unique employee is a Dept/Emp_Id pair
      df %>% group_by(Webpage, Dept, Emp_Id) %>%
        summarise(no_of_ppl_seen = unique(weights), .groups = "drop") %>%
        group_by(Webpage) %>%
        summarise(no_of_ppl_seen = sum(no_of_ppl_seen)) %>%
        mutate(seen_percentage = no_of_ppl_seen/sum(no_of_ppl_seen))
      
      # A tibble: 2 x 3
        Webpage no_of_ppl_seen seen_percentage
          <dbl>          <dbl>           <dbl>
      1     111             10           0.526
      2     222              9           0.474
      

      # Alternative: drop fully duplicated rows first, then sum weights per Webpage
      df %>% filter(!duplicated(across(everything()))) %>%
        group_by(Webpage) %>%
        summarise(number_ppl_seen = sum(weights)) %>%
        mutate(seen_perc = number_ppl_seen/sum(number_ppl_seen))
      

      【Comments】: