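【Question title】: Merging the same combination of columns
【Posted】: 2018-11-25 05:42:06
【Question】:

I have a table that looks like this:

Further down the table, countries from Target.Country reappear in Source.Country, so the same combination is repeated but with different number, sum, and mean values. When the combination is the same, is it possible to add up the remaining columns and add an extra column holding the average?

For example: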

Source.Country Target.Country number   sum_intensity   mean_intensity
North Korea     South Korea    26492     10674.9         0.402
South Korea     North Korea    34912     53848.3         1.542

becomes:

Source.Country Target.Country  number  sum_intensity mean_intensity  Average 
North Korea     South Korea     61404   64523.2         1.944         1.05

Any help would be great!

【Comments】:

  • Remember to tag the language! This looks like R to me.
  • Thanks! It is R.
  • Please add a real sample of your data that people can import into R. Screenshots aren't very helpful.
  • library(dplyr); df %>% mutate(grp = purrr::map2_chr(Source.Country, Target.Country, ~paste(sort(c(.x, .y))))) %>% group_by(grp) %>% summarise(number = sum(number), sum_intensity = sum(sum_intensity), mean_intensity = sum(mean_intensity), average = mean(mean_intensity)) ? Can't test this without data to read in.
  • You probably need collapse = ' ' in the paste call.

Tags: r merge unique multiple-columns


【Solution 1】:

A solution similar to the one @Axeman proposed in the comments:

library(purrr)
library(dplyr)
df=data.frame(Source.Country=c('North Korea', 'South Korea'), 
              Target.Country=c('South Korea', 'North Korea'),
              number=c(26492, 34912),
              sum_intensity=c(10674.9, 53848.3),
              mean_intensity=c(0.402, 1.542))

df %>% mutate(grp = purrr::map2_chr(Source.Country, Target.Country, ~paste(sort(c(as.character(.x), as.character(.y))), collapse=' '))) %>% 
    group_by(grp) %>% 
    summarise(number = sum(number), 
    sum_intensity = sum(sum_intensity), 
    mean_intensity = sum(mean_intensity), 
    average = sum_intensity/number)

# # A tibble: 1 x 5
#   grp                     number sum_intensity mean_intensity average
#   <chr>                    <dbl>         <dbl>          <dbl>   <dbl>
# 1 North Korea South Korea 61404.        64523.           1.94    1.05

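A few small tweaks:

  • collapse really is needed in the paste call; without it, paste returns one string per country instead of a single key
  • as.character is needed to prevent the country names (factors) from being coerced to integers
  • mean_intensity can't be used as an output in summarise and then as an input, because later expressions see the already summarised value. In any case, a mean of means doesn't make much sense when number is unbalanced, so I just recomputed the mean from the sums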
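
The first and last points can be checked on the two rows from the question. This is only a small sketch in base R; the variable names (pair, key, mean_of_means, pooled_mean) are made up for illustration and are not part of the answer's code.

```r
# Without collapse, paste() returns one string per element (length 2 here),
# which cannot be stored as a single grouping key
pair <- sort(c("South Korea", "North Korea"))
n_without_collapse <- length(paste(pair))   # two strings, not one
key <- paste(pair, collapse = " ")          # a single order-independent key

# The two rows from the question
number        <- c(26492, 34912)
sum_intensity <- c(10674.9, 53848.3)

# Unweighted mean of the two per-row means (ignores that number differs)
mean_of_means <- mean(sum_intensity / number)

# Mean recomputed from the pooled totals, as in the answer above
pooled_mean <- sum(sum_intensity) / sum(number)

print(key)
print(round(c(mean_of_means = mean_of_means, pooled_mean = pooled_mean), 3))
```

The pooled value is the one that matches the 1.05 in the expected output; the unweighted mean of means comes out lower because the row with the smaller mean has fewer observations.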

【Discussion】:

    【Solution 2】:

    I enlarged the data frame to check that the code works.

    # Build the sample data; the trailing spaces in "North Korea " and "Canada "
    # were removed so that both directions of a pair produce the same key
    df1 <- rbind(c("North Korea", "South Korea", 26492, 10674.9, 0.402),
                 c("South Korea", "North Korea", 34912, 53848.3, 1.542),
                 c("Canada", "South Korea", 26492, 10674.9, 0.402),
                 c("South Korea", "Canada", 34912, 53848.3, 1.542))
    colnames(df1) <- c("Source.Country", "Target.Country", "number",
                       "sum_intensity", "mean_intensity")
    df1 <- data.frame(df1, stringsAsFactors = FALSE)

    # rbind() on mixed vectors yields a character matrix, so convert back to numeric
    df1$number <- as.numeric(df1$number)
    df1$sum_intensity <- as.numeric(df1$sum_intensity)
    df1$mean_intensity <- as.numeric(df1$mean_intensity)

    # Order-independent key: sort the two country names within each row
    df1$Countries <- apply(df1[, c("Source.Country", "Target.Country")], 1,
                           function(x) paste(sort(x), collapse = " "))

    # Aggregate each measure by pair (FUN = mean as in the original answer;
    # use FUN = sum to get the totals asked for in the question)
    m1 <- aggregate(number ~ Countries, data = df1, FUN = mean)
    m2 <- aggregate(sum_intensity ~ Countries, data = df1, FUN = mean)
    m3 <- aggregate(mean_intensity ~ Countries, data = df1, FUN = mean)

    # The three results share the Countries column, so merge() joins on it
    mvtab <- merge(m1, m2)
    mtab2 <- merge(mvtab, m3)
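
For comparison, the same grouping can be sketched with a single aggregate() call that sums the columns, which is closer to what the question asks for. This is a hedged alternative, not the answer author's method: it assumes the country columns are plain character vectors, and the names totals and average are illustrative.

```r
# Same four rows as above, built directly as a data frame
df1 <- data.frame(
  Source.Country = c("North Korea", "South Korea", "Canada", "South Korea"),
  Target.Country = c("South Korea", "North Korea", "South Korea", "Canada"),
  number         = c(26492, 34912, 26492, 34912),
  sum_intensity  = c(10674.9, 53848.3, 10674.9, 53848.3),
  stringsAsFactors = FALSE
)

# pmin()/pmax() compare the strings element-wise, so each row gets the
# same key regardless of which country is source and which is target
df1$Countries <- paste(pmin(df1$Source.Country, df1$Target.Country),
                       pmax(df1$Source.Country, df1$Target.Country))

# Sum both measures per pair in one call, then recompute the average
# from the pooled totals
totals <- aggregate(cbind(number, sum_intensity) ~ Countries,
                    data = df1, FUN = sum)
totals$average <- totals$sum_intensity / totals$number

print(totals)
```

Using pmin()/pmax() avoids the row-wise apply() and keeps everything vectorised; it only covers the two-column case, whereas sorting within apply() generalises to more key columns.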
    

    【Discussion】:
