【Question title】: How to summarize and spread data in R
【Posted】: 2022-01-22 19:45:51
【Question description】:

I want to summarize my data into just three columns, as follows: col_1 = name of the country, col_2 = percentage of 0s, col_3 = percentage of 1s.

Here is the data:

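country = rep(c("USA", "UK", "AUS", "ARM", "BEL", "BRA", "CHN", "EGY", "FIN", "FRA"),
              times = c(10, 5, 15, 10, 10, 10, 5, 15, 10, 10))
# sample() without a size returns only 2 values here; data.frame() recycles them over all 100 rows
score = sample(c(0, 1), replace = FALSE)
dat = data.frame(country, score)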

Thank you very much.

【Question comments】:

  • aggregate(score ~ country, dat, FUN = \(x) c(zero = 1 - mean(x), one = mean(x)))
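
The one-liner in the comment above returns a data frame with only two columns, because `aggregate()` stores the two values returned by `FUN` in a single matrix column. A minimal sketch of how it might be flattened into the three requested columns follows. The sample data (three countries, one random score per row, fixed seed) and the column names `perc0s`/`perc1s` are assumptions made for illustration, not the asker's exact data.

```r
# Assumed sample data: unlike the question's code, one score is drawn per row
set.seed(1)
country <- rep(c("USA", "UK", "AUS"), times = c(10, 5, 15))
score <- sample(c(0, 1), size = length(country), replace = TRUE)
dat <- data.frame(country, score)

# FUN returns two values per group, so `score` comes back as a two-column matrix
res <- aggregate(score ~ country, dat,
                 FUN = function(x) c(zero = 1 - mean(x), one = mean(x)))

# Flatten the matrix column into ordinary columns and rename them
flat <- do.call(data.frame, res)
names(flat) <- c("country", "perc0s", "perc1s")
flat
```

Because `score` only takes the values 0 and 1, `mean(x)` is the share of 1s and `1 - mean(x)` is the share of 0s.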

Tags: r


【Solution 1】:

Another possible solution, based on tidyverse:

library(tidyverse)

country = rep(c("USA", "UK", "AUS", "ARM", "BEL", "BRA", "CHN", "EGY", "FIN", "FRA"),
              times = c(10, 5, 15, 10, 10, 10, 5, 15, 10, 10))
score= sample(c(0,1), replace=F)
dat = data.frame(country, score)

dat %>% 
  group_by(country) %>% 
  summarise(perc0s = 1-sum(score)/n(), perc1s=1-perc0s, .groups = "drop")

#> # A tibble: 10 × 3
#>    country perc0s perc1s
#>    <chr>    <dbl>  <dbl>
#>  1 ARM      0.5    0.5  
#>  2 AUS      0.467  0.533
#>  3 BEL      0.5    0.5  
#>  4 BRA      0.5    0.5  
#>  5 CHN      0.6    0.4  
#>  6 EGY      0.467  0.533
#>  7 FIN      0.5    0.5  
#>  8 FRA      0.5    0.5  
#>  9 UK       0.6    0.4  
#> 10 USA      0.5    0.5
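
Since the question title mentions spreading the data, here is a hedged sketch of a count-then-spread variant using `dplyr::count()` and `tidyr::pivot_wider()`. It is not part of the original answer. The sample data and the `perc_` column prefix are assumptions chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Assumed sample data: one random score per row, fixed seed
set.seed(1)
country <- rep(c("USA", "UK", "AUS"), times = c(10, 5, 15))
score <- sample(c(0, 1), size = length(country), replace = TRUE)
dat <- data.frame(country, score)

wide <- dat %>%
  count(country, score) %>%            # rows per country/score pair
  group_by(country) %>%
  mutate(perc = n / sum(n)) %>%        # share within each country
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = score, values_from = perc,
              names_prefix = "perc_",
              values_fill = 0)         # a country missing one score value gets 0
wide
```

The `summarise()` version above is shorter for a 0/1 score. The count-and-spread form may generalize more easily if the score ever takes more than two values.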

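【Solution 2】:

Using reshape2:

library(reshape2)
# count the rows for each country/score pair (length is also the default when no aggregation function is given)
dat2 = dcast(dat, country ~ score, value.var = "score", fun.aggregate = length)
# divide each count by its row total to get proportions
dat2[, c("0", "1")] = dat2[, c("0", "1")] / rowSums(dat2[, c("0", "1")])

   country         0         1
1      ARM 0.5000000 0.5000000
2      AUS 0.5333333 0.4666667
3      BEL 0.5000000 0.5000000
4      BRA 0.5000000 0.5000000
5      CHN 0.4000000 0.6000000
6      EGY 0.5333333 0.4666667
7      FIN 0.5000000 0.5000000
8      FRA 0.5000000 0.5000000
9       UK 0.4000000 0.6000000
10     USA 0.5000000 0.5000000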
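
The two reshape2 steps above amount to counting rows per country and score, then dividing each row by its total. As a rough illustration of that idea without extra packages, the sketch below does the same with base R's `table()` and `prop.table()`. The sample data and the output column names are assumptions, not part of the original answer.

```r
# Assumed sample data: one random score per row, fixed seed
set.seed(1)
country <- rep(c("USA", "UK", "AUS"), times = c(10, 5, 15))
score <- sample(c(0, 1), size = length(country), replace = TRUE)
dat <- data.frame(country, score)

# Cross-tabulate: one row per country, one column per score value ("0", "1")
counts <- table(dat$country, dat$score)

# Divide each row by its total (margin = 1 means row-wise)
props <- prop.table(counts, margin = 1)

# Assumes both 0 and 1 occur somewhere in the data, so both columns exist
out <- data.frame(country = rownames(props),
                  perc0s = props[, "0"],
                  perc1s = props[, "1"],
                  row.names = NULL)
out
```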
    
