【Question title】: R -> Sum part of columns + aggregating observations [duplicate]
【Posted】: 2022-01-25 06:03:58
【Question】:

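I'm very new to coding and have just started doing some R graphics. Now I'm a bit lost with my data analysis and need some pointers! I'm practicing some analyses and have a very long dataset covering 19 countries x 12 months x 22 products, with a profit value for each month. It looks something like this: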

Country   Month   Product Profit
Brazil    Jan     A      50
Brazil    fev     A      80
Brazil    mar     A      15
Austria   Jan     A      35
Austria   fev     A      80
Austria   mar     A      47
France    Jan     A      21
France    fev     A      66
France    mar     A      15
[...]
France    Dez     C      40 etc...

I'd like to make one chart showing profit over the whole year, and another one per country, so that I can see the top 2 and bottom 2 countries. I want something like:

All Countries   Jan   106        or     Brazil   2021   145
All Countries   Fev   146               Austria  2021   162
All Countries   Mar   77                France   2021   112

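But the sum function doesn't help with character columns, and since the list is so long I don't know how to sum only part of a column.

Sorry if this is confusing.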

【Comments】:

  • Hi Ikasquilici, did the answers solve your question?

Tags: r sum filtering


【Solution 1】:

The dplyr package has a very natural syntax for this:

require(dplyr)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- data.frame(
  Country = rep(c(rep("Brazil", 3L), rep("Austria", 3L), rep("France", 3L)), 3L),
  Profit = rep(c(50, 80, 15, 35, 80, 47, 21, 66, 15), 3L),
  Month = rep(c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"), 3L),
  Year = sort(rep(c(2019, 2020, 2021), 9L))
)
df %>%
  group_by(Country, Month) %>%
  summarize(sum = sum(Profit))
#> `summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
#> # A tibble: 9 × 3
#> # Groups:   Country [3]
#>   Country Month   sum
#>   <chr>   <chr> <dbl>
#> 1 Austria Apr     105
#> 2 Austria Jun     141
#> 3 Austria May     240
#> 4 Brazil  Feb     240
#> 5 Brazil  Jan     150
#> 6 Brazil  Mar      45
#> 7 France  Aug     198
#> 8 France  Jul      63
#> 9 France  Sep      45
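
The grouping above is by country and month, whereas the question asks for two other totals: profit per month across all countries, and profit per country per year. A minimal sketch of those two summaries with the same dplyr verbs follows. It reuses the sample data from this answer; the names `by_month`, `by_country_year` and `ranked_2021` are only illustrative, and the real dataset would need a `Year` column (or a single year of data) for the second summary.

```r
library(dplyr)

# Sample data copied from the answer above
df <- data.frame(
  Country = rep(c(rep("Brazil", 3L), rep("Austria", 3L), rep("France", 3L)), 3L),
  Profit = rep(c(50, 80, 15, 35, 80, 47, 21, 66, 15), 3L),
  Month = rep(c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"), 3L),
  Year = sort(rep(c(2019, 2020, 2021), 9L))
)

# Total profit per month, summed over all countries (and products, if present)
by_month <- df %>%
  group_by(Month) %>%
  summarise(Profit = sum(Profit), .groups = "drop")

# Total profit per country and year
by_country_year <- df %>%
  group_by(Country, Year) %>%
  summarise(Profit = sum(Profit), .groups = "drop")

# Rank the countries within one year, highest profit first
ranked_2021 <- by_country_year %>%
  filter(Year == 2021) %>%
  arrange(desc(Profit))

top2 <- head(ranked_2021, 2)     # two most profitable countries
bottom2 <- tail(ranked_2021, 2)  # two least profitable countries
```

With only three countries in the sample data, `top2` and `bottom2` overlap; on the full 19-country dataset they would be distinct. Either summary table can then be passed to a plotting function.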

【Comments】:

    【Solution 2】:

    With base R, you could try these approaches (the sample data is listed at the end of this answer).

    # sum of profit per month
    out1 <- tapply(df$Profit, df$Month, sum)
    
    # sum of profit per year per country
    # (the formula interface to split() needs R >= 4.1.0)
    out2 <- data.frame(
      profit = sapply(split(df, f = ~ Country + Year), function(x) sum(x$Profit))
    )
    # row names look like "Austria.2019"; split them back into two columns
    out2$Country <- sub('\\.[0-9]+$', '', rownames(out2))
    out2$Year <- sub('^[A-Za-z]+\\.', '', rownames(out2))
    rownames(out2) <- NULL
    

    Output

    > out1
    Apr Aug Feb Jan Jul Jun Mar May Sep 
    105 198 240 150  63 141  45 240  45 
    
    > head(out2)
      profit Country Year
    1    162 Austria 2019
    2    145  Brazil 2019
    3    102  France 2019
    4    162 Austria 2020
    5    145  Brazil 2020
    6    102  France 2020
    

    Data

    # sample data
    df <- data.frame(
      Country = rep(c(rep('Brazil',3L),rep('Austria',3L),rep('France',3L)), 3L),
      Profit = rep(c(50,80,15,35,80,47,21,66,15), 3L),
      Month = rep(c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep'),3L),
      Year = sort(rep(c(2019,2020,2021), 9L))
    )
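
As an alternative to rebuilding the columns from row names, base R's `aggregate()` keeps the grouping variables as ordinary columns. The sketch below is not part of the original answer. It reuses the same sample data, and the names `brazil_2021`, `by_month`, `by_country_year` and `y2021` are only illustrative. It also shows the plain logical-indexing way to sum just part of a column, which is what the question asks about.

```r
# Sample data copied from the answer above
df <- data.frame(
  Country = rep(c(rep('Brazil',3L),rep('Austria',3L),rep('France',3L)), 3L),
  Profit = rep(c(50,80,15,35,80,47,21,66,15), 3L),
  Month = rep(c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep'),3L),
  Year = sort(rep(c(2019,2020,2021), 9L))
)

# Sum only part of a column: select the matching rows first, then sum
brazil_2021 <- sum(df$Profit[df$Country == 'Brazil' & df$Year == 2021])

# Total profit per month across all countries
by_month <- aggregate(Profit ~ Month, data = df, FUN = sum)

# Total profit per country and year; Country and Year stay as columns
by_country_year <- aggregate(Profit ~ Country + Year, data = df, FUN = sum)

# Order one year's totals from highest to lowest profit
y2021 <- by_country_year[by_country_year$Year == 2021, ]
y2021 <- y2021[order(-y2021$Profit), ]
```

`head(y2021, 2)` and `tail(y2021, 2)` then give the top and bottom two countries for that year.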
    

    【Comments】:
