【Question title】: In IF statement, condition on the number of groups AND the number of observations per group
【Posted】: 2019-09-12 12:24:16
【Question】:

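How can I condition an IF statement on both the number of groups and the number of observations per group? I.e., if there are >= 4 groups and every group has >= 2 observations, do something.

The first part isn't too tricky; it's the second part I'm really struggling with.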

library(tidyverse)
data(mtcars)

set.seed(123)

mtcars <- mtcars %>% rownames_to_column("type")
mtcars$brand <- stringr::str_split_fixed(mtcars$type, " ", 2)[,1]
mtcars <- mtcars[mtcars$brand %in% c("Merc","Mazda","Hornet","Toyota"),]

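# Five random subsets of the rows (types sampled with replacement; %in% keeps each selected row once)
mtcars_ls <- vector("list", 5)
for(n in 1:5){ mtcars_ls[[n]] <- mtcars[mtcars$type %in% sample(mtcars$type, size = 15, replace = TRUE),]}

# Drop list elements with fewer than 4 brands. Iterate backwards so that
# removing an element does not shift the positions still to be visited.
for(i in rev(seq_along(mtcars_ls))) {
  if( length(unique(mtcars_ls[[i]]$brand)) < 4 ) { mtcars_ls[[i]] <- NULL }
}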

mtcars_ls
[[1]]
                type  mpg cyl  disp  hp drat    wt  qsec vs am gear carb  brand
1          Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  Mazda
2      Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Mazda
5  Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 Hornet
9           Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2   Merc
10          Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4   Merc
11         Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4   Merc
12        Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3   Merc
14       Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3   Merc
20    Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 Toyota
21     Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 Toyota

[[2]]
                type  mpg cyl  disp  hp drat    wt  qsec vs am gear carb  brand
1          Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  Mazda
2      Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Mazda
5  Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 Hornet
8          Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2   Merc
11         Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4   Merc
12        Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3   Merc
13        Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3   Merc
20    Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 Toyota
21     Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 Toyota

[[3]]
                type  mpg cyl  disp  hp drat    wt  qsec vs am gear carb  brand
1          Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  Mazda
2      Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Mazda
4     Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 Hornet
5  Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 Hornet
8          Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2   Merc
9           Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2   Merc
10          Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4   Merc
12        Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3   Merc
13        Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3   Merc
14       Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3   Merc
20    Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 Toyota
21     Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 Toyota

[[4]]
                type  mpg cyl  disp  hp drat    wt  qsec vs am gear carb  brand
1          Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  Mazda
2      Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Mazda
4     Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 Hornet
5  Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 Hornet
8          Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2   Merc
9           Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2   Merc
10          Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4   Merc
11         Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4   Merc
13        Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3   Merc
14       Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3   Merc
20    Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 Toyota

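The code above removes every list element with fewer than 4 car brands. What I actually want is to remove every element that has fewer than 4 car brands or fewer than 2 observations for any brand. In the example above, that would leave only mtcars_ls[[3]].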
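The condition described above can be written as one logical test. The sketch below uses only base R (`table()`, `length()`, `all()`) instead of the tidyverse. It builds a small made-up list `toy_ls` so it runs on its own. The helper `has_enough()` and its argument names are hypothetical and do not come from the question.

```r
# Hypothetical helper: TRUE when `df` has at least `min_groups` distinct
# brands AND every brand has at least `min_obs` rows.
has_enough <- function(df, min_groups = 4, min_obs = 2) {
  counts <- table(df$brand)            # number of rows per brand
  length(counts) >= min_groups && all(counts >= min_obs)
}

# Small made-up list of data frames standing in for mtcars_ls
toy_ls <- list(
  data.frame(brand = c("Mazda", "Hornet", "Merc", "Toyota")),              # 1 row per brand
  data.frame(brand = rep(c("Mazda", "Hornet", "Merc", "Toyota"), each = 2)),
  data.frame(brand = rep(c("Mazda", "Merc", "Toyota"), each = 3))          # only 3 brands
)

# Filter with a logical index instead of deleting inside a loop,
# so no positions shift while iterating
kept <- toy_ls[vapply(toy_ls, has_enough, logical(1))]
length(kept)  # [1] 1
```

With the question's data, `mtcars_ls[vapply(mtcars_ls, has_enough, logical(1))]` should keep only the element shown above as `mtcars_ls[[3]]`.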

【Comments】:

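  • With the RNG seed you used, I don't get any list element with at least 2 observations per brand.
  • I just restarted RStudio and pasted the example code, and I get the same output as before: mtcars_ls[[3]] has at least 2 observations for every car brand.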
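One possible explanation for these differing results is that R 3.6.0 changed the default algorithm that `sample()` uses. As a result, `set.seed(123)` followed by `sample()` can give different draws before and after that release. This is an assumption; the thread does not confirm it. The sketch below needs R >= 3.6.0. It compares the old `"Rounding"` sampler with the current `"Rejection"` default, and the helper `draw()` is hypothetical.

```r
# Assumption: the commenters may be on different sides of R 3.6.0, which
# changed sample()'s default from "Rounding" to "Rejection".
draw <- function(kind) {
  suppressWarnings(RNGkind(sample.kind = kind))  # "Rounding" triggers a warning
  set.seed(123)
  sample(1:100, size = 15, replace = TRUE)
}

old_style <- draw("Rounding")    # pre-3.6.0 sampling behaviour
new_style <- draw("Rejection")   # current default, restored as the last call
identical(old_style, new_style)  # FALSE: the two draws differ
```

If this is the cause, calling `RNGkind(sample.kind = "Rounding")` before `set.seed(123)` should reproduce pre-3.6.0 sampling results.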

Tags: r for-loop if-statement


【Solution 1】:

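If you only want to filter and don't need the indices, you can use keep. If you do need the indices, replace keep with map_lgl and call which on the resulting logical vector.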

library(tidyverse)

mtcars_ls %>% 
  keep(~ {
    count(., brand) %>%                 # one row per brand, with count column n
      {nrow(.) >= 4 && all(.$n >= 2)}   # >= 4 brands AND every brand has >= 2 rows
  })
# [[1]]
#                 type  mpg cyl  disp  hp drat    wt  qsec vs am gear carb  brand
# 1          Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  Mazda
# 2      Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Mazda
# 4     Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 Hornet
# 5  Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 Hornet
# 8          Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2   Merc
# 9           Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2   Merc
# 10          Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4   Merc
# 12        Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3   Merc
# 13        Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3   Merc
# 14       Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3   Merc
# 20    Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 Toyota
# 21     Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 Toyota
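The index variant mentioned above (`map_lgl()` followed by `which()`) can be sketched in base R with `vapply()`. Like `map_lgl()`, it returns one logical per list element. The data and the helper name `passes()` are hypothetical.

```r
# Hypothetical check: >= 4 distinct brands and >= 2 rows for every brand
passes <- function(df) {
  n <- table(df$brand)
  length(n) >= 4 && all(n >= 2)
}

# Made-up stand-in for mtcars_ls
toy_ls <- list(
  data.frame(brand = c("Mazda", "Hornet", "Merc", "Toyota", "Merc")),
  data.frame(brand = rep(c("Mazda", "Hornet", "Merc", "Toyota"), each = 2)),
  data.frame(brand = rep(c("Mazda", "Hornet", "Merc", "Toyota"), times = 3))
)

ok  <- vapply(toy_ls, passes, logical(1))  # analogous to purrr::map_lgl(toy_ls, passes)
idx <- which(ok)                           # positions of the elements that pass
idx                                        # [1] 2 3
toy_ls[idx]                                # the elements keep() would return
```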

【Comments】:

    【Solution 2】:

    You can use purrr functions such as map. Write a function that performs the required checks on each data frame and returns the data frame only when it passes:

    library(dplyr)   # count()
    library(purrr)   # map()

    # Both thresholds are parameters, so they are easy to change
    # (e.g. cars_per_brand = 1 to relax the per-brand requirement).
    
    check_list <- function(df, num_brands = 4, cars_per_brand = 2){
      n_brands <- length(unique(df$brand))     # number of distinct brands
      min_n <- min(count(df, brand)[["n"]])    # size of the smallest brand group
    
      # Return the data frame only if both conditions hold; otherwise NULL
      if(n_brands >= num_brands && min_n >= cars_per_brand){
        df
      } 
    }
    
    map(mtcars_ls, check_list)
    

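    The output posted with this answer (generated from the answerer's own random samples, which differ from those in the question) was: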

    [[1]]
    NULL
    
    [[2]]
    NULL
    
    [[3]]
                 type  mpg cyl  disp  hp drat    wt  qsec vs am gear carb  brand
    8       Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2   Merc
    10       Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4   Merc
    12     Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3   Merc
    13     Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3   Merc
    14    Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3   Merc
    20 Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 Toyota
    21  Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 Toyota
    
    [[4]]
                 type  mpg cyl  disp  hp drat    wt  qsec vs am gear carb  brand
    1       Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  Mazda
    2   Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Mazda
    8       Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2   Merc
    9        Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2   Merc
    11      Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4   Merc
    12     Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3   Merc
    14    Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3   Merc
    20 Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 Toyota
    21  Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 Toyota
    
    [[5]]
    NULL
    

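    You can then add other conditions as needed, and this removes the need for a loop.

    You can also use the compact function afterwards to drop the NULL list elements.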
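`purrr::compact()` drops the `NULL` elements that `check_list()` returns for data frames that fail the check. The sketch below shows a comparable step in base R with `Filter(Negate(is.null), ...)`, which removes `NULL` elements. It uses a hypothetical stand-in `check_df()` so it runs without the tidyverse.

```r
# Hypothetical stand-in for check_list(): returns the data frame when it
# passes (>= 4 brands, >= 2 rows per brand), otherwise NULL
check_df <- function(df, num_brands = 4, cars_per_brand = 2) {
  n <- table(df$brand)
  if (length(n) >= num_brands && all(n >= cars_per_brand)) df else NULL
}

toy_ls <- list(
  data.frame(brand = c("Mazda", "Merc")),
  data.frame(brand = rep(c("Mazda", "Hornet", "Merc", "Toyota"), each = 2))
)

checked <- lapply(toy_ls, check_df)          # list(NULL, <data frame>)
result  <- Filter(Negate(is.null), checked)  # base-R way to drop the NULLs
length(result)                               # [1] 1
```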

    【Comments】:

      【Solution 3】:

      How about this:

      ff <- function(dtt){
          dtt %>% filter(duplicated(brand)) %>%
              filter(!duplicated(brand)) %>% nrow() >= 4
      }
      
      mtcars_ls[sapply(mtcars_ls, ff)]
      
      # [[1]]
      #                 type  mpg cyl  disp  hp drat    wt  qsec vs am gear carb  brand
      # 1          Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  Mazda
      # 2      Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Mazda
      # 4     Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 Hornet
      # 5  Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 Hornet
      # 8          Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2   Merc
      # 9           Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2   Merc
      # 10          Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4   Merc
      # 12        Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3   Merc
      # 13        Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3   Merc
      # 14       Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3   Merc
      # 20    Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 Toyota
      # 21     Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 Toyota
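The `duplicated()` trick in `ff()` works as follows. `duplicated(brand)` is TRUE only for the second and later rows of each brand. The distinct brands among those rows are therefore exactly the brands with at least 2 observations. The base-R sketch below reproduces that count on a made-up `brand` vector.

```r
# Made-up brand column: Merc x3, Mazda x2, Hornet x2, Toyota x2, Fiat x1
brand <- c("Merc", "Merc", "Mazda", "Hornet", "Merc",
           "Mazda", "Toyota", "Hornet", "Toyota", "Fiat")

# duplicated() flags every occurrence after the first, so each brand that
# appears at least twice shows up here at least once
repeated <- unique(brand[duplicated(brand)])
repeated                # "Merc" "Mazda" "Hornet" "Toyota"

# The same test ff() performs: at least 4 brands with >= 2 observations
length(repeated) >= 4   # TRUE
```

This test asks whether at least 4 brands have 2 or more rows. A data frame like the one above, with a fifth brand seen only once, still passes. The stricter `all(table(brand) >= 2)` used in the other answers would reject it. The question's data contain only four brands, so both tests agree there.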
      

      【Comments】:
