How to select (four) specific rows (multiple times) based on a column value in R? (Answers)

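【Question title】: How to select (four) specific rows (multiple times) based on a column value in R?
【Posted】: 2020-05-15 12:14:14
【Question】:

I want to keep only the IDs that appear in every year of my data frame, from 2013 through 2016 (so four times). In that case, only IDs with four rows remain (this is panel data, with one row per ID per year). I have already made sure that my data frame covers only the years I need (2013, 2014, 2015, and 2016), but I want to exclude IDs that have fewer than 4 years/rows in the data frame.

This is the structure of my data frame: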

 tibble [909,587 x 26] (S3: tbl_df/tbl/data.frame)
     $ ID                         : num [1:909587] 12 12 12 12 16 16 16 16...
     $ Gender                     : num [1:909587] 2 2 2 2 1 1 1 1 1 1 ...
      ..- attr(*, "format.spss")= chr "F10.0"
     $ Year                       : chr [1:909587] "2016" "2013" "2014" "2015" ...
      ..- attr(*, "format.spss")= chr "F9.3"
     $ Size                       : num [1:909587] 1983 1999 1951 1976 902 ...
     $ Costs                      : num [1:909587] 2957.47 0 0.34 1041.67 0 ...
     $ Urbanisation               : num [1:909587] 2 3 3 2 3 3 2 2 2 3 ...
     $ Age                        : num [1:909587] 92 89 90 91 82 83 22 23 24 65 ...

How can I do this?

Thanks!

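【Comments】:

Please read how to provide a good example. Providing only the structure of the data is not very helpful. Consider using dput. That said, this might work: df %>% group_by(ID) %>% filter(n_distinct(Year) >= 4)
Thanks Jason, I had hoped the structure would be enough. Good news: your code works! Now I only have IDs with 4 years. Thanks!
Hi Jason, thanks for doing this! I just found out that the code "misses" a few years/rows/IDs, judging by length(unique(df$ID)) * 4 rows. Any idea how to fix that?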

Tags: r dataframe dplyr row tidyverse

【Solution 1】:

Just capturing @Jasonaizkains's answer from the comments above, demonstrated on some play data, since pivoting is not strictly necessary in this case.

library(dplyr)
id <- rep(10:13, 4) # four subjects
year <- rep(2013:2016, each = 4) # four years
gender <- sample(1:2, 16, replace = TRUE)
play <- tibble(id, gender, year) # tibble with 16 rows

play <- play[-9,] # removes row for id 10 in 2015

# Keeps only ids present in all four years (drops every row for id 10)
play %>% group_by(id) %>% filter(n_distinct(year) >= 4) %>% ungroup()
#> # A tibble: 12 x 3
#>       id gender  year
#>    <int>  <int> <int>
#>  1    11      1  2013
#>  2    12      2  2013
#>  3    13      2  2013
#>  4    11      1  2014
#>  5    12      2  2014
#>  6    13      1  2014
#>  7    11      2  2015
#>  8    12      2  2015
#>  9    13      2  2015
#> 10    11      2  2016
#> 11    12      2  2016
#> 12    13      1  2016
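
As a quick sanity check on the filter above, the sketch below rebuilds the play data with a fixed seed and confirms that every remaining id has exactly four rows. The seed and the names `kept` and `rows_per_id` are assumptions added here for illustration; they are not part of the original answer.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the gender values are reproducible
play <- tibble(
  id     = rep(10:13, 4),
  gender = sample(1:2, 16, replace = TRUE),
  year   = rep(2013:2016, each = 4)
)
play <- play[-9, ]  # drop the row for id 10 in 2015

# same filter as in the answer, stored under a hypothetical name
kept <- play %>%
  group_by(id) %>%
  filter(n_distinct(year) >= 4) %>%
  ungroup()

# rows per id after filtering; each remaining id should show n == 4
rows_per_id <- count(kept, id)
print(rows_per_id)

# TRUE when the row total equals (number of ids) * 4, i.e. no duplicates
nrow(kept) == n_distinct(kept$id) * 4
```

If the final comparison returns FALSE on real data, some id has more than four rows, which the `n_distinct()` filter alone does not rule out.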

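【Comments】:

Hi Chuck, thanks for doing this! I just found out that the code "misses" a few years/rows/IDs, judging by length(unique(df$ID)) * 4 rows. Any idea how to fix that?
Is it possible for one ID to have more than 4? Another simple possibility: did you remember to assign the result back to play or play2?
You can use this to sort the output and see what pops up at the top or bottom: play %>% group_by(id) %>% summarise(id_count = n_distinct(year)) %>% arrange(id_count) (low numbers first) or play %>% group_by(id) %>% summarise(id_count = n_distinct(year)) %>% arrange(desc(id_count)) (high numbers first)
Hi Chuck, probably not. I checked with the table function and found that, after running your first code, the number of unique IDs differs per year: 2013 has 156,979 IDs, 2014 has 156,976, 2015 has 156,982, and 2016 has 156,985. I also checked with your code for 1, 2, 3, and 5 or more years, and my table was empty for 5 or more years.
I meant the data frame, not the second "table". Sorry!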
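
One possible explanation for per-year counts that differ, offered here as an assumption rather than something confirmed in the thread: `n_distinct(year) >= 4` only checks that an ID has four distinct years, so an ID with a duplicated ID-year row passes the filter with more than four rows. A minimal sketch on made-up data (the names `toy`, `dupes`, and `strict` are hypothetical):

```r
library(dplyr)

# toy panel: id 1 is complete, id 2 has 2014 twice, id 3 lacks 2016
toy <- tibble(
  id   = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  year = c(2013:2016, 2013, 2014, 2014, 2015, 2016, 2013:2015)
)

# id-year combinations that occur more than once
dupes <- toy %>%
  count(id, year) %>%
  filter(n > 1)
print(dupes)

# stricter filter: exactly four rows AND four distinct years per id
strict <- toy %>%
  group_by(id) %>%
  filter(n() == 4, n_distinct(year) == 4) %>%
  ungroup()
print(strict)
```

Whether to drop such IDs, as the stricter filter does, or to deduplicate them first (for example with `distinct(id, year, .keep_all = TRUE)`) depends on what the duplicate rows mean in the data.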

【Solution 2】:

Pivot your df to wide format

df %>% pivot_wider(names_from = Year,values_from = Age)

Filter out the rows that have NA in the 2013, 2014, 2015, or 2016 columns

Pivot back to long format

df %>% pivot_longer(`2013`:`2016`, names_to = "Year", values_to = "Age")
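
The three steps above can be sketched end to end on a small made-up data frame. This assumes only ID, Year, and Age columns; on the asker's real data, other columns that change from year to year (such as Size or Costs) would keep the rows from collapsing into one per ID, so treat this as an illustration, not a drop-in solution. The names `toy`, `wide`, `complete`, and `long` are hypothetical.

```r
library(dplyr)
library(tidyr)

# toy panel: ID 1 has all four years, ID 2 is missing 2015
toy <- tibble(
  ID   = c(1, 1, 1, 1, 2, 2, 2),
  Year = c("2013", "2014", "2015", "2016", "2013", "2014", "2016"),
  Age  = c(40, 41, 42, 43, 60, 61, 63)
)

# step 1: one row per ID, one column per year (missing years become NA)
wide <- toy %>% pivot_wider(names_from = Year, values_from = Age)

# step 2: keep only IDs with a value in every year column
# (year column names are not syntactic, so they need backticks)
complete <- wide %>% drop_na(`2013`:`2016`)

# step 3: back to one row per ID and year
long <- complete %>%
  pivot_longer(`2013`:`2016`, names_to = "Year", values_to = "Age")
print(long)
```

Note that this route also drops an ID whose Age is genuinely NA in some year, which the `n_distinct()` filter from Solution 1 would not do.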

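【Comments】:

Thanks Bruno. Jason's code above works, and since I have nearly 1M rows (and 40 columns), I would rather not risk trying your code... I am very new to R, so errors happen a lot, sorry! Thanks for your time and help :)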