【Question title】: R for loop to find rows where all the numbers match
【Posted】: 2018-07-15 14:36:15
【Question description】:

I have a data frame with 7 numbers in each row, and I want to write a for or while loop that tells me when one row is identical to another row.

Data frame:

   1st 2nd 3rd 4th 5th 6th 7th
1    5  32  34  38  39  49   8
2   10  20  21  33  40  44  34
3   10  20  26  28  35  48  13
4   14  19  23  36  44  46   7
5    9  24  25  27  36  38  41
6    7  13  14  20  29  32  28
7   11  22  24  28  29  38  20
8    1  11  29  33  36  44  37
9    9  12  25  31  43  44   5
10   1   5   6  31  39  46  44
11  14  19  23  36  44  46   7

Desired output:

 4  14  19  23  36  44  46   7
11  14  19  23  36  44  46   7

I tried this code, but it is wrong: lapply(df, function(i) all(df[i,] == df[1:nrow(df),]))

It does not give the correct result. Please advise, thanks.

【Question comments】:

  • You need lapply(seq_len(nrow(df)), function(i) lapply(seq_len(nrow(df)), function(j) all(df[i,] == df[j,]))), or alternatively use outer(seq_len(nrow(df)), seq_len(nrow(df)), FUN = Vectorize(function(i, j) all(df[i,] == df[j,])))
  • Try lapply(seq_len(nrow(df)), function(i) {i1 <- rowSums(df[i,][col(df)] == df)== ncol(df); if(sum(i1) >1) df[i1,]})

Tags: r for-loop dataframe lapply
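The attempt in the question passes df itself to lapply(), which walks over the columns of a data frame, so i is never a row index. The sketch below illustrates that, then runs the outer() suggestion from the first comment on the question's data. The names cols_visited, idx, same and dup_rows are made up for this illustration, and clearing the diagonal is an added step so that a row is not counted as matching itself.

```r
# Data from the question (read.table() prefixes the column names with "X")
df <- read.table(text =
"1st 2nd 3rd 4th 5th 6th 7th
1    5  32  34  38  39  49   8
2   10  20  21  33  40  44  34
3   10  20  26  28  35  48  13
4   14  19  23  36  44  46   7
5    9  24  25  27  36  38  41
6    7  13  14  20  29  32  28
7   11  22  24  28  29  38  20
8    1  11  29  33  36  44  37
9    9  12  25  31  43  44   5
10   1   5   6  31  39  46  44
11  14  19  23  36  44  46   7",
header = TRUE)

# lapply() on a data frame iterates over its columns, not its rows
cols_visited <- names(lapply(df, identity))

# Compare every pair of rows by index, as the outer() comment suggests
idx <- seq_len(nrow(df))
same <- outer(idx, idx,
              FUN = Vectorize(function(i, j) all(df[i, ] == df[j, ])))

# Every row trivially equals itself, so ignore the diagonal
diag(same) <- FALSE

# Keep the rows that match at least one other row
dup_rows <- df[rowSums(same) > 0, ]
print(dup_rows)
```

This compares all pairs of rows, so it is only practical for small data frames.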


【Solution 1】:

A base R option would be

unique(Filter(Negate(is.null), lapply(seq_len(nrow(df)), function(i) {
       i1 <- rowSums(df[i,][col(df)] == df)== ncol(df)
       if(sum(i1) >1) df[i1,]}) ))
#[[1]]
#    1st  2nd  3rd  4th  5th  6th  7th
#4    14   19   23   36   44   46    7
#11   14   19   23   36   44   46    7

If we are only interested in the duplicated rows

df[duplicated(df)|duplicated(df, fromLast = TRUE),]
#    1st  2nd  3rd  4th  5th  6th  7th
#4    14   19   23   36   44   46    7
#11   14   19   23   36   44   46    7
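Since the question asks specifically for a for loop, here is a minimal sketch of an explicit nested loop that should select the same rows as the duplicated() one-liner. It is not part of the original answer; the names m and matched are made up for this illustration, and converting to a matrix first is an assumption that suits data where every column is numeric.

```r
# Data from the question
df <- read.table(text =
"1st 2nd 3rd 4th 5th 6th 7th
1    5  32  34  38  39  49   8
2   10  20  21  33  40  44  34
3   10  20  26  28  35  48  13
4   14  19  23  36  44  46   7
5    9  24  25  27  36  38  41
6    7  13  14  20  29  32  28
7   11  22  24  28  29  38  20
8    1  11  29  33  36  44  37
9    9  12  25  31  43  44   5
10   1   5   6  31  39  46  44
11  14  19  23  36  44  46   7",
header = TRUE)

# A numeric matrix makes row-by-row comparison simple
m <- as.matrix(df)

# matched[k] becomes TRUE once row k is found equal to some other row
matched <- logical(nrow(m))

# Compare each row only with the rows below it
for (i in seq_len(nrow(m) - 1)) {
  for (j in (i + 1):nrow(m)) {
    if (all(m[i, ] == m[j, ])) {
      matched[c(i, j)] <- TRUE
    }
  }
}

print(df[matched, ])
```

The loop is quadratic in the number of rows, so the vectorised duplicated() form is the better choice for anything large.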

【Comments】:

    【Solution 2】:

    An option using dplyr::group_by_all() is very convenient:

    library(dplyr)
    
    df %>% group_by_all() %>%
      filter(n()>1)  # n()>1 will make sure to return only rows having duplicates
    
    # # A tibble: 2 x 7
    # # Groups: X1st, X2nd, X3rd, X4th, X5th, X6th, X7th [1]
    #    X1st  X2nd  X3rd  X4th  X5th  X6th  X7th
    #   <int> <int> <int> <int> <int> <int> <int>
    # 1    14    19    23    36    44    46     7
    # 2    14    19    23    36    44    46     7
    

    Data:

    df <- read.table(text = 
    "1st 2nd 3rd 4th 5th 6th 7th
    1    5  32  34  38  39  49   8
    2   10  20  21  33  40  44  34
    3   10  20  26  28  35  48  13
    4   14  19  23  36  44  46   7
    5    9  24  25  27  36  38  41
    6    7  13  14  20  29  32  28
    7   11  22  24  28  29  38  20
    8    1  11  29  33  36  44  37
    9    9  12  25  31  43  44   5
    10   1   5   6  31  39  46  44
    11   14  19  23  36  44  46   7",
    header = TRUE)
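In dplyr 1.0.0 and later, group_by_all() is marked as superseded in favour of across(). The following sketch shows what the same pipeline might look like in that style, assuming a recent dplyr is installed. The name res and the final ungroup() are additions for illustration and are not part of the original answer.

```r
library(dplyr)

# Data from the question
df <- read.table(text =
"1st 2nd 3rd 4th 5th 6th 7th
1    5  32  34  38  39  49   8
2   10  20  21  33  40  44  34
3   10  20  26  28  35  48  13
4   14  19  23  36  44  46   7
5    9  24  25  27  36  38  41
6    7  13  14  20  29  32  28
7   11  22  24  28  29  38  20
8    1  11  29  33  36  44  37
9    9  12  25  31  43  44   5
10   1   5   6  31  39  46  44
11  14  19  23  36  44  46   7",
header = TRUE)

res <- df %>%
  group_by(across(everything())) %>%  # group on all columns at once
  filter(n() > 1) %>%                 # keep groups with more than one row
  ungroup()                           # drop the grouping from the result

print(res)
```

As with the original pipeline, the result is a tibble, so the original row names (4 and 11) are not preserved.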
    

    【Comments】:
