【Question title】: count of frequency of variable in second row
【Posted】: 2020-10-28 18:40:08
【Question description】:

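I have a data frame like the one below, and I'm looking for a simple way to count the non-empty values in each column whose name starts with a digit, and to add those counts to the data frame as a second row.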

df <- data.frame(AA=c(72,62,43,66,54,64,47,47,27,68),
                 BB=c("AMK","KAMl","HAJ","NHS","KUL","GAF","BGA","NHU","VGY","NHU"),
                 CC=c("TAMAN","GHUSI","KELVIN","DEREK","LOKU","MNDHUL","JASMIN","BINNY","BURTAM","DAVID"),
                 DD=c(62,41,37,41,32,74,52,75,59,36),
                 EE=c("CA","NY","GA","DE","MN","LA","GA","VA","TM","BA"),
                 FF=c("ENGLISH","FRENCH","ENGLISH","FRENCH","ENGLISH","ENGLISH","SPANISH","ENGLISH","SPANISH","RUSSIAN"),
                 GG=c(33,44,51,51,37,58,24,67,41,75),
                 `1A`=c("","D","","NA","","D","","","D",""),
                 `2B`=c("","A","","","A","A","A","A","",""),
                 `3C`=c("","","","","","","","","",""),
                 `4D`=c("","G","G","G","G","G","G","G","",""),
                 "Concatenate" = c("","DAG","G","NAG","AG","DAG","AG","AG","D",""))

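The expected output is a count of the values in each column whose name starts with a digit, plus the count for the last column (Concatenate), added to the data frame as a second row.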

【Question discussion】:

    Tags: r dplyr


    【Solution 1】:

    You can create your summary row with:
    library(dplyr)

    # count cells that are not NA, not empty and not the literal string "NA"
    summary_row = 
      df %>% 
      summarize(across(c(matches("^[0-9]"), Concatenate), ~sum(!is.na(.) & . != "" & . != "NA")))
    
    summary_row
    #   1A 2B 3C 4D Concatenate
    # 1  3  5  0  7           8
    
    result = bind_rows(mutate(summary_row, across(everything(), as.character)), df)
    # reorder columns
    result[names(df)]
    #    AA   BB     CC DD   EE      FF GG 1A 2B 3C 4D Concatenate
    # 1  NA <NA>   <NA> NA <NA>    <NA> NA  3  5  0  7           8
    # 2  72  AMK  TAMAN 62   CA ENGLISH 33                        
    # 3  62 KAMl  GHUSI 41   NY  FRENCH 44  D  A     G         DAG
    # 4  43  HAJ KELVIN 37   GA ENGLISH 51           G           G
    # 5  66  NHS  DEREK 41   DE  FRENCH 51 NA        G         NAG
    # 6  54  KUL   LOKU 32   MN ENGLISH 37     A     G          AG
    # 7  64  GAF MNDHUL 74   LA ENGLISH 58  D  A     G         DAG
    # 8  47  BGA JASMIN 52   GA SPANISH 24     A     G          AG
    # 9  47  NHU  BINNY 75   VA ENGLISH 67     A     G          AG
    # 10 27  VGY BURTAM 59   TM SPANISH 41  D                    D
    # 11 68  NHU  DAVID 36   BA RUSSIAN 75                        
    
    

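    You can bind it to the top of the data frame with bind_rows, but I would only do that for display purposes. A data frame column can hold only one type, so combining the numbers in the summary row with the character columns you already have converts them to character.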
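To see that coercion concretely, here is a minimal sketch, assuming dplyr is installed; `small` is a hypothetical two-column cut of the question's data. The counts come out of summarize() as integers, and only after as.character() can they be stacked on top of the character columns, so in the result the count 2 is stored as the string "2".

```r
library(dplyr)

# Hypothetical two-column subset of the question's data
small <- data.frame(`1A` = c("", "D", "NA", "D"),
                    Concatenate = c("", "DAG", "NAG", "D"),
                    check.names = FALSE)

# Count entries that are neither NA, empty, nor the literal string "NA"
counts <- small %>%
  summarize(across(everything(), ~ sum(!is.na(.) & . != "" & . != "NA")))

# bind_rows() needs matching column types, so turn the integer counts into character
result <- bind_rows(mutate(counts, across(everything(), as.character)), small)

class(counts$`1A`)  # integer count
class(result$`1A`)  # character after binding
result$`1A`[1]      # the count, now the string "2"
```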


    I used this data (adding check.names = FALSE to your data.frame() code so that the column names appear as in your example):

    df <- data.frame(AA=c(72,62,43,66,54,64,47,47,27,68),
                     BB=c("AMK","KAMl","HAJ","NHS","KUL","GAF","BGA","NHU","VGY","NHU"),
                     CC=c("TAMAN","GHUSI","KELVIN","DEREK","LOKU","MNDHUL","JASMIN","BINNY","BURTAM","DAVID"),
                     DD=c(62,41,37,41,32,74,52,75,59,36),
                     EE=c("CA","NY","GA","DE","MN","LA","GA","VA","TM","BA"),
                     FF=c("ENGLISH","FRENCH","ENGLISH","FRENCH","ENGLISH","ENGLISH","SPANISH","ENGLISH","SPANISH","RUSSIAN"),
                     GG=c(33,44,51,51,37,58,24,67,41,75),
                     `1A`=c("","D","","NA","","D","","","D",""),
                     `2B`=c("","A","","","A","A","A","A","",""),
                     `3C`=c("","","","","","","","","",""),
                     `4D`=c("","G","G","G","G","G","G","G","",""),
                     "Concatenate" = c("","DAG","G","NAG","AG","DAG","AG","AG","D",""), check.names = FALSE)
    

    【Discussion】:

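    • But this is not the column sum covering the required variables
    • And the concat column should be at the end
    • In the output I got this: Concatenate AA BB CC DD EE FF GG X1A X2B X3C X4D 8 NA NA NA NA NA NA NA NA NA NA NA
    • The output should look like this: AA BB CC DD EE FF GG X1A X2B X3C X4D Concatenate 3 5 7 8
    • Well, I took your picture and "column names which start with any number" quite literally. It looks like your names are actually X1A, X2B, .... If you want to match column names that start with X and a digit, change matches("^[0-9]") to matches("^X[0-9]")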
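Which pattern you need depends on how the data frame was built: data.frame() with the default check.names = TRUE turns `1A` into X1A, while check.names = FALSE keeps `1A`. A minimal sketch, assuming dplyr, of a pattern "^X?[0-9]" that accepts both spellings. The optional X is an assumption on my part: it would also pick up any other column that happens to start with X (or x, since matches() ignores case by default) followed by a digit. `count_filled`, `checked` and `unchecked` are hypothetical names.

```r
library(dplyr)

# Count cells that are not NA, not empty and not the literal string "NA"
count_filled <- function(x) sum(!is.na(x) & x != "" & x != "NA")

# The same two columns, built with and without check.names
checked   <- data.frame(`1A` = c("", "D", "D"), Concatenate = c("", "DA", "D"))
unchecked <- data.frame(`1A` = c("", "D", "D"), Concatenate = c("", "DA", "D"),
                        check.names = FALSE)

names(checked)    # "X1A" "Concatenate"
names(unchecked)  # "1A"  "Concatenate"

# "^X?[0-9]" matches "X1A" and "1A" alike
res_checked   <- summarize(checked,   across(c(matches("^X?[0-9]"), Concatenate), count_filled))
res_unchecked <- summarize(unchecked, across(c(matches("^X?[0-9]"), Concatenate), count_filled))
```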
    【Solution 2】:

    We can use colSums from base R:

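    # names of the columns that start with a digit
    nm1 <- grep('^[0-9]', names(df), value = TRUE)
    # count cells that are not NA, not empty and not the literal string "NA"
    colSums(!is.na(df[nm1]) & df[nm1] != "" & df[nm1] != "NA")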
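If you also want the Concatenate count and want the counts placed as the first line under the header, a minimal base-R sketch along the same lines could look like this. It assumes the names come from check.names = FALSE and that the literal string "NA" counts as empty; `cols`, `counts`, `summary_row` and `out` are hypothetical names, and the small `df` below is a stand-in for the question's data. Every column of `out` ends up character, because a column can hold only one type.

```r
# Small stand-in for the question's data (hypothetical subset)
df <- data.frame(AA = c(72, 62, 43),
                 `1A` = c("", "D", "NA"),
                 `2B` = c("", "A", "A"),
                 Concatenate = c("", "DA", "A"),
                 check.names = FALSE)

# Digit-named columns plus Concatenate
cols <- c(grep("^[0-9]", names(df), value = TRUE), "Concatenate")
counts <- colSums(!is.na(df[cols]) & df[cols] != "" & df[cols] != "NA")

# Convert every column to character so the counts can share a column with the data
out <- df
out[] <- lapply(out, as.character)

# One-row frame: counts under their columns, empty strings elsewhere
summary_row <- out[1, ]
summary_row[] <- ""
summary_row[cols] <- as.list(as.character(counts))

# Stack the summary row on top and reset the row names
out <- rbind(summary_row, out)
rownames(out) <- NULL
out
```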
    

    【Discussion】:
