【Question title】: Conditional Count Aggregation in R data frames
【Posted】: 2017-05-01 01:17:15
【Question】:

I have a data frame that looks like this:

SubjectID CoupleID PrePost hit1RT  hit2RT hit3RT ... hit26RT  miss1RT  miss2RT miss3RT ... miss26RT
1531      153    Post       5        5      NA   ...   3        NA      NA     2      ...     NA
1531      153     Pre       NA        5      2   ...   3        4      NA     NA     ...     NA  
1532      153    Post       2        NA      NA   ...   2        NA      5     2      ...     NA    

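There are 26 hit[i]RT columns and 26 miss[i]RT columns. Each SubjectID has two rows, one with PrePost == 'Pre' and one with PrePost == 'Post'.

I want to create a new data frame with the same two rows per SubjectID/PrePost, plus one column holding the number of hit[i]RT cells in that row that are not NA and one column holding the number of miss[i]RT cells in that row that are not NA.

Since there are 26 trials and each trial is either a hit or a miss, the hit count column + the miss count column should == 26.

For example: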

Subject ID    PrePost   hitcount    misscount
1531           Pre        3          23
1531           Post       5          21
1532           Pre        10         16
1532           Post       21         5

Edit: per the comments, here is the dput output:

structure(list(SubjectID = c("1531", "1531", "1532", "1532", "5291", "5291"), CoupleID = c("153", "153", "153", "153", "529", "529"), PrePost = c("Post", "Pre", "Post", "Pre", "Post", "Pre" ), hit10RT = c(11.0550000000076, 11.0209999999934, 11.0889999999927, 11.0270000000019, 11.0499999999956, 11.0610000000015), hit11RT = c(15.5299999999988, 15.6460000000079, 15.5979999999981, 15.5310000000027, 15.8790000000008, 15.5410000000047), hit12RT = c(15.5329999999958, 15.5209999999934, 15.5350000000035, 15.5160000000033, 15.5840000000026, 15.5469999999987 ), hit13RT = c(8.03299999999581, 8.03600000000733, 8.03299999999581, 8.0509999999922, 8.05399999999645, 8.03899999999703), hit14RT = c(15.601999999999, 15.5269999999873, 15.625, 15.6340000000055, 15.5889999999999, 15.5449999999983), hit15RT = c(15.5280000000057, 15.5350000000035, 15.5280000000057, 16.0089999999909, 15.5450000000055, 15.6209999999992 ), hit16RT = c(11.0849999999919, 11.0200000000041, 11.0329999999958, 11.0370000000112, 11.0459999999948, 11.0440000000017), hit17RT = c(14.0370000000112, 14.0610000000015, 14.0890000000072, 14.1059999999998, 14.1180000000022, 14.0440000000017), hit18RT = c(6.51999999998952, 6.53800000000047, NA, 6.58799999998882, 6.5679999999993, 6.57600000000093), hit19RT = c(9.52200000001176, 9.54299999999057, 9.64699999999721, 9.50700000001234, 9.64899999999761, 9.62799999999697), hit1RT = c(NA, NA, NA, NA, NA, 0), hit20RT = c(15.5369999999966, 15.5210000000079, 15.525999999998, 15.5639999999985, 15.6130000000048, 15.6170000000056), hit21RT = c(14.0570000000007, 14.0439999999944, 14.0380000000005, 14.0219999999972, 14.0219999999972, 14.0479999999952 ), hit22RT = c(15.5829999999987, 15.5290000000095, 15.5219999999972, 15.5840000000026, 15.5970000000016, 15.5480000000025), hit23RT = c(12.6189999999915, 12.5779999999941, 12.5200000000041, 12.5369999999966, 12.5329999999958, 12.5319999999992), hit24RT = c(6.52100000000792, 6.52700000000186, 6.53800000000047, 6.55899999999383, 6.54100000000471, 
6.53800000000047 ), hit25RT = c(14.0580000000045, 14.0979999999981, 14.0359999999928, 14.1100000000006, 14.0999999999985, 14.1460000000006), hit26RT = c(15.525999999998, 15.5540000000037, 15.570000000007, 15.5890000000072, 15.5610000000015, 15.6259999999966), hit2RT = c(36.781999999992, 96.6390000000101, 35.6609999999928, 108.394, 54.0280000000057, NA), hit3RT = c(14.0270000000019, 14.0539999999892, 14.0369999999966, 14.0130000000063, 14.0360000000001, 14.0639999999985), hit4RT = c(15.5850000000064, 15.5080000000016, 15.5610000000015, 15.6109999999899, 15.5859999999957, 15.5490000000063 ), hit5RT = c(6.50699999999779, 6.53699999999662, 6.57000000000698, 6.52200000001176, 6.55800000000454, 6.64699999999721), hit6RT = c(15.5650000000023, 15.6280000000115, 15.5849999999919, 15.531999999992, 15.5349999999962, 15.6630000000005), hit7RT = c(12.5760000000009, 12.5190000000002, 12.5350000000035, 12.5200000000041, 12.5390000000043, NA), hit8RT = c(6.62699999999313, 6.5049999999901, 6.50599999999395, 6.50599999999395, 6.55099999999948, 6.65199999999459), hit9RT = c(8.00400000000081, 8.03600000000733, 8.03300000001036, 8.03800000000047, 8.12299999999959, NA), miss10RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss11RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss12RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss13RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss14RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss15RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss16RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss17RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss18RT = c(NA, NA, 6.60599999999977, NA, NA, NA), miss19RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss1RT = c(0, 0, 0, 0, 0, NA), miss20RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss21RT 
= c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss22RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss23RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss24RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss25RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss26RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss2RT = c(NA, NA, NA, NA, NA, 104.578000000001), miss3RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss4RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss5RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss6RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss7RT = c(NA, NA, NA, NA, NA, 12.6160000000018), miss8RT = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), miss9RT = c(NA, NA, NA, NA, NA, 8.03399999999965)), .Names = c("SubjectID", "CoupleID", "PrePost", "hit10RT", "hit11RT", "hit12RT", "hit13RT", "hit14RT", "hit15RT", "hit16RT", "hit17RT", "hit18RT", "hit19RT", "hit1RT", "hit20RT", "hit21RT", "hit22RT", "hit23RT", "hit24RT", "hit25RT", "hit26RT", "hit2RT", "hit3RT", "hit4RT", "hit5RT", "hit6RT", "hit7RT", "hit8RT", "hit9RT", "miss10RT", "miss11RT", "miss12RT", "miss13RT", "miss14RT", "miss15RT", "miss16RT", "miss17RT", "miss18RT", "miss19RT", "miss1RT", "miss20RT", "miss21RT", "miss22RT", "miss23RT", "miss24RT", "miss25RT", "miss26RT", "miss2RT", "miss3RT", "miss4RT", "miss5RT", "miss6RT", "miss7RT", "miss8RT", "miss9RT"), sorted = c("SubjectID", "CoupleID", "PrePost"), class = c("data.table", "data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x101820b78>)

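【Comments】:

  • It will be easier to help if you provide a complete example dataset. Try pasting the output of dput(head(mydata)), where mydata is your data frame.
  • It was too long for a comment, so I edited the post to include it.
  • @HeatherCohen Please try copying and pasting it again; some brackets or something similar seem to be missing.
  • The dput output contains non-integer values such as 12.576, unlike the original example?

Tags: r dataframe aggregate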


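【Solution 1】:

You can try the following:

# Columns 4:29 hold the hit columns and 30:ncol(df) the miss columns;
# for each row, count the cells that are not NA
df.new <- data.frame(SubjectID = df$SubjectID, PrePost = df$PrePost, 
                     hitcount = apply(df[, 4:29], 1, function(x) sum(!is.na(x))), 
                     misscount = apply(df[, 30:ncol(df)], 1, function(x) sum(!is.na(x))))

You can also make it more general, in case you add more "hit" or "miss" columns, by selecting the columns by name prefix instead of by position:

# Same counts, but the columns are picked by the "hit" / "miss" name prefix
df.new1 <- data.frame(SubjectID = df$SubjectID, PrePost = df$PrePost, 
                      hitcount = apply(df[, names(df)[startsWith(names(df), "hit")]], 1, function(x) sum(!is.na(x))), 
                      misscount = apply(df[, names(df)[startsWith(names(df), "miss")]], 1, function(x) sum(!is.na(x))))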

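【Comments】:

  • Trying the first option returns dim(X) must have a positive length. The second option is very close, but inexplicably the hit/miss counts for one SubjectID are missing the count from hit26RT.
  • OK, I had a typo in the first option (39 instead of 29, and 40 instead of 30), but I have fixed it, and the two options should now give the same result. I'm not sure about the missing hit26RT count. The output I get is:

      SubjectID PrePost hitcount misscount
    1      1531    Post       25         1
    2      1531     Pre       25         1
    3      1532    Post       24         2
    4      1532     Pre       25         1
    5      5291    Post       25         1
    6      5291     Pre       23         3

  • Take a look at rowSums. In general, avoid using apply on data frames.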
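
The rowSums suggestion in the last comment could look like the sketch below. Calling is.na() on a data frame returns a logical matrix, and rowSums() then counts the TRUE values in each row, so no apply() call is needed. The `toy` data frame and its values are hypothetical, and as.data.frame() is only a precaution in case the input is a data.table.

```r
# Hypothetical toy data: 2 trials, one row per SubjectID/PrePost combination
toy <- data.frame(
  SubjectID = c("1531", "1531"),
  PrePost   = c("Post", "Pre"),
  hit1RT    = c(5, NA),
  hit2RT    = c(NA, NA),
  miss1RT   = c(NA, 4),
  miss2RT   = c(3, 1),
  stringsAsFactors = FALSE
)
toy <- as.data.frame(toy)

# Logical masks marking which columns belong to each group
hit_cols  <- startsWith(names(toy), "hit")
miss_cols <- startsWith(names(toy), "miss")

# is.na() on a data frame yields a logical matrix; rowSums() counts TRUE per row.
# drop = FALSE keeps a data frame even if only one column matches.
counts <- data.frame(
  SubjectID = toy$SubjectID,
  PrePost   = toy$PrePost,
  hitcount  = rowSums(!is.na(toy[, hit_cols,  drop = FALSE])),
  misscount = rowSums(!is.na(toy[, miss_cols, drop = FALSE]))
)
print(counts)
```

Besides being shorter, this avoids the per-row function call that apply() makes, which tends to matter more as the number of rows grows.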