【Question title】: Select rows by a specific value
【Posted】: 2013-11-07 14:31:03
【Question】:

How can I select the rows of a data frame whose value in a specific column is greater than 1?

This is what my data looks like:

> dput(head(tbl_comp[,-1]))
structure(list(Meve_mean = c(7774.44229552491, 43374.1166119026, 
585562.72426545, 3866.54724117546, 320338.197537275, 918368.01990607
), Mmor_mean = c(39113.5325249635, 119476.157216344, 1296530.34384725, 
23511.2980313616, 209092.538981888, 577355.581852083), Mtot_mean = c(23443.9874102442, 
81425.1369141232, 941046.53405635, 13688.9226362685, 264715.368259581, 
747861.800879077), tot_meanMe = c(258492586.999527, NA, NA, NA, 
NA, NA), tot_meanMm = c(246665241.110832, NA, NA, NA, NA, NA), 
    tot_sdMe = c(35569170.0311164, NA, NA, NA, NA, NA), tot_sdMm = c(30522099.9189256, 
    NA, NA, NA, NA, NA), Wteve_mean = c(10752.4381084666, 53658.8435672746, 
    715547.921685567, 3422.17220367207, 335384.199178456, 1013708.18845339
    ), Wtmor_mean = c(29254.6414790837, 98804.8007431987, 1001344.20496027, 
    11541.8862121394, 217110.411645861, 571826.157099177), Wttot_mean = c(18681.9538387311, 
    73007.110928385, 838032.04308901, 6902.04963587237, 284695.433093058, 
    824330.175015869), tot_meanwte = c(278901499.672313, NA, 
    NA, NA, NA, NA), tot_meanwtm = c(235415566.775308, NA, NA, 
    NA, NA, NA), tot_sdwte = c(16743477.4011497, NA, NA, NA, 
    NA, NA), tot_sdwtm = c(3922418.43271348, NA, NA, NA, NA, 
    NA), diff_eve = c(0.72303994843767, 0.808331185101342, 0.818341730189196, 
    1.12985174650959, 0.955138012828161, 0.905949098928778), 
    diff_mor = c(1.33700262752933, 1.20921408998001, 1.29478988086689, 
    2.03704122525771, 0.963070068343606, 1.00966976533735), diff_tot = c(1.25490018938172, 
    1.11530419268331, 1.12292428650774, 1.98331269093204, 0.929819510568173, 
    0.907235745512628)), .Names = c("Meve_mean", "Mmor_mean", 
"Mtot_mean", "tot_meanMe", "tot_meanMm", "tot_sdMe", "tot_sdMm", 
"Wteve_mean", "Wtmor_mean", "Wttot_mean", "tot_meanwte", "tot_meanwtm", 
"tot_sdwte", "tot_sdwtm", "diff_eve", "diff_mor", "diff_tot"), row.names = c(NA, 
6L), class = "data.frame")

In this data, these are the three columns I am most interested in:

diff_eve  diff_mor  diff_tot
0.7230399 1.3370026 1.2549002
0.8083312 1.2092141 1.1153042
0.8183417 1.2947899 1.1229243
1.1298517 2.0370412 1.9833127
0.9551380 0.9630701 0.9298195
0.9059491 1.0096698 0.9072357

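I want to create new data frames selected by the values in each of these columns. There should be 6 new data frames. Rows with a value below 1 in the "diff_eve" column should go into one new data frame, rows with a value above 1 into the next, and likewise for the other two columns. Of course, I want to keep all the columns of the data (tbl_comp).

Here is an example of one of the new data frames, selected by values below 1 in the diff_eve column: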

>newdata.frame
diff_eve  diff_mor  diff_tot  .. ....... .....  rest of the columns from tbl_comp.
0.7230399 1.3370026 1.2549002
0.8083312 1.2092141 1.1153042 
0.8183417 1.2947899 1.1229243
0.9551380 0.9630701 0.9298195
0.9059491 1.0096698 0.9072357

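I hope some of you understand what I am trying to achieve.

【Comments】:

  • Are your six data sets the rows filtered to less than one and greater than one for each of the columns diff_eve, diff_mor and diff_tot?
  • Sorry for the late reply, I had to go to the lab.

Tags: r dataframe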


【Solution 1】:

Here is a solution that gives you six data frames in the following order:

diff_eve <1
diff_mor <1
diff_tot <1
diff_eve >1
diff_mor >1
diff_tot >1

Code:

 cols <- c("diff_eve", "diff_mor", "diff_tot")
 c(lapply(cols, function(x) subset(tbl_comp, eval(parse(text = x)) < 1)),
   lapply(cols, function(x) subset(tbl_comp, eval(parse(text = x)) > 1)))

This gives you:

[[1]]
   Meve_mean  Mmor_mean Mtot_mean tot_meanMe tot_meanMm tot_sdMe tot_sdMm Wteve_mean Wtmor_mean Wttot_mean tot_meanwte tot_meanwtm tot_sdwte tot_sdwtm  diff_eve  diff_mor  diff_tot
1   7774.442   39113.53  23443.99  258492587  246665241 35569170 30522100   10752.44   29254.64   18681.95   278901500   235415567  16743477   3922418 0.7230399 1.3370026 1.2549002
2  43374.117  119476.16  81425.14         NA         NA       NA       NA   53658.84   98804.80   73007.11          NA          NA        NA        NA 0.8083312 1.2092141 1.1153042
3 585562.724 1296530.34 941046.53         NA         NA       NA       NA  715547.92 1001344.20  838032.04          NA          NA        NA        NA 0.8183417 1.2947899 1.1229243
5 320338.198  209092.54 264715.37         NA         NA       NA       NA  335384.20  217110.41  284695.43          NA          NA        NA        NA 0.9551380 0.9630701 0.9298195
6 918368.020  577355.58 747861.80         NA         NA       NA       NA 1013708.19  571826.16  824330.18          NA          NA        NA        NA 0.9059491 1.0096698 0.9072357

[[2]]
  Meve_mean Mmor_mean Mtot_mean tot_meanMe tot_meanMm tot_sdMe tot_sdMm Wteve_mean Wtmor_mean Wttot_mean tot_meanwte tot_meanwtm tot_sdwte tot_sdwtm diff_eve  diff_mor  diff_tot
5  320338.2  209092.5  264715.4         NA         NA       NA       NA   335384.2   217110.4   284695.4          NA          NA        NA        NA 0.955138 0.9630701 0.9298195

[[3]]
  Meve_mean Mmor_mean Mtot_mean tot_meanMe tot_meanMm tot_sdMe tot_sdMm Wteve_mean Wtmor_mean Wttot_mean tot_meanwte tot_meanwtm tot_sdwte tot_sdwtm  diff_eve  diff_mor  diff_tot
5  320338.2  209092.5  264715.4         NA         NA       NA       NA   335384.2   217110.4   284695.4          NA          NA        NA        NA 0.9551380 0.9630701 0.9298195
6  918368.0  577355.6  747861.8         NA         NA       NA       NA  1013708.2   571826.2   824330.2          NA          NA        NA        NA 0.9059491 1.0096698 0.9072357

[[4]]

  Meve_mean Mmor_mean Mtot_mean tot_meanMe tot_meanMm tot_sdMe tot_sdMm Wteve_mean Wtmor_mean Wttot_mean tot_meanwte tot_meanwtm tot_sdwte tot_sdwtm diff_eve diff_mor diff_tot
4  3866.547   23511.3  13688.92         NA         NA       NA       NA   3422.172   11541.89    6902.05          NA          NA        NA        NA 1.129852 2.037041 1.983313

[[5]]
   Meve_mean  Mmor_mean Mtot_mean tot_meanMe tot_meanMm tot_sdMe tot_sdMm  Wteve_mean Wtmor_mean Wttot_mean tot_meanwte tot_meanwtm tot_sdwte tot_sdwtm  diff_eve diff_mor  diff_tot
1   7774.442   39113.53  23443.99  258492587  246665241 35569170 30522100   10752.438   29254.64   18681.95   278901500   235415567  16743477   3922418 0.7230399 1.337003 1.2549002
2  43374.117  119476.16  81425.14         NA         NA       NA       NA   53658.844   98804.80   73007.11          NA          NA        NA        NA 0.8083312 1.209214 1.1153042
3 585562.724 1296530.34 941046.53         NA         NA       NA       NA  715547.922 1001344.20  838032.04          NA          NA        NA        NA 0.8183417 1.294790 1.1229243
4   3866.547   23511.30  13688.92         NA         NA       NA       NA    3422.172   11541.89    6902.05          NA          NA        NA        NA 1.1298517 2.037041 1.9833127
6 918368.020  577355.58 747861.80         NA         NA       NA       NA 1013708.188  571826.16  824330.18          NA          NA        NA        NA 0.9059491 1.009670 0.9072357

[[6]]
   Meve_mean  Mmor_mean Mtot_mean tot_meanMe tot_meanMm tot_sdMe tot_sdMm Wteve_mean Wtmor_mean Wttot_mean tot_meanwte tot_meanwtm tot_sdwte tot_sdwtm  diff_eve diff_mor diff_tot
1   7774.442   39113.53  23443.99  258492587  246665241 35569170 30522100  10752.438   29254.64   18681.95   278901500   235415567  16743477   3922418 0.7230399 1.337003 1.254900
2  43374.117  119476.16  81425.14         NA         NA       NA       NA  53658.844   98804.80   73007.11          NA          NA        NA        NA 0.8083312 1.209214 1.115304
3 585562.724 1296530.34 941046.53         NA         NA       NA       NA 715547.922 1001344.20  838032.04          NA          NA        NA        NA 0.8183417 1.294790 1.122924
4   3866.547   23511.30  13688.92         NA         NA       NA       NA   3422.172   11541.89    6902.05          NA          NA        NA        NA 1.1298517 2.037041 1.983313
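
The same selection can also be written without eval(parse(...)), by indexing the column by name with [[ ]]. The following is only a sketch, not the answerer's code: the toy tbl_comp below contains just the three diff_* columns from the question plus a hypothetical id column, and the list names (diff_eve_below and so on) are made up for illustration.

```r
# Toy stand-in for tbl_comp: the three diff_* columns from the question
# plus a hypothetical id column to show that other columns are kept.
tbl_comp <- data.frame(
  id       = 1:6,
  diff_eve = c(0.7230399, 0.8083312, 0.8183417, 1.1298517, 0.9551380, 0.9059491),
  diff_mor = c(1.3370026, 1.2092141, 1.2947899, 2.0370412, 0.9630701, 1.0096698),
  diff_tot = c(1.2549002, 1.1153042, 1.1229243, 1.9833127, 0.9298195, 0.9072357)
)

cols <- c("diff_eve", "diff_mor", "diff_tot")

# which() discards NA comparisons, so rows with a missing value in the
# selection column are dropped, as subset() would do.
below <- lapply(cols, function(x) tbl_comp[which(tbl_comp[[x]] < 1), , drop = FALSE])
above <- lapply(cols, function(x) tbl_comp[which(tbl_comp[[x]] > 1), , drop = FALSE])

# One list of six data frames, in the same order as above, but named.
allframes <- c(below, above)
names(allframes) <- c(paste0(cols, "_below"), paste0(cols, "_above"))

allframes$diff_eve_below
```

With names on the list, each data frame can be reached as allframes$diff_mor_above rather than by position. If separate objects are really needed, list2env(allframes, envir = globalenv()) should create one variable per element, although keeping them in a list is usually easier to work with.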

If you want the selection-criterion columns at the front of the data frame, as in your example, you can use:

c(lapply(cols, function(x) subset(tbl_comp[, c(cols, setdiff(colnames(tbl_comp), cols))], eval(parse(text = x)) < 1)),
  lapply(cols, function(x) subset(tbl_comp[, c(cols, setdiff(colnames(tbl_comp), cols))], eval(parse(text = x)) > 1)))
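
Since the column reordering does not depend on the filter, it could also be done once up front and the result filtered afterwards. A minimal sketch, assuming a toy tbl_comp with a hypothetical id column standing in for the remaining columns (the names reordered and below_eve are illustrative only):

```r
# Toy stand-in for tbl_comp; id is a hypothetical placeholder for the
# columns other than the three diff_* columns.
tbl_comp <- data.frame(
  id       = 1:6,
  diff_eve = c(0.7230399, 0.8083312, 0.8183417, 1.1298517, 0.9551380, 0.9059491),
  diff_mor = c(1.3370026, 1.2092141, 1.2947899, 2.0370412, 0.9630701, 1.0096698),
  diff_tot = c(1.2549002, 1.1153042, 1.1229243, 1.9833127, 0.9298195, 0.9072357)
)

cols <- c("diff_eve", "diff_mor", "diff_tot")

# Selection columns first, then every other column in its original order.
reordered <- tbl_comp[, c(cols, setdiff(colnames(tbl_comp), cols))]

# Filter the reordered data frame as before.
below_eve <- subset(reordered, diff_eve < 1)
colnames(below_eve)
```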

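【Comments】:

  • Can I split them into separate data frames?
  • lapply gives you a list. You can store the output of the command above in a variable, e.g. allframes, and then access the individual elements with allframes[[1]] and so on.
  • If you want a single data frame, you can do it as in DJack's solution: subset(tbl_comp, diff_eve < 1) for the first data frame, etc.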
【Solution 2】:

I am not sure I understood your question correctly, but here is a possible solution:

diff_eve_below <- subset(tbl_comp, diff_eve < 1)
diff_eve_above <- subset(tbl_comp, diff_eve > 1)
diff_mor_below <- subset(tbl_comp, diff_mor < 1)
diff_mor_above <- subset(tbl_comp, diff_mor > 1)
diff_tot_below <- subset(tbl_comp, diff_tot < 1)
diff_tot_above <- subset(tbl_comp, diff_tot > 1)
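
For a single column, both halves can also be produced in one step with split(). This is a sketch of a related technique rather than an equivalent of the code above: it uses a toy tbl_comp with a hypothetical id column, the group labels are made up, and a row whose value is exactly 1 would land in the "above_or_equal" group instead of being left out of both data frames.

```r
# Toy stand-in for tbl_comp; id is a hypothetical extra column.
tbl_comp <- data.frame(
  id       = 1:6,
  diff_eve = c(0.7230399, 0.8083312, 0.8183417, 1.1298517, 0.9551380, 0.9059491)
)

# Label each row, then partition the data frame by that label.
# Rows with NA in diff_eve get an NA label and are dropped by split().
grp   <- ifelse(tbl_comp$diff_eve < 1, "below", "above_or_equal")
parts <- split(tbl_comp, grp)

parts$below
parts$above_or_equal
```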

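【Comments】:

  • When I try to run all the code I get the error "Error in dput$diff_eve : object of type 'closure' is not subsettable".
  • @Rechlay replace dput with tbl_comp (my mistake, I have corrected it), but user1981275's answer is much more elegant ;)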