Select/Get names of all columns which have a negative value between 0-10: answers

【Question title】: Select/Get names of all columns which have a negative value between 0-10
【Posted】: 2019-10-03 17:49:32
【Question】:

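For a data frame, I would like to get or select the names of all columns that have a negative value within a certain range. This post comes very close, but it iterates over rows, which is not feasible for my data. Also, if I store that solution it becomes a list, and I would prefer a vector. For example, for the following dataset: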

library(data.table)
df <- fread(
     "A   B   D   E  iso   year   
      0   1   1   NA ECU   2009   
      1   0   2   0  ECU   2009   
      0   0   -3  0  BRA   2011   
      1   0   4   0  BRA   2011   
      0   1   7   NA ECU   2008   
     -1   0   1   0  ECU   2008   
      0   0   3   2  BRA   2012   
      1   0   4   NA BRA   2012",
  header = TRUE
)

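I would like the names of all columns that have a negative value between -10 and 0 (A and D in the example). What is the simplest way to achieve this? All else being equal, a data.table solution would be preferred.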

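【Comments】:

So you want to identify the columns where all values are greater than -10 and less than 0?
Hey tmfmnk. Almost.. I want to identify the columns where any value is greater than -10 and less than 0.

Tags: r range lapply negative-number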

【Solution 1】:

One tidyverse possibility is:

 df %>%
 gather(var, val, -c(5:6)) %>%
 group_by(var) %>%
 summarise(res = any(val[!is.na(val)] > -10 & val[!is.na(val)] < 0))

  var   res  
  <chr> <lgl>
1 A     TRUE 
2 B     FALSE
3 D     TRUE 
4 E     FALSE

To select only the numeric columns:

df %>%
 select_if(is.numeric) %>%
 gather(var, val) %>%
 group_by(var) %>%
 summarise(res = any(val[!is.na(val)] > -10 & val[!is.na(val)] < 0))

Note that this also selects the year column, since it is a numeric column.
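Since the question asks for a plain vector of names rather than a summary table, the same idea can be sketched with `pivot_longer()`, which supersedes `gather()` in tidyr 1.0.0 and later. This is only a sketch: it assumes dplyr 1.0.0+ for `where()`, and the name `neg_cols` is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "A B D E iso year
0 1 1 NA ECU 2009
1 0 2 0 ECU 2009
0 0 -3 0 BRA 2011
1 0 4 0 BRA 2011
0 1 7 NA ECU 2008
-1 0 1 0 ECU 2008
0 0 3 2 BRA 2012
1 0 4 NA BRA 2012", header = TRUE, stringsAsFactors = FALSE)

neg_cols <- df %>%
  select(where(is.numeric)) %>%            # keep numeric columns only
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  group_by(var) %>%
  # TRUE if any non-missing value lies strictly between -10 and 0
  summarise(res = any(val > -10 & val < 0, na.rm = TRUE)) %>%
  filter(res) %>%
  pull(var)                                # character vector, not a list

neg_cols
```

Using `na.rm = TRUE` inside `any()` replaces the repeated `val[!is.na(val)]` subsetting, and `pull()` returns the names as a character vector.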

You can also use base R:

df <- Filter(is.numeric, df)
# Test both bounds on the same cell, then flag columns with at least one hit
cond <- colSums(df > -10 & df < 0, na.rm = TRUE) > 0
names(df)[cond]

[1] "A" "D"

Or written as a "one-liner":

df <- Filter(is.numeric, df)
names(df)[colSums(df > -10 & df < 0, na.rm = TRUE) > 0]
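One point worth checking in any `colSums`-based test is whether both bounds are applied to the same cell. The sketch below uses a small made-up data frame (`toy`, with hypothetical columns `X` and `Y`) to show that multiplying two separate counts can flag a column in which no single value lies between -10 and 0.

```r
# Hypothetical data: X holds -20 and 5, so no single value is in (-10, 0)
toy <- data.frame(X = c(-20, 5), Y = c(-3, 4))

# Two separate counts multiplied together: X is flagged,
# because 5 > -10 and -20 < 0 are satisfied by different cells
separate <- as.logical(colSums(toy > -10, na.rm = TRUE) *
                       colSums(toy < 0, na.rm = TRUE))

# Both bounds tested on the same cell: only Y qualifies
same_cell <- colSums(toy > -10 & toy < 0, na.rm = TRUE) > 0

names(toy)[separate]
names(toy)[same_cell]
```

On the sample data in this thread both forms return A and D, so the difference only shows up when a column contains values below -10.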

Sample data:

df <- read.table(text = 
 "A   B   D   E  iso   year   
      0   1   1   NA ECU   2009   
      1   0   2   0  ECU   2009   
      0   0   -3  0  BRA   2011   
      1   0   4   0  BRA   2011   
      0   1   7   NA ECU   2008   
     -1   0   1   0  ECU   2008   
      0   0   3   2  BRA   2012   
      1   0   4   NA BRA   2012", 
 header = TRUE,
 stringsAsFactors = FALSE)

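【Comments】:

Thanks for your answer, tmfmnk! Is it possible to use !is.numeric instead of -c(5:6) to deselect the non-numeric columns? This works for the example, but it causes problems with the large dataset I have..
It also seems to work by replacing -c(5:6) with cols, where cols is built with cols=sapply(df, is.numeric) followed by cols=names(cols)[cols].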
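For the data.table preference stated in the question, selecting columns by type instead of by position can be sketched as follows. This is one possible approach, not one posted in the thread; the names `dt`, `num_cols`, `hit` and `neg_cols` are introduced here for illustration, and the table is rebuilt by hand so the example is self-contained.

```r
library(data.table)

dt <- data.table(
  A    = c(0L, 1L, 0L, 1L, 0L, -1L, 0L, 1L),
  B    = c(1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  D    = c(1L, 2L, -3L, 4L, 7L, 1L, 3L, 4L),
  E    = c(NA, 0L, 0L, 0L, NA, 0L, 2L, NA),
  iso  = c("ECU", "ECU", "BRA", "BRA", "ECU", "ECU", "BRA", "BRA"),
  year = c(2009L, 2009L, 2011L, 2011L, 2008L, 2008L, 2012L, 2012L)
)

# Pick numeric columns by type rather than by position
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]

# One TRUE/FALSE per numeric column: is any value strictly between -10 and 0?
hit <- unlist(dt[, lapply(.SD, function(x) any(x > -10 & x < 0, na.rm = TRUE)),
                 .SDcols = num_cols])

neg_cols <- names(hit)[hit]   # character vector of matching column names
neg_cols
```

The test runs column by column, so it does not iterate over rows, and the result is a character vector rather than a list.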

【Solution 2】:

Another tidyverse variant:

df %>% 
   group_by(iso,year) %>% 
   keep(~any(.x>-10 & .x<0 & !is.na(.x))) %>% 
   names()
[1] "A" "D"

Edit: To handle factors, use mutate_if. We could also do it this way (although I think grouping would be better):

  df %>% 
   mutate_if(is.factor,as.character) %>% 
   purrr::keep(~any(.x>-10 & .x<0 & !is.na(.x))) %>% 
   names()
[1] "A" "D"
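The factor problem can also be sidestepped by guarding the predicate with `is.numeric()`, so that non-numeric columns never reach the comparison. The following is a minimal sketch under that assumption, with the sample data read so that `iso` becomes a factor; `neg_cols` is a name introduced here for illustration.

```r
library(purrr)

df <- read.table(text = "A B D E iso year
0 1 1 NA ECU 2009
1 0 2 0 ECU 2009
0 0 -3 0 BRA 2011
1 0 4 0 BRA 2011
0 1 7 NA ECU 2008
-1 0 1 0 ECU 2008
0 0 3 2 BRA 2012
1 0 4 NA BRA 2012", header = TRUE, stringsAsFactors = TRUE)  # iso is a factor

# && short-circuits: the comparison is only evaluated for numeric columns,
# and na.rm = TRUE makes any() return a single TRUE or FALSE
neg_cols <- names(
  keep(df, ~ is.numeric(.x) && any(.x > -10 & .x < 0, na.rm = TRUE))
)

neg_cols
```

With this guard there is no need to convert factors first or to know in advance which columns are non-numeric.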

Values:

df %>% 
  group_by(iso,year) %>% 
   keep(~any(.x>-10 & .x<0 & !is.na(.x)))
# A tibble: 8 x 2
      A     D
  <int> <int>
1     0     1
2     1     2
3     0    -3
4     1     4
5     0     7
6    -1     1
7     0     3
8     1     4

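【Comments】:

Thanks, NelsonGon! Is there a way to make it work without specifying beforehand which columns are non-numeric? As mentioned, this works well in the example, but it causes problems with the large dataset I have..
Not sure. Here we don't test for non-numerics because keep handles it. It will (I think) discard any non-numeric data, so there is no need to check beforehand. Can you share what happened with your dataset?
The first solution gives the following error: Error: Predicate functions must return a single TRUE or FALSE, not a missing value Call rlang::last_error() to see a backtrace In addition: Warning messages: 1: In Ops.factor(.x, -10) : ‘>’ not meaningful for factors 2: In Ops.factor(.x, 0) : ‘<’ not meaningful for factors
Verified! Thank you very much!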