Replace column values with column name using dplyr's transmute_all: answers

【Question title】: Replace column values with column name using dplyr's transmute_all
【Posted】: 2018-05-02 15:27:14
【Question】:

The dataset contains many columns holding either NA or 1 values, like this:

> data_frame(a = c(NA, 1, NA, 1, 1), b=c(1, NA, 1, 1, NA))
# A tibble: 5 x 2
      a     b
  <dbl> <dbl>
1 NA     1.00
2  1.00 NA   
3 NA     1.00
4  1.00  1.00
5  1.00 NA

Desired output: replace every 1 with the column name as a string:

> data_frame(a = c(NA, 'a', NA, 'a', 'a'), b=c('b', NA, 'b', 'b', NA))
# A tibble: 5 x 2
  a     b    
  <chr> <chr>
1 <NA>  b    
2 a     <NA> 
3 <NA>  b    
4 a     b    
5 a     <NA>

Here is my attempt using an anonymous function inside transmute_all:

> data_frame(a = c(NA, 1, NA, 1, 1), b=c(1, NA, 1, 1, NA)) %>%
+     transmute_all(
+         funs(function(x){if (x == 1) deparse(substitute(x)) else NA})
+     )
Error in mutate_impl(.data, dots) : 
  Column `a` is of unsupported type function

Edit: attempt #2:

> data_frame(a = c(NA, 1, NA, 1, 1), b=c(1, NA, 1, 1, NA)) %>%
+     transmute_all(
+         funs(
+             ((function(x){if (!is.na(x)) deparse(substitute(x)) else NA})(.))
+             )
+     )
# A tibble: 5 x 2
  a     b    
  <lgl> <chr>
1 NA    b    
2 NA    b    
3 NA    b    
4 NA    b    
5 NA    b    
Warning messages:
1: In if (!is.na(x)) deparse(substitute(x)) else NA :
  the condition has length > 1 and only the first element will be used
2: In if (!is.na(x)) deparse(substitute(x)) else NA :
  the condition has length > 1 and only the first element will be used
>
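The warnings come from `if`, which only looks at the first element of its condition (since R 4.2.0 this is an error rather than a warning). That is why column `a`, whose first value is NA, came out all NA, and column `b`, whose first value is 1, came out all "b". `ifelse()` evaluates the test element by element instead. A minimal base R sketch on one column, with the column name hard-coded as an assumption for illustration:

```r
x <- c(NA, 1, NA, 1, 1)   # column `a` from the question

# ifelse() is vectorised: every position where the test is TRUE gets "a",
# every position where it is FALSE gets NA
res <- ifelse(!is.na(x), "a", NA)
print(res)
```

The remaining problem, getting the column name without hard-coding it, is what the answers below address.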

【Question discussion】:

Tags: r dplyr

【Solution 1】:

One option is map2:

library(purrr)
map2_df(df1, names(df1), ~  replace(.x, .x==1, .y))
# A tibble: 5 x 2
#  a     b    
# <chr> <chr>
#1 NA    b    
#2 a     NA   
#3 NA    b    
#4 a     b    
#5 a     NA
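map2_df() walks over each column together with its name, so `.x` is the column and `.y` is its name. The same pairing can be sketched in base R with `Map()`; this is a swapped-in base R analogue rather than purrr's implementation, and it assumes a plain data.frame instead of a tibble:

```r
df1 <- data.frame(a = c(NA, 1, NA, 1, 1), b = c(1, NA, 1, 1, NA))

# Map() calls the function on (column, name) pairs, like purrr::map2().
# replace() writes the name wherever the value equals 1; cells where the
# comparison is NA are skipped and stay NA.
# df1[] <- ... refills the existing columns, keeping the data.frame class.
df1[] <- Map(function(col, nm) replace(col, col == 1, nm), df1, names(df1))
print(df1)
```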

Or, as @Moody_Mudskipper commented:

imap_dfr(df1, ~replace(.x, .x==1, .y))

In base R, we can do:

df1[] <- names(df1)[col(df1) *(df1 == 1)]
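The base R line works by building a matrix of column indices. `col(df1)` gives each cell's column number, and multiplying by the logical matrix `df1 == 1` keeps that number where the value is 1 and NA where the value is NA; indexing `names(df1)` with it then yields the names. A sketch that prints the intermediate index, assuming a plain data.frame:

```r
df1 <- data.frame(a = c(NA, 1, NA, 1, 1), b = c(1, NA, 1, 1, NA))

# Column number where the value is 1, NA where the value is NA
idx <- col(df1) * (df1 == 1)
print(idx)

# names(df1)[idx] maps 1 -> "a", 2 -> "b", NA -> NA (column-major order);
# df1[] <- ... pours the result back into the existing columns
df1[] <- names(df1)[idx]
print(df1)
```

This relies on the data holding only 1 and NA: any other value gives a 0 index, which `[` silently drops, so the vector would come out shorter than the data frame.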

Data

df1 <-  data_frame(a = c(NA, 1, NA, 1, 1), b=c(1, NA, 1, 1, NA))

【Discussion】:

Great answer. As the question required, I accepted the other answer because it uses pure dplyr. Thanks.
Or imap_dfr(df1, ~replace(.x, .x==1, .y)), which is more compact.

【Solution 2】:

If you want to stick with a dplyr solution, you almost had it:

library(dplyr)

df <- data_frame(a = c(NA, 1, NA, 1, 1), b = c(1, NA, 1, 1, NA))

df %>% 
    transmute_all(funs(ifelse(. == 1, deparse(substitute(.)), NA)))

#> # A tibble: 5 x 2
#>     a     b    
#>   <chr> <chr>
#> 1 <NA>  b    
#> 2 a     <NA> 
#> 3 <NA>  b    
#> 4 a     b    
#> 5 a     <NA>

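【Discussion】:

funs() is deprecated; now do: transmute_all(function(x) ifelse(x == 1, deparse(substitute(x)), NA)). That is also superseded and should now be done with across(), but that won't help you here, because it returns "col" for every value unless you build it with cur_column(). Never mind across(); IMHO it is very inconvenient...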
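For reference, a minimal sketch of the across() version the comment alludes to, assuming dplyr 1.0.0 or later (where across() and cur_column() were introduced). `cur_column()` returns the name of the column currently being processed, so no deparse(substitute()) trick is needed; `tibble()` stands in for the deprecated `data_frame()`:

```r
library(dplyr)

df <- tibble(a = c(NA, 1, NA, 1, 1), b = c(1, NA, 1, 1, NA))

# Inside across(), cur_column() is "a" for column a and "b" for column b;
# ifelse() puts that name wherever the value is 1 and NA elsewhere
out <- df %>%
  mutate(across(everything(), ~ ifelse(.x == 1, cur_column(), NA)))
print(out)
```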

【Solution 3】:

Base R works fine too:

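nn <- names(df)
for (i in seq_along(df)) {
  # df[[i]] extracts column i as a plain vector; ifelse() swaps each 1
  # for the column name, keeps NA, and keeps any other value (as character)
  df[[i]] <- ifelse(df[[i]] == 1, nn[i], df[[i]])
}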

This produces:

     a    b
1 <NA>    b
2    a <NA>
3 <NA>    b
4    a    b
5    a <NA>

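【Discussion】:

【Solution 4】:

Because deparse(substitute(.)) returns a string of length 1, you can index it directly with the column `.` itself: an index of 1 returns the name, and an index of NA returns NA: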

library(tidyverse)

df <- data_frame(a = c(NA, 1, NA, 1, 1), 
                 b = c(1, NA, 1, 1, NA))

df %>% mutate_all(funs(deparse(substitute(.))[.]))
#> # A tibble: 5 x 2
#>   a     b    
#>   <chr> <chr>
#> 1 <NA>  b    
#> 2 a     <NA> 
#> 3 <NA>  b    
#> 4 a     b    
#> 5 a     <NA>
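The trick rests on plain base R indexing of a length-1 character vector, which can be checked in isolation (the column name is hard-coded here for illustration):

```r
col_a <- c(NA, 1, NA, 1, 1)

# Index 1 returns the element; index NA returns NA
res <- "a"[col_a]
print(res)

# Caveat: a 0 in the index is silently dropped, so the result gets shorter
short <- "a"[c(1, 0, NA)]
print(short)
```

So this approach only lines up row by row when every value is exactly 1 or NA; a 2 would also produce NA (out of range).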

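An approach that doesn't involve deparsing names is to reshape to long format, so that the variable names become a variable you can manipulate as usual. Here, coercing the values to a logical vector makes the subsetting behave the same as above (TRUE selects the name, NA gives NA). If you want to preserve row order when reshaping back to wide format, you need to add an index column.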

df %>% 
    rowid_to_column('i') %>% 
    gather(variable, value, -i) %>% 
    mutate(value = variable[as.logical(value)]) %>% 
    spread(variable, value)
#> # A tibble: 5 x 3
#>       i a     b    
#>   <int> <chr> <chr>
#> 1     1 <NA>  b    
#> 2     2 a     <NA> 
#> 3     3 <NA>  b    
#> 4     4 a     b    
#> 5     5 a     <NA>
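Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same reshape with the newer verbs, assuming tidyr 1.0.0 or later and the tibble package for rowid_to_column():

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tibble(a = c(NA, 1, NA, 1, 1), b = c(1, NA, 1, 1, NA))

out <- df %>%
  rowid_to_column("i") %>%                       # row index to rebuild rows
  pivot_longer(-i, names_to = "variable", values_to = "value") %>%
  mutate(value = variable[as.logical(value)]) %>% # 1 -> name, NA -> NA
  pivot_wider(names_from = variable, values_from = value)
print(out)
```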

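【Discussion】:

funs is soft-deprecated. How do we replace it in this case?
Using purrr::imodify: df %>% imodify(~.y[.x]) is a very concise way to keep the same logic.
Or, instead of funs, you need a full anonymous function: df %>% mutate_all(function(x) deparse(substitute(x))[x]). The purrr-style ~ lambda notation doesn't seem to work, presumably because of the way it is parsed.
Great, thanks a lot. You could update your answer with this.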