Apply a transformation to specific rows and columns of a df based on a second df

【Question title】: apply a transformation to specific rows and columns of a df based on a second df
【Posted】: 2018-05-24 12:13:24
【Question】:

I have two huge dfs (especially the first one), which I have simplified here.

library(tidyverse)
(thewhat <- tibble(sample = 1:10L, y= 1.0, z =2.0))

# A tibble: 10 x 3
   sample     y     z
    <int> <dbl> <dbl>
 1      1    1.    2.
 2      2    1.    2.
 3      3    1.    2.
 4      4    1.    2.
 5      5    1.    2.
 6      6    1.    2.
 7      7    1.    2.
 8      8    1.    2.
 9      9    1.    2.
10     10    1.    2.

(thewhere <- tibble(cond = c("a","a","b","c","a"),
     init_sample= c(1,3,4,5,7), 
     duration = c(1,2,2,1,3), 
     where = c(NA,"y","z","y","z")))

# A tibble: 5 x 4
  cond  init_sample duration where
  <chr>       <dbl>    <dbl> <chr>
1 a              1.       1. <NA> 
2 a              3.       2. y    
3 b              4.       2. z    
4 c              5.       1. y    
5 a              7.       3. z

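I want to "mutate" some cells of the thewhat df into NAs based on the information in the thewhere df. Importantly, thewhat is in wide format and I don't want to convert it to long format (because I have millions of rows).

I want to convert to NA the samples starting at init_sample and running for duration samples, in the column indicated by where. (If where is NA, it applies to all columns of thewhat except sample; here, y and z.)

I created a df, NAs, which indicates which cells should be NA: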

# table with the elements that should be replaced by NA
NAs <- filter(thewhere, cond=="a") %>% 
      mutate( sample = map2(init_sample, init_sample + duration - 1,seq)) %>% 
      unnest(sample) %>%
      select(where, sample)

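I tried different approaches, and this is the closest I got. In the following mutate I do the NA transformation for one column. I could add the remaining relevant columns by hand, but in my real scenario I have 30 columns.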

# Takes into account the different columns but I need to manually add each relevant column
# and another case for mutate_all when the where is NA:
mutate(thewhat, y = if_else(sample %in% NAs$sample[NAs$where =="y"],  
        NA_real_, y  ))

The expected output is as follows:

# A tibble: 10 x 3
   sample     y     z
    <int> <dbl> <dbl>
 1      1   NA    NA
 2      2    1.    2.
 3      3   NA     2.
 4      4   NA     2.
 5      5    1.    2.
 6      6    1.    2.
 7      7    1.   NA 
 8      8    1.   NA 
 9      9    1.   NA 
10     10    1.    2.

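Maybe mutate_at or mutate_if could work here, but I don't know how. Or some map function from purrr could save me, but I couldn't get it to work for this case.

(Brownie points if the solution stays within the tidyverse, but I can also accept another type of solution.)

Thanks, Bruno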

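【Comments】:

Are you missing data from NAs? Like 4z, 5z, 5y?
I don't understand your question, @CPak.
My attempt with purrr ended with map_dfc(select(thewhat,-sample), ~ if_else(thewhat$sample %in% NAs$sample, NA_real_,.x)), which transforms every column. I can't tell it to do something different for each column, because names(.x) just returns NULL.
NAs should specify which values in thewhat should be modified. But based on what I think you want, shouldn't 4z, 5z, and 5y also be changed to NA? That comes from thewhere[3, ] & thewhere[4, ]...
Can you update with the expected output?

Tags: r dataframe dplyr purrr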

【Solution 1】:

Based on the description, we can use map:

library(tidyverse)
lst <- NAs %>% 
         split(.$where)
set_names(names(lst), names(lst)) %>%
     map_df(., ~ thewhat[[.x]] %>%
                 replace(., thewhat$sample %in% lst[[.x]]$sample, NA_real_) ) %>%
     bind_cols(thewhat %>%
                 select(sample), .)
# A tibble: 10 x 3
#   sample     y     z
#    <int> <dbl> <dbl>
# 1      1     1     2
# 2      2     1     2
# 3      3    NA     2
# 4      4    NA     2
# 5      5     1     2
# 6      6     1     2
# 7      7     1    NA
# 8      8     1    NA
# 9      9     1    NA
#10     10     1     2
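
For readers unfamiliar with `set_names()`, here is a small sketch of why the vector of column names is named before mapping. purrr's map functions carry the names of their input over to the output, so naming the column names by themselves is what lets the mapped results come back labelled y and z. The vector `cols` and the use of `toupper` are purely illustrative and not part of the answer.

```r
library(purrr)

# Column names, named by themselves (set_names() defaults nm to x)
cols <- set_names(c("y", "z"))

# map() keeps the input names on its output
out <- map(cols, toupper)
names(out)
```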

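【Comments】:

Almost! It only misses the case where where is NA, in which case both y and z should be NA in the output. I need to look into the `set_names` function, which I didn't know about.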
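
A minimal sketch of one way to cover that remaining case, not part of the answer above. It iterates over the value columns with purrr's `imap()`, which passes each column together with its name, and treats a row of `NAs` as applying to a column when its `where` is NA or equals that column's name. The `NAs` table is written out by hand here, assuming it matches what the question's pipeline produces for cond == "a". The names `col`, `col_name`, `hit` and `result` are illustrative.

```r
library(tidyverse)

thewhat <- tibble(sample = 1:10, y = 1.0, z = 2.0)

# Cells to blank out, written out by hand for cond == "a"
NAs <- tibble(
  where  = c(NA, "y", "y", "z", "z", "z"),
  sample = c(1, 3, 4, 7, 8, 9)
)

result <- thewhat %>%
  select(-sample) %>%
  imap(function(col, col_name) {
    # A row of NAs applies to this column if where is NA (all columns)
    # or if it names this column explicitly
    hit <- is.na(NAs$where) | NAs$where == col_name
    replace(col, thewhat$sample %in% NAs$sample[hit], NA_real_)
  }) %>%
  as_tibble() %>%
  bind_cols(select(thewhat, sample), .)

result
```

This keeps thewhat in wide format and does not require listing the value columns by hand, so it should carry over to the 30-column case as long as every value column is numeric.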