【Question title】: Why is my column name not recognised by replace_with_na_all even though it is in my tibble?
【Posted】: 2020-12-03 16:32:24
【Question】:

I am trying to use the tidyverse in R to set certain categorical values in my tibble to NA. However, my column names are not being picked up.

Here is my fake data:

fake_data <- structure(list(id = c("1", "2", "3", "4", "5", "6", "7", "9", 
"8", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
"20"), health_care_worker = c("No", "No", "No", "No", "Yes", 
"No", "No", "Yes", "No", "No", "No", "No", "No", "No", "No", 
"No", "No", "No", "No", "No"), how_unwell = c(1, 6, 1, 1, 1, 
6, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), Comorbidity_one = structure(c(5L, 
5L, 5L, 3L, 5L, 5L, 5L, 1L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 2L, 5L, 
3L, 5L, 4L), .Label = c("Asthma (managed with an inhaler)", "Diabetes Type 2", 
"High Blood Pressure (hypertension)", "No", "None"), class = "factor"), 
    Comorbidity_two = structure(c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, 2L, NA, 2L, 1L, 1L), .Label = c("No", 
    "Obesity"), class = "factor"), Comorbidity_three = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, "No", "No"), Comorbidity_four = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), Comorbidity_five = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), Comorbidity_six = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), Comorbidity_seven = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), Comorbidity_eight = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), Comorbidity_nine = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

This is the code I wrote to get the desired output:

na_strings <- c("Diarrhoea", "Long Standing Health Issues", "No","Yes", "0", "4", "15", "No", "Self-Isolating With No Symptoms",
                "Showing Symptoms But Not Tested", "Mild", "Moderate")

data_replace_na <- fake_data %>%
  replace_with_na_all(condition = ~.Comorbidity_one %in% na_strings,
                      condition = ~.Comorbidity_two %in% na_strings, 
                      condition = ~. Comorbidity_three %in% na_strings)

This is the first error I get:

Error: unexpected symbol in:
"                      condition = ~.Comorbidity_two %in% na_strings, 
                      condition = ~. Comorbidity_three"

If I remove the second and third conditions, I get this error instead:

Error in .Comorbidity_one %in% na_strings : 
  object '.Comorbidity_one' not found

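Does anyone know why this does not work? It seems to be a two-part problem. First, my column names are not accepted. Second, how can I set these categories to NA in those variables?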

【Comments】:

  • I assume you only want to set NA in certain columns?

Tags: r tidyverse


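【Solution 1】:

The condition is applied to each column in turn, because according to the documentation it takes an anonymous function written as a formula:

condition - A condition required to be TRUE to set NA. Here, the condition is specified with a formula, following the syntax: ~.x {condition}. For example, writing ~.x

Also according to the documentation, the function takes the whole dataset and offers no way to pass a subset of columns:

This function takes a dataframe and replaces all values that meet the condition specified as an NA value, following a special syntax.

Checking the source code shows that it loops over all columns with map: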

...
purrr::map_dfc(data, ~na_set(.x, condition))
...

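So, unless we select the columns beforehand, the condition is applied to every column. Inside the formula, .x stands for the column's values, not for a column name, which is why .Comorbidity_one is not found:

library(dplyr)   # provides %>%
library(naniar)  # provides replace_with_na_all
fake_data %>%
  replace_with_na_all(condition = ~.x %in% na_strings)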
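
To see why a column name cannot appear in the condition, the per-column loop can be imitated in base R. This is only a rough sketch: `set_na_where` and the `toy` data are hypothetical stand-ins made up for illustration, not naniar's actual internals.

```r
# Values that should become NA (shortened list for the example)
na_strings <- c("No", "Yes")

# Hypothetical toy data with two character columns
toy <- data.frame(
  a = c("No", "Asthma"),
  b = c("Obesity", "Yes"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the per-column helper: it receives one
# column's values and a predicate, and never sees the column's name
set_na_where <- function(column, predicate) {
  column[predicate(column)] <- NA
  column
}

# Apply the same predicate to every column, as the map call does
toy[] <- lapply(toy, set_na_where, predicate = function(x) x %in% na_strings)
toy
```

Because the predicate only ever receives the values of one column at a time, there is nothing for a name such as .Comorbidity_one to refer to.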

If we only need to replace values in selected columns, use mutate with across:

library(dplyr)
fake_data %>%
     mutate(across(starts_with('Comorbidity'), 
               ~ replace(., . %in% na_strings, NA)))
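
As a quick check of the behaviour, here is the same pattern on a small made-up tibble (the `toy` data below is an assumption for illustration, not the asker's data). It suggests that only the Comorbidity columns are changed, while a "No" in health_care_worker is left alone:

```r
library(dplyr)

# Values that should become NA (shortened list for the example)
na_strings <- c("No", "Yes")

# Hypothetical toy data mixing a factor and character columns
toy <- tibble(
  id = c("1", "2", "3"),
  health_care_worker = c("No", "Yes", "No"),
  Comorbidity_one = factor(c("None", "No", "Asthma")),
  Comorbidity_two = c("Obesity", NA, "No")
)

# Replace matching values with NA, but only in columns whose
# names start with "Comorbidity"
result <- toy %>%
  mutate(across(starts_with("Comorbidity"),
                ~ replace(.x, .x %in% na_strings, NA)))

result
```

Factor columns keep their original levels after the replacement, so "No" may still be listed as an unused level; droplevels() can remove it if that matters.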

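【Comments】:

  • Thank you. It works really well. The second snippet is the one I needed. I also want to mark this as the answer, which I can actually do in 2 minutes.
  • Thank you. It works really well. In fact, the answer is brilliant!!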