【Question title】: Replace NA value with the correct one [duplicate]
【Posted】: 2019-08-15 00:06:30
【Question】:

Suppose we have the following data frame:

ID <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6)
age <- c(25, 25, 25, 22, 22, 56, 56, 56, 80, 33, 33, 90, 90, 90, 5, 5, 5)
gender <- c("m", "m", NA, "f", "f", "m", NA, "m", "m", "m", NA, NA, NA, "m", NA, NA, NA)
company <- c("c1", "c2", "c2", "c3", "c3", "c1", "c1", "c1", "c1", "c5", "c5", "c3", "c4", "c5", "c3", "c1", "c1")
income <- c(1000, 1000, 1000, 500, 1700, 200, 200, 250, 500, 700, 700, 300, 350, 300, 500, 1700, 200)

df <- data.frame(ID, age, gender, company, income)

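This data has 6 unique IDs, and the gender variable sometimes contains NA.

I want to replace each NA with the correct gender for that ID. If every gender value for an ID is NA, those values should stay as they are.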

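The expected result is df with the NAs in gender filled in per ID, and ID 6 (where every gender value is NA) left unchanged.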

【Comments】:

  • You can use fill: df %>% group_by(age) %>% fill(gender) %>% fill(gender, .direction = "up")

Tags: r


【Solution 1】:

Here is a base R approach using ave:

df$gender <- with(df, ave(gender, ID, FUN = function(x) na.omit(x)[1]))

   ID age gender company income
1   1  25      m      c1   1000
2   1  25      m      c2   1000
3   1  25      m      c2   1000
4   2  22      f      c3    500
5   2  22      f      c3   1700
6   3  56      m      c1    200
7   3  56      m      c1    200
8   3  56      m      c1    250
9   3  80      m      c1    500
10  4  33      m      c5    700
11  4  33      m      c5    700
12  5  90      m      c3    300
13  5  90      m      c4    350
14  5  90      m      c5    300
15  6   5   <NA>      c3    500
16  6   5   <NA>      c1   1700
17  6   5   <NA>      c1    200
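
To see why ID 6 stays all NA, it may help to look at what na.omit(x)[1] returns for a single group. The following is a minimal base R sketch; the vectors some_known and all_missing are hypothetical and are not taken from the question's data.

```r
# Hypothetical per-group vectors, for illustration only
some_known  <- c(NA, "m", NA)
all_missing <- c(NA_character_, NA_character_)

# na.omit() drops the NAs; [1] then takes the first remaining value
first_known <- na.omit(some_known)[1]

# With nothing left after na.omit(), indexing with [1] yields NA
first_missing <- na.omit(all_missing)[1]

# ave() spreads the single value returned by FUN across each group
filled <- ave(c("m", NA, NA, NA), c(1, 1, 2, 2),
              FUN = function(x) na.omit(x)[1])
```

Here first_known is "m", first_missing is NA, and in filled the second group remains NA, which matches the requirement to leave all-NA IDs untouched.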

A few approaches using dplyr and tidyr:

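library(dplyr)
library(tidyr)

# Option 1: take the first non-NA gender within each ID
df %>%
  group_by(ID) %>%
  mutate(gender = na.omit(gender)[1])

# Option 2: fill NAs from neighbouring rows within each ID, up then down
df %>%
  group_by(ID) %>%
  fill(gender, .direction = "up") %>%
  fill(gender, .direction = "down")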
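
If the installed tidyr is version 1.0.0 or later (an assumption about your setup), the two fill() calls can probably be collapsed into one with .direction = "updown", which fills upwards first and then downwards. A small sketch on a hypothetical data frame named toy, standing in for df:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's df: ID 1 has one known value,
# ID 2 has none
toy <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  gender = c(NA, "m", NA, NA, NA),
  stringsAsFactors = FALSE
)

filled <- toy %>%
  group_by(ID) %>%
  fill(gender, .direction = "updown") %>%  # fill up, then down, within each ID
  ungroup()
```

Because fill() only copies values that exist inside the group, ID 2 stays all NA here.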

【Comments】:

【Solution 2】:

With the tidyverse library, you can do it like this:

library(tidyverse)
# Build a lookup table with the known gender for each ID
df_gender_ref <- df %>% filter(!is.na(gender)) %>% select(ID, gender) %>% unique()
# Drop the incomplete gender column and join the lookup back on by ID
df %>% select(-gender) %>% left_join(df_gender_ref, by = "ID")
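
One caveat with the join approach: it relies on each ID having at most one distinct non-NA gender. If an ID carried two different values, the lookup table would hold two rows for it and the join would duplicate that ID's rows. The sketch below adds a guard for this; the data frame toy and the name gender_ref are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the question's df
toy <- data.frame(
  ID = c(1, 1, 2, 2, 3),
  gender = c("m", NA, NA, NA, "f"),
  stringsAsFactors = FALSE
)

# Lookup table: one row per (ID, gender) pair with a known gender
gender_ref <- toy %>%
  filter(!is.na(gender)) %>%
  distinct(ID, gender)

# Guard: stop early if any ID maps to more than one gender
stopifnot(!any(duplicated(gender_ref$ID)))

# IDs missing from the lookup (here ID 2) come back as NA after the join
result <- toy %>%
  select(-gender) %>%
  left_join(gender_ref, by = "ID")
```

Note that the join moves gender to the last column; add a relocate() or select() step if the original column order matters.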
    

【Comments】:
