【Posted】:2021-03-22 17:13:55
【Question】:
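I'm new to R and trying to solve a basic problem.
I have a tibble named books. One of its columns is total_purchased (the total number of copies purchased) and another is title (the book title).
The total_purchased column has many missing values. I want to replace them with the mean number purchased for each book, but I can't get this to work in an efficient way. Below I have simply hard-coded the book titles.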
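For example, I:
- Filter the tibble to the rows where total_purchased is not NA, and to a single book title.
- Compute the mean.
- Repeat these steps separately for each book.
- Use the mutate function to add a new column that is a copy of total_purchased, except that each NA value is replaced with the relevant mean.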
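I basically just need to understand how to simplify this so that I'm not hard-coding the book titles and can cut down the amount of code. I'm a bit too new to R to work it out myself. In another language I would use a loop here, but I'm not sure whether some vectorization could do this simply.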
# Calculate mean total purchased for particular book.
SOR <- books %>%
  filter(!(is.na(total_purchased))) %>%
  filter(title == "Secrets Of R For Advanced Students") %>%
  pull(total_purchased) %>%
  mean()

RFD <- books %>%
  filter(!(is.na(total_purchased))) %>%
  filter(title == "R For Dummies") %>%
  pull(total_purchased) %>%
  mean()

FOR <- books %>%
  filter(!(is.na(total_purchased))) %>%
  filter(title == "Fundamentals of R For Beginners") %>%
  pull(total_purchased) %>%
  mean()

RVP <- books %>%
  filter(!(is.na(total_purchased))) %>%
  filter(title == "R vs Python: An Essay") %>%
  pull(total_purchased) %>%
  mean()

TTM <- books %>%
  filter(!(is.na(total_purchased))) %>%
  filter(title == "Top 10 Mistakes R Beginners Make") %>%
  pull(total_purchased) %>%
  mean()

RME <- books %>%
  filter(!(is.na(total_purchased))) %>%
  filter(title == "R Made Easy") %>%
  pull(total_purchased) %>%
  mean()

# Assign mean specific to book when total purchased value is NA
books <- books %>%
  mutate(complete_purchased = case_when(
    is.na(total_purchased) & title == "Secrets Of R For Advanced Students" ~ SOR,
    is.na(total_purchased) & title == "R For Dummies" ~ RFD,
    is.na(total_purchased) & title == "Fundamentals of R For Beginners" ~ FOR,
    is.na(total_purchased) & title == "R vs Python: An Essay" ~ RVP,
    is.na(total_purchased) & title == "Top 10 Mistakes R Beginners Make" ~ TTM,
    is.na(total_purchased) & title == "R Made Easy" ~ RME,
    TRUE ~ total_purchased
  ))
【Discussion】:
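One way the per-book steps in the question could be collapsed is a grouped mutate: group_by(title) makes mean() run once per book, so no titles need to be hard-coded and no loop is needed. This is only a minimal sketch, not a definitive answer. The sample data is made up for illustration (the real books tibble is not shown in the question), and it assumes total_purchased is a numeric (double) column.

```r
library(dplyr)

# Hypothetical sample data; the real `books` tibble is not shown in the question.
books <- tibble(
  title = c("R For Dummies", "R For Dummies", "R For Dummies",
            "R Made Easy", "R Made Easy", "R Made Easy"),
  total_purchased = c(2, NA, 4, 10, 20, NA)
)

books <- books %>%
  group_by(title) %>%                      # later steps run once per book title
  mutate(
    complete_purchased = if_else(
      is.na(total_purchased),
      mean(total_purchased, na.rm = TRUE), # per-title mean, ignoring NA values
      total_purchased                      # keep the observed value otherwise
    )
  ) %>%
  ungroup()                                # drop the grouping when done

books$complete_purchased
# 2 3 4 10 20 15
```

Note that if every total_purchased value for a title is NA, mean(..., na.rm = TRUE) returns NaN for that group, so such titles would need separate handling.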