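If the goal is to apply check_variables() so that it takes a dataset (table) and returns a single TRUE or FALSE, then the problem is probably related to the use of vectorized functions.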
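R and its packages have many vectorized functions, such as is.na. When such a function is applied to a vector like c(1, NA, 2) or to a data frame, it is applied to every element, so the result is FALSE TRUE FALSE rather than a single TRUE (some element is NA) or FALSE (no element is NA).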
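To make the difference concrete, here is a minimal base R sketch of the element-wise result and of a few ways to collapse it into one value. The variable names (x, elementwise, any_missing, and so on) are only illustrative and are not part of the original question.

```r
x <- c(1, NA, 2)

# is.na() is vectorized: it returns one logical value per element
elementwise <- is.na(x)            # FALSE TRUE FALSE

# Aggregating collapses that vector into a single logical value
any_missing  <- any(elementwise)   # TRUE: at least one element is NA
all_missing  <- all(elementwise)   # FALSE: not every element is NA
none_missing <- !any(elementwise)  # FALSE: the vector is not NA-free
```

Which aggregate is the right one depends on what the check is supposed to mean, which is why it has to be chosen explicitly.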
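When check_variables() is built from such vectorized functions, we need to "aggregate" their results with functions such as all and any. We also need to control the scope of that aggregation, which decides whether check_variables() is applied to elements, to variables (columns), or to the entire table (data frame):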
library(tidyverse) # in production code, attach only `dplyr`, `tidyr` and `purrr`
a = data.frame(x = c(1,2,3), y =c(3,NA,5))
b = data.frame(x = c(1,NA,3), y =c(3,4,5))
c = data.frame(x = c(1,NA,3), y =c(3,NA,4))
# apply `check.func` to the variables (columns)
# aggregation has to stay within the scope of each variable (column);
# `dplyr::summarize_all` happens to work this way
check.vars = function(list.tbls, check.func) list.tbls %>% map(~ .x %>% summarize_all(check.func))
# apply `check.func` on the entire table
# as long as `check.func` takes a table and returns a single value
# we can directly apply this function
check.tbls = function(list.tbls, check.func) list.tbls %>% map(~ check.func(.x))
## Some sample functions
# check that no element in scope is NA
# takes either a vector or a table, returns a single logical
has.no.na = . %>% is.na %>% any %>% `!`
# check that all elements in scope are less than 5; an NA result counts as FALSE
# takes either a vector or a table, returns a single logical
is.lt.5 = . %>% `<`(5) %>% all %>% replace_na(FALSE)
# check that all elements in scope are less than 5; NAs are ignored, so all-NA gives TRUE
# takes either a vector or a table, returns a single logical
is.lt.5.rm.na = . %>% `<`(5) %>% all(na.rm = TRUE)
## Use of sample functions to check variables within each dataset
list(a,b,c) %>% check.vars(has.no.na)
list(a,b,c) %>% check.vars(is.lt.5)
## Use of sample functions to check each dataset
list(a,b,c) %>% check.tbls(has.no.na)
list(a,b,c) %>% check.tbls(is.lt.5)
list(a,b,c) %>% check.tbls(is.lt.5.rm.na)
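For comparison, the same two aggregation scopes can be sketched in base R without any tidyverse helpers. This is only an illustration under the assumption that a plain data frame is being checked; df, per_column and whole_table are hypothetical names.

```r
df <- data.frame(x = c(1, 2, 3), y = c(3, NA, 5))

# Column scope: aggregate inside each column, giving one logical per variable
per_column <- sapply(df, function(col) !any(is.na(col)))
# x is TRUE (no NA), y is FALSE (contains an NA)

# Table scope: aggregate over every cell, giving a single logical
whole_table <- !any(is.na(df))
# FALSE, because at least one cell in the table is NA
```

The column-scoped version corresponds to what check.vars does for one table, and the table-scoped version to what check.tbls does.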