【Question title】: Programming with dplyr - how to deal with quotes/enquotes
【Posted】: 2018-07-22 03:49:45
【Question description】:

I need to create a wrapper function that uses dplyr to run a database query.

First, a reproducible example:

library("DBI")
library("dplyr")
conn = DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

df = expand.grid(indate = as.character(as.POSIXct(seq(as.Date('2017/06/06'), as.Date('2018/02/12'), by="day"))), name = c("Canada","Japan","USA"), stringsAsFactors = FALSE)

copy_to(conn, df, "lineups_country",
        temporary = FALSE, 
        indexes = list(
          "indate",
          "name"
        )
)

Here is the code that works fine without a wrapper function:

res = tbl(conn, table)

# filter the country
res = res %>% filter(name %in% c("Canada","Japan"))

# filter the date
res = res %>% filter(indate >= "2018-01-01")

res %>% show_query()
df2=res %>% collect()
unique(df$name);unique(df2$name)
min(df$indate);min(df2$indate)

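Now, to create the wrapper function, I have read the documentation at https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html

However, things are still not very clear to me, especially regarding quoting/enquoting.

Here is what I tried: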

myFun = function(conn, table, 
                 dateCol   = "indate", 
                 startDate = as.POSIXct("2018-01-01"), 
                 key       = list(name = c("Australia","Japan"))) {


  on.exit({dbDisconnect(conn)})
  res = tbl(conn, table) 

  res %>% show_query()

  # filter the country
  countryCol = names(key)
  enquo_country <- enquo(countryCol) #enquo_country <- rlang::sym(countryCol) #
  res = res %>% filter(!!enquo_country %in% key[[1]])

  res %>% show_query()

  # filter the date
  enquo_dateCol <- enquo(dateCol) #enquo_country <- rlang::sym(names(key)) #
  res = res %>% filter(!!enquo_dateCol >= as.character(startDate))

  res %>% show_query()

  return(res %>% collect())
}

This gives the error:

Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments

【Question discussion】:

    Tags: r dplyr


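    【Solution 1】:

    You need to change a few things:

    • pass an actual table name rather than table, which is a function;
    • use sym on the character vector returned by names(key) to turn it into a symbol;
    • if you are going to use enquo, do not quote dateCol; if you want to pass it as a quoted string, use sym instead;
    • name startDate consistently;
    • there is no particular reason to convert startDate to character; it is handled fine either way.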
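
The sym/enquo distinction in the list above can be tried out without a database. The following is a minimal sketch on a local data frame; the function names filter_by_string and filter_by_name, and the toy data, are hypothetical and only serve to contrast the two calling conventions.

```r
library(dplyr)

# Toy data (an assumption for illustration, not the table from the question)
df_local <- data.frame(
  name = c("Canada", "Japan", "USA"),
  n = 1:3,
  stringsAsFactors = FALSE
)

# Caller passes the column name as a string: convert it with sym()
filter_by_string <- function(data, col, values) {
  col_sym <- rlang::sym(col)
  data %>% filter(!!col_sym %in% values)
}

# Caller passes the column as a bare name: capture it with enquo()
filter_by_name <- function(data, col, values) {
  col_quo <- rlang::enquo(col)
  data %>% filter(!!col_quo %in% values)
}

a <- filter_by_string(df_local, "name", c("Canada", "Japan"))
b <- filter_by_name(df_local, name, c("Canada", "Japan"))
identical(a, b)
```

Both helpers should select the same rows; the only difference is whether the caller writes "name" or name.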
    library("DBI")
    library("dplyr")
    
    conn = DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
    df = expand.grid(indate = as.character(as.POSIXct(seq(as.Date('2017/06/06'), as.Date('2018/02/12'), by="day"))), 
                     name = c("Canada","Japan","USA"), stringsAsFactors = FALSE)
    
    copy_to(conn, df, "lineups_country",
            temporary = FALSE, 
            indexes = list("indate", "name"))
    
    myFun = function(conn, table, 
                     dateCol   = indate, 
                     startDate = as.POSIXct("2018-01-01"), 
                     key       = list(name = c("Australia","Japan"))) {
        on.exit({dbDisconnect(conn)})
        res = tbl(conn, table)     
        res %>% show_query()
    
        # filter the country
        enquo_country <- sym(names(key))    # use `sym` here
        res = res %>% filter(!!enquo_country %in% key[[1]])      
        res %>% show_query()
    
        # filter the date
        enquo_dateCol <- enquo(dateCol)
        res = res %>% filter(!!enquo_dateCol >= startDate)
        res %>% show_query()
    
        return(res %>% collect())
    }
    

    Now:

    df2 <- myFun(conn, 
          table = "lineups_country",    # the table name
          key = list(name = c("Canada", "Japan")), 
          dateCol = indate,    # not quoted if using `enquo`
          startDate = as.POSIXct("2018-01-01"))
    #> <SQL>
    #> SELECT *
    #> FROM `lineups_country`
    #> <SQL>
    #> SELECT *
    #> FROM `lineups_country`
    #> WHERE (`name` IN ('Canada', 'Japan'))
    #> <SQL>
    #> SELECT *
    #> FROM (SELECT *
    #> FROM `lineups_country`
    #> WHERE (`name` IN ('Canada', 'Japan')))
    #> WHERE (`indate` >= '2018-01-01T05:00:00Z')
    
    df2
    #> # A tibble: 82 x 2
    #>    indate              name  
    #>    <chr>               <chr> 
    #>  1 2018-01-02 19:00:00 Canada
    #>  2 2018-01-02 19:00:00 Japan 
    #>  3 2018-01-03 19:00:00 Canada
    #>  4 2018-01-03 19:00:00 Japan 
    #>  5 2018-01-04 19:00:00 Canada
    #>  6 2018-01-04 19:00:00 Japan 
    #>  7 2018-01-05 19:00:00 Canada
    #>  8 2018-01-05 19:00:00 Japan 
    #>  9 2018-01-06 19:00:00 Canada
    #> 10 2018-01-06 19:00:00 Japan 
    #> # ... with 72 more rows
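
One detail visible in the SQL above: the POSIXct startDate is rendered as '2018-01-01T05:00:00Z', while indate is stored as text in the layout '2018-01-02 19:00:00', so the WHERE clause compares strings with two different layouts. If that matters for your data, one option is to format the cutoff yourself so it matches the stored layout. This is only a sketch: to_db_string is a hypothetical helper, and the tz default is an assumption that should match however the column was originally written.

```r
# Hypothetical helper: render a date-time in the same text layout as the
# stored `indate` values ("YYYY-MM-DD HH:MM:SS").
# The time zone is an assumption; use the one the column was written in.
to_db_string <- function(x, tz = "UTC") {
  format(as.POSIXct(x, tz = tz), "%Y-%m-%d %H:%M:%S", tz = tz)
}

cutoff <- to_db_string("2018-01-01")
cutoff
```

The resulting string could then be passed as startDate, so that both sides of the comparison use the same layout.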
    

    【Discussion】:

      【Solution 2】:

      The names of 'key' and 'dateCol' are character inputs; use sym from rlang to convert them to symbols for evaluation.

      myFun = function(conn, table, 
                       dateCol   = "indate", 
                       startDate = as.POSIXct("2018-01-01"), 
                       key       = list(name = c("Australia","Japan"))) {
      
      
        on.exit({dbDisconnect(conn)})
        res = tbl(conn, table) 
      
        res %>%
              show_query()
      
        # filter the country
        countryCol = names(key)
        country <- rlang::sym(countryCol) 
        res <- res %>% 
                   filter(!! (country) %in% key[[1]])
      
        res %>% 
               show_query()
      
        # filter the date
        dateCol <- rlang::sym(dateCol) 
        res <- res %>%
                   filter(!! (dateCol) >= startDate)
      
        res %>%
             show_query()
      
        return(res %>% 
                     collect())
      }
      

      - Running the function

      df2 <- myFun(conn, 
             table = "lineups_country",    # the table name
             key = list(name = c("Canada", "Japan")), 
             dateCol = "indate",    
             startDate = as.POSIXct("2018-01-01"))
      #<SQL>
      #SELECT *
      #FROM `lineups_country`
      #<SQL>
      #SELECT *
      #FROM `lineups_country`
      #WHERE (`name` IN ('Canada', 'Japan'))
      #<SQL>
      #SELECT *
      #FROM (SELECT *
      #FROM `lineups_country`
      #WHERE (`name` IN ('Canada', 'Japan')))
      #WHERE (`indate` >= '2017-12-31T18:30:00Z')
      
      head(df2, 5)
      # A tibble: 5 x 2
      #  indate              name  
      #   <chr>               <chr> 
      #1 2018-01-01 05:30:00 Canada
      #2 2018-01-01 05:30:00 Japan 
      #3 2018-01-02 05:30:00 Canada
      #4 2018-01-02 05:30:00 Japan 
      #5 2018-01-03 05:30:00 Canada
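
When the column names arrive as strings, as they do here, the .data pronoun is another way to refer to them without calling sym at all. The sketch below uses a local data frame rather than the database table; filter_cols and the toy data are hypothetical, and on a database backend it is worth checking the generated SQL with show_query() before relying on it.

```r
library(dplyr)

# Toy data (an assumption for illustration)
df_local <- data.frame(
  indate = c("2017-12-31", "2018-01-01", "2018-01-02"),
  name = c("Canada", "Japan", "USA"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: column names are given as strings and looked up
# through the .data pronoun instead of being converted with sym()
filter_cols <- function(data, dateCol, startDate, key) {
  countryCol <- names(key)
  data %>%
    filter(.data[[countryCol]] %in% key[[1]]) %>%
    filter(.data[[dateCol]] >= startDate)
}

out <- filter_cols(
  df_local,
  dateCol = "indate",
  startDate = "2018-01-01",
  key = list(name = c("Canada", "Japan"))
)
out
```

With this toy data only the Japan row satisfies both filters.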
      

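      【Discussion】:

      • Don't we need to use %>% filter(UQ(country) %in% key[[1]]) to prevent !! from applying to the rest of the expression?
      • @RockScience I missed the c in country. You can use either UQ or !!. If you don't want it to propagate, use filter(!!(country) %in% key[[1]])
      • Where does it come from?
      • @RockScience Never mind. It comes from the deep state, or the 1984 equivalent :=)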
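
To check how far !! reaches in an expression like the one discussed in these comments, you can build the expression with rlang::expr() and print it, with no database involved. This is a small sketch that assumes a reasonably recent rlang, in which !! binds like unary minus; the variable names are hypothetical.

```r
library(rlang)

country <- sym("name")
vals <- c("Canada", "Japan")

# Unquote without parentheses, as written in the answers
e_plain <- expr(!!country %in% vals)

# Unquote with explicit parentheses around the unquoted symbol
e_parens <- expr((!!country) %in% vals)

print(e_plain)
print(e_parens)
```

If only country is unquoted, the first expression prints with name on the left of %in% and vals left untouched on the right.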