【Question title】: How to use a predicate while reading from JDBC connection?
【Posted】: 2018-01-07 07:48:14
【Question】:

By default, spark_read_jdbc() reads an entire database table into Spark. I use the following syntax to create these connections.

library(sparklyr)
library(dplyr)

config <- spark_config()
config$`sparklyr.shell.driver-class-path` <- "mysql-connector-java-5.1.43/mysql-connector-java-5.1.43-bin.jar"

sc <- spark_connect(master         = "local",
                    version        = "1.6.0",
                    hadoop_version = 2.4,
                    config         = config)

db_tbl <- sc %>%
  spark_read_jdbc(sc      = .,
                  name    = "table_name",  
                  options = list(url      = "jdbc:mysql://localhost:3306/schema_name",
                                 user     = "root",
                                 password = "password",
                                 dbtable  = "table_name"))

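However, I now have a table in a MySQL database of which I would rather read only a subset into Spark.

How do I get spark_read_jdbc to accept a predicate? I tried adding the predicate to the options list, without success: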

db_tbl <- sc %>%
  spark_read_jdbc(sc      = .,
                  name    = "table_name",  
                  options = list(url      = "jdbc:mysql://localhost:3306/schema_name",
                                 user       = "root",
                                 password   = "password",
                                 dbtable    = "table_name",
                                 predicates = "field > 1"))

【Question discussion】:

    Tags: r apache-spark jdbc sparklyr


    【Solution 1】:

    You can replace dbtable with a query, written as a parenthesized subquery with an alias:

    db_tbl <- sc %>%
      spark_read_jdbc(sc      = .,
                      name    = "table_name",
                      options = list(url      = "jdbc:mysql://localhost:3306/schema_name",
                                     user     = "root",
                                     password = "password",
                                     dbtable  = "(SELECT * FROM table_name WHERE field > 1) as my_query"))
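
If you need this for several tables or conditions, the subquery string can be built with a small helper. This is only a sketch: jdbc_subquery and jdbc_options are hypothetical names, not part of sparklyr, and the predicate is pasted into the SQL verbatim, so it should only ever come from trusted input. The parentheses and the alias matter because Spark places the dbtable value in a FROM clause, and MySQL requires every derived table to have an alias.

```r
# Hypothetical helper (not part of sparklyr): wrap a table name and a WHERE
# clause into an aliased subquery usable as the JDBC `dbtable` option.
jdbc_subquery <- function(table, predicate, alias = "my_query") {
  # The predicate is inserted verbatim, so only pass trusted strings.
  sprintf("(SELECT * FROM %s WHERE %s) AS %s", table, predicate, alias)
}

# Same connection details as in the question, with the subquery as dbtable.
jdbc_options <- list(
  url      = "jdbc:mysql://localhost:3306/schema_name",
  user     = "root",
  password = "password",
  dbtable  = jdbc_subquery("table_name", "field > 1")
)

# Inspect the generated subquery before handing it to Spark.
print(jdbc_options$dbtable)
```

The resulting list can then be passed as options = jdbc_options to spark_read_jdbc.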
    

    For a simple condition like this one, though, Spark should push the predicate down automatically when you filter:

    db_tbl %>% filter(field > 1)
    

    Just make sure to set:

    memory = FALSE
    

    in spark_read_jdbc, so that the whole table is not cached in Spark before the filter is applied.
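
Put together, the lazy read plus filter might look like the sketch below. read_filtered is a hypothetical wrapper introduced only for illustration, and the connection details are the placeholders from the question. Whether the filter actually reaches MySQL depends on the Spark version and on the kind of predicate, so treat the pushdown as likely rather than guaranteed.

```r
# Hypothetical wrapper: register the JDBC table without caching it,
# then filter it so Spark can forward the condition to the database.
read_filtered <- function(sc) {
  db_tbl <- sparklyr::spark_read_jdbc(
    sc      = sc,
    name    = "table_name",
    options = list(url      = "jdbc:mysql://localhost:3306/schema_name",
                   user     = "root",
                   password = "password",
                   dbtable  = "table_name"),
    memory  = FALSE  # do not load the full table into Spark memory up front
  )
  # Simple comparison taken from the question; evaluated lazily by Spark.
  dplyr::filter(db_tbl, field > 1)
}
```

Calling read_filtered(sc) with the connection from the question returns a lazy Spark table; nothing is pulled into R until you call collect() on it.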

    【Comments】:
