【Question title】: Spark-Phoenix connection: issue with a SQL query that filters on a date column
【Posted】: 2018-01-23 11:46:57
【Question】:

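Following the recommendation on the official Phoenix website, I connected to Phoenix from Spark. Simple SELECT queries work fine, but when I run a query with a filter on a date column, I get an error.

Here is the sample code: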

Map<String, String> map = new HashMap<>();
map.put("zkUrl", ZOOKEEPER_URL);
map.put("table", "TABLE_1");
// Load the Phoenix table as a DataFrame and expose it to Spark SQL
Dataset<Row> df = sparkSession.sqlContext().load("org.apache.phoenix.spark", map);
df.registerTempTable("TABLE_1");
// This query works without any error
Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT COUNT(1) AS ROW_COUNT "
    + "FROM TABLE_1 WHERE TEXT_COLUMN_1 = 'ABC'");

But when I run a query with a filter on the date column, it fails:

Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT * FROM TABLE_1 WHERE "
    + "DATE_COLUMN_1 BETWEEN to_date('2015-01-02') AND to_date('2016-12-30')");

I tried several other ways of supplying the dates, such as the following:

Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT * FROM TABLE_1 WHERE "
    + "DATE_COLUMN_1 BETWEEN cast('2015-01-02' as date) AND cast('2015-01-02' as date)");

Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT * FROM TABLE_1 WHERE "
    + "DATE_COLUMN_1 <= cast('2015-01-02' as date) AND DATE_COLUMN_1 >= cast('2015-01-02' as date)");

Error message:

18/01/23 17:05:26 INFO PhoenixInputFormat: UseSelectColumns=true, selectColumnList.size()=1, selectColumnList=DATE_COLUMN_1 
18/01/23 17:05:26 INFO PhoenixInputFormat: Select Statement: SELECT "DATE_COLUMN_1" FROM TABLE_1 WHERE ( DATE_COLUMN_1 IS NOT NULL AND DATE_COLUMN_1 >= 2015-01-02 AND DATE_COLUMN_1 <= 2016-12-30)
18/01/23 17:05:26 ERROR PhoenixInputFormat: Failed to get the query plan with error [ERROR 203 (22005): Type mismatch. DATE and BIGINT for DATE_COLUMN_1 >= 2012]
Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#1671L])
   +- *Project
      +- *Filter ((isnotnull(DATE_COLUMN_1#997) && (DATE_COLUMN_1#997 >= 16437)) && (DATE_COLUMN_1#997 <= 17165))
         +- *Scan PhoenixRelation(TABLE_1,localhost:2181,false) [DATE_COLUMN_1#997] PushedFilters: [IsNotNull(DATE_COLUMN_1), GreaterThanOrEqual(DATE_COLUMN_1,2015-01-02), LessThanOrEqual(TR..., ReadSchema: struct<>
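
The numbers in this log can be checked by hand. The following is a minimal sketch, assuming the pushed-down filter reaches Phoenix as the unquoted text `2015-01-02`, as the "Select Statement" line shows. Read as integer subtraction, that text is 2015 - 1 - 2 = 2012, which matches the BIGINT in the type-mismatch error. The values 16437 and 17165 in the Spark plan are the same two dates expressed as days since 1970-01-01. The class `DateLiteralCheck` and its methods are names made up for illustration; they are not part of Spark or Phoenix.

```java
import java.time.LocalDate;

// Hypothetical helper class for illustration only; not part of Spark or Phoenix.
class DateLiteralCheck {

    // Days since 1970-01-01, which is how the Spark plan prints a DATE constant.
    public static long epochDay(String isoDate) {
        return LocalDate.parse(isoDate).toEpochDay();
    }

    // Value of an unquoted yyyy-MM-dd token when read as integer subtraction.
    public static long asSubtraction(String isoDate) {
        String[] parts = isoDate.split("-");
        return Long.parseLong(parts[0])
                - Long.parseLong(parts[1])
                - Long.parseLong(parts[2]);
    }

    public static void main(String[] args) {
        System.out.println(epochDay("2015-01-02"));      // 16437
        System.out.println(epochDay("2016-12-30"));      // 17165
        System.out.println(asSubtraction("2015-01-02")); // 2012
    }
}
```

If this reading is right, the date value loses its quotes somewhere between the Spark filter and the Phoenix SELECT statement, so Phoenix compares a DATE column against a number.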

【Comments】:

    Tags: java apache-spark hbase apache-spark-sql phoenix


    【Solution 1】:

    It worked after I removed the to_date() function and passed the dates as plain string literals.

    Dataset<Row> selectResult = df.sparkSession().sql(
        "SELECT * FROM TABLE_1 WHERE "
        + "DATE_COLUMN_1 BETWEEN '2015-01-02' AND '2016-12-30'");
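
As a minimal sketch of the approach in this answer, the bounds can be validated in Java and then embedded as quoted string literals, so that nothing in the query calls to_date() or cast(). The class `DateRangeQuery` and its `between` method are hypothetical names introduced here; the table and column names are the ones from the question.

```java
import java.time.LocalDate;

// Hypothetical helper for illustration only; not part of Spark or Phoenix.
class DateRangeQuery {

    // Builds the BETWEEN query with quoted ISO date strings.
    // LocalDate.parse rejects anything that is not a valid yyyy-MM-dd date,
    // which also keeps stray quotes out of the generated SQL text.
    public static String between(String table, String column, String from, String to) {
        LocalDate start = LocalDate.parse(from);
        LocalDate end = LocalDate.parse(to);
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end date is before start date");
        }
        return "SELECT * FROM " + table
                + " WHERE " + column
                + " BETWEEN '" + start + "' AND '" + end + "'";
    }

    public static void main(String[] args) {
        System.out.println(between("TABLE_1", "DATE_COLUMN_1", "2015-01-02", "2016-12-30"));
    }
}
```

The returned string can then be passed to df.sparkSession().sql(...) in the same way as the query in the answer.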
    

    【Comments】:
