【发布时间】:2015-05-18 19:26:21
【问题描述】:
我正在尝试使用 Spark SQL 按日期范围查询表。例如,我正在尝试运行如下 SQL 语句:SELECT * FROM trip WHERE utc_startdate >= '2015-01-01' AND utc_startdate
SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("SparkTest")
.set("spark.executor.memory", "1g")
.set("spark.cassandra.connection.host", "localhost")
.set("spark.cassandra.connection.native.port", "9042")
.set("spark.cassandra.connection.rpc.port", "9160");
JavaSparkContext context = new JavaSparkContext(sparkConf);
JavaCassandraSQLContext sqlContext = new JavaCassandraSQLContext(context);
sqlContext.sqlContext().setKeyspace("mykeyspace");
String sql = "SELECT * FROM trip WHERE utc_startdate >= '2015-01-01' AND utc_startdate < '2015-12-31' AND deployment_id = 1 AND device_id = 1";
JavaSchemaRDD rdd = sqlContext.sql(sql);
List<Row> rows = rdd.collect(); // rows.size() is zero when I would expect it to contain numerous rows.
架构:
CREATE TABLE trip (
device_id bigint,
deployment_id bigint,
utc_startdate timestamp,
other columns....
PRIMARY KEY ((device_id, deployment_id), utc_startdate)
) WITH CLUSTERING ORDER BY (utc_startdate ASC);
任何帮助将不胜感激。
【问题讨论】:
标签: cassandra apache-spark datastax apache-spark-sql