[Posted]: 2018-12-05 16:03:30
[Question]:
I want to read data from a Kafka topic and create a Spark temp view so that I can group by some of the columns. The data looks like this:
+----+--------------------+
| key|               value|
+----+--------------------+
|null|{"e":"trade","E":...|
|null|{"e":"trade","E":...|
|null|{"e":"trade","E":...|
+----+--------------------+
But I can't aggregate the data from the temp view. Is it because the value column is stored as a String?
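On the String question: the Kafka source always exposes `key` and `value` as binary columns, and even after a cast `value` is a single JSON string, so SQL has no column named `e` until the JSON is parsed. A minimal sketch using Spark's `from_json` with an explicit schema is shown below. The class name `TradeJson`, the one-field schema and the sample messages are assumptions for illustration, and the static Dataset only stands in for the Kafka stream; the same transformation can be applied to the streaming Dataset.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import java.util.Arrays;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

public class TradeJson {

    // Assumed schema: only the "e" field from the sample output is declared.
    // "E" is left out on purpose: with Spark's default case-insensitive
    // resolution, fields "e" and "E" in one struct can make "e" ambiguous.
    static final StructType TRADE_SCHEMA =
            new StructType().add("e", DataTypes.StringType);

    // Turns a Kafka-style "value" column into typed top-level columns.
    static Dataset<Row> parseTrades(Dataset<Row> raw) {
        return raw
                .selectExpr("CAST(value AS STRING) AS json")                // bytes -> JSON text
                .select(from_json(col("json"), TRADE_SCHEMA).alias("data")) // text -> struct
                .select("data.*");                                          // struct -> columns
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("TradeJson")
                .getOrCreate();

        // Hypothetical sample messages standing in for the Kafka "value" column
        Dataset<Row> raw = spark.createDataset(
                        Arrays.asList("{\"e\":\"trade\"}", "{\"e\":\"trade\"}", "{\"e\":\"aggTrade\"}"),
                        Encoders.STRING())
                .toDF("value");

        // After parsing, "e" is an ordinary column and can be grouped on
        parseTrades(raw).groupBy("e").count().show();
        spark.stop();
    }
}
```

Once `value` has been parsed this way, the resulting Dataset (not the raw one) is what would be registered as `Tempdata`, so `SELECT e ... GROUP BY e` can resolve `e`.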
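Dataset<Row> data = spark
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092,localhost:9093")
        .option("subscribe", "data2-topic")
        .option("startingOffsets", "latest")
        // Only options prefixed with "kafka." reach the consumer; these three are ignored
        .option("group.id", "test")
        .option("enable.auto.commit", "true")
        .option("auto.commit.interval.ms", "1000")
        .load();
// Kafka delivers key and value as binary. selectExpr returns a new Dataset,
// but the result is not assigned here, so "data" still has the binary columns
data.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
data.createOrReplaceTempView("Tempdata");
// show() is not allowed on a streaming Dataset; it must be run via writeStream().start()
data.show();
// "e" is a field inside the JSON text of "value", not a column of Tempdata
Dataset<Row> df2 = spark.sql("SELECT e FROM Tempdata GROUP BY e");
df2.show();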
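A corrected end-to-end sketch of the code above, assuming Spark 2.x/3.x with the `spark-sql-kafka-0-10` package on the classpath and a JSON field `e` in every message. It keeps the `selectExpr` result, drops the unprefixed consumer options (the Kafka source tracks offsets itself in the query checkpoint), extracts `e` from the JSON string with Spark SQL's `get_json_object` inside the temp-view query, and replaces `show()` with `writeStream()`, because a streaming Dataset only runs once a query is started. The class name `KafkaTradeCounts` and the helper `countByEventType` are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaTradeCounts {

    // Hypothetical helper: registers the temp view and runs the grouping query.
    // It behaves the same on a streaming or a static Dataset with key/value columns.
    static Dataset<Row> countByEventType(SparkSession spark, Dataset<Row> raw) {
        // selectExpr returns a NEW Dataset, so the cast result must be kept
        Dataset<Row> strings = raw.selectExpr(
                "CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value");
        strings.createOrReplaceTempView("Tempdata");
        // get_json_object pulls the "e" field out of the JSON text before grouping
        return spark.sql(
                "SELECT e, COUNT(*) AS cnt "
                + "FROM (SELECT get_json_object(value, '$.e') AS e FROM Tempdata) t "
                + "GROUP BY e");
    }

    public static void main(String[] args) throws Exception {
        // Master URL is expected to come from spark-submit
        SparkSession spark = SparkSession.builder()
                .appName("KafkaTradeCounts")
                .getOrCreate();

        // group.id / auto-commit options removed: the source manages offsets itself
        Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092,localhost:9093")
                .option("subscribe", "data2-topic")
                .option("startingOffsets", "latest")
                .load();

        // Streaming results cannot be show()n; start a query that prints to the console
        StreamingQuery query = countByEventType(spark, raw)
                .writeStream()
                .outputMode("complete") // aggregation without a watermark: complete or update
                .format("console")
                .start();
        query.awaitTermination();
    }
}
```

`complete` mode is used because a streaming aggregation without a watermark cannot run in `append` mode; the console sink then reprints the full count table after each micro-batch. Since `countByEventType` does not depend on the input being a stream, the SQL can also be checked against a static Dataset that has `key` and `value` columns.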
[Comments]:
Tags: apache-spark apache-kafka dataset