【发布时间】:2018-08-02 04:17:46
【问题描述】:
Spark 结构化流 (2.3) 中是否允许连接来自同一输入流数据集的两个流?
例如在下面的示例查询中,连接了两个流。 我在 Azure eventthub spark 客户端中收到 IllegalStateException。
这会起作用吗?
eventhubs = spark.readStream ... .createOrReplaceTempView("Input")
spark.sql("SELECT temperature, time, device, category FROM Input").createOrReplaceTempView("devices1")
spark.sql("SELECT temperature, time, device, category FROM Input").createOrReplaceTempView("devices2")
val d1 = spark.sql("SELECT * FROM devices1 WHERE device=0")
val d2 = spark.sql("SELECT * FROM devices2 WHERE device=1")
val output = d1.join(
d2,
expr("""
devices1.category = devices2.category AND
devices1.time >= devices2.time AND
devices1.time <= devices2.time + interval 1 seconds
"""),
joinType = "inner"
)
display(output)
【问题讨论】:
标签: apache-spark apache-spark-sql spark-streaming