【发布时间】:2018-05-22 22:11:45
【问题描述】:
我的架构:
- 1 个带有 8 个分区和 2 个 TPU 的 EventHub
- 1 个流分析作业
- 基于相同输入的 6 个窗口(从 1mn 到 6mn)
样本数据:
{side: 'BUY', ticker: 'MSFT', qty: 1, price: 123, tradeTimestamp: 10000000000}
{side: 'SELL', ticker: 'MSFT', qty: 1, price: 124, tradeTimestamp:1000000000}
EventHub PartitionKey 是 ticker
我想每秒发出以下数据:
(Total quantity bought / Total quantity sold) in the last minute, last 2mn, last 3mn and more
我尝试了什么:
WITH TradesWindow AS (
SELECT
windowEnd = System.Timestamp,
ticker,
side,
totalQty = SUM(qty)
FROM [Trades-Stream] TIMESTAMP BY tradeTimestamp PARTITION BY PartitionId
GROUP BY ticker, side, PartitionId, HoppingWindow(second, 60, 1)
),
TradesRatio1MN AS (
SELECT
ticker = b.ticker,
buySellRatio = b.totalQty / s.totalQty
FROM TradesWindow b /* SHOULD I PARTITION HERE TOO ? */
JOIN TradesWindow s /* SHOULD I PARTITION HERE TOO ? */
ON s.ticker = b.ticker AND s.side = 'SELL'
AND DATEDIFF(second, b, s) BETWEEN 0 AND 1
WHERE b.side = 'BUY'
)
/* .... More windows.... */
/* FINAL OUTPUT: Joining all the windows */
SELECT
buySellRatio1MN = bs1.buySellRatio,
buySellRatio2MN = bs2.buySellRatio
/* more windows */
INTO [output]
FROM buySellRatio1MN bs1 /* SHOULD I PARTITION HERE TOO ? */
JOIN buySellRatio2MN bs2 /* SHOULD I PARTITION HERE TOO ? */
ON bs2.ticker = bs1.ticker
AND DATEDIFF(second, bs1, bs2) BETWEEN 0 AND 1
问题:
- 这需要6个EventHub消费者组(每个只能有5个读者),为什么?我的输入没有 5x6 SELECT 语句,那为什么?
- 输出似乎不一致(我不知道我的 JOIN 是否正确)。
- 有时作业根本不输出(可能是一些分区问题?请参阅代码中关于分区的 cmets)
简而言之,有没有更好的方法来实现这一点?我在文档和示例中找不到任何关于拥有多个窗口并加入它们然后仅从 1 个输入加入先前加入的结果的示例。
【问题讨论】:
标签: azure azure-eventhub azure-stream-analytics complex-event-processing