Handle duplicate data in Kusto: answers

【Question title】: Handle duplicate data in Kusto
【Posted】: 2021-09-11 20:40:38
【Question description】:

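Problem description:

When source_table receives data, an update policy in Kusto runs to store the data in end_point_table. The update function should handle duplicates and store only new data in end_point_table. That is, if the data arriving from source_table is identical to data already in end_point_table, nothing should be stored.

What I did:

end_point_table already contains data: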

.ingest inline into table end_point_table <|
1,2020-01-01T12:00:00Z,property,128

I have a source table named source_table, into which I ingest data as follows:

.ingest inline into table source_table <|
1,2020-01-01T12:00:00Z,128

.ingest inline into table source_table <|
1,2020-01-01T12:00:00Z,property,128

The function below is then triggered automatically:

let _incoming = (
    source_table
    | where property == "property"
    | project device_id, timestamp, value
    | distinct *
);
let _old_data = (
    end_point_table
);

_incoming
| join kind = leftouter (
    _old_data
    | summarize arg_max(timestamp, *) by device_id
) on device_id
| where (
    timestamp != timestamp1
    or value != value1
)
| project device_id, timestamp, value

Result: when I query the data after ingestion, I get three rows instead of one, like this:

1,2020-01-01T12:00:00Z,property,128
1,2020-01-01T12:00:00Z,property,128
1,2020-01-01T12:00:00Z,property,128

Question:

Is there a way to avoid ingesting duplicate data into end_point_table? Or am I using the update policy incorrectly?

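【Question comments】:

Tags: azure azure-data-explorer

【Solution 1】:

An update policy is not the right way to solve this problem.

There are several proper ways to handle deduplication; you can read about them here: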

https://docs.microsoft.com/en-us/azure/data-explorer/dealing-with-duplicates
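
As one illustration of the kind of approach that page covers, deduplication can be delegated to a materialized view over source_table instead of an update policy. The sketch below is only an assumption-laden example, not the answerer's own recommendation: the view name end_point_dedup is hypothetical, and it assumes source_table has the columns device_id, timestamp, property, and value that appear in the question.

```kusto
.create materialized-view end_point_dedup on table source_table
{
    // Assumed schema: device_id, timestamp, property, value.
    // end_point_dedup is a hypothetical name chosen for this example.
    source_table
    | where property == "property"
    // Keep a single arbitrary row per (device_id, timestamp, value) combination.
    | summarize take_any(*) by device_id, timestamp, value
}
```

Querying end_point_dedup by name should then return one row per distinct combination, whereas materialized_view("end_point_dedup") returns only the part that has already been materialized. If storing duplicates is acceptable, a query-time alternative is to leave the table as is and apply `distinct device_id, timestamp, value` when reading.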

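【Comments】:

Of course, I checked the documentation before posting the question.
Could you explain why none of the approaches described there work for you, and why you are looking for a different solution?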