[Question title]: Remove duplicates based on Timestamp Kusto Query
[Posted]: 2020-11-24 23:37:59
[Question description]:

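I have two tables in Kusto like the ones below. I am trying to join them on name/username while keeping the rows from the second table even when the first table has no match. I also want to remove duplicates from the second table based on timestamp when the username and email are the same (in that case, I keep the latest information, i.e. the row with the latest timestamp).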

Table 1

Name | pets | color | city
A    | A1   | blue  | NYC
A    | A2   | blue  | NYC
A    | A3   | blue  | NYC
B    | B1   | red   | Boston
C    | C1   | yellow| Miami
C    | C2   | yellow| Miami

Table 2

username | email          | school   | timestamp
A        | a@whatever.com | schoolA  | 10pm
B        | b@whatever.com | schoolB1 | 10pm
B        | b@whatever.com | schoolB2 | 11pm
C        | c@whatever.com | schoolC  | 9pm
D        | d@whatever.com | schoolD  | 11pm
E        | e@whatever.com | schoolE  | 10pm

Table results I want

name | pets | color  | city  | email          | school   | timestamp
A    | A1   | blue   | NYC   | a@whatever.com | schoolA  | 10pm
A    | A2   | blue   | NYC   | a@whatever.com | schoolA  | 10pm
A    | A3   | blue   | NYC   | a@whatever.com | schoolA  | 10pm
B    | B1   | red    | Boston| b@whatever.com | schoolB2 | 11pm
C    | C1   | yellow | Miami | c@whatever.com | schoolC  | 9pm
C    | C2   | yellow | Miami | c@whatever.com | schoolC  | 9pm
D    |      |        |       | d@whatever.com | schoolD  | 11pm
E    |      |        |       | e@whatever.com | schoolE  | 10pm

[Question comments]:

    Tags: database relational-database azure-data-explorer kql


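    [Solution 1]:

    If I understand correctly, the query below should work.

    It uses:

    • arg_max() (aggregation function): "remove duplicates from the second table based on timestamp when the username and email are the same (in that case, I keep the latest information, i.e. the row with the latest timestamp)"
    • Right outer-join flavor: "keep rows from the second table even when the first table has no match"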
    let T1 = datatable(name:string, pets:string, color:string, city:string)
    [
        "A", "A1", "blue",   "NYC",
        "A", "A2", "blue",   "NYC",
        "A", "A3", "blue",   "NYC",
        "B", "B1", "red",    "Boston",
        "C", "C1", "yellow", "Miami",
        "C", "C2", "yellow", "Miami",
    ]
    ;
    let T2 = datatable(username:string, email:string, school:string, timestamp:datetime)
    [
        "A", "a@whatever.com", "schoolA",  datetime(2020-11-24 22:00),
        "B", "b@whatever.com", "schoolB1", datetime(2020-11-24 22:00),
        "B", "b@whatever.com", "schoolB2", datetime(2020-11-24 23:00),
        "C", "c@whatever.com", "schoolC",  datetime(2020-11-24 21:00),
        "D", "d@whatever.com", "schoolD",  datetime(2020-11-24 23:00),
        "E", "e@whatever.com", "schoolE",  datetime(2020-11-24 22:00),
    ]
    ;
    T1
    | join kind=rightouter (
        T2
        | summarize arg_max(timestamp, *) by username, email
    ) on $left.name == $right.username
    | project name = username, pets, color, city, email, school, timestamp
    | order by name asc, pets asc
    
    | name | pets | color  | city   | email          | school   | timestamp                   |
    |------|------|--------|--------|----------------|----------|-----------------------------|
    | A    | A1   | blue   | NYC    | a@whatever.com | schoolA  | 2020-11-24 22:00:00.0000000 |
    | A    | A2   | blue   | NYC    | a@whatever.com | schoolA  | 2020-11-24 22:00:00.0000000 |
    | A    | A3   | blue   | NYC    | a@whatever.com | schoolA  | 2020-11-24 22:00:00.0000000 |
    | B    | B1   | red    | Boston | b@whatever.com | schoolB2 | 2020-11-24 23:00:00.0000000 |
    | C    | C1   | yellow | Miami  | c@whatever.com | schoolC  | 2020-11-24 21:00:00.0000000 |
    | C    | C2   | yellow | Miami  | c@whatever.com | schoolC  | 2020-11-24 21:00:00.0000000 |
    | D    |      |        |        | d@whatever.com | schoolD  | 2020-11-24 23:00:00.0000000 |
    | E    |      |        |        | e@whatever.com | schoolE  | 2020-11-24 22:00:00.0000000 |
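
    To see the deduplication step on its own, the summarize with arg_max() can be run against T2 alone, before any join. The following is a minimal sketch that reuses a subset of the sample rows from the query above; it only isolates that one step and is not a different method.

    ```kusto
    // Subset of the T2 sample data: user B has two rows with the same email.
    let T2 = datatable(username:string, email:string, school:string, timestamp:datetime)
    [
        "B", "b@whatever.com", "schoolB1", datetime(2020-11-24 22:00),
        "B", "b@whatever.com", "schoolB2", datetime(2020-11-24 23:00),
        "C", "c@whatever.com", "schoolC",  datetime(2020-11-24 21:00),
    ]
    ;
    T2
    // One row per (username, email) group: the row with the largest timestamp.
    // The * carries along the remaining columns (here: school) from that same row.
    | summarize arg_max(timestamp, *) by username, email
    ```

    For user B, this should keep only the schoolB2 row, since its 23:00 timestamp is the later of the two. Because the deduplication runs inside the right-hand side of the join, each username matches at most one T2 row per email, so the join does not multiply the T1 rows.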
    
