【Question title】: Join tables based on geometry and nearest timestamp BigQuery SQL
【Posted】: 2020-06-23 14:06:29
【Question description】:

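I have two tables: df1 contains Date1 (timestamp) and PolygonWKT (geometry), and df2 contains Date2 (timestamp) and PointWKT (geometry). I joined df1 and df2 on geometry, so each PointWKT falls inside its corresponding PolygonWKT. The problem is that the Date1 and Date2 columns end up mismatched, and I also need Date1 and Date2 to match. I want to join the tables based on geometry as well as on the closest timestamp match between Date1 and Date2.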

df2
| PointWKT |          Date2          |
--------------------------------------
|    b     | 2020-05-05 12:00:00 UTC |
|    b     | 2020-05-05 12:00:10 UTC |
|    b     | 2020-05-05 12:00:20 UTC |
|    b     | 2020-05-05 12:17:00 UTC |
|    c     | 2020-05-06 18:00:00 UTC |


df1
| PolygonWKT |          Date1          |
----------------------------------------
|     A      | 2020-05-03 9:00:00 UTC  |
|     A      | 2020-05-03 9:30:10 UTC  |
|     B      | 2020-05-05 12:05:00 UTC |
|     B      | 2020-05-05 12:25:00 UTC |
|     C      | 2020-05-06 18:05:00 UTC |

The first part of the code is correct, but the second part does not return what I want:

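    SELECT *
    FROM `xxx.yyy.df1` AS df1,
      `xxx.yyy.df2` AS df2
    -- First part: spatial match (works as intended)
    WHERE ST_CONTAINS(df1.PolygonWKT, df2.PointWKT)
      -- Second part: fixed 10-minute window (does not give the desired result)
      AND (
        df2.Date2 BETWEEN df1.Date1 AND TIMESTAMP_ADD(df1.Date1, INTERVAL 10 MINUTE)
      )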



    
Desired df:

| PointWKT |          Date2          | PolygonWKT |          Date1          |
-----------------------------------------------------------------------------
|    b     | 2020-05-05 12:00:00 UTC |     B      | 2020-05-05 12:05:00 UTC |
|    b     | 2020-05-05 12:00:10 UTC |     B      | 2020-05-05 12:05:00 UTC |
|    b     | 2020-05-05 12:00:20 UTC |     B      | 2020-05-05 12:05:00 UTC |
|    b     | 2020-05-05 12:17:00 UTC |     B      | 2020-05-05 12:25:00 UTC |
|    c     | 2020-05-06 18:00:00 UTC |     C      | 2020-05-06 18:05:00 UTC |

What is the correct way to do this?

【Question discussion】:

Tags: sql google-bigquery


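【Solution 1】:

> I want to join the tables based on geometry as well as on the closest timestamp match between Date1 and Date2.

Below is for BigQuery Standard SQL. For each (PointWKT, Date2) row it collects all spatially matching df1 rows, orders them by the absolute time difference, and keeps only the closest one: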

    SELECT  
      ARRAY_AGG(STRUCT(df2.PointWKT, df2.Date2, df1.PolygonWKT, df1.Date1) 
        ORDER BY ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND)) 
        LIMIT 1)[OFFSET(0)].*
    FROM `xxx.yyy.df1` AS df1 ,
      `xxx.yyy.df2` AS df2
    WHERE ST_CONTAINS(df1.PolygonWKT, df2.PointWKT) 
    GROUP BY TO_JSON_STRING(STRUCT(df2.PointWKT, df2.Date2))
    

If applied to sample data similar to that in your example:

    WITH `xxx.yyy.df1` AS (
      SELECT ST_GEOGPOINT(1,2) PolygonWKT, TIMESTAMP '2020-05-03 9:00:00 UTC' Date1 UNION ALL
      SELECT ST_GEOGPOINT(1,2), '2020-05-03 9:30:10 UTC' UNION ALL
      SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:05:00 UTC' UNION ALL
      SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:25:00 UTC' UNION ALL
      SELECT ST_GEOGPOINT(1,4), '2020-05-06 18:05:00 UTC' 
    ), `xxx.yyy.df2` AS (
      SELECT ST_GEOGPOINT(1,3) PointWKT, TIMESTAMP '2020-05-05 12:00:00 UTC' Date2 UNION ALL
      SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:00:10 UTC' UNION ALL
      SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:00:20 UTC' UNION ALL
      SELECT ST_GEOGPOINT(1,3), '2020-05-05 12:17:00 UTC' UNION ALL /* this value adjusted based on the expected result sample, as it looks like a typo */
      SELECT ST_GEOGPOINT(1,4), '2020-05-06 18:00:00 UTC' 
    )
    -- Same query as above, run against the sample data
    SELECT  
      ARRAY_AGG(STRUCT(df2.PointWKT, df2.Date2, df1.PolygonWKT, df1.Date1) 
        ORDER BY ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND)) 
        LIMIT 1)[OFFSET(0)].*
    FROM `xxx.yyy.df1` AS df1 ,
      `xxx.yyy.df2` AS df2
    WHERE ST_CONTAINS(df1.PolygonWKT, df2.PointWKT) 
    GROUP BY TO_JSON_STRING(STRUCT(df2.PointWKT, df2.Date2))
    

The output is:

    Row PointWKT    Date2                   PolygonWKT  Date1    
    1   POINT(1 3)  2020-05-05 12:00:00 UTC POINT(1 3)  2020-05-05 12:05:00 UTC  
    2   POINT(1 3)  2020-05-05 12:00:10 UTC POINT(1 3)  2020-05-05 12:05:00 UTC  
    3   POINT(1 3)  2020-05-05 12:00:20 UTC POINT(1 3)  2020-05-05 12:05:00 UTC  
    4   POINT(1 3)  2020-05-05 12:17:00 UTC POINT(1 3)  2020-05-05 12:25:00 UTC  
    5   POINT(1 4)  2020-05-06 18:00:00 UTC POINT(1 4)  2020-05-06 18:05:00 UTC 
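
The same "closest timestamp per point" idea can also be written with a window function instead of `ARRAY_AGG ... LIMIT 1`. The following is only a sketch under stated assumptions, not part of the original answer: the inline `df1`/`df2` CTEs, the square polygon, and the `ranked`/`rn` names are made up for illustration, and `ST_ASTEXT` is used as the partition key because GEOGRAPHY values cannot be used for grouping or partitioning directly.

```sql
-- Hypothetical sample data: one polygon with two timestamps, one point with two timestamps.
WITH df1 AS (
  SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))') AS PolygonWKT,
         TIMESTAMP '2020-05-05 12:05:00 UTC' AS Date1 UNION ALL
  SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
         TIMESTAMP '2020-05-05 12:25:00 UTC'
), df2 AS (
  SELECT ST_GEOGPOINT(1, 1) AS PointWKT,
         TIMESTAMP '2020-05-05 12:00:00 UTC' AS Date2 UNION ALL
  SELECT ST_GEOGPOINT(1, 1),
         TIMESTAMP '2020-05-05 12:17:00 UTC'
), ranked AS (
  SELECT
    df2.PointWKT, df2.Date2, df1.PolygonWKT, df1.Date1,
    -- Rank the candidate polygons of each (point, Date2) row by time distance.
    ROW_NUMBER() OVER (
      PARTITION BY ST_ASTEXT(df2.PointWKT), df2.Date2
      ORDER BY ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND))
    ) AS rn
  FROM df1
  JOIN df2
    ON ST_CONTAINS(df1.PolygonWKT, df2.PointWKT)
)
-- Keep only the closest candidate per (point, Date2) row.
SELECT * EXCEPT (rn)
FROM ranked
WHERE rn = 1
ORDER BY Date2
```

With this sample data, the 12:00:00 point row pairs with the 12:05:00 polygon row (300 s apart versus 1500 s), and the 12:17:00 point row pairs with the 12:25:00 polygon row (480 s versus 720 s). Ties are broken arbitrarily unless a further `ORDER BY` key is added.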
    

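【Discussion】:

• Thank you very much! I get an error on the GROUP BY line: [6:10] Grouping by expressions of type GEOGRAPHY is not allowed. Any suggestions on how to handle this?
• See the update, which handles the GEOGRAPHY type correctly. The original example used strings, so I missed this part. Fixed now.
• Awesome! Works perfectly, thank you very much!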
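
The GEOGRAPHY grouping error mentioned in the comments can be worked around by grouping on a string form of the geography. The answer above does this with `TO_JSON_STRING`; the sketch below shows the same idea with `ST_ASTEXT`, which returns the WKT text. The `pts` CTE and the `point_key` alias are hypothetical names used only for this illustration.

```sql
-- Hypothetical sample points; two of them are identical.
WITH pts AS (
  SELECT ST_GEOGPOINT(1, 3) AS PointWKT UNION ALL
  SELECT ST_GEOGPOINT(1, 3) UNION ALL
  SELECT ST_GEOGPOINT(1, 4)
)
SELECT
  -- GROUP BY PointWKT would be rejected; the WKT string is groupable.
  ST_ASTEXT(PointWKT) AS point_key,
  COUNT(*) AS n
FROM pts
GROUP BY point_key
ORDER BY point_key
```

Here the two identical points collapse into one group with `n = 2`, and the third point forms its own group.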
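【Solution 2】:

Based on your sample data, you are comparing the dates in the wrong order: Date1 comes after Date2, not before it. Does this do what you want?

    df1.Date1 BETWEEN df2.Date2 AND TIMESTAMP_ADD(df2.Date2, INTERVAL 10 MINUTE)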
    

【Discussion】:

• I think the whole concept of this line is wrong for my purposes. I think Mikhail's answer will help.
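
If a maximum time gap is still wanted on top of the nearest match, the two ideas can be combined: filter candidates to a window first, then keep the closest one. This is a sketch only, not something either answer proposed; the 600-second limit, the inline CTEs, and the square polygon are assumptions chosen for illustration.

```sql
-- Hypothetical sample data: the second point row has no polygon row within 10 minutes.
WITH df1 AS (
  SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))') AS PolygonWKT,
         TIMESTAMP '2020-05-05 12:05:00 UTC' AS Date1
), df2 AS (
  SELECT ST_GEOGPOINT(1, 1) AS PointWKT,
         TIMESTAMP '2020-05-05 12:00:00 UTC' AS Date2 UNION ALL
  SELECT ST_GEOGPOINT(1, 1),
         TIMESTAMP '2020-05-05 13:00:00 UTC'
)
SELECT
  ARRAY_AGG(STRUCT(df2.PointWKT, df2.Date2, df1.PolygonWKT, df1.Date1)
    ORDER BY ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND))
    LIMIT 1)[OFFSET(0)].*
FROM df1, df2
WHERE ST_CONTAINS(df1.PolygonWKT, df2.PointWKT)
  -- Assumed limit: ignore candidates more than 10 minutes (600 s) away in either direction.
  AND ABS(TIMESTAMP_DIFF(df2.Date2, df1.Date1, SECOND)) <= 600
GROUP BY TO_JSON_STRING(STRUCT(df2.PointWKT, df2.Date2))
```

Because this is an inner join, point rows with no candidate inside the window (the 13:00:00 row here) drop out of the result entirely; a `LEFT JOIN` from df2 would be needed to keep them with NULL polygon columns.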