【Question title】: How can I get a slice of an array in BigQuery Standard SQL?
【Posted】: 2019-02-22 21:06:38
【Question】:

In BigQuery, I have a table with a path column, like this:

ID       | Path
---------+----------------------------------------
1        | foo/bar/baz
2        | foo/bar/quux/blat

I'd like to be able to split the path on forward slashes (/), select one or more of the path parts, and then re-join them.

In PostgreSQL, this is easy:

select array_to_string((regexp_split_to_array(path, '/'))[1:3], '/')

But BigQuery doesn't seem to have any range offset or array slice functionality.

【Question comments】:

    Tags: google-bigquery


    【Answer 1】:

    Below is for BigQuery Standard SQL

    #standardSQL
    SELECT id, path,
      (
        -- Split the path, keep the parts at 0-based offsets 1..3, and re-join them in order
        SELECT STRING_AGG(part, '/' ORDER BY index)
        FROM UNNEST(SPLIT(path, '/')) part WITH OFFSET index
        WHERE index BETWEEN 1 AND 3
      ) adjusted_path
    FROM `project.dataset.table`
    

    You can test it using sample data from your question, as in the example below

    #standardSQL
    WITH `project.dataset.table` AS (
      SELECT 1 id, 'foo/bar/baz/foo1/bar1/baz1/' path UNION ALL
      SELECT 2, 'foo/bar/quux/blat/foo2/bar2/quux2/blat2' 
    )
    SELECT id, path,
      (
        SELECT STRING_AGG(part, '/' ORDER BY index) 
        FROM UNNEST(SPLIT(path, '/')) part WITH OFFSET index 
        WHERE index BETWEEN 1 AND 3
      ) adjusted_path
    FROM `project.dataset.table`   
    

    with result

    Row     id      path                                        adjusted_path    
    1       1       foo/bar/baz/foo1/bar1/baz1/                 bar/baz/foo1     
    2       2       foo/bar/quux/blat/foo2/bar2/quux2/blat2     bar/quux/blat    
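
One caveat on indexing, offered as a sketch rather than as part of the original answer: WITH OFFSET counts from 0, whereas PostgreSQL arrays are 1-based by default, so BETWEEN 1 AND 3 above returns the second through fourth parts. Assuming you want what PostgreSQL's [1:3] returns (the first three parts), shift both bounds down by one. The `paths` CTE below is a stand-in that holds the two rows from the question.

```sql
#standardSQL
WITH paths AS (
  SELECT 1 AS id, 'foo/bar/baz' AS path UNION ALL
  SELECT 2, 'foo/bar/quux/blat'
)
SELECT id, path,
  (
    -- Offsets 0..2 are the first three parts, matching PostgreSQL's 1-based [1:3]
    SELECT STRING_AGG(part, '/' ORDER BY index)
    FROM UNNEST(SPLIT(path, '/')) part WITH OFFSET index
    WHERE index BETWEEN 0 AND 2
  ) adjusted_path
FROM paths
```

With these bounds the two rows should come back as foo/bar/baz and foo/bar/quux, the same as the PostgreSQL expression in the question.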
    

    If for some reason you want to keep your query "inline", similar to what you used in PostgreSQL (array_to_string((regexp_split_to_array(path, '/'))[1:3], '/')), you can introduce a SQL UDF (let's name it ARRAY_SLICE), as in the example below

    #standardSQL
    CREATE TEMP FUNCTION ARRAY_SLICE(arr ARRAY<STRING>, start INT64, finish INT64)
    RETURNS ARRAY<STRING> AS (
      ARRAY(
        SELECT part FROM UNNEST(arr) part WITH OFFSET index 
        WHERE index BETWEEN start AND finish ORDER BY index
      )
    );
    SELECT id, path, 
      ARRAY_TO_STRING(ARRAY_SLICE(SPLIT(path, '/'), 1, 3), '/') adjusted_path
    FROM `project.dataset.table`  
    

    Obviously, when applied to the same sample data, you get the same result

    #standardSQL
    CREATE TEMP FUNCTION ARRAY_SLICE(arr ARRAY<STRING>, start INT64, finish INT64)
    RETURNS ARRAY<STRING> AS (
      ARRAY(
        SELECT part FROM UNNEST(arr) part WITH OFFSET index 
        WHERE index BETWEEN start AND finish ORDER BY index
      )
    );
    WITH `project.dataset.table` AS (
      SELECT 1 id, 'foo/bar/baz/foo1/bar1/baz1/' path UNION ALL
      SELECT 2, 'foo/bar/quux/blat/foo2/bar2/quux2/blat2' 
    )
    SELECT id, path, 
      ARRAY_TO_STRING(ARRAY_SLICE(SPLIT(path, '/'), 1, 3), '/') adjusted_path
    FROM `project.dataset.table`   
    
    Row     id      path                                        adjusted_path    
    1       1       foo/bar/baz/foo1/bar1/baz1/                 bar/baz/foo1     
    2       2       foo/bar/quux/blat/foo2/bar2/quux2/blat2     bar/quux/blat    
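
ARRAY_SLICE as written only accepts ARRAY<STRING>. If you also need to slice arrays of other element types, one possible variation is a templated SQL UDF parameter (ANY TYPE). This is a minimal sketch under that assumption: ARRAY_SLICE_ANY is a hypothetical name, not a built-in, and the RETURNS clause is left out so the result type is inferred from the argument.

```sql
#standardSQL
-- Hypothetical helper: same 0-based, inclusive bounds as ARRAY_SLICE above,
-- but the element type follows whatever array is passed in
CREATE TEMP FUNCTION ARRAY_SLICE_ANY(arr ANY TYPE, start INT64, finish INT64) AS (
  ARRAY(
    SELECT el FROM UNNEST(arr) el WITH OFFSET idx
    WHERE idx BETWEEN start AND finish ORDER BY idx
  )
);
SELECT
  ARRAY_SLICE_ANY([10, 20, 30, 40, 50], 1, 3) AS int_slice,
  ARRAY_TO_STRING(ARRAY_SLICE_ANY(SPLIT('foo/bar/quux/blat', '/'), 0, 2), '/') AS path_slice
```

The bounds stay inclusive on both ends, so a slice of (1, 3) keeps three elements, and a finish beyond the end of the array simply returns whatever elements exist.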
    

    【Comments】:

    • Perfect. Thanks!