【Question title】: BigQuery error when using CASE statement within ON for LEFT JOIN
【Posted】: 2020-05-21 19:35:27
【Question】:

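I'm looking for some help understanding this error I'm getting in BigQuery:

LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

I'm trying to use a case statement to change which row is selected for the join, depending on a value in the left table's row. I do something similar in a few other places and it works, so part of me thinks I may be making a mistake with table aliases and column names, but I can't figure it out. Here is a minimal example of what I'm trying to do: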

WITH t1 AS (
  SELECT "milk" AS dairy,
   1 AS id,
   2 AS other_id

   UNION ALL

   SELECT "yogurt" AS dairy,
   3 AS id,
   4 AS other_id

   UNION ALL

   SELECT "cheese" AS dairy,
   5 AS id,
   6 AS other_id
),

t2 AS (
  SELECT "blue" AS color,
  1 AS id

  UNION ALL

  SELECT "red" AS color,
  4 AS id
)

SELECT
  t1.*, t2
FROM t1
LEFT JOIN t2 ON
  CASE
    WHEN t1.dairy = 'milk' THEN t1.id = t2.id
    WHEN t1.dairy = 'yogurt' THEN t1.other_id = t2.id
  END

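The result I want to see is:

As you can see in the desired result, when the value of dairy is milk, I want id from t2 to equal the id column in t1, but when the value of dairy is yogurt, I want id from t2 to equal other_id in t1.

I've been searching for an explanation but can't figure it out. I also tried the solution offered here, but got the same error, which is why I think I'm just messing something up with table names or aliases.

Please help!

Update

I was able to get rid of the error by rewriting the case statement this way: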

SELECT
  t1.*, t2
FROM t1
LEFT JOIN t2 ON
  CASE
    WHEN t1.dairy = 'milk' THEN t1.id
    WHEN t1.dairy = 'yogurt' THEN t1.other_id
  END = t2.id

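However, in my real problem I need to join a third table in a similar way. If t2.color is blue, I want to join on t2.id = t3.id, but if t2.color is red, I want to join on t2.id = t3.other_id. As soon as I do that, the same error occurs. Here is the full example of what I tried: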

WITH t1 AS (
  SELECT "milk" AS dairy,
   1 AS id,
   2 AS other_id

   UNION ALL

   SELECT "yogurt" AS dairy,
   3 AS id,
   4 AS other_id

   UNION ALL

   SELECT "cheese" AS dairy,
   5 AS id,
   6 AS other_id
),

t2 AS (
  SELECT "blue" AS color,
  1 AS id

  UNION ALL

  SELECT "red" AS color,
  4 AS id
),

t3 AS (
  SELECT "sunny" AS weather,
  1 AS id,
  10 AS other_id

  UNION ALL

  SELECT "cloudy" AS weather,
  11 AS id,
  4 AS other_id
)

SELECT
  t1.*, t2, t3
FROM t1
LEFT JOIN t2 ON
  CASE
    WHEN t1.dairy = 'milk' THEN t1.id
    WHEN t1.dairy = 'yogurt' THEN t1.other_id
  END = t2.id
LEFT JOIN t3 ON
  CASE
   WHEN t2.color = 'blue' THEN t3.id
   WHEN t2.color = 'red' THEN t3.other_id
  END = t2.id

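But now the same error occurs:

LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

If I remove the join to t3, it works fine. Here are some more pictures of the tables and the desired result, in case that helps: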

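【Comments】:

  • Just a guess, but I think BQ doesn't like conditional joins because it can't accurately estimate the cost of the query (especially if your right-hand table is partitioned/clustered). The solution provided below works, but another solution using basic principles is to do 2 separate joins, one per ID, and then use a case statement in your select statement.
  • @rtenha But why does it work with one conditional join? Something doesn't add up.
  • Your scenario with 1 conditional join (the updated version) actually fits my theory above. The join makes it clear that it definitely needs to scan t1 and t2. Your original join to t2 inside the case statement does not define how much of t2 will be scanned.
  • @rtenha But what about the one with two joins? If your theory is right, shouldn't that work too?
  • It's the same problem. t3 is inside the case statement, which makes it conditional. It can't estimate how much of t3 needs to be scanned.

Tags: google-bigquery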
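
One way to look at the difference between the working and failing queries: in the version that works, the CASE expression only reads columns from t1 and is compared to t2.id, so each side of the `=` comes from one side of the join. In the t3 join, the CASE mixes t2.color with t3 columns, so no such equality is left. The sketch below is one possible workaround under that reading, not something proposed in the thread: it reshapes t3 so the join key is a single column. The names `t3_keyed`, `match_color`, and `join_id` are hypothetical and introduced here only for illustration.

```sql
#standardSQL
WITH t2 AS (
  SELECT "blue" AS color, 1 AS id UNION ALL
  SELECT "red", 4
),
t3 AS (
  SELECT "sunny" AS weather, 1 AS id, 10 AS other_id UNION ALL
  SELECT "cloudy", 11, 4
),
-- Hypothetical helper CTE: each t3 row appears twice, once keyed by id
-- (for blue rows of t2) and once keyed by other_id (for red rows of t2).
t3_keyed AS (
  SELECT "blue" AS match_color, id AS join_id, weather, id, other_id FROM t3
  UNION ALL
  SELECT "red" AS match_color, other_id AS join_id, weather, id, other_id FROM t3
)
SELECT
  t2.*,
  k.weather,
  k.id AS t3_id,
  k.other_id AS t3_other_id
FROM t2
-- Both conditions are plain equalities between a t2 column and a t3_keyed column.
LEFT JOIN t3_keyed AS k
  ON t2.color = k.match_color
 AND t2.id = k.join_id
```

On the sample data this should pair blue with the sunny row and red with the cloudy row. The cost is that t3 is read twice, which may matter for a large table.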


【Answer 1】:

Below is for BigQuery Standard SQL

#standardSQL
SELECT *,
  ARRAY(
    SELECT AS STRUCT *  
    FROM t2 b
    WHERE b.id IN (a.id, a.other_id) 
    ORDER BY (
      CASE
        WHEN dairy IN ('milk', 'yogurt') THEN 1
        ELSE 2
      END    
    )
    LIMIT 1
  )[SAFE_OFFSET(0)] AS t2  
FROM t1 a  

If applied to the sample/dummy data from your question, the result is

Row dairy   id  other_id    t2.color    t2.id    
1   milk    1   2           blue        1    
2   yogurt  3   4           red         4    
3   cheese  5   6           

【Comments】:

    【Answer 2】:

    By breaking the join and the coalescing logic out into separate CTEs, I was able to answer your updated question with 3 tables.

    WITH t1 AS (
      SELECT "milk" AS dairy, 1 AS id, 2 AS other_id UNION ALL
      SELECT "yogurt", 3, 4 UNION ALL
      SELECT "cheese", 5, 6
    ),
    t2 AS (
      SELECT "blue" AS color, 1 AS id UNION ALL
      SELECT "red", 4
    ),
    t3 AS (
      SELECT "sunny" AS weather, 1 as id, 10 as other_id UNION ALL
      SELECT "cloudy", 11, 4
    ),
    join_t1_t2 as (
      select
        t1.*,
        case 
          when t1.dairy = 'milk' then milk.color
          when t1.dairy = 'yogurt' then yogurt.color
          else null
        end as t2_color,
        case 
          when t1.dairy = 'milk' then milk.id
          when t1.dairy = 'yogurt' then yogurt.id
          else null
        end as t2_id
      from t1
      left join t2 milk on t1.id = milk.id
      left join t2 yogurt on t1.other_id = yogurt.id
    ),
    join_t1_t2_t3 as (
      select
        join_t1_t2.*,
        case 
          when t2_color = 'blue' then blue.id
          when t2_color = 'red' then red.id
          else null
        end as t3_id,
        case 
          when t2_color = 'blue' then blue.other_id
          when t2_color = 'red' then red.other_id
          else null
        end as t3_other_id,
        case 
          when t2_color = 'blue' then blue.weather
          when t2_color = 'red' then red.weather
          else null
        end as t3_weather,
      from join_t1_t2
      left join t3 blue on t2_id = blue.id
      left join t3 red on t2_id = red.other_id
    )
    select * from join_t1_t2_t3
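
A more compact variant of the same idea is sketched below, as an assumption rather than part of the original answer: the color test is moved into each ON clause next to a plain equality, and COALESCE picks whichever t3 alias matched. The aliases `t3_blue` and `t3_red` and the output column names are hypothetical. It relies on a t2 row being either blue or red, so at most one of the two t3 joins can match for a given row.

```sql
#standardSQL
WITH t1 AS (
  SELECT "milk" AS dairy, 1 AS id, 2 AS other_id UNION ALL
  SELECT "yogurt", 3, 4 UNION ALL
  SELECT "cheese", 5, 6
),
t2 AS (
  SELECT "blue" AS color, 1 AS id UNION ALL
  SELECT "red", 4
),
t3 AS (
  SELECT "sunny" AS weather, 1 AS id, 10 AS other_id UNION ALL
  SELECT "cloudy", 11, 4
)
SELECT
  t1.*,
  t2.color AS t2_color,
  t2.id AS t2_id,
  -- Only one of the two t3 aliases can be non-NULL for a given row.
  COALESCE(t3_blue.weather, t3_red.weather) AS t3_weather,
  COALESCE(t3_blue.id, t3_red.id) AS t3_id,
  COALESCE(t3_blue.other_id, t3_red.other_id) AS t3_other_id
FROM t1
-- Same t1-to-t2 join as in the question's update, which was reported to work.
LEFT JOIN t2
  ON CASE
       WHEN t1.dairy = 'milk' THEN t1.id
       WHEN t1.dairy = 'yogurt' THEN t1.other_id
     END = t2.id
-- Each t3 join keeps a plain equality and adds the color test as an extra filter.
LEFT JOIN t3 AS t3_blue
  ON t2.id = t3_blue.id AND t2.color = 'blue'
LEFT JOIN t3 AS t3_red
  ON t2.id = t3_red.other_id AND t2.color = 'red'
```

On the sample data, milk should come back with blue and sunny, yogurt with red and cloudy, and cheese with NULLs in the t2 and t3 columns.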
    

    【Comments】:
