[Question title]: Join columns separated by delimiter in same table
[Posted]: 2020-08-22 17:09:49
[Question description]:

I have the following dataset:

color_code   fav_color_code    color_code_name    fav_color_name 
1|2          5                 blue|white         black
3|4          7|9               green|red          pink|yellow

I need to join the first value of color_code to the first value of color_code_name, the second value of color_code to the second value of color_code_name, and so on, to get:

code                color
1                   blue
2                   white
5                   black
3                   green
4                   red
7                   pink
9                   yellow
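
The pairing asked for here is positional within each row: the n-th code goes with the n-th name from the matching column. As a minimal sketch of that intent (in Python rather than SQL, purely to model the logic; the `rows` list, `COLUMN_PAIRS`, and the `pair_codes_with_names` helper are hypothetical names, not part of the original post):

```python
# Sample rows copied from the dataset above; keys match the column names.
rows = [
    {"color_code": "1|2", "fav_color_code": "5",
     "color_code_name": "blue|white", "fav_color_name": "black"},
    {"color_code": "3|4", "fav_color_code": "7|9",
     "color_code_name": "green|red", "fav_color_name": "pink|yellow"},
]

# Each code column is paired with the name column that describes it.
COLUMN_PAIRS = [("color_code", "color_code_name"),
                ("fav_color_code", "fav_color_name")]


def pair_codes_with_names(rows):
    """Pair the n-th code with the n-th name inside each row (hypothetical helper)."""
    pairs = []
    for row in rows:
        for code_col, name_col in COLUMN_PAIRS:
            codes = row[code_col].split("|")
            names = row[name_col].split("|")
            # zip pairs by position, so no sorting or global row number is needed.
            pairs.extend(zip(codes, names))
    return pairs


for code, color in pair_codes_with_names(rows):
    print(code, color)
```

Because the position is kept per row and per column, the result does not depend on the order in which rows are processed.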

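I am using the code below, but it ends up cross joining because I have no id to join on. The code works if I map 2 columns rather than multiple columns.

Can someone help me get the expected result?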

SELECT 
        t1.code AS code, 
        t2.color AS color
      FROM 
        (
          SELECT 
            c.value :: varchar AS code, 
            row_number() over(
              order by 
                code
            ) AS rownum 
          FROM 
            table, 
            lateral flatten (
              input => split(color_code, '|')
            ) c  
          UNION 
          SELECT 
            d.value :: varchar AS code, 
            row_number() OVER(
              ORDER BY 
                code
            ) AS rownum
            FROM 
            table, 
            lateral flatten (
              input => split(fav_color_code, '|')
            ) d 
        ) t1 
        JOIN (
          SELECT 
            f.value :: varchar AS color, 
            row_number() OVER(
              ORDER BY 
                color
            ) AS rownum 
          FROM 
            table, 
            lateral flatten (
              input => split(color_code_name, '|')
            ) f 
          UNION 
          SELECT 
            g.value :: varchar AS color, 
            row_number() OVER(
              ORDER BY 
                color
            ) AS rownum 
          FROM 
            table, 
            lateral flatten (
              input => split(fav_color_name, '|')
            ) g 
        ) t2 ON (t1.rownum = t2.rownum) 
      ORDER BY 
        t1.code
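
To see why this query multiplies rows, here is a rough Python model of it. It assumes that each branch of the UNION numbers its own rows starting from 1, in sorted order of the value, which is what the query text suggests; the list names and the `numbered` helper are made up for the illustration.

```python
from collections import defaultdict

# Values produced by flattening each column of the sample data.
codes_main = ["1", "2", "3", "4"]                # from color_code
codes_fav = ["5", "7", "9"]                      # from fav_color_code
names_main = ["blue", "white", "green", "red"]   # from color_code_name
names_fav = ["black", "pink", "yellow"]          # from fav_color_name


def numbered(values):
    """Mimic row_number() OVER (ORDER BY value): sort, then number from 1."""
    return list(enumerate(sorted(values), start=1))


# UNION of the two branches: each branch restarts its numbering at 1.
t1 = numbered(codes_main) + numbered(codes_fav)
t2 = numbered(names_main) + numbered(names_fav)

# JOIN ... ON t1.rownum = t2.rownum
names_by_rownum = defaultdict(list)
for rownum, name in t2:
    names_by_rownum[rownum].append(name)

joined = [(code, name) for rownum, code in t1 for name in names_by_rownum[rownum]]

print(len(joined))                                   # more rows than the 7 expected
print([pair for pair in joined if pair[0] == "1"])   # code 1 matches two names
```

Two things go wrong in this model: row numbers repeat across the UNION branches, so one code matches several names, and sorting codes and names independently loses the original positions (code 2 lands next to green instead of white).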

[Question comments]:

    Tags: sql join union comma


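    [Solution 1]:

    For the sake of explanation, you can follow this approach in several steps, because doing it all in one step gets messy.

    Note: the solution is written in Hive (the post does not specify a SQL query engine). Most of it is standard SQL; LATERAL VIEW EXPLODE is the Hive-specific part.

    Original data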

    +--------------------+------------------------+-------------------------+------------------------+--+
    | colors.color_code  | colors.fav_color_code  | colors.color_code_name  | colors.fav_color_name  |
    +--------------------+------------------------+-------------------------+------------------------+--+
    | 1|2                | 5                      | blue|white              | black                  |
    | 3|4                | 7|9                    | green|red               | pink|yellow            |
    +--------------------+------------------------+-------------------------+------------------------+--+
    

    First, we create a temp table of color ids: we concatenate the code columns, split the result into an array, and then explode the array with a row number

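    -- One row per code, numbered in the order the exploded rows are emitted.
    -- ROW_NUMBER() OVER() has no ORDER BY, so this numbering is not guaranteed to be stable.
    CREATE TABLE tc1 AS
    SELECT ROW_NUMBER() OVER() AS rownum, CAST(color_id AS INT) as color_id
    FROM colors
    LATERAL VIEW EXPLODE(SPLIT(CONCAT(color_code,'|', fav_color_code),'\\|')) a1 AS color_id;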
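
As a rough model of what this step produces, the following Python sketch mirrors the concat, split, explode, and row-number sequence on the sample data. It assumes rows are emitted in input order, which the SQL itself does not promise; `code_columns` and `explode_with_rownum` are hypothetical names used only for this illustration.

```python
# Code columns from the two sample rows, as (color_code, fav_color_code).
code_columns = [("1|2", "5"), ("3|4", "7|9")]


def explode_with_rownum(column_pairs):
    """Concatenate both columns with '|', split, and number the pieces from 1."""
    exploded = []
    for first, second in column_pairs:
        # Models CONCAT(first, '|', second) followed by SPLIT on the pipe.
        exploded.extend(f"{first}|{second}".split("|"))
    # ROW_NUMBER() modelled as a running counter over the emitted order.
    return list(enumerate(exploded, start=1))


tc1 = [(rownum, int(code)) for rownum, code in explode_with_rownum(code_columns)]
print(tc1)
```

The same helper applied to the two name columns gives the shape of the second temp table.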
    

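    We create a second temp table with the color names, following the same approach as before, but now we concatenate the color_name columns, split the result into an array, and then explode the array with a row number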

    CREATE TABLE tc2 AS
    SELECT ROW_NUMBER() OVER() AS rownum, color_name
    FROM colors
    LATERAL VIEW EXPLODE(SPLIT(CONCAT(color_code_name,'|', fav_color_name),'\\|')) a1 AS color_name;
    

    We join the temp tables on rownum

    SELECT color_id, color_name
    FROM tc1
    JOIN tc2 ON(tc1.rownum = tc2.rownum)
    ORDER BY color_id;
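
The join only gives the right pairs if both temp tables were numbered in the same order. A small Python sketch of the join makes that dependency visible; the two lists hold the sample values in emission order, and `join_on_rownum` is a hypothetical helper, not something from Hive.

```python
# Exploded values in emission order (7 codes and 7 names from the sample data).
color_ids = [1, 2, 5, 3, 4, 7, 9]
color_names = ["blue", "white", "black", "green", "red", "pink", "yellow"]


def join_on_rownum(left, right):
    """Number both lists from 1 and join the entries that share a row number."""
    right_by_rownum = dict(enumerate(right, start=1))
    # sorted() plays the role of ORDER BY color_id.
    return sorted((value, right_by_rownum[rownum])
                  for rownum, value in enumerate(left, start=1))


matched = join_on_rownum(color_ids, color_names)
print(matched)

# If one side came back in a different order, the pairs would silently change.
mismatched = join_on_rownum(color_ids, list(reversed(color_names)))
print(mismatched[0])
```

If the engine is not guaranteed to number both sides consistently, carrying an explicit position for each exploded element is the safer design.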
    

    Expected output

    +-----------+-------------+--+
    | color_id  | color_name  |
    +-----------+-------------+--+
    | 1         | blue        |
    | 2         | white       |
    | 3         | green       |
    | 4         | red         |
    | 5         | black       |
    | 7         | pink        |
    | 9         | yellow      |
    +-----------+-------------+--+
    

    Doing the same thing in a single query, although it is not a lightweight one

    SELECT tc1.color_id, tc2.color_name
    FROM (SELECT ROW_NUMBER() OVER() AS rownum, CAST(color_id AS INT) as color_id
          FROM colors
          LATERAL VIEW EXPLODE(SPLIT(CONCAT(color_code,'|', fav_color_code),'\\|')) a1 AS color_id) AS tc1
    JOIN (SELECT ROW_NUMBER() OVER() AS rownum, color_name
          FROM colors
          LATERAL VIEW EXPLODE(SPLIT(CONCAT(color_code_name,'|', fav_color_name),'\\|')) a1 AS color_name) AS tc2
    ON(tc1.rownum = tc2.rownum)
    ORDER BY tc1.color_id;
    

    Expected output

    +---------------+-----------------+--+
    | tc1.color_id  | tc2.color_name  |
    +---------------+-----------------+--+
    | 1             | blue            |
    | 2             | white           |
    | 3             | green           |
    | 4             | red             |
    | 5             | black           |
    | 7             | pink            |
    | 9             | yellow          |
    +---------------+-----------------+--+
    

    [Comments]:
