Undo a LISTAGG in Redshift: answers

【Question title】: Undo a LISTAGG in Redshift
【Posted】: 2015-12-23 13:29:23
【Question】:

I have a table that probably came from a LISTAGG, something like this:

# select * from s;
     s     
-----------
 a,c,b,d,a
 b,e,c,d,f
(2 rows)

How can I turn it into this set of rows:

a
c
b
d
a
b
e
c
d
f

【Comments on the question】:

See stackoverflow.com/questions/25112389/…

Tags: sql amazon-redshift

【Solution 1】:

In Redshift, you can join against a numbers table and use it as the split index:

--with recursive Numbers as (
--  select 1 as i
--  union all
--  select i + 1 as i from Numbers where i <= 5
--)
with Numbers(i) as (
  select 1 union
  select 2 union
  select 3 union
  select 4 union
  select 5 
)
select split_part(s,',', i) from Numbers, s ORDER by s,i;

Edit: Redshift does not seem to support recursive subqueries; only Postgres does. :(
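
The query above returns an empty string for every index beyond the end of a shorter list, and it drops any element past the fifth. Here is a minimal sketch of one way to bound the index by each row's own element count. It assumes the table `s` with column `s` from the question and relies on Redshift's `regexp_count` and `split_part`. The CTE name `numbers`, the output alias `item`, and the upper limit of 5 are illustrative choices, not part of the original answer:

```sql
-- Sketch: only keep indexes that actually exist in each row's list.
WITH numbers AS (
  SELECT 1 AS i UNION ALL
  SELECT 2 UNION ALL
  SELECT 3 UNION ALL
  SELECT 4 UNION ALL
  SELECT 5            -- assumed maximum list length; extend as needed
)
SELECT split_part(s.s, ',', n.i) AS item
FROM s
JOIN numbers n
  ON n.i <= regexp_count(s.s, ',') + 1   -- element count = commas + 1
ORDER BY s.s, n.i;
```

Lists longer than the numbers table are still truncated, so the table has to cover the longest list you expect.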

【Comments】:

Redshift does not seem to support recursive subqueries

【Solution 2】:

SQL Fiddle

Oracle 11g R2 schema setup:

create table s(
  col varchar2(20) );

insert into s values('a,c,b,d,a');
insert into s values('b,e,c,d,f');

Query 1:

SELECT  REGEXP_SUBSTR(t1.col, '([^,])+', 1, t2.COLUMN_VALUE )
FROM s t1 CROSS JOIN
TABLE
(
  CAST
  (
    MULTISET
    (
      SELECT LEVEL
      FROM DUAL 
      CONNECT BY LEVEL <= REGEXP_COUNT(t1.col, '([^,])+')
    )
    AS SYS.odciNumberList
 )
) t2

Results:

| REGEXP_SUBSTR(T1.COL,'([^,])+',1,T2.COLUMN_VALUE) |
|---------------------------------------------------|
|                                                 a |
|                                                 c |
|                                                 b |
|                                                 d |
|                                                 a |
|                                                 b |
|                                                 e |
|                                                 c |
|                                                 d |
|                                                 f |

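【Comments】:

【Solution 3】:

Since this is tagged Redshift, and no answer so far gives a complete outline of how to correctly undo a LISTAGG in Redshift, here is code that covers all of its use cases: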

CREATE TEMPORARY TABLE s (
  s varchar(255) 
);

INSERT INTO s VALUES('a,c,b,d,a');
INSERT INTO s VALUES('b,e,c,d,f');

SELECT 
      TRIM(split_part(s.s,',',R::smallint)) AS s 
FROM s
LEFT JOIN (
SELECT 
      ROW_NUMBER() OVER (PARTITION BY 1) AS R
   FROM any_large_table
   LIMIT 1000
) extend_number 
ON (SELECT MAX(regexp_count(s.s,',')+1) FROM s) >= extend_number.R 
AND NULLIF(TRIM(split_part(s.s,',',extend_number.R::smallint)),'') IS NOT NULL;

DROP TABLE s;

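Here "any_large_table" is any table you already have in Redshift with enough rows for your purposes, depending on how many elements each record's list can contain (in the case above, the LIMIT 1000 caps it at 1,000 rows). Unfortunately, as far as I know, the generate_series function does not work properly in Redshift, so this is the only way.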
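
If no suitably large table is at hand, one possible alternative is to build the index values from a small digits CTE cross-joined with itself. This is a sketch under stated assumptions, not the answer author's method: it reuses the table `s` from the code above, and the names `digits`, `numbers`, `d`, and `i`, as well as the 1 to 1000 range, are hypothetical.

```sql
-- Sketch: generate 1..1000 without relying on an existing large table.
WITH digits AS (
  SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
numbers AS (
  -- units + tens + hundreds gives 0..999; add 1 for 1-based indexes
  SELECT u.d + 10 * t.d + 100 * h.d + 1 AS i
  FROM digits u
  CROSS JOIN digits t
  CROSS JOIN digits h
)
SELECT TRIM(split_part(s.s, ',', n.i)) AS s
FROM s
JOIN numbers n
  ON n.i <= regexp_count(s.s, ',') + 1        -- stay within each list
WHERE TRIM(split_part(s.s, ',', n.i)) <> ''   -- skip empty elements
ORDER BY s.s, n.i;
```

Adding another `CROSS JOIN digits` with a factor of 1000 would extend the range to 10,000 if the lists can be longer.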

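Another suggestion: check whether you can get at these values before they have been LISTAGG'd. As the code above shows, undoing the aggregation is complicated, and keeping things simple whenever you have the chance saves a lot of code-maintenance time.

【Comments】: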