【Question title】: LISTAGG in Oracle to return distinct values
【Posted】: 2012-07-15 16:30:45
【Question description】:

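I am trying to use the LISTAGG function in Oracle, and I want only the distinct values of the column. Is there a way to get only the distinct values without creating a function or a procedure?

col1  col2  Created_by
1     2     Smith
1     2     John
1     3     Ajay
1     4     Ram
1     5     Jack

I need to select col1 and the LISTAGG of col2 (column 3 is not considered). When I do that, the result of LISTAGG looks like this: [2,2,3,4,5]

I need to remove the duplicate '2' here; I only need the distinct values of col2 for each col1.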

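【Comments】:

  • Can you show the expected output (rows) for the sample? And what do you want to see if col1 has more than one value?
  • The expected output of LISTAGG is [2,3,4,5]. The second '2' should be removed. My table has more than 1000 rows.
  • What do you want to see if col1 has more than one value?
  • The code looks like this: SELECT col1, LISTAGG(col2, ',') WITHIN GROUP (ORDER BY col2) FROM table T WHERE ... So it should show all the distinct values of col2 for each col1, separated by commas.

Tags: sql oracle aggregate-functions listagg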


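【Solution 1】:

As of Oracle 19c this is built in: LISTAGG accepts DISTINCT.

On 18c and earlier, try the WITHIN GROUP approach.

Otherwise, use regular expressions.

Here is how to solve the problem.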

select  
      regexp_replace(
    '2,2,2.1,3,3,3,3,4,4' 
     ,'([^,]+)(,\1)*(,|$)', '\1\3')

from dual

returns

2,2.1,3,4

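The answer is then:

select col1,
       regexp_replace(
           listagg(col2, ',') within group (order by col2)  -- sorted, so duplicates are adjacent
           , '([^,]+)(,\1)*(,|$)', '\1\3') as col2_list
  from tableX
 where rn = 1  -- only meaningful if tableX exposes a row_number() column named rn; drop it otherwise
 group by col1;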

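Note: the approach above works in most cases. The list must be sorted, and depending on your data you may need to trim leading and trailing whitespace.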
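
To see why the sort matters, here is a minimal sketch that traces the same pattern outside the database. It uses Python's re module purely as a stand-in for Oracle's regex engine (an assumption: the two agree on these inputs, but are not identical in general), and the helper name dedupe_adjacent is made up for illustration.

```python
import re

# The de-duplication pattern from the answer above.
PATTERN = r'([^,]+)(,\1)*(,|$)'

def dedupe_adjacent(csv: str) -> str:
    """Collapse runs of identical adjacent items in a comma-separated string."""
    return re.sub(PATTERN, r'\1\3', csv)

# Sorted input: duplicates sit next to each other and are collapsed.
sorted_list = ','.join(sorted(['3', '2', '3', '2', '4']))
print(dedupe_adjacent(sorted_list))   # prints 2,3,4

# Unsorted input: the duplicates are not adjacent, so they survive.
print(dedupe_adjacent('2,3,2,3,4'))   # prints 2,3,2,3,4
```

The pattern only ever compares an item with the items immediately following it, which is why WITHIN GROUP (ORDER BY col2) is doing real work in the query.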

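If a group has many items (more than about 20) or the strings are long, you may hit Oracle's string size limit: "result of string concatenation is too long".

As of Oracle 12cR2 you can suppress this error. Alternatively, set a maximum on the number of members in each group. This only works if it is acceptable to list only the first members. If you have very long, variable-length strings this may not work; you will have to experiment.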

select col1,

case 
    when count(col2) < 100 then 
       regexp_replace(
        listagg(col2, ',') within group (order by col2)
        ,'([^,]+)(,\1)*(,|$)', '\1\3')
 
    else
    'Too many entries to list...'
end
    
from sometable
where rn = 1
group by col1;

Another solution (not so simple) that hopefully avoids the Oracle string size limit; the string size is capped at 4000. Thanks to this post by user3465996.

select col1  ,
    dbms_xmlgen.convert(  -- HTML decode
    dbms_lob.substr( -- limit size to 4000 chars
    ltrim( -- remove leading commas
    REGEXP_REPLACE(REPLACE(
         REPLACE(
           XMLAGG(
             XMLELEMENT("A",col2 )
               ORDER BY col2).getClobVal(),
             '<A>',','),
             '</A>',''),'([^,]+)(,\1)*(,|$)', '\1\3'),
                  ','), -- remove leading XML commas ltrim
                      4000,1) -- limit to 4000 string size
                      , 1)  -- HTML.decode
                       as col2
 from sometable
where rn = 1
group by col1;

V1 - some test cases, for reference only

regexp_replace('2,2,2.1,3,3,4,4','([^,]+)(,\1)+', '\1')
-> 2.1,3,4 Fail
regexp_replace('2 ,2 ,2.1,3 ,3 ,4 ,4 ','([^,]+)(,\1)+', '\1')
-> 2 ,2.1,3,4 Success  - fixed length items

V2 - items contained within other items, e.g. 2,21

regexp_replace('2.1,1','([^,]+)(,\1)+', '\1')
-> 2.1 Fail
regexp_replace('2 ,2 ,2.1,1 ,3 ,4 ,4 ','(^|,)(.+)(,\2)+', '\1\2')
-> 2 ,2.1,1 ,3 ,4  -- success - NEW regex
 regexp_replace('a,b,b,b,b,c','(^|,)(.+)(,\2)+', '\1\2')
-> a,b,b,c fail!

V3 - regex thanks to Igor! Works for all cases.

select  
regexp_replace('2,2,2.1,3,3,4,4','([^,]+)(,\1)*(,|$)', '\1\3') ,
---> 2,2.1,3,4 works
regexp_replace('2.1,1','([^,]+)(,\1)*(,|$)', '\1\3'),
--> 2.1,1 works
regexp_replace('a,b,b,b,b,c','([^,]+)(,\1)*(,|$)', '\1\3')
---> a,b,c works

from dual
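
The three pattern versions can be compared side by side with a small sketch. Python's re module is used here only as a stand-in for Oracle's engine (an assumption; for these inputs it reproduces the results reported above), and the names V1, V2 and V3 are just labels for this example.

```python
import re

V1 = r'([^,]+)(,\1)+'         # used with replacement \1
V2 = r'(^|,)(.+)(,\2)+'       # used with replacement \1\2
V3 = r'([^,]+)(,\1)*(,|$)'    # used with replacement \1\3

# V1: ",2" also matches the beginning of ",2.1", so the item 2 swallows part of 2.1.
print(re.sub(V1, r'\1', '2,2,2.1,3,3,4,4'))     # prints 2.1,3,4

# V2: ".+" may span commas, so it captures "b,b" as one item and keeps a duplicate.
print(re.sub(V2, r'\1\2', 'a,b,b,b,b,c'))       # prints a,b,b,c

# V3: each run must end at a comma or at the end of the string, so items stay whole.
print(re.sub(V3, r'\1\3', '2,2,2.1,3,3,4,4'))   # prints 2,2.1,3,4
print(re.sub(V3, r'\1\3', 'a,b,b,b,b,c'))       # prints a,b,c
```

The trailing (,|$) group is what forces every match to stop on an item boundary.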

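【Comments】:

  • A fair result, but not so simple. With large amounts of data you will run into ORA-01489: result of string concatenation is too long
  • I would not call it simple, but it is a very attractive solution. I did not know that a match number could be used in the search string, not only in the replacement string. Brilliant.
  • Note that this method requires the values to be sorted, so that duplicates are consecutive. Otherwise it fails. But simple is good! I am using this method for my particular case. Thanks!
  • Super simple, but it does not work for more than 3 repetitions! E.g. a,b,b,b,b,c becomes a,b,b,c :-( (Oracle 11.2)
  • @AndreasDietrich - The following solution always seems to be correct: regexp_replace(your_string, '([^,]+)(,\1)*(,|$)', '\1\3')

【Solution 2】:

Very simple: use a subquery with select distinct in your query: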

SELECT question_id,
       LISTAGG(element_id, ',') WITHIN GROUP (ORDER BY element_id)
FROM
       (SELECT distinct question_id, element_id
       FROM YOUR_TABLE)
GROUP BY question_id;
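
The order of operations is the whole trick: de-duplicate the (key, value) pairs first, then aggregate. A rough Python sketch of that logic follows; the sample rows and the function name listagg_distinct are assumptions for illustration, not part of the answer.

```python
from collections import defaultdict

# Hypothetical rows of (question_id, element_id), including one duplicate pair.
rows = [(1, 2), (1, 2), (1, 3), (1, 4), (1, 5)]

def listagg_distinct(pairs, sep=','):
    """De-duplicate (key, value) pairs, then join the values of each key in sorted order."""
    groups = defaultdict(list)
    for key, value in sorted(set(pairs)):   # set() plays the role of SELECT DISTINCT
        groups[key].append(str(value))
    return {key: sep.join(values) for key, values in groups.items()}

print(listagg_distinct(rows))   # prints {1: '2,3,4,5'}
```

Because the duplicates are gone before the join happens, the aggregate never sees them.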

    【Solution 3】:

    19c and later:

    select listagg(distinct the_column, ',') within group (order by the_column)
    from the_table
    

    18c and earlier:

    select listagg(the_column, ',') within group (order by the_column)
    from (
       select distinct the_column 
       from the_table
    ) t
    

    If you need more columns, you are probably looking for something like this:

    select col1, listagg(col2, ',') within group (order by col2)
    from (
      select col1, 
             col2,
             row_number() over (partition by col1, col2 order by col1) as rn
      from foo
      order by col1,col2
    )
    where rn = 1
    group by col1;
    

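    【Comments】:

    • Similar to my idea. If listagg is the only aggregate function in the query, this should do. Combining it with other aggregate functions is trickier, though.
    • Yes. My query is similar to this.
    • @a_horse_with_no_name: The select statement above gives me duplicate values, and I want to remove the duplicates. The data (col1, col2, Created_by) is: (1, 2, Smith), (1, 2, John), (1, 3, Ajay), (1, 4, Ram), (1, 5, Jack). I need to select col1 and the LISTAGG of col2 (column 3 is not considered). When I do that, I get a LISTAGG result like [2,2,3,4,5]. I need to remove the duplicate '2' here; I only need the distinct values of col2 for each col1.
    • @a_horse_with_no_name: I tried the code and got the following error message: ORA-01489: result of string concatenation is too long. 01489. 00000 - "result of string concatenation is too long" *Cause: String concatenation result is more than the maximum size.
    • @Priyanth: Then you are out of luck. The total length exceeds 4000 bytes, and Oracle cannot handle that. You will need to do the aggregation in application code.

    【Solution 4】:

    The upcoming Oracle 19c will support DISTINCT with LISTAGG.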

    LISTAGG with DISTINCT option:

    This feature ships with 19c:

    select deptno, listagg(distinct sal, ', ') within group (order by sal)
    from scott.emp
    group by deptno;
    

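    Edit:

    Oracle 19C LISTAGG DISTINCT

    The LISTAGG aggregate function now supports duplicate elimination through the new DISTINCT keyword. LISTAGG orders the rows of each group in a query according to the ORDER BY expression and then concatenates the values into a single string. With the new DISTINCT keyword, duplicate values can be removed from the specified expression before they are concatenated. This removes the need for complex query processing to find the distinct values before calling LISTAGG. With the DISTINCT option, duplicate removal is done directly inside the LISTAGG function. The result is simpler, faster, more efficient SQL.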

      【Solution 5】:

      A further refinement of @YoYo's correction to @a_horse_with_no_name's row_number()-based approach, using DECODE instead of CASE (which I saw here). I see that @Martin Vrbovsky also has an answer using the CASE approach.

      select
        col1, 
        listagg(col2, ',') within group (order by col2) AS col2_list,
        listagg(col3, ',') within group (order by col3) AS col3_list,
        SUM(col4) AS col4
      from (
        select
          col1, 
          decode(row_number() over (partition by col1, col2 order by null),1,col2) as col2,
          decode(row_number() over (partition by col1, col3 order by null),1,col3) as col3,
          col4  -- must be selected here so that the outer SUM(col4) can see it
        from foo
      )
      group by col1;
      

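        【Solution 6】:

        Using SELECT DISTINCT ... as part of a subquery before calling LISTAGG is probably the best way for simple queries, as @a_horse_with_no_name pointed out.

        In more complex queries, however, this may not be possible or easy to do. I ran into this in a scenario that used a top-n approach with analytic functions.

        So I found the COLLECT aggregate function. It is documented to accept the UNIQUE or DISTINCT modifier. Only in 10g, it quietly fails (it ignores the modifier without raising an error). To overcome this, I found this solution in another answer: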

        SELECT
          ...
          (
            SELECT LISTAGG(v.column_value,',') WITHIN GROUP (ORDER BY v.column_value)
            FROM TABLE(columns_tab) v
          ) AS columns,
          ...
        FROM (
          SELECT
            ...
            SET(CAST(COLLECT(UNIQUE some_column ORDER BY some_column) AS tab_typ)) AS columns_tab,
            ...
        )
        

        Basically, by using SET, I can remove the duplicates in my collection.

        You still need to define tab_typ as a basic collection type, for example for VARCHAR:

        CREATE OR REPLACE type tab_typ as table of varchar2(100)
        /
        

        Also, as a correction to @a_horse_with_no_name's answer for the multi-column case, where you may still want to aggregate on a third (or more) column:

        select
          col1, 
          listagg(CASE rn2 WHEN 1 THEN col2 END, ',') within group (order by col2) AS col2_list,
          listagg(CASE rn3 WHEN 1 THEN col3 END, ',') within group (order by col3) AS col3_list,
          SUM(col4) AS col4
        from (
          select
            col1, 
            col2,
            col3,  -- needed by the outer listagg on col3
            col4,  -- needed by the outer SUM(col4)
            row_number() over (partition by col1, col2 order by null) as rn2,
            row_number() over (partition by col1, col3 order by null) as rn3
          from foo
        )
        group by col1;
        

        If you kept rn = 1 as a query condition, the other columns would be aggregated incorrectly.
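
To make that last point concrete, here is a rough sketch in Python rather than SQL. The rows (col1, col2, col3, col4) are made up, and the helper first_in_partition is hypothetical: it only imitates the test "row_number() over (partition by ...) = 1".

```python
# Hypothetical rows of (col1, col2, col3, col4).
rows = [
    (1, 2, 'x', 10),
    (1, 2, 'y', 20),
    (1, 3, 'y', 30),
]

def first_in_partition(rows, key):
    """Return one flag per row: True when the row is the first one seen for key(row)."""
    seen, flags = set(), []
    for row in rows:
        k = key(row)
        flags.append(k not in seen)
        seen.add(k)
    return flags

rn2_is_1 = first_in_partition(rows, lambda r: (r[0], r[1]))

# WHERE rn = 1 drops whole rows, so the sum silently loses the second row.
filtered_sum = sum(r[3] for r, keep in zip(rows, rn2_is_1) if keep)

# Nulling out only the duplicate col2 values keeps every row available for the sum.
col2_list = ','.join(str(r[1]) for r, keep in zip(rows, rn2_is_1) if keep)
full_sum = sum(r[3] for r in rows)

print(filtered_sum, full_sum, col2_list)   # prints 40 60 2,3
```

Both variants produce the same col2 list, but only the CASE variant leaves SUM(col4) untouched.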

          【Solution 7】:

          select col1, listagg(col2,',') within group(order by col2) from table group by col1 means: aggregate the strings (col2) into a list, keeping the order, and then handle the duplicates by grouping on col1, which merges the col1 duplicates into one group. Perhaps this looks as clean and simple as it should be. If you also want col3, you just need to add one more listagg(), i.e. select col1, listagg(col2,',') within group(order by col2), listagg(col3,',') within group(order by col3) from table group by col1

            【Solution 8】:

            I implemented this stored function:

            CREATE TYPE LISTAGG_DISTINCT_PARAMS AS OBJECT (ELEMENTO VARCHAR2(2000), SEPARATORE VARCHAR2(10));
            
            CREATE TYPE T_LISTA_ELEMENTI AS TABLE OF VARCHAR2(2000);
            
            CREATE TYPE T_LISTAGG_DISTINCT AS OBJECT (
            
                LISTA_ELEMENTI T_LISTA_ELEMENTI,
                    SEPARATORE VARCHAR2(10),
            
                STATIC FUNCTION ODCIAGGREGATEINITIALIZE(SCTX  IN OUT            T_LISTAGG_DISTINCT) 
                                RETURN NUMBER,
            
                MEMBER FUNCTION ODCIAGGREGATEITERATE   (SELF  IN OUT            T_LISTAGG_DISTINCT, 
                                                        VALUE IN                    LISTAGG_DISTINCT_PARAMS ) 
                                RETURN NUMBER,
            
                MEMBER FUNCTION ODCIAGGREGATETERMINATE (SELF         IN     T_LISTAGG_DISTINCT,
                                                        RETURN_VALUE OUT    VARCHAR2, 
                                                        FLAGS        IN     NUMBER      )
                                RETURN NUMBER,
            
                MEMBER FUNCTION ODCIAGGREGATEMERGE       (SELF               IN OUT T_LISTAGG_DISTINCT,
                                                                                                    CTX2                 IN         T_LISTAGG_DISTINCT    )
                                RETURN NUMBER
            );
            
            CREATE OR REPLACE TYPE BODY T_LISTAGG_DISTINCT IS 
            
                STATIC FUNCTION ODCIAGGREGATEINITIALIZE(SCTX IN OUT T_LISTAGG_DISTINCT) RETURN NUMBER IS 
                BEGIN
                            SCTX := T_LISTAGG_DISTINCT(T_LISTA_ELEMENTI() , ',');
                    RETURN ODCICONST.SUCCESS;
                END;
            
                MEMBER FUNCTION ODCIAGGREGATEITERATE(SELF IN OUT T_LISTAGG_DISTINCT, VALUE IN LISTAGG_DISTINCT_PARAMS) RETURN NUMBER IS
                BEGIN
            
                            IF VALUE.ELEMENTO IS NOT NULL THEN
                                    SELF.LISTA_ELEMENTI.EXTEND;
                                    SELF.LISTA_ELEMENTI(SELF.LISTA_ELEMENTI.LAST) := TO_CHAR(VALUE.ELEMENTO);
                                    SELF.LISTA_ELEMENTI:= SELF.LISTA_ELEMENTI MULTISET UNION DISTINCT SELF.LISTA_ELEMENTI;
                                    SELF.SEPARATORE := VALUE.SEPARATORE;
                            END IF;
                    RETURN ODCICONST.SUCCESS;
                END;
            
                MEMBER FUNCTION ODCIAGGREGATETERMINATE(SELF IN T_LISTAGG_DISTINCT, RETURN_VALUE OUT VARCHAR2, FLAGS IN NUMBER) RETURN NUMBER IS
                  STRINGA_OUTPUT            CLOB:='';
                        LISTA_OUTPUT                T_LISTA_ELEMENTI;
                        TERMINATORE                 VARCHAR2(3):='...';
                        LUNGHEZZA_MAX           NUMBER:=4000;
                BEGIN
            
                    IF SELF.LISTA_ELEMENTI.EXISTS(1) THEN -- if the list contains at least one element

                        -- initialize a new working list
                        LISTA_OUTPUT := T_LISTA_ELEMENTI();

                        -- copy over only the DISTINCT elements
                        LISTA_OUTPUT := SELF.LISTA_ELEMENTI MULTISET UNION DISTINCT SELF.LISTA_ELEMENTI;

                        -- sort the elements
                        SELECT CAST(MULTISET(SELECT * FROM TABLE(LISTA_OUTPUT) ORDER BY 1 ) AS T_LISTA_ELEMENTI ) INTO LISTA_OUTPUT FROM DUAL;

                        -- concatenate into a single string
                        FOR I IN LISTA_OUTPUT.FIRST .. LISTA_OUTPUT.LAST - 1
                        LOOP
                            STRINGA_OUTPUT := STRINGA_OUTPUT || LISTA_OUTPUT(I) || SELF.SEPARATORE;
                        END LOOP;
                        STRINGA_OUTPUT := STRINGA_OUTPUT || LISTA_OUTPUT(LISTA_OUTPUT.LAST);

                        -- if the string exceeds the configured maximum length, truncate it and append the terminator
                        IF LENGTH(STRINGA_OUTPUT) > LUNGHEZZA_MAX THEN
                            RETURN_VALUE := SUBSTR(STRINGA_OUTPUT, 0, LUNGHEZZA_MAX - LENGTH(TERMINATORE)) || TERMINATORE;
                        ELSE
                            RETURN_VALUE := STRINGA_OUTPUT;
                        END IF;

                    ELSE -- if there are no elements, return NULL

                        RETURN_VALUE := NULL;

                    END IF;
            
                    RETURN ODCICONST.SUCCESS;
                END;
            
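                MEMBER FUNCTION ODCIAGGREGATEMERGE(SELF IN OUT T_LISTAGG_DISTINCT, CTX2 IN T_LISTAGG_DISTINCT) RETURN NUMBER IS
                BEGIN
                    -- combine the partial lists built by parallel execution; without this their elements would be lost
                    SELF.LISTA_ELEMENTI := SELF.LISTA_ELEMENTI MULTISET UNION DISTINCT CTX2.LISTA_ELEMENTI;
                    RETURN ODCICONST.SUCCESS;
                END;
            
            END; -- end of type body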
            
            CREATE
            FUNCTION LISTAGG_DISTINCT (INPUT LISTAGG_DISTINCT_PARAMS) RETURN VARCHAR2
                PARALLEL_ENABLE AGGREGATE USING T_LISTAGG_DISTINCT;
            
            -- Example
            SELECT LISTAGG_DISTINCT(LISTAGG_DISTINCT_PARAMS(OWNER, ', ')) AS LISTA_OWNER
            FROM SYS.ALL_OBJECTS;
            

            Sorry, but in some cases (for very large sets) Oracle may return this error:

            Object or Collection value was too large. The size of the value
            might have exceeded 30k in a SORT context, or the size might be
            too big for available memory.
            

            But I think this is a good starting point ;)

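              【Solution 9】:

              One annoying aspect of LISTAGG is that if the total length of the concatenated string exceeds 4000 characters (the limit for VARCHAR2 in SQL), the following error is thrown, which is difficult to manage in Oracle versions up to 12.1:

              ORA-01489: result of string concatenation is too long

              A new feature added in 12cR2 is the ON OVERFLOW clause of LISTAGG. A query including this clause looks like this:

              -- the column is named descr here because DESC is a reserved word in Oracle
              SELECT pid, LISTAGG(descr, ' ' ON OVERFLOW TRUNCATE) WITHIN GROUP (ORDER BY seq) AS descr_list
              FROM B GROUP BY pid;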
              

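              The above limits the output to 4000 characters but does not throw the ORA-01489 error.

              These are some additional options of the ON OVERFLOW clause:

              • ON OVERFLOW TRUNCATE 'Contd..' : displays 'Contd..' at the end of the string (the default is ...)
              • ON OVERFLOW TRUNCATE '' : displays up to 4000 characters without any terminating string.
              • ON OVERFLOW TRUNCATE WITH COUNT : appends, after the terminating string, the number of values that were truncated, in parentheses. E.g. '...(5512)'
              • ON OVERFLOW ERROR : use this if you expect LISTAGG to fail with the ORA-01489 error (which is the default anyway).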

                【Solution 10】:

                I needed a DISTINCT version and got this working.

                RTRIM(REGEXP_REPLACE(
                                       (LISTAGG(value, ', ') WITHIN GROUP (ORDER BY value)),
                                            '([^ ]+)(, \1)+','\1'),', ')
                

                  【Solution 11】:

                  If you do not need the concatenated values in a particular order, and the separator can be a comma, you can do:

                  select col1, stragg(distinct col2)
                    from table
                   group by col1
                  

                    【Solution 12】:

                    The simplest way to handle multiple listaggs is to use one WITH clause (subquery factoring) per column, each containing the listagg of that column over a select distinct:

                        WITH tab AS 
                        (           
                            SELECT 1 as col1, 2 as col2, 3 as col3, 'Smith' as created_by FROM dual
                            UNION ALL SELECT 1 as col1, 2 as col2, 3 as col3,'John'  as created_by FROM dual
                            UNION ALL SELECT 1 as col1, 3 as col2, 4 as col3,'Ajay'  as created_by FROM dual
                            UNION ALL SELECT 1 as col1, 4 as col2, 4 as col3,'Ram'   as created_by FROM dual
                            UNION ALL SELECT 1 as col1, 5 as col2, 6 as col3,'Jack'  as created_by FROM dual
                        )
                        , getCol2 AS
                        (
                            SELECT  DISTINCT col1, listagg(col2,',') within group (order by col2)  over (partition by col1) AS col2List
                            FROM ( SELECT DISTINCT col1,col2 FROM tab)
                        )
                        , getCol3 AS
                        (
                            SELECT  DISTINCT col1, listagg(col3,',') within group (order by col3)  over (partition by col1) AS col3List
                            FROM ( SELECT DISTINCT col1,col3 FROM tab)
                        )
                        select col1,col2List,col3List
                        FROM getCol2
                        JOIN getCol3
                        using (col1)
                    

                    This gives:

                    col1  col2List  col3List
                    1     2,3,4,5   3,4,6
                    

                      【Solution 13】:

                      I wrote a function that handles this with a regular expression. The in parameters are: 1) the listagg call itself 2) a repeat of the delimiter

                      create or replace function distinct_listagg
                        (listagg_in varchar2,
                         delimiter_in varchar2)
                      
                         return varchar2
                         as
                         hold_result varchar2(4000);
                         begin
                      
                         select rtrim( regexp_replace( (listagg_in)
                            , '([^'||delimiter_in||']*)('||
                            delimiter_in||'\1)+($|'||delimiter_in||')', '\1\3'), ',')
                            into hold_result
                            from dual;
                      
                      return hold_result;
                      
                      end;
                      

                      Now you do not have to repeat the regular expression every time you do this; just write:

                      select distinct_listagg(
                                             listagg(myfield,', ') within group (order by 1),
                                             ', '
                                             )
                           from mytable;
                      

                        【Solution 14】:

                        Use a listagg_clob function created like this:

                        create or replace package list_const_p
                        is
                        list_sep varchar2(10) := ',';
                        end list_const_p;
                        /
                        sho err
                        
                        create type listagg_clob_t as object(
                        v_liststring varchar2(32767),
                        v_clob clob,
                        v_templob number,
                        
                        static function ODCIAggregateInitialize(
                        sctx IN OUT listagg_clob_t
                        ) return number,
                        member function ODCIAggregateIterate(
                        self IN OUT listagg_clob_t, value IN varchar2
                        ) return number,
                        member function ODCIAggregateTerminate(
                        self IN OUT listagg_clob_t, returnValue OUT clob, flags IN number
                        ) return number,
                        member function ODCIAggregateMerge(
                        self IN OUT listagg_clob_t, ctx2 IN OUT listagg_clob_t
                        ) return number
                        );
                        /
                        sho err
                        
                        create or replace type body listagg_clob_t is
                        
                        static function ODCIAggregateInitialize(sctx IN OUT listagg_clob_t)
                        return number is
                        begin
                        sctx := listagg_clob_t('', '', 0);
                        return ODCIConst.Success;
                        end;
                        
                        member function ODCIAggregateIterate(
                        self IN OUT listagg_clob_t,
                        value IN varchar2
                        ) return number is
                        begin
                        if nvl(lengthb(v_liststring),0) + nvl(lengthb(value),0) <= 4000 then
                        self.v_liststring:=self.v_liststring || value || list_const_p.list_sep;
                        else
                        if self.v_templob = 0 then
                        dbms_lob.createtemporary(self.v_clob, true, dbms_lob.call);
                        self.v_templob := 1;
                        end if;
                        dbms_lob.writeappend(self.v_clob, length(self.v_liststring), v_liststring);
                        self.v_liststring := value || list_const_p.list_sep;
                        end if;
                        return ODCIConst.Success;
                        end;
                        
                        member function ODCIAggregateTerminate(
                        self IN OUT listagg_clob_t,
                        returnValue OUT clob,
                        flags IN number
                        ) return number is
                        begin
                        if self.v_templob != 0 then
                        dbms_lob.writeappend(self.v_clob, length(self.v_liststring), self.v_liststring);
                        dbms_lob.trim(self.v_clob, dbms_lob.getlength(self.v_clob) - 1);
                        else
                        self.v_clob := substr(self.v_liststring, 1, length(self.v_liststring) - 1);
                        end if;
                        returnValue := self.v_clob;
                        return ODCIConst.Success;
                        end;
                        
                        member function ODCIAggregateMerge(self IN OUT listagg_clob_t, ctx2 IN OUT listagg_clob_t) return number is
                        begin
                        if ctx2.v_templob != 0 then
                        if self.v_templob != 0 then
                        dbms_lob.append(self.v_clob, ctx2.v_clob);
                        dbms_lob.freetemporary(ctx2.v_clob);
                        ctx2.v_templob := 0;
                        else
                        self.v_clob := ctx2.v_clob;
                        self.v_templob := 1;
                        ctx2.v_clob := '';
                        ctx2.v_templob := 0;
                        end if;
                        end if;
                        if nvl(lengthb(self.v_liststring),0) + nvl(lengthb(ctx2.v_liststring),0) <= 4000 then
                        self.v_liststring := self.v_liststring || ctx2.v_liststring;
                        ctx2.v_liststring := '';
                        else
                        if self.v_templob = 0 then
                        dbms_lob.createtemporary(self.v_clob, true, dbms_lob.call);
                        self.v_templob := 1;
                        end if;
                        dbms_lob.writeappend(self.v_clob, length(self.v_liststring), self.v_liststring);
                        dbms_lob.writeappend(self.v_clob, length(ctx2.v_liststring), ctx2.v_liststring);
                        self.v_liststring := '';
                        ctx2.v_liststring := '';
                        end if;
                        return ODCIConst.Success;
                        end;
                        end;
                        /
                        sho err
                        
                        CREATE or replace FUNCTION listagg_clob (input varchar2) RETURN clob
                        PARALLEL_ENABLE AGGREGATE USING listagg_clob_t;
                        /
                        sho err 
                        

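                          【Solution 15】:

                          To get around the string length problem you can use XMLAGG, which is similar to listagg but returns a clob.

                          You can then use regexp_replace to parse it and get the unique values, and then convert it back to a string with dbms_lob.substr(). If you have a huge number of distinct values you will still run out of space this way, but in many cases the code below should work.

                          You can also change the delimiter you use. In my case I wanted '-' instead of ',', but you should be able to replace the dashes in my code and use commas if you prefer.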

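                          select col1,
                              dbms_lob.substr(ltrim(REGEXP_REPLACE(REPLACE(
                                   REPLACE(
                                     XMLAGG(
                                       XMLELEMENT("A",col2)
                                         ORDER BY col2).getClobVal(),
                                       '<A>','-'),
                                       '</A>',''),'([^-]*)(-\1)+($|-)', 
                                     '\1\3'),'-'), 4000,1) as platform_mix
                          from your_table   -- placeholder name; TABLE itself is a reserved word
                          group by col1     -- required because col1 is selected next to the XMLAGG aggregate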
                          

                          【Comments】:

                          • This is a good idea; it needs a call to dbms_xmlgen.convert(string, 1) to undo the & -> &amp; conversion. See my post.
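
The reason a decode step is needed can be shown with a tiny sketch. It uses Python's standard xml.sax.saxutils helpers as a stand-in for what XML serialization and dbms_xmlgen.convert(string, 1) do in Oracle (an analogy, not the Oracle implementation); the sample value is made up.

```python
from xml.sax.saxutils import escape, unescape

value = 'R&D,Q<A'

# Serializing text as XML escapes the special characters.
encoded = escape(value)
print(encoded)             # prints R&amp;D,Q&lt;A

# Decoding afterwards restores the original text.
print(unescape(encoded))   # prints R&D,Q<A
```

Without the decode step, any col2 value containing & or < comes back in its escaped form.
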
                          【Solution 16】:

                          You can do this with a regex replace. Here is an example:

                          -- Citations Per Year - Cited Publications main query. Includes list of unique associated core project numbers, ordered by core project number.
                          SELECT ptc.pmid AS pmid, ptc.pmc_id, ptc.pub_title AS pubtitle, ptc.author_list AS authorlist,
                            ptc.pub_date AS pubdate,
                            REGEXP_REPLACE( LISTAGG ( ppcc.admin_phs_org_code || 
                              TO_CHAR(ppcc.serial_num,'FM000000'), ',') WITHIN GROUP (ORDER BY ppcc.admin_phs_org_code || 
                              TO_CHAR(ppcc.serial_num,'FM000000')),
                              '(^|,)(.+)(,\2)+', '\1\2')
                            AS projectNum
                          FROM publication_total_citations ptc
                            JOIN proj_paper_citation_counts ppcc
                              ON ptc.pmid = ppcc.pmid
                             AND ppcc.citation_year = 2013
                            JOIN user_appls ua
                              ON ppcc.admin_phs_org_code = ua.admin_phs_org_code
                             AND ppcc.serial_num = ua.serial_num
                             AND ua.login_id = 'EVANSF'
                          GROUP BY ptc.pmid, ptc.pmc_id, ptc.pub_title, ptc.author_list, ptc.pub_date
                          ORDER BY pmid;
                          

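                          Also posted here: Oracle - unique Listagg values

                            【Solution 17】:

                            How about creating a dedicated function that does the "distinct" part (str_t is assumed to be a collection type defined elsewhere, e.g. a table of varchar2):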

                            create or replace function listagg_distinct (t in str_t, sep IN VARCHAR2 DEFAULT ',') 
                              return VARCHAR2
                            as 
                              l_rc VARCHAR2(4096) := '';
                            begin
                              SELECT listagg(val, sep) WITHIN GROUP (ORDER BY 1)
                                INTO l_rc
                                FROM (SELECT DISTINCT column_value val FROM table(t));
                              RETURN l_rc;
                            end;
                            /
                            

                            Then use it to do the aggregation:

                            SELECT col1, listagg_distinct(cast(collect(col2) as str_t), ', ')
                              FROM your_table
                              GROUP BY col1;
                            

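                              【Solution 18】:

                              listagg() ignores NULL values, so as a first step you can use the lag() function to check whether the previous record has the same value: if so, return NULL, otherwise the "new value".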

                              WITH tab AS 
                              (           
                                        SELECT 1 as col1, 2 as col2, 'Smith' as created_by FROM dual
                              UNION ALL SELECT 1 as col1, 2 as col2, 'John'  as created_by FROM dual
                              UNION ALL SELECT 1 as col1, 3 as col2, 'Ajay'  as created_by FROM dual
                              UNION ALL SELECT 1 as col1, 4 as col2, 'Ram'   as created_by FROM dual
                              UNION ALL SELECT 1 as col1, 5 as col2, 'Jack'  as created_by FROM dual
                              )
                              SELECT col1
                                   , CASE 
                                     WHEN lag(col2) OVER (ORDER BY col2) = col2 THEN 
                                       NULL 
                                     ELSE 
                                       col2 
                                     END as col2_with_nulls
                                   , created_by
                                FROM tab;
                              

                              Result

                                    COL1 COL2_WITH_NULLS CREAT
                              ---------- --------------- -----
                                       1               2 Smith
                                       1                 John
                                       1               3 Ajay
                                       1               4 Ram
                                       1               5 Jack
                              

                              Note that the second 2 has been replaced by NULL. Now you can wrap a SELECT with listagg() around it.

                              WITH tab AS 
                              (           
                                        SELECT 1 as col1, 2 as col2, 'Smith' as created_by FROM dual
                              UNION ALL SELECT 1 as col1, 2 as col2, 'John'  as created_by FROM dual
                              UNION ALL SELECT 1 as col1, 3 as col2, 'Ajay'  as created_by FROM dual
                              UNION ALL SELECT 1 as col1, 4 as col2, 'Ram'   as created_by FROM dual
                              UNION ALL SELECT 1 as col1, 5 as col2, 'Jack'  as created_by FROM dual
                              )
                              SELECT listagg(col2_with_nulls, ',') WITHIN GROUP (ORDER BY col2_with_nulls) col2_list
                                FROM ( SELECT col1
                                            , CASE WHEN lag(col2) OVER (ORDER BY col2) = col2 THEN NULL ELSE col2 END as col2_with_nulls
                                            , created_by
                                         FROM tab );
                              

                              Result

                              COL2_LIST
                              ---------
                              2,3,4,5
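
The two steps (null out repeats, then let the aggregate skip the nulls) can be sketched in Python as follows. The function name listagg_skip_repeats is made up for illustration, and None stands in for SQL NULL.

```python
def listagg_skip_repeats(values, sep=','):
    """Sort, replace each value equal to its predecessor with None, then join the rest."""
    ordered = sorted(values)
    with_nulls = [
        None if i > 0 and ordered[i - 1] == v else v   # imitates lag(col2) = col2
        for i, v in enumerate(ordered)
    ]
    # None entries are skipped, the way LISTAGG skips NULL.
    return sep.join(str(v) for v in with_nulls if v is not None)

print(listagg_skip_repeats([2, 2, 3, 4, 5]))   # prints 2,3,4,5
```

Note that the comparison with the predecessor only works because the values are ordered first; the same ordering has to be used in the lag() window.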
                              

                              You can also do this on multiple columns.

                              WITH tab AS 
                              (           
                                        SELECT 1 as col1, 2 as col2, 'Smith' as created_by FROM dual
                              UNION ALL SELECT 1 as col1, 2 as col2, 'John'  as created_by FROM dual
                              UNION ALL SELECT 1 as col1, 3 as col2, 'Ajay'  as created_by FROM dual
                              UNION ALL SELECT 1 as col1, 4 as col2, 'Ram'   as created_by FROM dual
                              UNION ALL SELECT 1 as col1, 5 as col2, 'Jack'  as created_by FROM dual
                              )
                              SELECT listagg(col1_with_nulls, ',') WITHIN GROUP (ORDER BY col1_with_nulls) col1_list
                                   , listagg(col2_with_nulls, ',') WITHIN GROUP (ORDER BY col2_with_nulls) col2_list
                                   , listagg(created_by, ',')      WITHIN GROUP (ORDER BY created_by) created_by_list
                                FROM ( SELECT CASE WHEN lag(col1) OVER (ORDER BY col1) = col1 THEN NULL ELSE col1 END as col1_with_nulls
                                            , CASE WHEN lag(col2) OVER (ORDER BY col2) = col2 THEN NULL ELSE col2 END as col2_with_nulls
                                            , created_by
                                         FROM tab );
                              

                              Result

                              COL1_LIST COL2_LIST CREATED_BY_LIST
                              --------- --------- -------------------------
                              1         2,3,4,5   Ajay,Jack,John,Ram,Smith
                              

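                              【Discussion】:

                                【Solution 19】:

                                I think this may help: set the column value to NULL when it is a duplicate, so that it is not appended to the LISTAGG string (LISTAGG skips NULL values):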

                                with test_data as 
                                (
                                      select 1 as col1, 2 as col2, 'Smith' as created_by from dual
                                union select 1, 2, 'John' from dual
                                union select 1, 3, 'Ajay' from dual
                                union select 1, 4, 'Ram' from dual
                                union select 1, 5, 'Jack' from dual
                                union select 2, 5, 'Smith' from dual
                                union select 2, 6, 'John' from dual
                                union select 2, 6, 'Ajay' from dual
                                union select 2, 6, 'Ram' from dual
                                union select 2, 7, 'Jack' from dual
                                )
                                SELECT col1  ,
                                      listagg(col2 , ',') within group (order by col2 ASC) AS orig_value,
                                      listagg(CASE WHEN rwn=1 THEN col2 END , ',') within group (order by col2 ASC) AS distinct_value
                                from 
                                    (
                                    select row_number() over (partition by col1,col2 order by 1) as rwn, 
                                           a.*
                                    from test_data a
                                    ) a
                                GROUP BY col1   
                                

                                Result:

                                COL1  ORIG_VALUE  DISTINCT_VALUE
                                1     2,2,3,4,5   2,3,4,5
                                2     5,6,6,6,7   5,6,7
                                

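                                【Discussion】:

                                  【Solution 20】:

                                  If you want distinct values across MULTIPLE columns, want control over the sort order, don't want to use an undocumented function that may disappear, and don't want more than one full table scan, you may find this construct useful: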

                                  with test_data as 
                                  (
                                        select 'A' as col1, 'T_a1' as col2, '123' as col3 from dual
                                  union select 'A', 'T_a1', '456' from dual
                                  union select 'A', 'T_a1', '789' from dual
                                  union select 'A', 'T_a2', '123' from dual
                                  union select 'A', 'T_a2', '456' from dual
                                  union select 'A', 'T_a2', '111' from dual
                                  union select 'A', 'T_a3', '999' from dual
                                  union select 'B', 'T_a1', '123' from dual
                                  union select 'B', 'T_b1', '740' from dual
                                  union select 'B', 'T_b1', '846' from dual
                                  )
                                  select col1
                                       , (select listagg(column_value, ',') within group (order by column_value desc) from table(collect_col2)) as col2s
                                       , (select listagg(column_value, ',') within group (order by column_value desc) from table(collect_col3)) as col3s
                                  from 
                                  (
                                  select col1
                                       , collect(distinct col2) as collect_col2
                                       , collect(distinct col3) as collect_col3
                                  from test_data
                                  group by col1
                                  );
                                  

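                                  【Discussion】:

                                  • If you replace "union" with "union all", you may save some more time.
                                  【Solution 21】:

                                  I got around this by grouping the values first and then doing a second aggregation with listagg. Something like this: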

                                  select a,b,listagg(c,',') within group(order by c) c, avg(d)
                                  from (select a,b,c,avg(d) as d  -- alias so the outer query can reference d
                                        from   sometable          -- placeholder table name
                                        group by (a,b,c))
                                  group by (a,b)
                                  

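                                  This needs only one full table access and is fairly easy to extend to more complex queries.

                                  【Discussion】:

                                    【Solution 22】:

                                    Has anyone thought of using the PARTITION BY clause? With this query I was able to get a list of application services and their access modes.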

                                    SELECT DISTINCT T.APP_SVC_ID, 
                                           LISTAGG(RTRIM(T.ACCESS_MODE), ',') WITHIN GROUP(ORDER BY T.ACCESS_MODE) OVER(PARTITION BY T.APP_SVC_ID) AS ACCESS_MODE 
                                      FROM APP_SVC_ACCESS_CNTL T 
                                     GROUP BY T.ACCESS_MODE, T.APP_SVC_ID
                                    

                                    I had to cut the where clause because of an NDA, but you get the idea.
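
The same pattern can be tried against the sample data from the question. This is only a sketch: the inline `tab` data set and the `col2_list` alias are assumptions made for illustration, not part of the original answer. The GROUP BY reduces the input to one row per (col1, col2) pair, the analytic LISTAGG then concatenates those rows within each col1 partition, and DISTINCT collapses the repeated result rows.

```sql
-- Hypothetical inline data standing in for the question's table.
WITH tab AS (
            SELECT 1 AS col1, 2 AS col2 FROM dual
  UNION ALL SELECT 1, 2 FROM dual
  UNION ALL SELECT 1, 3 FROM dual
  UNION ALL SELECT 1, 4 FROM dual
  UNION ALL SELECT 1, 5 FROM dual
)
SELECT DISTINCT col1,
       -- Analytic LISTAGG runs after GROUP BY, so it only sees
       -- one row per (col1, col2) pair.
       LISTAGG(col2, ',') WITHIN GROUP (ORDER BY col2)
         OVER (PARTITION BY col1) AS col2_list
  FROM tab
 GROUP BY col1, col2;
```

For this data the query should return a single row with col1 = 1 and col2_list = 2,3,4,5.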

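                                    【Discussion】:

                                    • I don't understand how this query gets distinct items for LISTAGG. It looks like you only have one T.ACCESS_MODE per row, since you are grouping by it?
                                    【Solution 23】:

                                    In case you intend to apply this transformation to multiple columns, I have extended a_horse_with_no_name's solution: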

                                    SELECT * FROM
                                    (SELECT LISTAGG(GRADE_LEVEL, ',') within group(order by GRADE_LEVEL) "Grade Levels" FROM (select distinct GRADE_LEVEL FROM Students) t)                     t1,
                                    (SELECT LISTAGG(ENROLL_STATUS, ',') within group(order by ENROLL_STATUS) "Enrollment Status" FROM (select distinct ENROLL_STATUS FROM Students) t)          t2,
                                    (SELECT LISTAGG(GENDER, ',') within group(order by GENDER) "Legal Gender Code" FROM (select distinct GENDER FROM Students) t)                               t3,
                                    (SELECT LISTAGG(CITY, ',') within group(order by CITY) "City" FROM (select distinct CITY FROM Students) t)                                                  t4,
                                    (SELECT LISTAGG(ENTRYCODE, ',') within group(order by ENTRYCODE) "Entry Code" FROM (select distinct ENTRYCODE FROM Students) t)                             t5,
                                    (SELECT LISTAGG(EXITCODE, ',') within group(order by EXITCODE) "Exit Code" FROM (select distinct EXITCODE FROM Students) t)                                 t6,
                                    (SELECT LISTAGG(LUNCHSTATUS, ',') within group(order by LUNCHSTATUS) "Lunch Status" FROM (select distinct LUNCHSTATUS FROM Students) t)                     t7,
                                    (SELECT LISTAGG(ETHNICITY, ',') within group(order by ETHNICITY) "Race Code" FROM (select distinct ETHNICITY FROM Students) t)                              t8,
                                    (SELECT LISTAGG(CLASSOF, ',') within group(order by CLASSOF) "Expected Graduation Year" FROM (select distinct CLASSOF FROM Students) t)                     t9,
                                    (SELECT LISTAGG(TRACK, ',') within group(order by TRACK) "Track Code" FROM (select distinct TRACK FROM Students) t)                                         t10,
                                    (SELECT LISTAGG(GRADREQSETID, ',') within group(order by GRADREQSETID) "Graduation ID" FROM (select distinct GRADREQSETID FROM Students) t)                 t11,
                                    (SELECT LISTAGG(ENROLLMENT_SCHOOLID, ',') within group(order by ENROLLMENT_SCHOOLID) "School Key" FROM (select distinct ENROLLMENT_SCHOOLID FROM Students) t)       t12,
                                    (SELECT LISTAGG(FEDETHNICITY, ',') within group(order by FEDETHNICITY) "Federal Race Code" FROM (select distinct FEDETHNICITY FROM Students) t)                         t13,
                                    (SELECT LISTAGG(SUMMERSCHOOLID, ',') within group(order by SUMMERSCHOOLID) "Summer School Key" FROM (select distinct SUMMERSCHOOLID FROM Students) t)                               t14,
                                    (SELECT LISTAGG(FEDRACEDECLINE, ',') within group(order by FEDRACEDECLINE) "Student Decl to Prov Race Code" FROM (select distinct FEDRACEDECLINE FROM Students) t)          t15
                                    

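                                    This is Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production.
                                    I could not use STRAGG because it offers no way to apply both DISTINCT and ORDER BY.

                                    Performance scales linearly, which is good, since I am adding all the columns of interest. The query above took 3 seconds for 77K rows; a single rollup took 0.172 seconds. This gives me a way to get the distinct values of several columns of a table in one go.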

                                    【Discussion】:

                                      【Solution 24】:

                                      You can use the undocumented wm_concat function.

                                      select col1, wm_concat(distinct col2) col2_list 
                                      from tab1
                                      group by col1;
                                      

                                      This function returns a CLOB column; if you want, you can use dbms_lob.substr to convert the CLOB to VARCHAR2.
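
A minimal sketch of that conversion, assuming the same `tab1` table as above and a database version where `wm_concat` is still available (it is undocumented, so its behavior may differ between releases). The 4000-character limit is an assumption chosen to fit a VARCHAR2 in SQL; anything beyond it is silently cut off.

```sql
select col1,
       -- dbms_lob.substr(lob, amount, offset): take up to 4000
       -- characters starting at position 1 and return them as VARCHAR2
       dbms_lob.substr(wm_concat(distinct col2), 4000, 1) as col2_list
from tab1
group by col1;
```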

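                                      【Discussion】:

                                      • This is exactly what I needed, and it worked perfectly inside my existing aggregate query, without wrapping that query in an outer one. What is wrong with using wm_concat(distinct x)?
                                      • Because it is undocumented and does not exist on 12c. On older versions, though, I think it is the best way.
                                      • Thanks @kemalettinerbakırcı! @thg You should consider that if something is undocumented, you don't know what its side effects are, or any of the other things documentation tells you about documented features; you are just using it as a black box, and you only know which lever does what from folklore.
                                      • Never use wm_concat. See Why not use WM_CONCAT function in Oracle?
                                      • Thanks @Koshinae and @LalitKumar. I can confirm that using WM_CONCAT on 12c returns an "invalid identifier" error.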