【Question title】: Count string occurrences within a list column - Snowflake/SQL
【Posted】: 2021-03-23 08:08:19
【Question】:

I have a table with a column that contains lists of strings, like this:

Example:

STRING                                                                 User_ID    [...]
"[""null"",""personal"",""Other""]"                                    2122213    .... 
"[""Other"",""to_dos_and_thing""]"                                     2132214    ....  
"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"         4342323    ....

Problem:

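I want to count how many times each unique string occurs (the strings inside the STRING column are separated by commas), but I only know how to do the following: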

SELECT u.STRING, count(u.USER_ID) AS cnt
FROM my_table u   -- placeholder name: TABLE is a reserved word in Snowflake
GROUP BY u.STRING
ORDER BY cnt DESC;

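But this doesn't work, because it only counts the number of user IDs per whole STRING value, rather than per individual string inside the list.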

Using the example above, the ideal output would look something like this:

Desired output:

STRING                     COUNT_Instances                                                             
"null"                     1223
"personal"                 543
"Other"                    324                  
"to_dos_and_thing"         221                                
"getting_things_done"      146
"Work!!!!!"                22 

【Question comments】:

    Tags: sql snowflake-cloud-data-platform


    【Solution 1】:

    Based on your description, here is my sample table:

    create table u (user_id number, string varchar);
    
    insert into u values
    (2122213, '"[""null"",""personal"",""Other""]"'),
    (2132214, '"[""Other"",""to_dos_and_thing""]"'),
    (2132215, '"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"' );
    

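    I use SPLIT_TO_TABLE to split each string into one row per comma-separated element, then REGEXP_SUBSTR to extract the text between the doubled quotes. Here is the query and its output: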

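    select REGEXP_SUBSTR( s.VALUE, '""(.*)""', 1, 1, 'i', 1 ) extracted,  -- capture group 1: the text between the doubled quotes
           count(*)
    from u,
    lateral SPLIT_TO_TABLE( string, ',' ) s   -- one output row per comma-separated element
    GROUP BY extracted
    order by count(*) DESC;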
    
    
    +---------------------+----------+
    |      EXTRACTED      | COUNT(*) |
    +---------------------+----------+
    | Other               |        2 |
    | null                |        1 |
    | personal            |        1 |
    | to_dos_and_thing    |        1 |
    | getting_things_done |        1 |
    | TO_dos_and_thing    |        1 |
    | Work!!!!!           |        1 |
    +---------------------+----------+
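    An alternative sketch, not part of the original answer: each value looks like a JSON array wrapped in CSV-style quoting, so it can be parsed with PARSE_JSON and expanded with LATERAL FLATTEN instead of being split on commas. This keeps any element that itself contains a comma intact. It assumes the stored text looks exactly like the sample rows above (an outer pair of double quotes, every inner quote doubled); if the column already holds plain JSON such as ["null","personal"], pass u.string straight to PARSE_JSON and drop the TRIM/REPLACE. The LOWER() call is optional: the desired output appears to treat TO_dos_and_thing and to_dos_and_thing as one string, and LOWER() merges them.

    ```sql
    -- Assumption: u.string holds text like '"[""null"",""personal"",""Other""]"'.
    select
        lower(f.value::string) as extracted,        -- remove lower() for case-sensitive counts
        count(*)               as count_instances
    from u,
         lateral flatten(
             -- strip the outer quotes, un-double the inner quotes, then parse as a JSON array
             input => parse_json(replace(trim(u.string, '"'), '""', '"'))
         ) f                                         -- one output row per array element
    group by extracted
    order by count_instances desc;
    ```

    Because FLATTEN walks the parsed array, no regular expression is needed to clean up each element.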
    

    SPLIT_TO_TABLE: https://docs.snowflake.com/en/sql-reference/functions/split_to_table.html
    REGEXP_SUBSTR: https://docs.snowflake.com/en/sql-reference/functions/regexp_substr.html

    【Comments】:
