【Question title】: Is there a way to count the number of unique values across multiple columns in SQL?
【Posted】: 2021-09-19 14:14:16
【Question】:

I want to count the number of unique values for each tx_id. Here is part of the raw data:

table : Treatment Record 
+------------------+-----------+----------------+------------------+
|        SN        |  tx_id    |       pa3      |       pa4        |
+------------------+-----------+----------------+------------------+
| I2120210007014   |   149362  | V16F2021117016 |   V15S2021145018 |
| I2120210007014   |   149362  | V15S2021144019 |   V15S2021145018 |
| I2120210007014   |   149362  | V16F2021117017 |   V15S2021145018 |
| I2120210007014   |   149362  | V16F2021117017 |   V15S2021145018 |
| I2120210007014   |   149362  | V16F2021117017 |   V15S2021145018 |
| I2120210007014   |   148716  | V15C2021116010 |   V15C20211091016|
+------------------+-----------+----------------+------------------+

For example, the result should look like this:

+------------------+-----------+-------+--------+-------+
|        SN        |  tx_id    |  V16F |  V15S  |  V15C |
+------------------+-----------+-------+--------+-------+
| I2120210007014   |   149362  |   2   |    2   |   0   |
| I2120210007014   |   148716  |   0   |    0   |   2   |
+------------------+-----------+-------+--------+-------+

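From the raw data you can see two different tx_id values, which I use to identify each group. For example, all rows with tx_id = '149362' are in the same group.

The pa3 and pa4 columns contain different groups, which can be classified by the first 4 characters, for example 'V16F' and 'V15S'. Within each tx_id, I also have to count the number of distinct values in each group. For example, for tx_id 149362 the pa3 column contains V16F2021117016, V15S2021144019, and V16F2021117017, while the pa4 column contains only V15S2021145018.

Therefore, group 'V16F' counts as 2 and group 'V15S' counts as 2. Note that the count is not made per column (pa3 or pa4); it is made on the distinct values, which here differ in their last 4 characters. For example, V16F2021117016 and V16F2021117017 belong to the same group, 'V16F', but they are different values because their last 4 characters are '7016' and '7017'.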

However, I can't find a way to do this. So far I have only come up with the SQL below. I hope someone can help.

SELECT tx_id, 
       sum(case when val like 'V16F%' then 1 else 0 end), 
       sum(case when val2 like 'V15S%' then 1 else 0 end) 
FROM ( select tx_id, pa3 as val, pa4 as val2 from Cool group by pa3, pa4)
GROUP BY tx_id   

This is the incorrect output:

+------------------+-----------+-------+--------+
|        SN        |  tx_id    |  V16F |  V15S  |
+------------------+-----------+-------+--------+
| I2120210007014   |   149362  |   3   |    3   |
| I2120210007014   |   148716  |   0   |    0   |
+------------------+-----------+-------+--------+

【Comments】:

Tags: mysql sql database count distinct


【Solution 1】:

The simplest approach is to use UNION ALL to put all the pa3 and pa4 values into a single column, and then aggregate:

SELECT SN, tx_id,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V16F%' THEN pa END) V16F,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V15S%' THEN pa END) V15S,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V15C%' THEN pa END) V15C
FROM (
  SELECT SN, tx_id, pa3 pa FROM tablename
  UNION ALL
  SELECT SN, tx_id, pa4 pa FROM tablename
) t  
GROUP BY SN, tx_id
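
To check the UNION ALL query above against the sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is only an assumed stand-in for MySQL here, and the helper name count_by_prefix and the ORDER BY clause are additions for illustration; the rest of the SQL is the query as posted.

```python
import sqlite3

# Sample rows copied from the question: (SN, tx_id, pa3, pa4).
ROWS = [
    ("I2120210007014", 149362, "V16F2021117016", "V15S2021145018"),
    ("I2120210007014", 149362, "V15S2021144019", "V15S2021145018"),
    ("I2120210007014", 149362, "V16F2021117017", "V15S2021145018"),
    ("I2120210007014", 149362, "V16F2021117017", "V15S2021145018"),
    ("I2120210007014", 149362, "V16F2021117017", "V15S2021145018"),
    ("I2120210007014", 148716, "V15C2021116010", "V15C20211091016"),
]

# The answer's query; ORDER BY is added only to make the row order stable.
QUERY = """
SELECT SN, tx_id,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V16F%' THEN pa END) AS V16F,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V15S%' THEN pa END) AS V15S,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V15C%' THEN pa END) AS V15C
FROM (
  SELECT SN, tx_id, pa3 AS pa FROM tablename
  UNION ALL
  SELECT SN, tx_id, pa4 AS pa FROM tablename
) t
GROUP BY SN, tx_id
ORDER BY tx_id DESC
"""


def count_by_prefix(rows):
    """Load rows into an in-memory SQLite table and run QUERY (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tablename (SN TEXT, tx_id INTEGER, pa3 TEXT, pa4 TEXT)"
        )
        conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = count_by_prefix(ROWS)
for row in result:
    print(row)
```

On this data the counts come out as 2, 2, 0 for tx_id 149362 and 0, 0, 2 for tx_id 148716, which matches the expected result table in the question.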

Alternatively, use UNION, which removes duplicate rows, so DISTINCT is not needed:

SELECT SN, tx_id,
       COUNT(CASE WHEN pa LIKE 'V16F%' THEN pa END) V16F,
       COUNT(CASE WHEN pa LIKE 'V15S%' THEN pa END) V15S,
       COUNT(CASE WHEN pa LIKE 'V15C%' THEN pa END) V15C
FROM (
  SELECT SN, tx_id, pa3 pa FROM tablename
  UNION 
  SELECT SN, tx_id, pa4 pa FROM tablename
) t  
GROUP BY SN, tx_id

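In MySQL this can be simplified further, because a boolean expression such as pa LIKE 'V16F%' evaluates to 1 or 0 and can be summed directly: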

SELECT SN, tx_id,
       SUM(pa LIKE 'V16F%') V16F,
       SUM(pa LIKE 'V15S%') V15S,
       SUM(pa LIKE 'V15C%') V15C
FROM (
  SELECT SN, tx_id, pa3 pa FROM tablename
  UNION 
  SELECT SN, tx_id, pa4 pa FROM tablename
) t  
GROUP BY SN, tx_id
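
The SUM version depends on UNION having already removed the duplicates. As a rough sketch of why, the script below runs the same SUM query once with UNION and once with UNION ALL on the question's sample rows. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL; like MySQL, SQLite evaluates LIKE to 1 or 0. The names QUERY_TEMPLATE and run_with are hypothetical, and ORDER BY is added only for a stable row order.

```python
import sqlite3

# Sample rows copied from the question: (SN, tx_id, pa3, pa4).
ROWS = [
    ("I2120210007014", 149362, "V16F2021117016", "V15S2021145018"),
    ("I2120210007014", 149362, "V15S2021144019", "V15S2021145018"),
    ("I2120210007014", 149362, "V16F2021117017", "V15S2021145018"),
    ("I2120210007014", 149362, "V16F2021117017", "V15S2021145018"),
    ("I2120210007014", 149362, "V16F2021117017", "V15S2021145018"),
    ("I2120210007014", 148716, "V15C2021116010", "V15C20211091016"),
]

# {set_op} is filled with either UNION or UNION ALL.
QUERY_TEMPLATE = """
SELECT tx_id,
       SUM(pa LIKE 'V16F%') AS V16F,
       SUM(pa LIKE 'V15S%') AS V15S,
       SUM(pa LIKE 'V15C%') AS V15C
FROM (
  SELECT SN, tx_id, pa3 AS pa FROM tablename
  {set_op}
  SELECT SN, tx_id, pa4 AS pa FROM tablename
) t
GROUP BY SN, tx_id
ORDER BY tx_id DESC
"""


def run_with(set_op):
    """Run the SUM query with the given set operator (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tablename (SN TEXT, tx_id INTEGER, pa3 TEXT, pa4 TEXT)"
        )
        conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY_TEMPLATE.format(set_op=set_op)).fetchall()
    finally:
        conn.close()


with_union = run_with("UNION")          # duplicates removed before summing
with_union_all = run_with("UNION ALL")  # every occurrence is summed
print("UNION:    ", with_union)
print("UNION ALL:", with_union_all)
```

With UNION, tx_id 149362 gets 2, 2, 0 as expected. With UNION ALL the same group gets 4, 6, 0, because repeated values are counted once per row, so the SUM shortcut should only be combined with UNION.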

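Another approach is to use conditional aggregation directly on the table. The logic is more complex, and although it works for this sample data it is not a general solution: SUM(pa3 = pa4) counts every row in which the two columns are equal, regardless of prefix, and it does not detect a value that appears in pa3 in one row and in pa4 in another.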

SELECT SN, tx_id,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V16F%' THEN pa3 END) +
       COUNT(DISTINCT CASE WHEN pa4 LIKE 'V16F%' THEN pa4 END) -
       SUM(pa3 = pa4) V16F,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V15S%' THEN pa3 END) +
       COUNT(DISTINCT CASE WHEN pa4 LIKE 'V15S%' THEN pa4 END) -
       SUM(pa3 = pa4) V15S,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V15C%' THEN pa3 END) +
       COUNT(DISTINCT CASE WHEN pa4 LIKE 'V15C%' THEN pa4 END) -
       SUM(pa3 = pa4) V15C
FROM tablename
GROUP BY SN, tx_id
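
The direct conditional aggregation is tuned to the sample data, so it is worth seeing where it can drift. The sketch below compares it with the UNION query on a few made-up rows; the SN, tx_id and pa values are hypothetical and do not come from the question. SQLite (via Python's sqlite3) is assumed as a stand-in for MySQL, only two prefixes are queried to keep it short, and the helper name run is an illustration.

```python
import sqlite3

# Hypothetical rows, not from the question: (SN, tx_id, pa3, pa4).
ROWS = [
    # tx_id 1: the same two values appear in both columns, in different rows.
    ("S1", 1, "V16F0001", "V15S0001"),
    ("S1", 1, "V15S0001", "V16F0001"),
    # tx_id 2: pa3 equals pa4, and the row is repeated.
    ("S1", 2, "V16F0002", "V16F0002"),
    ("S1", 2, "V16F0002", "V16F0002"),
]

UNION_QUERY = """
SELECT tx_id,
       SUM(pa LIKE 'V16F%') AS V16F,
       SUM(pa LIKE 'V15S%') AS V15S
FROM (
  SELECT SN, tx_id, pa3 AS pa FROM tablename
  UNION
  SELECT SN, tx_id, pa4 AS pa FROM tablename
) t
GROUP BY SN, tx_id
ORDER BY tx_id
"""

DIRECT_QUERY = """
SELECT tx_id,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V16F%' THEN pa3 END) +
       COUNT(DISTINCT CASE WHEN pa4 LIKE 'V16F%' THEN pa4 END) -
       SUM(pa3 = pa4) AS V16F,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V15S%' THEN pa3 END) +
       COUNT(DISTINCT CASE WHEN pa4 LIKE 'V15S%' THEN pa4 END) -
       SUM(pa3 = pa4) AS V15S
FROM tablename
GROUP BY SN, tx_id
ORDER BY tx_id
"""


def run(query):
    """Run a query against an in-memory table holding ROWS (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tablename (SN TEXT, tx_id INTEGER, pa3 TEXT, pa4 TEXT)"
        )
        conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


union_result = run(UNION_QUERY)
direct_result = run(DIRECT_QUERY)
print("UNION: ", union_result)
print("direct:", direct_result)
```

On these rows the UNION query reports 1 and 1 for tx_id 1 and 1 and 0 for tx_id 2, which are the true distinct counts. The direct query reports 2 and 2 for tx_id 1 (a value shared across columns in different rows is counted twice) and 0 and -2 for tx_id 2 (the subtraction is applied once per matching row and to every prefix). For data that may contain such rows, the UNION-based queries are the safer choice.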

See the demo.

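【Comments】:

  • May I ask why there is a t after FROM ( SELECT SN, tx_id, pa3 pa FROM tablename UNION ALL SELECT SN, tx_id, pa4 pa FROM tablename )?
  • @CheTou It is the alias of the subquery. In MySQL, every subquery in the FROM clause (a derived table) must have an alias.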