【问题标题】:Generate combinations and identify which occurs the most生成组合并确定出现最多的组合
【发布时间】:2019-09-09 14:23:06
【问题描述】:

我有一个包含 3 列的表格。我需要生成基于特定“诊断”执行的表的值(在“测试”列中)的组合(总是在 3 中)。然而,一个特定的诊断可能有 2 个或更少的测试,在这种情况下,逻辑仍然会输出该组合,尽管有 2 个值。 参考下表,对于每个 cust_id 都有一个“诊断”列,基于该列执行“测试”。现在对于每个诊断值组,我需要在“测试”列中生成相应值的唯一组合。请注意,组合应始终具有 3 个值(其中值 >= 3),但对于具有小于 3 个值(1 或 2)的诊断,仍应输出相应的组合(具有可用的 1 或 2 个值并替换Null 代替不可用的值)。

患者:

pat_id | diagnosis | tests

1001 | Thyroid | CAT
1001 | Thyroid | MRI
1001 | Thyroid | Blood

1001 | Tonsil  | CAT
1001 | Tonsil  | MRI
1001 | Tonsil  | Blood
1001 | Tonsil  | RAPID

1002 | Pneumonia | MRI
1002 | Pneumonia | Eliza

1003 | Bronchitis | X-Ray

因此,对于 pat_id = '1001'diagnosis = 'Thyroid',我们看到“测试”有 3 个不同的值。因此,只有 1 个唯一组合是可能的,即{CAT, MRI, Blood}。同样,对于pat_id = '1001' and diagnosis = 'Tonsil',我们看到“test”列中有 4 个不同的值。因此,将有 4 种组合,即{CAT, MRI, Blood}, {CAT, MRI, RAPID}, {MRI, Blood, RAPID} & {CAT, blood, RAPID}。对于pat_id = '1002',仅存在两个唯一值。因此组合将只有 1,即{MRI, Eliza}。同样,pat_id = '1003' 只有 1 个值,即 X 射线,因此对于 '1003',输出应该是 {X-Ray}

像这样,我需要为组中的所有诊断值生成相似的组合,最后确定在该表中出现最多次数的唯一组合。输出应该是表中出现次数最多的组合

到目前为止,下面的 sql 返回所有具有 3 个或更多值的组合。但它无法输出小于 3 个值的值。意思是,1002 & 1003 没有被输出,因为它们的值少于 3 个,但需要输出。该解决方案还需要处理此类情况。

select p1.pat_id, p1.diagnosis, p1.tests, p2.tests, p3.tests
from patient p1 join
     patient p2
     on p1.pat_id = p2.pat_id and p1.diagnosis = p2.diagnosis and
        p1.tests < p2.tests join
     patient p3
     on p2.pat_id = p3.pat_id and p2.diagnosis = p3.diagnosis and
        p2.tests < p3.tests ;

还请说明我如何确定哪种组合出现次数最多。 谢谢。

【问题讨论】:

标签: mysql sql


【解决方案1】:

您可以使用带有左连接的查询来允许第二个和第三个测试为 NULL。但是,您将需要为具有两个或多个测试的组删除带有 NULL 的行。您可以使用相关(依赖)COUNT(*) 子查询来实现:

select
  p1.pat_id,
  p1.diagnosis,
  p1.tests as test1,
  p2.tests as test2,
  p3.tests as test3
from patient p1
left join patient p2
  on  p2.diagnosis = p1.diagnosis
  and p2.pat_id = p1.pat_id
  and p2.tests > p1.tests
left join patient p3
  on  p3.diagnosis = p1.diagnosis
  and p3.pat_id = p1.pat_id
  and p3.tests > p2.tests
where
  case (
    select count(*)
    from patient p
    where p.diagnosis = p1.diagnosis
      and p.pat_id = p1.pat_id
  )
    when 1 then true
    when 2 then p2.tests is not null
    else        p3.tests is not null
  end
order by p1.pat_id, p1.diagnosis

结果:

| pat_id | diagnosis  | test1 | test2 | test3 |
| ------ | ---------- | ----- | ----- | ----- |
| 1001   | Thyroid    | Blood | CAT   | MRI   |
| 1001   | Tonsil     | CAT   | MRI   | RAPID |
| 1001   | Tonsil     | Blood | MRI   | RAPID |
| 1001   | Tonsil     | Blood | CAT   | MRI   |
| 1001   | Tonsil     | Blood | CAT   | RAPID |
| 1002   | Pneumonia  | Eliza | MRI   |       |
| 1003   | Bronchitis | X-Ray |       |       |

View on DB Fiddle

要按出现次数对不同组合进行排序,只需将其修改为 GROUP BY ... ORDER BY COUNT(*) 查询:

select 
  p1.tests as test1,
  p2.tests as test2,
  p3.tests as test3,
  count(*) as cnt
from patient p1
left join patient p2
  on  p2.diagnosis = p1.diagnosis
  and p2.pat_id = p1.pat_id
  and p2.tests > p1.tests
left join patient p3
  on  p3.diagnosis = p1.diagnosis
  and p3.pat_id = p1.pat_id
  and p3.tests > p2.tests
where
  case (
    select count(*)
    from patient p
    where p.diagnosis = p1.diagnosis
      and p.pat_id = p1.pat_id
  )
    when 1 then true
    when 2 then p2.tests is not null
    else        p3.tests is not null
  end
group by p1.tests, p2.tests, p3.tests
order by cnt desc

结果:

| test1 | test2 | test3 | cnt |
| ----- | ----- | ----- | --- |
| Blood | CAT   | MRI   | 2   |
| CAT   | MRI   | RAPID | 1   |
| Blood | MRI   | RAPID | 1   |
| Eliza | MRI   |       | 1   |
| X-Ray |       |       | 1   |
| Blood | CAT   | RAPID | 1   |

View on DB Fiddle

【讨论】:

  • 工作就像一个魅力......感谢所有的帮助......非常感谢!
【解决方案2】:

我想你想要left joingroup by

select p1.tests, p2.tests, p3.tests, count(*)
from patient p1 left join
     patient p2
     on p1.pat_id = p2.pat_id and p1.diagnosis = p2.diagnosis and
        p1.tests < p2.tests left join
     patient p3
     on p2.pat_id = p3.pat_id and p2.diagnosis = p3.diagnosis and
        p2.tests < p3.tests
group by p1.tests, p2.tests, p3.tests
order by count(*) desc;

我不确定diagnosis 是否也应该是结果集的一部分。正如你描述的结果,它似乎不是,但对我来说这是有道理的。

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 2018-02-10
    • 2021-07-13
    • 1970-01-01
    • 1970-01-01
    • 2018-07-07
    • 1970-01-01
    相关资源
    最近更新 更多