【Question title】: SQL query to extract overlaps between records BigQuery
【Posted】: 2021-01-15 11:36:19
【Question】:

I am trying to construct a query from the following data:

time    user_id adver_id    tactic_id
time1   123 adv1    tac1
time2   123 adv1    tac1
time3   123 adv1    tac2
time4   124 adv1    tac1
time6   125 adv2    tac3
time7   123 adv2    tac1

The expected result should look like this:

    adver_id    adver_id_overlap    tactic_id   tactic_id_overlap   unique_users    total_records
    adv1    adv1    tac1    tac1    2   3
    adv1    adv1    tac1    tac2    1   2
    adv1    adv2    tac1    tac1    1   2
...

I tried this query:

WITH adver_id_subquery AS
(
SELECT
user_id,
adver_id AS adver_id
FROM dataset1
GROUP BY user_id, adver_id
),
tactic_id_subquery AS
(
SELECT
user_id,
tactic_id AS tactic_id
FROM dataset1
GROUP BY user_id, tactic_id
)
SELECT
table1.adver_id AS adver_id, table1.adver_id AS adver_id_overlap, table2.tactic_id AS tactic_id, table2.tactic_id AS tactic_id_overlap, 
COUNT(*) AS unique_users
FROM adver_id_subquery AS table1
CROSS JOIN tactic_id_subquery AS table2
WHERE table1.user_id = table2.user_id 
GROUP BY adver_id,adver_id_overlap, tactic_id, tactic_id_overlap  
ORDER BY adver_id,adver_id_overlap, tactic_id, tactic_id_overlap

But the result is a bit different from what I need:

adver_id    adver_id_overlap    tactic_id   tactic_id_overlap   unique_users
adv1    adv1    tac1    tac1    2
adv1    adv1    tac2    tac2    1
adv2    adv2    tac1    tac1    1
adv2    adv2    tac2    tac2    1
adv2    adv2    tac3    tac3    1

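The result above seems to contain only the duplicated pairs, e.g. adv1-adv1, tac1-tac1, tac2-tac2, etc. I want to see the overlaps, e.g. tac1-tac2, tac2-tac3, etc. Also, I am unable to get total_records: COUNT(*) seems to give me unique_users instead.

Any help in getting the desired result is appreciated.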

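【Comments】:

  • Please explain the logic behind the result you want. What does "overlap" mean?
  • Hi @GordonLinoff, this is to show the overlap between different adver_id and tactic values, in terms of the unique users targeted and the total records. For example, I can see that tact1 and tact2 were seen by 2 unique users and that the total number of records we have is 3. Hope this makes sense.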

Tags: sql google-bigquery overlap


【Solution 1】:

Your comment suggests that you want a self-join and aggregation. Something like this:

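-- Self-join dataset1 on user_id (assumed here: an overlap means the same
-- user saw both combinations), keeping pairs that differ in adver_id or tactic_id.
select d1.adver_id as adverid1, d2.adver_id as adverid2,
       d1.tactic_id as tactic_1, d2.tactic_id as tactic_2,
       count(*) as num_overlaps,
       count(distinct d1.user_id) as num_users
from dataset1 d1 join
     dataset1 d2
     on d1.user_id = d2.user_id and
        (d1.adver_id <> d2.adver_id or
         d1.tactic_id <> d2.tactic_id)
group by 1, 2, 3, 4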
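
To sanity-check a self-join of this kind against the sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite is only a stand-in for BigQuery here. The join on `user_id`, the helper name `find_overlaps`, and the output column names (borrowed from the question's expected result) are assumptions for illustration, not something the answer spells out.

```python
import sqlite3

# Sample rows from the question: (time, user_id, adver_id, tactic_id).
ROWS = [
    ("time1", 123, "adv1", "tac1"),
    ("time2", 123, "adv1", "tac1"),
    ("time3", 123, "adv1", "tac2"),
    ("time4", 124, "adv1", "tac1"),
    ("time6", 125, "adv2", "tac3"),
    ("time7", 123, "adv2", "tac1"),
]

# Self-join on user_id (assumption: an overlap means the same user saw both
# combinations), keeping only pairs that differ in adver_id or tactic_id.
OVERLAP_SQL = """
SELECT d1.adver_id  AS adver_id,
       d2.adver_id  AS adver_id_overlap,
       d1.tactic_id AS tactic_id,
       d2.tactic_id AS tactic_id_overlap,
       COUNT(DISTINCT d1.user_id) AS unique_users,
       COUNT(*) AS total_records
FROM dataset1 AS d1
JOIN dataset1 AS d2
  ON d1.user_id = d2.user_id
 AND (d1.adver_id <> d2.adver_id OR d1.tactic_id <> d2.tactic_id)
GROUP BY d1.adver_id, d2.adver_id, d1.tactic_id, d2.tactic_id
ORDER BY 1, 2, 3, 4
"""


def find_overlaps(rows):
    """Load rows into an in-memory table and return the overlap rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE dataset1 "
            "(time TEXT, user_id INTEGER, adver_id TEXT, tactic_id TEXT)"
        )
        conn.executemany("INSERT INTO dataset1 VALUES (?, ?, ?, ?)", rows)
        return conn.execute(OVERLAP_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in find_overlaps(ROWS):
        print(row)
```

On the sample data this produces the adv1/adv1/tac1/tac2 and adv1/adv2/tac1/tac1 rows from the expected result, each with 1 unique user and 2 records, along with their mirrored pairs. The same-pair row (adv1/adv1/tac1/tac1) is excluded by the `<>` condition, so it would need separate handling if it is wanted.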

【Comments】:
