[Posted]: 2019-11-14 10:45:00
[Question]:
I am building a step funnel report in SQL.
It returns rows like the following:
delivered_email,anonymous_id,opened_email,step1_delivered,step2_opened,step3_landing_page,step4_cta_clicked,steps_completed
email1@example.com,,,true,false,false,false,1
email2@example.com,id2,email2@example.com,true,true,true,true,4
email2@example.com,id3,email2@example.com,true,true,false,false,2
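The same email address has multiple entries because these people took part in multiple sessions. However, in this case I am only interested in, for each person, the session in which they completed the most steps. For example, the actual result in the case above should be 2 rows instead of 3, where for email2@example.com only the row with steps_completed = 4 is returned: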
delivered_email,anonymous_id,opened_email,step1_delivered,step2_opened,step3_landing_page,step4_cta_clicked,steps_completed
email1@example.com,,,true,false,false,false,1
email2@example.com,id2,email2@example.com,true,true,true,true,4
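This can normally be done by joining the result with max(steps_completed) for each user, as described on Stack Overflow. In my case, however, the steps_completed column is itself computed as part of another subquery. Creating a join on it would therefore require me to copy-paste the entire subquery, which would be unmaintainable.
Here is the query: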
select
  *
from
  (
    -- Counts for each session how many steps were completed
    -- This can be used to only select the session with the most steps completed for each unique email address
    select
      *,
      if(step1_delivered, 1, 0) +
      if(step2_opened, 1, 0) +
      if(step3_landing_page, 1, 0) +
      if(step4_cta_clicked, 1, 0)
      as steps_completed
    from
      (
        -- Below subquery combines email addresses with associated anonymous_ids
        -- Note that a single email may have multiple entries here if they used multiple devices
        -- In the rest of the funnel we are interested only in the case grouped by email with the most steps completed
        select
          t_delivered.email as delivered_email,
          t_identifies.id as anonymous_id,
          t_opened.email as opened_email,
          t_delivered.email is not null as step1_delivered,
          coalesce(t_opened.email, t_identifies.id) is not null as step2_opened,
          t_landing_page.id is not null as step3_landing_page,
          t_cta_clicked.id is not null as step4_cta_clicked
        -- Step 1: Retrieve emails to which opener was sent
        from
          (
            select context_traits_email as email
            from drip.email_delivered
            where email_subject like '%you are invited%'
            group by email
          ) as t_delivered
        -- Retrieve the anonymous_id for each email, if set (i.e. if identified)
        -- Note that if we have identified a user we will assume they have opened the email
        left join
          (
            select
              email,
              anonymous_id as id
            from javascript.identifies
            group by email, anonymous_id
          ) as t_identifies
          on t_identifies.email = t_delivered.email
        -- Step 2: retrieve which email addresses opened the opener email
        left join
          (
            select context_traits_email as email
            from drip.email_opened
            group by email
          ) as t_opened
          on t_opened.email = t_delivered.email
        -- Step 3: landing page visited
        left join
          (
            select anonymous_id as id
            from javascript.pages
            where context_page_title = 'Homepage'
            group by anonymous_id
          ) as t_landing_page
          on t_landing_page.id = t_identifies.id
        -- Step 4: CTA clicked
        left join
          (
            select anonymous_id as id
            from javascript.dtc_file_selection_initiated
            group by anonymous_id
          ) as t_cta_clicked
          on t_cta_clicked.id = t_identifies.id
      )
  )
How can I group this result by delivered_email, with the result (before grouping) ordered by steps_completed (desc), without duplicating my subquery?
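One possible way to keep only the best session per email without repeating the subquery is to name it once in a WITH clause and rank its rows with a window function. The following is a minimal sketch in BigQuery Standard SQL, not a tested solution for the real tables. The inline `funnel` rows are a stand-in (an assumption) for the full steps_completed subquery above, trimmed to three columns, and the names `funnel` and `session_rank` are illustrative only.

```sql
-- Assumption: `funnel` stands in for the steps_completed subquery from the
-- question; the three literal rows mirror the sample output shown earlier.
with funnel as (
  select
    'email1@example.com' as delivered_email,
    cast(null as string) as anonymous_id,
    1 as steps_completed
  union all
  select 'email2@example.com', 'id2', 4
  union all
  select 'email2@example.com', 'id3', 2
)
select
  * except (session_rank)  -- hide the helper column from the final result
from
  (
    select
      *,
      -- Number the sessions of each email, most steps completed first
      row_number() over (
        partition by delivered_email
        order by steps_completed desc
      ) as session_rank
    from funnel
  )
where session_rank = 1     -- keep only the top-ranked session per email
```

With the sample rows this should leave one row per email, keeping the id2 session for email2@example.com. If two sessions of the same email tie on steps_completed, row_number() picks one of them arbitrarily unless a tie-breaking column is added to the order by.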
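[Comments]:
-
Can you turn your subquery into a view and then join to that view wherever it is needed?
-
@alexherm That works, although it requires me to maintain a separate view. Do you think that is the only way?
-
I am sure there is another way. But if this works, go with it. By maintaining, do you mean setting it up once, or does it need regular updates?
-
@alexherm I would need to update it regularly. There are some subqueries in there that query user segments and so on, which I need to change depending on the segment I am interested in.
-
A minimal reproducible example, please. PS What is your question? What does "group this result by delivered_email, with the result (before grouping) ordered by steps_completed" mean? Tables have no order, so ordering before a group by without a limit/top has no effect, but that is clearly not what you want, so what do you want? Use enough words and references to parts of an example. When it does not introduce or summarize full details, "basically" also just means "unclear". PS It is also unclear what this post and code have to do with "another subquery" and avoiding "joining on it". PS Clarify via edits, not comments.
Tags: sql join google-bigquery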