【Question title】: BigQuery GA Open Funnel Legacy SQL: Exclude Sessions that have viewed certain pages
【Posted】: 2019-08-11 02:31:03
【Question】:

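I am trying to recreate a GA funnel in BigQuery. This open funnel should exclude sessions that have viewed certain pages. I tried AND NOT REGEXP_MATCH and NOT IN, but it still does not work: I still get sessions that viewed the pages I want to exclude.

If possible, I would also like to make it a closed funnel; this code returns an open funnel.

Also, is there a better way to write this query in standard SQL?

I need help with these points. Thanks.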

SELECT COUNT(s0.firstHit) AS _test_your_details,
SUM(s0.exit) AS _test_your_details_exits,
COUNT(s1.firstHit) AS _test_additional_new_details,
SUM(s1.exit) AS _test_additional_new_details_exits,
COUNT(s2.firstHit) AS _test_new_dress,
SUM(s2.exit) AS _test_new_dress_exits,
COUNT(s3.firstHit) AS _test_test_details,
SUM(s3.exit) AS _test_test_details_exits,
COUNT(s4.firstHit) AS _test_cover_for_the_test,
SUM(s4.exit) AS _test_cover_for_the_test_exits,
COUNT(s5.firstHit) AS _test_your_order,
SUM(s5.exit) AS _test_your_order_exits
FROM
  (SELECT s0.fullVisitorId,
          s0.visitId,
          s0.firstHit,
          s0.exit,
          s1.firstHit,
          s1.exit,
          s2.firstHit,
          s2.exit,
          s3.firstHit,
          s3.exit,
          s4.firstHit,
          s4.exit,
          s5.firstHit,
          s5.exit
   FROM
     (SELECT s0.fullVisitorId,
             s0.visitId,
             s0.firstHit,
             s0.exit,
             s1.firstHit,
             s1.exit,
             s2.firstHit,
             s2.exit,
             s3.firstHit,
             s3.exit,
             s4.firstHit,
             s4.exit
      FROM
        (SELECT s0.fullVisitorId,
                s0.visitId,
                s0.firstHit,
                s0.exit,
                s1.firstHit,
                s1.exit,
                s2.firstHit,
                s2.exit,
                s3.firstHit,
                s3.exit
         FROM
           (SELECT s0.fullVisitorId,
                   s0.visitId,
                   s0.firstHit,
                   s0.exit,
                   s1.firstHit,
                   s1.exit,
                   s2.firstHit,
                   s2.exit
            FROM
              (SELECT s0.fullVisitorId,
                      s0.visitId,
                      s0.firstHit,
                      s0.exit,
                      s1.firstHit,
                      s1.exit
               FROM
                 (SELECT fullVisitorId,
                         visitId,
                         MIN(hits.hitNumber) AS firstHit,
                         MAX(IF(hits.isExit, 1, 0)) AS exit
                  FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
                  WHERE REGEXP_MATCH(hits.page.pagePath, '/test - your details')
                    AND totals.visits = 1
                    AND channelGrouping NOT LIKE '%organic%'
                    AND hits.page.pagePath NOT IN ('/test - additional test details', '/test - test dress', '/test - cover dress')
                    AND NOT REGEXP_MATCH(hits.page.pagePath, r"^/(test - additional test details|test - test dress|test - cover dress)")
                  GROUP BY fullVisitorId,
                           visitId) s0
               FULL OUTER JOIN
                 (SELECT fullVisitorId,
                         visitId,
                         MIN(hits.hitNumber) AS firstHit,
                         MAX(IF(hits.isExit, 1, 0)) AS exit
                  FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
                  WHERE REGEXP_MATCH(hits.page.pagePath, '/test - additional new details')
                    AND totals.visits = 1
                    AND channelGrouping NOT LIKE '%organic%'
                  GROUP BY fullVisitorId,
                           visitId) s1 ON s0.fullVisitorId = s1.fullVisitorId
               AND s0.visitId = s1.visitId) s01
            FULL OUTER JOIN
              (SELECT fullVisitorId,
                      visitId,
                      MIN(hits.hitNumber) AS firstHit,
                      MAX(IF(hits.isExit, 1, 0)) AS exit
               FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
               WHERE REGEXP_MATCH(hits.page.pagePath, '/test - new dress')
                 AND totals.visits = 1
                 AND channelGrouping NOT LIKE '%organic%'
               GROUP BY fullVisitorId,
                        visitId) s2 ON s0.fullVisitorId = s2.fullVisitorId
            AND s0.visitId = s2.visitId) s012
         FULL OUTER JOIN
           (SELECT fullVisitorId,
                   visitId,
                   MIN(hits.hitNumber) AS firstHit,
                   MAX(IF(hits.isExit, 1, 0)) AS exit
            FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
            WHERE REGEXP_MATCH(hits.page.pagePath, '/test - test details')
              AND totals.visits = 1
              AND channelGrouping NOT LIKE '%organic%'
            GROUP BY fullVisitorId,
                     visitId) s3 ON s0.fullVisitorId = s3.fullVisitorId
         AND s0.visitId = s3.visitId) s0123
      FULL OUTER JOIN
        (SELECT fullVisitorId,
                visitId,
                MIN(hits.hitNumber) AS firstHit,
                MAX(IF(hits.isExit, 1, 0)) AS exit
         FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
         WHERE REGEXP_MATCH(hits.page.pagePath, '/test - cover for the test')
           AND totals.visits = 1
           AND channelGrouping NOT LIKE '%organic%'
           AND hits.page.pagePath NOT IN ('/test - additional test details', '/test - test dress')
         GROUP BY fullVisitorId,
                  visitId) s4 ON s0.fullVisitorId = s4.fullVisitorId
      AND s0.visitId = s4.visitId) s01234
   FULL OUTER JOIN
     (SELECT fullVisitorId,
             visitId,
             MIN(hits.hitNumber) AS firstHit,
             MAX(IF(hits.isExit, 1, 0)) AS exit
      FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
      WHERE REGEXP_MATCH(hits.page.pagePath, '/test - your order')
        AND totals.visits = 1
        AND channelGrouping NOT LIKE '%organic%'
        AND hits.page.pagePath NOT IN ('/test - additional test details', '/test - test dress')
        AND NOT REGEXP_MATCH(hits.page.pagePath, r"^/(test - additional test details|test - test dress|test - cover dress)")
      GROUP BY fullVisitorId,
               visitId) s5 ON s0.fullVisitorId = s5.fullVisitorId
   AND s0.visitId = s5.visitId) s012345

【Question discussion】:

    Tags: google-analytics google-bigquery bigquery-standard-sql legacy-sql


    【Solution 1】:

    In standard SQL, you can write a simple subquery over hits to check for this. For example:

    SELECT 
      fullvisitorid, visitstarttime,
      ARRAY(
        SELECT AS STRUCT hitNumber, type, page FROM t.hits ORDER BY hitNumber
      ) hits
    FROM
        `bigquery-public-data.google_analytics_sample.ga_sessions_20161104` t
    WHERE 
      -- exclude sessions with pages containing '/asearch.html'
      -- the subquery checks all hits of the session and returns boolean TRUE if the page is found
      -- NOT turns it into FALSE, which filters the session out
      NOT (SELECT COUNT(1)>0 FROM t.hits WHERE page.pagePath = '/asearch.html')
    ORDER BY array_length(hits) DESC
    LIMIT 1000
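
For several excluded pages, the same session-level check can be extended. The reason a plain `hits.page.pagePath NOT IN (...)` in the WHERE clause does not help is that it is evaluated hit by hit: it drops the individual hits on those pages, while other hits of the same session still qualify. The following is a minimal sketch in standard SQL, assuming the public sample table used above; the page paths `'/signin.html'` and the `^/store` pattern are placeholders chosen for illustration, not pages from the question.

```sql
-- Sketch: keep only sessions in which NO hit is on an excluded page.
SELECT
  fullVisitorId,
  visitId
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20161104` AS t
WHERE
  totals.visits = 1
  -- NOT EXISTS is evaluated once per session over its whole hits array
  AND NOT EXISTS (
    SELECT 1
    FROM t.hits AS h
    WHERE h.page.pagePath IN ('/asearch.html', '/signin.html')  -- placeholder exact paths
       OR REGEXP_CONTAINS(h.page.pagePath, r'^/store')          -- placeholder pattern
  )
```

`NOT EXISTS` is equivalent here to the `NOT (SELECT COUNT(1)>0 ...)` form; either one decides per session rather than per hit.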
    

    I also wrote a subquery to show the session's hits in an array. In legacy SQL, you would use OMIT RECORD IF:

    SELECT 
      fullvisitorid, visitstarttime, hits.page.pagePath
    FROM
        [bigquery-public-data:google_analytics_sample.ga_sessions_20161104] t
    -- OMIT RECORD IF excludes on record level 
    -- if dimension is below record level, you need to aggregate (like with WITHIN)
    -- in this case I used MAX() to surface any possible TRUE resulting from the comparison
    OMIT RECORD IF MAX(hits.page.pagePath = '/asearch.html')
    LIMIT 1000
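
OMIT RECORD IF takes a single condition, so excluding several pages means combining the comparisons with OR. A minimal legacy SQL sketch, assuming the same public sample table; `'/signin.html'` is a placeholder second page, not one from the question:

```sql
-- Legacy SQL sketch: omit the whole session if any hit is on one of the listed pages.
SELECT
  fullVisitorId, visitStartTime, hits.page.pagePath
FROM
  [bigquery-public-data:google_analytics_sample.ga_sessions_20161104]
-- MAX() surfaces a TRUE from any hit of the record, as in the single-page version
OMIT RECORD IF
  MAX(hits.page.pagePath = '/asearch.html'
      OR hits.page.pagePath = '/signin.html')
LIMIT 1000
```

For a longer list, a single `REGEXP_MATCH` inside the `MAX()` may be easier to maintain than a chain of OR comparisons.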
    

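    Hope that helps!

    【Comments】:

    • Awesome, thank you. I will try it and see if it works. Does OMIT RECORD IF allow multiple arguments? In my case that would be multiple pages. Also, any idea how to make this a closed funnel? At the moment the SQL query is for an open GA funnel in BigQuery.
    • Sure... see cloud.google.com/bigquery/docs/reference/legacy-sql#omit and simply combine multiple arguments with OR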
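
The closed-funnel part of the question is not covered in the thread. One possible approach in standard SQL, offered as a sketch under assumptions and not as the answerer's method: compute, per session, the first hit of each step that occurs after the previous step's hit, so a step only counts when every earlier step was reached. The three page paths (`'/home'`, `'/basket.html'`, `'/yourinfo.html'`) and the CTE and column names are illustrative placeholders.

```sql
-- Sketch: three-step closed funnel on the public sample table.
WITH sessions AS (
  SELECT fullVisitorId, visitId, hits
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20161104` AS t
  WHERE totals.visits = 1
    -- session-level exclusion, same idea as in the answer
    AND NOT EXISTS (
      SELECT 1 FROM t.hits AS h WHERE h.page.pagePath = '/asearch.html')
),
step1 AS (
  SELECT s.*,
    -- first hit on the step 1 page, NULL if the page was never viewed
    (SELECT MIN(h.hitNumber) FROM s.hits AS h
     WHERE h.page.pagePath = '/home') AS hit1
  FROM sessions AS s
),
step2 AS (
  SELECT s.*,
    -- first step 2 hit AFTER step 1; stays NULL when hit1 is NULL
    (SELECT MIN(h.hitNumber) FROM s.hits AS h
     WHERE h.page.pagePath = '/basket.html'
       AND h.hitNumber > s.hit1) AS hit2
  FROM step1 AS s
),
step3 AS (
  SELECT s.*,
    -- first step 3 hit AFTER step 2; stays NULL when hit2 is NULL
    (SELECT MIN(h.hitNumber) FROM s.hits AS h
     WHERE h.page.pagePath = '/yourinfo.html'
       AND h.hitNumber > s.hit2) AS hit3
  FROM step2 AS s
)
SELECT
  COUNT(hit1) AS step1_sessions,  -- COUNT(expr) ignores NULLs
  COUNT(hit2) AS step2_sessions,
  COUNT(hit3) AS step3_sessions
FROM step3
```

Because a comparison with NULL never evaluates to TRUE, a session that skipped a step cannot be counted in any later step, which is what makes the funnel closed. The steps need not be consecutive hits in this sketch; requiring strict adjacency would need a different condition.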