【Question Title】: GA BigQuery Export - COUNT(DISTINCT(fullVisitorId)) with source/medium overcounting
【Posted】: 2019-04-29 18:35:47
【Question】:

I'm having trouble counting unique users in the GA BigQuery export. I reproduced the same error using the public sample data.

SELECT sum(users) as users, sum(sessions) as sessions FROM (
  SELECT
    h.page.pagePath as page_path,
    trafficSource.source,
    trafficSource.medium,
    COUNT(DISTINCT(fullVisitorId)) AS users,
    COUNT(*) as sessions
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
  WHERE h.page.pagePath = "/home"
  GROUP BY page_path, source, medium
)
UNION ALL
SELECT sum(users) as users, sum(sessions) as sessions FROM (
  SELECT
    h.page.pagePath as page_path,
    COUNT(DISTINCT(fullVisitorId)) AS users,
    COUNT(*) as sessions
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
  WHERE h.page.pagePath = "/home"
  GROUP BY page_path
)

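When I include the source and medium columns, the distinct fullVisitorId count is 10 higher than without them. How can including these columns increase the number of fullVisitorIds? It doesn't make sense to me.

What causes this, and how can I get an accurate count?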

【Discussion】:

    Tags: google-analytics google-bigquery


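    【Answer 1】:

    How can including these columns increase the number of fullVisitorIds? It doesn't make sense to me.

    If you run your inner query like this, restricted to a single visitor, you will see why: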

    SELECT
        MAX(fullVisitorId) AS fullVisitorId,
        h.page.pagePath as page_path,
        trafficSource.source,
        trafficSource.medium,
        COUNT(DISTINCT(TRIM(fullVisitorId))) AS users,
        COUNT(*) as sessions
      FROM
        `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
      WHERE h.page.pagePath = "/home"
      and fullVisitorId = '9902321252073939460'
      GROUP BY page_path, source, medium
    

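    It returns this result:

    As you can see, because this user arrived via 2 different source/medium combinations, the same user falls into two groups and is counted once in each. Summing the per-group counts therefore counts that user twice, which inflates the total.

    One way to fix this is to apply an aggregate function to source/medium and remove them from the GROUP BY, like this: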
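The effect can be reproduced without BigQuery. The sketch below is a toy stand-in, not the GA export: the `hits` table, the visitor IDs A/B/C, and the source/medium values are all made up for illustration, and SQLite is used only because it runs in memory. It shows that summing per-group distinct counts is not the same as one distinct count over all rows.

```python
import sqlite3

# Toy stand-in for the unnested hits on "/home": one row per hit.
# Visitor IDs and source/medium values are hypothetical.
rows = [
    ("A", "google", "organic"),
    ("A", "(direct)", "(none)"),  # same visitor, second source/medium
    ("B", "google", "organic"),
    ("C", "(direct)", "(none)"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (visitor TEXT, source TEXT, medium TEXT)")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)

# Distinct visitors counted per (source, medium) group, then summed:
# visitor A is counted once in each of the two groups.
summed_per_group = conn.execute(
    """
    SELECT SUM(users) FROM (
        SELECT COUNT(DISTINCT visitor) AS users
        FROM hits
        GROUP BY source, medium
    )
    """
).fetchone()[0]

# Distinct visitors counted once over all rows.
overall = conn.execute(
    "SELECT COUNT(DISTINCT visitor) FROM hits"
).fetchone()[0]

print(summed_per_group, overall)  # 4 3
```

Visitor A appears in both groups, so the summed figure is one higher than the true number of distinct visitors. In the question the same thing happens for 10 visitors.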

    
        SELECT sum(users) as users, sum(sessions) as sessions FROM (
          SELECT
            h.page.pagePath as page_path,
            MAX(trafficSource.source) as source,
            MAX(trafficSource.medium) as medium,
            COUNT(DISTINCT(TRIM(fullVisitorId))) AS users,
            COUNT(*) as sessions
          FROM
            `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
          WHERE h.page.pagePath = "/home"
          GROUP BY page_path
        )
        UNION ALL
        SELECT sum(users) as users, sum(sessions) as sessions FROM (
          SELECT
            h.page.pagePath as page_path,
            COUNT(DISTINCT(TRIM(fullVisitorId))) AS users,
            COUNT(*) as sessions
          FROM
            `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
          WHERE h.page.pagePath = "/home"
          GROUP BY page_path
        )
    

    With this change, both queries return the same user count.
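If a per-source/medium breakdown is still needed, one possible alternative is to assign each visitor to exactly one source/medium before counting, so the breakdown sums to the true total. The sketch below again uses a made-up `hits` table in SQLite rather than the GA export, and it picks the alphabetically smallest source/medium per visitor purely as an arbitrary tie-break. That rule is an assumption for illustration, not how Google Analytics attributes users.

```python
import sqlite3

# Same hypothetical toy data as a stand-in for unnested "/home" hits.
rows = [
    ("A", "google", "organic"),
    ("A", "(direct)", "(none)"),
    ("B", "google", "organic"),
    ("C", "(direct)", "(none)"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (visitor TEXT, source TEXT, medium TEXT)")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)

# Step 1 (inner query): one row per visitor, keeping a single
# source/medium pair chosen by an arbitrary rule (alphabetical minimum).
# Step 2 (outer query): count visitors per assigned pair.
breakdown = conn.execute(
    """
    SELECT source_medium, COUNT(*) AS users
    FROM (
        SELECT visitor,
               MIN(source || ' / ' || medium) AS source_medium
        FROM hits
        GROUP BY visitor
    )
    GROUP BY source_medium
    ORDER BY source_medium
    """
).fetchall()

# Each visitor now belongs to exactly one group, so the sum is safe.
total = sum(users for _, users in breakdown)

print(breakdown, total)
```

Concatenating source and medium before taking the minimum keeps the pair together. Taking MIN or MAX of each column independently can produce a combination that never occurred for that visitor.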

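    【Comments】:

    • @josh any feedback on this?
    • I figured this out between asking and getting an answer, but I have accepted it for historical purposes. Thanks for taking the time.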