【Question title】: Flatten Firebase exports to BigQuery into tables where 1 row = 1 event (nested data within nested data)
【Posted】: 2016-08-09 21:13:38
【Question】:

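I thought I could get the information I needed by asking a simpler question that referenced a simpler data example here, but I still need some help.

I'm very new to querying JSON-style data in BigQuery, and I'm having trouble with the analytics (event) data that Firebase dumps into BigQuery for me. One row of data has the following format (with some fluff trimmed out).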

{
  "user_dim": {
    "user_id": "some_identifier_here",
    "user_properties": [
      {
        "key": "special_key1",
        "val": {
          "val": {
            "str_val": "894",
            "int_val": null
          }
        }
      },
      {
        "key": "special_key2",
        "val": {
          "val": {
            "str_val": "1",
            "int_val": null
          }
        }
      },
      {
        "key": "special_key3",
        "val": {
          "val": {
            "str_val": "23",
            "int_val": null
          }
        }
      }
    ],
    "device_info": {
      "device_category": "mobile",
      "mobile_brand_name": "Samsung",
      "mobile_model_name": "model_phone"
    },
    "dt_a": "1470625311138000",
    "dt_b": "1470620345566000"
  },
  "event_dim": [
    {
      "name": "user_engagement",
      "params": [
        {
          "key": "firebase_event_origin",
          "value": {
            "string_value": "auto",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        },
        {
          "key": "engagement_time_msec",
          "value": {
            "string_value": null,
            "int_value": "30006",
            "float_value": null,
            "double_value": null
          }
        }
      ],
      "timestamp_micros": "1470675614434000",
      "previous_timestamp_micros": "1470675551092000"
    },
    {
      "name": "new_game",
      "params": [
        {
          "key": "total_time",
          "value": {
            "string_value": "496048",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        },
        {
          "key": "armor",
          "value": {
            "string_value": "2",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        },
        {
          "key": "reason",
          "value": {
            "string_value": "power_up",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        }
      ],
      "timestamp_micros": "1470675825988001",
      "previous_timestamp_micros": "1470675282500001"
    },
    {
      "name": "user_engagement",
      "params": [
        {
          "key": "firebase_event_origin",
          "value": {
            "string_value": "auto",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        },
        {
          "key": "engagement_time_msec",
          "value": {
            "string_value": null,
            "int_value": "318030",
            "float_value": null,
            "double_value": null
          }
        }
      ],
      "timestamp_micros": "1470675972778002",
      "previous_timestamp_micros": "1470675614434002"
    },
    {
      "name": "won_game",
      "params": [
        {
          "key": "total_time",
          "value": {
            "string_value": "497857",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        },
        {
          "key": "level",
          "value": {
            "string_value": null,
            "int_value": "207",
            "float_value": null,
            "double_value": null
          }
        },
        {
          "key": "sword",
          "value": {
            "string_value": "iron",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        }
      ],
      "timestamp_micros": "1470677171374007",
      "previous_timestamp_micros": "1470671343784007"
    }
  ]
}

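Based on the answer to my original question, I've been able to work with the first part of the object, user_dim, just fine. However, whenever I try a similar approach (unnesting) on the event_dim field, the query fails with the message "Error: Scalar subquery produced more than one element". I suspect this is because event_dim is itself an array and contains structs that also have arrays in them.

In case it helps, here is a basic query that gives me the error, though note that I'm well out of my element working with this kind of data in BQ and may be going about this entirely the wrong way: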

SELECT
  (SELECT name FROM UNNEST(event_dim) WHERE name = 'user_engagement') AS event_name
FROM
  my_table;

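The end result I want is a query that converts a table containing many objects of this type into a table that outputs 1 row for each event in each object's event_dim array. That is, for the example object above, I want it to output 4 rows, where the first set of columns is identical and is just the metadata from user_dim. After that, I'd like columns that I can define explicitly based on what I know exists across every possible event, such as event_name, firebase_event_origin, engagement_time_msec, total_time, armor, reason, level, sword, each populated with the value from that event's params, or NULL if the param doesn't exist.

【Question comments】:

  • Can you share the query that gives you "Error: Scalar subquery produced more than one element."?
  • @FelipeHoffa Edited

Tags: google-bigquery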


【Solution 1】:

Based on Mikhail's answer, but written against an actual Firebase dataset:

SELECT 
  user_dim.app_info.app_instance_id,
  timestamp_micros,
  (SELECT value.int_value FROM UNNEST(dim.params) WHERE key = "level") AS level,
  (SELECT value.int_value FROM UNNEST(dim.params) WHERE key = "coins") AS coins,
  (SELECT value.int_value FROM UNNEST(dim.params) WHERE key = "powerups") AS powerups
FROM `dataset.table`, UNNEST(event_dim) AS dim
WHERE timestamp_micros=1464718937589000 

(Saved here for future reference; easier to copy and paste.)
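As a rough sketch, the same pattern can be stretched to the column list the question asks for. `dataset.table` is a placeholder, and the field names (`user_dim.user_id`, `user_dim.device_info.mobile_brand_name`, `value.string_value`, `value.int_value`) are assumed from the sample row in the question rather than checked against a real export. Which value field each param lands in is also an assumption taken from that sample, so adjust per key as needed.

```sql
-- Sketch only: `dataset.table` is a placeholder and the field names are
-- assumed from the sample row shown in the question.
SELECT
  -- Metadata repeated on every event row.
  user_dim.user_id AS user_id,
  user_dim.device_info.mobile_brand_name AS mobile_brand_name,
  -- One output row per element of event_dim.
  dim.name AS event_name,
  dim.timestamp_micros AS timestamp_micros,
  -- Each scalar subquery yields NULL when the key is absent for the event.
  (SELECT value.string_value FROM UNNEST(dim.params)
   WHERE key = "firebase_event_origin") AS firebase_event_origin,
  (SELECT value.int_value FROM UNNEST(dim.params)
   WHERE key = "engagement_time_msec") AS engagement_time_msec,
  (SELECT value.string_value FROM UNNEST(dim.params)
   WHERE key = "total_time") AS total_time,
  (SELECT value.string_value FROM UNNEST(dim.params)
   WHERE key = "armor") AS armor,
  (SELECT value.string_value FROM UNNEST(dim.params)
   WHERE key = "reason") AS reason,
  -- When unsure which field holds the value, take whichever is populated.
  (SELECT COALESCE(CAST(value.int_value AS STRING), value.string_value)
   FROM UNNEST(dim.params)
   WHERE key = "level") AS level,
  (SELECT value.string_value FROM UNNEST(dim.params)
   WHERE key = "sword") AS sword
FROM `dataset.table`, UNNEST(event_dim) AS dim
```

Note that the comma join against `UNNEST(event_dim)` drops rows whose `event_dim` array is empty; `LEFT JOIN UNNEST(event_dim) AS dim` keeps them with NULL event columns. Each scalar subquery still assumes a key appears at most once within a single event's `params`.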

    【Solution 2】:

    Hopefully the query below gives you a push in the right direction:

    WITH YourTable AS (
      SELECT ARRAY[
        STRUCT(
          "user_engagement" AS name,
          ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
            STRUCT("firebase_event_origin", STRUCT("auto", NULL)),
            STRUCT("engagement_time_msec", STRUCT("30006", NULL))] AS params,
          1470675614434000 AS TIMESTAMP_MICROS,
          1470675551092000 AS previous_timestamp_micros
        ),
        STRUCT(
          "new_game" AS name,
          ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
            STRUCT("total_time", STRUCT("496048", NULL)),
            STRUCT("armor", STRUCT("2", NULL)),
            STRUCT("reason", STRUCT("power_up", NULL))] AS params,
          1470675825988001 AS TIMESTAMP_MICROS,
          1470675282500001 AS previous_timestamp_micros
        ),
        STRUCT(
          "user_engagement" AS name,
          ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
            STRUCT("firebase_event_origin", STRUCT("auto", NULL)),
            STRUCT("engagement_time_msec", STRUCT("318030", NULL))] AS params,
          1470675972778002 AS TIMESTAMP_MICROS,
          1470675614434002 AS previous_timestamp_micros
        ),
        STRUCT(
          "won_game" AS name,
          ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
            STRUCT("total_time", STRUCT("497857", NULL)),
            STRUCT("level", STRUCT("207", NULL)),
            STRUCT("sword", STRUCT("iron", NULL))] AS params,
          1470677171374007 AS TIMESTAMP_MICROS,
          1470671343784007 AS previous_timestamp_micros
        )
      ] AS event_dim
    )
    SELECT 
      name, 
      (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "firebase_event_origin") AS firebase_event_origin,
      (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "engagement_time_msec") AS engagement_time_msec,
      (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "total_time") AS total_time,
      (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "armor") AS armor,
      (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "reason") AS reason,
      (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "level") AS level,
      (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "sword") AS sword
    FROM YourTable, UNNEST(event_dim) AS dim
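
    For context on the error in the question: a scalar subquery must return at most one row, and the sample row contains two `user_engagement` events, so filtering `UNNEST(event_dim)` by name inside the SELECT list yields two rows. A minimal sketch of the alternative, with a made-up inline `my_table` that keeps only two fields per event, moves the UNNEST into the FROM clause so each matching event becomes its own row:

    ```sql
    -- Sketch only: my_table is inline toy data with the same shape problem
    -- as the question (two events share the name 'user_engagement').
    WITH my_table AS (
      SELECT [
        STRUCT("user_engagement" AS name, 1470675614434000 AS timestamp_micros),
        STRUCT("new_game" AS name, 1470675825988001 AS timestamp_micros),
        STRUCT("user_engagement" AS name, 1470675972778002 AS timestamp_micros)
      ] AS event_dim
    )
    SELECT
      dim.name AS event_name,
      dim.timestamp_micros AS timestamp_micros
    FROM my_table, UNNEST(event_dim) AS dim
    -- The filter now applies per event row instead of inside a scalar subquery.
    WHERE dim.name = 'user_engagement';
    ```

    If one output row per original row is preferred instead, wrapping the subquery as `ARRAY(SELECT name FROM UNNEST(event_dim) WHERE name = 'user_engagement')` returns the matches as an array rather than raising the error.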
    
