【Question Title】: BigQuery / Shopify Order Data Query
【Posted】: 2018-11-15 22:12:59
【Question】:

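My orders imported from Shopify create a new entry in BigQuery for each order whenever the order has changed since the last import, so you can see how the order's properties change over time rather than only the last imported state. This also creates multiple entries in the table for the same order, where the only unique parts are the _sdc_batched_at and _sdc_sequence values. I sometimes see as many as 30 entries for the same order.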

Table schema...

order:
  order_number: Int
  fulfillments: Array
  _sdc_batched_at: DateTime
  _sdc_sequence: Int

What I've done...

I created a partitioned table that essentially boils down to the subset of entries within a given date range that have fulfillments > 0

Initial query to reduce the dataset...

-- keep only entries in the date range that have at least one fulfillment
with orders as (
    select order_number, fulfillments, _sdc_batched_at, _sdc_sequence
    from `project.shopify.orders`
    where created_at between '2018-11-08' and '2018-11-15'
    and ARRAY_LENGTH(fulfillments) > 0
)
select * from orders

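The problem... I run into trouble when trying to use distinct or group by, because fulfillments is an array and that breaks things. How can I write a query that returns, for each order, only the latest entry by _sdc_batched_at value?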

Sample data

    [
    {
        "order_number": "5545",
        "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
        ],
        "_sdc_batched_at": "2018-11-10 02:46:21.270 UTC",
        "_sdc_sequence": "1541817507934"
    },
    {
        "order_number": "5545",
        "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
        ],
        "_sdc_batched_at": "2018-11-10 03:16:16.606 UTC",
        "_sdc_sequence": "1541819139795"
    },
    {
        "order_number": "5545",
        "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
        ],
        "_sdc_batched_at": "2018-11-10 03:46:12.704 UTC",
        "_sdc_sequence": "1541821046476"
    },
    {
        "order_number": "5545",
        "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
        ],
        "_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
        "_sdc_sequence": "1541822755508"
    },
    {
        "order_number": "2212",
        "fulfillments": [
            {
                "tracking_url": null,
                "id": "617029074993",
                "tracking_company": "ups",
                "tracking_number": "Z1234567890"
            }
        ],
        "_sdc_batched_at": "2018-11-10 03:46:12.704 UTC",
        "_sdc_sequence": "1541821046476"
    },
    {
        "order_number": "2212",
        "fulfillments": [
            {
                "tracking_url": null,
                "id": "617029074993",
                "tracking_company": "ups",
                "tracking_number": "Z1234567890"
            }
        ],
        "_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
        "_sdc_sequence": "1541822755508"
    }
    ]

Expected result

Return only the latest entry per order by _sdc_batched_at value

{
    "order_number": "5545",
    "fulfillments": [
    {
        "tracking_url": null,
        "id": "617029074993",
        "tracking_company": "ups",
        "tracking_number": "Z1234567890"
    }
    ],
    "_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
    "_sdc_sequence": "1541822755508"
},
{
    "order_number": "2212",
    "fulfillments": [
        {
            "tracking_url": null,
            "id": "617029074993",
            "tracking_company": "ups",
            "tracking_number": "Z1234567890"
        }
    ],
    "_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
    "_sdc_sequence": "1541822755508"
}

【Comments】:

  • Please provide a simplified example of your data along with the expected result!

Tags: google-bigquery shopify bigquery-standard-sql


【Solution 1】:

Below is for BigQuery Standard SQL

SELECT AS VALUE ARRAY_AGG(t ORDER BY _sdc_batched_at DESC LIMIT 1)[OFFSET(0)] 
FROM `project.shopify.orders` t
GROUP BY order_number   

Obviously, you can add whatever conditions you need in a WHERE clause
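
As a rough sketch of what that could look like, the query below combines the ARRAY_AGG pattern from this answer with the date-range and fulfillment filter from the question. The extra `_sdc_sequence DESC` sort key is an assumption added here as a tie-breaker for entries that share the same `_sdc_batched_at`; it is not part of the original answer.

```sql
#standardSQL
-- Sketch only: the answer's ARRAY_AGG pattern plus the asker's filter.
-- Rows are filtered first, then the newest entry per order_number is kept.
SELECT AS VALUE
  ARRAY_AGG(
    t
    ORDER BY _sdc_batched_at DESC, _sdc_sequence DESC  -- tie-breaker is an assumption
    LIMIT 1
  )[OFFSET(0)]
FROM `project.shopify.orders` AS t
WHERE created_at BETWEEN '2018-11-08' AND '2018-11-15'
  AND ARRAY_LENGTH(fulfillments) > 0
GROUP BY order_number
```

Because the grouping key is only order_number and the whole row `t` is aggregated as a struct, the fulfillments array never has to appear in a GROUP BY or DISTINCT, which is what caused the trouble described in the question.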

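【Comments】:

  • I'll run this through a few scenarios to make sure it works on different datasets before marking it as solved. It looks promising so far.
  • Sure :o) Good luck, and let us know if you have any issues with this approach
  • Marked as solved. If I can't figure it out on my own, I'll have a follow-up question tomorrow. Essentially, my end goal with this data is a one-to-one list of order_number and tracking_number. Since one order can have many tracking numbers, I need to dig into the fulfillments array, grab each fulfillment, extract its tracking_number, and use the parent's number as the order_number value.
  • Sure. If your follow-up question is within the scope of the current question, just use the comments. But if it is a new question, please post it as a new question :o)
  • I'll post a new question. It's related, but it is probably a new question for future searches.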
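
The follow-up goal mentioned in the comments (one row per order_number / tracking_number pair) was not answered in this thread, but it might be sketched roughly as below. This assumes that fulfillments is an ARRAY of STRUCTs with a tracking_number field, as the sample data suggests. It also uses ROW_NUMBER() to pick the latest entry per order, which is an alternative to the ARRAY_AGG pattern in the answer, not the answerer's method.

```sql
#standardSQL
-- Sketch only: latest entry per order, then one output row per fulfillment.
WITH ranked AS (
  SELECT
    order_number,
    fulfillments,
    -- rn = 1 marks the most recent entry for each order_number
    ROW_NUMBER() OVER (
      PARTITION BY order_number
      ORDER BY _sdc_batched_at DESC, _sdc_sequence DESC
    ) AS rn
  FROM `project.shopify.orders`
)
SELECT
  r.order_number,
  f.tracking_number
FROM ranked AS r
CROSS JOIN UNNEST(r.fulfillments) AS f  -- flattens the array; orders with no fulfillments drop out
WHERE r.rn = 1
```

With the sample data from the question, each order has a single fulfillment, so this would yield one row per order; an order with several fulfillments would yield one row per tracking number.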