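【Posted】:2018-11-15 22:12:59
【Question】:
My orders imported from Shopify create a new entry in BigQuery for each order whenever the order has changed since the last import, so you can see how an order's attributes change over time rather than only its state at the last import. This also produces multiple entries in the table for the same order, where the only unique parts are the _sdc_batched_at and _sdc_sequence values. I sometimes see as many as 30 entries for the same order.
Table schema...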
order:
order_number: Int
fulfillments: Array
_sdc_batched_at: DateTime
_sdc_sequence: Int
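What I've done...
I created a partitioned table that essentially boils down to the subset of entries that fall within a given date range and have fulfillments > 0.
Initial query to reduce the dataset...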
with orders as (
select order_number, fulfillments, _sdc_batched_at, _sdc_sequence
from `project.shopify.orders`
where created_at between '2018-11-08' and '2018-11-15'
and ARRAY_LENGTH(fulfillments) > 0
)
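The problem...
I run into trouble when I try to use distinct or group by, because fulfillments is an array and that throws things off. How can I write a query that returns, for each order, only the entry with the latest _sdc_batched_at value?
Sample data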
[
{
"order_number": "5545",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 02:46:21.270 UTC",
"_sdc_sequence": "1541817507934"
},
{
"order_number": "5545",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 03:16:16.606 UTC",
"_sdc_sequence": "1541819139795"
},
{
"order_number": "5545",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 03:46:12.704 UTC",
"_sdc_sequence": "1541821046476"
},
{
"order_number": "5545",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
"_sdc_sequence": "1541822755508"
},
{
"order_number": "2212",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 03:46:12.704 UTC",
"_sdc_sequence": "1541821046476"
},
{
"order_number": "2212",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
"_sdc_sequence": "1541822755508"
}
]
Expected result
Return only the entry with the latest _sdc_batched_at value for each order
{
"order_number": "5545",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
"_sdc_sequence": "1541822755508"
},
{
"order_number": "2212",
"fulfillments": [
{
"tracking_url": null,
"id": "617029074993",
"tracking_company": "ups",
"tracking_number": "Z1234567890"
}
],
"_sdc_batched_at": "2018-11-10 04:16:07.952 UTC",
"_sdc_sequence": "1541822755508"
}
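For reference, one way the expected result above might be produced is to number the rows within each order_number from newest to oldest and keep only the first, which avoids distinct and group by on the array column altogether. This is only a sketch in BigQuery Standard SQL: it reuses the orders CTE and the table and column names from the question, while the ranked CTE and the rn alias are helper names made up for illustration. It also assumes _sdc_sequence is a reasonable tie-breaker when two rows share the same _sdc_batched_at.

```sql
-- Sketch only: keep the newest row per order_number.
-- `ranked` and `rn` are hypothetical helper names, not part of the original schema.
with orders as (
  select order_number, fulfillments, _sdc_batched_at, _sdc_sequence
  from `project.shopify.orders`
  where created_at between '2018-11-08' and '2018-11-15'
    and ARRAY_LENGTH(fulfillments) > 0
),
ranked as (
  select
    *,
    -- rn = 1 marks the most recently batched row for each order;
    -- _sdc_sequence is assumed to break ties between equal timestamps
    row_number() over (
      partition by order_number
      order by _sdc_batched_at desc, _sdc_sequence desc
    ) as rn
  from orders
)
-- Drop the helper column so the output keeps the original shape
select * except (rn)
from ranked
where rn = 1
```

Because the window function only partitions and orders by scalar columns, the fulfillments array is carried through unchanged and never has to be compared.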
【Comments】:
- Please provide a simplified example of your data along with the expected result!
Tags: google-bigquery shopify bigquery-standard-sql