【Question title】: Getting error while inserting array of JSON to BigQuery
【Posted】: 2019-01-03 10:51:48
【Question】:

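Error while reading data, error message: Failed to parse JSON: No object found when new array is started.; BeginArray returned false

I created the sample JSON data below. It is a JSON array, and each JSON object is on its own line.

When I load a single JSON object that is not wrapped in an array, it works.

JSON body -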

[
  { "item_name": "dfkhjf", "gtin": "123456", "brand": "Om-Publication","category_name": "books", "country_code": "IN", "marktet_place": "india", "price": 2239, "sellerId": 234, "create_time": "2017-07-19T16:00:46.000Z" },
  { "item_name": "toy-gun", "gtin": "1234234445", "brand": "Toy", "category_name": "toy", "country_code": "IN", "marktet_place": "flipMe", "price": 2239, "sellerId": 234, "create_time": "2017-08-19T16:00:46.000Z" },
  { "item_name": "Drone", "gtin": "12342356456", "brand": "Drone-XX", "category_name": "drone", "country_code": "IN", "marktet_place": "drone-maker", "price": 2239, "sellerId": 234, "create_time": "2017-09-19T16:00:46.000Z" }
]

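【Comments】:

  • Can you explain how you are running the insert command, and provide an example?
  • I uploaded this data to a file in a GCS bucket, e.g. mydata.json. Then, from the Google console web UI, I created a table with schema auto-detect set to true and loaded this file from GCS.
  • If you want to get 3 rows in BigQuery, you can remove the [] from the JSON and it should work.
  • I did try that. It worked.

Tags: google-bigquery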


【Answer 1】:

As stated in the documentation on loading JSON data stored in GCS to BigQuery, the JSON data must be in Newline Delimited JSON format, where each line is a valid, independent JSON value. So you should use (2) instead of (1):

(1)

[
  { "item_name": "dfkhjf", "gtin": "123456", "brand": "Om-Publication","category_name": "books", "country_code": "IN", "marktet_place": "india", "price": 2239, "sellerId": 234, "create_time": "2017-07-19T16:00:46.000Z" },
  { "item_name": "toy-gun", "gtin": "1234234445", "brand": "Toy", "category_name": "toy", "country_code": "IN", "marktet_place": "flipMe", "price": 2239, "sellerId": 234, "create_time": "2017-08-19T16:00:46.000Z" },
  { "item_name": "Drone", "gtin": "12342356456", "brand": "Drone-XX", "category_name": "drone", "country_code": "IN", "marktet_place": "drone-maker", "price": 2239, "sellerId": 234, "create_time": "2017-09-19T16:00:46.000Z" }
]

(2)

{ "item_name": "dfkhjf", "gtin": "123456", "brand": "Om-Publication","category_name": "books", "country_code": "IN", "marktet_place": "india", "price": 2239, "sellerId": 234, "create_time": "2017-07-19T16:00:46.000Z" }
{ "item_name": "toy-gun", "gtin": "1234234445", "brand": "Toy", "category_name": "toy", "country_code": "IN", "marktet_place": "flipMe", "price": 2239, "sellerId": 234, "create_time": "2017-08-19T16:00:46.000Z" }
{ "item_name": "Drone", "gtin": "12342356456", "brand": "Drone-XX", "category_name": "drone", "country_code": "IN", "marktet_place": "drone-maker", "price": 2239, "sellerId": 234, "create_time": "2017-09-19T16:00:46.000Z" }
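
Going from form (1) to form (2) by hand is error-prone for larger files. Below is a minimal Python sketch of one way to do the conversion, using only the standard library. The function names json_array_to_ndjson and convert_file are hypothetical and not part of any BigQuery tooling, and the sketch assumes the whole array fits in memory.

```python
import json


def json_array_to_ndjson(text):
    """Convert a JSON array of objects (form 1) to newline-delimited JSON (form 2)."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without `indent` never emits raw newlines,
    # so every record ends up on exactly one line.
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def convert_file(src_path, dst_path):
    """Read a JSON array file and write the newline-delimited version (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src:
        ndjson = json_array_to_ndjson(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(ndjson)
```

For example, convert_file("mydata.json", "file.json") would read the array file and write one object per line, assuming those two paths.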

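Update:

Here is a step-by-step guide showing how it works:

Create a JSON file (file.json in my case) with the content I shared (make sure to remove the array brackets [] as well as the comma , at the end of each line):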

$ cat file.json
{ "item_name": "dfkhjf", "gtin": "123456", "brand": "Om-Publication","category_name": "books", "country_code": "IN", "marktet_place": "india", "price": 2239, "sellerId": 234, "create_time": "2017-07-19T16:00:46.000Z" }
{ "item_name": "toy-gun", "gtin": "1234234445", "brand": "Toy", "category_name": "toy", "country_code": "IN", "marktet_place": "flipMe", "price": 2239, "sellerId": 234, "create_time": "2017-08-19T16:00:46.000Z" }
{ "item_name": "Drone", "gtin": "12342356456", "brand": "Drone-XX", "category_name": "drone", "country_code": "IN", "marktet_place": "drone-maker", "price": 2239, "sellerId": 234, "create_time": "2017-09-19T16:00:46.000Z" }
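
Before loading, it can help to confirm that every line of file.json really is a standalone JSON object. The following is a rough sketch of such a check, not BigQuery's own parser. The function name find_invalid_ndjson_lines is hypothetical, and skipping blank lines is an assumption of this sketch rather than documented loader behavior.

```python
import json


def find_invalid_ndjson_lines(text):
    """Return (line_number, reason) pairs for lines that are not standalone JSON objects."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored by this check
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            # Lines such as "[" or an object followed by a trailing comma land here.
            problems.append((number, "not valid JSON: " + exc.msg))
            continue
        if not isinstance(value, dict):
            problems.append((number, "expected an object, got " + type(value).__name__))
    return problems
```

Run against the array form of the data, this check flags the bracket lines and the lines that end with a comma. Run against the newline-delimited form, it returns an empty list.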

To load the file into BQ, run a command like the following:

$ bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON dataset.table file.json
Upload complete.
Waiting on bqjob_XXXXXXXXXXX ... (1s) Current status: DONE

Now query the table to check that the content was loaded correctly:

$ bq query --use_legacy_sql=false "SELECT * FROM dataset.table"
Waiting on bqjob_r3ef14ac0d0a6c856_000001681819e9fc_1 ... (0s) Current status: DONE
+----------+-------+---------------------+---------------+----------------+--------------+---------------+-------------+-----------+
| sellerId | price |     create_time     | marktet_place |     brand      | country_code | category_name |    gtin     | item_name |
+----------+-------+---------------------+---------------+----------------+--------------+---------------+-------------+-----------+
|      234 |  2239 | 2017-07-19 16:00:46 | india         | Om-Publication | IN           | books         |      123456 | dfkhjf    |
|      234 |  2239 | 2017-08-19 16:00:46 | flipMe        | Toy            | IN           | toy           |  1234234445 | toy-gun   |
|      234 |  2239 | 2017-09-19 16:00:46 | drone-maker   | Drone-XX       | IN           | drone         | 12342356456 | Drone     |
+----------+-------+---------------------+---------------+----------------+--------------+---------------+-------------+-----------+

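【Comments】:

  • @jyoti I have updated my answer with a step-by-step process that makes the load work. If you still run into this problem, please be as specific as possible about the issue/error you are seeing.
  • If the JSON objects had exactly the same content but were on a single line, should the OP's load job work? I know that once it is in BQ you would have to UNNEST.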