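[Posted]: 2021-07-21 15:03:16
[Question]:
Hello, I'm currently trying to set up a daily data transfer from Google Cloud Storage to BigQuery tables. These tables only hold raw data (JSON files), which I later unnest with scheduled queries. Creating the tables manually works fine, but when I launch a test transfer I get this error: Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1 (even though I deleted the data in the table beforehand).
Here is an example of the JSON schema of one raw data table: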
{
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "relationships",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "createdBy",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "validationWorkflow",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "projects",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "expensesReport",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "agency",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "files",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "STRING",
"mode": "REPEATED"
}]
}
}, {
"name": "resource",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "validations",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}, {
"name": "orders",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "data",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}]
}
}]
}
}, {
"name": "attributes",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "plannedTimes",
"type": "STRING",
"mode": "REPEATED"
}, {
"name": "state",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "exceptionalTimes",
"type": "STRING",
"mode": "REPEATED"
}, {
"name": "closed",
"type": "BOOLEAN",
"mode": "NULLABLE"
}, {
"name": "informationComments",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "regularTimes",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "batch",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "title",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "workUnitType",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "name",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "activityType",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "reference",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "project",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "reference",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "row",
"type": "INTEGER",
"mode": "NULLABLE"
}, {
"name": "delivery",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "endDate",
"type": "DATE",
"mode": "NULLABLE"
}, {
"name": "startDate",
"type": "DATE",
"mode": "NULLABLE"
}, {
"name": "title",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "startDate",
"type": "DATE",
"mode": "NULLABLE"
}, {
"name": "duration",
"type": "FLOAT",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "updateDate",
"type": "TIMESTAMP",
"mode": "NULLABLE"
}, {
"name": "creationDate",
"type": "TIMESTAMP",
"mode": "NULLABLE"
}, {
"name": "absencesTimes",
"type": "RECORD",
"mode": "REPEATED",
"schema": {
"fields": [{
"name": "workUnitType",
"type": "RECORD",
"mode": "NULLABLE",
"schema": {
"fields": [{
"name": "name",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "activityType",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "reference",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "duration",
"type": "FLOAT",
"mode": "NULLABLE"
}, {
"name": "startDate",
"type": "DATE",
"mode": "NULLABLE"
}]
}
}, {
"name": "term",
"type": "STRING",
"mode": "NULLABLE"
}]
}
}, {
"name": "type",
"type": "STRING",
"mode": "NULLABLE"
}, {
"name": "id",
"type": "INTEGER",
"mode": "NULLABLE"
}]
}
}, {
"name": "date",
"type": "DATE",
"mode": "NULLABLE"
}]
}
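Since the error reports a single bad row without naming the field, it may help to list every field path in the schema with its type and mode and compare that list against a failing file by hand. The following is a minimal Python sketch of that idea. It assumes the nested "schema": {"fields": [...]} layout shown above, and flatten_schema is a hypothetical helper written for this post, not part of any BigQuery library.

```python
import json


def flatten_schema(fields, prefix=""):
    """Yield (dotted_path, type, mode) for each field of a nested schema.

    Assumes nested records keep their children under
    field["schema"]["fields"], as in the schema posted above.
    """
    for field in fields:
        path = prefix + field["name"]
        # Fall back to NULLABLE when a field carries no explicit mode.
        yield path, field["type"], field.get("mode", "NULLABLE")
        nested = field.get("schema", {}).get("fields")
        if nested:
            yield from flatten_schema(nested, prefix=path + ".")


# Tiny illustrative schema in the same layout (not the real table).
sample_schema = json.loads("""
{
  "fields": [
    {"name": "data", "type": "RECORD", "mode": "REPEATED",
     "schema": {"fields": [
       {"name": "id", "type": "INTEGER", "mode": "NULLABLE"}
     ]}},
    {"name": "date", "type": "DATE", "mode": "NULLABLE"}
  ]
}
""")

for path, field_type, mode in flatten_schema(sample_schema["fields"]):
    print(path, field_type, mode)
```

Paths marked REPEATED expect a JSON array in the file, and paths typed INTEGER, DATE or TIMESTAMP are the usual places to look for a value in an unexpected form.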
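I know BQ works better with newline-delimited JSON, but this table has to hold the raw data, even if it ends up with only 1 row. The strangest part is that the transfer works for some files with a similar schema.
What should I do to make these transfers work?
Thanks for your help.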
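On the newline-delimited point: BigQuery JSON loads expect each row as one complete JSON object on a single line, so a pretty-printed file that spans several lines is a common cause of this kind of error. Whether that applies here is an assumption, since the question does not show a failing file. A minimal Python sketch of rewriting a document so that each row sits on one line follows; to_ndjson is a hypothetical helper, not a BigQuery or Data Transfer Service API.

```python
import json


def to_ndjson(text):
    """Return the JSON document in `text` as newline-delimited JSON.

    A top-level object becomes a single line (a single row).
    A top-level array becomes one line per element.
    """
    document = json.loads(text)
    rows = document if isinstance(document, list) else [document]
    # Compact separators keep each row on exactly one line.
    return "".join(
        json.dumps(row, separators=(",", ":"), ensure_ascii=False) + "\n"
        for row in rows
    )


# Illustrative input only: a pretty-printed object spanning several lines.
pretty = """{
  "data": [
    {"type": "timesreport", "id": 1}
  ],
  "date": "2021-07-21"
}"""

print(to_ndjson(pretty), end="")
```

With a top-level object such as the one the schema describes (a repeated data record plus a date), the result is still a single row, so the raw structure is preserved and only the line breaks change.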
[Comments]:
-
So you are trying to use the Data Transfer Service to load a JSON file from GCS into a BigQuery table that has the specific schema you shared, right?
-
Yes, as a daily job.
Tags: json google-bigquery google-cloud-storage transfer