【Question title】: Json schema file will not execute in BigQuery Python API
【Posted】: 2021-05-09 14:05:22
【Question】:

I'm running into a problem with the BigQuery Python API. Here is the stack trace when I run my script:

Traceback (most recent call last):
  File "createTable.py", line 17, in <module>
    open_schema()
  File "createTable.py", line 12, in open_schema
    table = bigquery.Table(table_id, schema=schema)
    ...
    "Schema items must either be fields or compatible "
ValueError: Schema items must either be fields or compatible mapping representations.

The script is simple: it opens a schema file and creates a table:

   from google.cloud import bigquery
   # Construct a BigQuery client object.
   client = bigquery.Client()
   table_id = "project-py-290522:bq_dts.bq-test"
       
   def open_schema():
       with open("hcl-schema.json","r", encoding = "utf-8") as fName:
           schema = fName.readlines()
       
           table = bigquery.Table(table_id, schema=schema)
           print(repr(table))
           client.create_table(table)  # Make an API request.
       
   if __name__ == "__main__":
       open_schema()
       
   print("Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)) 

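When I use the same schema in the console and the CLI, the table is created exactly as expected. Why do the console and the CLI create the table while the API chokes on it? I have searched and searched without finding an answer. Can anyone help?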

Here is the schema stored in the hcl-schema.json file. I shortened the list of attributes for brevity, but it is otherwise unchanged:

    [  
      {
        "name":"user_id",
        "type":"STRING",
        "mode":"NULLABLE"
      },
      {
        "name":"msg_version",
        "type":"STRING",
        "mode":"REQUIRED"           
      },
      {
        "name":"APIStreamData",
        "type":"RECORD",
        "mode":"REQUIRED",
        "fields":
        [
          {
            "name":"msg_version",
            "type":"STRING",
            "mode":"REQUIRED"
          },
          {
            "name":"streams",
            "type":"RECORD",
            "mode":"REPEATED",
            "fields":
            [
                {
                  "name":"length",
                  "type":"STRING",
                  "mode":"REQUIRED"
                },                      
                {
                  "name":"cached",
                  "type":"STRING",
                  "mode":"NULLABLE"
                },  
              {
                "name":"track",
                "type":"RECORD",
                "mode":"REQUIRED",
                "fields":
                [
                  {
                    "name":"msg_version",
                    "type":"STRING",
                    "mode":"REQUIRED"
                  },
                  {
                    "name":"track_id",
                    "type":"STRING",
                    "mode":"REQUIRED"
                  }
                ]
              }    
    
            ]
          }
        ]
      }
    ]

Thanks

Dazed and confused

【Comments】:

  • It may be a file encoding problem... try printing the schema variable just before table = bigquery.Table(table_id, schema=schema)

Tags: python json google-bigquery


【Answer 1】:

You can pass the dict representation of the JSON file instead of the strings used in the original question:

import json

# json.load returns a list of dicts, which bigquery.Table accepts as a schema.
with open("schema.json") as json_file:
    schema_dict = json.load(json_file)

table = bigquery.Table(table_id, schema=schema_dict)
table = client.create_table(table)
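
The difference between the two approaches comes down to what each call returns. The sketch below is only an illustration and does not touch BigQuery: it writes a one-field schema to a temporary file (the sample field and the temp file are assumptions made for the demo) and reads it back both ways. `readlines()` yields raw text lines, which are neither fields nor mappings, while `json.load()` yields dicts.

```python
import json
import os
import tempfile

# Hypothetical one-field schema, used only for this demonstration.
sample = [{"name": "user_id", "type": "STRING", "mode": "NULLABLE"}]

# Write the schema to a temporary JSON file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    json.dump(sample, tmp, indent=2)
    path = tmp.name

try:
    # What the question's script does: a list of raw text lines.
    with open(path, encoding="utf-8") as f:
        as_lines = f.readlines()
    # What this answer does: a list of dicts parsed from the JSON.
    with open(path, encoding="utf-8") as f:
        as_dicts = json.load(f)
finally:
    os.remove(path)

print(type(as_lines[0]).__name__, repr(as_lines[0]))
print(type(as_dicts[0]).__name__, as_dicts[0])
```

Each element of `as_lines` is a `str` fragment of the file, which matches the "Schema items must either be fields or compatible mapping representations" error in the question; each element of `as_dicts` is a mapping with `name`, `type`, and `mode` keys.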

【Comments】:

    【Answer 2】:

    The documentation I referred to shows the schema being specified in Python as follows: https://cloud.google.com/bigquery/docs/tables#creating_an_empty_table_with_a_schema_definition

    schema = [bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),bigquery.SchemaField("age", "INTEGER", mode="REQUIRED")]

    I tried the following JSON and code, and it worked fine. The JSON you provided has a RECORD inside a RECORD, so the nested fields need to be handled accordingly.

    from google.cloud import bigquery
    import json
    # Construct a BigQuery client object.
    client = bigquery.Client()
    table_id = "my-project.mock_dataset.bq-test"
    
    def open_schema():
        bigquerySchema = []
        with open('test.json') as f:
            bigqueryColumns = json.load(f)
        print(bigqueryColumns)
        for col in bigqueryColumns:
            if col['type'] != 'RECORD':
                bigquerySchema.append(bigquery.SchemaField(col['name'], col['type'], mode=col['mode']))
            else:
                # Start a fresh list for each RECORD so nested fields do not leak between columns.
                bigqueryfieldSchema = []
                for colfield in col['fields']:
                    bigqueryfieldSchema.append(bigquery.SchemaField(colfield['name'], colfield['type'], mode=colfield['mode']))
                # Pass nested fields by keyword, which is safer than relying on positional order.
                bigquerySchema.append(bigquery.SchemaField(col['name'], col['type'], mode=col['mode'], fields=bigqueryfieldSchema))
        print(bigquerySchema)
    
        table = bigquery.Table(table_id, schema=bigquerySchema)
        print(repr(table))
        client.create_table(table)  # Make an API request.
        print("Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)) 
    
    if __name__ == "__main__":
        open_schema()
    
     [  
          {
            "name":"user_id",
            "type":"STRING",
            "mode":"NULLABLE"
          },
          {
            "name":"msg_version",
            "type":"STRING",
            "mode":"REQUIRED"           
          },
          {
            "name":"APIStreamData",
            "type":"RECORD",
            "mode":"REQUIRED",
            "fields":
            [
              {
                "name":"msg_version",
                "type":"STRING",
                "mode":"REQUIRED"
              },
              {
                 "name":"track_id",
                 "type":"STRING",
                 "mode":"REQUIRED"
              }
           ]
          }
     ]
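
The loop in this answer handles a single level of nesting, while the schema in the question nests RECORDs three levels deep. One possible way to cover arbitrary depth is a recursive builder, sketched here under some assumptions: `build_schema`, `field_factory`, and the `Field` namedtuple are hypothetical names introduced for illustration, and `Field` is only a stand-in so the sketch runs without the client library.

```python
from collections import namedtuple

# Stand-in for bigquery.SchemaField (an assumption for this sketch),
# so the example runs without google-cloud-bigquery installed.
Field = namedtuple("Field", ["name", "field_type", "mode", "fields"])


def build_schema(columns, field_factory=Field):
    """Recursively convert a list of JSON column dicts into field objects."""
    schema = []
    for col in columns:
        # Recurse into nested columns; non-RECORD columns have no "fields" key.
        nested = build_schema(col.get("fields", []), field_factory)
        schema.append(
            field_factory(
                name=col["name"],
                field_type=col["type"],
                mode=col.get("mode", "NULLABLE"),
                fields=tuple(nested),
            )
        )
    return schema


# Trimmed version of the question's schema, nested three levels deep.
columns = [
    {"name": "user_id", "type": "STRING", "mode": "NULLABLE"},
    {"name": "APIStreamData", "type": "RECORD", "mode": "REQUIRED", "fields": [
        {"name": "streams", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "track", "type": "RECORD", "mode": "REQUIRED", "fields": [
                {"name": "track_id", "type": "STRING", "mode": "REQUIRED"},
            ]},
        ]},
    ]},
]

schema = build_schema(columns)
print(schema[1].fields[0].fields[0].fields[0].name)
```

To build real schema objects, the intent is to call `build_schema(columns, field_factory=bigquery.SchemaField)`. This assumes `SchemaField` accepts the keyword names `name`, `field_type`, `mode`, and `fields`, which is worth checking against the installed library version.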
    

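    【Comments】:

    • Thank you very much! This is a huge help. Now I need to turn it into a recursive dive into the JSON. I don't know how many nested records I will run into.
    • Glad it worked for you. Depending on your nesting depth, the script will have to be changed accordingly.
    • Well, on the surface it seems to be working. After a few failures, though, I found that I also have to take the code page into account. But...
    • I think it helped solve your problem: rather than passing the JSON as the schema, we need to pass it as SchemaField objects when using the Python API? If so, please consider accepting and upvoting. stackoverflow.com/help/someone-answers
    • I just ran into this problem, and a simple dict representation of the JSON file seems to work for me. See my other answer here.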