【Question title】: Read Avro file with Python to create a SQL table
【Posted】: 2018-08-10 13:21:51
【Question】:

I am trying to create a SQL table from an Avro file that contains my table structure:

{
  "type" : "record",
  "name" : "warranty",
  "doc" : "Schema generated by Kite",
  "fields" : [ {
    "name" : "id",
    "type" : "long",
    "doc" : "Type inferred from '1'"
  }, {
    "name" : "train_id",
    "type" : "long",
    "doc" : "Type inferred from '21691'"
  }, {
    "name" : "siemens_nr",
    "type" : "string",
    "doc" : "Type inferred from 'Loco-001'"
  }, {
    "name" : "uic_nr",
    "type" : "long",
    "doc" : "Type inferred from '193901'"
  }, {
    "name" : "Configuration",
    "type" : "string",
    "doc" : "Type inferred from 'ZP28'"
  }, {
    "name" : "Warranty_Status",
    "type" : "string",
    "doc" : "Type inferred from 'Out_of_Warranty'"
  }, {
    "name" : "Warranty_Data_Type",
    "type" : "string",
    "doc" : "Type inferred from 'Real_based_on_preliminary_acceptance_date'"
  }, {
    "name" : "of_progression",
    "type" : "long",
    "doc" : "Type inferred from '100'"
  }, {
    "name" : "Delivery_Date",
    "type" : "string",
    "doc" : "Type inferred from '18/12/2009'"
  }, {
    "name" : "Warranty_on_Delivery_Date",
    "type" : "string",
    "doc" : "Type inferred from '18/12/2013'"
  }, {
    "name" : "Customer_Status",
    "type" : "string",
    "doc" : "Type inferred from 'homologation'"
  }, {
    "name" : "Commissioning_Date",
    "type" : "string",
    "doc" : "Type inferred from '6/10/2010'"
  }, {
    "name" : "Preliminary_acceptance_date",
    "type" : "string",
    "doc" : "Type inferred from '6/01/2011'"
  }, {
    "name" : "Warranty_Start_Date",
    "type" : "string",
    "doc" : "Type inferred from '6/01/2011'"
  }, {
    "name" : "Warranty_End_Date",
    "type" : "string",
    "doc" : "Type inferred from '6/01/2013'"
  }, {
    "name" : "Effective_End_Warranty_Date",
    "type" : [ "null", "string" ],
    "doc" : "Type inferred from 'null'",
    "default" : null
  }, {
    "name" : "Level_2_in_function",
    "type" : "string",
    "doc" : "Type inferred from '17/07/2015'"
  }, {
    "name" : "Baseline",
    "type" : "string",
    "doc" : "Type inferred from '2.10.23.4'"
  }, {
    "name" : "TC_report",
    "type" : "string",
    "doc" : "Type inferred from 'A480140'"
  }, {
    "name" : "Last_version_Date",
    "type" : "string",
    "doc" : "Type inferred from 'A-23/09/2015'"
  } ]
}

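To do this, I am using (if you have a simpler suggestion, that would be great)

So using Python I would get a result like this:

{'name':'id','type':'long','doc':'blablabla'}

My question is: how can I create a SQL table in Python from this result?

Thanks for your help.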

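【Comments】:

  • You need to build the SQL command as a string from your structure description, then connect to the database and execute the statement. Which part do you need help with?
  • But I need to create this SQL command from the generated Avro schema.

Tags: python sql avro apache-nifi create-table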


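【Solution 1】:

With the json module you can parse your string into a dictionary, whose "fields" entry is an array of field definitions. Iterate over that array to generate the SQL statement.

Note: you will need some mechanism to map Avro field types to SQL field types, especially if you have types like "type" : [ "null", "string" ].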
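As a rough illustration of the mapping mentioned in the note, here is a minimal sketch. The AVRO_TO_SQL table and the avro_type_to_sql helper are hypothetical names introduced for this example, and the SQL types on the right-hand side are assumptions that you would adjust for your database. It only handles primitive types and unions of "null" with one primitive type.

```python
# Hypothetical mapping from Avro primitive types to SQL column types;
# the SQL side is an assumption, adjust it for your database dialect.
AVRO_TO_SQL = {
    "boolean": "BOOLEAN",
    "int": "INTEGER",
    "long": "BIGINT",
    "float": "REAL",
    "double": "DOUBLE PRECISION",
    "string": "VARCHAR(255)",
    "bytes": "BLOB",
}


def avro_type_to_sql(avro_type):
    """Return a SQL column type (with nullability) for a simple Avro field type."""
    nullable = False
    if isinstance(avro_type, list):
        # A union such as ["null", "string"] means the column may be NULL.
        non_null = [t for t in avro_type if t != "null"]
        nullable = len(non_null) < len(avro_type)
        if len(non_null) != 1:
            raise ValueError("unsupported union: %r" % (avro_type,))
        avro_type = non_null[0]
    # Complex types (records, arrays, maps) are not handled in this sketch.
    if not isinstance(avro_type, str) or avro_type not in AVRO_TO_SQL:
        raise ValueError("unsupported Avro type: %r" % (avro_type,))
    return AVRO_TO_SQL[avro_type] + ("" if nullable else " NOT NULL")


print(avro_type_to_sql("long"))
print(avro_type_to_sql(["null", "string"]))
```

With a helper like this, a union field such as Effective_End_Warranty_Date becomes a nullable column, while plain fields become NOT NULL.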

Here is a working example of code that builds a SQL CREATE TABLE statement from your string:

import json

schema_str = """{
  "type" : "record",
  "name" : "warranty",
  "doc" : "Schema generated by Kite",
  "fields" : [ {
    "name" : "id",
    "type" : "long",
    "doc" : "Type inferred from '1'"
  }, {
    "name" : "train_id",
    "type" : "long",
    "doc" : "Type inferred from '21691'"
  }, {
    "name" : "siemens_nr",
    "type" : "string",
    "doc" : "Type inferred from 'Loco-001'"
  } ]
}"""

schema = json.loads(schema_str)
fields = schema['fields']

# Build one "name type" column definition per Avro field.
# The Avro type names are emitted as-is and still need mapping to SQL types.
columns = [field['name'] + ' ' + field['type'] for field in fields]

# Joining with commas avoids having to strip a trailing comma afterwards.
sql_string = 'CREATE TABLE ' + schema['name'] + ' (\n' + ',\n'.join(columns) + '\n)'

print(sql_string)
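To actually create the table, the generated statement still has to be executed against a database. The sketch below assumes SQLite through Python's built-in sqlite3 module and an in-memory database, purely for illustration; the sqlite_types mapping is an assumption covering only the two Avro types used here, and the shortened schema omits the "doc" entries.

```python
import json
import sqlite3

schema = json.loads("""{
  "type": "record",
  "name": "warranty",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "train_id", "type": "long"},
    {"name": "siemens_nr", "type": "string"}
  ]
}""")

# Assumed SQLite column types for the two Avro types used in this schema.
sqlite_types = {"long": "INTEGER", "string": "TEXT"}

# One quoted column definition per Avro field.
columns = ",\n".join(
    '  "%s" %s' % (field["name"], sqlite_types[field["type"]])
    for field in schema["fields"]
)
sql_string = 'CREATE TABLE "%s" (\n%s\n)' % (schema["name"], columns)

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(sql_string)

# PRAGMA table_info returns one row per column; index 1 is the column name.
created_columns = [row[1] for row in conn.execute("PRAGMA table_info(warranty)")]
print(created_columns)
conn.close()
```

Table and column names cannot be passed as query parameters, so they are interpolated into the string here; only do this with a schema you trust.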

【Comments】:
