【发布时间】:2020-07-11 12:08:27
【问题描述】:
Databricks 的新手。有一个我从中创建数据框的 SQL 数据库表。其中一列是 JSON 字符串。我需要将嵌套的 JSON 分解为多个列。已使用此 post 和此 post 将我带到现在的位置。
示例 JSON:
{
"Module": {
"PCBA Serial Number": "G7456789",
"Manufacturing Designator": "DISNEY",
"Firmware Version": "0.0.0",
"Hardware Revision": "46858",
"Manufacturing Date": "10/17/2018 4:04:25 PM",
"Test Result": "Fail",
"Test Start Time": "10/22/2018 6:14:14 AM",
"Test End Time": "10/22/2018 6:16:11 AM"
}
到目前为止的代码:
#define schema
schema = StructType(
[
StructField('Module',ArrayType(StructType(Seq
StructField('PCBA Serial Number',StringType,True),
StructField('Manufacturing Designator',StringType,True),
StructField('Firmware Version',StringType,True),
StructField('Hardware Revision',StringType,True),
StructField('Test Result',StringType,True),
StructField('Test Start Time',StringType,True),
StructField('Test End Time',StringType,True))), True) ,True),
StructField('Test Results',StringType(),True),
StructField('HVM Code Errors',StringType(),True)
]
#use from_json to explode json by applying it to column
df.withColumn("ActivityName", from_json("ActivityName", schema))\
.select(col('ActivityName'))\
.show()
错误:
SyntaxError: invalid syntax
File "<command-1632344621139040>", line 10
StructField('PCBA Serial Number',StringType,True),
^
SyntaxError: invalid syntax
【问题讨论】:
标签: json pyspark apache-spark-sql pyspark-sql azure-databricks