[Posted]: 2022-11-03 16:48:47
[Question]:
I am trying to transform the following schema:
|-- a: struct (nullable = true)
| |-- b: struct (nullable = true)
| | |-- one: double (nullable = true)
| | |-- two: array (nullable = true)
| | | |-- element: string (containsNull = true)
| | |-- three: string (nullable = true)
| | |-- four: boolean (nullable = true)
| |-- c: struct (nullable = true)
| | |-- one: double (nullable = true)
| | |-- two: array (nullable = true)
| | | |-- element: string (containsNull = true)
| | |-- three: string (nullable = true)
| | |-- four: boolean (nullable = true)
into this:
|-- a: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- struct_key: string (nullable = true)
| | |-- one: double (nullable = true)
| | |-- two: array (nullable = true)
| | | |-- element: string (containsNull = true)
| | |-- three: string (nullable = true)
| | |-- four: boolean (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- struct_key: string (nullable = true)
| | |-- one: double (nullable = true)
| | |-- two: array (nullable = true)
| | | |-- element: string (containsNull = true)
| | |-- three: string (nullable = true)
| | |-- four: boolean (nullable = true)
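Really, I just want to take each struct's key, turn it into a string, and add it as a field inside that struct. The dataset contains many structs like b and c, so I need some kind of wildcard to convert them all. I am using Spark 3.2.1.
The data comes from JSON, so it is read like this: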
df = spark.read.json(json_file)
[Comments]:
- selectExpr('array(a.*) as a') should work for your case
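
On its own, array(a.*) collects the nested structs into an array but does not carry their key names along. Below is a minimal sketch of one way to add them, not a definitive implementation. It assumes every struct under a has the same fields and types (array() requires this) and that the key names are plain identifiers needing no backtick quoting. The helper names build_array_expr and restructure are hypothetical, chosen for illustration only.

```python
def build_array_expr(parent, keys, key_field="struct_key"):
    """Build a Spark SQL expression that turns parent.<key> structs into an
    array of structs, each carrying its own key name as a string field.

    Hypothetical helper: assumes the keys are simple identifiers.
    """
    # One struct per key: the key as a literal string, then all of its fields.
    structs = [f"struct('{k}' as {key_field}, {parent}.{k}.*)" for k in keys]
    joined = ", ".join(structs)
    return f"array({joined}) as {parent}"


def restructure(df, parent="a"):
    """Apply the expression to a DataFrame whose `parent` column is a struct of structs."""
    # The field names of the parent struct act as the "wildcard" over b, c, ...
    keys = df.schema[parent].dataType.fieldNames()
    return df.selectExpr(build_array_expr(parent, keys))


# Usage (requires a running SparkSession, so it is left commented out):
# df = spark.read.json(json_file)
# restructure(df).printSchema()
```

Reading the keys from df.schema avoids hard-coding b and c. Note that selectExpr here returns only the rebuilt a column, so any other columns would have to be added to the select list.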
Tags: apache-spark pyspark