【Posted】:2020-04-23 17:45:12
【Question】:
I have a PySpark DataFrame with this schema:
|-- doc_id: string (nullable = true)
|-- msp_contracts: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- _VALUE: string (nullable = true)
|    |    |-- _el1: string (nullable = true)
|    |    |-- _el2: long (nullable = true)
|    |    |-- _el3: string (nullable = true)
|    |    |-- _el4: string (nullable = true)
|    |    |-- _el5: string (nullable = true)
How can I turn it into a DataFrame with this schema:
|-- doc_id: string (nullable = true)
|-- _el1: string (nullable = true)
|-- _el3: string (nullable = true)
|-- _el4: string (nullable = true)
|-- _el5: string (nullable = true)
I tried a select:
explode('msp_contracts').select(
col(u'msp_contracts.element._el1'),
col(u'msp_contracts.element._el2')
)
But I get this error:
'Column' object is not callable
【Comments】:
Try:
df.selectExpr("doc_id", "inline_outer(msp_contracts)").drop("_VALUE", "_el2").show()