【Posted】: 2021-07-09 00:44:52
【Question】:
I have a flat file that mixes a regular column with a JSON column, separated by `|`:
2020-08-05 00:00:04,489|{"Colour":"Blue", "Reason":"Sky","number":"1"}
2020-10-05 00:00:04,489|{"Colour":"Yellow", "Reason":"Flower","number":"2"}
I want to flatten it with PySpark into:
|Timestamp|Colour|Reason|
|--------|--------|--------|
|2020-08-05 00:00:04,489|Blue|Sky|
|2020-10-05 00:00:04,489|Yellow|Flower|
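So far I have only figured out how to convert the JSON part using spark.read.json and a map transformation. How do I combine that with regular columns such as the timestamp?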
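One possible approach, as a minimal sketch rather than a definitive answer: read the file as plain text, split each line on the first `|`, parse the remainder as JSON, and build the DataFrame from the resulting tuples. The helper names `parse_line` and `flatten_file` are hypothetical, and the sketch assumes every line has exactly the `timestamp|{json}` shape shown above.

```python
import json


def parse_line(line):
    """Turn one 'timestamp|{json}' line into a (Timestamp, Colour, Reason) tuple."""
    # Split on the first '|' only, so a '|' inside the JSON text is left alone.
    timestamp, payload = line.split("|", 1)
    record = json.loads(payload)
    # .get() returns None (a null in Spark) if a key is missing from the JSON.
    return (timestamp, record.get("Colour"), record.get("Reason"))


def flatten_file(spark, path):
    """Hypothetical helper: build a flat DataFrame from the mixed file at `path`.

    `spark` is an existing SparkSession; `path` points at the flat file.
    """
    rows = spark.sparkContext.textFile(path).map(parse_line)
    # Column types are inferred from the data; all three come out as strings here.
    return spark.createDataFrame(rows, ["Timestamp", "Colour", "Reason"])
```

With an existing SparkSession named `spark`, calling `flatten_file(spark, "data.txt").show(truncate=False)` should display the three columns (the file name is a placeholder). Staying in the DataFrame API is also possible: read the file with `spark.read.text`, then use `split` and `from_json` from `pyspark.sql.functions`. Note that `split` takes a regular expression, so the delimiter has to be written as `\\|`.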
【Discussion】:
Tags: json apache-spark pyspark apache-spark-sql