【Posted】: 2020-12-06 06:42:02
【Question】:
I have a DataFrame like this:
root
|-- runKeyId: string (nullable = true)
|-- entities: string (nullable = true)
+--------+--------------------------------------------------------------------------------------------+
|runKeyId|entities |
+--------+--------------------------------------------------------------------------------------------+
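|1       |{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339},{"Partition":{"Name":"DDD"},"id":339}|
+--------+--------------------------------------------------------------------------------------------+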
I want to explode it in Scala so that each top-level JSON object ends up in its own row:
+--------+------------------------------------------------------+
|runKeyId|entities                                              |
+--------+------------------------------------------------------+
|1       |{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339}|
|2       |{"Partition":{"Name":"DDD"},"id":339}                 |
+--------+------------------------------------------------------+
【Comments】:
-
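How are you reading the file? It looks like JSONL format, in which case you can simply read it with
spark.read.json("json_path"), which automatically splits the JSON into rows.
-
In my case the input arrives as a string, not as JSON.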
-
How are you reading the data from the input JSONs?
-
val parseDF = decompressDataDF.select($"_1.entities")
-
I have provided an answer to a similar question here. Please take a look: stackoverflow.com/a/63375812/4758823
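-
Since the entities column holds several top-level JSON objects separated by commas, splitting on every "},{" would also cut inside the Partition array. One possible approach is to track nesting depth and cut only when the depth returns to zero. The sketch below is plain Scala with no Spark dependency; the names SplitJsonObjects and splitTopLevelObjects are hypothetical and do not come from the original post, and the input is assumed to be well-formed JSON.

```scala
// Hypothetical helper (not from the original post): splits a string holding
// several comma-separated top-level JSON objects into one string per object.
object SplitJsonObjects {
  def splitTopLevelObjects(s: String): Seq[String] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[String]
    var depth = 0          // current nesting depth of {} and []
    var inString = false   // true while inside a JSON string literal
    var escaped = false    // true right after a backslash inside a string
    var start = -1         // index where the current top-level value began
    for (i <- s.indices) {
      val c = s.charAt(i)
      if (inString) {
        // Braces and brackets inside string literals must not affect depth.
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"' => inString = true
        case '{' | '[' =>
          if (depth == 0) start = i
          depth += 1
        case '}' | ']' =>
          depth -= 1
          if (depth == 0 && start >= 0) {
            // Back at the top level: one complete object has been read.
            out += s.substring(start, i + 1)
            start = -1
          }
        case _ => // separators between top-level objects are skipped
      }
    }
    out.toSeq
  }

  def main(args: Array[String]): Unit = {
    // Sample value copied from the entities column in the question.
    val entities =
      """{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339},{"Partition":{"Name":"DDD"},"id":339}"""
    splitTopLevelObjects(entities).foreach(println)
  }
}
```

In Spark, a function like this could presumably be wrapped with udf and combined with explode (both in org.apache.spark.sql.functions) to produce one row per object. Renumbering runKeyId as 1, 2 in the expected output would be a separate step that this sketch does not cover.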
Tags: arrays json scala apache-spark