【Posted】:2021-12-28 22:40:03
【Problem description】:
val someDF = Seq(
(8, """{"details":{"decision":"ACCEPT","source":"Rules"}"""),
(64, """{"details":{"decision":"ACCEPT","source":"Rules"}""")
).toDF("number", "word")
someDF.show(false):
+------+---------------------------------------------------------------+
|number|word |
+------+---------------------------------------------------------------+
|8 |{"details":{"decision":"ACCEPT","source":"Rules"} |
|64 |{"details":{"decision":"ACCEPT","source":"Rules"} |
+------+---------------------------------------------------------------+
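Problem statement: I want to merge all columns into a single column, with the JSON in "word" kept as a nested JSON object in the output column, not as a string with escaped quotes, which is what I get below.
What I tried: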
someDF.toJSON.toDF.show(false)
// this escaped the quotes, which I don't want
+------------------------------------------------------------------------------------------------+
|value |
+------------------------------------------------------------------------------------------------+
|{"number":8,"word":"{\"details\":{\"decision\":\"ACCEPT\",\"source\":\"Rules\"}"} |
|{"number":64,"word":"{\"details\":{\"decision\":\"ACCEPT\",\"source\":\"Rules\"}"} |
+------------------------------------------------------------------------------------------------+
someDF.select(to_json(struct(col("*"))).alias("value")) has the same problem.
What I want:
+------------------------------------------------------------------------------------------------+
|value |
+------------------------------------------------------------------------------------------------+
|{"number":8,"word":{"details":{"decision":"ACCEPT","source":"Rules"}}} |
|{"number":64,"word":{"details":{"decision":"ACCEPT","source":"Rules"}}} |
+------------------------------------------------------------------------------------------------+
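Is there a way to do this?
Update: Although I use a simple DataFrame here, in reality I have hundreds of columns, so a manually defined schema does not work for me.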
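For reference, one direction that avoids a hand-written schema is to let Spark infer the schema of the JSON strings, parse the column with from_json, and only then call to_json on the whole row. The sketch below is not a confirmed answer. It assumes the strings in "word" are valid JSON (the sample above is missing a closing brace, so one is added here), that the JSON columns are known by name, and that one extra pass over the data for schema inference is acceptable. The names wordSchema and result are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, struct, to_json}

// Local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-json-sketch")
  .getOrCreate()
import spark.implicits._

// Same data as in the question, with the closing brace added (assumption).
val someDF = Seq(
  (8, """{"details":{"decision":"ACCEPT","source":"Rules"}}"""),
  (64, """{"details":{"decision":"ACCEPT","source":"Rules"}}""")
).toDF("number", "word")

// Infer the schema from the JSON strings instead of defining it by hand.
// This reads the "word" column once to work out the structure.
val wordSchema = spark.read.json(someDF.select("word").as[String]).schema

// Replace the string column with a parsed struct, then serialize the row.
// Because "word" is now a struct, to_json nests it instead of escaping it.
val result = someDF
  .withColumn("word", from_json(col("word"), wordSchema))
  .select(to_json(struct(col("*"))).alias("value"))

result.show(false)
```

With hundreds of columns, the same two steps could be applied in a loop over whichever columns hold JSON strings. Rows whose JSON does not fit the inferred schema would come back as null from from_json, so this only suits columns with reasonably uniform content.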
【Comments】:
Tags: scala apache-spark apache-spark-sql