【Question Title】: Scala - How to Convert String Column to Array of Json
【Posted】: 2020-04-01 07:31:21
【Question Description】:

Using the DataFrame below, I get a JSON array, but its data type is string. I'm looking for help converting this string into an array of JSON.

import org.apache.spark.sql.functions.lit

val rawDF = spark.sql("select 'Item_12345' as item_id").withColumn("s_tag", lit("S_12345")).withColumn("jsonString", lit("""[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]"""))
rawDF.show(false)

Input and output DataFrames:

Input DataFrame:

+----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+
|item_id   |s_tag  |jsonString                                                                                                                         |
+----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+
|Item_12345|S_12345|[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]      |
+----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+


Output DataFrame:

+----------+-------+-----------------------------------------+
|item_id   |s_tag  |jsonString                               |
+----------+-------+-----------------------------------------+
|Item_12345|S_12345|{"First":{"Info":"ABCD123","Res":"5.2"}} |
|Item_12345|S_12345|{"Second":{"Info":"ABCD123","Res":"5.2"}}|
|Item_12345|S_12345|{"Third":{"Info":"ABCD123","Res":"5.2"}} |
+----------+-------+-----------------------------------------+

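Problem statement:

jsonString holds string data that looks like a JSON array. I want to convert (cast) this column to an array of JSON so that I can split it into one row per element, as shown in the output DataFrame.

What I have tried so far: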

val jsonArray = udf((value: String) => new JSONArray(value)) // Or: how do I convert the string to an array of JSON?

val strToJsonArray = rawDF.withColumn("arrJson", jsonArray(rawDF("jsonString"))).drop("jsonString") // This is not working.

// If I can convert the column to an array, the code below explodes it into the expected output.
val splittedDF = strToJsonArray.withColumn("splittedJson", explode(strToJsonArray.col("arrJson"))).drop("arrJson")

How can I convert my string into an array of JSON values?
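The `explode` step described above behaves, row by row, like a `flatMap`: every array element becomes its own output row and the other columns are copied alongside it. The sketch below models that in plain Scala, without Spark, so the target shape is explicit; `ExplodeAnalogue`, `InRow`, `OutRow` and `explodeRows` are hypothetical names for illustration only, and the array column is assumed to be already split.

```scala
// Hypothetical plain-Scala model of Spark's explode; no Spark involved.
object ExplodeAnalogue {
  // One input row: the scalar columns plus the already-split array column.
  final case class InRow(itemId: String, sTag: String, elements: Seq[String])
  // One output row per array element.
  final case class OutRow(itemId: String, sTag: String, jsonString: String)

  // Like explode: each element becomes its own row and the other columns are repeated.
  // A row whose array is empty yields no output (explode_outer would keep it with a null).
  def explodeRows(rows: Seq[InRow]): Seq[OutRow] =
    rows.flatMap(r => r.elements.map(e => OutRow(r.itemId, r.sTag, e)))
}
```

Applied to the sample row, three array elements would give three rows that all carry `Item_12345` and `S_12345`, matching the output DataFrame.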

【Discussion】:

    Tags: arrays json scala apache-spark explode


    【Solution 1】:

    A UDF isn't needed in this case; Spark's built-in functions split, regexp_replace and explode are enough.

    Example:

    // sample data
    import org.apache.spark.sql.functions.lit
    val rawDF = spark.sql("""select string("Item_12345") as item_id""").withColumn("s_tag", lit("S_12345")).withColumn("jsonString", lit("""[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]"""))
    
    // First replace every "}," with "}}," (split consumes the "}," delimiter, so this keeps each element's closing brace), then remove "[" and "]", then split on "}," to get an array, and finally explode that array into rows.
    rawDF.
    selectExpr("item_id","s_tag","""explode(split(regexp_replace(regexp_replace(jsonString,'(\\\},)','}},'),'(\\\[|\\\])',''),"},")) as jsonString""").
    show(false)
    
    //+----------+-------+-----------------------------------------+
    //|item_id   |s_tag  |jsonString                               |
    //+----------+-------+-----------------------------------------+
    //|Item_12345|S_12345|{"First":{"Info":"ABCD123","Res":"5.2"}} |
    //|Item_12345|S_12345|{"Second":{"Info":"ABCD123","Res":"5.2"}}|
    //|Item_12345|S_12345|{"Third":{"Info":"ABCD123","Res":"5.2"}} |
    //+----------+-------+-----------------------------------------+
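The regex route works on this sample because "}," only occurs at element boundaries and no value contains "[" or "]". To make that assumption visible, here is a plain-Scala sketch (no Spark) that replays the same three string steps with Java regex, which Spark's `regexp_replace` and `split` also use, next to a brace-depth splitter that only splits on commas sitting directly inside the outer array and ignores anything inside string literals. `JsonArraySplitSketch`, `regexSplit` and `depthSplit` are hypothetical names, not a library API, and `depthSplit` assumes well-formed JSON whose outermost value is an array.

```scala
// Sketch only: plain Scala, no Spark. Names are hypothetical.
object JsonArraySplitSketch {
  // Same three steps as the SQL expression in the answer.
  def regexSplit(jsonString: String): Seq[String] =
    jsonString
      .replaceAll("\\},", "}},") // "}," -> "}},": compensates for the "}" that split consumes
      .replaceAll("\\[|\\]", "") // strip every "[" and "]", including any inside values
      .split("\\},")             // split on "},"
      .toSeq

  // Splits on commas directly inside the outer array, ignoring commas and
  // brackets inside nested objects/arrays or inside string literals.
  def depthSplit(jsonString: String): Seq[String] = {
    val s = jsonString.trim
    require(s.startsWith("[") && s.endsWith("]"), "expected a JSON array")
    val parts = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var depth = 0        // nesting depth of {} and [] below the outer array
    var inString = false // currently inside a "..." literal
    var escaped = false  // previous char was a backslash inside a string
    for (c <- s.substring(1, s.length - 1)) {
      if (inString) {
        // Inside a string only an unescaped closing quote changes state.
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
        current += c
      } else {
        c match {
          case '"'       => inString = true
          case '{' | '[' => depth += 1
          case '}' | ']' => depth -= 1
          case _         => ()
        }
        if (c == ',' && depth == 0) {
          parts += current.toString.trim // element boundary
          current.clear()
        } else current += c
      }
    }
    if (current.toString.trim.nonEmpty) parts += current.toString.trim
    parts.toList
  }
}
```

On the sample string both functions return the three elements shown in the output above. For an input such as `[{"a":"x},y"},{"b":1}]`, `regexSplit` also cuts inside the string value and yields three pieces, while `depthSplit` returns the two elements. If that robustness matters, `depthSplit` could be wrapped as `udf((s: String) => JsonArraySplitSketch.depthSplit(s))`: Spark maps a `Seq[String]` result to `array<string>`, so `explode` applies directly. The question's UDF instead returns an `org.json.JSONArray`, a type Spark cannot derive a schema for, which is a likely reason that attempt fails.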
    
