【Question Title】: UDF over array elements in PySpark that also adds a static element
【Posted】: 2021-08-16 09:45:09
【Question】:

I have a DataFrame that looks like this:

df.select("col1").show(1,False)
col1
-------------
[[2,1,0,1,,free],[3,1,0,1,4,free]]

Here is the same data shown another way:

df.select(to_json(struct("col1"))).show(1,False)

col1
-----------------
{"col1":[{ "0":"2","1":"1","2":"0","3":"1","5":"free"},{"0":"3","1":"1","2":"0","3":"1","4":"4","5":"free"}]}

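Now I want to produce the DataFrame below. In each array element, field "0" has to be moved into a new nested struct (newattrib), and a new static field "value":"ZZZ" has to be added.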

col1
--------------
{"col1":[{"1":"1","2":"0","3":"1","5":"free","value":"ZZZ","newattrib":{"0":"2"}},{"1":"1","2":"0","3":"1","4":"4","5":"free","value":"ZZZ","newattrib":{"0":"3"}}]}

Please suggest a way to achieve this.

【Comments】:

  • Are we supposed to work out what you are doing just by looking at the input and output? Please explain the logic...

Tags: apache-spark pyspark


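【Solution 1】:

Use the transform higher-order function (available in Spark SQL since 2.4) to rebuild each array element; no UDF is needed.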

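from pyspark.sql.functions import struct, to_json

# Rebuild each array element: keep fields 1-5, add the static field
# value = 'ZZZ', and move field 0 into a nested struct named newattrib.
df = (df
      .selectExpr("""transform(col1, v -> struct(v.`1` as `1`,
                                                 v.`2` as `2`,
                                                 v.`3` as `3`,
                                                 v.`4` as `4`,
                                                 v.`5` as `5`,
                                                 'ZZZ' as value,
                                                 struct(v.`0` as `0`) as newattrib)) as col1""")
      # to_json omits null fields by default, so "4" is absent from the first element
      .select(to_json(struct("col1")).alias('col1'))
      )
df.show(truncate=False)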

# +--------------------------------------------------------------------------------------------------------------------------------------------------+
# |col1                                                                                                                                              |
# +--------------------------------------------------------------------------------------------------------------------------------------------------+
# |{"col1":[{"1":1,"2":0,"3":1,"5":"free","value":"ZZZ","newattrib":{"0":2}},{"1":1,"2":0,"3":1,"4":4,"5":"free","value":"ZZZ","newattrib":{"0":3}}]}|
# +--------------------------------------------------------------------------------------------------------------------------------------------------+
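
To make the per-element mapping explicit, here is a minimal pure-Python sketch of what the transform lambda does to each array element, with no Spark involved. The names reshape_element and drop_nulls and the sample data are hypothetical and exist only for illustration; None-valued fields are dropped before serialising to mimic to_json, which omits null fields by default.

```python
import json


def reshape_element(v, static_value="ZZZ"):
    """Mirror of the transform lambda: keep fields 1-5, add a static
    'value' field, and nest field 0 inside 'newattrib'."""
    out = {key: v.get(key) for key in ("1", "2", "3", "4", "5")}
    out["value"] = static_value
    out["newattrib"] = {"0": v.get("0")}
    return out


def drop_nulls(obj):
    """Recursively remove None-valued keys, imitating to_json's default
    handling of null struct fields."""
    if isinstance(obj, dict):
        return {k: drop_nulls(x) for k, x in obj.items() if x is not None}
    if isinstance(obj, list):
        return [drop_nulls(x) for x in obj]
    return obj


# Hypothetical sample mirroring the single row from the question.
col1 = [
    {"0": "2", "1": "1", "2": "0", "3": "1", "4": None, "5": "free"},
    {"0": "3", "1": "1", "2": "0", "3": "1", "4": "4", "5": "free"},
]

result = drop_nulls({"col1": [reshape_element(v) for v in col1]})
print(json.dumps(result))
```

A function like reshape_element could in principle be wrapped in a Python UDF, but that would typically be slower than transform, because every row would have to be passed between the JVM and a Python worker.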
