【Posted】: 2021-05-27 22:10:36
【Question】:
I am trying to split a list on the delimiter ",", but some list elements themselves contain a ",". For example:
1|[this is first element, this is seconde element, this is (bad, element)]
I want to do this in a DataFrame, but the comma inside the third element breaks the logic.
Current output:
id |name |val
1 |Column0|this is first element
1 |Column2|this is seconde element
1 |Column3|this is (bad
1 |Column4|element)
Expected output:
id |name |val
1 |Column0|this is first element
1 |Column1|this is seconde element
1 |Column2|this is (bad, element)
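import pyspark.sql.functions as f

# Current attempt: splits on every comma, including the one inside the parentheses.
df = (
    df.select(
        "id",
        f.split("text", ",").alias("text"),
        f.posexplode_outer(f.split("text", ",")).alias("pos", "val"),
    )
    .drop("val")
    .select(
        "id",
        "text",
        f.concat(f.lit("Column"), f.col("pos").cast("string")).alias("name"),
        f.expr("text[pos]").alias("val"),
    )
)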
【Discussion】:
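One possible approach is to split only on commas that sit outside parentheses, using a negative lookahead. The sketch below shows the idea in plain Python with the sample row from the question. The names OUTSIDE_PARENS_COMMA and split_outside_parens are made up for illustration, and the pattern assumes parentheses are balanced and not nested.

```python
import re

# Hypothetical pattern name. Matches a comma (plus trailing whitespace) unless
# the text after it reaches a ")" before any "(", i.e. the comma is inside (...).
# Assumes balanced, non-nested parentheses.
OUTSIDE_PARENS_COMMA = r",\s*(?![^()]*\))"


def split_outside_parens(text):
    """Split a bracketed, comma-separated string, keeping commas inside (...)."""
    # Drop the surrounding square brackets before splitting.
    inner = text.strip().strip("[]")
    return [part.strip() for part in re.split(OUTSIDE_PARENS_COMMA, inner)]


row = "[this is first element, this is seconde element, this is (bad, element)]"
parts = split_outside_parens(row)

for pos, val in enumerate(parts):
    print(f"Column{pos}|{val}")
```

Since Spark's split takes a Java regular expression and Java supports the same lookahead syntax, the same pattern could probably be passed as the second argument of f.split in place of ",". This has not been run against the DataFrame in the question, and the square brackets would still need to be removed separately (for example with f.regexp_replace).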
Tags: python apache-spark pyspark