【Posted】: 2022-01-25 20:09:29
【Question】:
I am currently converting some date data in a DataFrame that looks like this:
+---------+--------------+
|first_col|sec_col       |
+---------+--------------+
|a        |28-04-2021    |
|a        |01-03-2017    |
|a        |"Feb 23, 2012"|
|a        |"May 01, 2019"|
+---------+--------------+
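I now want to convert the last two rows to a cleaner date format, for example 23-Feb-2012. I would like to do this with a regular expression, but the following code does not work: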
from pyspark.sql import functions as f
from pyspark.sql.functions import regexp_replace, regexp_extract

# (a lot of stuff happens here which is not important for the question, so I left it out)
input_df = input_df.withColumn("sec_col", input_df.sec_col.cast("String"))
    .withColumn("sec_col2",
        f.when(input_df.sec_col.rlike("\"\w{3} \d{2}, \d{4}\""),
            f.concat(regexp_extract("sec_col", "\"(\w{3}) (\d{2}), (\d{4})\"", 2), f.lit("-"),
                     regexp_extract("sec_col", "\"(\w{3}) (\d{2}), (\d{4})\"", 1), f.lit("-"),
                     regexp_extract("sec_col", "\"(\w{3}) (\d{2}), (\d{4})\"", 3))))
    .otherwise(f.col("sec_col"))
Can anyone help?
【Discussion】: