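[Posted]: 2019-12-18 20:59:10
[Question]:
I need to remove the single quotes from a string. The column name is Keywords. The column holds an array hidden inside a string, so I need to use a regex in a Spark DataFrame to remove the single quotes from the beginning and end of the string. The string looks like this: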
Keywords='["shade perennials"," shade loving perennials"," perennial plants"," perennials"," perennial flowers"," perennial plants for shade"," full shade perennials"]'
I tried the following:
remove_single_quote = udf(lambda x: x.replace(u"'",""))
cleaned_df = spark_df.withColumn('Keywords', remove_single_quote('Keywords'))
But the single quotes are still there. I also tried (u"\'","").
[Discussion]:
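One possible approach, sketched here in plain Python so the pattern can be checked without a cluster: anchor the regex to the start and end of the string so that only the wrapping quotes are removed, then parse what remains as JSON. The helper name `strip_edge_quotes` and the shortened sample value are illustrative assumptions, not part of the question. The sketch also assumes the wrapping characters are plain ASCII apostrophes; if `replace` leaves them in place, they may be a different character (for example a typographic quote), which this pattern would not match either.

```python
import json
import re

# Matches a single quote at the very start or the very end of the string,
# tolerating surrounding whitespace. Quotes inside the string are not touched.
EDGE_QUOTES = re.compile(r"^\s*'|'\s*$")


def strip_edge_quotes(value):
    """Remove a leading and/or trailing single quote; pass None through."""
    if value is None:
        return None
    return EDGE_QUOTES.sub("", value)


# Shortened, hypothetical sample in the shape shown in the question.
sample = '\'["shade perennials"," shade loving perennials"]\''
cleaned = strip_edge_quotes(sample)

# With the wrapping quotes gone, the value is valid JSON and parses to a list.
keywords = json.loads(cleaned)

# In Spark, the same pattern could be passed to the built-in column function
# instead of a Python UDF (assuming spark_df is the DataFrame from the question):
#   from pyspark.sql.functions import regexp_replace
#   cleaned_df = spark_df.withColumn(
#       "Keywords", regexp_replace("Keywords", r"^\s*'|'\s*$", ""))
```

Using `regexp_replace` rather than a UDF keeps the work inside Spark's own expression engine, and it handles null values without extra checks.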
Tags: regex apache-spark pyspark