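Posted: 2018-03-26 18:53:53
Question:
PySpark n00b... How do I replace a column with a substring of itself? I'm trying to remove a select number of characters from the start and end of a string.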
from pyspark.sql.functions import substring
import pandas as pd
pdf = pd.DataFrame({'COLUMN_NAME':['_string_','_another string_']})
# this is the result I'm looking for:
pdf['COLUMN_NAME_fix']=pdf['COLUMN_NAME'].str[1:-1]
df = sqlContext.createDataFrame(pdf)
# the following does not work: COLUMN_NAME_fix comes out blank
df.withColumn('COLUMN_NAME_fix', substring('COLUMN_NAME', 1, -1)).show()
This question is very close to, but slightly different from, Spark Dataframe column with last character of other column. There is also LEFT and RIGHT function in PySpark SQL.
Comments:
Tags: pyspark pyspark-sql