[Posted]: 2021-12-13 12:44:40
[Question]:
I have the following PySpark DataFrame:
import pandas as pd
a = ['480s','480s','499s','499s','650s','650s','702s','702s','736s','736s','736s','737s','737s']
b = ['North','West','East','North','East','North','North','West','North','South','West','North','West']
pdf = pd.DataFrame(dict(dcode=a, zone=b))
df = spark.createDataFrame(pdf)  # `spark` is an existing SparkSession
    dcode  zone
0   480s   North
1   480s   West
2   499s   East
3   499s   North
4   650s   East
5   650s   North
6   702s   North
7   702s   West
8   736s   North
9   736s   South
10  736s   West
11  737s   North
12  737s   West
I want my DataFrame to look like this:
    dcode  zone   output
0   480s   North  NW
1   480s   West   NW
2   499s   East
3   499s   North  NW
4   650s   East
5   650s   North  NW
6   702s   North
7   702s   West
8   736s   North
9   736s   South
10  736s   West
11  737s   North
12  737s   West
For this I'm using the following logic, but it doesn't give the desired result:
df_ = df.withColumn("output", F.when((F.col("Zone") == "North") | (F.col("Zone") == "West") & (F.col("dcode") != "702s") | (F.col("dcode") != "736s") | (F.col("dcode") != "737s"), "NW"))
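A likely cause, illustrated below in plain Python with ordinary booleans standing in for Spark columns (an illustration only, not Spark code): `&` binds more tightly than `|`, so the condition is grouped as `North | (West & not 702s) | not 736s | not 737s`. Because no dcode can equal both 736s and 737s, `(dcode != "736s") | (dcode != "737s")` is true for every row, so every row gets NW. The function names here are hypothetical.

```python
# Sample data from the question.
a = ['480s','480s','499s','499s','650s','650s','702s','702s','736s','736s','736s','737s','737s']
b = ['North','West','East','North','East','North','North','West','North','South','West','North','West']
rows = list(zip(a, b))

def original_condition(dcode, zone):
    # Same operators and parentheses as the question's expression;
    # on bools, & and | have the same precedence as on pyspark Columns.
    return (zone == "North") | (zone == "West") & (dcode != "702s") | (dcode != "736s") | (dcode != "737s")

def explicit_grouping(dcode, zone):
    # How Python actually groups it: the & is evaluated before the |s.
    return (zone == "North") | ((zone == "West") & (dcode != "702s")) | (dcode != "736s") | (dcode != "737s")

def intended_condition(dcode, zone):
    # Zone is North or West AND dcode is not one of the excluded codes.
    return zone in ("North", "West") and dcode not in ("702s", "736s", "737s")

flags_original = [original_condition(d, z) for d, z in rows]
flags_intended = [intended_condition(d, z) for d, z in rows]
print(all(flags_original))   # True: every row would be flagged
print(sum(flags_intended))   # 4
```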
I want NW in the output column only when zone is North or West and dcode is not one of 736s, 737s, 702s.
[Discussion]:
Tags: apache-spark pyspark