【Question Title】: Dataframe Won't Print
【Posted】: 2021-09-23 02:59:48
【Question】:
import pyspark.sql.functions as f

df_ssaGenderWithinTenPercent = df_ssaGender.select("name", "women", "men", "total", "gender", "gender_ratio", \
f.when((df_ssaGender.gender_ratio >.45) & (df_ssaGender.gender_ratio < .55) & (df_ssaGender.gender_ratio >= 10000)).orderBy("gender", "gender_ratio", ascending = False)
df_ssaGenderWithinTenPercent.show()

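So I previously created a dataframe called df_ssaGender and am selecting these columns from it. I need to get the rows where gender_ratio is between 45% and 55%. However, whenever I run this I keep getting a syntax error, and I'm pretty sure the code is correct. Any ideas?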


【Question Comments】:

    Tags: sql pyspark syntax-error


    【Solution 1】:

    Breaking your code down, I found 2 things you missed:

    df_ssaGenderWithinTenPercent = (df_ssaGender
      .select(
        "name",
        "women",
        "men",
        "total",
        "gender",
        "gender_ratio",
        f.when(
          (df_ssaGender.gender_ratio >.45) &
          (df_ssaGender.gender_ratio < .55) &
          (df_ssaGender.gender_ratio >= 10000) # you're also missing a return value here: f.when(condition, value)
        )
      ) # you were missing this closing parenthesis for select()
      .orderBy("gender", "gender_ratio", ascending = False)
    )
    df_ssaGenderWithinTenPercent.show()
    

    【Comments】:
