【Question Title】: How can I replicate SQL OUTER APPLY functionality in PySpark?
【Posted】: 2023-01-19 03:27:58
【Question Description】:

I want to replicate the behavior of SQL's OUTER APPLY in PySpark.

Here are my sample DataFrames:

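## Department table
# Assumes an active SparkSession named `spark` (as in a notebook or the pyspark shell).
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

data = [
    (1, 'Engineering'),
    (2, 'Administration'),
    (3, 'Sales'),
    (4, 'Marketing'),
    (5, 'Finance'),
]
schema = StructType([
    StructField('DepartmentID', IntegerType(), True),
    StructField('Name', StringType(), True),
])

Department = spark.createDataFrame(data=data, schema=schema)
Department.show()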

+------------+--------------+
|DepartmentID|          Name|
+------------+--------------+
|           1|   Engineering|
|           2|Administration|
|           3|         Sales|
|           4|     Marketing|
|           5|       Finance|
+------------+--------------+

## Employee table
data = [
    (1, 'Orlando', 'Gee', 1),
    (2, 'Keith', 'Harris', 2),
    (3, 'Donna', 'Carreras', 3),
    (4, 'Janet', 'Gates', 3),
]
schema = StructType([
    StructField('EmployeeID', IntegerType(), True),
    StructField('FirstName', StringType(), True),
    StructField('LastName', StringType(), True),
    StructField('DepartmentID', IntegerType(), True),
])
Employee = spark.createDataFrame(data=data, schema=schema)
Employee.show()
+----------+---------+--------+------------+
|EmployeeID|FirstName|LastName|DepartmentID|
+----------+---------+--------+------------+
|         1|  Orlando|     Gee|           1|
|         2|    Keith|  Harris|           2|
|         3|    Donna|Carreras|           3|
|         4|    Janet|   Gates|           3|
+----------+---------+--------+------------+

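I tried registering temp views and querying them with Spark SQL, as we normally do with temp tables, but I keep getting this error:

[PARSE_SYNTAX_ERROR] Syntax error at or near 'OUTER'(line 3, pos 2)

== SQL ==

  SELECT * FROM Department D
  OUTER APPLY
--^^^
    (
      SELECT * FROM Employee E
      WHERE E.DepartmentID = D.DepartmentID
    ) A

Any help is appreciated. This is the code that raises it: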

Employee.createOrReplaceTempView("Employee")
Department.createOrReplaceTempView("Department")

sql_query = """
  SELECT * FROM Department D 
  OUTER APPLY 
    ( 
      SELECT * FROM Employee E 
      WHERE E.DepartmentID = D.DepartmentID 
    ) A
"""

result_df = spark.sql(sql_query)

【Question Discussion】:

    Tags: apache-spark pyspark apache-spark-sql


    【Solution 1】:

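    OUTER APPLY is not part of Spark SQL syntax.

    However, for a query like this one, where the applied subquery only filters Employee on E.DepartmentID = D.DepartmentID, OUTER APPLY produces the same result as a LEFT OUTER JOIN, and LEFT OUTER JOIN is supported in Spark SQL.

    Using LEFT OUTER JOIN instead of OUTER APPLY looks like this in Spark SQL syntax: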

    sql_query = """
      SELECT * FROM Department D 
      LEFT OUTER JOIN Employee E ON E.DepartmentID = D.DepartmentID 
    """
    

    Using LEFT OUTER JOIN instead of OUTER APPLY looks like this in PySpark syntax:

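    # Build the join first, then show it; a chained call split across lines needs parentheses.
    result_df = Department.join(Employee, Employee.DepartmentID == Department.DepartmentID, "left_outer")
    result_df.show(truncate=False)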
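
To see why the two operators agree for this particular query, here is a minimal plain-Python sketch that models both on the question's sample rows. The helper names (`outer_apply`, `left_outer_join`, `employees_in`) are hypothetical and only illustrate the semantics; this is not how Spark evaluates a join.

```python
# Plain-Python model of OUTER APPLY vs LEFT OUTER JOIN on the sample data.
# Illustrative only: helper names are made up and no Spark is involved.

departments = [
    (1, "Engineering"),
    (2, "Administration"),
    (3, "Sales"),
    (4, "Marketing"),
    (5, "Finance"),
]
employees = [
    (1, "Orlando", "Gee", 1),
    (2, "Keith", "Harris", 2),
    (3, "Donna", "Carreras", 3),
    (4, "Janet", "Gates", 3),
]
EMPLOYEE_WIDTH = 4  # number of Employee columns, used for NULL padding


def outer_apply(left_rows, subquery):
    """For each left row, run the correlated subquery; pad with None if it is empty."""
    result = []
    for left in left_rows:
        matches = subquery(left)
        if matches:
            result.extend(left + right for right in matches)
        else:
            result.append(left + (None,) * EMPLOYEE_WIDTH)
    return result


def left_outer_join(left_rows, right_rows, on):
    """Keep every left row; pair it with each right row satisfying `on`, else pad with None."""
    result = []
    for left in left_rows:
        matched = False
        for right in right_rows:
            if on(left, right):
                result.append(left + right)
                matched = True
        if not matched:
            result.append(left + (None,) * EMPLOYEE_WIDTH)
    return result


def employees_in(dept):
    # The question's subquery:
    # SELECT * FROM Employee E WHERE E.DepartmentID = D.DepartmentID
    return [e for e in employees if e[3] == dept[0]]


applied = outer_apply(departments, employees_in)
joined = left_outer_join(departments, employees, on=lambda d, e: e[3] == d[0])

print(applied == joined)
for row in joined:
    print(row)
```

The equivalence holds here because the subquery is just an equality filter. If the applied subquery limited or ordered rows per department (for example TOP 1 per department in T-SQL), a plain LEFT OUTER JOIN would no longer match it, and the query would need a different rewrite.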
    

    【Discussion】:
