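【Posted】:2020-07-06 18:55:51
【Question】:
I want to produce the output below by running several SQL queries, as in the code that follows, but Spark does not support multiple SQL statements. Could you suggest a workaround? Any help would be much appreciated, thanks :)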
Expected output:
Col_name  Max_val  Min_value
Name      Null     Null
Age       15       5
height    100      8
Code:
from pyspark.sql import Row
from pyspark.sql.types import *
from pyspark.sql.functions import *
df = sc.parallelize([
    Row(name='Alice', age=5, height=80),
    Row(name='Kate', age=10, height=90),
    Row(name='Brain', age=15, height=100)]).toDF()
df.createOrReplaceTempView("Test")
df3 = spark.sql("select max(name) as name, max(age) as age, max(height) as height from Test")
df4 = df.selectExpr("stack(3, 'name', bigint(name), 'age', bigint(age), 'height', bigint(height)) as (col_name, max_data)")
df5 = spark.sql("select min(name) as name, min(age) as age, min(height) as height from Test")
df6 = df.selectExpr("stack(3, 'name', bigint(name), 'age', bigint(age), 'height', bigint(height)) as (col_name, min_data)")
df7 = df4.join(df6, ['col_name'], 'inner').groupBy("col_name").orderBy("col_name")
df7.show()
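One possible workaround, as a sketch rather than a confirmed answer: in the code above, `stack` is applied to the raw rows, so nothing is aggregated before the join, and `groupBy` returns a grouped object that has no `orderBy` method. A single pass can instead compute every max and min first and then unpivot the resulting one-row DataFrame with `stack`. The helper names `build_min_max_exprs` and `min_max_summary`, and the output column names `max_val` and `min_val`, are illustrative and not from the original post. The sketch assumes simple column names and that non-numeric columns such as `name` should show NULL, as in the expected output.

```python
def build_min_max_exprs(columns, numeric_cols):
    """Return (agg_exprs, stack_expr) as Spark SQL expression strings.

    Hypothetical helper: assumes simple column names (no quotes or backticks).
    """
    numeric = [c for c in columns if c in numeric_cols]
    if not numeric:
        raise ValueError("need at least one numeric column to aggregate")

    # One max and one min per numeric column, all computed in a single pass.
    agg_exprs = []
    for c in numeric:
        agg_exprs.append(f"cast(max(`{c}`) as bigint) as `max_{c}`")
        agg_exprs.append(f"cast(min(`{c}`) as bigint) as `min_{c}`")

    # stack() turns the single aggregated row into one row per column.
    # Non-numeric columns get NULL for both values, as in the expected output.
    stack_args = []
    for c in columns:
        if c in numeric_cols:
            stack_args.append(f"'{c}', `max_{c}`, `min_{c}`")
        else:
            stack_args.append(f"'{c}', cast(null as bigint), cast(null as bigint)")
    stack_expr = (
        f"stack({len(columns)}, {', '.join(stack_args)}) "
        "as (col_name, max_val, min_val)"
    )
    return agg_exprs, stack_expr


def min_max_summary(df, columns, numeric_cols):
    """Build a (col_name, max_val, min_val) DataFrame from a Spark DataFrame."""
    agg_exprs, stack_expr = build_min_max_exprs(columns, numeric_cols)
    # The first selectExpr holds only aggregates, so it yields exactly one row;
    # the second selectExpr unpivots that row.
    return df.selectExpr(*agg_exprs).selectExpr(stack_expr)
```

With the `df` from the question, `min_max_summary(df, ['name', 'age', 'height'], {'age', 'height'}).show()` should print one row per column. This avoids both the temporary view and the join, since each `selectExpr` call is a single statement.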
【Comments】:
Tags: python sql python-3.x apache-spark pyspark