muyue123

 

How to compute four quantiles (the 25th, 50th, 75th and 90th percentiles) per group:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # reuse the active session if one exists

# Sample data: id 1 has cnt 1..10, id 2 has cnt 1, 10, 100
df1 = spark.createDataFrame([(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),(1,8),(1,9),(1,10),(2,1),(2,10),(2,100)], ['id', 'cnt'])

# Approximate percentiles of cnt; only cnt_med_2 (0.5) is the median
cnt_med_1 = F.expr('percentile_approx(cnt, 0.25)')
cnt_med_2 = F.expr('percentile_approx(cnt, 0.5)')
cnt_med_3 = F.expr('percentile_approx(cnt, 0.75)')
cnt_med_4 = F.expr('percentile_approx(cnt, 0.90)')

df1.groupBy('id').agg(
    F.max('cnt').alias('max_cnt'),
    cnt_med_1.alias('cnt_med_1'), cnt_med_2.alias('cnt_med_2'),
    cnt_med_3.alias('cnt_med_3'), cnt_med_4.alias('cnt_med_4')).show()
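To sanity-check the grouped result on a dataset this small, it can help to compute exact percentiles by hand. The sketch below is plain Python, not Spark: `nearest_rank` is a hypothetical helper that uses the nearest-rank definition (it picks an actual element of the data, with no interpolation). This is an assumption made for illustration. `percentile_approx` is an approximation, so its output is not guaranteed to match these values, especially on large data.

```python
import math
from collections import defaultdict


def nearest_rank(values, p):
    """Exact nearest-rank percentile (illustrative helper, not a Spark API).

    Returns the smallest element such that at least a fraction p
    of the values are less than or equal to it.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


# Same sample rows as the DataFrame above: (id, cnt)
rows = [(1, c) for c in range(1, 11)] + [(2, 1), (2, 10), (2, 100)]

# Group cnt values by id, mirroring groupBy('id')
groups = defaultdict(list)
for id_, cnt in rows:
    groups[id_].append(cnt)

percentages = (0.25, 0.5, 0.75, 0.90)
result = {
    id_: [nearest_rank(cnts, p) for p in percentages]
    for id_, cnts in groups.items()
}
print(result)
```

With only three rows for id 2, the upper percentiles all collapse onto the largest value, which is worth keeping in mind when reading quantiles of very small groups.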

 
