【Posted】: 2020-02-05 03:13:13
【Question】:
I am trying to get the top 5 items for each district, ranked by rate_increase. I am trying to do this with spark.sql, as follows:
Input:
district item rate_increase(%)
Arba coil 500
Arba pen -85
Arba hat 50
Cebu oil -40
Cebu pen 1100
Top5item = spark.sql('select district, item , rate_increase, ROW_NUMBER() OVER (PARTITION BY district ORDER BY rate_increase DESC) AS RowNum from rateTable where rate_increase > 0')
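This works. How can I filter down to the top 5 items per district in the same statement? I tried the following; is there a better way to do this through spark.sql?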
Top5item = spark.sql('select district, item, rate_increase from (select district, item, rate_increase, ROW_NUMBER() OVER (PARTITION BY district ORDER BY rate_increase DESC) AS RowNum from rateTable where rate_increase > 0) ranked where RowNum <= 5 order by district')
Expected output:
district item rate_increase(%)
Arba coil 500
Arba hat 50
Cebu pen 1100
Thanks.
【Comments】:
- Actually, for some reason, DESC is not working. What would be a nested way to groupby(district) and get the top 5 items? Thanks.
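For the groupby(district) route asked about in the comment, here is a minimal pandas sketch, not a definitive answer. It assumes the data is small enough to hold in memory (for example after converting the Spark DataFrame to pandas) and uses the sample rows from the question. The helper name `top_n_per_district` is made up for illustration. The values are stored as strings on purpose: one possible reason for DESC appearing not to work is that rate_increase was loaded as a string column, in which case the sort is lexicographic rather than numeric. That is only a guess about the asker's data, so the sketch casts the column to numeric before sorting.

```python
import pandas as pd

# Sample rows copied from the question's input table.
df = pd.DataFrame(
    {
        "district": ["Arba", "Arba", "Arba", "Cebu", "Cebu"],
        "item": ["coil", "pen", "hat", "oil", "pen"],
        # Kept as strings on purpose, to mimic a column that was not parsed as numeric (assumption).
        "rate_increase": ["500", "-85", "50", "-40", "1100"],
    }
)


def top_n_per_district(frame, n=5):
    """Return up to n rows per district with the largest positive rate_increase."""
    out = frame.copy()
    # Cast to numeric so the descending sort compares numbers, not strings.
    out["rate_increase"] = pd.to_numeric(out["rate_increase"])
    # Keep only positive increases, as in the original WHERE clause.
    out = out[out["rate_increase"] > 0]
    # Sort by district, then by rate_increase from largest to smallest.
    out = out.sort_values(["district", "rate_increase"], ascending=[True, False])
    # groupby(...).head(n) keeps the first n rows of each group in the sorted order.
    return out.groupby("district").head(n).reset_index(drop=True)


top5 = top_n_per_district(df)
print(top5)
```

On the sample data this keeps coil and hat for Arba and pen for Cebu, which matches the expected output in the question. In Spark itself, the equivalent of the cast would be to make sure rate_increase has a numeric type before ranking.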
Tags: python-3.x pandas pyspark pyspark-sql