【Question title】: How to sort a struct array in a Spark DataFrame? [duplicate]
【Posted】: 2022-02-14 09:58:29
【Question description】:

I have the following code and output, taken from "Aggregating multiple columns with custom function in Spark".

import org.apache.spark.sql.functions.{collect_list, struct}
import sqlContext.implicits._

val df = Seq(
  ("john", "tomato", 1.99),
  ("john", "carrot", 0.45),
  ("bill", "apple", 0.99),
  ("john", "banana", 1.29),
  ("bill", "taco", 2.59)
).toDF("name", "food", "price")

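// Keep the aggregated DataFrame so that its schema (not the input's) is printed below.
val grouped = df.groupBy($"name")
  .agg(collect_list(struct($"food", $"price")).as("foods"))

grouped.show(false)

grouped.printSchema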

Output and schema:

+----+---------------------------------------------+
|name|foods                                        |
+----+---------------------------------------------+
|john|[[tomato,1.99], [carrot,0.45], [banana,1.29]]|
|bill|[[apple,0.99], [taco,2.59]]                  |
+----+---------------------------------------------+

root
 |-- name: string (nullable = true)
 |-- foods: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- food: string (nullable = true)
 |    |    |-- price: double (nullable = false)

I want to sort each array by df("foods.food"). How can I sort it to get the output below?

+----+---------------------------------------------+
|name|foods                                        |
+----+---------------------------------------------+
|john|[[banana,1.29], [carrot,0.45], [tomato,1.99]]|
|bill|[[apple,0.99], [taco,2.59]]                  |
+----+---------------------------------------------+

Edit: I want to be able to choose which attribute to sort by. For example, if I sort by price, I want output like this:

+----+---------------------------------------------+
|name|foods                                        |
+----+---------------------------------------------+
|john|[[carrot,0.45], [banana,1.29], [tomato,1.99]]|
|bill|[[apple,0.99], [taco,2.59]]                  |
+----+---------------------------------------------+

【Comments】:

  • Use array_sort with a custom comparator function; see this post
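
The comparator approach from the comment above can be sketched roughly as follows. This is a minimal sketch, assuming Spark 3.0 or later, where the SQL function array_sort accepts a lambda comparator. The helper name sortFoodsBy and the local SparkSession setup are illustrative assumptions, not part of the original question. It is written for spark-shell or a Scala script or worksheet.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{collect_list, expr, struct}

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("array-sort-comparator")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("john", "tomato", 1.99),
  ("john", "carrot", 0.45),
  ("bill", "apple", 0.99),
  ("john", "banana", 1.29),
  ("bill", "taco", 2.59)
).toDF("name", "food", "price")

val grouped = df.groupBy($"name")
  .agg(collect_list(struct($"food", $"price")).as("foods"))

// Hypothetical helper: builds an array_sort expression whose comparator
// orders the structs in `foods` by the chosen field, ascending.
// The lambda must return a negative, zero, or positive integer.
// `key` is spliced into SQL text, so pass only trusted field names.
def sortFoodsBy(key: String): Column = expr(
  s"""array_sort(foods, (l, r) ->
     |  CASE WHEN l.$key < r.$key THEN -1
     |       WHEN l.$key > r.$key THEN 1
     |       ELSE 0 END)""".stripMargin
)

val sortedByPrice = grouped.withColumn("foods", sortFoodsBy("price"))
val sortedByFood = grouped.withColumn("foods", sortFoodsBy("food"))

sortedByPrice.show(false)
sortedByFood.show(false)
```

Unlike reordering the struct fields, this keeps each element as (food, price) and only changes the order of the elements within the array.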

Tags: dataframe scala apache-spark


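【Solution 1】:

You can use the sort_array function. It sorts in ascending order by default and compares structs field by field, so with food as the first field the array is sorted by food. See the reference here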

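import org.apache.spark.sql.functions.{collect_list, sort_array, struct}
// Sort each collected array; structs are compared by food first, then price.
df.groupBy($"name")
  .agg(sort_array(collect_list(struct($"food", $"price"))).as("foods"))
  .show(false)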

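【Comments】:

  • Thanks! But what if I want to sort by the price attribute? I would also like to be able to choose which attribute to sort by.
  • Does this answer your question? stackoverflow.com/questions/49671354/…. Another way: put price first when constructing the struct, i.e. struct($"price", $"food").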
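
The "price first" suggestion in the last comment might look like the sketch below. It relies on sort_array comparing structs field by field, so whichever field comes first in the struct becomes the primary sort key. The helper name sortedFoods and the local SparkSession setup are illustrative assumptions, not part of the original answer. It is written for spark-shell or a Scala script or worksheet.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, sort_array, struct}

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sort-by-leading-field")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("john", "tomato", 1.99),
  ("john", "carrot", 0.45),
  ("bill", "apple", 0.99),
  ("john", "banana", 1.29),
  ("bill", "taco", 2.59)
).toDF("name", "food", "price")

// Hypothetical helper: builds the struct with `sortKey` as its first field,
// followed by the remaining fields, then collects and sorts the structs.
def sortedFoods(sortKey: String): Column = {
  val fields = Seq("food", "price")
  val ordered = sortKey +: fields.filterNot(_ == sortKey)
  sort_array(collect_list(struct(ordered.map(col): _*)))
}

val byPrice = df.groupBy($"name").agg(sortedFoods("price").as("foods"))
val byFood = df.groupBy($"name").agg(sortedFoods("food").as("foods"))

byPrice.show(false)
byFood.show(false)
```

One trade-off of this approach is that the field order inside each struct changes with the sort key: sorting by price yields (price, food) elements rather than (food, price).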