【Question title】: How to get this output in Spark SQL?
【Posted】: 2019-06-17 16:39:45
【Question】:

How can I use spark.sql to produce output that lists all the movies for each year?

Output:
(1988,{(Rain Man),(Die Hard)})
(1990,{(The Godfather: Part III),(Die Hard 2),(The Silence of the Lambs),(King of New York)})
(1992,{(Unforgiven),(Bad Lieutenant),(Reservoir Dogs)})
(1994,{(Pulp Fiction)})

Here is the JSON data:

{ "id": "movie:1", "title": "Vertigo", "year": 1958, "genre": "Drama", "summary": "A retired San Francisco detective suffering from acrophobia investigates the strange activities of an old friend's wife, all the while becoming dangerously obsessed with her.", "country": "USA", "director": { "id": "artist:3", "last_name": "Hitchcock", "first_name": "Alfred", "year_of_birth": "1899" }, "actors": [ { "id": "artist:15", "role": "John Ferguson" }, { "id": "artist:16", "role": "Madeleine Elster" } ] }

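Here is the code I tried:

// HiveContext is deprecated since Spark 2.0; it is kept here to match the original setup
val hiveCtx = new org.apache.spark.sql.hive.HiveContext(sc)
// jsonFile was removed in Spark 2.0; read.json is the replacement
val movies = hiveCtx.read.json("movies.json")
movies.createOrReplaceTempView("movies")
val ty = hiveCtx.sql("SELECT year, title FROM movies")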

Please help me find the right query.

Thanks for your help.

【Comments】:

  • How are you storing this data? Could you include all the code you used to get to this point?
  • Creating the HiveContext: val hiveCtx = new org.apache.spark.sql.hive.HiveContext(sc) val movies = hiveCtx.jsonFile("movies.json") movies.createOrReplaceTempView("movies") Now I need a SQL query that produces output listing all the movies for each year: val ty = hiveCtx.sql("SELECT year, title FROM movies")?

Tags: scala apache-spark apache-spark-sql


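【Answer 1】:

You can get a similar result without spark.sql by operating on the DataFrame itself. Group by year, collect the titles of each group into a list, and join them into a single string:

import org.apache.spark.sql.functions.{collect_list, concat_ws}
movies.groupBy($"year").agg(concat_ws("; ", collect_list($"title"))).show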

Dataset used:

{ "id": "movie:1", "title": "Vertigo", "year": 1958, "genre": "Drama", "summary": "A retired San Francisco detective suffering from acrophobia investigates the strange activities of an old friend's wife, all the while becoming dangerously obsessed with her.", "country": "USA", "director": { "id": "artist:3", "last_name": "Hitchcock", "first_name": "Alfred", "year_of_birth": "1899" }, "actors": [ { "id": "artist:15", "role": "John Ferguson" }, { "id": "artist:16", "role": "Madeleine Elster" } ] }
{ "id": "movie:2", "title": "The Blob", "year": 1958, "genre": "Drama", "summary": "The Blob", "country": "USA", "director": { "id": "artist:3", "last_name": "Hitchcock", "first_name": "Alfred", "year_of_birth": "1899" }, "actors": [ { "id": "artist:15", "role": "John Ferguson" }, { "id": "artist:16", "role": "Madeleine Elster" } ] }

Output:

+----+----------------------------------+
|year|concat_ws(; , collect_list(title))|
+----+----------------------------------+
|1958|                 Vertigo; The Blob|
+----+----------------------------------+
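The question asks specifically for a spark.sql query, so here is a minimal sketch of the same grouping written as SQL. It assumes Spark 2.x or later with a SparkSession. The object name MoviesByYear, the helper render, the local master setting, and the inline sample rows (standing in for movies.json) are illustrative assumptions and not part of the original post. Note that collect_list does not guarantee the order of titles within a group.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MoviesByYear {
  // Local session for illustration only; a real job would get its master from spark-submit.
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("movies-by-year")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical stand-in for spark.read.json("movies.json"); only the two needed columns.
  // When read from JSON, year would typically be inferred as a long rather than an int.
  val movies: DataFrame = Seq(
    ("Vertigo", 1958),
    ("The Blob", 1958),
    ("Rain Man", 1988)
  ).toDF("title", "year")
  movies.createOrReplaceTempView("movies")

  // One row per year, with that year's titles collected into an array column.
  val byYear: DataFrame = spark.sql(
    """SELECT year, collect_list(title) AS titles
      |FROM movies
      |GROUP BY year
      |ORDER BY year""".stripMargin)

  // Formats one row in the shape shown in the question, e.g. (1988,{(Rain Man)}).
  def render(year: Int, titles: scala.collection.Seq[String]): String =
    s"($year,${titles.map(t => s"($t)").mkString("{", ",", "}")})"

  def main(args: Array[String]): Unit = {
    byYear.collect().foreach { row =>
      println(render(row.getInt(0), row.getSeq[String](1)))
    }
    spark.stop()
  }
}
```

If a single delimited string per year is preferred over an array, the SELECT list could use concat_ws('; ', collect_list(title)) instead, which mirrors the DataFrame version above.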
