【Question title】: Showing status of Hive query in PySpark
【Posted】: 2017-05-16 18:58:59
【Question description】:

I am running a Hive query from a SparkSession (spark):

spark.sql('SELECT * FROM SOME_TABLE').show()

Is there a parameter of the sql function, or a configuration setting, that prints status output similar to what the Hive CLI shows?

Hadoop job information for Stage-1: number of mappers: 1193; number of reducers: 1099
2017-05-16 14:54:38,165 Stage-1 map = 0%,  reduce = 0%
2017-05-16 14:54:49,625 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 213.84 sec
2017-05-16 14:54:50,678 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 4495.91 sec
2017-05-16 14:54:51,729 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 5081.18 sec
2017-05-16 14:54:52,778 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 5244.48 sec
2017-05-16 14:54:53,818 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 7186.78 sec
2017-05-16 14:54:54,851 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 7702.71 sec
2017-05-16 14:54:55,887 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 7968.09 sec
2017-05-16 14:54:56,919 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 8325.11 sec

【Question comments】:

    Tags: hadoop apache-spark hive pyspark


    【Solution 1】:

    Yes, you can view the status in several ways.

    1) To see [fairly detailed] status while the job is running, change the log level to "INFO": spark.sparkContext.setLogLevel("INFO")

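    2) Use the Spark or YARN web UI (typically port 18088 for Spark, 4040 when running locally, and 8088 for YARN; the exact ports depend on your distribution and configuration).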

    The event log in the UI will show you what you need to know, or the progress bar is a simpler visual alternative.
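If you only want the INFO output for one query, option 1 can be scoped with a small helper. This is a minimal sketch, not part of the original answer: `log_level` is a hypothetical name, and the level to restore afterwards ("WARN") is an assumption, since the sketch does not read the current level back from Spark.

```python
from contextlib import contextmanager


@contextmanager
def log_level(spark, during="INFO", after="WARN"):
    """Temporarily switch the Spark log level (hypothetical helper).

    `after` is an assumed level to restore; adjust it to whatever
    your session normally uses.
    """
    sc = spark.sparkContext
    sc.setLogLevel(during)      # same call as in option 1 above
    try:
        yield spark
    finally:
        sc.setLogLevel(after)   # restore quieter logging even if the query fails
```

Usage would look like `with log_level(spark): spark.sql('SELECT * FROM SOME_TABLE').show()`, so the stage and task log lines appear only while that query runs.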

    Related documentation: https://spark.apache.org/docs/latest/monitoring.html
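For progress lines closer to the Hive CLI output in the question, one possible approach is to poll PySpark's status tracker while the query runs in a background thread. The sketch below is an assumption-laden illustration, not something the answer above describes: the function names and the line format are made up, Spark reports tasks per stage rather than map/reduce percentages, and it relies on `sparkContext.statusTracker()` with its `getActiveStageIds()` and `getStageInfo()` methods.

```python
import threading


def format_stage_progress(stage_id, completed, total):
    """Build one progress line for a stage (format is illustrative only)."""
    percent = int(100 * completed / total) if total else 0
    return "Stage-%d tasks = %d%% (%d/%d)" % (stage_id, percent, completed, total)


def report_progress(tracker, emit=print):
    """Emit one line per active stage and return the emitted lines."""
    lines = []
    for stage_id in tracker.getActiveStageIds():
        info = tracker.getStageInfo(stage_id)
        if info is None:  # stage info may no longer be available
            continue
        line = format_stage_progress(stage_id, info.numCompletedTasks, info.numTasks)
        emit(line)
        lines.append(line)
    return lines


def run_with_progress(spark, query, interval=1.0, emit=print):
    """Run `query` in a worker thread, polling stage progress until it finishes."""
    outcome = {}

    def work():
        try:
            outcome["rows"] = spark.sql(query).collect()
        except Exception as exc:  # hand the failure back to the calling thread
            outcome["error"] = exc

    worker = threading.Thread(target=work)
    worker.start()
    tracker = spark.sparkContext.statusTracker()
    while worker.is_alive():
        report_progress(tracker, emit)
        worker.join(interval)  # wait up to `interval` seconds between polls
    if "error" in outcome:
        raise outcome["error"]
    return outcome["rows"]
```

Calling `run_with_progress(spark, 'SELECT * FROM SOME_TABLE')` would print a line per active stage roughly once a second. Note that `collect()` pulls the whole result to the driver, so for a large table you may prefer to swap in a different action inside `work`.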

    【Comments】:
