【Question title】: Zeppelin Error When Running Pyspark Script
【Posted】: 2020-01-24 18:36:32
【Question】:

I am trying to run an AWS Glue job using a development endpoint and I get this error:

org.apache.thrift.transport.TTransportException

I did only what is specified here, out of the box: https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-tutorial-local-notebook.html

Start command:

$ sudo bash
$ ./zeppelin-daemon.sh start

Java version: openjdk version "1.8.0_222"
Zeppelin version: 0.7.3
macOS version: 10.14.6

Code:

%pyspark
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import *

# Create a Glue context
glueContext = GlueContext(SparkContext.getOrCreate())

Error log:

INFO [2019-09-24 11:12:26,138] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:131) - Job paragraph_1569298185348_-1564147927 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreterexisting_process458983990
INFO [2019-09-24 11:12:26,138] ({pool-2-thread-2} Paragraph.java[jobRun]:362) - run paragraph 20190924-000945_1425987694 using pyspark org.apache.zeppelin.interpreter.LazyOpenInterpreter@67a01c6d
ERROR [2019-09-24 11:12:26,154] ({pool-2-thread-2} Job.java[run]:188) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:401)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
... 11 more
ERROR [2019-09-24 11:12:26,239] ({pool-2-thread-2} RemoteScheduler.java[getStatus]:281) - Unknown status
java.lang.IllegalArgumentException: No enum constant org.apache.zeppelin.scheduler.Job.Status.UNKNOWN
at java.lang.Enum.valueOf(Enum.java:238)
at org.apache.zeppelin.scheduler.Job$Status.valueOf(Job.java:51)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:271)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:342)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
ERROR [2019-09-24 11:12:26,240] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2056) - Error
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:401)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
... 11 more
WARN [2019-09-24 11:12:26,240] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2064) - Job 20190924-000945_1425987694 is finished, status: ERROR, exception: org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException, result: org.apache.thrift.transport.TTransportException
INFO [2019-09-24 11:12:26,263] ({pool-2-thread-2} SchedulerFactory.java[jobFinished]:137) - Job paragraph_1569298185348_-1564147927 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreterexisting_process458983990

Spark interpreter configuration:

https://imgur.com/a/Ya0qt2p

What did not work:

  • Reinstalling Zeppelin
  • Restarting Zeppelin
  • Installing Apache Spark and setting SPARK_HOME to /usr/local/Cellar/apache-spark/2.4.4/
  • Adding yarn.. parameters to the Spark interpreter configuration
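
The scheduler name in the error log (`RemoteInterpreterexisting_process...`) suggests that the interpreter is set up to connect to an existing process rather than to launch one locally. A `TTransportException` with no message can then simply mean that nothing answered on the other end. The following is a minimal sketch of a reachability check, not a definitive diagnosis. The host `localhost` and port `9007` are assumptions taken from the linked tutorial's port-forwarding step and should be replaced with whatever the interpreter settings actually contain.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Assumption: these values mirror the "Connect to existing process"
    # fields of the Spark interpreter; adjust them to your own settings.
    HOST, PORT = "localhost", 9007
    if is_port_open(HOST, PORT):
        print(f"{HOST}:{PORT} accepts connections")
    else:
        print(f"Nothing is listening on {HOST}:{PORT}; check the SSH tunnel")
```

An open port only shows that something is listening locally. It does not prove that the remote interpreter behind the tunnel is healthy.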

【Comments】:

  • What is in the interpreter log? I have seen something similar when the interpreter exits without answering.
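
To follow up on that suggestion, a small script can pull the ERROR entries and their innermost `Caused by:` lines out of a log file. This is a rough sketch that assumes the log uses the same line layout as the server log quoted in the question. The function name `summarize_errors` is made up for illustration, and the log path is passed on the command line because the file name depends on the installation.

```python
import re
import sys
from pathlib import Path

# Matches Zeppelin log lines such as:
#   ERROR [2019-09-24 11:12:26,154] ({pool-2-thread-2} Job.java[run]:188) - Job failed
LOG_LINE = re.compile(r"^\s*(INFO|WARN|ERROR)\s+\[([^\]]+)\]\s+\((.*?)\)\s+-\s+(.*)$")


def summarize_errors(log_text):
    """Return (timestamp, message, root_cause) for each ERROR entry in log_text."""
    entries = []
    current = None
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            level, timestamp, _source, message = match.groups()
            current = None
            if level == "ERROR":
                current = {"timestamp": timestamp, "message": message, "cause": None}
                entries.append(current)
        elif current is not None and line.strip().startswith("Caused by:"):
            # Later "Caused by:" lines overwrite earlier ones, so the
            # innermost cause of the stack trace is the one that is kept.
            current["cause"] = line.strip()[len("Caused by:"):].strip()
    return [(e["timestamp"], e["message"], e["cause"]) for e in entries]


if __name__ == "__main__":
    # Usage: python summarize_errors.py <path-to-log> [<path-to-log> ...]
    for path in sys.argv[1:]:
        text = Path(path).read_text(errors="replace")
        for timestamp, message, cause in summarize_errors(text):
            print(timestamp, message, cause or "(no cause recorded)")
```

If the interpreter log contains no ERROR entry around the failure time, that is consistent with the interpreter process exiting or never being reached, as the comment describes.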

Tags: java apache-spark pyspark apache-zeppelin


【Solution 1】:

It turned out that this was an issue on the AWS Glue team's side. It should be resolved by now.
