【发布时间】:2017-01-01 08:46:35
【问题描述】:
我正在使用带有 Hadoop YARN 的 Spark 1.6.2 集群,Oozie。我已经安装了 Zeppelin 0.6.1(带有所有解释器的二进制包:zeppelin-0.6.1-bin-all.tgz)。当我尝试将 SparkR 脚本与 %spark.r 解释器一起使用时,
%spark.r
# Creating SparkConext and connecting to Cloudant DB
sc1 <- sparkR.init(sparkEnv = list("cloudant.host"="host_name","cloudant.username"="user_name","cloudant.password"="password", "jsonstore.rdd.schemaSampleSize"="-1"))
# Database to be connected to extract the data
database <- "sensordata"
# Creating Spark SQL Context
sqlContext <- sparkRSQL.init(sc)
# Creating DataFrame for the "sensordata" Cloudant DB
sensorDataDF <- read.df(sqlContext, database, header='true', source = "com.cloudant.spark",inferSchema='true')
# Get basic information about the DataFrame(sensorDataDF)
printSchema(sensorDataDF)
我收到以下错误(日志):
ERROR [2016-08-25 03:28:37,336] (
{Thread-77}
JobProgressPoller.java[run]:54) - Can not get or update progress
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:373)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:111)
at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:237)
at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:51)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getProgress(RemoteInterpreterService.java:296)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getProgress(RemoteInterpreterService.java:281)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:370)
... 3 more
我们将不胜感激。
【问题讨论】:
-
尝试将 spark master 设置为 local[2],如果可行,您可能在 env 文件中缺少一些环境变量
标签: apache-spark apache-spark-sql sparkr apache-zeppelin