【Question Title】: sparkR RStudio error
【Posted】: 2016-07-20 05:29:38
【Question】:

SparkR in RStudio fails with an error when reading data.

How can I resolve this?

Environment

R:version 3.3.1
RStudio:Version 0.99.902 
sparkR:Version 1.6.1
mac:Version 10.11.6

Code

SPARK_HOME <- "/usr/local/Cellar/apache-spark/1.6.1/libexec"
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.4.0" "sparkr-shell"')
.libPaths(c(file.path(SPARK_HOME, "R", "lib"), .libPaths()))
library(SparkR)

sc <- sparkR.init(master="local[3]", sparkHome=SPARK_HOME,
               sparkEnvir=list(spark.driver.memory="6g",
                               sparkPackages="com.databricks:spark-csv_2.10:1.4.0"))

sqlContext <- sparkRSQL.init(sc)

Warnings

WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

Code

df <- read.df(sqlContext, "iris.csv", source="com.databricks.spark.csv", inferSchema="true")

Warning

WARN : Your hostname, xxxx-no-MacBook-Pro.local resolves to a loopback/non-reachable address: fe80:0:0:0:701f:d8ff:fe34:fd1%8, but we couldn't find any external IP address!

Error

ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)

Warning

16/07/20 14:00:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)

Error

16/07/20 14:00:44 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
16/07/20 14:00:44 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/07/20 14:00:44 INFO TaskSchedulerImpl: Cancelling stage 0
16/07/20 14:00:44 INFO DAGScheduler: ResultStage 0 (first at CsvRelation.scala:267) failed in 60.099 s
16/07/20 14:00:44 INFO DAGScheduler: Job 0 failed: first at CsvRelation.scala:267, took 60.168711 s
16/07/20 14:00:44 ERROR RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)

I don't know how to resolve this. Please advise.

【Comments】:

    Tags: macos rstudio sparkr


    【Solution 1】:

    Try this:

    Sys.setenv(SPARK_HOME="/usr/local/Cellar/apache-spark/1.6.1/libexec")
    Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.4.0" "sparkr-shell"')
    library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R","lib")))
    sc <- sparkR.init(master="local", sparkEnvir = list(spark.driver.memory="4g", spark.executor.memory="6g"))
    
    sqlContext <- sparkRSQL.init(sc)
    

    It worked for me.
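
A possible variation, not verified in this thread: the warning in the question says the hostname resolves to a non-reachable address, so binding Spark to loopback through the `SPARK_LOCAL_IP` environment variable may be worth testing if the timeout persists. The sketch below assumes that is the cause. `configure_sparkr_env` is a hypothetical helper, not a SparkR function, and the `local[3]` and `4g` settings are only illustrative.

```r
# Hypothetical helper (not part of SparkR): sets the environment variables
# that are read when the SparkR JVM backend is launched.
configure_sparkr_env <- function(spark_home, packages, local_ip = "127.0.0.1") {
  submit_args <- sprintf('"--packages" "%s" "sparkr-shell"', packages)
  Sys.setenv(
    SPARK_HOME = spark_home,
    SPARKR_SUBMIT_ARGS = submit_args,
    # Assumption: binding to loopback sidesteps the unreachable
    # address reported in the hostname warning.
    SPARK_LOCAL_IP = local_ip
  )
  invisible(submit_args)
}

spark_home <- "/usr/local/Cellar/apache-spark/1.6.1/libexec"
configure_sparkr_env(spark_home, "com.databricks:spark-csv_2.10:1.4.0")

# Start Spark only when the installation from the question is present.
if (dir.exists(file.path(spark_home, "R", "lib"))) {
  library(SparkR, lib.loc = file.path(spark_home, "R", "lib"))
  sc <- sparkR.init(master = "local[3]",
                    sparkEnvir = list(spark.driver.memory = "4g"))
  sqlContext <- sparkRSQL.init(sc)
}
```

The environment variables have to be set before `sparkR.init` is called, because the JVM process inherits them at launch; restart the R session if a context was already started.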

    【Comments】:
