【Question title】: Spark test on local machine
【Posted】: 2016-02-18 18:29:09
【Question】:

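I am running unit tests on Spark 1.3.1 with sbt test. Besides the unit tests being very slow, I keep running into java.lang.ClassNotFoundException: org.apache.spark.storage.RDDBlockId. Usually this points to a dependency problem, but I don't know where to start. I tried installing everything on a new machine, including a fresh Hadoop and a fresh ivy2, but I still hit the same problem.

Any help is much appreciated.

The exception: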

Exception in thread "Driver Heartbeater" java.lang.ClassNotFoundException: 
    org.apache.spark.storage.RDDBlockId
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)

My build.sbt:

libraryDependencies ++=  Seq( 
  "org.scalaz"              %% "scalaz-core" % "7.1.2" excludeAll ExclusionRule(organization = "org.slf4j"), 
  "com.typesafe.play"       %% "play-json" % "2.3.4" excludeAll ExclusionRule(organization = "org.slf4j"), 
  "org.apache.spark"        %% "spark-core" % "1.3.1" % "provided"  withSources() excludeAll (ExclusionRule(organization = "org.slf4j"), ExclusionRule("org.spark-project.akka", "akka-actor_2.10")), 
  "org.apache.spark"        %% "spark-graphx" % "1.3.1" % "provided" withSources() excludeAll (ExclusionRule(organization = "org.slf4j"), ExclusionRule("org.spark-project.akka", "akka-actor_2.10")), 
  "org.apache.cassandra"    % "cassandra-all" % "2.1.6", 
  "org.apache.cassandra"    % "cassandra-thrift" % "2.1.6", 
  "com.typesafe.akka" %% "akka-actor" % "2.3.11", 
  "com.datastax.cassandra"  % "cassandra-driver-core" % "2.1.6" withSources() withJavadoc() excludeAll (ExclusionRule(organization = "org.slf4j"),ExclusionRule(organization = "org.apache.spark"),ExclusionRule(organization = "com.twitter",name = "parquet-hadoop-bundle")), 
  "com.github.nscala-time"  %% "nscala-time" % "1.2.0" excludeAll ExclusionRule(organization = "org.slf4j") withSources(), 
  "com.datastax.spark"      %% "spark-cassandra-connector-embedded" % "1.3.0-M2" excludeAll (ExclusionRule(organization = "org.slf4j"),ExclusionRule(organization = "org.apache.spark"),ExclusionRule(organization = "com.twitter",name = "parquet-hadoop-bundle")), 
  "com.datastax.spark"      %% "spark-cassandra-connector" % "1.3.0-M2" excludeAll (ExclusionRule(organization = "org.slf4j"),ExclusionRule(organization = "org.apache.spark"),ExclusionRule(organization = "com.twitter",name = "parquet-hadoop-bundle")), 
  "org.slf4j"               % "slf4j-api"            % "1.6.1", 
   "com.twitter"            % "jsr166e" % "1.1.0", 
  "org.slf4j"               % "slf4j-nop" % "1.6.1" % "test", 
  "org.scalatest"           %% "scalatest" % "2.2.1" % "test" excludeAll ExclusionRule(organization = "org.slf4j") 
) 

And my Spark test settings (I have disabled all of the test settings):

(spark.kryo.registrator,com.my.spark.MyRegistrator) 
(spark.eventLog.dir,) 
(spark.driver.memory,16G) 
(spark.kryoserializer.buffer.mb,512) 
(spark.akka.frameSize,5) 
(spark.shuffle.spill,false) 
(spark.default.parallelism,8) 
(spark.shuffle.consolidateFiles,false) 
(spark.serializer,org.apache.spark.serializer.KryoSerializer) 
(spark.shuffle.spill.compress,false) 
(spark.driver.host,10.10.68.66) 
(spark.akka.timeout,300) 
(spark.driver.port,55328) 
(spark.eventLog.enabled,false) 
(spark.cassandra.connection.host,127.0.0.1) 
(spark.cassandra.connection.ssl.enabled,false) 
(spark.master,local[8]) 
(spark.cassandra.connection.ssl.trustStore.password,password) 
(spark.fileserver.uri,http://10.10.68.66:55329) 
(spark.cassandra.auth.username,username) 
(spark.local.dir,/tmp/spark) 
(spark.app.id,local-1436229075894) 
(spark.storage.blockManagerHeartBeatMs,300000) 
(spark.executor.id,<driver>) 
(spark.storage.memoryFraction,0.5) 
(spark.app.name,Count all entries 217885402) 
(spark.shuffle.compress,false) 

The assembled or packaged jar, submitted to standalone or Mesos, works fine! Any suggestions?
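
As a possible starting point for the ClassNotFoundException above, here is a minimal diagnostic sketch that reports whether a class is visible to a given class loader and which jar it comes from. `ClassOriginCheck` and `describe` are hypothetical names introduced for illustration; running it from inside the test JVM is an assumption about how one might narrow the problem down, not a confirmed fix.

```scala
object ClassOriginCheck {
  // Describes where `className` is loaded from when resolved through `loader`.
  // A null loader means the JDK bootstrap loader.
  def describe(className: String, loader: ClassLoader): String =
    try {
      // initialize = false: only locate the class, do not run static initializers.
      val cls = Class.forName(className, false, loader)
      val location = Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
      location.map(_.toString).getOrElse("(no code source, e.g. a JDK bootstrap class)")
    } catch {
      case _: ClassNotFoundException => "NOT FOUND"
    }

  def main(args: Array[String]): Unit = {
    val name = "org.apache.spark.storage.RDDBlockId"
    // Compare the loader that loaded this object with the thread's context loader.
    println("own loader:     " + describe(name, getClass.getClassLoader))
    println("context loader: " + describe(name, Thread.currentThread().getContextClassLoader))
  }
}
```

If the two lines differ (one prints a jar path and the other prints NOT FOUND), that would suggest a class-loader visibility issue rather than a missing dependency.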

【Comments on the question】:

    Tags: scala unit-testing apache-spark


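    【Solution 1】:

    We ran into the same problem with Spark 1.6.0 (there is already a bug report). We fixed it by switching to the Kryo serializer (which you should be using anyway), so this appears to be a bug in the default JavaSerializer.

    Just do the following to get rid of it: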

    new SparkConf().setAppName("Simple Application").set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
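
A slightly fuller sketch of the same setting in a local run, assuming Spark is on the classpath. The `local[2]` master, the `KryoLocalCheck` object, and the small cached job are illustrative choices and not part of the original answer; the first printed line only confirms which serializer the running context reports.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object KryoLocalCheck {
  // Same serializer setting as in the answer, plus a local master
  // (the master value is an illustrative assumption).
  def buildConf(): SparkConf =
    new SparkConf()
      .setMaster("local[2]")
      .setAppName("Simple Application")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    try {
      // Report the serializer the context was actually started with.
      println(sc.getConf.get("spark.serializer"))
      // Cache a small RDD and run a job so that RDD blocks are created.
      val total = sc.parallelize(1 to 100).cache().map(_ * 2).reduce(_ + _)
      println(total)
    } finally {
      sc.stop() // always release the local context, even if the job fails
    }
  }
}
```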
    

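    【Comments】:

      【Solution 2】:

      The cause was a very large broadcast variable. I am not sure why (it fits in memory), but removing it from the test case made it work.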
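
To get a rough idea of how large a value is before broadcasting it, one option is to measure its Java-serialized size. This is only a sketch: `BroadcastSizeCheck`, `javaSerializedSize`, and the sample map are hypothetical, and the figure is an approximation, since Kryo or compression would change what Spark actually ships.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object BroadcastSizeCheck {
  // Number of bytes produced by plain Java serialization of `value`.
  def javaSerializedSize(value: AnyRef): Long = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    bytes.size().toLong
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical lookup table standing in for the test's broadcast value.
    val lookup: Map[Int, String] = (1 to 100000).map(i => i -> s"value-$i").toMap
    val sizeMb = javaSerializedSize(lookup) / (1024.0 * 1024.0)
    println(f"approx. serialized size: $sizeMb%.2f MB")
  }
}
```

Shrinking the test fixture until the error disappears is one way to check whether size is really the trigger.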

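      【Comments】:

      • I see this with no explicit broadcast variable, but with cache() called on a DataFrame.
      • I have the same problem on my side, but with neither a broadcast variable nor cache().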