【Posted】: 2018-10-02 02:03:19
【Question】:
I am using Spark 2.1.0 and Hadoop 2.7.3.
I am trying to use newAPIHadoopFile. The code is very simple: a single class containing only a main method:
val spark = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
val sparkContext = spark.sparkContext
val sparkConf = sparkContext.getConf
val file = "src/main/resources/chat.csv"
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkContext.getConf.registerKryoClasses(Array(
  Class.forName("org.apache.hadoop.io.LongWritable"),
  Class.forName("org.apache.hadoop.io.Text")
))
sparkConf.set("spark.kryo.classesToRegister", "org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text")
val rdd = sparkContext.newAPIHadoopFile(file, classOf[KeyValueTextInputFormat], classOf[Text], classOf[Text])
rdd.collect().foreach(println)
I have read many posts on StackOverflow, but I still get this error:
java.io.NotSerializableException: org.apache.hadoop.io.Text
Serialization stack:
- object not serializable (class: org.apache.hadoop.io.Text, value: How about Italian?"})
- field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
- object (class scala.Tuple2, ( How about Italian?"},))
- element of array (index: 0)
- array (class [Lscala.Tuple2;, size 3)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
Edit: contents of chat.csv:
{from:"Gert", to:"Melissa", message:"Want to have dinner?"}
{from:"Melissa", to:"Gert", message:"Ok\
How about Italian?"}
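For reference, here is a minimal sketch of one possible workaround, not a confirmed fix for this exact setup. It relies on two points: org.apache.hadoop.io.Text does not implement java.io.Serializable, and sparkContext.getConf returns a copy of the configuration, so serializer settings applied to it after the context has been created should not take effect. The sketch avoids shipping Text objects at all by converting every record to plain Strings before collect(). The names ReadChat and readPairs are hypothetical, and the import of the new-API KeyValueTextInputFormat (org.apache.hadoop.mapreduce.lib.input) is an assumption, since the imports were omitted from the question.

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; the question does not show the class name.
object ReadChat {

  // Hypothetical helper: reads a key/value text file and returns plain String pairs.
  def readPairs(spark: SparkSession, path: String): Array[(String, String)] = {
    spark.sparkContext
      .newAPIHadoopFile(path, classOf[KeyValueTextInputFormat], classOf[Text], classOf[Text])
      // Convert each Text to an immutable String inside the task, so that
      // no Hadoop Writable has to be serialized when results are returned.
      .map { case (key, value) => (key.toString, value.toString) }
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("test")
      .master("local[*]")
      .getOrCreate()

    readPairs(spark, "src/main/resources/chat.csv").foreach(println)

    spark.stop()
  }
}
```

Converting to String also sidesteps a second issue noted in the Spark API documentation: Hadoop record readers reuse the same Writable instance for every record, so collecting the raw objects can yield many references to one value. If Kryo is still wanted, the serializer would presumably have to be set on the builder (for example via .config("spark.serializer", ...)) before getOrCreate() is called.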
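【Discussion】:
-
Could you paste the code here, starting from the class name?
-
All of the code is here, except for the main method declaration and the imports.
Tags: apache-spark serialization hadoop2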