【发布时间】:2018-03-14 13:58:46
【问题描述】:
我正在使用 Spark-shell。我已将推文存储在 Kafka 主题中以使用 Spark-shell 执行情绪分析。
我添加了依赖项: org.apache.spark:spark-streaming-kafka_2.10:1.6.2 edu.stanford.nlp:stanford-corenlp:3.5.1
这些是我正在处理的代码:
import org.apache.spark._
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.kafka._
val conf = new SparkConf().setMaster("local[4]").setAppName("KafkaReceiver")
val ssc = new StreamingContext(conf, Seconds(5))
val kafkaStream = KafkaUtils.createStream(ssc, "sandbox.hortonworks.com:2181","test-consumer-group", Map("test12" -> 5))
val topCounts60 = kafkaStream.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(60)).map { case (topic, count) => (count, topic) }.transform(_.sortByKey(false))
topCounts60.foreachRDD(rdd => {
val topList = rdd.take(10)
println("\nPopular topics in last 60 seconds (%s total):".format(rdd.count()))
topList.foreach { case (count, tag) => println("%s (%s tweets)".format(tag, count)) }
})
kafkaStream.count().map(cnt => "Received " + cnt + " kafka messages.").print()
val wordSentimentFilePath = "hdfs://sandbox.hortonworks.com:8020/TwitterData/AFINN.txt"
val wordSentiments = ssc.sparkContext.textFile(wordSentimentFilePath).map { line =>
val Array(word, happiness) = line.split("\t")
(word, happiness)
} cache()
val happiest60 = kafkaStream.map(hashTag => (hashTag.tail, 1)).reduceByKeyAndWindow(_ + _, Seconds(60)). transform{topicCount => wordSentiments.join(topicCount)}
.map{case (topic, tuple) => (topic, tuple._1 * tuple._2)}.map{case (topic, happinessValue) => (happinessValue, topic)}.transform(_.sortByKey(false))
ssc.start()
ssc.stop()
但是在执行这些行时,
val happiest60 = kafkaStream.map(hashTag => (hashTag.tail,1)).reduceByKeyAndWindow(_ + _, Seconds(60)). transform{topicCount => wordSentiments.join(topicCount)}.map{case (topic, tuple) => (topic, tuple._1 * tuple._2)}.map{case (topic, happinessValue) => (happinessValue, topic)}.transform(_.sortByKey(false))
它抛出错误:
错误:值尾不是 (String, String) 的成员
【问题讨论】:
-
尝试声明所有变量的类型。这可能会帮助你找出你错在哪里。
标签: scala hadoop apache-spark apache-kafka