【发布时间】:2021-09-11 02:39:00
【问题描述】:
我正在开发一个 spark 流应用程序/代码,它不断地从 localhost 9098 读取数据。有没有办法将 localhost 修改为
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.log4j.Logger
import org.apache.log4j.Level
object StreamingApplication extends App {
Logger.getLogger("Org").setLevel(Level.ERROR)
//creating spark streaming context
val sc = new SparkContext("local[*]", "wordCount")
val ssc = new StreamingContext(sc, Seconds(5))
// lines is a Dstream
val lines = ssc.socketTextStream("localhost", 9098)
// words is a transformed Dstream
val words = lines.flatMap(x => x.split(" "))
// bunch of transformations
val pairs = words.map(x=> (x,1))
val wordsCount = pairs.reduceByKey((x,y) => x+y)
// print is an action
wordsCount.print()
// start the streaming context
ssc.start()
ssc.awaitTermination()
}
基本上,我需要帮助来修改以下代码:
val lines = ssc.socketTextStream("localhost", 9098)
到这里:
val lines = ssc.socketTextStream("<folder path>")
仅供参考,我正在使用 IntelliJ Idea 来构建它。
【问题讨论】:
-
搜索“spark流文件”的第一个结果是:sparkbyexamples.com/spark/…
标签: json scala apache-spark spark-streaming