【发布时间】:2016-12-01 15:05:57
【问题描述】:
最近,我正在学习这本书 - learning-spark-o-reilly-2015 。我尝试运行火花流示例 StreamingLogInput。代码如下:
val conf = new SparkConf().setMaster(master).setAppName("StreamingLogInput")
// Create a StreamingContext with a 1 second batch size
val ssc = new StreamingContext(conf, Seconds(1))
// Create a DStream from all the input on port 7777
val lines = ssc.socketTextStream("localhost", 7777)
val errorLines = processLines(lines)
// Print out the lines with errors, which causes this DStream to be evaluated
errorLines.print()
// start our streaming context and wait for it to "finish"
ssc.start()
def processLines(lines: DStream[String]) = {
// Filter our DStream for lines with "error"
lines.filter(_.contains("error"))
}
当我在单节点机器上运行这个程序时,如下所示,
$SPARK_HOME/bin/spark-submit \
--class com.oreilly.learningsparkexamples.scala.StreamingLogInput \
--master spark://singlenode:7077 \
/home/hadoop/project/learning-spark/target/scala-2.10/learning-spark-examples_2.10-0.0.1.jar \
spark://singlenode:7077
在另一个窗口中,我输入订单
nc -l 7777
然后输入一些假日志 但没有输出错误日志。 日志如下:
16/11/24 04:20:48 INFO BlockManagerInfo: Added input-0-1479932447800 in memory
on singlenode:37112 (size: 32.0 B, free: 267.2 MB)
16/11/24 04:20:49 INFO JobScheduler: Added jobs for time 1479932449000 ms
16/11/24 04:20:50 INFO JobScheduler: Added jobs for time 1479932450000 ms
16/11/24 04:20:51 INFO JobScheduler: Added jobs for time 1479932451000 ms
16/11/24 04:20:51 INFO BlockManagerInfo: Added input-0-1479932451000 in memory on singlenode:37112 (size: 33.0 B, free: 267.2 MB)
16/11/24 04:20:52 INFO JobScheduler: Added jobs for time 1479932452000 ms
16/11/24 04:20:53 INFO JobScheduler: Added jobs for time 1479932453000 ms
16/11/24 04:20:54 INFO JobScheduler: Added jobs for time 1479932454000 ms
16/11/24 04:20:55 INFO JobScheduler: Added jobs for time 1479932455000 ms
16/11/24 04:20:56 INFO JobScheduler: Added jobs for time 1479932456000 ms
16/11/24 04:20:57 INFO JobScheduler: Added jobs for time 1479932457000 ms
16/11/24 04:20:58 INFO JobScheduler: Added jobs for time 1479932458000 ms
为什么会发生这种情况?感谢任何帮助!
【问题讨论】:
-
您运行的配置是什么,请分享您的集群配置!这里
-
我只在我的虚拟机和一台机器上运行程序。而且spark配置很简单,master和worker运行在同一台机器上。我可以成功运行其他火花程序,但不能像其他人一样运行流。火花版本是 1.3.1。
标签: spark-streaming