【问题标题】:Why my spark streaming demo does not output anything为什么我的火花流演示不输出任何内容
【发布时间】:2016-12-01 15:05:57
【问题描述】:

最近,我正在学习这本书 - learning-spark-o-reilly-2015 。我尝试运行火花流示例 StreamingLogInput。代码如下:

val conf = new SparkConf().setMaster(master).setAppName("StreamingLogInput")
// Create a StreamingContext with a 1 second batch size
val ssc = new StreamingContext(conf, Seconds(1))
// Create a DStream from all the input on port 7777
val lines = ssc.socketTextStream("localhost", 7777)
val errorLines = processLines(lines)
// Print out the lines with errors, which causes this DStream to be evaluated
errorLines.print()
// start our streaming context and wait for it to "finish"
ssc.start()

def processLines(lines: DStream[String]) = {
// Filter our DStream for lines with "error"
lines.filter(_.contains("error"))
}

当我在单节点机器上运行这个程序时,如下所示,

$SPARK_HOME/bin/spark-submit \
--class com.oreilly.learningsparkexamples.scala.StreamingLogInput \
--master spark://singlenode:7077 \
/home/hadoop/project/learning-spark/target/scala-2.10/learning-spark-examples_2.10-0.0.1.jar \
spark://singlenode:7077 

在另一个窗口中,我输入订单

nc -l 7777 

然后输入一些假日志 但没有输出错误日志。 日志如下:

16/11/24 04:20:48 INFO BlockManagerInfo: Added input-0-1479932447800 in memory 
on singlenode:37112 (size: 32.0 B, free: 267.2 MB)
16/11/24 04:20:49 INFO JobScheduler: Added jobs for time 1479932449000 ms
16/11/24 04:20:50 INFO JobScheduler: Added jobs for time 1479932450000 ms
16/11/24 04:20:51 INFO JobScheduler: Added jobs for time 1479932451000 ms
16/11/24 04:20:51 INFO BlockManagerInfo: Added input-0-1479932451000 in memory on singlenode:37112 (size: 33.0 B, free: 267.2 MB) 
16/11/24 04:20:52 INFO JobScheduler: Added jobs for time 1479932452000 ms
16/11/24 04:20:53 INFO JobScheduler: Added jobs for time 1479932453000 ms
16/11/24 04:20:54 INFO JobScheduler: Added jobs for time 1479932454000 ms
16/11/24 04:20:55 INFO JobScheduler: Added jobs for time 1479932455000 ms
16/11/24 04:20:56 INFO JobScheduler: Added jobs for time 1479932456000 ms
16/11/24 04:20:57 INFO JobScheduler: Added jobs for time 1479932457000 ms
16/11/24 04:20:58 INFO JobScheduler: Added jobs for time 1479932458000 ms

为什么会发生这种情况?感谢任何帮助!

【问题讨论】:

  • 您运行的配置是什么,请分享您的集群配置!这里
  • 我只在我的虚拟机和一台机器上运行程序。而且spark配置很简单,master和worker运行在同一台机器上。我可以成功运行其他火花程序,但不能像其他人一样运行流。火花版本是 1.3.1。

标签: spark-streaming


【解决方案1】:

我通过在提交申请时指定多个执行者解决了这个问题,例如本地[3]。

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 2016-04-15
    • 2021-04-08
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多