【Posted】:2020-04-01 19:32:03
【Question】:
When I create a stream from a Kafka topic and print its contents:
import os

# Load the Kafka 0.8 integration package; this must be set before the SparkContext is created
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 pyspark-shell'

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="PythonStreamingKafkaWords")
ssc = StreamingContext(sc, 10)  # 10-second batch interval

# Direct (receiver-less) stream over 'sample_topic'
lines = KafkaUtils.createDirectStream(ssc, ['sample_topic'], {"bootstrap.servers": 'localhost:9092'})
lines.pprint()

ssc.start()
ssc.awaitTermination()
I get an empty result for every batch:
-------------------------------------------
Time: 2019-12-07 13:11:50
-------------------------------------------
-------------------------------------------
Time: 2019-12-07 13:12:00
-------------------------------------------
-------------------------------------------
Time: 2019-12-07 13:12:10
-------------------------------------------
Meanwhile, reading the same topic with the console consumer works:
kafka-console-consumer --topic sample_topic --from-beginning --bootstrap-server localhost:9092
It correctly prints all the text lines in my Kafka topic:
ham Ok lor... Sony ericsson salesman... I ask shuhui then she say quite gd 2 use so i considering...
ham Ard 6 like dat lor.
ham Why don't you wait 'til at least wednesday to see if you get your .
ham Huh y lei...
spam REMINDER FROM O2: To get 2.50 pounds free call credit and details of great offers pls reply 2 this text with your valid name, house no and postcode
spam This is the 2nd time we have tried 2 contact u. U have won the £750 Pound prize. 2 claim is easy, call 087187272008 NOW1! Only 10p per minute. BT-national-rate.
ham Will ü b going to esplanade fr home?
. . .
What is the correct way to stream data from a Kafka topic into a Spark Streaming application?
【Discussion】:
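One possible cause, offered as a sketch and not a confirmed diagnosis: the console consumer above is run with --from-beginning, while a direct stream from the Kafka 0.8 integration starts at the latest offsets unless "auto.offset.reset" is set to "smallest". If nothing new is produced to the topic while the job is running, every batch would then be empty. The helper names build_kafka_params and run_stream below are hypothetical, not part of any library, and the sketch assumes the Spark installation matches the package version used in the question.

```python
import os


def build_kafka_params(brokers, from_beginning=True):
    """Build the kafkaParams dict for KafkaUtils.createDirectStream (hypothetical helper)."""
    params = {"bootstrap.servers": brokers}
    if from_beginning:
        # Kafka 0.8 consumer setting: start from the earliest available offset
        # instead of the default (latest), mirroring --from-beginning.
        params["auto.offset.reset"] = "smallest"
    return params


def run_stream(topic="sample_topic", brokers="localhost:9092"):
    """Print each 10-second batch of the topic, starting from the earliest offsets."""
    # Same package coordinates as in the question; assumed to match the installed Spark.
    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 pyspark-shell"
    )
    # Imported here so the helper above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="PythonStreamingKafkaWords")
    ssc = StreamingContext(sc, 10)
    lines = KafkaUtils.createDirectStream(ssc, [topic], build_kafka_params(brokers))
    lines.pprint()
    ssc.start()
    ssc.awaitTermination()
```

Calling run_stream() with a broker listening on localhost:9092 should then show the existing messages in the first batch, if the offset default is indeed the cause. If the batches are still empty, the next thing to check is whether the package version (2.0.2, Scala 2.11) matches the installed Spark.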
Tags: apache-spark pyspark apache-kafka spark-streaming