【发布时间】:2019-12-10 02:01:16
【问题描述】:
我正在尝试创建一个光束管道,以在一个 PCollection 上同时应用多个 ParDo 变换,并将所有结果收集并打印到一个列表中。到目前为止,我已经经历了顺序过程,比如第一个 ParDo,然后是第二个 ParDo。 这是我为我的问题准备的一个例子:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
p = beam.Pipeline(options=PipelineOptions())
class Tr1(beam.DoFn):
def process(self, number):
number = number + 1
yield number
class Tr2(beam.DoFn):
def process(self, number):
number = number + 2
yield number
def pipeline_test():
numbers = p | "Create" >> beam.Create([1])
tr1 = numbers | "Tr1" >> beam.ParDo(Tr1())
tr2 = numbers | "Tr2" >> beam.ParDo(Tr2())
tr1 | "Print1" >> beam.Map(print)
tr2 | "Print2" >> beam.Map(print)
def main(argv):
del argv
pipeline_test()
result = p.run()
result.wait_until_finish()
if __name__ == '__main__':
app.run(main)
【问题讨论】:
-
您能否附上您看到 PTransform 正在按顺序运行的数据流作业图?
标签: python apache-beam