【Question title】: DirectPipelineRunner - does it support standard glob patterns?
【Posted】: 2015-02-18 05:33:48
【Question】:

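Our pipeline runs fine when executed in the cloud. But when it runs with the DirectPipelineRunner (i.e. locally), it fails with an error about the supplied file pattern. The file pattern uses a glob.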

Is this the expected behavior when running locally?

[..]
TextIO.Read.from("gs://cdf-testing/NetworkClicks_123456_2015010[1-2]*")
[..]

Feb 18, 2015 4:19:09 PM com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner run
INFO: Executing pipeline using the DirectPipelineRunner.
Feb 18, 2015 4:19:10 PM com.google.cloud.dataflow.sdk.util.GcsUtil expand
INFO: matching files in bucket cdf-testing, prefix NetworkClicks_123456_2015010[1-2] against pattern NetworkClicks_123456_2015010[1-2][^/]*
Exception in thread "main" java.lang.RuntimeException: Failed to read from source: com.google.cloud.dataflow.sdk.runners.worker.TextReader@55dbc59b
    at com.google.cloud.dataflow.sdk.util.ReaderUtils.readElemsFromReader(ReaderUtils.java:40)
    at com.google.cloud.dataflow.sdk.io.TextIO.evaluateReadHelper(TextIO.java:702)
    at com.google.cloud.dataflow.sdk.io.TextIO.access$000(TextIO.java:98)
    at com.google.cloud.dataflow.sdk.io.TextIO$Read$Bound$1.evaluate(TextIO.java:310)
    at com.google.cloud.dataflow.sdk.io.TextIO$Read$Bound$1.evaluate(TextIO.java:306)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:611)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:200)
    at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:196)
    at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:109)
    at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:204)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:584)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:328)
    at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:70)
    at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:145)
    at com.shinetech.tpc.engine.CDFEngine.loadClicks(CDFEngine.java:88)
    at com.shinetech.tpc.engine.CDFEngine.doMagic(CDFEngine.java:75)
    at com.shinetech.tpc.Main.main(Main.java:15)
Caused by: java.io.IOException: No match for file pattern 'gs://cdf-testing/NetworkClicks_123456_2015010[1-2]*'
    at com.google.cloud.dataflow.sdk.runners.worker.FileBasedReader.iterator(FileBasedReader.java:101)
    at com.google.cloud.dataflow.sdk.util.ReaderUtils.readElemsFromReader(ReaderUtils.java:35)
    ... 16 more
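Judging from the `GcsUtil expand` INFO line, the local runner listed objects under the prefix `NetworkClicks_123456_2015010[1-2]`, which still contains the `[1-2]` character class. GCS listing prefixes are matched literally, so that listing would presumably return nothing, which is consistent with the `No match for file pattern` error. The following is a minimal Java sketch, not the SDK's actual code, of a two-step expansion that avoids this: cut the listing prefix at the first glob metacharacter, then filter the listed names with a regex built from the whole glob, using the same `[^/]*` translation for `*` that appears in the log. The class name `GlobExpansion`, its methods, and the sample object names are hypothetical, and bracket expressions are assumed to contain only plain characters and ranges such as `[1-2]` (no `[!...]` negation).

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of expanding a GCS-style glob in two steps:
 * 1. a literal listing prefix, cut at the first glob metacharacter;
 * 2. a regex that every listed object name must fully match.
 * Assumption: only '*', '?' and simple '[...]' classes are glob syntax.
 */
public final class GlobExpansion {

    private GlobExpansion() {}

    /** Returns the literal part of the glob before the first metacharacter. */
    public static String listingPrefix(String glob) {
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*' || c == '?' || c == '[') {
                return glob.substring(0, i);
            }
        }
        return glob;
    }

    /** Translates the glob into a regex; '*' and '?' never match '/'. */
    public static Pattern toRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < glob.length()) {
            char c = glob.charAt(i);
            if (c == '*') {
                sb.append("[^/]*");
            } else if (c == '?') {
                sb.append("[^/]");
            } else if (c == '[') {
                int close = glob.indexOf(']', i + 1);
                if (close <= i + 1) {
                    // Unterminated or empty class: treat '[' as a literal.
                    sb.append("\\[");
                } else {
                    // Copy a simple class such as [1-2] through unchanged.
                    sb.append(glob, i, close + 1);
                    i = close;
                }
            } else {
                // Any other character is matched literally.
                sb.append(Pattern.quote(String.valueOf(c)));
            }
            i++;
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        String glob = "NetworkClicks_123456_2015010[1-2]*";
        // Hypothetical object names, for illustration only.
        String[] names = {
            "NetworkClicks_123456_20150101.csv",
            "NetworkClicks_123456_20150102.csv",
            "NetworkClicks_123456_20150103.csv"
        };
        String prefix = listingPrefix(glob);
        Pattern regex = toRegex(glob);
        System.out.println("list prefix: " + prefix);
        for (String name : names) {
            // Step 1 is the listing filter, step 2 the regex filter.
            if (name.startsWith(prefix) && regex.matcher(name).matches()) {
                System.out.println("match: " + name);
            }
        }
    }
}
```

Run as-is, it prints the listing prefix `NetworkClicks_123456_2015010` and reports only the `...20150101.csv` and `...20150102.csv` names as matches. Cutting the prefix at `[` keeps the server-side listing broad enough, while the regex does the precise filtering on the client.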

【Discussion】:

    Tags: google-cloud-storage google-cloud-dataflow


    【Answer 1】:

    No, the two runners should behave the same. That sounds like a bug in the DirectRunner. Thanks for the report; I'll post back here once the fix is in.

    【Comments】:

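    • Just following up: this fix has been on GitHub since February 23 and will be included in the next release, which will be published to Maven in the middle of the month.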