【Question title】:Apache Beam - Word Count Example not working
【Posted】:2017-11-07 16:36:25
【Question description】:

I am trying to build my own Dataflow runner for Google Cloud, so as a first step I am trying to run it locally on my computer. I tried using this, but when I try to run WordCount, I get:

C:\Users\XXX\Documents\Test-Beam-3\word-count-beam>mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building word-count-beam 0.1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ word-count
-beam ---
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources,
i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory C:\Users\XXX\Documents\Test
-Beam-3\word-count-beam\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.6.1:compile (default-compile) @ word-count-be
am ---
[INFO] Changes detected - recompiling the module!
[WARNING] File encoding has not been set, using platform encoding Cp1252, i.e. b
uild is platform dependent!
[INFO] Compiling 21 source files to C:\Users\XXX\Documents\Test-Beam-3
\word-count-beam\target\classes
[INFO] /C:/Users/XXX/Documents/Test-Beam-3/word-count-beam/src/main/ja
va/org/apache/beam/examples/complete/game/utils/WriteToText.java: C:\Users\aalfe
rezaroca\Documents\Test-Beam-3\word-count-beam\src\main\java\org\apache\beam\exa
mples\complete\game\utils\WriteToText.java uses unchecked or unsafe operations.
[INFO] /C:/Users/XXX/Documents/Test-Beam-3/word-count-beam/src/main/ja
va/org/apache/beam/examples/complete/game/utils/WriteToText.java: Recompile with
 -Xlint:unchecked for details.
[INFO]
[INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ word-count-beam ---
Nov 07, 2017 10:25:17 AM org.apache.beam.sdk.io.FileBasedSource getEstimatedSize
Bytes
INFO: Filepattern pom.xml matched 1 files with total size 14039
Nov 07, 2017 10:25:17 AM org.apache.beam.sdk.io.FileBasedSource expandFilePatter
n
INFO: Matched 1 files for pattern pom.xml
Nov 07, 2017 10:25:17 AM org.apache.beam.sdk.io.FileBasedSource split
INFO: Splitting filepattern pom.xml into bundles of size 3509 took 11 ms and pro
duced 1 files and 4 bundles
Nov 07, 2017 10:25:20 AM org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles p
rocessElement
INFO: Opening writer for write operation TextWriteOperation{tempDirectory=C:\Use
rs\XXX\Documents\Test-Beam-3\word-count-beam\.temp-beam-2017-11-311_16
-25-17-1\, windowedWrites=false}
Nov 07, 2017 10:25:20 AM org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles p
rocessElement
INFO: Opening writer for write operation TextWriteOperation{tempDirectory=C:\Use
rs\XXX\Documents\Test-Beam-3\word-count-beam\.temp-beam-2017-11-311_16
-25-17-1\, windowedWrites=false}
Nov 07, 2017 10:25:20 AM org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles p
rocessElement
INFO: Opening writer for write operation TextWriteOperation{tempDirectory=C:\Use
rs\XXX\Documents\Test-Beam-3\word-count-beam\.temp-beam-2017-11-311_16
-25-17-1\, windowedWrites=false}
Nov 07, 2017 10:25:20 AM org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles p
rocessElement
INFO: Opening writer for write operation TextWriteOperation{tempDirectory=C:\Use
rs\XXX\Documents\Test-Beam-3\word-count-beam\.temp-beam-2017-11-311_1
6-25-17-1\, windowedWrites=false}
[WARNING]
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.jav
a:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessor
Impl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:293)
    at java.lang.Thread.run (Thread.java:748)
Caused by: org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.Il
legalStateException: Unable to find registrar for c
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUnti
lFinish (DirectRunner.java:331)
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUnti
lFinish (DirectRunner.java:301)
    at org.apache.beam.runners.direct.DirectRunner.run (DirectRunner.java:200)
    at org.apache.beam.runners.direct.DirectRunner.run (DirectRunner.java:63)
    at org.apache.beam.sdk.Pipeline.run (Pipeline.java:297)
    at org.apache.beam.sdk.Pipeline.run (Pipeline.java:283)
    at org.apache.beam.examples.WordCount.main (WordCount.java:185)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.jav
a:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessor
Impl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:293)
    at java.lang.Thread.run (Thread.java:748)
Caused by: java.lang.IllegalStateException: Unable to find registrar for c
    at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal (FileSystems.jav
a:447)
    at org.apache.beam.sdk.io.FileSystems.match (FileSystems.java:111)
    at org.apache.beam.sdk.io.FileSystems.matchResources (FileSystems.java:174)
    at org.apache.beam.sdk.io.FileSystems.delete (FileSystems.java:321)
    at org.apache.beam.sdk.io.FileBasedSink$Writer.cleanup (FileBasedSink.java:9
05)
    at org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles.processElement (Wri
teFiles.java:438)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.965 s
[INFO] Finished at: 2017-11-07T10:25:21-06:00
[INFO] Final Memory: 36M/647M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:java (d
efault-cli) on project word-count-beam: An exception occured while executing the
 Java class. null: InvocationTargetException: java.lang.IllegalStateException: U
nable to find registrar for c -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e swit
ch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please rea
d the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionE
xception

C:\Users\XXX\Documents\Test-Beam-3\word-count-beam>

I tried changing the data source from

gs://apache-beam-samples/shakespeare/kinglear.txt

to

C:\The Hunger Games.txt

but nothing changed. At first I thought this was a firewall/proxy/network-related problem. The crash happens at WordCount.java line 184:

p.run().waitUntilFinish();

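I am surprised this does not work out of the box, since it is supposed to be an example.

Any hints? Has anyone run into this problem?

Edit:

I found that someone said this is a path-related problem on Windows. I am using Google Cloud Storage (gs), but the code seems to use some local path, which causes the crash. That report is from a long time ago, so I cannot believe the issue has not been fixed yet.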

【Question comments】:

    Tags: java apache-beam


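    【Solution 1】:

    I'm a bit confused: you say your input comes from gs://apache-beam-samples/shakespeare/kinglear.txt, but your invocation shows that you are running the program with -Dexec.args="--inputFile=pom.xml --output=counts", and according to its log output it is in fact reading your pom.xml file and counting the words in it. Where are you specifying the kinglear.txt path?

    That said, it should at least have counted the words in pom.xml successfully. I think this Windows compatibility issue has been fixed at HEAD; see the corresponding JIRA: https://issues.apache.org/jira/browse/BEAM-2298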
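The message "Unable to find registrar for c" in the question's stack trace is consistent with the drive letter of a Windows path such as C:\Users\... being read as a URI scheme named "c". The following is a minimal sketch of that failure mode, not Beam's actual code: the class name, the method names, and both regular expressions are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: the patterns below are assumptions, not copied from Beam.
final class SchemeSketch {
    // Naive rule: whatever precedes the first ':' is treated as the scheme.
    private static final Pattern NAIVE =
        Pattern.compile("^([a-zA-Z][-a-zA-Z0-9+.]*):.*$");
    // Stricter rule: a scheme only counts if it is followed by "://".
    private static final Pattern STRICT =
        Pattern.compile("^([a-zA-Z][-a-zA-Z0-9+.]*)://.*$");

    // Returns the lower-cased scheme, or "file" when the spec has no scheme.
    private static String scheme(Pattern pattern, String spec) {
        Matcher m = pattern.matcher(spec);
        return m.matches() ? m.group(1).toLowerCase(Locale.ROOT) : "file";
    }

    static String naiveScheme(String spec) {
        return scheme(NAIVE, spec);
    }

    static String strictScheme(String spec) {
        return scheme(STRICT, spec);
    }

    public static void main(String[] args) {
        String windowsPath = "C:\\Users\\XXX\\counts";
        String gcsPath = "gs://apache-beam-samples/shakespeare/kinglear.txt";

        System.out.println(naiveScheme(windowsPath));  // prints "c"
        System.out.println(strictScheme(windowsPath)); // prints "file"
        System.out.println(naiveScheme(gcsPath));      // prints "gs"
        System.out.println(strictScheme(gcsPath));     // prints "gs"
    }
}
```

Under the naive rule, a lookup of a file system registered for scheme "c" would fail, which matches the error text. A relative path such as pom.xml contains no colon, which would explain why reading the input worked while cleaning up the absolute temp directory did not.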

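    【Comments】:

    • If I replace the gs path with a local path, the output is the same. The file is the one the path points to. pom.xml is not the file whose words are counted: maven.apache.org/guides/introduction/…
    • Yes, mvn itself reads pom.xml, but you are also invoking mvn exec with -Dexec.args="--inputFile=pom.xml..", which means you are asking mvn to pass the argument "--inputFile=pom.xml" to your program, so your program reads the pom.xml file and counts the words in it. If you want your program to count the words in gs://apache-beam-samples/shakespeare/kinglear.txt, you need to pass --inputFile=gs://apache-beam-samples/shakespeare/kinglear.txt.
    • I suspect you have been changing the default value of the --inputFile parameter in the .java code, but that is only the default, and it is overridden by whatever you specify on the command line. If you want the default to take effect, you can also simply not specify --inputFile on the command line.
    • That's right. It takes that file as the input to WordCount. However, there is still a problem with Windows paths that makes it crash. I tried the same thing on Ubuntu and Mac OS, and both work out of the box.
    • I see. In that case, it matches the JIRA that has already been fixed for the next release.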
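The point made in the comments, that a value given on the command line replaces the default coded in the .java file, can be sketched in plain Java. This is a simplified stand-in and not Beam's PipelineOptions implementation: the class and method names are hypothetical, and the default shown is the kinglear.txt path mentioned in the question.

```java
// Illustrative sketch only: a hand-rolled parser, not Beam's option handling.
final class InputFileOptionSketch {
    // Default used when --inputFile is absent (path taken from the question).
    static final String DEFAULT_INPUT =
        "gs://apache-beam-samples/shakespeare/kinglear.txt";

    // Returns the value of --inputFile if present, otherwise the default.
    static String resolveInputFile(String[] args) {
        String prefix = "--inputFile=";
        String value = DEFAULT_INPUT;
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Mirrors the invocation in the question: the flag wins over the default.
        System.out.println(resolveInputFile(
            new String[] {"--inputFile=pom.xml", "--output=counts"})); // prints "pom.xml"
        // Without the flag, the coded default takes effect.
        System.out.println(resolveInputFile(
            new String[] {"--output=counts"}));
    }
}
```

With the arguments from the question, the resolved input is pom.xml no matter what default the source file contains.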
    【Solution 2】:

    Can we store the output in Cloud SQL? If so, please provide the steps/process.

    FYI: I was able to store the output in Cloud Storage by following this link: https://cloud.google.com/dataflow/docs/quickstarts/quickstart-java-maven

    mvn -Pdataflow-runner compile exec:java \
          -Dexec.mainClass=org.apache.beam.examples.WordCount \
          -Dexec.args="--project=<project_id> \
          --stagingLocation=gs://<bucket>/staging/ \
          --output=gs://<bucket>/output \
          --runner=DataflowRunner"
    

    【Comments】:
