【问题标题】:Using Google Storage from the Flink REPL从 Flink REPL 使用 Google Storage
【发布时间】:2019-03-15 09:30:46
【问题描述】:

我正在尝试将 csv 文件从谷歌云存储读取到 Flink REPL。由于我对 Flink 不是很精通,我更喜欢在 REPL 中工作,这样我就可以一次解决一个错误,而不是将我的代码放在 JAR 中,然后不知道从哪里开始出现所有错误。

对于这个例子,我将使用谷歌存储中公开可用的陆地卫星数据。

我创建了一个 dataproc 集群,并添加了一个 google cloud 提供的 bash 脚本,用于在集群创建时安装 flink。脚本可以在here找到。

由于我使用的是 dataproc 集群,我只需要将 gcs-connector jar 添加到类路径中。所以我像这样启动 Flink REPL:

/usr/lib/flink/bin/start-scala-shell.sh yarn -a /usr/lib/hadoop/lib/gcs-connector-hadoop2-1.9.10.jar

然后我在 REPL 中使用这行代码导入谷歌云存储:

 import com.google.cloud.hadoop.fs.gcs

最后,我尝试将公开可用的文件作为文本文件读取并得到以下错误:

 val landsaturl = "gs://gcp-public-data-landsat/LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_ANG.txt"
landsat.first(1).print()
2018-12-17 19:10:44,210 INFO  org.apache.flink.api.java.ExecutionEnvironment                - The job has 0 registered types and 0 default Kryo serializers
2018-12-17 19:10:44,210 WARN  org.apache.flink.configuration.Configuration                  - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2018-12-17 19:10:44,210 INFO  org.apache.flink.runtime.rest.RestClient                      - Rest client endpoint started.
2018-12-17 19:10:46,091 INFO  org.apache.flink.client.program.rest.RestClusterClient        - Submitting job 7aefc693e7911bd9ef80c3ebcf6a8343 (detached: false).
2018-12-17 19:10:46,091 INFO  org.apache.flink.client.program.rest.RestClusterClient        - Requesting blob server port.
2018-12-17 19:11:46,146 INFO  org.apache.flink.runtime.rest.RestClient                      - Shutting down rest endpoint.
2018-12-17 19:11:46,148 INFO  org.apache.flink.runtime.rest.RestClient                      - Rest endpoint shutdown complete.
org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.
  at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:258)
  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:452)
  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
  at org.apache.flink.client.RemoteExecutor.executePlanWithJars(RemoteExecutor.java:216)
  at org.apache.flink.client.RemoteExecutor.executePlan(RemoteExecutor.java:193)
  at org.apache.flink.api.java.RemoteEnvironment.execute(RemoteEnvironment.java:173)
  at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:816)
  at org.apache.flink.api.java.DataSet.collect(DataSet.java:413)
  at org.apache.flink.api.java.DataSet.print(DataSet.java:1652)
  at org.apache.flink.api.scala.DataSet.print(DataSet.scala:1726)
  ... 30 elided
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
  at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$5(RestClusterClient.java:357)
  at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
  at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
  at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
  at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
  at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:214)
  at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
  at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
  at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
  at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
  at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:195)
  at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
  at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
  at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
  at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
  at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
  at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
  at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
  at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
  at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
  at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
  at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
  at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
  at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
  at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
  at java.util.concurrent.CompletableFuture.biApply(CompletableFuture.java:1088)
  at java.util.concurrent.CompletableFuture$BiApply.tryFire(CompletableFuture.java:1070)
  ... 21 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
  ... 19 more
Caused by: java.util.concurrent.CompletionException: java.net.ConnectException: Connection refused: cluster-****
  at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
  at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
  at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
  at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
  ... 16 more
Caused by: java.net.ConnectException: Connection refused: cluster-****
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
  at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
  at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
  ... 7 more

我是否遗漏了一个步骤,或者这是在 REPL 中无法做到的事情,只能使用胖 jar 并在 pom.xml 中指定与谷歌云存储相关的事情来完成?

【问题讨论】:

  • 看来你需要使用Flink HDFS connector,因为GCS连接器实现了HDFS接口。

标签: scala google-cloud-storage apache-flink google-cloud-dataproc


【解决方案1】:

您需要使用Flink HDFS connector,因为GCS连接器实现了HDFS接口。

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 2016-01-15
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2013-08-19
    • 2015-04-09
    相关资源
    最近更新 更多