[Posted]: 2016-08-30 13:16:13
[Question]:
I'm using Dataflow to write data into BigQuery with BigQueryIO.Write.to().
Occasionally I get this warning from Dataflow:
{
metadata: {
severity: "WARNING"
projectId: "[...]"
serviceName: "dataflow.googleapis.com"
region: "us-east1-d"
labels: {
compute.googleapis.com/resource_type: "instance"
compute.googleapis.com/resource_name: "dataflow-[...]-08240401-e41e-harness-7dkd"
dataflow.googleapis.com/region: "us-east1-d"
dataflow.googleapis.com/job_name: "[...]"
compute.googleapis.com/resource_id: "[...]"
dataflow.googleapis.com/step_id: ""
dataflow.googleapis.com/job_id: "[...]"
}
timestamp: "2016-08-30T11:32:00.591Z"
projectNumber: "[...]"
}
insertId: "[...]"
log: "dataflow.googleapis.com/worker"
structPayload: {
message: "exception thrown while executing request"
work: "[...]"
thread: "117"
worker: "dataflow-[...]-08240401-e41e-harness-7dkd"
exception: "java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:918)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1535)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:37)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.dataflow.sdk.util.BigQueryTableInserter$1.call(BigQueryTableInserter.java:229)
at com.google.cloud.dataflow.sdk.util.BigQueryTableInserter$1.call(BigQueryTableInserter.java:222)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)"
logger: "com.google.api.client.http.HttpTransport"
stage: "F5"
job: "[...]"
}
}
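I don't see any "retry" log entry after this one.
My questions are:
- Do I lose data? I can't tell whether the write operation completed. If I read the code correctly, the entire write batch is left in an indeterminate state.
- If so, is there a way for me to make sure the data is written to BigQuery exactly once?
- If so, shouldn't the severity be ERROR rather than WARNING?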
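To make the "indeterminate state" concern concrete: a read timeout means the client never saw the response, so the rows may or may not have landed. BigQuery's streaming insert API accepts a per-row insertId that it uses for best-effort de-duplication, which is what makes a retry of the same batch reasonably safe. The sketch below only illustrates that idea with an in-memory stand-in; `DedupSink` and `insertWithRetry` are hypothetical names, not part of the Dataflow SDK or the BigQuery client, and it does not claim to show what BigQueryTableInserter actually does after this warning.

```java
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IdempotentInsertSketch {
    /**
     * Retries the insert with the SAME insertId on every attempt and
     * returns the number of attempts it took to get an acknowledgement.
     */
    static int insertWithRetry(DedupSink sink, String insertId, String row, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sink.insert(insertId, row);
                return attempt;
            } catch (SocketTimeoutException e) {
                // Outcome unknown: the row may already be stored. Retry with the same ID.
            }
        }
        throw new IllegalStateException("no acknowledgement after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        DedupSink sink = new DedupSink(2); // the first two responses are "lost"
        int attempts = insertWithRetry(sink, "row-0001", "{\"k\":\"v\"}", 5);
        System.out.println("attempts=" + attempts + " rows=" + sink.rowCount());
        // prints: attempts=3 rows=1
    }
}

/** Hypothetical in-memory stand-in for a sink that de-duplicates rows by insert ID. */
final class DedupSink {
    private final Set<String> seenIds = new HashSet<>();
    private final List<String> rows = new ArrayList<>();
    private int timeoutsLeft;

    DedupSink(int timeoutsToSimulate) {
        this.timeoutsLeft = timeoutsToSimulate;
    }

    /** Stores the row only if its ID is new, then optionally simulates a lost response. */
    void insert(String insertId, String row) throws SocketTimeoutException {
        if (seenIds.add(insertId)) {
            rows.add(row);
        }
        if (timeoutsLeft > 0) {
            timeoutsLeft--;
            // The write landed, but the caller never sees the acknowledgement.
            throw new SocketTimeoutException("Read timed out");
        }
    }

    int rowCount() {
        return rows.size();
    }
}
```

Because every retry reuses the same ID, the row is stored once even though the insert was sent three times. With a fresh ID per attempt, the same scenario would store three copies.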
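Here is some context on my usage:
- I'm running Dataflow in streaming mode, reading from Kafka with KafkaIO.java
- "Occasionally" means anywhere from 0 to 3 times per hour
- Depending on the job, I use 2 to 36 workers of type n1-standard-4
- Depending on the job, I write 3k to 10k messages/s to BigQuery
- The average message size is 3 kB
- The Dataflow workers are in the us-east1-d zone; the BigQuery dataset location is US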
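For a sense of scale, the figures above work out to roughly 9 to 30 MB/s written to BigQuery per job. A small sketch of that arithmetic, assuming 1 kB = 1000 bytes and that the 3 kB average applies across the whole rate range (`megabytesPerSecond` is just an illustrative helper):

```java
final class ThroughputEstimate {
    /** Aggregate write rate in MB/s for a given message rate and average size in kB. */
    static double megabytesPerSecond(long messagesPerSecond, double avgMessageKb) {
        return messagesPerSecond * avgMessageKb / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(megabytesPerSecond(3_000, 3.0));  // prints 9.0
        System.out.println(megabytesPerSecond(10_000, 3.0)); // prints 30.0
    }
}
```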
[Discussion]:
Tags: google-bigquery google-cloud-dataflow