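【Posted】:2017-01-05 00:15:23
【Question】:
What is the best way to export data from BigQuery to Google Cloud Storage? Note that I need to run a query against BigQuery rather than export all of the data. Essentially, I need to run a custom query against BigQuery (such as select * from mytable where code=foo) and write the results to a CSV file stored in Google Cloud Storage.
I believe the best way to do this is through Google Cloud Dataflow. Are there any other options?
Also, I am looking for examples of how to do this. Is there anywhere I can find some?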
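One part of this can be sketched without any pipeline at all: turning a single result row into one CSV line. The following is a minimal sketch, assuming each row can be treated as a Map from column name to value (to my understanding TableRow behaves like a Map<String, Object>, but that is an assumption here). The class CsvRowSketch, its methods, and the "note" column are hypothetical names used only for illustration, and nested or repeated fields are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any SDK.
class CsvRowSketch {

    // Quotes a field when it contains a comma, a double quote, or a line break,
    // doubling any embedded double quotes. A null value becomes an empty field.
    static String escape(Object value) {
        if (value == null) {
            return "";
        }
        String s = value.toString();
        boolean needsQuotes = s.contains(",") || s.contains("\"")
                || s.contains("\n") || s.contains("\r");
        if (!needsQuotes) {
            return s;
        }
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    // Emits the requested columns in a fixed order so every line has the same layout.
    static String toCsvLine(Map<String, ?> row, List<String> columns) {
        List<String> fields = new ArrayList<>();
        for (String column : columns) {
            fields.add(escape(row.get(column)));
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 256);
        row.put("code", "foo");
        row.put("note", "say \"hi\", then leave"); // hypothetical column
        System.out.println(toCsvLine(row, Arrays.asList("id", "code", "note")));
    }
}
```

Inside a pipeline, a function like toCsvLine would be called once per row in the step that converts rows to strings, so that the text sink receives one CSV line per element.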
Here is what I have so far:
PipelineOptions pipelineOptions = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(pipelineOptions);
Date date = new Date();
p.getOptions().setTempLocation("gs://mybucket/tmp" + date.getTime());
PCollection<TableRow> rowPCollection = p.apply(BigQueryIO.Read.named("promos")
    .fromQuery("SELECT * FROM [projectid:mydataset.mytable] where id = 256 LIMIT 1000"));
PCollection<String> stringPCollection = rowPCollection.apply(ParDo.named("Extract").of(new DoFn<TableRow, String>() {
    @Override
    public void processElement(ProcessContext c) {
        TableRow tableRow = c.element();
        try {
            String prettyString = tableRow.toPrettyString();
            c.output(prettyString);
        } catch (IOException e) {
            log.error("Exception occurred:" + e.getMessage());
        }
    }
}));
stringPCollection.apply(TextIO.Write.named("WriteOutput").to("gs://mybucket/avexport").withSuffix(".csv"));
p.run();
When I run the pipeline, an exception is thrown while the ParDo is being created:
caused by: java.io.NotSerializableException: com.my.validation.CommonValidator
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at com.google.cloud.dataflow.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:50)
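A plausible reading of this trace, offered as an assumption rather than a confirmed diagnosis: the anonymous DoFn is created inside an instance method, so it carries a hidden reference to the enclosing object, and that object has a com.my.validation.CommonValidator field that cannot be serialized. The sketch below reproduces that pattern with plain Java serialization only. Validator and Fn are hypothetical stand-ins for CommonValidator and DoFn, and CaptureSketch stands in for whatever class builds the pipeline.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the class that builds the pipeline.
class CaptureSketch implements Serializable {

    // Stand-in for CommonValidator: deliberately NOT Serializable.
    static class Validator { }

    // Stand-in for a serializable function type such as DoFn.
    interface Fn extends Serializable {
        String apply(String input);
    }

    private final Validator validator = new Validator();
    private final String prefix;

    CaptureSketch(String prefix) {
        this.prefix = prefix;
    }

    // Anonymous class created in an instance method. It reads this.prefix,
    // so it holds a hidden reference to the enclosing CaptureSketch,
    // and serializing it drags the Validator field along.
    Fn anonymousFn() {
        return new Fn() {
            @Override
            public String apply(String input) {
                return prefix + input;
            }
        };
    }

    // Static nested class: no reference to the outer instance.
    // Everything it needs is passed in through the constructor.
    static class StandaloneFn implements Fn {
        private final String prefix;

        StandaloneFn(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public String apply(String input) {
            return prefix + input;
        }
    }

    Fn standaloneFn() {
        return new StandaloneFn(prefix);
    }

    // Returns true if Java serialization accepts the object,
    // false if it hits a non-serializable field.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CaptureSketch outer = new CaptureSketch("row:");
        System.out.println(canSerialize(outer.anonymousFn()));  // false
        System.out.println(canSerialize(outer.standaloneFn())); // true
    }
}
```

If this is indeed the cause, the usual options are to move the DoFn into a static nested or top-level class, to build the pipeline from a static method, or to mark the validator field transient so serialization skips it.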
【Discussion】:
Tags: google-bigquery google-cloud-storage google-cloud-dataflow