【Question title】: Stream file from Google Cloud Storage
【Posted】: 2017-07-11 14:32:48
【Question description】:

This is the code I use to download a file from Google Cloud Storage:

@Override
public void write(OutputStream outputStream) throws IOException {
    try {
        LOG.info(path);
        InputStream stream = new ByteArrayInputStream(GoogleJsonKey.JSON_KEY.getBytes(StandardCharsets.UTF_8));
        StorageOptions options = StorageOptions.newBuilder()
                .setProjectId(PROJECT_ID)
                .setCredentials(GoogleCredentials.fromStream(stream)).build();
        Storage storage = options.getService();
        final CountingOutputStream countingOutputStream = new CountingOutputStream(outputStream);
        byte[] read = storage.readAllBytes(BlobId.of(BUCKET, path));
        countingOutputStream.write(read);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        outputStream.close();
    }
}

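This works, but the problem is that it has to buffer all the bytes before anything is streamed back to the client of this method. That causes a lot of latency, especially when the file stored in GCS is large.

Is there a way to fetch the file from GCS and stream it directly to the OutputStream? The OutputStream here, by the way, belongs to a Servlet.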

【Comments】:

    Tags: java servlets google-cloud-storage


    【Solution 1】:

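    To clarify, do you need an OutputStream or an InputStream? One way to look at this is that the data is stored in the Google Cloud Storage object as a file, and you have an InputStream to read that file. If that works for you, read on.

    There are no existing methods in the Storage API that provide an InputStream or an OutputStream. But there are 2 APIs in the Cloud Storage client library that expose a ReadChannel object, which extends ReadableByteChannel (from the Java NIO API).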

    ReadChannel reader(String bucket, String blob, BlobSourceOption... options);
    ReadChannel reader(BlobId blob, BlobSourceOption... options);
    

    A simple example using this (taken from StorageSnippets.java):

    /**
       * Example of reading a blob's content through a reader.
       */
      // [TARGET reader(String, String, BlobSourceOption...)]
      // [VARIABLE "my_unique_bucket"]
      // [VARIABLE "my_blob_name"]
      public void readerFromStrings(String bucketName, String blobName) throws IOException {
        // [START readerFromStrings]
        try (ReadChannel reader = storage.reader(bucketName, blobName)) {
          ByteBuffer bytes = ByteBuffer.allocate(64 * 1024);
          while (reader.read(bytes) > 0) {
            bytes.flip();
            // do something with bytes
            bytes.clear();
          }
        }
        // [END readerFromStrings]
      }
    

    You can also use the Channels.newInputStream() method to wrap the ReadableByteChannel in an InputStream.

    public static InputStream newInputStream(ReadableByteChannel ch)

    Even if you need an OutputStream, you should be able to copy the data from the InputStream, or better, from the ReadChannel object, into the OutputStream.
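A minimal sketch of that copy step, assuming only the JDK. The class name `ChannelCopy` and the `copy` helper are hypothetical, and an in-memory channel stands in for the `ReadChannel` returned by `storage.reader(...)`; since `ReadChannel` extends `ReadableByteChannel`, the same helper should accept it unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class ChannelCopy {
  private static final int BUFFER_SIZE = 64 * 1024;

  // Copies the channel to the stream one buffer at a time and returns the byte count.
  // Only BUFFER_SIZE bytes are held in memory at any moment.
  public static long copy(ReadableByteChannel in, OutputStream out) throws IOException {
    WritableByteChannel outChannel = Channels.newChannel(out);
    ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
    long total = 0;
    // read() returns -1 at end of stream
    while (in.read(buffer) >= 0) {
      buffer.flip();
      // a single write() is not guaranteed to drain the buffer
      while (buffer.hasRemaining()) {
        total += outChannel.write(buffer);
      }
      buffer.clear();
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    // Stand-in source; with GCS this would be storage.reader(bucketName, blobName).
    byte[] data = "hello from a stand-in channel".getBytes(StandardCharsets.UTF_8);
    ReadableByteChannel source = Channels.newChannel(new ByteArrayInputStream(data));
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    long copied = copy(source, sink);
    System.out.println(copied + " bytes copied");
  }
}
```

In a servlet, the `OutputStream` argument would presumably be the response stream, so each chunk reaches the client as soon as it has been read.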

    Complete example

    Run this example as: PROGRAM_NAME <BUCKET_NAME> <BLOB_PATH>

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.Channels;
    import java.nio.channels.WritableByteChannel;
    
    import com.google.cloud.ReadChannel;
    import com.google.cloud.storage.Storage;
    import com.google.cloud.storage.StorageOptions;
    
    /**
     * An example which reads the contents of the specified object/blob from GCS
     * and prints the contents to STDOUT.
     *
     * Run it as PROGRAM_NAME <BUCKET_NAME> <BLOB_PATH>
     */
    public class ReadObjectSample {
      private static final int BUFFER_SIZE = 64 * 1024;
    
      public static void main(String[] args) throws IOException {
        // Instantiates a Storage client
        Storage storage = StorageOptions.getDefaultInstance().getService();
    
        // The name for the GCS bucket
        String bucketName = args[0];
        // The path of the blob (i.e. GCS object) within the GCS bucket.
        String blobPath = args[1];
    
        printBlob(storage, bucketName, blobPath);
      }
    
      // Reads from the specified blob present in the GCS bucket and prints the contents to STDOUT.
      private static void printBlob(Storage storage, String bucketName, String blobPath) throws IOException {
        try (ReadChannel reader = storage.reader(bucketName, blobPath)) {
          WritableByteChannel outChannel = Channels.newChannel(System.out);
          ByteBuffer bytes = ByteBuffer.allocate(BUFFER_SIZE);
          while (reader.read(bytes) > 0) {
            bytes.flip();
            outChannel.write(bytes);
            bytes.clear();
          }
        }
      }
    }
    

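    【Comments】:

    • Hi, in my case this code doesn't work for me; the while loop here doesn't even work for me, whereas the same bucket/file works with my old code.
    • @xybrek - I have pasted a complete working example that works for me. Give it a try. Make sure you pass the full path of the blob/file within the bucket. Example: PROGRAM_NAME my-bucket-1 path/to/some/content.txt
    • Doesn't work for me. I get: Exception in thread "main" java.lang.NoClassDefFoundError: com/google/auth/Credentials at com.xxx.test.ReadObjectSample.main(ReadObjectSample.java:28) Caused by: java.lang.ClassNotFoundException: com.google.auth.Credentials
    • The ClassNotFoundException is probably due to an old Guava version. Make sure you are using at least 18.0.
    【Solution 2】:

    The cleanest option I have found so far looks like this: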

    Blob blob = bucket.get("some-file");
    ReadChannel reader = blob.reader();
    InputStream inputStream = Channels.newInputStream(reader);
    

    Channels is from java.nio. In addition, you can use Commons IO to easily copy the InputStream into an OutputStream:

    IOUtils.copy(inputStream, outputStream);
    

    【Comments】:

      【Solution 3】:

      Code, based on @Tuxdude's answer:

       @Nullable
          public byte[] getFileBytes(String gcsUri) throws IOException {
      
              Blob blob = getBlob(gcsUri);
              ReadChannel reader;
              byte[] result = null;
              if (blob != null) {
                  reader = blob.reader();
                  InputStream inputStream = Channels.newInputStream(reader);
                 result = IOUtils.toByteArray(inputStream);
              }
              return result;
          }
      

      //this will work only with files 64 * 1024 bytes or smaller
       @Nullable
          public byte[] getFileBytes(String gcsUri) throws IOException {
              Blob blob = getBlob(gcsUri);
      
              ReadChannel reader;
              byte[] result = null;
              if (blob != null) {
                  reader = blob.reader();
                  ByteBuffer bytes = ByteBuffer.allocate(64 * 1024);
      
                  while (reader.read(bytes) > 0) {
                      bytes.flip();
                      result = bytes.array();
                      bytes.clear();
                  }
              }
              return result; 
          }
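
The second variant above overwrites `result` on every loop iteration, and `bytes.array()` returns the whole 64 KB backing array rather than only the bytes just read. A minimal sketch of one way to lift the size limit is to append each chunk to a `ByteArrayOutputStream`. The class name `ChunkedRead` and the method `readFully` are hypothetical, and an in-memory channel stands in for `blob.reader()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class ChunkedRead {

  // Reads the channel to the end, accumulating every chunk instead of keeping only the last one.
  public static byte[] readFully(ReadableByteChannel reader) throws IOException {
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    ByteBuffer bytes = ByteBuffer.allocate(64 * 1024);
    // read() returns -1 at end of stream
    while (reader.read(bytes) >= 0) {
      bytes.flip();
      // append only the bytes filled by this read, not the whole backing array
      result.write(bytes.array(), 0, bytes.limit());
      bytes.clear();
    }
    return result.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for blob.reader(): 150,000 bytes, larger than the 64 KB buffer.
    byte[] data = new byte[150_000];
    ReadableByteChannel reader = Channels.newChannel(new ByteArrayInputStream(data));
    System.out.println(readFully(reader).length + " bytes read");
  }
}
```

Note that this still holds the whole object in memory, so it suits callers that genuinely need a `byte[]`; it does not address the latency concern in the question.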
      

      Helper code:

         @Nullable
          Blob getBlob(String gcsUri) {
              //gcsUri is "gs://" + blob.getBucket() + "/" + blob.getName(),
              //example "gs://myapp.appspot.com/ocr_request_images/000c121b-357d-4ac0-a3f2-24e0f6d5cea185dffb40eee-850fab211438.jpg"
      
              String bucketName = parseGcsUriForBucketName(gcsUri);
              String fileName = parseGcsUriForFilename(gcsUri);
      
              if (bucketName != null && fileName != null) {
                  return storage.get(BlobId.of(bucketName, fileName));
              } else {
                  return null;
              }
          }
      
          @Nullable
          String parseGcsUriForFilename(String gcsUri) {
              String fileName = null;
              String prefix = "gs://";
              if (gcsUri.startsWith(prefix)) {
                  int startIndexForBucket = gcsUri.indexOf(prefix) + prefix.length() + 1;
                  int startIndex = gcsUri.indexOf("/", startIndexForBucket) + 1;
                  fileName = gcsUri.substring(startIndex);
              }
              return fileName;
          }
      
          @Nullable
          String parseGcsUriForBucketName(String gcsUri) {
              String bucketName = null;
              String prefix = "gs://";
              if (gcsUri.startsWith(prefix)) {
                  int startIndex = gcsUri.indexOf(prefix) + prefix.length();
                  int endIndex = gcsUri.indexOf("/", startIndex);
                  bucketName = gcsUri.substring(startIndex, endIndex);
              }
              return bucketName;
          }
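
As an alternative to the hand-written index arithmetic, the two parsing helpers could be sketched with `java.net.URI`. The class `GcsUriParts` and its fields are hypothetical names. `getAuthority()` is used rather than `getHost()` because `getHost()` returns null for bucket names containing an underscore. One assumption to keep in mind: object names containing characters that are not legal in a URI (spaces, `#`, `?`) would need escaping first, so this is not a drop-in replacement for every object name.

```java
import java.net.URI;

public class GcsUriParts {
  public final String bucket;
  public final String objectName;

  private GcsUriParts(String bucket, String objectName) {
    this.bucket = bucket;
    this.objectName = objectName;
  }

  // Splits "gs://bucket/path/to/object" into its bucket and object name.
  public static GcsUriParts parse(String gcsUri) {
    URI uri = URI.create(gcsUri);
    if (!"gs".equals(uri.getScheme()) || uri.getAuthority() == null) {
      throw new IllegalArgumentException("Not a gs:// URI: " + gcsUri);
    }
    String path = uri.getPath(); // starts with "/" when an object name is present
    if (path == null || path.length() <= 1) {
      throw new IllegalArgumentException("No object name in: " + gcsUri);
    }
    // drop the leading "/" to get the object name
    return new GcsUriParts(uri.getAuthority(), path.substring(1));
  }

  public static void main(String[] args) {
    GcsUriParts parts = parse("gs://myapp.appspot.com/ocr_request_images/sample.jpg");
    System.out.println(parts.bucket);
    System.out.println(parts.objectName);
  }
}
```

The two values map directly onto `BlobId.of(bucket, objectName)` as used in `getBlob` above.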
      

      【Comments】:

        【Solution 4】:

        Another (convenient) way to stream a file from Google Cloud Storage is google-cloud-nio:

        Path path = Paths.get(URI.create("gs://bucket/file.csv"));
        InputStream in = Files.newInputStream(path);
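
Because this approach yields an ordinary `java.nio.file.Path`, the standard `Files.copy(Path, OutputStream)` call should be enough to stream the object to an `OutputStream`. The sketch below is an assumption-laden illustration: the class `PathStreaming` and its `stream` method are hypothetical, and a local temporary file stands in for the `gs://` path, which only resolves when the google-cloud-nio provider is on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PathStreaming {

  // Streams the file behind the Path to the OutputStream and returns the byte count.
  public static long stream(Path source, OutputStream out) throws IOException {
    return Files.copy(source, out);
  }

  public static void main(String[] args) throws IOException {
    // Local stand-in; with google-cloud-nio this would be
    // Paths.get(URI.create("gs://bucket/file.csv")).
    Path source = Files.createTempFile("sample", ".csv");
    try {
      Files.write(source, "a,b,c\n1,2,3\n".getBytes(StandardCharsets.UTF_8));
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      long copied = stream(source, sink);
      System.out.println(copied + " bytes streamed");
    } finally {
      Files.delete(source);
    }
  }
}
```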
        

        【Comments】:

          【Solution 5】:

          By now people should be on Java 9 or later, so you can use InputStream.transferTo to write to the output stream:

          
              // the resource url is something like gs://your-bucket/some/file/path.csv
              public InputStream getUriAsInputStream( Storage storage, String resourceUri) {
                  String[] parts = resourceUri.split("/");
                  BlobId blobId = BlobId.of(parts[2], String.join("/", Arrays.copyOfRange(parts, 3, parts.length)));
                  Blob blob = storage.get(blobId);
                  if (blob == null || !blob.exists()) {
                      throw new IllegalArgumentException("Blob [" + resourceUri + "] does not exist");
                  }
                  ReadChannel reader = blob.reader();
                  InputStream inputStream = Channels.newInputStream(reader);
                  return inputStream;
              }
          
          // use it with something like: 
          @Override
          public void write(OutputStream outputStream) throws IOException {
              try {
                  LOG.info(path);
                  InputStream stream = new ByteArrayInputStream(GoogleJsonKey.JSON_KEY.getBytes(StandardCharsets.UTF_8));
                  StorageOptions options = StorageOptions.newBuilder()
                          .setProjectId(PROJECT_ID)
                          .setCredentials(GoogleCredentials.fromStream(stream)).build();
                  Storage storage = options.getService();

                  // try-with-resources closes the GCS stream even if the copy fails
                  try (InputStream in = getUriAsInputStream(storage, "gs://your-bucket/path/to/file.csv")) {
                      // copies in chunks, so the whole object is never held in memory
                      in.transferTo(outputStream);
                  }
              } catch (Exception e) {
                  e.printStackTrace();
              } finally {
                  outputStream.close();
              }
          }
          

          【Comments】:
