Guava Resources.readLines() for Zip/Gzip files: answers

【Question title】: Guava Resources.readLines() for Zip/Gzip files
【Posted】: 2015-08-14 13:53:38
【Question】:

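I have found that Resources.readLines() and Files.readLines() help simplify my code.
The problem is that I often read gzip-compressed txt files, or txt files inside zip archives, from URLs (HTTP and FTP).
Is there a way to use Guava's methods to read from these URLs as well? Or is that only possible with Java's GZIPInputStream/ZipInputStream?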

【Comments on the question】:

If you are using Java 8, you can use BufferedReader#lines().
Ping! I added a ByteSource for Zip to my answer.
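
The Java 8 suggestion above can be sketched for the gzip case using only the JDK. This is a minimal sketch, not part of the original thread: the class name GzipLines and the helper gzip() are hypothetical and exist only so the example can run without a network connection. For a real URL, the stream passed in would come from url.openStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, named for illustration only.
final class GzipLines {
    /** Reads all lines from a gzip-compressed stream and closes it once the lines are read. */
    static List<String> readLines(InputStream gzipped, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), charset))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    /** Demo helper (an assumption of this sketch): gzip-compresses a string in memory. */
    static byte[] gzip(String text, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(charset));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("first\nsecond\n", StandardCharsets.UTF_8);
        // In real use: readLines(new URL(...).openStream(), charset)
        System.out.println(readLines(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    }
}
```

Unlike the Guava-based answers, this keeps no reusable source object around; it simply wraps one stream and reads it to the end.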

Tags: java url guava readline

【Solution 1】:

You can create your own ByteSources:

For GZip:

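import com.google.common.io.ByteSource;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** A ByteSource that decompresses another, gzip-compressed ByteSource on the fly. */
public class GzippedByteSource extends ByteSource {
  private final ByteSource source;
  public GzippedByteSource(ByteSource gzippedSource) { source = gzippedSource; }
  @Override public InputStream openStream() throws IOException {
    return new GZIPInputStream(source.openStream());
  }
}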

Then use it:

Charset charset = ... ;
new GzippedByteSource(Resources.asByteSource(url)).toCharSource(charset).readLines();

Here is an implementation for Zip. It assumes that you read only a single entry.

public static class ZipEntryByteSource extends ByteSource {
  private final ByteSource source;
  private final String entryName;
  public ZipEntryByteSource(ByteSource zipSource, String entryName) {
    this.source = zipSource;
    this.entryName = entryName;
  }
  @Override public InputStream openStream() throws IOException {
    final ZipInputStream in = new ZipInputStream(source.openStream());
    while (true) {
      final ZipEntry entry = in.getNextEntry();
      if (entry == null) {
        in.close();
        throw new IOException("No entry named " + entryName); // entry is null here, so report the requested name
      } else if (entry.getName().equals(this.entryName)) {
        return new InputStream() {
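          @Override
          public int read() throws IOException {
            return in.read();
          }

          @Override
          public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, len); // bulk reads avoid byte-at-a-time overhead
          }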

          @Override
          public void close() throws IOException {
            in.closeEntry();
            in.close();
          }
        };
      } else {
        in.closeEntry();
      }
    }
  }
}

You can use it like this:

Charset charset = ... ;
String entryName = ... ; // Name of the entry inside the zip file.
new ZipEntryByteSource(Resources.asByteSource(url), entryName).toCharSource(charset).readLines();

【Comments】:

GzipInputStream should be GZIPInputStream

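【Solution 2】:

As Olivier Grégoire said, you can create the necessary ByteSources for whatever compression scheme you need in order to use Guava's readLines function.

For zip archives, however, while it can be done, I don't think it is worth it. It is easier to write your own readLines method that iterates over the zip entries and reads the lines of each entry itself. Here is a class that demonstrates how to read and print the lines of a URL pointing to a zip archive: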

public class ReadLinesOfZippedUrl {
    public static List<String> readLines(String urlStr, Charset charset) {
        List<String> retVal = new LinkedList<>();
        try (ZipInputStream zipInputStream = new ZipInputStream(new URL(urlStr).openStream())) {
            for (ZipEntry zipEntry = zipInputStream.getNextEntry(); zipEntry != null; zipEntry = zipInputStream.getNextEntry()) {
                // don't close this reader or you'll close the underlying zip stream
                BufferedReader reader = new BufferedReader(new InputStreamReader(zipInputStream, charset));
                retVal.addAll(reader.lines().collect(Collectors.toList())); // slurp all the lines from one entry
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return retVal;
    }

    public static void main(String[] args) {
        String urlStr = "http://central.maven.org/maven2/com/google/guava/guava/18.0/guava-18.0-sources.jar";
        Charset charset = StandardCharsets.UTF_8;
        List<String> lines = readLines(urlStr, charset);
        lines.forEach(System.out::println);
    }
}
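
The class above concatenates the lines of every entry into one list. If the entry boundaries matter, the same loop can collect lines per entry instead. The following is a minimal sketch under that assumption, not part of the original answer: the class name ZipLinesByEntry and the in-memory zip() helper are hypothetical and only there so the example runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, named for illustration only.
final class ZipLinesByEntry {
    /** Reads every non-directory entry of a zip stream into a map of entry name to lines. */
    static Map<String, List<String>> readLines(InputStream zipped, Charset charset) throws IOException {
        Map<String, List<String>> linesByEntry = new LinkedHashMap<>(); // keeps archive order
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Deliberately not closed: closing the reader would close the whole zip stream.
                BufferedReader reader = new BufferedReader(new InputStreamReader(zip, charset));
                linesByEntry.put(entry.getName(), reader.lines().collect(Collectors.toList()));
            }
        }
        return linesByEntry;
    }

    /** Demo helper (an assumption of this sketch): builds a zip archive in memory. */
    static byte[] zip(Map<String, String> files, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey()));
                zos.write(file.getValue().getBytes(charset));
                zos.closeEntry();
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "one\ntwo\n");
        files.put("b.txt", "three\n");
        byte[] archive = zip(files, StandardCharsets.UTF_8);
        readLines(new ByteArrayInputStream(archive), StandardCharsets.UTF_8)
                .forEach((name, lines) -> System.out.println(name + " -> " + lines));
    }
}
```

Taking an InputStream rather than a URL string keeps the method testable; for a remote archive the caller would pass new URL(urlStr).openStream().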
