Creating a dynamic zip of a bunch of URLs on the fly: answers

【Question title】: Creating a dynamic zip of a bunch of URLs on the fly
【Posted】: 2013-11-29 20:14:40
【Question description】:

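I'm trying to create a zip file of arbitrary size on the fly. The source of the zip archive is a bunch of URLs, and the total could be large (500 4 MB JPGs in the list). I want to do everything inside the request: start the download immediately, and create and stream the zip while it is being built. It shouldn't have to reside in memory or on the server's disk.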

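The closest I've come is the code below. (Note: urls is a collection of key-value pairs mapping URLs to file names as they should appear in the created zip.)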

Response.ClearContent();
Response.ClearHeaders();
Response.ContentType = "application/zip";
Response.AddHeader("Content-Disposition", "attachment; filename=DynamicZipFile.zip");

using (var memoryStream = new MemoryStream())
{
    using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Create, true))
    {
        foreach (KeyValuePair<string, string> fileNamePair in urls)
        {
            var zipEntry = archive.CreateEntry(fileNamePair.Key);

            using (var entryStream = zipEntry.Open())
                using (WebClient wc = new WebClient())
                    wc.OpenRead(GetUrlForEntryName(fileNamePair.Key)).CopyTo(entryStream);

                //this doesn't work either
                //using (var streamWriter = new StreamWriter(entryStream))
                //  using (WebClient wc = new WebClient())
                //      streamWriter.Write(wc.OpenRead(GetUrlForEntryName(fileNamePair.Key)));
        }
    }

    memoryStream.WriteTo(Response.OutputStream);
}
HttpContext.Current.ApplicationInstance.CompleteRequest();

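This code gives me a zip file, but every JPG in the zip is just a text file that says "System.Net.ConnectStream". I have other attempts that build a zip containing the proper files, but you can tell from the delay at the start that the server is building the whole zip in memory and then blasting it out at the end. When the file count gets close to 50, it doesn't respond at all. The commented-out part gives me the same result I got when I tried Ionic.Zip.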

This is .NET 4.5 on IIS8. I'm building with VS2013 and trying to run it on AWS Elastic Beanstalk.

【Question discussion】:

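The StreamWriter version doesn't work because the call to streamWriter.Write ends up calling ToString() on the client stream, which resolves to Object.ToString() and writes only the class name. I really can't say why the CopyTo version doesn't work. Are you sure it produces the same zip file as the StreamWriter version?
If I move the memoryStream.WriteTo(Response.OutputStream) call into the foreach loop and add Response.Flush(), it seems to start streaming immediately (first battle won), but the resulting zip file is much larger than it should be. My best guess is that the WriteTo() call writes the entire stream every time, so the zip file ends up as (file1)+(file1+file2)+(file1+file2+file3) and so on. Write() doesn't go directly to …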
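Both behaviours described in these comments can be reproduced without any web code. The following is a minimal sketch that uses MemoryStream objects as stand-ins for the download stream and Response.OutputStream; the class and method names are made up for illustration.

```csharp
using System.IO;
using System.Text;

// Reproduces the two pitfalls from the comments with in-memory streams.
public static class StreamPitfalls
{
    // Pitfall 1: there is no StreamWriter.Write(Stream) overload, so the call
    // binds to TextWriter.Write(object), which writes source.ToString(),
    // i.e. the type name, instead of copying the stream's bytes.
    public static string WriteStreamWithStreamWriter(Stream source)
    {
        var target = new MemoryStream();
        using (var writer = new StreamWriter(target, new UTF8Encoding(false)))
        {
            writer.Write(source);
        }
        return Encoding.UTF8.GetString(target.ToArray()); // ToArray works on a closed MemoryStream
    }

    // Pitfall 2: MemoryStream.WriteTo copies the whole buffer on every call,
    // so calling it once per loop iteration re-sends everything written so far.
    public static long CumulativeWriteTo(byte[][] chunks)
    {
        var buffer = new MemoryStream();
        var output = new MemoryStream();
        foreach (var chunk in chunks)
        {
            buffer.Write(chunk, 0, chunk.Length);
            buffer.WriteTo(output);
        }
        return output.Length;
    }
}
```

WriteStreamWithStreamWriter returns "System.IO.MemoryStream" rather than any data, and writing chunks of 1, 2 and 3 bytes through CumulativeWriteTo yields 10 bytes of output instead of 6, which matches the (file1)+(file1+file2)+... growth guessed in the comment.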

Tags: c# asp.net .net .net-4.5

【Solution 1】:

You're trying to create a zip file and have it stream while it is being created. That turns out to be very difficult.

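You need to understand the Zip file format. In particular, note that the header fields of a local file entry (CRC, compressed size and uncompressed size) can't be updated until the entire file has been compressed. So you will have to buffer at least one complete file before sending it to the response stream.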
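To see why those header fields are awkward for streaming, here is a small incremental CRC-32 in C#, using the reflected polynomial 0xEDB88320 that ZIP uses. It is a generic sketch, not code from any of the libraries discussed here: the checksum can be updated chunk by chunk as data arrives, but its final value, like the entry's sizes, only exists once the last byte has gone through.

```csharp
// Incremental CRC-32 as used by ZIP (reflected polynomial 0xEDB88320).
public sealed class Crc32
{
    private static readonly uint[] Table = BuildTable();
    private uint _state = 0xFFFFFFFFu; // running value, starts with all bits set

    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint n = 0; n < 256; n++)
        {
            uint c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[n] = c;
        }
        return table;
    }

    // Feed the next chunk, e.g. each buffer read from a download.
    public void Update(byte[] buffer, int offset, int count)
    {
        for (int i = offset; i < offset + count; i++)
            _state = Table[(_state ^ buffer[i]) & 0xFF] ^ (_state >> 8);
    }

    // Only final after the last chunk; a ZIP writer needs this value (and the
    // byte counts) for the entry header, so it cannot emit that header first.
    public uint Value { get { return _state ^ 0xFFFFFFFFu; } }
}
```

For example, feeding the ASCII bytes of "123456789" in a single call, or as two chunks of 4 and 5 bytes, gives the standard check value 0xCBF43926 either way.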

So, at best, you could do something like this:

open archive
for each file
    create entry
    write file to entry
    read entry raw data and send to the response output stream

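The problem you'll run into is that there is no documented way (and no undocumented way that I know of) to read the raw entry data. The only way to read it ends up decompressing the data and discarding the headers.

There may be other zip libraries that can do what you want. I wouldn't recommend trying this with ZipArchive.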

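【Discussion】:

Do you have any suggestions for bundling a bunch of files together and streaming them like this? I don't need any actual compression, because the files in the collection are already compressed. I just need to be able to make one file out of several and let the user on the other end open it with the usual tools built into the OS.
Another question: I know it's not the same zip library, but according to this you can do it with SharpZip. I haven't had a chance to try it yet.
@BradMurray: I don't know what SharpZip can do, and I don't know of any way to meet your requirements. You could probably write a .tar file, but as far as I know Windows doesn't come with a tar reader (although one is easy to get). I also don't know whether Windows' built-in zip support can handle the very large zip files you'll be generating (it looks like about 2 GB).
I tried the sample code they provide for SharpZip and it works like a champ: the download starts instantly and there's no heavy resource use on the server. Because I'm zipping JPGs and videos, I set the compression level to zero; I'm not sure it would still work at a higher level. The downloaded zips extract without complaint with the OS X unarchiver, the native Windows extractor and 7-Zip. The only drawback is that the browser doesn't know the expected total file size, because I can't give the response that information in advance.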
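Why streaming works here despite the header problem described in Solution 1: the ZIP format lets a writer set bit 3 of the general purpose flag, leave the CRC and sizes at zero in the local header, and write them in a data descriptor right after the entry's data; the central directory at the end of the archive repeats them. As far as I know, this is how SharpZipLib's ZipOutputStream copes with a non-seekable output stream. The sketch below (StoredZipWriter is a hypothetical class, not part of any library) writes stored, i.e. uncompressed, entries that way. Because stored entries have a fixed layout, it can also compute the exact archive size in advance, which would allow a Content-Length header, assuming each source file's size is known up front (for example from a HEAD request, which is an assumption, not something from the thread). It does not support Zip64 (each entry and the archive must stay under 4 GB, at most 65,535 entries) and stamps every entry with 1980-01-01 00:00.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical minimal writer for STORED (uncompressed) ZIP entries on a
// forward-only stream. Flag bit 3 is set, so each entry's CRC-32 and sizes
// are written in a data descriptor after its data. No Zip64 support.
public sealed class StoredZipWriter
{
    private sealed class Entry { public byte[] Name; public uint Crc, Size, Offset; }
    private static readonly uint[] CrcTable = BuildCrcTable();
    private readonly Stream _out;
    private readonly List<Entry> _entries = new List<Entry>();
    private readonly byte[] _scratch = new byte[4];
    private long _position; // bytes written so far; the output stream need not report Position
    public StoredZipWriter(Stream output) { _out = output; }

    public void AddEntry(string name, Stream source)
    {
        var entry = new Entry { Name = Encoding.UTF8.GetBytes(name), Offset = (uint)_position };
        // Local file header. CRC and sizes are not known yet, so they are zero.
        WriteUInt32(0x04034b50); WriteUInt16(20);        // signature, version needed (2.0)
        WriteUInt16(0x0808); WriteUInt16(0);             // flags: bit 3 descriptor, bit 11 UTF-8; method 0 = stored
        WriteUInt16(0); WriteUInt16(0x21);               // DOS time and date: 1980-01-01 00:00:00
        WriteUInt32(0); WriteUInt32(0); WriteUInt32(0);  // CRC-32, compressed size, uncompressed size
        WriteUInt16((ushort)entry.Name.Length); WriteUInt16(0); // name length, no extra field
        WriteBytes(entry.Name, entry.Name.Length);

        // Copy the data through unchanged, updating the CRC chunk by chunk.
        uint crc = 0xFFFFFFFFu; long size = 0;
        var buffer = new byte[81920];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
                crc = CrcTable[(crc ^ buffer[i]) & 0xFF] ^ (crc >> 8);
            WriteBytes(buffer, read);
            size += read;
        }
        entry.Crc = crc ^ 0xFFFFFFFFu;
        entry.Size = checked((uint)size); // throws past 4 GB (no Zip64)
        // Data descriptor: the values the local header could not hold.
        WriteUInt32(0x08074b50); WriteUInt32(entry.Crc); WriteUInt32(entry.Size); WriteUInt32(entry.Size);
        _entries.Add(entry);
        _out.Flush(); // push this entry to the client before starting the next
    }

    // Writes the central directory and end record; call once after the last entry.
    public void Finish()
    {
        uint directoryOffset = (uint)_position;
        foreach (var e in _entries)
        {
            WriteUInt32(0x02014b50); WriteUInt16(20); WriteUInt16(20); // signature, made by, needed
            WriteUInt16(0x0808); WriteUInt16(0); WriteUInt16(0); WriteUInt16(0x21);
            WriteUInt32(e.Crc); WriteUInt32(e.Size); WriteUInt32(e.Size);
            WriteUInt16((ushort)e.Name.Length); WriteUInt16(0); WriteUInt16(0); // name, extra, comment lengths
            WriteUInt16(0); WriteUInt16(0); WriteUInt32(0);                     // disk, internal, external attributes
            WriteUInt32(e.Offset); // where this entry's local header starts
            WriteBytes(e.Name, e.Name.Length);
        }
        uint directorySize = (uint)(_position - directoryOffset);
        WriteUInt32(0x06054b50); WriteUInt16(0); WriteUInt16(0);
        WriteUInt16((ushort)_entries.Count); WriteUInt16((ushort)_entries.Count);
        WriteUInt32(directorySize); WriteUInt32(directoryOffset); WriteUInt16(0); // no archive comment
        _out.Flush();
    }

    // Exact archive size for this layout, given each entry's name and byte count.
    public static long PredictSize(IEnumerable<KeyValuePair<string, long>> entries)
    {
        long total = 22; // end of central directory record
        foreach (var e in entries)
        {
            int nameLength = Encoding.UTF8.GetByteCount(e.Key);
            total += 30 + nameLength + e.Value + 16; // local header + data + data descriptor
            total += 46 + nameLength;                // central directory header
        }
        return total;
    }

    private static uint[] BuildCrcTable()
    {
        var table = new uint[256];
        for (uint n = 0; n < 256; n++)
        {
            uint c = n;
            for (int k = 0; k < 8; k++) c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[n] = c;
        }
        return table;
    }
    private void WriteUInt16(ushort v) { _scratch[0] = (byte)v; _scratch[1] = (byte)(v >> 8); WriteBytes(_scratch, 2); }
    private void WriteUInt32(uint v) { for (int i = 0; i < 4; i++) _scratch[i] = (byte)(v >> (8 * i)); WriteBytes(_scratch, 4); }
    private void WriteBytes(byte[] data, int count) { _out.Write(data, 0, count); _position += count; }
}
```

In the ASP.NET page this would mean optionally setting a Content-Length header from PredictSize, wrapping Response.OutputStream in a StoredZipWriter, calling AddEntry once per URL with the stream returned by WebClient.OpenRead, and calling Finish at the end. Extractors that read the central directory (7-Zip, Info-ZIP unzip, and, I would expect, the built-in Windows and OS X tools) should cope with stored entries that use data descriptors, but some streaming readers do not; Java's ZipInputStream, for instance, rejects them.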

【Solution 2】:

So, to answer my own question, here is the solution that works for me:

// Page method. Needs using System.Collections.Generic, System.IO and System.Net,
// plus the SharpZipLib package. Assumes ContentType and Content-Disposition
// have already been set on the response, as in the question.
private void ProcessWithSharpZipLib()
{
    byte[] buffer = new byte[4096];

    using (var zipOutputStream = new ICSharpCode.SharpZipLib.Zip.ZipOutputStream(Response.OutputStream))
    {
        zipOutputStream.SetLevel(0); //0-9, 9 being the highest level of compression
        // Off: no Zip64 extra fields, which some built-in extractors handle poorly;
        // every entry (and the archive) then has to stay under 4 GB.
        zipOutputStream.UseZip64 = ICSharpCode.SharpZipLib.Zip.UseZip64.Off;

        foreach (KeyValuePair<string, string> fileNamePair in urls)
        {
            using (WebClient wc = new WebClient())
            using (Stream wcStream = wc.OpenRead(GetUrlForEntryName(fileNamePair.Key)))
            {
                var entry = new ICSharpCode.SharpZipLib.Zip.ZipEntry(
                    ICSharpCode.SharpZipLib.Zip.ZipEntry.CleanName(fileNamePair.Key));
                zipOutputStream.PutNextEntry(entry);

                // Copy the download into the entry, flushing each chunk to the client.
                int count;
                while ((count = wcStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    zipOutputStream.Write(buffer, 0, count);
                    if (!Response.IsClientConnected)
                    {
                        return; // client went away: don't download the remaining files
                    }
                    Response.Flush();
                }
            }
        }
    } // disposing the ZipOutputStream finishes the archive (writes the central directory)

    Response.Flush();
    Response.End();
}

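【Discussion】:

【Solution 3】:

The zip component you use must have a way to defer adding entries to the archive, i.e. to supply their content only after zip.Save() has been called. I'm using this deferred technique with IonicZip; the code for downloading a flickr photo album looks like this: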

protected void Page_Load(object sender, EventArgs e)
{
    if (!IsLoggedIn())
        Response.Redirect("/login.aspx");
    else
    {
        // this is dco album id, find out what photosetId it maps to
        string albumId = Request.Params["id"];
        Album album = findAlbum(new Guid(albumId));
        Flickr flickr = FlickrInstance();
        PhotosetPhotoCollection photos = flickr.PhotosetsGetPhotos(album.PhotosetId, PhotoSearchExtras.OriginalUrl | PhotoSearchExtras.Large2048Url | PhotoSearchExtras.Large1600Url);

        Response.Clear();
        Response.BufferOutput = false;

        // ascii only
        //string archiveName = album.Title + ".zip";
        string archiveName = "photos.zip";
        Response.ContentType = "application/zip";
        Response.AddHeader("content-disposition", "attachment; filename=" + archiveName);
        int picCount = 0;
        string picNamePref = album.PhotosetId.Substring(album.PhotosetId.Length - 6);
        using (ZipFile zip = new ZipFile())
        {
            zip.CompressionMethod = CompressionMethod.None;
            zip.CompressionLevel = Ionic.Zlib.CompressionLevel.None;
            zip.ParallelDeflateThreshold = -1;
            _map = new Dictionary<string, string>();
            foreach (Photo p in photos)
            {
                string pictureUrl = p.Large2048Url;
                if (string.IsNullOrEmpty(pictureUrl))
                    pictureUrl = p.Large1600Url;
                if (string.IsNullOrEmpty(pictureUrl))
                    pictureUrl = p.LargeUrl;

                string pictureName = picNamePref + "_" + (++picCount).ToString("000") + ".jpg";
                _map.Add(pictureName, pictureUrl);
                zip.AddEntry(pictureName, processPicture);
            }
            zip.Save(Response.OutputStream);
        }
        Response.Close();
    }
}
private volatile Dictionary<string, string> _map;
protected void processPicture(string pictureName, Stream output)
{
    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(_map[pictureName]);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        using (Stream input = response.GetResponseStream())
        {
            byte[] buf = new byte[8092];
            int len;
            while ( (len = input.Read(buf, 0, buf.Length)) > 0)
                output.Write(buf, 0, len);
        }
        output.Flush();
    }
}

This way, the code in Page_Load reaches zip.Save() immediately and the download starts (the client sees the "Save As" box) before any image is pulled from flickr.
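The same defer-until-save idea can be sketched without DotNetZip. DeferredZip below is a hypothetical helper built on System.IO.Compression.ZipArchive (available since .NET 4.5): AddEntry only records a name and a callback, and each callback runs when Save() reaches that entry, so nothing is downloaded until the archive is actually being written. Whether ZipArchive can write directly to a non-seekable Response.OutputStream depends on the .NET version, so verify that on the target runtime before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: entry names are registered up front together with a
// callback that produces the content; callbacks run only during Save().
public sealed class DeferredZip
{
    private readonly List<KeyValuePair<string, Action<Stream>>> _entries =
        new List<KeyValuePair<string, Action<Stream>>>();

    // Cheap: only remembers the name and the callback, fetches nothing.
    public void AddEntry(string name, Action<Stream> writeContent)
    {
        _entries.Add(new KeyValuePair<string, Action<Stream>>(name, writeContent));
    }

    // Writes the archive; each callback is invoked when its entry is written.
    public void Save(Stream output)
    {
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var entry in _entries)
            {
                // JPGs are already compressed, so store them as-is.
                ZipArchiveEntry zipEntry = archive.CreateEntry(entry.Key, CompressionLevel.NoCompression);
                using (Stream entryStream = zipEntry.Open())
                {
                    entry.Value(entryStream);
                }
            }
        }
    }
}
```

In the page above, each callback would do what processPicture does: open the picture URL and copy the response stream into the entry stream it is given.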

【Discussion】:

【Solution 4】:

The ProcessWithSharpZipLib code from Solution 2 works well for me, but when I host it on Windows Azure as a cloud service it corrupts my zip file and I get an "invalid file" message.

I use that method unchanged. It works fine on my local machine, but not after deploying to the server: when the archive is large, the zip file is corrupted.

【Discussion】: