Gzip a file in Python: answers

【Question title】: gzip a file in Python
【Posted】: 2011-12-30 16:46:54
【Question】:

I want to gzip a file in Python. I am trying to use subprocess.check_call(), but it keeps failing with the error "OSError: [Errno 2] No such file or directory". Is there a problem with what I am trying here? Is there a better way to gzip a file than using subprocess.check_call?

from subprocess import check_call

def gZipFile(fullFilePath):
    check_call('gzip ' + fullFilePath)

Thanks!!

【Comments on the question】:

Why not docs.python.org/library/gzip.html?
Related: to create a gzipped tarball archive.tar.gz from a directory /dir/path, you can use shutil.make_archive('archive', 'gztar', '/dir/path')
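
The shutil.make_archive call mentioned in the comment above can be sketched as follows. Note that it produces a gzip-compressed tar archive of a whole directory, not a single gzipped file. The helper name make_tar_gz and the demo file names are made up for illustration.

```python
import os
import shutil
import tarfile
import tempfile


def make_tar_gz(src_dir, archive_base):
    """Create archive_base + '.tar.gz' from src_dir and return the archive path."""
    # 'gztar' selects a gzip-compressed tar archive.
    return shutil.make_archive(archive_base, 'gztar', src_dir)


# Demo on a throwaway directory (hypothetical file names).
work = tempfile.mkdtemp()
src = os.path.join(work, 'data')
os.mkdir(src)
with open(os.path.join(src, 'a.txt'), 'w') as f:
    f.write('hello')

archive_path = make_tar_gz(src, os.path.join(work, 'archive'))

# List the archive members to confirm the file was packed.
with tarfile.open(archive_path, 'r:gz') as tar:
    names = tar.getnames()
```

make_archive appends the extension itself, so the base name is passed without '.tar.gz'.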

Tags: python gzip subprocess

【Solution 1】:

There is a gzip module. Usage:

Example of how to create a compressed GZIP file:

import gzip
content = b"Lots of content here"
f = gzip.open('/home/joe/file.txt.gz', 'wb')
f.write(content)
f.close()

Example of how to GZIP compress an existing file:

import gzip
f_in = open('/home/joe/file.txt', 'rb')
f_out = gzip.open('/home/joe/file.txt.gz', 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()

Edit:

Jace Browning's answer, which uses with in Python >= 2.7, is obviously more concise and readable, so my second snippet would (and should) look like:

import gzip
with open('/home/joe/file.txt', 'rb') as f_in, gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
    f_out.writelines(f_in)

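【Discussion】:

Does the second version replace the original file with the gzipped one, as the gzip command would? It seems it doesn't.
@Benoît: Since the output file has a different name from the file being read, it clearly doesn't. Doing that would require temporarily storing the compressed data somewhere else until all the data in the original file has been compressed.
With gzip, the output filename is different from the input filename, and it still removes the input file once the output file has been created. I was just asking whether the Python gzip module does the same thing.
A file opened in read mode is simply read as usual. The gzip module has no way of knowing where the data came from, so it cannot do things like delete the file. Use Path(in_path).unlink() to remove the file afterwards. Or just use check_call(['gzip', in_path]), which compresses faster and removes the file.
I suggest a correction: content = b"Lots of content here"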
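
The behaviour discussed in these comments (compress, then remove the original, as the gzip command does) might look like the sketch below. The function name gzip_and_remove is hypothetical; it deletes the input only after the compressed copy has been written without raising.

```python
import gzip
import shutil
from pathlib import Path


def gzip_and_remove(in_path):
    """Compress in_path to in_path + '.gz', then delete the original file."""
    in_path = Path(in_path)
    out_path = in_path.with_name(in_path.name + '.gz')
    with open(in_path, 'rb') as f_in, gzip.open(out_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
    # Reached only if the copy above raised no exception.
    in_path.unlink()
    return out_path
```

Calling gzip_and_remove('/home/joe/file.txt') would leave only /home/joe/file.txt.gz behind.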

【Solution 2】:

Read the original file in binary (rb) mode, then use gzip.open to create the gzip file, which you can write to like a normal file using writelines:

import gzip

with open("path/to/file", 'rb') as orig_file:
    with gzip.open("path/to/file.gz", 'wb') as zipped_file:
        zipped_file.writelines(orig_file)

Even shorter, you can combine the with statements on one line:

with open('path/to/file', 'rb') as src, gzip.open('path/to/file.gz', 'wb') as dst:
    dst.writelines(src)

【Discussion】:

In this case, do we have to write the file back to the same path? Can't we store it somewhere else temporarily so that we can save it to S3 later?
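
One possible answer to the comment above: the compressed data does not have to go to a path at all. This sketch writes it into an in-memory buffer using gzip.GzipFile with its fileobj parameter. The helper name gzip_to_buffer is an assumption, and the upload step is deliberately left out, since it depends on the client library in use.

```python
import gzip
import io
import shutil


def gzip_to_buffer(src_path):
    """Return an io.BytesIO holding the gzip-compressed contents of src_path."""
    buf = io.BytesIO()
    # GzipFile writes into buf; closing it flushes the gzip trailer
    # but leaves buf itself open.
    with open(src_path, 'rb') as src, gzip.GzipFile(fileobj=buf, mode='wb') as dst:
        shutil.copyfileobj(src, dst)
    buf.seek(0)  # rewind so a later reader starts at the beginning
    return buf
```

The returned buffer is a file-like object, so it can be handed to any uploader that accepts one. For large files a temporary file on disk is probably a better fit than memory.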

【Solution 3】:

Try this:

check_call(['gzip', fullFilePath])

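Depending on what you are doing with the data in these files, Skirmantas's link to http://docs.python.org/library/gzip.html may also be helpful. Note the examples near the bottom of the page. If you don't need to access the data, or don't already have it in your Python code, executing gzip may be the cleanest way to do it, so you don't have to handle the data in Python.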
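
Why the list form helps: on POSIX systems, a single string passed to check_call without shell=True is treated as the name of the program to run, so 'gzip /some/path' is looked up as one executable and fails with "No such file or directory". The sketch below passes the program and its argument separately. It assumes a gzip executable is on PATH; the helper name gzip_with_cli is made up for illustration.

```python
import shutil
import subprocess


def gzip_with_cli(path):
    """Run the external gzip program on path and return the new .gz path."""
    if shutil.which('gzip') is None:
        raise FileNotFoundError('gzip executable not found on PATH')
    # List form: 'gzip' is the program and path is one argument,
    # so spaces in the path need no quoting.
    subprocess.check_call(['gzip', path])
    return path + '.gz'
```

By default the gzip command replaces the input file with the compressed one, which matches what the question was after.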

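【Discussion】:

Well, I wonder whether "clean" is the right word, but it is certainly the fastest way and the one that needs the least code.

【Solution 4】:

From the docs for Python 3

Compress an existing file: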

import gzip
import shutil
with open('file.txt', 'rb') as f_in:
    with gzip.open('file.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

# or because I hate nested with statements

import gzip
import shutil
from contextlib import ExitStack
with ExitStack() as stack:
    f_in = stack.enter_context(open('file.txt', 'rb'))
    f_out = stack.enter_context(gzip.open('file.txt.gz', 'wb'))
    shutil.copyfileobj(f_in, f_out)

Create a new gzip file:

import gzip
content = b"Lots of content here"
with gzip.open("file.txt.gz", "wb") as f:
    f.write(content)

Note that content is given as bytes.

If you are not creating the content as a string/bytes literal as in the example above, another way is

import gzip
# get content as a string from somewhere else in the code
with gzip.open("file.txt.gz", "wb") as f:
    f.write(content.encode("utf-8"))

For a discussion of other encoding methods, see this SO question.
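
As an alternative to calling encode by hand, gzip.open in Python 3 also accepts text modes ("wt" and "rt") together with an encoding argument. A minimal sketch, with helper names chosen only for illustration:

```python
import gzip


def write_text_gz(path, text, encoding="utf-8"):
    """Write a str to a gzip file, encoding it on the way in."""
    # "wt" opens the gzip stream in text mode, so str is accepted directly.
    with gzip.open(path, "wt", encoding=encoding) as f:
        f.write(text)


def read_text_gz(path, encoding="utf-8"):
    """Read a gzip file back as a str."""
    with gzip.open(path, "rt", encoding=encoding) as f:
        return f.read()
```

Passing the encoding explicitly avoids depending on the platform default.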

【Discussion】:

I didn't know about ExitStack... interesting!

【Solution 5】:

Use the gzip module:

import gzip
import os

in_file = "somefile.data"
with open(in_file, "rb") as f:  # close the input file as soon as it is read
    in_data = f.read()
out_gz = "foo.gz"
gzf = gzip.open(out_gz, "wb")
gzf.write(in_data)
gzf.close()

# If you want to delete the original file after the gzip is done:
os.unlink(in_file)

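Your error, OSError: [Errno 2] No such file or directory, is telling you that the file fullFilePath does not exist. If you still need to go that route, make sure the file exists on your system and that you are using an absolute path rather than a relative one.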
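
The check suggested above (the file exists, and the path is absolute) could be done before invoking anything else. This is only a sketch; the helper name resolve_existing_file is hypothetical.

```python
import os


def resolve_existing_file(path):
    """Return the absolute form of path, or raise if no such file exists."""
    abs_path = os.path.abspath(path)  # resolves relative to the current working directory
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(abs_path)
    return abs_path
```

The returned path can then be passed on to gzip.open or to check_call(['gzip', ...]).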

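【Discussion】:

Thanks, everyone, for the quick responses. Everyone here is suggesting gzip. I tried that too. Is it the better way? The reason I wasn't using it is that it leaves the original file as is, so I end up with two versions: the regular file and the gzip file. I am accessing the file's data. @retracele, your fix worked, thanks a lot. I am still wondering whether I should use subprocess or gzip.
@Rinks The easiest way is this: once the gzip is done, call os.unlink(original_File_Name) to delete the original file you made the gzip from. See my edit.
@Rinks: "The reason I wasn't using it is that it leaves the original file". Then why not delete the file afterwards?
Thanks again. I can certainly delete the file later. I will test both approaches, gzip and check_call, for a few days and settle on one.

【Solution 6】:

The documentation on this is actually very straightforward.

Example of how to read a compressed file: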

import gzip
f = gzip.open('file.txt.gz', 'rb')
file_content = f.read()
f.close()

Example of how to create a compressed GZIP file:

import gzip
content = "Lots of content here"
f = gzip.open('file.txt.gz', 'wb')
f.write(content)
f.close()

Example of how to GZIP compress an existing file:

import gzip
f_in = open('file.txt', 'rb')
f_out = gzip.open('file.txt.gz', 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()

https://docs.python.org/2/library/gzip.html

That's the whole documentation...

【Discussion】:

The middle example only ran for me when I wrote content = b"Lots of content here" (note the b).

【Solution 7】:

import gzip

def gzip_file(src_path, dst_path):
    with open(src_path, 'rb') as src, gzip.open(dst_path, 'wb') as dst:
        for chunk in iter(lambda: src.read(4096), b""):
            dst.write(chunk)
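
The loop above copies the file in fixed-size blocks, so memory use stays bounded however large the input is. Below is a self-contained sketch of the same idea with a read-back helper for checking the result. The chunk_size parameter and the gunzip_bytes helper are additions for illustration and are not part of the original answer.

```python
import gzip


def gzip_file(src_path, dst_path, chunk_size=4096):
    """Copy src_path into a gzip file at dst_path, chunk_size bytes at a time."""
    with open(src_path, 'rb') as src, gzip.open(dst_path, 'wb') as dst:
        # iter(callable, sentinel) keeps calling src.read() until it
        # returns b"", which signals end of file.
        for chunk in iter(lambda: src.read(chunk_size), b""):
            dst.write(chunk)


def gunzip_bytes(path):
    """Read a gzip file back into memory (hypothetical helper for verification)."""
    with gzip.open(path, 'rb') as f:
        return f.read()
```

Comparing gunzip_bytes(dst_path) with the original bytes is a quick way to confirm the round trip.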

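【Discussion】:

【Solution 8】:

On Windows, subprocess can be used to run the 7za utility. Download "7-Zip Extra: standalone console version, 7z DLLs, Plugin for Far Manager" from https://www.7-zip.org/download.html. The compact function below takes all csv files in the gzip directory and compresses each one to gzip format. The original files are deleted. The 7z options are listed at https://sevenzip.osdn.jp/chm/cmdline/index.htm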

import os
from pathlib import Path
import subprocess


def compact(cspath, tec, extn, prgm):  # compress each extn file in tec dir to gzip format
    xlspath = cspath / tec  # tec location
    for baself in xlspath.glob('*.' + str(extn)):  # iterate over matching files in the directory
        source = str(baself)
        target = source + '.gz'
        try:
            # "a" adds to an archive, -tgzip selects gzip format, -mx=5 sets the compression level
            subprocess.check_call([prgm, "a", "-tgzip", target, source, "-mx=5"])
            os.remove(baself)  # remove the source file only after 7za succeeded
        except (OSError, subprocess.CalledProcessError):
            print("Error while compressing or deleting file : ", baself)


exe = "C:\\7za\\7za.exe"  # 7za.exe (a = alone) is a standalone version of 7-Zip
csvpath = Path('C:/xml/baseline/')  # working directory
compact(csvpath, 'gzip', 'csv', exe)  # compress each csv file in gzip dir to gzip format

【Discussion】: