Need to decompress an archive from Azure Databricks, using Python: answers

【Question title】: Need to decompress an archive from Azure Databricks, using Python
【Posted】: 2021-10-22 02:21:43
【Question】:

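I am using code to decompress archives from blob storage. This code already works for another archive of about 300 MB, but when I try to decompress a larger archive I get this error: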

"NotImplementedError: That compression method is not supported"
The last lines of the error console show this:
/usr/local/lib/python3.8/zipfile.py in _get_decompressor(compress_type)
    718 
    719 def _get_decompressor(compress_type):
--> 720     _check_compression(compress_type)
    721     if compress_type == ZIP_STORED:
    722         return None

/usr/local/lib/python3.8/zipfile.py in _check_compression(compression)
    698                 "Compression requires the (missing) lzma module")
    699     else:
--> 700         raise NotImplementedError("That compression method is not supported")

I am using this code:

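import zipfile

# mother folder: list the archives in the download folder
# (dbutils is provided by the Databricks notebook; dl_path and extract_path
# are assumed to be defined elsewhere in the notebook)
files = dbutils.fs.ls(dl_path)

for fi in sorted(files, reverse=True):
  zip_files = zipfile.ZipFile(f'/dbfs{dl_path}{fi.name}')
  print(zip_files.namelist())
  for f in zip_files.namelist():
    # extract through the /dbfs local mount rather than the dbfs: URI
    zip_files.extract(f, str(extract_path).replace('dbfs:', '/dbfs'))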

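I don't know why this works for one archive and not for the other. I guess it could be about the size? So I'm thinking of using a try/except: the first code in the try, and a second approach in the except? I don't know; does anyone have a tip?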

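【Comments on the question】:

Which line of the code fails?
The line "zip_files.extract(f, str(extract_path).replace('dbfs:', '/dbfs'))". The extraction fails for this larger zip, but I don't know why; apart from the size being different, I don't think anything else differs.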

Tags: python compression azure-databricks unzip zipfile

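【Solution 1】:

Several compression methods are allowed inside a zip file. This zip file appears to use a compression method that Python's zipfile library does not support.

Run the command-line unzip on the zip file that fails, to list its contents: unzip -lv file.zip. The listing shows the compression method used for each entry.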
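
The same check can be sketched in Python, since zipfile can still read the archive's directory even when it cannot decompress the entries. This is a minimal sketch, not part of the original answer: the helper name unsupported_entries is hypothetical, and the set of supported methods assumes a standard CPython build where the zlib, bz2 and lzma modules are available. One method that often turns up in large archives is Deflate64 (method 9), but nothing in this thread confirms that it is the cause here.

```python
import zipfile

# Compression methods the standard zipfile module can decode
# (assuming zlib, bz2 and lzma are present in this Python build).
SUPPORTED_METHODS = {
    zipfile.ZIP_STORED,    # 0
    zipfile.ZIP_DEFLATED,  # 8
    zipfile.ZIP_BZIP2,     # 12
    zipfile.ZIP_LZMA,      # 14
}


def unsupported_entries(zip_source):
    """Return (filename, method number) for entries zipfile cannot decompress.

    zip_source is a path or a binary file-like object. Only the central
    directory is read, so this works even when extraction would fail.
    """
    with zipfile.ZipFile(zip_source) as zf:
        return [
            (info.filename, info.compress_type)
            for info in zf.infolist()
            if info.compress_type not in SUPPORTED_METHODS
        ]
```

An empty list means every entry uses a method zipfile knows; any number that is reported can be looked up in the zip format specification.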

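【Comments】:

Yes, bash was my solution. So I wrote this code: %sh unzip $path_zip -d extract_path, and made some other modifications, taking the paths into account, to make this code as generic as possible. Thanks for your answer anyway!
So which compression method is it that Python doesn't support?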
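
The try/except idea from the question and the unzip workaround from the comment can be combined in one function. This is only a sketch under stated assumptions: extract_with_fallback is a hypothetical name, it assumes an unzip binary is on the PATH of the machine running the code, and it assumes that binary can decode the method zipfile rejects, which this thread does not confirm.

```python
import subprocess
import zipfile


def extract_with_fallback(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the name of the tool used."""
    try:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(dest_dir)
        return "zipfile"
    except NotImplementedError:
        # zipfile raises this for compression methods it cannot decode.
        # Assumption: an `unzip` binary is installed and supports the method.
        # -o overwrites files left by the partial attempt, -q keeps it quiet,
        # -d sets the target directory.
        subprocess.run(
            ["unzip", "-o", "-q", str(zip_path), "-d", str(dest_dir)],
            check=True,
        )
        return "unzip"
```

On Databricks, both paths would presumably need the local /dbfs/... form used in the question rather than the dbfs: URI, since zipfile and unzip both work on local file paths.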