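【Posted】: 2019-10-11 17:52:10
【Question】:
I'm new to Python and I'm trying to build a program that downloads and extracts zip files from various websites. I've pasted the two programs I wrote for this. The first is a "child" program called "urls" that I import into the second. I'm trying to loop over each url, loop over each data file within each url, and finally check whether any entry in the "keywords" list is part of the file name; if it is, download and extract that file. I'm stuck on the part where I need to loop over the "keywords" list to check which file names to download. Can you help? I appreciate any advice or guidance. Thank you. Andy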
**Program #1 called "urls":**
urls = [
    "https://www.dentoncad.com/content/data-extracts/1-appraisal-data-extracts/1-2019/1-preliminary/2019-preliminary" \
    "-protax-data.zip",
    "http://www.dallascad.org/ViewPDFs.aspx?type=3&id=//DCAD.ORG\WEB\WEBDATA\WEBFORMS\DATA%20PRODUCTS\DCAD2020_" \
    "CURRENT.ZIP"
]
keywords = [
    "APPRAISAL_ENTITY_INFO",
    "SalesExport",
    "account_info",
    "account_apprl_year",
    "res_detail",
    "applied_std_exempt",
    "land",
    "acct_exempt_value"
]
**Program #2 (primary program):**
import requests
import zipfile
import os
import urls
def main():
    print_header()
    dwnld_zfiles_from_web()
def print_header():
    print('---------------------------------------------------------------------')
    print(' DOWNLOAD ZIP FILES FROM THE WEB APP')
    print('---------------------------------------------------------------------')
    print()
def dwnld_zfiles_from_web():
    file_num = 0
    dest_folder = "C:/Users/agbpi/OneDrive/Desktop/test//"
    # loop through each url within the url list, assigning it a unique file number each iteration
    for url in urls.urls:
        file_num = file_num + 1
        url_resp = requests.get(url, allow_redirects=True, timeout=5)
        if url_resp.status_code == 200:
            saved_archive = os.path.basename(url)
            with open(saved_archive, 'wb') as f:
                f.write(url_resp.content)
            # for match in urls.keywords:
            print("Extracting...", url_resp.url)
            with zipfile.ZipFile('file{0}'.format(str(file_num)), "r") as z:
                zip_files = z.namelist()
                # print(zip_files)
                for content in zip_files:
                    while urls.keywords in content:
                        z.extract(path=dest_folder, member=content)
                # while urls.keywords in zip_files:
                #     for content in zip_files:
                #         z.extract(path=dest_folder, member=content)
    print("Finished!")
if __name__ == '__main__':
    main()
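One possible way to handle the keyword check, offered as a sketch rather than a definitive fix: `urls.keywords in content` tests a whole list against a string, which raises a `TypeError`, so each keyword has to be tested on its own. Note also that the code above saves the download under `saved_archive` but then opens `'file{0}'`, a file that was never written. The helper below is hypothetical (the name `extract_matching` and its parameters are not from the original program); it takes the archive path as an argument so the same path can be used for saving and opening. Matching is a plain case-sensitive substring test, so a short keyword such as "land" will also match any longer name that contains it.

```python
import os
import zipfile


def extract_matching(zip_path, keywords, dest_folder):
    """Extract every archive member whose name contains at least one keyword.

    Hypothetical helper: returns the list of member names that were extracted.
    """
    extracted = []
    with zipfile.ZipFile(zip_path, "r") as z:
        for member in z.namelist():
            # Test each keyword separately; any() stops at the first match.
            if any(keyword in member for keyword in keywords):
                z.extract(member, path=dest_folder)
                extracted.append(member)
    return extracted
```

Inside the download loop this could be called as `extract_matching(saved_archive, urls.keywords, dest_folder)` once the response has been written to disk.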
【Comments】:
-
Are you still struggling with finding the zip files, or with extracting individual files from a zip? What is your code so far?
-
@Trapli Thanks for your reply. This is the code I'm stuck on.
-
@Trapli elif "data-real-and-mh" in url_resp.url: with zipfile.ZipFile('file{0}'.format(str(file_num)), "r") as z: zip_files=print(z.namelist()) # show the files available in the zip folder if "APPRAISAL_ENTITY_INFO" in zip_files: z.extract(path=dest_folder, member="2019-04-04_005519_APPRAISAL_ENTITY_INFO.txt")
-
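Sorry for the unformatted code... I'm not familiar with how the site works. I appreciate your help. I'm trying to work out the best way to dynamically point to member="YYYY-MM-DD_APPRAISAL_ENTITY_INFO.txt", since the file name changes with the date.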
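A rough sketch of one way to find the dated member without hard-coding the date. Two things to keep in mind about the snippet in the comment above: `print()` returns `None`, so `zip_files = print(z.namelist())` leaves `zip_files` empty, and `"APPRAISAL_ENTITY_INFO" in zip_files` would only match a list entry that is exactly that string. The sketch below instead matches each member name against a pattern. The pattern is an assumption based on the one example name shown (`2019-04-04_005519_APPRAISAL_ENTITY_INFO.txt`): a date, an optional numeric stamp, then the fixed suffix. `DATED_NAME` and `find_dated_member` are hypothetical names.

```python
import re
import zipfile

# Assumed naming scheme: YYYY-MM-DD, an optional numeric stamp, then the fixed suffix.
DATED_NAME = re.compile(r"\d{4}-\d{2}-\d{2}_(?:\d+_)?APPRAISAL_ENTITY_INFO\.txt")


def find_dated_member(zip_source):
    """Return the first member whose file name matches DATED_NAME, or None.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source, "r") as z:
        for name in z.namelist():
            # Compare only the last path component, in case the file sits in a subfolder.
            if DATED_NAME.fullmatch(name.rsplit("/", 1)[-1]):
                return name
    return None
```

The returned name can then be passed as `member=` to `z.extract(...)`. If the naming scheme turns out to be looser than assumed, `name.endswith("_APPRAISAL_ENTITY_INFO.txt")` is a simpler alternative.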
Tags: python-3.x zipfile