Incomplete download using requests in Python: answers

【Question title】: Incomplete download using requests in Python
【Posted】: 2020-07-14 13:13:41
【Question】:

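I am following an online project that predicts the Air Quality Index. For that, we first need to get the data, which is downloaded from a website. Here is the source code provided by the author: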

import os
import time
import requests
import sys

def retrieve_html():
    for year in range(2013,2019):
        for month in range(1,13):
            if(month<10):
                url='http://en.tutiempo.net/climate/0{}-{}/ws-421820.html'.format(month
                                                                          ,year)
            else:
                url='http://en.tutiempo.net/climate/{}-{}/ws-421820.html'.format(month
                                                                          ,year)
            texts=requests.get(url)
            text_utf=texts.text.encode('utf-8')
            
            if not os.path.exists("Data/Html_Data/{}".format(year)):
                os.makedirs("Data/Html_Data/{}".format(year))
            with open("Data/Html_Data/{}/{}.html".format(year,month),"wb") as output:
                output.write(text_utf)
            
        sys.stdout.flush()
        
if __name__=="__main__":
    start_time=time.time()
    retrieve_html()
    stop_time=time.time()
    print("Time taken {}".format(stop_time-start_time))

This works fine. Then I tried writing the same code myself. Here is my code:

import os
import time
import requests
import sys


def retrieve_html():
    for year in range(2013,2019):
        for month in range(1,13):
            if(month<10):
                url='http://en.tutiempo.net/climate/0{}-{}/ws-421820.html'.format(month, year)
            else:
                url='http://en.tutiempo.net/climate/{}-{}/ws-421820.html'.format(month, year)
        
        texts=requests.get(url)
        text_utf=texts.text.encode("utf-8")
        
        if not os.path.exists("Data/Html_Data/{}".format(year)):
            os.makedirs("Data/Html_Data/{}".format(year))
        
        with open("Data/Html_Data/{}/{}.html".format(year,month),"wb") as output:
            output.write(text_utf)
            
    sys.stdout.flush()
        
if __name__=="__main__":
    start_time=time.time()
    retrieve_html()
    stop_time=time.time()
    print("Time taken: {}".format(stop_time-start_time))

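But whenever I run this script, only the data for month 12 is downloaded, and none of the other months. I checked with the author's code and it works perfectly, even though my code looks exactly the same as his. This is driving me crazy. Can anyone point out where I went wrong?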

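【Comments】:

Your code is not the same. Only four lines are indented under the for month in range(1,13) loop, whereas the author has ten.
Thanks! I am migrating from Java, so I completely forgot about indentation in Python.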

Tags: python data-science data-analysis

【Solution 1】:

The two scripts are not exactly the same: the indentation differs.
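To see the effect without touching the network, here is a minimal sketch that replaces the download with a list append. The function names dedented and nested are made up for illustration. In the dedented version the body runs once per year, after the inner loop has finished, so month is always 12.

```python
def dedented():
    """Body sits at the level of the outer loop, as in the question's code."""
    saved = []
    for year in range(2013, 2019):
        for month in range(1, 13):
            url_part = "{:02d}-{}".format(month, year)
        # Runs once per year, after the inner loop has ended,
        # so month and url_part hold their final values (month 12).
        saved.append((year, month, url_part))
    return saved


def nested():
    """Body sits inside the inner loop, as in the author's code."""
    saved = []
    for year in range(2013, 2019):
        for month in range(1, 13):
            url_part = "{:02d}-{}".format(month, year)
            # Runs once per (year, month) pair.
            saved.append((year, month, url_part))
    return saved


print(len(dedented()))  # 6 entries, all for month 12
print(len(nested()))    # 72 entries, one per month of each year
```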

【Discussion】:

【Solution 2】:

Well, you should indent this block one more level so that it sits inside the for month loop:

        texts=requests.get(url)
        text_utf=texts.text.encode("utf-8")
        
        if not os.path.exists("Data/Html_Data/{}".format(year)):
            os.makedirs("Data/Html_Data/{}".format(year))
        
        with open("Data/Html_Data/{}/{}.html".format(year,month),"wb") as output:
            output.write(text_utf)
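Put together, the corrected function might look like the sketch below. It is not the author's exact code: URL_TEMPLATE, fetch_html, and the fetch and base_dir parameters are introduced here for illustration, the {:02d} format spec stands in for the if/else on month<10, and os.makedirs(..., exist_ok=True) stands in for the os.path.exists check. The important part is that the download and the file write are both inside the inner loop.

```python
import os

# Assumed constant: the URL pattern from the question, with the month zero-padded.
URL_TEMPLATE = "http://en.tutiempo.net/climate/{:02d}-{}/ws-421820.html"


def fetch_html(url):
    """Download one page and return its text encoded as UTF-8 bytes."""
    # Imported here so the loop below can be exercised with a stand-in
    # fetch function even where requests is not installed.
    import requests

    response = requests.get(url)
    return response.text.encode("utf-8")


def retrieve_html(fetch=fetch_html, base_dir="Data/Html_Data"):
    for year in range(2013, 2019):
        year_dir = os.path.join(base_dir, str(year))
        os.makedirs(year_dir, exist_ok=True)
        for month in range(1, 13):
            url = URL_TEMPLATE.format(month, year)
            # Everything from here down is inside the inner loop,
            # so it runs once for every (year, month) pair.
            data = fetch(url)
            path = os.path.join(year_dir, "{}.html".format(month))
            with open(path, "wb") as output:
                output.write(data)
```

Calling retrieve_html() with no arguments performs the real downloads; passing a stand-in for fetch lets the loop structure be checked offline.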

【Discussion】:

【Solution 3】:

The code is correct apart from an indentation problem. The following code should be inside the inner for loop:

texts=requests.get(url)
text_utf=texts.text.encode("utf-8")

if not os.path.exists("Data/Html_Data/{}".format(year)):
    os.makedirs("Data/Html_Data/{}".format(year))

with open("Data/Html_Data/{}/{}.html".format(year,month),"wb") as output:
    output.write(text_utf)

And the following line should be in the outer for loop:

sys.stdout.flush()
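One way to confirm the fix is to check which monthly files are on disk after a run. This is a small sketch that assumes the directory layout from the question (Data/Html_Data/<year>/<month>.html); missing_months is a hypothetical helper, not part of the original code.

```python
import os


def missing_months(base_dir="Data/Html_Data", years=range(2013, 2019)):
    """Return the (year, month) pairs whose HTML file is absent under base_dir."""
    missing = []
    for year in years:
        for month in range(1, 13):
            path = os.path.join(base_dir, str(year), "{}.html".format(month))
            if not os.path.isfile(path):
                missing.append((year, month))
    return missing
```

Run against a fresh output directory, the question's script would leave months 1 to 11 missing for every year, while the corrected script should leave the list empty as long as every download succeeds.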

【Discussion】: