【Question title】: Scraping issue: my loop doesn't keep going after an error
【Posted】: 2021-08-09 07:55:57
【Question description】:
import requests
from bs4 import BeautifulSoup
import string

try:
    for c in string.ascii_lowercase:
        URL = 'https://www.colonialzone-dr.com/'+c+'-dominicanismos-dictionary'
        page = requests.get(URL)
        soup = BeautifulSoup(page.content, 'html.parser')
        main_div = soup.find('div', attrs={"class": "entry-content"})

        words = main_div.find_all("p")
        for word in words:
            print(word.text)
except:
    print("No Vocabulary Avaliable for " + c)
    pass

The page for x doesn't exist, so the loop stops there, but I want it to keep going and fetch the pages for y and z.

【Comments】:

    Tags: python css arrays web-scraping


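    【Solution 1】:

    You need to move the try/except inside the loop, so that an error on one letter only skips that letter instead of ending the whole loop.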

    
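    import string

    import requests
    from bs4 import BeautifulSoup

    for c in string.ascii_lowercase:
        URL = 'https://www.colonialzone-dr.com/'+c+'-dominicanismos-dictionary'
        # The try/except sits inside the loop, so a failure on one letter
        # does not end the loop for the remaining letters.
        try:
            page = requests.get(URL)
            soup = BeautifulSoup(page.content, 'html.parser')
            main_div = soup.find('div', attrs={"class": "entry-content"})
            words = main_div.find_all("p")
            for word in words:
                print(word.text)
        except Exception:
            print("No Vocabulary Available for " + c)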
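The effect of where the try/except sits can be seen without touching the network. The following is only a minimal sketch: `risky` is a hypothetical stand-in for fetching and parsing one page, and it is assumed to fail for the letter x.

```python
def risky(item):
    """Hypothetical stand-in for fetching and parsing one page; fails for 'x'."""
    if item == "x":
        raise ValueError("no page for " + item)
    return item.upper()


def try_outside_loop(items):
    """One try around the whole loop: the first error ends the loop."""
    done = []
    try:
        for item in items:
            done.append(risky(item))
    except ValueError:
        pass
    return done


def try_inside_loop(items):
    """One try per iteration: a failing item is skipped and the loop goes on."""
    done = []
    for item in items:
        try:
            done.append(risky(item))
        except ValueError:
            continue
    return done


print(try_outside_loop("wxyz"))  # ['W']
print(try_inside_loop("wxyz"))   # ['W', 'Y', 'Z']
```

With the try outside the loop, nothing after x is processed; with the try inside, only x is skipped.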
    

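    【Comments】:

    • Same result: it stopped after the w page and didn't continue.
    • Traceback (most recent call last): File "C:/Users/Mehki/PycharmProjects/SideProjects/scrape.py", line 16, in <module> words = main_div.find_all("p") AttributeError: 'NoneType' object has no attribute 'find_all'
    • You should check whether main_div actually contains anything.
    • I thought the error was in fetching the page. I have also put the try around all the other code.
    • But you can also use if main_div: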
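The check suggested in the comments could look roughly like the sketch below. It relies on `soup.find` returning `None` when nothing matches, which is what the traceback above points to. The helper name `extract_words` and the inline HTML strings are hypothetical and only stand in for the downloaded pages.

```python
from bs4 import BeautifulSoup


def extract_words(html):
    """Return the text of every <p> inside div.entry-content, or [] if the div is missing."""
    soup = BeautifulSoup(html, "html.parser")
    main_div = soup.find("div", attrs={"class": "entry-content"})
    if main_div is None:  # soup.find returns None when nothing matches
        return []
    return [p.text for p in main_div.find_all("p")]


# Inline HTML stands in for downloaded pages (hypothetical sample data).
page_with_words = '<div class="entry-content"><p>word one</p><p>word two</p></div>'
page_without_div = "<html><body><h1>Not found</h1></body></html>"

print(extract_words(page_with_words))
print(extract_words(page_without_div))
```

Inside the loop from the answer, an empty result for a letter would then be the signal to print the "no vocabulary" message, with no exception handling needed for the missing div.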
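    【Solution 2】:

    Put the try/except block around the code that uses main_div. When main_div cannot be located, execution jumps to the except branch and prints the corresponding letter.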

    import string

    import requests
    from bs4 import BeautifulSoup

    for i in string.ascii_lowercase:
        html = requests.get("https://www.colonialzone-dr.com/" + i + "-dominicanismos-dictionary").text
        soup = BeautifulSoup(html, "html.parser")
        try:
            main_div = soup.find('div', attrs={"class": "entry-content"})
            # main_div is None when the page has no such div, so the next
            # line raises AttributeError and control moves to the except.
            words = main_div.find_all("p")
            for word in words:
                print(word.text)
        except AttributeError:
            print("could not find for letter", i)
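Request failures can also be handled separately from the missing div. The sketch below is one possible way to do that, not something taken from the thread: the helper names `build_url` and `fetch_html` and the 10-second timeout are assumptions, and whether this site answers with an error status for a missing letter is not confirmed here. `requests.get` does not raise on a 404 by itself, so `raise_for_status()` is called to turn error statuses into exceptions.

```python
import requests

BASE_URL = "https://www.colonialzone-dr.com/"  # site from the question


def build_url(letter):
    """Return the dictionary page URL for one letter, built as in the question."""
    return BASE_URL + letter + "-dominicanismos-dictionary"


def fetch_html(url, timeout=10):
    """Return the page HTML, or None if the request fails or returns an error status."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    except requests.exceptions.RequestException as error:
        # RequestException is the base class for connection errors, timeouts,
        # invalid URLs and HTTP error statuses raised by requests.
        print("Request failed for", url, "->", error)
        return None
    return response.text
```

In the loop, `html = fetch_html(build_url(i))` followed by `if html is None: continue` would skip a letter whose page cannot be downloaded, leaving the main_div handling to deal only with pages that did load.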
    
