【Question title】: Unable to let my script keep trying for few times when a condition is not met
【Posted】: 2019-10-22 23:50:19
【Question】:

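I've created a script in Python to fetch the titles of certain posts from different links of a webpage. The problem is that the webpage I'm working with sometimes doesn't give me a valid response, but I do get a valid response when I try two or three times.

I've been trying to create a loop so that the script checks whether the title I defined is empty. If the title is empty, the script should keep retrying 4 times to see whether it can succeed. After the fourth attempt on a link, however, the script should move on to the next link and repeat the same process until all the links are exhausted.

This is my attempt so far: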

import time
import requests
from bs4 import BeautifulSoup

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4"
    ]
counter = 0

def fetch_data(link):
    global counter
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    try:
        title = soup.select_one("p.tcode").text
    except AttributeError: title = ""

    if not title:
        while counter<=4:
            time.sleep(1)
            print("trying {} times".format(counter))
            counter += 1
            fetch_data(link)
    else:
        counter = 0

    print("tried with this link:",link)

if __name__ == '__main__':
    for link in links:
        fetch_data(link)

This is the output I currently see in the console:

trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4

My expected output:

trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3
trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4

P.S. I deliberately used a wrong selector in my script so that it meets the condition I've defined above.

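How can I make my script keep trying each link a few times when the condition is not met?

【Comments】:

  • @QHarr's answer will give you what you want. But now I'm just curious: what do you mean by not getting a valid response?

Tags: python python-3.x web-scraping conditional-statements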


【Solution 1】:

I would rearrange your code as follows.

import time
import requests
from bs4 import BeautifulSoup

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4"
    ]

def fetch_data(link):
    global counter  # shared retry counter, reset for each link in the main loop
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    try:
        title = soup.select_one("p.tcode").text
    except AttributeError: title = ""

    # Empty title: retry recursively until the shared counter exceeds 4
    if not title:
        while counter<=4:
            time.sleep(1)
            print("trying {} times".format(counter))
            counter += 1
            fetch_data(link)

if __name__ == '__main__':
    for link in links:
        counter = 0  # reset before each link so every link gets its own retries
        fetch_data(link)
        print("tried with this link:",link)
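The same retry-per-link idea can also be written without the global counter or the recursion. The sketch below is not part of the original answer. The name `fetch_with_retries` and its `get_title`, `max_retries`, and `delay` parameters are hypothetical, and the network call is replaced by a stand-in function so the retry logic can be run on its own.

```python
import time


def fetch_with_retries(link, get_title, max_retries=5, delay=1.0):
    """Call get_title(link) until it returns a non-empty title.

    Makes one initial attempt plus up to max_retries retries and
    returns the title, or "" if every attempt came back empty.
    """
    title = get_title(link)
    for attempt in range(max_retries):
        if title:
            break
        time.sleep(delay)
        print("trying {} times".format(attempt))
        title = get_title(link)
    return title


if __name__ == "__main__":
    # Hypothetical stand-in for the real request: it always returns an
    # empty title, mimicking the deliberately wrong selector in the question.
    def always_empty(link):
        return ""

    links = ["page=2", "page=3"]  # placeholder links for the demo
    for link in links:
        fetch_with_retries(link, always_empty, delay=0)
        print("tried with this link:", link)
```

In a real run, `get_title` would be a function that performs the `requests.get(link)` call and the `soup.select_one(...)` lookup from the code above and returns the title text, or an empty string when the element is missing. Because the attempt count is a local loop variable, each link starts from zero without any reset in the caller.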

【Comments】:

  • Is this what you expected?
  • Sorry @QHarr, if I followed your logic, the use case would get complicated. Take a look at the simpler one. Thanks.