How to prioritize a condition over another? [duplicate]

【Question title】: How to prioritize a condition over another? [duplicate]
【Posted】: 2018-04-16 14:55:40
【Question】:

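I've written a script that, for each web page, parses the link whose visible text contains "contact" or "about". However, when I run it, I can see that my scraper always ends up parsing the link under "about"; it only parses the link under "contact" when "about" is not available. How can I make my script do the opposite, i.e. look for the link connected to "contact" rather than "about", and only parse "about" if "contact" is not available? I tried the approach below to accomplish this, but it behaves exactly as I described.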

Here is my attempt:

import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

links = (
    "http://www.mount-zion.biz/",
    "http://www.latamcham.org/",
    "http://www.innovaprint.com.sg/",
    "http://www.cityscape.com.sg/"
    )

def Get_Link(site):
    res = requests.get(site)
    soup = BeautifulSoup(res.text,"lxml")
    for item in soup.select("a[href]"):
        if "contact" in item.text.lower():
            abslink = urljoin(site,item['href']) ##I thought the script prioritizes the first condition but I am wrong
            print(abslink)
            break
        else:
            if "about" in item.text.lower():
                abslink = urljoin(site,item['href'])
                print(abslink)
                break

if __name__ == '__main__':
    for link in links:
        Get_Link(link)
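
The if/else inside the loop is evaluated once per anchor, in document order, so "contact" only wins when a contact link appears on the page before any about link; the else branch does not give it priority. A minimal sketch, assuming made-up anchors given as (visible text, href) tuples standing in for what soup.select("a[href]") yields (no network access), reproduces this; original_logic is a hypothetical name:

```python
# Hypothetical anchors as (visible text, href) pairs, in document order,
# standing in for what soup.select("a[href]") would yield on a real page.
anchors = [
    ("Home", "index.html"),
    ("About Us", "about.html"),
    ("Contact Us", "contact.html"),
]

def original_logic(anchors):
    """Mirror of the question's loop: stop at the first anchor that
    mentions either keyword, whichever comes first on the page."""
    for text, href in anchors:
        if "contact" in text.lower():
            return href  # plays the role of print + break
        else:
            if "about" in text.lower():
                return href
    return None  # neither keyword found

print(original_logic(anchors))  # about.html, because "About Us" comes first
```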

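Is there a way to prioritize conditions based on their availability? The bottom line is that I want the link connected to "contact". If it isn't available, then the script should look for the link connected to "about".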
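
One way to express "priority by availability" is to loop over the keywords in priority order and, for each keyword, scan every anchor on the page, so a later keyword is only tried when no anchor matches an earlier one. A minimal sketch, assuming the anchors are already extracted as (visible text, href) pairs; find_by_priority and the example.com URLs are hypothetical. With BeautifulSoup the pairs could be built as [(a.get_text(), a["href"]) for a in soup.select("a[href]")]:

```python
from urllib.parse import urljoin

def find_by_priority(base_url, anchors, keywords=("contact", "about")):
    """Return the absolute URL of the first anchor matching the
    highest-priority keyword that occurs anywhere on the page.

    anchors: iterable of (visible_text, href) pairs in document order.
    keywords: tried in order; a later keyword is only used when no
    anchor matches any earlier one.
    """
    anchors = list(anchors)  # scanned once per keyword, so materialize it
    for keyword in keywords:
        for text, href in anchors:
            if keyword in text.lower():
                return urljoin(base_url, href)
    return None  # none of the keywords occur on the page

page = [("Home", "index.html"), ("About Us", "about.html"), ("Contact Us", "contactus.html")]
print(find_by_priority("http://www.example.com/", page))  # http://www.example.com/contactus.html
```

Keeping the priorities in a tuple means adding a third fallback (say, "impressum") is a one-word change rather than another nested if.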

【Question discussion】:

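@jpp You marked this as a duplicate of a question about the difference between multiple if and elif statements, whereas my question is about prioritizing one condition over another. Is my post written in Hebrew?
No, but if it were, I might understand you better!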

Tags: python python-3.x if-statement web-scraping

【Solution 1】:

Don't use else; use a few separate if statements instead. Also see "what's the difference between if, elif and else".

Your function should look like this:

def Get_Link(site):
    res = requests.get(site)
    soup = BeautifulSoup(res.text,"lxml")
    for item in soup.select("a[href]"):
        if "contact" in item.text.lower() or "about" in item.text.lower():
            abslink = urljoin(site,item['href']) 
            print(abslink)
            break

You can't use break statements here, because they exit the loop block, so the second if never fires.
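
Whether the two tests are joined with or or split across if/else, a single pass that stops at the first match still returns whichever keyword appears first in document order, which is consistent with the asker's report in the discussion that the output did not change. A minimal sketch with hypothetical (visible text, href) anchors; or_logic is a made-up name:

```python
def or_logic(anchors):
    """Mirror of the answer's first suggestion: one test joined with `or`,
    then stop (here: return) at the first anchor matching either word."""
    for text, href in anchors:
        lowered = text.lower()
        if "contact" in lowered or "about" in lowered:
            return href
    return None

anchors = [("About Us", "about.html"), ("Contact Us", "contact.html")]
print(or_logic(anchors))  # about.html: the earlier anchor wins
```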

Also note that in Python the convention is to name methods/functions in snake_case, like my_function() or my_method(), and classes in CamelCase, like MyClass.

Edit:

OK, your code seems more complicated, because you are running a loop inside another loop. So basically you have a few options:

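First loop looking for "contact", and if that fails in every case, fall back to "about"
Put some flags in the code to control the if statements
Write it using functions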
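
The "flags" option can be done in a single pass: return a contact link as soon as one is seen, and remember the first about link as a fallback to use only if the loop finishes without finding contact. A minimal sketch, assuming anchors as hypothetical (visible text, href) pairs; find_with_fallback is a made-up name:

```python
def find_with_fallback(anchors):
    """Single pass over (text, href) pairs: return the first "contact"
    link immediately; remember the first "about" link as a fallback."""
    fallback = None  # acts as the flag: None until an "about" link is seen
    for text, href in anchors:
        lowered = text.lower()
        if "contact" in lowered:
            return href  # highest priority, no need to look further
        if fallback is None and "about" in lowered:
            fallback = href  # keep only the first "about" link
    return fallback  # None when neither keyword occurs

page = [("About Us", "about.html"), ("Contact Us", "contactus.html")]
print(find_with_fallback(page))  # contactus.html
```

Compared with two separate loops, this reads the anchors only once, at the cost of a little extra state.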

Or just hack it:

def Get_Link(site):
    res = requests.get(site)
    soup = BeautifulSoup(res.text,"lxml")
    for item in soup.select("a[href]"):
        if "contact" in item.text.lower():
            abslink = urljoin(site,item['href'])
            print(abslink)
            return 0 # Exit from function
    for item in soup.select("a[href]"):
        if "about" in item.text.lower():
            abslink = urljoin(site,item['href'])
            print(abslink)
            return 0
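
The two-pass hack above can also be split into small functions (the third option in the list) and exercised without a network request. A minimal sketch, assuming you want to test the logic offline: it swaps BeautifulSoup for the standard library's html.parser.HTMLParser, and AnchorCollector and get_link(site, html) are hypothetical names. In the real script, html would be requests.get(site).text:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class AnchorCollector(HTMLParser):
    """Collect (visible_text, href) for every <a href=...> in document order."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append(("".join(self._text), self._href))
            self._href = None

def get_link(site, html):
    """Look for "contact" across all anchors first; only if nothing
    matches, look for "about" (same idea as the two loops above)."""
    parser = AnchorCollector()
    parser.feed(html)
    for keyword in ("contact", "about"):
        for text, href in parser.anchors:
            if keyword in text.lower():
                return urljoin(site, href)
    return None

page = '<a href="about.html">About</a> <a href="contactus.html">Contact Us</a>'
print(get_link("http://www.example.com/", page))  # http://www.example.com/contactus.html
```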

【Discussion】:

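I tried your solution, but the result is still the same. If I comment out the part after the else block in the script above, then the script parses the "contact" link on every web page, but I want to keep both conditions active in my script so that if one is missing, the other is used. Any other suggestions to solve this? Thanks.
@Topto I've edited the answer. Try it now.
Yes, I saw it. I believe you haven't checked what results it produces. There is no difference between the results of my previous script and the script you suggested. Both scripts produce http://www.mount-zion.biz/index.html http://www.latamcham.org/about-us-2/ http://www.innovaprint.com.sg/about.html http://www.cityscape.com.sg/?page_id=27 whereas I expected the following: http://www.mount-zion.biz/contactus.html http://www.latamcham.org/contact-us/ http://www.innovaprint.com.sg/contact.html http://www.cityscape.com.sg/?page_id=37.
@Topto OK, I've checked it. As I suggested, the problem is your break statements: they take your program out of the for block, so the other if never fires.
Thanks for the edit. However, this time the results got really bad. I expected 4 links from the four sites mentioned above, but it gave me 13 links. On top of that, it returns every link connected to both "about" and "contact". If that's the case, I don't see the point of using a conditional statement here. I'm also still wondering how this question is similar to the one it was closed as a duplicate of. Thanks.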