Making multiple if statements less verbose: answers

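【Question title】: Making multiple if statements less verbose
【Posted】: 2014-01-31 16:49:18
【Question description】:

I am scraping a web page whose HTML markup does not use any useful classes or ids, so I am forced to collect every link and look for patterns in the hrefs. Here is what a sample of the HTML looks like: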

<span>Category</span><a href='example.com/link-about-a'>A</a>

On another page, we might have a different category:

<span>Category</span><a href='example.com/link-about-b'>B</a>

Using beautifulsoup4, my current solution looks like this:

def category(soup):
    for x in soup.find_all('a'):
        if 'link-about-a' in x['href']:
            return 'A'
        if 'link-about-b' in x['href']:
            return 'B'

and so on... but this is ugly.

I was wondering whether there is a way to make this less verbose,

for example by using a dictionary

categories = {'A': 'link-about-a', 'B': 'link-about-b'}

and reducing the whole thing to a single expression.

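【Comments on the question】:

How predictable are the patterns in the links? If substring matching is the only way to find the pattern, then Eric's solution is good. For something I only iterate over as key/value pairs, I would personally probably use a tuple of tuples rather than a dictionary, but that is a trivial difference. However, if you can reliably extract the pattern with something like a regular expression, then having a dictionary that maps patterns to categories would be the best approach.
@PeterDeGlopper The pattern is predictable and comes from a predefined list of categories (A, B, C ...), so you are right; I find the regular expression implementation more useful. Thanks.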
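The approach suggested in the comment above (extract the pattern with a regular expression, then look it up in a dictionary) could be sketched roughly as follows. The names `CATEGORY_BY_SLUG`, `LINK_PATTERN` and `category_from_hrefs` are hypothetical, and the sketch assumes every category link contains `link-about-<slug>`; it works on plain href strings so it does not depend on the parser.

```python
import re

# Hypothetical mapping from the slug captured in the URL to a category label.
CATEGORY_BY_SLUG = {'a': 'A', 'b': 'B'}

# Assumption: every category link contains "link-about-<slug>".
LINK_PATTERN = re.compile(r'link-about-([a-z0-9]+)')

def category_from_hrefs(hrefs):
    """Return the category of the first href with a known slug, else None."""
    for href in hrefs:
        match = LINK_PATTERN.search(href)
        if match and match.group(1) in CATEGORY_BY_SLUG:
            return CATEGORY_BY_SLUG[match.group(1)]
    return None
```

With Beautiful Soup this might be called as `category_from_hrefs(x['href'] for x in soup.find_all('a', href=True))`. The lookup is a single dictionary access per link, so adding a category only means adding one entry to the mapping.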

Tags: python filter beautifulsoup list-comprehension

【Solution 1】:

You just need another loop:

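def category(soup):
    # categories maps each label to the substring expected in its links
    for x in soup.find_all('a'):
        for k, v in categories.items():  # use .iteritems() on Python 2
            if v in x['href']:
                return k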

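Or, if you want a single expression:

# next() returns the first match, or None when no link matches
category = next((
    k for x in soup.find_all('a')
      for k, v in categories.items()  # use .iteritems() on Python 2
      if v in x['href']
), None)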
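To see how the single expression behaves without a parsed page, here is a minimal sketch that applies the same generator to a plain list of href strings. The helper name `first_category` and its `default` parameter are made up for illustration; `categories` is the dictionary from the question.

```python
categories = {'A': 'link-about-a', 'B': 'link-about-b'}

def first_category(hrefs, default=None):
    # Outer loop over links, inner loop over categories: the first link
    # that matches any category decides the result.
    return next(
        (k for href in hrefs
           for k, v in categories.items()
           if v in href),
        default,
    )
```

Because the links form the outer loop, the result follows the order of the links on the page rather than the order of the dictionary. The second argument to `next()` is what prevents a `StopIteration` when nothing matches.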

【Comments】:

【Solution 2】:

Using regular expressions with a list of categories might be a bit more flexible:

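import re

# Each entry pairs a compiled pattern with the category it identifies.
categories = [(re.compile('link-about-a'), 'A'),
              (re.compile('link-about-b'), 'B')]

def category(soup):
    for x in soup.find_all('a'):  # find_all is the preferred bs4 name for findAll
        for expression, description in categories:
            if expression.search(x['href']):
                return description
    return None  # no link matched any category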
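One place where the extra flexibility of regular expressions may pay off is avoiding false positives: a plain substring test for `link-about-a` also matches `link-about-ab`. The sketch below adds a negative lookahead to each pattern to guard against that. The names `CATEGORY_PATTERNS` and `match_category` are hypothetical, and the assumption that a slug ends where word characters and hyphens end is mine, not the original poster's.

```python
import re

# Assumption: a slug is followed by the end of the URL or by a character
# that is neither a word character nor a hyphen (for example "/" or "?").
CATEGORY_PATTERNS = (
    (re.compile(r'link-about-a(?![\w-])'), 'A'),
    (re.compile(r'link-about-b(?![\w-])'), 'B'),
)

def match_category(href):
    """Return the label of the first pattern found in href, else None."""
    for pattern, label in CATEGORY_PATTERNS:
        if pattern.search(href):
            return label
    return None
```

Whether this stricter matching is needed depends on how the real URLs are formed; for a fixed list of non-overlapping slugs the simpler patterns above would behave the same.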

【Comments】: