Test if an attribute is present in a tag in BeautifulSoup: answers

【Question title】: Test if an attribute is present in a tag in BeautifulSoup
【Posted】: 2011-06-28 06:42:51
【Question】:

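I would like to get all the <script> tags in a document and then process each one based on the presence (or absence) of certain attributes.

For example, for each <script> tag, if the for attribute is present, do something; otherwise, if the bar attribute is present, do something else.

Here is what I am doing currently: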

outputDoc = BeautifulSoup(''.join(output))
scriptTags = outputDoc.findAll('script', attrs = {'for' : True})

But this way I filter for all the <script> tags that have the for attribute, and I lose the other ones (those without the for attribute).

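【Comments】:

"But if ... in doesn't work"? What does that mean? A syntax error? What does "doesn't work" mean? Please be very specific about what went wrong.
Do you want to test whether an attribute is present in any tag, in all tags, or handle each occurrence of the tag separately?

Tags: python beautifulsoup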

【Solution 1】:

If I understand correctly, you just want all the script tags, and then want to check each of them for certain attributes?

scriptTags = outputDoc.findAll('script')
for script in scriptTags:
    if script.has_attr('some_attribute'):
        do_something()
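As a rough sketch of the branching the question asks for, the loop below visits every script tag once and dispatches on whichever attribute is present. The sample markup and the labels collected in `handled` are made up for illustration, and the built-in `html.parser` is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; "for" and "bar" stand in for the real attributes.
html = """
<script for="window" event="onload">a()</script>
<script bar="1">b()</script>
<script>c()</script>
"""

soup = BeautifulSoup(html, "html.parser")

handled = []
for script in soup.find_all("script"):
    if script.has_attr("for"):
        # The "for" attribute is present: handle this case first.
        handled.append(("for", script["for"]))
    elif script.has_attr("bar"):
        # No "for", but "bar" is present.
        handled.append(("bar", script["bar"]))
    else:
        # Neither attribute is present; the tag is still visited.
        handled.append(("neither", None))

print(handled)
```

Because the filtering happens inside the loop rather than in `find_all`, no script tag is lost and `find_all` is called only once.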

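【Comments】:

Can't I do something like: if 'some_attribute' in script? That is what I am after; I want to avoid calling findAll again and again...
To check for available attributes you have to use the Python dict methods, e.g. script.has_key('some_attribute')
How do I check whether a tag has any attributes at all? While tag.has_key('some_attribute') works fine, tag.keys() raises an exception ('NoneType' object is not callable).
Please update this post; has_key is deprecated. Use has_attr instead.
Unfortunately this did not work for me. Maybe something like soup_response.find('err').string is not None could also be used for other attributes...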
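On the questions raised in these comments, a minimal sketch assuming Beautiful Soup 4: a tag's attributes live in the plain dict `tag.attrs`, so ordinary dict operations apply there. By contrast, `in` applied to the tag itself looks at the tag's children, not its attributes. The markup is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one script with attributes, one without.
soup = BeautifulSoup(
    '<script src="a.js" defer></script><script>x()</script>',
    "html.parser",
)
first, second = soup.find_all("script")

# Does the tag have any attributes at all? An empty dict is falsy.
first_has_any = bool(first.attrs)
second_has_any = bool(second.attrs)

# Attribute names come from the dict, not from the tag object itself.
first_names = sorted(first.attrs)

# Membership test against the attribute dict.
src_in_attrs = "src" in first.attrs

# Membership test against the tag checks its contents (children) instead.
src_in_tag = "src" in first

print(first_has_any, second_has_any, first_names, src_in_attrs, src_in_tag)
```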

【Solution 2】:

For future reference: has_key is deprecated in BeautifulSoup 4. You now need to use has_attr.

scriptTags = outputDoc.find_all('script')
for script in scriptTags:
    if script.has_attr('some_attribute'):
        do_something()

【Comments】:

【Solution 3】:

You don't need any lambdas to filter by attribute; you can simply pass some_attribute=True to find or find_all.

script_tags = soup.find_all('script', some_attribute=True)

# or

script_tags = soup.find_all('script', {"some-data-attribute": True})
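Applied to the original question, this could look like the sketch below. Since `for` is a reserved word in Python, it cannot be written as a keyword argument and has to go through the `attrs` dict. The complementary set is selected here with the CSS `:not()` pseudo-class, which assumes a Beautiful Soup version with selector support via Soup Sieve. The markup is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
html = (
    '<script for="w">a()</script>'
    '<script bar="1">b()</script>'
    '<script>c()</script>'
)
soup = BeautifulSoup(html, "html.parser")

# Script tags that carry a "for" attribute ("for" must be passed via attrs).
with_for = soup.find_all("script", attrs={"for": True})

# Script tags that do not carry a "for" attribute.
without_for = soup.select("script:not([for])")

print(len(with_for), len(without_for))
```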

Here are more examples of other approaches:

import bs4

soup = bs4.BeautifulSoup(html, "html.parser")

# Find all with a specific attribute

tags = soup.find_all(src=True)
tags = soup.select("[src]")

# Find all meta with either name or http-equiv attribute.

soup.select("meta[name],meta[http-equiv]")

# find all tags with a name or src attribute.

soup.select("[name], [src]")

# find first/any script with a src attribute.

tag = soup.find('script', src=True)
tag = soup.select_one("script[src]")

# find all tags with a name attribute beginning with foo
# or a src beginning with /path (quoted, since /path is not a CSS identifier)
soup.select('[name^=foo], [src^="/path"]')

# find all tags with a name attribute that contains foo
# or a src containing whatever
soup.select("[name*=foo], [src*=whatever]")

# find all tags with a name attribute that ends with foo
# or a src that ends with whatever
soup.select("[name$=foo], [src$=whatever]")
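To make the three attribute operators concrete, here is a small sketch run against made-up markup. The URLs are hypothetical, and the values are quoted because they contain characters that are not valid in an unquoted CSS identifier.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two external scripts and one inline script.
html = (
    '<script src="/static/app.min.js"></script>'
    '<script src="https://cdn.example.com/lib.js"></script>'
    '<script>inline()</script>'
)
soup = BeautifulSoup(html, "html.parser")

# ^= : attribute value starts with the given string.
starts = soup.select('script[src^="/static"]')

# *= : attribute value contains the given string.
contains = soup.select('script[src*="cdn"]')

# $= : attribute value ends with the given string.
ends = soup.select('script[src$=".min.js"]')

print([t["src"] for t in starts + contains + ends])
```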

You can also use regular expressions with find or find_all:

import re
# starting with
soup.find_all("script", src=re.compile("^whatever"))
# contains
soup.find_all("script", src=re.compile("whatever"))
# ends with 
soup.find_all("script", src=re.compile("whatever$"))
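A short sketch of the regular expression variant on invented markup. One thing a compiled pattern makes easy is case-insensitive matching through `re.IGNORECASE`. Tags that lack the attribute are simply not matched.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; the inline script has no src attribute at all.
html = (
    '<script src="/static/app.min.js"></script>'
    '<script src="https://cdn.example.com/Lib.JS"></script>'
    '<script>inline()</script>'
)
soup = BeautifulSoup(html, "html.parser")

# src starting with /static
starts = soup.find_all("script", src=re.compile(r"^/static"))

# src ending in .js, ignoring case
ends_js = soup.find_all("script", src=re.compile(r"\.js$", re.IGNORECASE))

print([t["src"] for t in starts], [t["src"] for t in ends_js])
```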

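【Comments】:

I agree this should be the accepted answer. I simplified the main example to make it stand out more.

【Solution 4】:

If you only need to get the tag(s) that have a given attribute, you can use a lambda: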

soup = bs4.BeautifulSoup(YOUR_CONTENT)

Tags with the attribute

tags = soup.find_all(lambda tag: 'src' in tag.attrs)

or

tags = soup.find_all(lambda tag: tag.has_attr('src'))

A specific tag with the attribute

tag = soup.find(lambda tag: tag.name == 'script' and 'src' in tag.attrs)
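The same idea extends to conditions that a single keyword filter cannot express, such as "has one attribute but not another". The sketch below uses a named function instead of a lambda; the function name and the markup are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
html = (
    '<script for="w" bar="1">a()</script>'
    '<script for="w">b()</script>'
    '<script bar="1">c()</script>'
)
soup = BeautifulSoup(html, "html.parser")


def script_with_for_only(tag):
    """Match script tags that have a "for" attribute but no "bar" attribute."""
    return (
        tag.name == "script"
        and tag.has_attr("for")
        and not tag.has_attr("bar")
    )


# find_all calls the function once per tag and keeps those returning True.
matches = soup.find_all(script_with_for_only)

print(len(matches))
```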

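and so on...

Thought it might be useful.

【Comments】:

Elegant solution!

【Solution 5】:

You can check whether a given attribute is present:

scriptTags = outputDoc.findAll('script', some_attribute=True)
for script in scriptTags:
    do_something()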

【Comments】:

【Solution 6】:

Using the pprint module, you can inspect the contents of an element.

from pprint import pprint

pprint(vars(element))

Using it on a bs4 element prints something like this:

{'attrs': {u'class': [u'pie-productname', u'size-3', u'name', u'global-name']},
 'can_be_empty_element': False,
 'contents': [u'\n\t\t\t\tNESNA\n\t'],
 'hidden': False,
 'name': u'span',
 'namespace': None,
 'next_element': u'\n\t\t\t\tNESNA\n\t',
 'next_sibling': u'\n',
 'parent': <h1 class="pie-compoundheader" itemprop="name">\n<span class="pie-description">Bedside table</span>\n<span class="pie-productname size-3 name global-name">\n\t\t\t\tNESNA\n\t</span>\n</h1>,
 'parser_class': <class 'bs4.BeautifulSoup'>,
 'prefix': None,
 'previous_element': u'\n',
 'previous_sibling': u'\n'}

To access an attribute, say the class list, use the following:

class_list = element.attrs.get('class', [])

You can filter elements with this approach:

for script in soup.find_all('script'):
    if script.attrs.get('for'):
        # ... has a non-empty 'for' attribute
        pass
    elif "myClass" in script.attrs.get('class', []):
        # ... has class "myClass"
        pass
    else:
        # ... do something else
        pass
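One caveat with `attrs.get(...)` as a presence test: it checks the value's truthiness, so an attribute that is present but empty is treated as missing. The sketch below assumes the built-in `html.parser`, which stores a valueless attribute such as `async` as an empty string. The markup is made up.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a valueless (boolean) attribute.
soup = BeautifulSoup('<script async src="a.js"></script>', "html.parser")
script = soup.find("script")

# Truthiness of the value: an empty string counts as False.
truthy = bool(script.attrs.get("async"))

# Presence of the key: True regardless of the value.
present = "async" in script.attrs

# has_attr is also a presence test.
has_it = script.has_attr("async")

print(truthy, present, has_it)
```

If the goal is to detect presence rather than a non-empty value, `'for' in script.attrs` or `script.has_attr('for')` is the safer test.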

【Comments】:

【Solution 7】:

A simple way to select what you need.

outputDoc.select("script[for]")

【Comments】: