How to determine these elements of html? (Answers)

【Question title】: How to determine these elements of html?
【Posted】: 2020-07-27 06:28:01
【Question description】:

In this answer, @Andrej Kesely uses the following code to remove unnecessary elements (ads, large blank spaces, ...) from the html of this url.

import requests
from bs4 import BeautifulSoup

url = 'https://www.collinsdictionary.com/dictionary/french-english/aimer'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

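# Remove every <script> tag, every element with class "hcdcrt",
# and the two elements whose ids are ad_contentslot_1 and ad_contentslot_2.
for script in soup.select('script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2'):
    script.extract()

# Print the text of the first <h2>, then the inner HTML of the first element with class "hom".
print(soup.h2.text)
print(''.join(map(str, soup.select_one('.hom').contents)))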

It seems to me that those unnecessary elements are identified by script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2.
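
For readers unfamiliar with the notation, the string passed to soup.select is a comma-separated group of CSS selectors: a bare name matches a tag, a leading dot matches a class attribute, and a leading hash matches an id attribute. The sketch below only labels each part of that string. The helper name describe_selector_group is hypothetical, and it assumes simple selectors only (no combinators, attribute selectors, or pseudo-classes).

```python
def describe_selector_group(group):
    """Split a comma-separated selector group and label each simple selector.

    Hypothetical helper for illustration; handles only plain tag, .class and #id forms.
    """
    described = []
    for part in group.split(','):
        selector = part.strip()
        if selector.startswith('#'):
            kind = 'id'      # matches the element whose id attribute equals the name
        elif selector.startswith('.'):
            kind = 'class'   # matches elements whose class attribute contains the name
        else:
            kind = 'tag'     # matches elements with that tag name
        described.append((selector, kind, selector.lstrip('#.')))
    return described


group = 'script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2'
for selector, kind, name in describe_selector_group(group):
    print(f'{selector!r}: {kind} selector for {name!r}')
```

So when inspecting the page with F12, the things to look for are the tag name, the class attribute, and the id attribute of the element you want to remove.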

Could you please elaborate on how to inspect the html structure (by pressing F12) to pin them down?

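【Comments】:

One way is to right-click the page in Chrome and inspect it, or to use livedom.validator.nu or any other online service to visualize the html DOM.
Thank you so much @bigbounty! I got your idea. If you don't mind, please take a look at this question.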

Tags: html python-3.x beautifulsoup web-crawler

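【Solution 1】:

@bigbounty's comment solved my problem. I am posting it here to remove my question from the unanswered list.

One way is to right-click the page in Chrome and inspect it, or to use livedom.validator.nu or any other online service to visualize the html DOM.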
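
As a possible complement to inspecting the DOM by eye, the ids and classes present in a page can also be listed programmatically, which makes names such as ad_contentslot_1 easy to spot. This is only a minimal sketch: it uses the standard library's html.parser instead of BeautifulSoup, the class name AttributeInventory is hypothetical, and sample_html is a made-up fragment that reuses the names from the question rather than the real page.

```python
from collections import Counter
from html.parser import HTMLParser


class AttributeInventory(HTMLParser):
    """Collect tag names, id values and class names seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.ids = []
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        for name, value in attrs:
            if not value:
                continue
            if name == 'id':
                self.ids.append(value)
            elif name == 'class':
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())


# Hypothetical fragment for illustration only; not the actual page source.
sample_html = """
<div class="hom">aimer</div>
<div id="ad_contentslot_1" class="hcdcrt"></div>
<script>var x = 1;</script>
<div id="ad_contentslot_2" class="hcdcrt"></div>
"""

inventory = AttributeInventory()
inventory.feed(sample_html)
inventory.close()

print(inventory.ids)
print(inventory.classes.most_common())
print(inventory.tags['script'])
```

On a real page, the same inventory could be fed with the text returned by requests.get; ids or classes that contain "ad" or that repeat on empty containers are reasonable candidates to check in the browser's element inspector before adding them to the selector.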

【Discussion】: