Extracting HTML Table Data Using Beautiful Soup (Answers)

【Question Title】: Extracting HTML Table data using Beautiful Soup
【Posted】: 2020-12-13 01:04:30
【Question】:

I want to use Beautiful Soup to extract all the brands from this page. So far, my program is:

from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup

def main():
    opts = Options()
    opts.headless = True
    assert opts.headless  # Operating in headless mode
    browser = Firefox(options=opts)
    browser.get('https://neighborhoodgoods.com/pages/brands')
    html = browser.page_source
    soup = BeautifulSoup(html, 'html.parser')

    brand = []
    for tag in soup.find('table'):
        brand.append(tag.contents.text)
    print(brand)

    browser.close()
    print('This program is terminated.')

I'm struggling to figure out the right tags to use, since the data is nested inside tr/td. Any suggestions? Thanks a lot!

【Question Comments】:

What is the expected output?
Do you also want the data under the brandlistRight class (the descriptions)? Or just the company names?

Tags: python html beautifulsoup

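【Solution 1】:

If I understand your question correctly, you only want the company names (the first <td> of each table).

Try the CSS selector td:nth-of-type(1). It matches every <td> that is the first <td> among its siblings, i.e. the first cell of each row (and therefore the first cell of each table when a table holds a single row).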

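import requests
from bs4 import BeautifulSoup

URL = "https://neighborhoodgoods.com/pages/brands"
# Fetch the page and parse the raw response bytes.
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# Print the text of every <td> that is the first <td> in its row.
print([tag.text for tag in soup.select("td:nth-of-type(1)")])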

Output:

['A.N Other', 'Act + Acre', ...And on.. , 'Wild One']
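
To see what td:nth-of-type(1) matches without hitting the network, here is a minimal sketch run against a made-up HTML snippet. The markup, the placement of the brandlistRight class on the second cell, and the placeholder descriptions are all assumptions for illustration; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for illustration (not copied from the real page).
SAMPLE_HTML = """
<table>
  <tr><td>A.N Other</td><td class="brandlistRight">placeholder description 1</td></tr>
  <tr><td>Act + Acre</td><td class="brandlistRight">placeholder description 2</td></tr>
</table>
<table>
  <tr><td>Wild One</td><td class="brandlistRight">placeholder description 3</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# td:nth-of-type(1) matches each <td> that is the first <td> among its siblings,
# so only the first cell of every row is selected.
names = [td.get_text(strip=True) for td in soup.select("td:nth-of-type(1)")]
print(names)
```

On this sample the second cell of each row is skipped, because it is the second <td> under its parent <tr>. get_text(strip=True) is used here instead of .text only to trim surrounding whitespace.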

【Comments】:
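
On the loop in the question: soup.find('table') returns only the first matching table, iterating over that Tag yields its direct children rather than rows or cells, and tag.contents is a plain list, so tag.contents.text raises an AttributeError. A minimal sketch of walking the tr/td nesting explicitly follows. The sample markup, the position of the brandlistRight class, and the placeholder descriptions are assumptions, not the real page's structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for illustration (not copied from the real page).
SAMPLE_HTML = """
<table>
  <tr><td>A.N Other</td><td class="brandlistRight">placeholder description 1</td></tr>
  <tr><td>Act + Acre</td></tr>
</table>
<table>
  <tr><td>Wild One</td><td class="brandlistRight">placeholder description 2</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

brands = []
for table in soup.find_all("table"):      # every table, not just the first one
    for row in table.find_all("tr"):      # each row nested inside the table
        cells = row.find_all("td")
        if not cells:
            continue                      # skip rows that contain no <td> cells
        name = cells[0].get_text(strip=True)
        # The description cell is optional in this sketch.
        desc_cell = row.find("td", class_="brandlistRight")
        description = desc_cell.get_text(strip=True) if desc_cell else None
        brands.append((name, description))

print(brands)
```

With the Selenium version from the question, the same loop would run on BeautifulSoup(browser.page_source, 'html.parser') in place of the sample string.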