Attribute error on BeautifulSoup with Python (web scraping): answers

【Question title】: Attribute error on BeautifulSoup with Python (web scraping)
【Posted】: 2019-11-10 18:59:31
【Question description】:

I am following a tutorial on web scraping with Python, and so far I have this:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.de/JBL-Charge-Bluetooth-Lautsprecher-Schwarz-integrierter/dp/B07HGHRYCY/ref=sr_1_2_sspa?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&keywords=jbl+charge+4&qid=1562775856&s=gateway&sr=8-2-spons&psc=1'
headers = {
    "User-Agent": 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Mobile Safari/537.36'}
page = requests.get(URL,headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
title = soup.find(id="productTitle").get_text()
print(title.strip())

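I am trying to print the name of a product from Amazon, but whenever I call the get_text() method from the BeautifulSoup library I get this error: AttributeError: 'NoneType' object has no attribute 'get_text'. How can I print the product name successfully?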

【Comments on the question】:

Tags: python web-scraping beautifulsoup amazon-product-api

【Solution 1】:

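get_text() fails because your selector did not find a matching element and returned None instead, so you are calling get_text() on None, which has no such method. I'm not sure why id="productTitle" does not match, since judging by the HTML it should. However, you can use a different selector and grab the div above it to get a similar result: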

title = soup.find(id="title").get_text()
print(title.strip())

The output is:

"JBL Charge 4 Bluetooth-Lautsprecher in Schwarz, Wasserfeste, portable Boombox mit integrierter Powerbank, Mit nur einer Akku-Ladung bis zu 20 Stunden kabellos Musik streamen"
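
Because find() returns None when nothing matches, a more defensive version can check the result before calling get_text(). The following is only a minimal sketch and not part of the original answer: the helper name `extract_title`, its default id order, and the inline sample markup are all made up for illustration. It tries the two ids mentioned in this thread, `productTitle` first and `title` as a fallback.

```python
from bs4 import BeautifulSoup


def extract_title(html, ids=("productTitle", "title")):
    """Return the stripped text of the first element whose id matches, else None.

    The helper name and the default id order are illustrative assumptions.
    """
    soup = BeautifulSoup(html, "html.parser")
    for element_id in ids:
        element = soup.find(id=element_id)
        # find() returns None when no element matches, so check before get_text()
        if element is not None:
            return element.get_text().strip()
    return None


# Hypothetical sample markup standing in for a downloaded page
sample_with_title = '<div id="title"><span id="productTitle">  JBL Charge 4  </span></div>'
sample_without_title = "<html><body><p>Please enter the characters below</p></body></html>"

print(extract_title(sample_with_title))     # JBL Charge 4
print(extract_title(sample_without_title))  # None
```

With the code from the question, the helper would be called as `extract_title(page.text)`. A `None` result then means that neither id is present in the HTML that was actually downloaded, which may differ from what the browser shows.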

【Discussion】:

【Solution 2】:

Try the following:

title = soup.find('span', id="productTitle").get_text()

This should work.
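
Note that passing the tag name only narrows the search. If the parsed HTML contains no span with that id, find() still returns None and the same AttributeError appears. A small sketch on hypothetical inline markup (not the real Amazon page) illustrates both cases:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page may differ
html_present = '<span id="productTitle"> JBL Charge 4 </span>'
html_absent = '<div id="other">no title here</div>'

found = BeautifulSoup(html_present, "html.parser").find("span", id="productTitle")
missing = BeautifulSoup(html_absent, "html.parser").find("span", id="productTitle")

print(found.get_text().strip())  # JBL Charge 4
print(missing)                   # None

# Calling get_text() on the missing result reproduces the error from the question
try:
    missing.get_text()
except AttributeError as exc:
    print(exc)
```

Printing `page.text` (or searching it for `productTitle`) is a quick way to see which of the two cases applies to the real response.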

【Discussion】: