Webscraper in Python: answers

【Question title】: Webscraper in Python
【Posted】: 2020-09-24 18:21:57
【Question description】:

My question is whether it is possible to get a number that sits inside a span like this:

<html junk>
 <div class="test">
     <span>
     55
     </span>
 </div>
</html junk>

As you can see, the span has no class or id.

My current code is just the default scraper boilerplate (user agent and URL removed):

import requests
from bs4 import BeautifulSoup

URL = ''

headers = {"User-Agent": ''}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

# Here is where the "55" should be found (the number will change over time, so I'm not looking for the literal value)
title = soup.find('')

print(title)

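【Question comments】:

So what is your question? What does your code do? What isn't working? Even with the redacted values filled in, it looks like it does nothing except search for a zero-length string.

Tags: python html web web-scraping

【Solution 1】:

If I understood your question correctly, you are trying to get the number between the two span tags? If so, you can do it like this.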

import requests
from bs4 import BeautifulSoup

URL = ''

headers = {"User-Agent": ''}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.text, 'html.parser')

# Here is where the "55" should be found (the number will change over time, so I'm not looking for the literal value)
title = soup.find('span').getText()

print(title)
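
Since the span in the question has no class or id, one option is to reach it through its parent div, which does have a class. The following is only a minimal sketch under that assumption: it parses the sample markup from the question as a string instead of fetching a real page, and it assumes the span's text is a plain integer.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; on a real page this string
# would come from requests.get(URL, headers=headers).text instead.
html = """
<div class="test">
    <span>
    55
    </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The span has no class or id, so select it via its parent: div.test -> span.
span = soup.select_one("div.test span")

# get_text(strip=True) removes the newlines and spaces around the number.
number = int(span.get_text(strip=True))
print(number)
```

Going through the parent's class keeps the lookup from matching some unrelated span that happens to appear earlier in the page.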

【Comments】:

I get an AttributeError: 'NoneType' object has no attribute 'getText'
Sorry, please replace .getText() with .text
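
A note on the error above: the message says the object is `NoneType`, which means `soup.find('span')` returned `None` because no span was found in the downloaded HTML. Accessing `.text` on `None` would raise the same kind of AttributeError, so it may be worth checking the result before reading its text. The sketch below illustrates that check; `first_span_text` is a hypothetical helper name, not part of BeautifulSoup, and the two input strings are made-up examples.

```python
from bs4 import BeautifulSoup


def first_span_text(html):
    """Return the stripped text of the first <span>, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span")
    if span is None:
        # No span in this document: report that instead of raising AttributeError.
        return None
    return span.get_text(strip=True)


# Markup that contains a span, and markup that does not.
print(first_span_text('<div class="test"><span> 55 </span></div>'))
print(first_span_text("<div>no span here</div>"))
```

If the helper returns `None` for the real page, printing `page.text` can show whether the span is present in the response at all. Possible reasons for it being absent include the content being filled in by JavaScript after load or the site serving a different page to the script, but which one applies here is not known from the thread.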