Scraping with Python from a specific location

【Question title】: Scraping with python from specific location
【Posted】: 2017-05-14 17:03:16
【Question】:

I'm trying to scrape some information from a website. Please keep in mind that I'm new to Python.

My current code looks like this:

from lxml import html
import requests

page1 = requests.get('snip')
page2 = requests.get('snip')
page3 = requests.get('snip')
page4 = requests.get('snip')

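# The original referenced an undefined name `page`; parse one of the fetched responses instead.
tree = html.fromstring(page1.content)

I need to extract the number (currently 37) from this markup: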

<div class='count col-xs-4'>
<p><strong>37</strong> <br class='hidden-md hidden-lg'/>followers</p>
</div>

But I'm not sure how to do that. Can anyone help me with this?

【Comments】:

Tags: python web web-crawler screen-scraping

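【Solution 1】:

You can do this with BeautifulSoup (bs4), among many other tools. Give it a try; it is easy to pick up by following a tutorial. If you still get stuck, I can help you further.

【Comments】: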

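OK, I'm using bs4 now, but I have a problem. The markup renders as `4 posts, 37 followers, 470 following`. I need to extract the numbers for following, followers and so on, but they have no ID and they all share the same class.
But that's perfect! Use soup.find_all("div", "count col-xs-4") and iterate over the resulting list. Use the .contents attribute of each result to get at the text inside it.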
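
The find_all approach from the comment above could look roughly like the sketch below. It is only an illustration: the sample markup is modelled on the snippet in the question, the "posts" and "following" blocks are assumed to have the same structure as the "followers" one, and `extract_counts` is a hypothetical helper name. It uses get_text() rather than .contents because that returns the text directly.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the "posts" and "following"
# blocks are an assumption about what the real page contains.
SAMPLE_HTML = """
<div class='count col-xs-4'>
<p><strong>4</strong> <br class='hidden-md hidden-lg'/>posts</p>
</div>
<div class='count col-xs-4'>
<p><strong>37</strong> <br class='hidden-md hidden-lg'/>followers</p>
</div>
<div class='count col-xs-4'>
<p><strong>470</strong> <br class='hidden-md hidden-lg'/>following</p>
</div>
"""


def extract_counts(markup):
    """Map each label (e.g. 'followers') to its number (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    counts = {}
    # All counters share the same class string, so collect them all
    # and tell them apart by the label text that follows the number.
    for div in soup.find_all("div", class_="count col-xs-4"):
        text = div.get_text(" ", strip=True)       # e.g. "37 followers"
        number, label = text.split(maxsplit=1)
        counts[label] = int(number)
    return counts


counts = extract_counts(SAMPLE_HTML)
print(counts["followers"])
```

Keying the result by label sidesteps the missing IDs: the blocks are indistinguishable by class, but each one carries its own label text.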

【Solution 2】:

You can use XPath to get the information. The following should work.

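# Parse the HTML of the first fetched page
tree = html.fromstring(page1.text)
# xpath() returns a list of strings, one per matching <strong> element
number = tree.xpath('//*[@class="count col-xs-4"]/p/strong/text()')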
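
Because every counter on the page shares the same class, the expression above would return one string per counter. One possible way to pick out a single counter is to also filter on the label text, as in the sketch below. The sample markup is an assumption modelled on the question, and `count_for` is a hypothetical helper name, not part of lxml.

```python
from lxml import html

# Sample document modelled on the question; the extra counters are assumed.
SAMPLE_HTML = """
<html><body>
<div class='count col-xs-4'>
<p><strong>4</strong> <br class='hidden-md hidden-lg'/>posts</p>
</div>
<div class='count col-xs-4'>
<p><strong>37</strong> <br class='hidden-md hidden-lg'/>followers</p>
</div>
<div class='count col-xs-4'>
<p><strong>470</strong> <br class='hidden-md hidden-lg'/>following</p>
</div>
</body></html>
"""


def count_for(markup, label):
    """Return the number shown next to `label`, or None (hypothetical helper)."""
    tree = html.fromstring(markup)
    # $label is an XPath variable; lxml fills it from the keyword argument.
    matches = tree.xpath(
        '//div[@class="count col-xs-4"]/p[contains(., $label)]/strong/text()',
        label=label,
    )
    # xpath() returns a list, so take the first match if there is one.
    return int(matches[0]) if matches else None


print(count_for(SAMPLE_HTML, "followers"))
```

With a real response the same function could be called as count_for(page1.text, "followers"), assuming the live page matches the structure shown in the question.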

【Comments】: