【Posted】: 2020-07-16 17:16:22
【Question】:
I am scraping URLs from a web page as follows:
from bs4 import BeautifulSoup
import requests
url = "https://www.investing.com/search/?q=Axon&tab=news"
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(response.content, "html.parser")
for s in soup.find_all('div',{'class':'articleItem'}):
    for a in s.find_all('div',{'class':'textDiv'}):
        for b in a.find_all('a',{'class':'title'}):
            print(b.get('href'))
The output looks like this:
/news/stock-market-news/axovant-updates-on-parkinsons-candidate-axolentipd-1713474
/news/stock-market-news/digital-alley-up-24-on-axon-withdrawal-from-patent-challenge-1728115
/news/stock-market-news/axovant-sciences-misses-by-009-763209
/analysis/microns-mu-shares-gain-on-q3-earnings-beat-upbeat-guidance-200529289
/analysis/axon,-espr,-momo,-zyne-200182141
/analysis/factors-likely-to-impact-axon-enterprises-aaxn-q4-earnings-200391393
{{link}}
{{link}}
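There are two problems:
- Not all of the URLs are extracted.
- The last two entries are printed as {{link}} instead of real URLs. Why does that happen?
Is there a way to solve these two problems?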
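【Comments】:
- The site uses infinite loading. A single GET request returns only the first part of the results. The browser starts out the same way, but more results load as you scroll down.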
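On the second problem: the {{link}} entries look like unrendered template placeholders that the page's JavaScript would normally fill in. That is an assumption based on their form, not something confirmed in the question. If so, a minimal sketch for dropping them and making the remaining relative links absolute could look like the following. `clean_hrefs` and `BASE_URL` are hypothetical names introduced only for this illustration. It does not fetch the results that appear on scrolling; for those you would have to reproduce whatever request the page sends when it loads more, or drive a real browser.

```python
from urllib.parse import urljoin

# Site root taken from the URL in the question.
BASE_URL = "https://www.investing.com"


def clean_hrefs(hrefs, base_url=BASE_URL):
    """Drop unrendered placeholders and return absolute URLs.

    Hypothetical helper for illustration; it is not part of
    requests or BeautifulSoup.
    """
    cleaned = []
    for href in hrefs:
        # Skip anchors without an href and template placeholders
        # such as "{{link}}" (assumed to be filled in by JavaScript).
        if not href or "{{" in href:
            continue
        # Resolve site-relative paths against the site root.
        cleaned.append(urljoin(base_url, href))
    return cleaned


# Sample values copied from the output shown in the question.
sample = [
    "/news/stock-market-news/axovant-sciences-misses-by-009-763209",
    "/analysis/axon,-espr,-momo,-zyne-200182141",
    "{{link}}",
    None,
]
for link in clean_hrefs(sample):
    print(link)
```

To use it with the loop from the question, collect each `b.get('href')` into a list instead of printing it, then pass that list to `clean_hrefs`.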
Tags: python web-scraping beautifulsoup python-requests