【Question title】:Web page doesn't render using python requests and BeautifulSoup
【Posted】:2021-10-15 18:14:30
【Question description】:

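I am trying to scrape the text from this web page using the requests and BeautifulSoup modules in Python: https://seekingalpha.com/article/4441901-apple-inc-aapl-ceo-tim-cook-on-q3-2021-results-earnings-call-transcript

However, after making the request and extracting the text with the code below, this is all I get back: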

import requests
from bs4 import BeautifulSoup

url = "https://seekingalpha.com/article/4441901-apple-inc-aapl-ceo-tim-cook-on-q3-2021-results-earnings-call-transcript"
# Placeholder: put YOUR HEADER PARAMETERS here as key/value pairs (headers must be a dict)
head = {}
response = requests.get(url, headers=head)
transcript = BeautifulSoup(response.content, 'html.parser').text

Text:

Apple Inc. (AAPL) CEO Tim Cook on Q3 2021 Results - Earnings Call Transcript | Seeking Alpha


Javascript is Disabled
Your current browser configurationis not compatible with this site.

Is there any way to get around this and extract the text from the web page?

Thanks

【Comments】:

  • Use selenium

Tags: python web-scraping beautifulsoup python-requests


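【Solution 1】:

The page loads its data from an API rather than shipping it in the initial HTML, which is why requests only sees the "Javascript is Disabled" placeholder. I tried fetching the title and the content from that API.

Here is the API endpoint. You can send a GET request to it and find the data you need in the response.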

https://seekingalpha.com/api/v3/articles/4441901?include=author%2CprimaryTickers%2CsecondaryTickers%2CotherTags%2Cpresentations%2Cpresentations.slides%2Cauthor.authorResearch%2Cco_authors%2CpromotedService%2Csentiments
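
The endpoint above appears to contain the same numeric ID (4441901) as the article URL in the question. As a rough sketch, and assuming that this pattern holds for other articles (the thread only confirms it for this one), the API URL could be derived from the page URL like this. The helper name article_api_url is made up for illustration.

```python
import re
from urllib.parse import urlencode

# Related resources requested alongside the article, copied from the endpoint above.
INCLUDE = (
    "author,primaryTickers,secondaryTickers,otherTags,presentations,"
    "presentations.slides,author.authorResearch,co_authors,"
    "promotedService,sentiments"
)


def article_api_url(article_url):
    """Build the API URL for an article page URL (hypothetical helper)."""
    # Assumption: the article ID is the run of digits right after /article/.
    match = re.search(r"/article/(\d+)", article_url)
    if match is None:
        raise ValueError(f"no article ID found in {article_url!r}")
    # urlencode percent-encodes the commas as %2C, as in the endpoint above.
    query = urlencode({"include": INCLUDE})
    return f"https://seekingalpha.com/api/v3/articles/{match.group(1)}?{query}"


page_url = (
    "https://seekingalpha.com/article/"
    "4441901-apple-inc-aapl-ceo-tim-cook-on-q3-2021-results-earnings-call-transcript"
)
api_url = article_api_url(page_url)
print(api_url)
```

For the article in the question this reproduces the endpoint shown above. Whether the include parameter is required, or can be trimmed, is not something this thread establishes.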

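The content (I mean the variable content) is in HTML format, so you can use BeautifulSoup to extract whatever data you need from it.

Here is the code that prints the title and part of the content.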

import requests

url = 'https://seekingalpha.com/api/v3/articles/4441901?include=author%2CprimaryTickers%2CsecondaryTickers%2CotherTags%2Cpresentations%2Cpresentations.slides%2Cauthor.authorResearch%2Cco_authors%2CpromotedService%2Csentiments'

r = requests.get(url)
data = r.json()  # the endpoint returns JSON
title = data['data']['attributes']['title']
content = data['data']['attributes']['content']  # article body as an HTML string

print(title)
print(content[:500].strip())  # first 500 characters only

Output:

Apple Inc. (AAPL) CEO Tim Cook on Q3 2021 Results - Earnings Call Transcript

<p>Apple Inc. <span class="ticker-hover-wrapper">(NASDAQ:<a href="https://seekingalpha.com/symbol/AAPL" title="Apple Inc.">AAPL</a>)</span> Q3 2021 Earnings Conference Call July 27, 2021  5:00 PM ET</p>
<p><strong>Company Participants</strong></p>
<p>Tejas Gala - Director of IR and Corporate Finance</p>
<p>Tim Cook - CEO</p>
<p>Luca Maestri - CFO</p>
<p><strong>Conference Call Participants</strong></p>
<p>Chris Caso - Raymond James</p>
<p>Jim Suva - Citigroup</p>
<p>Shannon Cross - Cross Resea
.
.
.
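
Since content is an HTML string, getting plain text out of it is a matter of parsing it with BeautifulSoup. The following is a minimal sketch, not part of the original answer: it uses a few paragraphs copied from the output above as sample input so it runs without a network request, and the helper name paragraphs is made up for illustration.

```python
from bs4 import BeautifulSoup

# A few lines copied from the output above, standing in for the full
# content string returned by the API.
sample_content = (
    '<p><strong>Company Participants</strong></p>\n'
    '<p>Tejas Gala - Director of IR and Corporate Finance</p>\n'
    '<p>Tim Cook - CEO</p>\n'
    '<p>Luca Maestri - CFO</p>'
)


def paragraphs(html):
    """Return the text of every <p> element, one string per paragraph."""
    soup = BeautifulSoup(html, 'html.parser')
    return [p.get_text(strip=True) for p in soup.find_all('p')]


lines = paragraphs(sample_content)
print('\n'.join(lines))
```

With the real response, the same call would be paragraphs(content), using the content variable from the answer's code.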

【Comments】:

  • Great, thanks. I tested the solution. How did you find the API URL?
  • You can see it in Chrome DevTools -> Network. It shows you every request the web page makes.