Web page doesn't render using Python requests and BeautifulSoup

【Question title】: Web page doesn't render using Python requests and BeautifulSoup
【Posted】: 2021-10-15 18:14:30
【Question description】:

I am trying to scrape the text from this web page using the requests and BeautifulSoup modules in Python: https://seekingalpha.com/article/4441901-apple-inc-aapl-ceo-tim-cook-on-q3-2021-results-earnings-call-transcript

However, when I make the request and try to get the text with the following code, this is all I get back:

import requests
from bs4 import BeautifulSoup

url = "https://seekingalpha.com/article/4441901-apple-inc-aapl-ceo-tim-cook-on-q3-2021-results-earnings-call-transcript"
# headers must be a dict; replace the placeholder value with your own
head = {'User-Agent': 'YOUR USER AGENT'}
response = requests.get(url, headers=head)
transcript = BeautifulSoup(response.content, 'html.parser').text

Text:

Apple Inc. (AAPL) CEO Tim Cook on Q3 2021 Results - Earnings Call Transcript | Seeking Alpha


Javascript is Disabled
Your current browser configurationis not compatible with this site.

Is there any way to get around this and retrieve the text from the web page?

Thanks

【Question comments】:

Use selenium.

Tags: python web-scraping beautifulsoup python-requests

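【Solution 1】:

The page loads its data from an API. I tried fetching the title and the content from it.

Here is the API endpoint. You can send a GET request to it and find the data you need in the response.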

https://seekingalpha.com/api/v3/articles/4441901?include=author%2CprimaryTickers%2CsecondaryTickers%2CotherTags%2Cpresentations%2Cpresentations.slides%2Cauthor.authorResearch%2Cco_authors%2CpromotedService%2Csentiments
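The `include` query parameter in that URL is a comma-separated list whose commas are percent-encoded as `%2C`. A minimal sketch of rebuilding the same URL with the standard library, which makes the list of requested resources easier to read. The helper name `build_article_url` is made up for this illustration, and the answer does not say whether the API accepts a shorter `include` list.

```python
from urllib.parse import urlencode

# Related resources requested in the answer's URL, in the same order.
INCLUDE = [
    "author", "primaryTickers", "secondaryTickers", "otherTags",
    "presentations", "presentations.slides", "author.authorResearch",
    "co_authors", "promotedService", "sentiments",
]

def build_article_url(article_id, include=INCLUDE):
    """Hypothetical helper: rebuild the API URL from its parts."""
    base = f"https://seekingalpha.com/api/v3/articles/{article_id}"
    # urlencode percent-encodes each comma as %2C, as in the URL above.
    query = urlencode({"include": ",".join(include)})
    return f"{base}?{query}"

url = build_article_url(4441901)
print(url)
```

The article id `4441901` is the number that appears in the original page URL.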

content (I mean the variable content) is HTML, so you can use BeautifulSoup to extract whatever data you need from it.
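A rough sketch of that extraction step, assuming the `content` field is a sequence of `<p>` elements like the output shown in this answer. The sample string and the helper name `paragraphs` are illustrative only; in practice you would pass in the `content` value returned by the API.

```python
from bs4 import BeautifulSoup

# Assumed sample, shaped like the start of the `content` field in this answer.
sample_content = (
    '<p>Apple Inc. <span class="ticker-hover-wrapper">(NASDAQ:'
    '<a href="https://seekingalpha.com/symbol/AAPL" title="Apple Inc.">AAPL</a>)'
    '</span> Q3 2021 Earnings Conference Call</p>'
    '<p><strong>Company Participants</strong></p>'
    '<p>Tim Cook - CEO</p>'
    '<p>Luca Maestri - CFO</p>'
)

def paragraphs(html):
    """Hypothetical helper: return the plain text of each <p> element."""
    soup = BeautifulSoup(html, "html.parser")
    # Collapse runs of whitespace so each paragraph becomes one clean line.
    return [" ".join(p.get_text().split()) for p in soup.find_all("p")]

lines = paragraphs(sample_content)
for line in lines:
    print(line)
```

Joining `lines` with newlines gives a plain-text transcript without the markup.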

Here is the code that prints the title and part of the content.

import requests

# API endpoint that backs the article page
url = 'https://seekingalpha.com/api/v3/articles/4441901?include=author%2CprimaryTickers%2CsecondaryTickers%2CotherTags%2Cpresentations%2Cpresentations.slides%2Cauthor.authorResearch%2Cco_authors%2CpromotedService%2Csentiments'

r = requests.get(url)
data = r.json()  # parse the JSON response body
title = data['data']['attributes']['title']
content = data['data']['attributes']['content']  # article body as HTML

print(title)
print(content[:500].strip())  # first 500 characters only

Apple Inc. (AAPL) CEO Tim Cook on Q3 2021 Results - Earnings Call Transcript

<p>Apple Inc. <span class="ticker-hover-wrapper">(NASDAQ:<a href="https://seekingalpha.com/symbol/AAPL" title="Apple Inc.">AAPL</a>)</span> Q3 2021 Earnings Conference Call July 27, 2021  5:00 PM ET</p>
<p><strong>Company Participants</strong></p>
<p>Tejas Gala - Director of IR and Corporate Finance</p>
<p>Tim Cook - CEO</p>
<p>Luca Maestri - CFO</p>
<p><strong>Conference Call Participants</strong></p>
<p>Chris Caso - Raymond James</p>
<p>Jim Suva - Citigroup</p>
<p>Shannon Cross - Cross Resea
.
.
.

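【Comments】:

Great, thanks, I tested the solution. How did you find the API URL?
You can see it in Chrome Developer Tools -> Network. It shows you all the requests the web page makes.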