【Question title】: Can't parse full webpage, using BeautifulSoup
【Posted】: 2021-08-20 14:38:31
【Question】:

I am trying to parse a page from the Q&A service sli.do:

import urllib.request
from bs4 import BeautifulSoup

voting_url = "https://app.sli.do/event/i6jqiqxm/live/questions"
voting_page = urllib.request.urlopen(voting_url)

soup = BeautifulSoup(voting_page, 'lxml')

print(soup.prettify())

for link in soup.find_all('span'):
    print(link.get('Linkify'))

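print(soup.prettify()) returns an HTML document, but it contains no span tags with class="Linkify", which are the ones that hold the question text. Those tags are visible when the page is opened in Chrome: https://app.sli.do/event/i6jqiqxm/live/questions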

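【Comments】:

  • Have you checked the contents of voting_page? The elements you are looking for are generated by JavaScript. Neither request nor bs4 can interpret or otherwise execute JavaScript. This question is therefore a duplicate of "Using python Requests with javascript pages".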
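
The point made in the comment above can be reproduced offline with a minimal sketch. The HTML string below is a hypothetical stand-in, not the real sli.do markup, and it uses the built-in "html.parser" instead of lxml only to avoid an extra dependency. It also filters by class with class_="Linkify", since link.get('Linkify') looks up an attribute named Linkify rather than a CSS class.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for what the server returns: an empty container
# plus a script that would fill it in only when executed by a browser.
STATIC_HTML = """
<html>
  <body>
    <div id="app"></div>
    <script>
      document.getElementById("app").innerHTML =
        '<span class="Linkify">Is Slido free to use?</span>';
    </script>
  </body>
</html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# Search for span tags by CSS class. The script is never executed,
# so the span it would create does not exist in the parsed tree.
rendered_spans = soup.find_all("span", class_="Linkify")
print(len(rendered_spans))

# The markup is present only as plain text inside the script tag.
print("Linkify" in soup.script.string)
```

The span exists only as text inside the script, so no parser-level search will find it. The data has to come from a browser that runs the script or from the API the script calls.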

Tags: python parsing beautifulsoup request urllib


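【Solution 1】:

Because the data is generated dynamically, you can fetch it through the API instead. If the access_token part also changes dynamically, you may need to work out how to obtain it.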

import requests

s = requests.Session()

# Request an access token for the event
auth = s.post('https://app.sli.do/api/v0.5/events/8ca635b0-e80e-47be-b506-cb131dbbed4c/auth').json()
access_token = auth['access_token']

url = 'https://app.sli.do/api/v0.5/events/8ca635b0-e80e-47be-b506-cb131dbbed4c/questions'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'authorization': 'Bearer %s' % access_token,
}
payload = {
    'path': '/questions',
    'eventSectionId': '4145620',
    'sort': 'top',
    'highlighted_first': 'true',
    'limit': '9999',
}

# Fetch the questions as JSON, sending the bearer token in the headers
jsonData = s.get(url, headers=headers, params=payload).json()

# Each item in the response is a question; print its text
for each in jsonData:
    print(each['text'])

Output:

Can I ask a question anonymously?
How many participants does Slido support?
Do participants need an account to join?
Can I download the list of questions  from my Q&A?
Can the moderators control what questions are seen?
How do you pronounce Slido?
Is it possible to change the colors of Slido so that they match our branding? ?
What tools does Slido integrate with?
Is it easy to ask a question? 
Can i send a link to participants prior to event?
Can participants submit questions at any time?
Is there a profanity control for the text of the questions? 
Is there an option to have a name required?
Is Slido free to use?
Is Slido good for a regular meeting q&a with the CEO where you can ask questions anonymously in advance?
how do i upload slido into my powerpoint presentation?
Can everyone see each other's questions?
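
Some of the printed lines carry trailing spaces, and a response could in principle include entries without a 'text' key. A small helper along the following lines might tidy that up. This is only a sketch: the source confirms only that each item has a 'text' field, so the sample list and its 'score' key are hypothetical, and question_texts is an illustrative name rather than part of any library.

```python
def question_texts(questions):
    """Return the stripped 'text' of each question, skipping entries without one."""
    texts = []
    for item in questions:
        # Guard against entries that are not dicts or have no usable text.
        text = item.get("text") if isinstance(item, dict) else None
        if text:
            texts.append(text.strip())
    return texts


# Hypothetical sample shaped like the parsed JSON list (jsonData above).
sample = [
    {"text": "Is Slido free to use?"},
    {"text": "Is it easy to ask a question? "},
    {"score": 3},  # assumed entry with no 'text' key, for illustration
]

for line in question_texts(sample):
    print(line)
```

With the real response, question_texts(jsonData) would replace the final loop.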

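【Comments】:

  • Hey @chitown88, just so I can learn: did you find this URL in the XHR requests? Could you share how you found it?
  • @BhavyaParikh, yes. That is exactly where I found it.
  • @chitown88 Where did you find this part: 8ca635b0-e80e-47be-b506-cb131dbbed4c ?
  • @BhavyaParikh Also in the XHR.
  • Do you mean Chrome developer tools?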