【Posted】:2020-01-29 18:06:55
【Question】:
I'm having trouble scraping the information on this page.
Here is my code: '''
import requests
from bs4 import BeautifulSoup

request = requests.get("https://www.aiscore.com/basketball/20200128")
page = request.content
soup = BeautifulSoup(page, 'html.parser')
print(soup.prettify())

# Each match is expected to be a <div class="list">
matchs = soup.findAll("div", {"class":"list"})
for match in matchs:
    hour = match.find("span", {"class":"fs-12 flex-1 text-center"})
    hour = hour.text
    status = match.find("div", {"class":"fs-12 color-999 flex-1 text-center"})
    status = status.text

    # Team names: first div is team 1, second is team 2
    teams = match.findAll("div", {"class":"w-o-h"})
    i = 1
    for team in teams:
        if i == 1:
            t1 = team.text
        elif i == 2:
            t2 = team.text
        else:
            print("+ de 2 équipes dans le match")  # "more than 2 teams in the match"
        i += 1

    # Scores: one block per team, each holding quarter scores and a final score
    scores = match.findAll("div", {"class":"flex align-center justify-center fs-12 color-999 w-bar-100 flex-1"})
    i = 1
    for score in scores:
        scs_qtps = score.findAll("div", {"class":"flex-1 text-center isVisible"})
        if i == 1:
            k = 1
            for sc_qtp in scs_qtps:
                if k == 1:
                    sc_qt1_t1 = sc_qtp.text
                elif k == 2:
                    sc_qt2_t1 = sc_qtp.text
                elif k == 3:
                    sc_qt3_t1 = sc_qtp.text
                elif k == 4:
                    sc_qt4_t1 = sc_qtp.text
                else:
                    print("plus de 4 quart tps")  # "more than 4 quarters"
                k += 1
            sc_final_t1 = score.find("div", {"class":"flex-1 text-center"})
            sc_final_t1 = sc_final_t1.text
        elif i == 2:
            k = 1
            for sc_qtp in scs_qtps:
                if k == 1:
                    sc_qt1_t2 = sc_qtp.text
                elif k == 2:
                    sc_qt2_t2 = sc_qtp.text
                elif k == 3:
                    sc_qt3_t2 = sc_qtp.text
                elif k == 4:
                    sc_qt4_t2 = sc_qtp.text
                else:
                    print("plus de 4 quart tps")  # "more than 4 quarters"
                k += 1
            sc_final_t2 = score.find("div", {"class":"flex-1 text-center"})
            sc_final_t2 = sc_final_t2.text
        i += 1

    # Odds: first div is team 1, second is team 2
    odds = match.findAll("div", {"style":"height: 19px; line-height: 19px; color: rgb(102, 102, 102);"})
    i = 1
    for odd in odds:
        if i == 1:
            odd_t1 = odd.text
        elif i == 2:
            odd_t2 = odd.text
        i += 1

    print(hour, status, t1, t2)
    print(sc_qt1_t1, sc_qt2_t1, sc_qt3_t1, sc_qt4_t1, "%t", sc_final_t1)
    print(sc_qt1_t2, sc_qt2_t2, sc_qt3_t2, sc_qt4_t2, "%t", sc_final_t2)
    print("1 :", odd_t1, "; 2 :", odd_t2)
'''
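I want to scrape all the scores, but there is a problem: I can't access all of the data in the HTML page. In fact, all the information I want to scrape is located inside this div: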
<div class="vue-recycle-scroller scroller page-mode direction-vertical">
</div>
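But when I print the HTML page with print(soup.prettify()), this div contains nothing except <!-- -->.
So my question is: how can I access the information located inside this div?
I'm open to any kind of answer (maybe I should use Selenium to scrape this kind of information?).
Thank you very much!
Sorry for my basic English.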
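One quick way to confirm what `requests` actually received is to test whether the scroller div has any element children. The sketch below runs on two inline HTML strings rather than the live site; `scroller_is_empty` and both sample strings are made up for illustration, and only the class name comes from the div quoted in the question.

```python
from bs4 import BeautifulSoup


def scroller_is_empty(html):
    """Return True if the scroller div is missing or holds no element children."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches a div that has this class among its other classes
    container = soup.find("div", class_="vue-recycle-scroller")
    if container is None:
        return True
    # find(True) returns the first descendant tag; HTML comments do not count
    return container.find(True) is None


# Hypothetical samples: what a static fetch might return vs. a rendered page
static_html = (
    '<div class="vue-recycle-scroller scroller page-mode direction-vertical">'
    "<!-- --></div>"
)
rendered_html = (
    '<div class="vue-recycle-scroller scroller page-mode direction-vertical">'
    '<div class="list">match row</div></div>'
)

print(scroller_is_empty(static_html))    # True
print(scroller_is_empty(rendered_html))  # False
```

If the check returns True for `requests.get(...).content`, the rows are most likely filled in by JavaScript after the page loads, which a plain HTTP fetch never executes.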
【Comments】:
-
I think all the data is appended to that div dynamically. That's why you don't see it in the printed output. I think you should use Selenium to mimic browser behavior.
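A minimal sketch of that idea, assuming Selenium is installed along with a matching browser driver: load the page in a real browser, poll the rendered HTML until match rows show up, then hand it to BeautifulSoup. `wait_for_matches` is a hypothetical helper, and the `div.list` selector is taken from the question's code, so it is an assumption about the rendered markup rather than something verified against the site.

```python
import time

from bs4 import BeautifulSoup


def wait_for_matches(driver, url, timeout=15.0, poll=0.5):
    """Load url in the browser and return the match rows once they are rendered.

    driver is a Selenium WebDriver; only get() and page_source are used.
    Returns an empty list if nothing appears before the timeout.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # page_source reflects the DOM after JavaScript has run so far
        soup = BeautifulSoup(driver.page_source, "html.parser")
        # Assumed selector: rows with class "list" inside the scroller div
        matches = soup.select("div.vue-recycle-scroller div.list")
        if matches or time.monotonic() >= deadline:
            return matches
        time.sleep(poll)


# Usage sketch (needs a real browser, so it is left commented out):
# from selenium import webdriver
# driver = webdriver.Chrome()
# try:
#     rows = wait_for_matches(driver, "https://www.aiscore.com/basketball/20200128")
#     print(len(rows), "rows currently rendered")
# finally:
#     driver.quit()
```

The returned elements are ordinary BeautifulSoup tags, so the per-match parsing from the question could be applied to them unchanged.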
-
Yes, that's what I thought, but how can I simulate scrolling with Selenium? Do you know of any useful resources?
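One possible approach to the scrolling part, sketched under assumptions: a recycling scroller such as `vue-recycle-scroller` typically keeps only the currently visible rows in the DOM, so reading the page once after jumping to the bottom would likely miss most matches. The sketch scrolls one viewport at a time and collects rows at every stop. `collect_while_scrolling` and the `extract_rows` callback are hypothetical names; `execute_script` and `page_source` are the standard Selenium WebDriver members.

```python
import time


def collect_while_scrolling(driver, extract_rows, max_steps=200, pause=0.5):
    """Scroll down one viewport at a time, gathering rows seen at each position.

    driver: a Selenium WebDriver (only execute_script and page_source are used).
    extract_rows: hypothetical callback turning an HTML string into a list of
        hashable rows, for example tuples of (hour, team1, team2).
    """
    seen = []         # rows in the order they were first seen
    seen_set = set()  # duplicate check, since consecutive viewports overlap
    for _ in range(max_steps):
        for row in extract_rows(driver.page_source):
            if row not in seen_set:
                seen_set.add(row)
                seen.append(row)
        before = driver.execute_script("return window.pageYOffset;")
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        time.sleep(pause)  # give the page time to render the new rows
        after = driver.execute_script("return window.pageYOffset;")
        if after == before:
            break  # the scroll position did not change: bottom reached
    return seen
```

The `pause` value is a guess and may need tuning for a slow connection. If the page loads more data only after the bottom is reached, the stop condition would have to wait longer before giving up.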
-
FYI, it's scrape (and scraping, scraped, scraper), not scrap.
Tags: python html web-scraping beautifulsoup