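【Posted】: 2021-03-17 01:20:04
【Question】:
My problem is that the parameters should advance to the next 10 reviews on each iteration, until there are no more reviews available for the game. However, the loop prints the same 10 reviews over and over and never moves on to the next 10. The code is below. Thanks.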
import re
import time

import requests
from bs4 import BeautifulSoup
from alive_progress import alive_bar  # imported in the original; unused in this snippet


def review_scraper():
    start_time = time.time()
    url = "https://steamcommunity.com/app/933110/homecontent/"
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36'}
    # Matches any class name containing the author-name prefix.
    regex = re.compile('apphub_CardContentAuthorName')
    for x in range(1, 5):
        # Pages are 1-based, so page 1 starts at offset 0, page 2 at 10, and so on.
        offset = (x*10) - 10
        payload = {
            'userreviewsoffset': offset,
            'p': x,
            'workshopitemspage': x,
            'readytouseitemspage': x,
            'mtxitemspage': x,
            'itemspage': x,
            'screenshotspage': x,
            'videospage': x,
            'artpage': x,
            'allguidepage': x,
            'webguidepage': x,
            'integratedguidepage': x,
            'discussionspage': x,
            'numperpage': '10',
            'browsefilter': 'mostrecent',
            'l': 'english',
            'appHubSubSection': '10',
            'filterLanguage': 'default',
            'searchText': '',
            'forceanon': '1'}
        page = requests.get(url, headers=headers, params=payload)
        soup = BeautifulSoup(page.text, "html.parser")
        cards = soup.find_all('div', {'class': 'apphub_Card modalContentLink interactable'})
        y = 0  # number of review cards found on this page
        for card in cards:
            title = card.find('div', {'class': 'title'}).text
            hours = card.find('div', {'class': 'hours'}).text
            content = card.find('div', {'class': 'apphub_CardTextContent'}).text.strip()
            author = card.find('div', {'class': regex}).text
            print(title + '\n' + hours + '\n' + content + '\n\n' + 'Author: ' + author + '\n' + '#'*50)
            y = y + 1
    print("--- %s seconds ---" % (time.time() - start_time))
    print(y)
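
To make the symptom easier to confirm, here is a minimal sketch of the offset arithmetic from the loop above, together with a loop that stops as soon as a page is empty or contains only items already seen. It does not claim to fix the Steam pagination itself. The names `page_params`, `collect_until_repeat`, and `fetch_page` are hypothetical helpers introduced for illustration; `fetch_page` stands in for the `requests.get` plus BeautifulSoup parsing step and is assumed to return a list of strings for one page.

```python
from typing import Callable, Dict, Iterator, List


def page_params(num_pages: int, per_page: int = 10) -> Iterator[Dict[str, int]]:
    """Yield the 1-based page number and the matching review offset."""
    for page in range(1, num_pages + 1):
        # Same arithmetic as (x*10) - 10 in the question, for per_page = 10.
        yield {"p": page, "userreviewsoffset": (page - 1) * per_page}


def collect_until_repeat(
    fetch_page: Callable[[Dict[str, int]], List[str]],
    max_pages: int = 50,
) -> List[str]:
    """Collect items page by page; stop on an empty or fully repeated page.

    fetch_page is a hypothetical callable: it receives the parameters for
    one page and returns the items found on that page.
    """
    seen = set()
    collected: List[str] = []
    for params in page_params(max_pages):
        new_items = []
        for item in fetch_page(params):
            if item not in seen:
                seen.add(item)
                new_items.append(item)
        if not new_items:
            # Nothing new: either the data ran out or the server returned
            # the same page again, which is the behaviour described above.
            break
        collected.extend(new_items)
    return collected
```

If a real `fetch_page` built from the request in the question causes this loop to stop after the first page, the server is returning identical content regardless of `p` and `userreviewsoffset`, which narrows the problem to the request parameters rather than the parsing.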
【Discussion】:
Tags: python json web-scraping beautifulsoup