Scraping Steam Reviews with XMR page, not going to rest of reviews: answers

【Question title】: Scraping Steam Reviews with XMR page, not going to rest of reviews
【Posted】: 2021-03-17 01:20:04
【Question description】:

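My problem is that the parameters are supposed to move on to the next 10 reviews on each iteration, until there are no more reviews left for the game. Instead, the loop prints the same 10 reviews over and over and never advances to the next 10. The code is below. Thanks.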

import re
import time

import requests
from bs4 import BeautifulSoup


def review_scraper():
    start_time = time.time()

    url = "https://steamcommunity.com/app/933110/homecontent/"
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36'}
    regex = re.compile('apphub_CardContentAuthorName')
    y = 0  # total number of review cards printed across all pages
    for x in range(1, 5):
        # Page x should start at review (x - 1) * 10.
        offset = (x*10) - 10

        payload = {
            'userreviewsoffset': offset,
            'p': x,
            'workshopitemspage': x,
            'readytouseitemspage': x,
            'mtxitemspage': x,
            'itemspage': x,
            'screenshotspage': x,
            'videospage': x,
            'artpage': x,
            'allguidepage': x,
            'webguidepage': x,
            'integratedguidepage': x,
            'discussionspage': x,
            'numperpage': '10',
            'browsefilter': 'mostrecent',
            'l': 'english',
            'appHubSubSection': '10',
            'filterLanguage': 'default',
            'searchText': '',
            'forceanon': '1'}
        page = requests.get(url, headers=headers, params=payload)
        soup = BeautifulSoup(page.text, "html.parser")
        cards = soup.find_all('div',{'class':'apphub_Card modalContentLink interactable'})
        for card in cards:
            title = card.find('div',{'class':'title'}).text
            hours = card.find('div',{'class':'hours'}).text
            content = card.find('div',{'class':'apphub_CardTextContent'}).text.strip()
            author = card.find('div',{'class':regex}).text

            print(title + '\n' + hours + '\n' + content + '\n\n' + 'Author: ' + author + '\n' + '#'*50)
            y = y + 1
    print("--- %s seconds ---" % (time.time() - start_time))
    print(y)
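
A quick way to confirm that every request really returns the same page, rather than merely similar reviews, is to fingerprint each page and compare it with the earlier ones. The following is a minimal sketch; `find_repeated_pages` is a hypothetical helper that does not appear in the original post, and it assumes each page has already been reduced to a list of review texts.

```python
def find_repeated_pages(pages):
    """Return the indices of pages whose contents exactly match an earlier page.

    `pages` is a list of pages, each page being a list of review strings.
    """
    seen = set()
    repeated = []
    for index, page in enumerate(pages):
        fingerprint = tuple(page)  # lists are unhashable, tuples are not
        if fingerprint in seen:
            repeated.append(index)
        else:
            seen.add(fingerprint)
    return repeated
```

Inside the request loop, one page could be collected with `[card.find('div', {'class': 'apphub_CardTextContent'}).text.strip() for card in cards]`. If every index after the first is reported, the server is ignoring the paging parameters entirely.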

【Question discussion】:

Tags: python json web-scraping beautifulsoup

【Solution 1】:

In case anyone else runs into the same problem, I found a solution. The code is listed below.

import requests


def review_scraper():
    z = 0  # number of reviews printed
    url = "https://store.steampowered.com/appreviews/933110?json=1&cursor=*"
    response = requests.get(url).json()
    tot_reviews = response['query_summary']['total_reviews']
    # Number of 20-review batches needed to cover everything. This value is
    # informational only: the loop below is capped at 20 requests.
    pages_to_iterate = tot_reviews // 20
    cursor = "*"  # "*" requests the first batch
    for i in range(20):
        url = "https://store.steampowered.com/appreviews/933110?json=1&cursor=" + str(cursor)
        response = requests.get(url).json()
        reviews = response['reviews']
        cursor = response['cursor']  # cursor for the next batch
        for x in range(response['query_summary']['num_reviews']):
            print(reviews[x]['review'])
            z = z + 1
    print(z)

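The only problem is that after roughly 200 reviews it starts repeating the same reviews. The cursor seems to fail to find the next batch around that mark. Still, this code should get you on your way.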
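
One possible cause of the repetition (an assumption, not something confirmed in this thread) is that the returned cursor is concatenated into the URL as-is. Cursor values can contain characters such as `+`, `/` and `=`, which change meaning in a query string unless they are percent-encoded. The sketch below encodes the cursor and also stops when a batch is empty or a cursor comes back a second time, so the loop cannot cycle over the same reviews. The names `build_review_url`, `fetch_page`, `iter_reviews` and `max_pages` are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://store.steampowered.com/appreviews/{app_id}"


def build_review_url(app_id, cursor):
    """Build the request URL with the cursor percent-encoded."""
    query = urlencode({"json": 1, "cursor": cursor})
    return BASE_URL.format(app_id=app_id) + "?" + query


def fetch_page(app_id, cursor):
    """Fetch one batch of reviews as a dict (performs a network request)."""
    # Imported here so the pagination logic can be exercised without
    # the third-party dependency or network access.
    import requests

    response = requests.get(build_review_url(app_id, cursor), timeout=30)
    response.raise_for_status()
    return response.json()


def iter_reviews(app_id, fetch=fetch_page, max_pages=1000):
    """Yield review texts batch by batch until the results run out."""
    cursor = "*"  # "*" requests the first batch
    seen_cursors = set()
    for _ in range(max_pages):
        if cursor in seen_cursors:
            break  # the server handed back an old cursor: stop, do not loop
        seen_cursors.add(cursor)
        data = fetch(app_id, cursor)
        reviews = data.get("reviews", [])
        if not reviews:
            break  # an empty batch means there is nothing left
        for review in reviews:
            yield review["review"]
        cursor = data.get("cursor")
        if not cursor:
            break
```

Usage would look like `for text in iter_reviews(933110): print(text)`. Passing a different `fetch` callable makes it possible to try the loop against canned responses before hitting the real endpoint.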

【Discussion】: