【Question title】: Scraping Steam reviews with the XHR page, not going to the rest of the reviews
【Posted】: 2021-03-17 01:20:04
【Question description】:

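My problem is that the parameters should advance to the next 10 reviews and keep iterating until the game has no more reviews. However, the loop prints the same 10 reviews over and over instead of moving on to the next 10. The code is below. Thanks.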

import re
import time

import requests
from bs4 import BeautifulSoup


def review_scraper():
    start_time = time.time()

    url = "https://steamcommunity.com/app/933110/homecontent/"
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36'}
    # Matches any class name containing 'apphub_CardContentAuthorName'.
    regex = re.compile('apphub_CardContentAuthorName')
    y = 0  # running count of reviews printed across all pages
    for x in range(1, 5):
        # Pages are 1-based; each page is expected to hold 10 reviews.
        offset = (x*10) - 10

        payload = {
        'userreviewsoffset': offset,
        'p': x,
        'workshopitemspage': x,
        'readytouseitemspage': x,
        'mtxitemspage': x,
        'itemspage': x,
        'screenshotspage': x,
        'videospage': x,
        'artpage': x,
        'allguidepage': x,
        'webguidepage': x,
        'integratedguidepage': x,
        'discussionspage': x,
        'numperpage': '10',
        'browsefilter': 'mostrecent',
        'l': 'english',
        'appHubSubSection': '10',
        'filterLanguage': 'default',
        'searchText': '',
        'forceanon': '1'}
        page = requests.get(url, headers=headers, params=payload)
        soup = BeautifulSoup(page.text, "html.parser")
        cards = soup.find_all('div',{'class':'apphub_Card modalContentLink interactable'})
        for card in cards:
            title = card.find('div',{'class':'title'}).text
            hours = card.find('div',{'class':'hours'}).text
            content = card.find('div',{'class':'apphub_CardTextContent'}).text.strip()
            author = card.find('div',{'class':regex}).text

            print(title + '\n' + hours + '\n' + content + '\n\n' + 'Author: ' + author + '\n' + '#'*50)
            y = y + 1
    print("--- %s seconds ---" % (time.time() - start_time))
    print(y)

【Question discussion】:

    Tags: python json web-scraping beautifulsoup


    【Solution 1】:

    In case anyone runs into the same problem I did, I found a solution. The code is listed below.

    import requests


    def review_scraper():
        z = 0  # number of reviews printed
        url = "https://store.steampowered.com/appreviews/933110?json=1&cursor=*"
        response = requests.get(url).json()
        tot_reviews = response['query_summary']['total_reviews']
        # Estimated page count at 20 reviews per request. It is not used
        # below, where the loop is capped at 20 requests.
        pages_to_iterate = tot_reviews // 20
        cursor = "*"
        for i in range(20):
            # The cursor returned by each response selects the next batch.
            url = "https://store.steampowered.com/appreviews/933110?json=1&cursor=" + str(cursor)
            response = requests.get(url).json()
            reviews = response['reviews']
            cursor = response['cursor']
            for review in reviews:
                print(review['review'])
                z = z + 1
        print(z)
    

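    The only problem is that after roughly 200 reviews it starts repeating the same reviews. It seems the cursor fails to find the next batch around that mark. Still, this code will get you on your way.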
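
One possible cause of the repetition, offered as an assumption and not a confirmed diagnosis: the cursor value can contain characters such as `+`, `/` and `=`, which change meaning when concatenated into a query string unencoded. The sketch below URL-encodes the cursor and stops when the cursor repeats or a batch comes back empty. The names `build_review_url`, `iter_reviews` and `fetch_steam_page` are hypothetical helpers, not part of any Steam library, and the standard library's `urllib` is used here in place of `requests`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_review_url(app_id, cursor="*"):
    """Build the appreviews URL with the cursor percent-encoded."""
    query = urlencode({"json": 1, "cursor": cursor})
    return f"https://store.steampowered.com/appreviews/{app_id}?{query}"


def fetch_steam_page(cursor, app_id=933110):
    """Fetch one batch of reviews as a dict (performs a network request)."""
    with urlopen(build_review_url(app_id, cursor), timeout=10) as resp:
        return json.load(resp)


def iter_reviews(fetch_page, max_pages=1000):
    """Yield review texts, following the cursor until it runs out.

    fetch_page is any callable that maps a cursor string to a response
    dict with 'reviews' and 'cursor' keys.
    """
    cursor = "*"
    seen_cursors = set()
    for _ in range(max_pages):
        if cursor in seen_cursors:
            break  # the same cursor came back: we would loop forever
        seen_cursors.add(cursor)
        data = fetch_page(cursor)
        reviews = data.get("reviews", [])
        if not reviews:
            break  # an empty batch means there is nothing left
        for review in reviews:
            yield review["review"]
        cursor = data["cursor"]
```

To use it against the live endpoint, something like `for text in iter_reviews(fetch_steam_page): print(text)` should work, assuming the response layout matches the one used in the code above.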

    【Discussion】:
