【Question title】: How to scrape data from a list inside a <script> tag on a website?
【Posted】: 2020-07-10 22:16:27
【Question】:

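This is my first time web scraping, and I don't know how to scrape data from a list of dictionaries inside a script tag. Since the script tag has no class, I don't know how to access the contents of that specific script tag.

My current code is: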

from bs4 import BeautifulSoup
import requests

source = requests.get('https://champion.gg/statistics/').text


soup = BeautifulSoup(source, 'lxml')

stats = soup.find('script')

Here is a small portion of the data I want to scrape:

<script>
      matchupData.stats = [{"key":"Ezreal","role":"ADC","title":"Ezreal","general":{"winPercent":0.5046896283323604,"playPercent":0.17104628134933184,"banRate":0.03167301835610511,"experience":8.02309599159886,"kills":6.8780177725887,"deaths":5.2307193357981445,"assists":7.356567425569177,"totalDamageDealtToChampions":23163,"totalDamageTaken":18062,"totalHeal":2607,"largestKillingSpree":8,"minionsKilled":175.73481222027632,"neutralMinionsKilledTeamJungle":6.87918531491211,"neutralMinionsKilledEnemyJungle":1.9552831290134267,"goldEarned":11840,"overallPosition":1,"overallPositionChange":0}},{"key":"LeeSin","role":"Jungle","title":"Lee Sin","general":{"winPercent":0.47603732897085066,"playPercent":0.11936072603416044,"banRate":0.016735176155369534,"experience":11.93326860841424,"kills":6.229476502082094,"deaths":5.414578375966687,"assists":7.773758179654967,"totalDamageDealtToChampions":12340,"totalDamageTaken":26015,"totalHeal":7518,"largestKillingSpree":8,"minionsKilled":25.68465571088638,"neutralMinionsKilledTeamJungle":71.98670806067817,"neutralMinionsKilledEnemyJungle":8.807053093396787,"goldEarned":10255,"overallPosition":6,"overallPositionChange":0}},{"key":"Thresh","role":"Support","title":"Thresh","general":{"winPercent":0.4940108608284812,"playPercent":0.11318544159496746,"banRate":0.012075421458170381,"experience":10.618539868530172,"kills":1.975553333725421,"deaths":5.545197906251838,"assists":12.660628516536297,"totalDamageDealtToChampions":6957

【Question comments】:

    Tags: python web-scraping beautifulsoup python-requests


    【Solution 1】:

    The data is in JSON format. You can get it like this:

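    import json

    # `soup` is the BeautifulSoup object from the question's code.
    target = None
    for s in soup.find_all('script'):
        # s.string is None when a tag has no single inline text node
        if s.string and "matchupData.stats" in s.string:
            # keep what follows " = " and drop the final character
            target = s.string.strip().split(" = ")[1][:-1]
            break
    if target is None:
        raise ValueError("matchupData.stats not found in any <script> tag")
    json.loads(target)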
    

    Output:

    [{'key': 'Ezreal',
      'role': 'ADC',
      'title': 'Ezreal',
      'general': {'winPercent': 0.5046896283323604,
       'playPercent': 0.17104628134933184,
       'banRate': 0.03167301835610511,
    

    And so on...
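
The split-and-slice step above depends on the script text containing a single " = " and exactly one trailing character to trim. As a rough sketch of a more defensive alternative, the JSON value can be decoded directly from the position after the assignment using `json.JSONDecoder.raw_decode`, which stops at the end of the first complete JSON value. The helper name `extract_assigned_json` is hypothetical, and the `sample` text is a shortened stand-in for the real script content: the trailing `;` and the second assignment are assumptions, not taken from the page.

```python
import json


def extract_assigned_json(script_text, name):
    """Return the JSON value assigned to `name` in script_text, or None.

    Hypothetical helper: it assumes the script contains `name = <json>`.
    """
    start = script_text.find(name)
    if start == -1:
        return None
    eq = script_text.find("=", start + len(name))
    if eq == -1:
        return None
    pos = eq + 1
    # raw_decode does not skip leading whitespace, so advance past it here
    while pos < len(script_text) and script_text[pos].isspace():
        pos += 1
    try:
        # Decodes one JSON value starting at pos and ignores what follows it
        value, _end = json.JSONDecoder().raw_decode(script_text, pos)
    except json.JSONDecodeError:
        return None
    return value


# Shortened stand-in for the text of the <script> tag (assumed layout)
sample = '''
      matchupData.stats = [{"key":"Ezreal","role":"ADC","general":{"overallPosition":1}},{"key":"LeeSin","role":"Jungle","general":{"overallPosition":6}}];
      matchupData.other = 1;
'''

stats = extract_assigned_json(sample, "matchupData.stats")
print([champion["key"] for champion in stats])  # ['Ezreal', 'LeeSin']
```

With the scraping code from the question, the same helper could be called on `s.string` for each script tag, skipping tags where `s.string` is None. A None return then means the assignment was not found or did not parse.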

    【Discussion】:

    • target could be None
    • @bigbounty Not for this particular link; the data is definitely there.