【Question title】: Why does my import of urlopen from urllib.request not go through?
【Posted】: 2021-05-22 21:25:39
【Question】:

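I have the code below, which I eventually want to use for web scraping and analysis.

My code has been running for almost an hour, but it does not seem to pull anything from this site.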


import bs4 as bs
from urllib.request import urlopen as ureq


my_url2 = 'https://www.dreamteamfc.com/g/#tournament/stats-centre-stats'

ureq(my_url2)

【Question discussion】:

    Tags: web-scraping beautifulsoup urllib


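    【Answer 1】:

    The data you are looking for is loaded via Ajax from a different URL, so it is not in the HTML that urlopen returns and BeautifulSoup cannot see it. Also, consider using the requests module to fetch the page/JSON data: it handles compression, redirects, etc. automatically.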
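One way to see why the original URL yields no data: everything after `#` is a fragment, which HTTP clients do not send to the server. The sketch below only splits the URL from the question with the standard library; that the site's JavaScript then uses the fragment to decide which data to load is an assumption, not something confirmed here.

```python
from urllib.parse import urldefrag

page_url = "https://www.dreamteamfc.com/g/#tournament/stats-centre-stats"

# Split the URL into the part that is requested and the client-side fragment.
requested, fragment = urldefrag(page_url)

print(requested)  # the only part an HTTP client such as urlopen asks the server for
print(fragment)   # kept on the client side; never part of the HTTP request
```

So `urlopen` only ever fetches the bare `/g/` page, and the statistics have to be fetched from wherever the page's scripts load them.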

    To load the data, use the following example:

    import json
    import requests
    
    url = "https://nuk-data.s3.eu-west-1.amazonaws.com/json/players_tournament.json"
    data = requests.get(url).json()
    
    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))
    
    # print some data to screen:
    for player in data:
        print(
            "{:<15} {:<15} {}".format(
                player["first_name"], player["last_name"], player["cost"]
            )
        )
    

    Prints:

    Cristiano       Ronaldo         7000000
    Goran           Pandev          1000000
    David           Marshall        2000000
    Jesús           Navas           3000000
    Kasper          Schmeichel      3000000
    Sergio          Ramos           5000000
    Raúl            Albiol          2000000
    Giorgio         Chiellini       3500000
    
    ...and so on.
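If installing requests is not an option, the same JSON endpoint can probably be read with `urllib.request`, the module used in the question. This is a minimal sketch, assuming the server returns uncompressed JSON; the `load_json` helper and its `opener` parameter are hypothetical names introduced here so the function can be exercised without network access.

```python
import json
from urllib.request import urlopen


def load_json(url, timeout=10, opener=urlopen):
    """Fetch `url` and decode the response body as JSON.

    `opener` is a hypothetical hook (defaults to urlopen) so the
    function can be tried offline with a fake response object.
    """
    # The timeout keeps the call from hanging indefinitely on a stalled server.
    with opener(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `load_json(url)` with the JSON URL from the example above should give the same list of player dictionaries as `requests.get(url).json()`, provided the assumption about the response holds.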
    

    Edit: to load the data into a DataFrame, you can use pd.json_normalize:

    import requests
    import pandas as pd
    
    url = "https://nuk-data.s3.eu-west-1.amazonaws.com/json/players_tournament.json"
    data = requests.get(url).json()
    
    # flatten the nested "stats" / "tournament_stats" dicts into dotted columns
    df = pd.json_normalize(data)
    print(df)
    # write the frame to CSV without the index column
    df.to_csv("data.csv", index=False)
    

    Prints:

             id  first_name         last_name  squad_id     cost   status positions  locked injury_type injury_duration suspension_length cname  stats.round_rank  stats.season_rank  stats.games_played  stats.total_points  stats.avg_points  stats.high_score  stats.low_score  stats.last_3_avg  stats.last_5_avg  stats.selections  stats.owned_by  stats.MIN  stats.SMR  stats.SMB  stats.GS  stats.ASS  stats.YC  stats.RC  stats.PM  stats.PS  stats.CS  stats.GC  stats.star_man_awards  stats.7_plus_ratings  stats.goals  stats.assists  stats.cards  stats.clean_sheets  tournament_stats.star_man_awards  tournament_stats.7_plus_ratings  tournament_stats.goals  tournament_stats.assists  tournament_stats.cards  tournament_stats.clean_sheets
    0     14937   Cristiano           Ronaldo       359  7000000  playing       [4]       0        None            None              None  None                 0                  0                   9                   0                 0                 0                0                 0                 0             22760            41.3        764          0          0        15          0         1         0         0         0         7         0                      0                     0            0              0            0                   0                                 0                                0                       0                         0                       0                              0
    1     15061       Goran            Pandev       504  1000000  playing       [4]       0        None            None              None  None                 0                  0                   0                   0                 0                 0                0                 0                 0                50             0.1          0          0          0         0          0         0         0         0         0         0         0                      0                     0            0              0            0                   0                                 0                                0                       0                         0                       0                              0
    2     15144       David          Marshall       115  2000000  playing       [1]       0        None            None              None  None                 0                  0                   0                   0                 0                 0                0                 0                 0               166             0.3          0          0          0         0          0         0         0         0         0         0         0                      0                     0            0              0            0                   0                                 0                                0                       0                         0                       0                              0
    3     17740       Jesús             Navas       118  3000000  playing       [3]       0        None            None              None  None                 0                  0                   0                   0                 0                 0                0                 0                 0               154             0.3          0          0          0         0          0         0         0         0         0         0         0                      0                     0            0              0            0                   0                                 0                                0                       0                         0                       0                              0
    4     17745      Kasper        Schmeichel       369  3000000  playing       [1]       0        None            None              None  None                 0                  0                   9                   0                 0                 0                0                 0                 0              3261             5.9        810          0          0         0          0         1         0         0         0         4         0                      0                     0            0              0            0                   0                                 0                                0                       0                         0                       0                              0
    5     17861      Sergio             Ramos       118  5000000  playing       [2]       0        None            None              None  None                 0                  0                   9                   0                 0                 0                0                 0                 0             14647            26.6        712          0          0         1          0         1         0         0         0         6         0                      0                     0            0              0            0                   0                                 0                                0                       0                         0                       0                              0
    
    ...and so on.
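The dotted column names such as `stats.games_played` come from flattening the nested dictionaries in each player record. The sketch below illustrates that idea with the standard library only; it is a rough illustration, not pandas' implementation, and the `flatten` helper is a hypothetical name. The sample record is hand-written from a few values visible in the output above.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept unchanged.
            flat[full_key] = value
    return flat


# Hand-written sample shaped like one entry of the player data.
sample = {
    "id": 14937,
    "first_name": "Cristiano",
    "last_name": "Ronaldo",
    "positions": [4],
    "stats": {"games_played": 9, "owned_by": 41.3},
}

row = flatten(sample)
print(list(row))
```

Each flattened record then corresponds to one row of the table, with the nested `stats` fields appearing as separate columns.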
    

    This also saves the data to data.csv (the original answer showed a screenshot of the file opened in LibreOffice).

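    【Comments】:

    • This was very helpful, thank you! How can I split the "all data" result above into a dataset?
    • Wow, that simple, I can't believe it! Thank you very much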