【Question Title】: Creating a Python program that scrapes files from a website
【Posted】: 2015-03-01 12:50:44
【Question】:

This is what I have so far:

import urllib
Champions=["Aatrox","Ahri","Akali","Alistar","Amumu","Anivia","Annie","Ashe","Azir","Blitzcrank","Brand","Braum","Caitlyn","Cassiopeia","ChoGath","Corki","Darius","Diana","DrMundo","Draven","Elise","Evelynn","Ezreal","Fiddlesticks","Fiora","Fizz","Galio","Gangplank","Garen","Gnar","Gragas","Graves","Hecarim","Heimerdinger","Irelia","Janna","JarvanIV","Jax","Jayce","Jinx","Kalista","Karma","Karthus","Kassadin","Katarina","Kayle","Kennen","KhaZix","KogMaw","LeBlanc","LeeSin","Leona","Lissandra","Lucian","Lulu","Lux","Malphite","Malzahar","Maokai","MasterYi","MissFortune","Mordekaiser","Morgana","Nami","Nasus","Nautilus","Nidalee","Nocturne","Nunu","Olaf","Orianna","Pantheon","Poppy","Quinn","Rammus","RekSai","Renekton","Rengar","Riven","Rumble","Ryze","Sejuani","Shaco","Shen","Shyvana","Singed","Sion","Sivir","Skarner","Sona","Soraka","Swain","Syndra","Talon","Taric","Teemo","Thresh","Tristana","Trundle","Tryndamere","TwistedFate","Twitch","Udyr","Urgot","Varus","Vayne","Veigar","VelKoz","Vi","Viktor","Vladimir","Volibear","Warwick","Wukong","Xerath","XinZhao","Yasuo","Yorick","Zac","Zed","Ziggs","Zilean","Zyra"]
currentCount=0
while currentCount < len(Champions):
    urllib.urlretrieve("http://www.lolflavor.com/champions/"+Champions[currentCount]+ "/Recommended/"+Champions[currentCount]+"_lane_scrape.json","C:\Users\Jay\Desktop\LolFlavor\ " +Champions[currentCount]+ "\ "+Champions[currentCount]+ "_lane_scrape.json")
    currentCount+=1

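The goal of the program is to use the list and currentCount to pick a champion, go to the website (for "Aatrox", that is http://www.lolflavor.com/champions/Aatrox/Recommended/Aatrox_lane_scrape.json), download the file, and store it under the LolFlavor folder, in this example as Aatrox/Aatrox_lane_scrape.json.

The Aatrox part changes depending on the champion. Can anyone help me get this working?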

EDIT: Current code, which gives a ValueError:

import json
import os
import requests
Champions=["Aatrox","Ahri","Akali","Alistar","Amumu","Anivia","Annie","Ashe","Azir","Blitzcrank","Brand","Braum","Caitlyn","Cassiopeia","ChoGath","Corki","Darius","Diana","DrMundo","Draven","Elise","Evelynn","Ezreal","Fiddlesticks","Fiora","Fizz","Galio","Gangplank","Garen","Gnar","Gragas","Graves","Hecarim","Heimerdinger","Irelia","Janna","JarvanIV","Jax","Jayce","Jinx","Kalista","Karma","Karthus","Kassadin","Katarina","Kayle","Kennen","KhaZix","KogMaw","LeBlanc","LeeSin","Leona","Lissandra","Lucian","Lulu","Lux","Malphite","Malzahar","Maokai","MasterYi","MissFortune","Mordekaiser","Morgana","Nami","Nasus","Nautilus","Nidalee","Nocturne","Nunu","Olaf","Orianna","Pantheon","Poppy","Quinn","Rammus","RekSai","Renekton","Rengar","Riven","Rumble","Ryze","Sejuani","Shaco","Shen","Shyvana","Singed","Sion","Sivir","Skarner","Sona","Soraka","Swain","Syndra","Talon","Taric","Teemo","Thresh","Tristana","Trundle","Tryndamere","TwistedFate","Twitch","Udyr","Urgot","Varus","Vayne","Veigar","VelKoz","Vi","Viktor","Vladimir","Volibear","Warwick","Wukong","Xerath","XinZhao","Yasuo","Yorick","Zac","Zed","Ziggs","Zilean","Zyra"]
for champ in Champions:
    os.makedirs("C:\\Users\\Jay\\Desktop\\LolFlavor\\{}\\Recommended".format(champ), exist_ok=True)
    with open(r"C:\Users\Jay\Desktop\LolFlavor\{}\Recommended\{}_lane_scrape.json".format(champ,champ),"w") as f:
        r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_lane_scrape.json".format(champ,champ))
        json.dump(r.json(),f)
    with open(r"C:\Users\Jay\Desktop\LolFlavor\{}\Recommended\{}_jungle_scrape.json".format(champ,champ),"w") as f:
        r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_jungle_scrape.json".format(champ,champ))
        json.dump(r.json(),f)
    with open(r"C:\Users\Jay\Desktop\LolFlavor\{}\Recommended\{}_support_scrape.json".format(champ,champ),"w") as f:
        r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_support_scrape.json".format(champ,champ))
        json.dump(r.json(),f)
    with open(r"C:\Users\Jay\Desktop\LolFlavor\{}\Recommended\{}_aram_scrape.json".format(champ,champ),"w") as f:
        r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_aram_scrape.json".format(champ,champ))
        json.dump(r.json(),f)

【Comments】:

    Tags: python html download


    【Solution 1】:
    import requests
    
    Champions=["Aatrox","Ahri","Akali","Alistar","Amumu","Anivia","Annie","Ashe","Azir","Blitzcrank","Brand","Braum","Caitlyn","Cassiopeia","ChoGath","Corki","Darius","Diana","DrMundo","Draven","Elise","Evelynn","Ezreal","Fiddlesticks","Fiora","Fizz","Galio","Gangplank","Garen","Gnar","Gragas","Graves","Hecarim","Heimerdinger","Irelia","Janna","JarvanIV","Jax","Jayce","Jinx","Kalista","Karma","Karthus","Kassadin","Katarina","Kayle","Kennen","KhaZix","KogMaw","LeBlanc","LeeSin","Leona","Lissandra","Lucian","Lulu","Lux","Malphite","Malzahar","Maokai","MasterYi","MissFortune","Mordekaiser","Morgana","Nami","Nasus","Nautilus","Nidalee","Nocturne","Nunu","Olaf","Orianna","Pantheon","Poppy","Quinn","Rammus","RekSai","Renekton","Rengar","Riven","Rumble","Ryze","Sejuani","Shaco","Shen","Shyvana","Singed","Sion","Sivir","Skarner","Sona","Soraka","Swain","Syndra","Talon","Taric","Teemo","Thresh","Tristana","Trundle","Tryndamere","TwistedFate","Twitch","Udyr","Urgot","Varus","Vayne","Veigar","VelKoz","Vi","Viktor","Vladimir","Volibear","Warwick","Wukong","Xerath","XinZhao","Yasuo","Yorick","Zac","Zed","Ziggs","Zilean","Zyra"]
    
    for champ in Champions:
        r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_lane_scrape.json".format(champ,champ))
        print(r.json())
    

    If you want to save each response to a file, dump the JSON:

    import json
    
    for champ in Champions:
        with open(r"C:\Users\Jay\Desktop\LolFlavor\{}_lane_scrape.json".format(champ),"w") as f:
            try:
                r = requests.get("http://www.lolflavor.com/champions/{}/Recommended/{}_lane_scrape.json".format(champ, champ))
                json.dump(r.json(),f)
            except ValueError:
                # r.json() raises a ValueError subclass when the body is not valid JSON (e.g. a 404 page)
                print(r.url)
    

    The error comes from a 404 - File or directory not found response: one of the requests fails, so there is no valid JSON to decode. The offending URL is:

    u'http://www.lolflavor.com/champions/Wukong/Recommended/Wukong_lane_scrape.json'
    

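    If you try it in your browser you will also get a 404. That is because there is no entry named Wukong on the site, which you can confirm by opening http://www.lolflavor.com/champions/Wukong/ in your browser.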
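One way to avoid the decode error is to check the HTTP status before calling json(). The sketch below is only an illustration of that idea, not code from the original answer. The names json_or_none and FakeResponse are hypothetical, and the stand-in class exists only so the snippet runs without network access; a real requests response exposes the same status_code, url and json() members.

```python
class FakeResponse:
    """Hypothetical stand-in mimicking the parts of requests.Response used below."""

    def __init__(self, status_code, url, payload=None):
        self.status_code = status_code
        self.url = url
        self._payload = payload

    def json(self):
        # Like a real response, fail with a ValueError when there is no JSON body.
        if self._payload is None:
            raise ValueError("no JSON in body")
        return self._payload


def json_or_none(response):
    """Return decoded JSON, or None if the request failed or the body is not JSON."""
    if response.status_code != 200:
        print("skipping {} (HTTP {})".format(response.url, response.status_code))
        return None
    try:
        return response.json()
    except ValueError:
        print("skipping {} (body is not valid JSON)".format(response.url))
        return None


ok = FakeResponse(200, "http://example.invalid/Aatrox.json", {"champion": "Aatrox"})
missing = FakeResponse(404, "http://example.invalid/Wukong.json")
print(json_or_none(ok))
print(json_or_none(missing))
```

With requests installed, the call inside the loop would be json_or_none(requests.get(url)), and the file would only be written when the result is not None.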

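    There is no need for a while loop to index the list. Just iterate over the list items directly and use str.format to pass the variable into the URL. Also make sure to use a raw string (the r prefix) for file paths that contain backslashes. Backslashes have a special meaning in Python: they introduce escape sequences, so \n, \r and similar in your path will cause problems. You can also use / as the separator, or escape each backslash as \\.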
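The difference between these path spellings can be checked directly. This is a minimal sketch; the folder name LolFlavor\new is made up for illustration so that the path contains \n, and the two-name list is a shortened stand-in for the full Champions list.

```python
plain = "LolFlavor\new"      # "\n" is interpreted as a newline character
raw = r"LolFlavor\new"       # raw string: the backslash is kept literally
escaped = "LolFlavor\\new"   # doubled backslash: same text as the raw string

print(len(plain), len(raw))  # the plain version is one character shorter
print(raw == escaped)

# Iterate over the list directly and fill the URL with str.format.
champions = ["Aatrox", "Ahri"]
url_template = "http://www.lolflavor.com/champions/{}/Recommended/{}_lane_scrape.json"
urls = [url_template.format(champ, champ) for champ in champions]
for url in urls:
    print(url)
```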

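    【Discussion】:

    • Hi, I get an error: Traceback (most recent call last): File "C:/Users/Jay/Desktop/lolflavor scraper.py", line 3, in <module> import requests ImportError: No module named 'requests'
    • pip install requests. You won't regret installing requests.
    • Thanks, I tried changing the directory to with open(r"C:\Users\Jay\Desktop\LolFlavor\{}\Recommended\{}_lane_scrape.json".format(champ), "w") as f: but I get the error tuple index out of range
    • You only have one champ
    • Thank you, I have sorted that out now. Sorry to bother you, but could you help with os.makedirs? I have to create the folders. I tried this: path=("C:\Users\Jay\Desktop\LolFlavor\{}\Recommended".format(champ)) os.makedirs(path) and I get a Unicode Error (unicode escape)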
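The two errors raised in this thread can be reproduced and avoided in a few lines. This is a sketch under assumptions, not the answerer's code: it writes under a temporary directory rather than C:\Users\Jay\Desktop, uses a hypothetical two-name list, and relies on the exist_ok argument, which needs Python 3.2 or later. A format string with two {} placeholders needs two arguments, or a numbered {0} that can be reused. A non-raw "C:\Users..." literal fails in Python 3 because \U starts a Unicode escape; a raw string or os.path.join avoids that.

```python
import os
import tempfile

champions = ["Aatrox", "Ahri"]  # hypothetical short list for the demo

# Two placeholders but only one argument: str.format raises IndexError.
try:
    "{}/Recommended/{}_lane_scrape.json".format("Aatrox")
except IndexError as exc:
    print("format failed:", exc)

# {0} can be reused, so a single argument fills both positions.
relative = "{0}/Recommended/{0}_lane_scrape.json"

base = tempfile.mkdtemp()  # stand-in for the real target folder
created = []
for champ in champions:
    # os.path.join builds the path without any hand-written backslashes.
    folder = os.path.join(base, champ, "Recommended")
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    created.append(folder)
    print(relative.format(champ))
```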