Packages used
requests
bs4
urllib
pyecharts
lxml (the HTML parser passed to BeautifulSoup in the code below)
Result screenshots
Data acquisition
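1) Open the official Honor of Kings (王者荣耀) website (https://pvp.qq.com/) and, in the top menu, go to 游戏资料 (Game Info) → 英雄资料 (Hero Info).
2) In Google Chrome, press F12 to open DevTools and inspect the page's requests. The hero list is easy to spot: it comes from an API endpoint that returns a JSON file.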
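3) Use a JSON viewer to quickly inspect the structure of that response.
For one way to do this, see (https://www.jianshu.com/p/219b5755bafc).
As shown in the figure above, this quickly gives a clear view of the hero data.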
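4) Hero skins
Click any hero's avatar to open that hero's detail page, then press F12 to inspect the skin data. 云中君 (Yun Zhongjun) is used as the example here.
Number of skins:
Skin image URLs: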
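Both lookups above can be sketched as plain, offline functions. This is a minimal sketch, not the author's code: it relies only on the `ename` and `cname` fields and the `data-imgname` format (`name&suffix` entries joined by `|`) that the code below reads. The detail-page directory is passed in as a hypothetical `base_url` parameter (take the real value from the browser's address bar), and all sample values are hypothetical.

```python
import json
from urllib.parse import urljoin


def parse_hero_list(raw_json: str, base_url: str) -> list:
    """Return (ename, cname, detail_url) tuples from the hero-list JSON.

    Only 'ename' (numeric hero ID) and 'cname' (display name) are read;
    any other fields in each hero object are ignored.
    """
    heroes = json.loads(raw_json)
    result = []
    for hero in heroes:
        ename = str(hero["ename"])
        # urljoin replaces the last path segment of base_url, so base_url
        # should end with '/' to resolve to '<base_url><ename>.shtml'.
        detail_url = urljoin(base_url, ename + ".shtml")
        result.append((ename, hero["cname"], detail_url))
    return result


def skin_entries(hero_no: str, img_names: str) -> list:
    """Split a data-imgname value into (skin_name, image_path) pairs.

    Entries are separated by '|'; the part from '&' onward is dropped, and
    skins are numbered from 1 in page order to build the relative
    '<hero_no>/<hero_no>-bigskin-<n>.jpg' path used in the article's code.
    """
    entries = []
    for n, entry in enumerate(img_names.split("|"), start=1):
        skin_name = entry.split("&")[0]
        entries.append((skin_name, "%s/%s-bigskin-%d.jpg" % (hero_no, hero_no, n)))
    return entries


# Hypothetical sample data, for illustration only
sample_list = '[{"ename": 101, "cname": "HeroA"}, {"ename": 102, "cname": "HeroB"}]'
for ename, cname, url in parse_hero_list(sample_list, "https://example.com/herodetail/"):
    print(ename, cname, url)

skins = skin_entries("101", "SkinA&0|SkinB&12")
print(len(skins), skins)  # number of skins, then the (name, path) pairs
```

The skin count shown in DevTools corresponds to the number of `|`-separated entries, i.e. `len(skin_entries(...))`.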
Code implementation
- Fetching the hero data from the site
```python
import requests


def get_hero(self):
    # Request the hero-list endpoint found in DevTools; the body is JSON
    request = requests.get(self.hero_url)
    hero_list = request.json()
    return hero_list
```
- Parsing the hero list and skin data
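```python
from urllib import parse

import requests
from bs4 import BeautifulSoup


def hero_skin(self, hero_list):
    num = 0
    for hero in hero_list:
        num += 1
        hero_no = str(hero['ename'])          # numeric hero ID
        self.detail_url = hero_no + '.shtml'  # detail page, resolved against base_url later
        hero_name = hero['cname']             # hero display name
        # Prefix a running number so the downloaded files sort in list order
        self.get_hero_skin(str(num) + hero_name, hero_no)


def get_skin_html(self):
    url = parse.urljoin(self.base_url, self.detail_url)
    request = requests.get(url)
    request.encoding = 'gbk'  # decode the detail page as GBK
    html = request.text
    soup = BeautifulSoup(html, 'lxml')
    # The skin list element carries the skin names in its data-imgname attribute
    skin_list = soup.select('.pic-pf-list3')
    return skin_list


def get_hero_skin(self, hero_name, hero_no):
    skin_list = self.get_skin_html()
    for skin_info in skin_list:
        img_names = skin_info.attrs['data-imgname']
        name_list = img_names.split('|')  # one entry per skin
        skin_no = 1
        for skin_name in name_list:
            # Big-skin images are numbered from 1, in the same order as the names
            self.skin_detail_url = '%s/%s-bigskin-%s.jpg' % (hero_no, hero_no, skin_no)
            skin_no += 1
            # Drop the '&...' suffix; skin_no - 1 is the current skin's number
            img_name = hero_name + '-' + skin_name.split('&')[0] + str(skin_no - 1) + '.jpg'
            self.download_skin(img_name)
```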
For space reasons, only part of the code is shown here.
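`get_hero` uses `requests`. Since `urllib` is already on the package list, a standard-library-only variant could look like the following sketch; it is not the author's code. It adds a timeout and decodes the body with the charset declared in the response headers, falling back to UTF-8 (an assumption about the endpoint). Pass in the endpoint URL you found in DevTools in step 2; the demo uses a `data:` URL only so the sketch runs offline.

```python
import json
import urllib.request


def fetch_hero_list(url: str, timeout: float = 10.0) -> list:
    """Download the hero-list endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Use the declared charset if there is one; otherwise assume UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return json.loads(resp.read().decode(charset))


# Offline demo: urllib can open data: URLs, standing in for the real endpoint.
demo_url = 'data:application/json,[{"ename":101,"cname":"HeroA"}]'
print(fetch_hero_list(demo_url))
```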
Skin download and visualization
- Download
```python
import os
import time
from urllib import parse

import requests


def download_skin(self, img_name):
    img_url = parse.urljoin(self.skin_url, self.skin_detail_url)
    # Pause between requests so rapid-fire HTTP calls don't trigger errors
    time.sleep(0.5)
    request = requests.get(img_url)
    if request.status_code == 200:
        print('download-%s' % img_name)
        img_path = os.path.join(self.img_folder, img_name)
        with open(img_path, 'wb') as img:
            img.write(request.content)
    else:
        print('img error!')
```
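`download_skin` depends on `self.skin_detail_url`, which is set elsewhere before each call. A variant that takes explicit `(img_name, url)` pairs is easier to test; the sketch below is hypothetical, not the author's code. The HTTP call is injected as a `fetch` callable returning `(status_code, body)`. It keeps the 0.5 s pause and, as an addition not in the original, skips files that already exist so that a re-run does not download them again.

```python
import os
import time
from typing import Callable, Iterable, List, Tuple

Fetch = Callable[[str], Tuple[int, bytes]]


def download_all(items: Iterable[Tuple[str, str]], fetch: Fetch,
                 folder: str, delay: float = 0.5) -> List[str]:
    """Save each (img_name, url) pair under folder; return the paths written."""
    os.makedirs(folder, exist_ok=True)
    written = []
    for img_name, url in items:
        img_path = os.path.join(folder, img_name)
        if os.path.exists(img_path):
            continue  # downloaded on an earlier run; no request is made
        status, body = fetch(url)
        if status == 200:
            with open(img_path, "wb") as img:
                img.write(body)
            written.append(img_path)
        else:
            print("img error! %s (HTTP %d)" % (url, status))
        time.sleep(delay)  # throttle requests, like time.sleep(0.5) above
    return written
```

In practice `fetch` might call `requests.get(url, timeout=10)` and return the response's `status_code` and `content`.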
- Visualization
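```python
from pyecharts import options as opts
from pyecharts.charts import Pie, WordCloud


def m_cloud(self):
    # Word cloud of skin counts; self.json_data must be (name, value) pairs
    cloud = (
        WordCloud(init_opts=opts.InitOpts(theme='essos'))
        .add("英雄皮肤个数", self.json_data)  # series: number of skins per hero
        .set_global_opts(title_opts=opts.TitleOpts(title="英雄皮肤个数分布"))  # "Distribution of hero skin counts"
    )
    cloud.render("王者荣耀英雄皮肤个数.html")  # writes a standalone HTML file


def m_pie(self):
    # Pie chart of hero roles; self.json_data must be (name, value) pairs
    p = (
        Pie(init_opts=opts.InitOpts(theme='essos'))
        .add("英雄定位个数", self.json_data)  # series: number of heroes per role
        .set_colors(["blue", "green", "yellow", "red", "pink", "orange", "purple"])
        .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))  # label as "name: value"
        .set_global_opts(title_opts=opts.TitleOpts(title="英雄定位个数分布"))  # "Distribution of hero roles"
    )
    p.render("英雄定位个数分布.html")
```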
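Both charts pass `self.json_data` as the `data_pair` argument of `add`, which pyecharts expects to be a sequence of `(name, value)` pairs. The article does not show how that data is built, so the helpers below are a hypothetical way to produce such pairs. The role field is a parameter (`role_key`) because the article does not say which key of the hero-list JSON holds a hero's role.

```python
from collections import Counter
from typing import Dict, List, Tuple


def skin_count_pairs(skins_by_hero: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """(hero name, number of skins) pairs, largest count first."""
    pairs = [(hero, len(skins)) for hero, skins in skins_by_hero.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def role_count_pairs(heroes: List[dict], role_key: str) -> List[Tuple[str, int]]:
    """(role, number of heroes) pairs; role_key is a hypothetical field name."""
    counts = Counter(str(hero.get(role_key, "unknown")) for hero in heroes)
    return counts.most_common()


# Hypothetical sample data, for illustration only
print(skin_count_pairs({"HeroA": ["s1"], "HeroB": ["s1", "s2", "s3"]}))
print(role_count_pairs([{"role": "Mage"}, {"role": "Tank"}, {"role": "Mage"}], "role"))
```

Either result can be passed directly as the second argument of `.add(...)` on the `WordCloud` or `Pie` chart.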
For more, see:
https://mp.weixin.qq.com/s?__biz=Mzg3OTExODI3OA==&mid=2247483878&idx=1&sn=82e7f5ec5ec7ddd7fbecd33c2f3abf7c&chksm=cf08134ff87f9a5932b72d18997fb3568dd98b6c8fef277ba792fcd789a00a723607959deb9b&token=1106491029&lang=zh_CN#rd