Scraping all the data from a Tmall shop's listing pages

First, we have a starting URL: https://goodbaby.tmall.com/shop/view_shop.htm?spm=a230r.7195193.1997079397.2.3RayhH

What we want to scrape is every item in the shop, sorted by sales volume.

Clicking through, we can see the URL of the listing page.

Viewing the page source shows that Taobao's product data is not in the HTML itself; it is loaded by JavaScript.

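So we find the API behind the page and request it directly: copy the request URL from the Headers panel and send requests to it, changing the p parameter each time. p is the current page number, so we loop over as many pages as the shop has.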
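The paging step can be sketched roughly as follows. This is a minimal sketch, not a documented Tmall API: the endpoint and query parameters are copied from the request URL used in the full script in this post, while `build_page_url`, `iter_all_items`, and the `fetch_page` callback are hypothetical names introduced for illustration. Instead of hard-coding the page count, it assumes that an empty `items` list marks the last page, which should be checked against real responses.

```python
from urllib.parse import urlencode

BASE_URL = 'https://goodbaby.m.tmall.com/shop/shop_auction_search.do'


def build_page_url(page, page_size=12):
    """Return the listing URL for one page; p is the page number."""
    params = {
        'spm': 'a1z60.7754813.0.0.301755f0pZ1GjU',
        'suid': '379833581',
        'sort': 's',            # sort by sales, as in the original URL
        'p': page,
        'page_size': page_size,
        'from': 'h5',
        'shop_id': '60650834',
        'ajson': 1,
        '_tm_source': 'tmallsearch',
    }
    return BASE_URL + '?' + urlencode(params)


def iter_all_items(fetch_page, max_pages=1000):
    """Yield items page by page until a page comes back empty.

    fetch_page(page) is a caller-supplied (hypothetical) function that
    returns the parsed JSON dict for that page. max_pages is a safety
    cap so a misbehaving endpoint cannot cause an endless loop.
    """
    for page in range(1, max_pages + 1):
        items = fetch_page(page).get('items') or []
        if not items:  # assumption: an empty page means no more data
            break
        yield from items
```

With requests, `fetch_page` could be something like `lambda p: requests.get(build_page_url(p), headers=headers).json()`, where `headers` carries the User-Agent and Referer values.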

Finally, we save the scraped data to an Excel file and we are done. Here is the code:

import requests
import xlsxwriter

# Fields copied from each item in the JSON response, in column order.
FIELDS = ['item_id', 'title', 'img', 'sold', 'quantity',
          'totalSoldQuantity', 'url', 'price']

API_URL = 'https://goodbaby.m.tmall.com/shop/shop_auction_search.do'
HEADERS = {
    'User-Agent': r'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36',
    'Referer': r'https://goodbaby.m.tmall.com/shop/shop_auction_search.htm?spm=a1z60.7754813.0.0.301755f0pZ1GjU&suid=379833581&sort=default',
}

workbook = xlsxwriter.Workbook("e:\\data.xlsx")
worksheet = workbook.add_worksheet()
worksheet.write_row(0, 0, FIELDS)  # header row

row = 1  # next free row (0-based; row 0 holds the headers)
for page in range(1, 36):  # pages 1 to 35; adjust to the shop's page count
    params = {
        'spm': 'a1z60.7754813.0.0.301755f0pZ1GjU',
        'suid': '379833581',
        'sort': 's',
        'p': page,  # current page number
        'page_size': 12,
        'from': 'h5',
        'shop_id': '60650834',
        'ajson': 1,
        '_tm_source': 'tmallsearch',
    }
    response = requests.get(API_URL, params=params, headers=HEADERS, timeout=10)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    items = response.json().get('items') or []  # tolerate a missing 'items' key
    for item in items:
        print(item)
        # One spreadsheet row per item, columns in FIELDS order.
        worksheet.write_row(row, 0, [item.get(name) for name in FIELDS])
        row += 1
workbook.close()
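The step that turns each JSON item into a spreadsheet row can also be tried in isolation, without hitting the network. The following is a small sketch under stated assumptions: it uses only the field names the script above reads, `item_to_row` and `items_to_table` are hypothetical helpers, and the sample item is made-up data rather than a real response.

```python
FIELDS = ['item_id', 'title', 'img', 'sold', 'quantity',
          'totalSoldQuantity', 'url', 'price']


def item_to_row(item, fields=FIELDS):
    """Return the item's values in column order; missing keys become ''."""
    return [item.get(name, '') for name in fields]


def items_to_table(items, fields=FIELDS):
    """Return a header row followed by one row per item."""
    return [list(fields)] + [item_to_row(item, fields) for item in items]


# Made-up sample item, for illustration only.
sample = {'item_id': 123, 'title': 'demo stroller', 'price': '99.00'}
table = items_to_table([sample])
```

Each row of `table` could then be written with xlsxwriter's `worksheet.write_row(row_index, 0, row)`. Keeping the flattening separate from the writing makes it easy to check the column order before producing the file.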

This is the final scraped result: