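I've had some free time lately, so I picked a few of the more troublesome websites to practice scraping on. That reminded me that I had once said I would tackle the trademark site, so today I went back for another look!
The post I reposted earlier: 商标局网请收下我的膝盖 ("Trademark Office site, please accept my knees")
Looking at it now, the request parameters are surprisingly plain!? A lot of the anti-scraping restrictions have probably been removed.
I then tried simulating the requests: they went through and returned the data.
The endpoints used are: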
http://sbgg.saic.gov.cn:9080/tmann/annInfoView/selectInfoidBycode.html
http://sbgg.saic.gov.cn:9080/tmann/annInfoView/imageView.html
http://sbgg.saic.gov.cn:9080/tmann/annInfoView/annSearchDG.html
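Combined, these endpoints let you query by different conditions and download the final image. One thing to note: the response is a list of image links, and the one we want is the item at index 3.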
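As a rough sketch of that last point, the helper below pulls the link at index 3 out of an imageView.html response body. It assumes the body is JSON text with an `imaglist` key, which is how the script in this post reads it. The name `pick_image_url` and the sample body are made up for illustration.

```python
import json


def pick_image_url(body, index=3):
    """Return the image link at `index` from an imageView.html response body.

    Hypothetical helper: assumes `body` is JSON text with an "imaglist" array
    (the key name is taken from the script in this post).
    """
    payload = json.loads(body)
    links = payload.get("imaglist", [])
    if len(links) <= index:
        raise ValueError(f"expected more than {index} links, got {len(links)}")
    return links[index]


# Made-up sample body, only to show the shape the helper expects.
sample = json.dumps({"imaglist": ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]})
print(pick_image_url(sample))
```

Checking the list length first gives a readable error instead of a bare IndexError if the site ever returns fewer links.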
The simple code is below (for learning reference only):
import json
import time

import requests

# Load the saved search results (JSON text).
with open("搜索结果1.json", "r", encoding="utf-8") as f:
    data = f.read()


def run(ann_num, page_no, ann_type_code):
    """Look up the announcement id, then print the image link for one page."""
    url = "http://sbgg.saic.gov.cn:9080/tmann/annInfoView/selectInfoidBycode.html"
    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
        "Connection": "keep-alive",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Cookie": "",  # fill in your own cookie
        "Host": "sbgg.saic.gov.cn:9080",
        "Origin": "http://sbgg.saic.gov.cn:9080",
        "Referer": "http://sbgg.saic.gov.cn:9080/tmann/annInfoView/annSearch.html?annNum=",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
    }
    payload = {
        "annNum": ann_num,
        "annTypecode": ann_type_code,
    }
    response = requests.post(url=url, headers=headers, data=payload, timeout=15)
    info_id = response.text  # the response body is the id itself
    print(info_id)
    url2 = "http://sbgg.saic.gov.cn:9080/tmann/annInfoView/imageView.html"
    payload2 = {
        "id": info_id,
        "pageNum": page_no,
        "flag": "1",
    }
    response2 = requests.post(url=url2, headers=headers, data=payload2, timeout=15)
    # Parse the body as JSON; eval() on text from a server is unsafe.
    result = json.loads(response2.text)
    image = result["imaglist"][3]  # the link we want is at index 3
    print(image)

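
if __name__ == '__main__':
    # This code is for learning reference only.
    data_dict = json.loads(data)  # the file is JSON, so no eval() is needed
    total = data_dict["total"]  # total number of trademarks
    rows = data_dict["rows"]  # the trademark records
    print(total)
    for i in rows:
        page_no = i["page_no"]  # page number
        tm_name = i["tm_name"]  # trademark name
        ann_type_code = i["ann_type_code"]  # request parameter
        tmname = i["tmname"]  # trademark name
        reg_name = i["reg_name"]  # company name
        ann_type = i["ann_type"]  # announcement type
        ann_num = i["ann_num"]  # announcement issue number
        reg_num = i["reg_num"]  # trademark id
        row_id = i["id"]  # request id
        rn = i["rn"]  # position
        app_date = i["ann_date"]  # application date (read from ann_date)
        regname = i["regname"]  # applicant name???
        # Only fetch images for preliminary approval announcements.
        if ann_type == "商标初步审定公告":
            run(ann_num, page_no, ann_type_code)
            time.sleep(5)  # pause between requests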
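
The filtering step in the main loop could also be pulled out into a small function, which makes it easy to check without touching the network. This is only a sketch: `select_preliminary` is a hypothetical name, the field names are the ones the script above reads, and the sample rows are invented.

```python
import json

PRELIMINARY = "商标初步审定公告"  # the announcement type the script filters on


def select_preliminary(search_json):
    """Return (ann_num, page_no, ann_type_code) for each matching row."""
    result = json.loads(search_json)
    return [
        (row["ann_num"], row["page_no"], row["ann_type_code"])
        for row in result.get("rows", [])
        if row.get("ann_type") == PRELIMINARY
    ]


# Invented sample data using the same field names as the saved search result.
sample = json.dumps({
    "total": 2,
    "rows": [
        {"ann_num": "1600", "page_no": 12, "ann_type_code": "CODE_A", "ann_type": PRELIMINARY},
        {"ann_num": "1600", "page_no": 40, "ann_type_code": "CODE_B", "ann_type": "other"},
    ],
}, ensure_ascii=False)

for args in select_preliminary(sample):
    print(args)
```

Each tuple could then be passed on as `run(*args)`, with a pause between calls as in the loop above.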