2014-12-23
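Booking driving sessions at the Dongfang Shishang (东方时尚) driving school means choosing time slots online, and slots are very scarce, so I decided to write my own slot-grabbing program to poll the site for me.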
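Step 1: capture the traffic. Log in to the course-selection system in a browser and watch the network activity with a packet-capture tool; I used Fiddler. Two things need to come out of the capture: first, which URL requests the browser sends; second, the HTTP request headers, plus the content and format of the data that gets POSTed.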
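As a rough illustration of how those two captured pieces are used, the sketch below rebuilds a login POST from a captured header set and form body. It is written with Python 3's urllib (the full script at the end of this post is Python 2), the helper name build_login_request is made up for this example, and the account, password and captcha values are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

# A subset of the headers copied from the captured browser request.
CAPTURED_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Accept': 'text/html, application/xhtml+xml, */*',
    'Host': 'wsyc.dfss.com.cn',
}

def build_login_request(url, account, password, validcode):
    """Hypothetical helper: rebuild the captured login POST as a Request object."""
    form = {
        'AjaxMethod': 'LOGIN',
        'Account': account,
        'ValidCode': validcode,
        'Pwd': password,
    }
    # urlencode produces the same key=value&key=value body seen in the capture.
    body = urlencode(form).encode('ascii')
    return Request(url, data=body, headers=CAPTURED_HEADERS)

# Building the request does not touch the network; it only prepares the message.
req = build_login_request('http://wsyc.dfss.com.cn/DfssAjax.aspx',
                          'myname', 'mypasswd', 'ab12')
print(req.get_method())
print(req.data)
```

Supplying a body is what turns the request into a POST, which is why the format of the posted data matters as much as the URL.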
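Step 2: simulate the login. Copy the captured headers and have Python imitate the browser, logging in with the username and password. The login requires a captcha, so I started with manual recognition: Python saves the fetched .jpg file locally and pauses for input, a human reads the captcha and types it in, and the program carries on. At first the server kept reporting a wrong captcha. The cause turned out to be the way the requests were made: every URL has to be fetched through the same opener (it carries the cookies), so that the server treats all the requests as coming from the same browser. After that change, the login succeeded.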
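The "same opener" point can be sketched as follows. This is a minimal Python 3 version under my own naming (make_session_opener and save_captcha are hypothetical helpers, not part of the script below); the network calls are left as comments because they need the live site.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

def make_session_opener(headers):
    """Build one opener whose cookie jar is reused by every request made with it."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    opener.addheaders = list(headers.items())
    return opener, jar

def save_captcha(image_bytes, path):
    """Write the captcha image to disk so a person can look at it."""
    with open(path, 'wb') as f:
        f.write(image_bytes)

opener, jar = make_session_opener({'User-Agent': 'Mozilla/5.0'})

# Intended use (not executed here):
#   image = opener.open(captcha_url).read()   # server sets its session cookie
#   save_captcha(image, 'captcha.jpg')
#   code = input('captcha: ')                 # a human reads the image
#   opener.open(login_url, form_bytes)        # same opener, so same cookie
```

If the captcha were fetched with one opener and the login posted with another, the second request would arrive without the cookie that ties it to the captcha the server generated, which matches the "wrong captcha" symptom described above.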
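Step 3: solve the captcha. Typing captchas by hand is no long-term solution; the machine should do it. I used the PIL and pytesser packages, which provide a ready-made way to read the text out of an image, so they can be used as is. Since the recognition rate is not 100%, I put the login code inside a while loop that keeps going until the login succeeds.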
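The retry idea can be tried offline with the OCR and the server replaced by stand-ins. In this sketch the three callables are placeholders (in the real script they would be the captcha download, pytesser's image_to_string, and the login POST), and the max_attempts cap is my addition; the original loop runs until it succeeds.

```python
import re

def clean_ocr_text(text):
    """Drop whitespace and punctuation that OCR tends to add around the code."""
    return re.sub(r'\W', '', text)

def login_with_retry(fetch_captcha, recognize, submit, max_attempts=20):
    """Repeat fetch -> recognize -> submit until submit reports success.

    fetch_captcha() returns image data, recognize(image) returns raw OCR text,
    submit(code) returns True when the server accepts the login.
    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        code = clean_ocr_text(recognize(fetch_captcha()))
        if code and submit(code):
            return attempt
    return None

# Fake OCR that misreads the first two captchas, to exercise the loop.
readings = iter(['x9 q1\n', 'ab-12', 'k7m2\n'])
attempts = login_with_retry(
    fetch_captcha=lambda: b'fake image bytes',
    recognize=lambda image: next(readings),
    submit=lambda code: code == 'k7m2',
)
print(attempts)
```

A fresh captcha is fetched on every attempt, because a failed login normally invalidates the previous image.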
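Step 4: grab the class. Capture more traffic, analyze the URL requests made during course selection, and have the program replay them. For example, I saw in the browser that a Friday evening session was open and told the program to grab it. The program reported ok, and after refreshing the browser the session really was booked, so the program works!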
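Replaying the booking request mostly comes down to rebuilding its query string. The sketch below shows one way to assemble it from named fields with urlencode (Python 3). The helper build_booking_url is hypothetical and covers only a subset of the parameters seen in the capture; the remaining fixed fields would be appended the same way.

```python
from urllib.parse import urlencode

BASE = 'http://wsyc.dfss.com.cn/Ajax/StuHdl.ashx'

def build_booking_url(stuid, year, month, day, start, end, session_id, validcode):
    """Hypothetical helper: assemble the booking query string from named fields."""
    params = [
        ('loginType', '2'),
        ('method', 'yueche'),
        ('stuid', stuid),
        ('start', start),                       # first hour of the session
        ('end', end),                           # last hour of the session
        ('date', '%d-%d-%d' % (year, month, day)),
        ('trainsessionid', '0' + session_id),
        ('ValidCode', validcode),
    ]
    return BASE + '?' + urlencode(params)

url = build_booking_url('myname', 2014, 12, 22, 13, 17, '3', 'ab12')
print(url)
```

Keeping the fields in a list rather than one long string makes it easier to compare the replayed request against the captured one, parameter by parameter.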
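Follow-up: reading the data. The people who built the Dongfang Shishang site are no pushovers. Since the good time slots are usually taken, I had my program run a while loop and keep polling until it got one. After a few hours the site told me I had performed too many operations and blocked my access for the rest of the day. To get around this, I reduced the polling rate to once every 10 minutes. But booking also requires a captcha, and the recognition rate is not high; if a slot is open and the attempt fails on the captcha, waiting another 10 minutes throws the chance away. So I changed the program again: if no slot is open, wait 10 minutes; if one is open, keep retrying without a pause. That in turn means extracting data. Analyzing the page structure, I found a separate URL that serves the data, with the remaining slots held in a JSON-formatted string. First extract that string with a regular expression, then parse the JSON, and the needed data is there!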
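The extraction and the wait-or-retry rule can be sketched together. The regular expression is the one used in the script below; the sample response text is made up, shaped only by what the script's parsing implies (an array indexed by day, with each session value split on a slash), and the function names are mine.

```python
import json
import re

# A JSON array whose first object starts with "fchrdate", embedded in a larger response.
SCHEDULE_RE = re.compile(r'\[\{"fchrdate.*?\}\]')

def remaining_slots(page_text, day_index, session_id):
    """Return the remaining count for one day/session, or None if no data is found."""
    match = SCHEDULE_RE.search(page_text)
    if match is None:
        return None
    schedule = json.loads(match.group())
    # Each session value looks like "a/b"; the part after the slash is read as remaining.
    return int(schedule[day_index][session_id].split('/')[1])

def next_delay(remaining, idle_seconds=600):
    """Wait 10 minutes when nothing is free; otherwise retry immediately."""
    return idle_seconds if remaining == 0 else 0

# Hypothetical response text, not captured from the real site.
sample = ('callback([{"fchrdate":"2014-12-22","3":"8/0"},'
          '{"fchrdate":"2014-12-23","3":"8/2"}]);')

print(remaining_slots(sample, 0, '3'), next_delay(remaining_slots(sample, 0, '3')))
print(remaining_slots(sample, 1, '3'), next_delay(remaining_slots(sample, 1, '3')))
```

When the response cannot be parsed (for example after a failed captcha), remaining_slots returns None and next_delay gives 0, which mirrors the script's behaviour of retrying straight away in that case.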
Finally, here is my code:
import re
import json
import time
import urllib
import urllib2
import urlparse
import cookielib
from PIL import Image, ImageDraw, ImageFont, ImageFilter
from pytesser import *
from datetime import date
import os

# pytesser is run from its own install directory
os.chdir('C://Python27/Lib/site-packages/pytesser')

def getVerify(name):
    # OCR the captcha image: convert to grayscale, read the text, keep only word characters
    im = Image.open(name)
    imgry = im.convert('L')
    text = image_to_string(imgry)
    text = re.sub(r'\W', '', text)
    return text

def urlToString(url):
    # download a captcha without a session and recognize it
    data = urllib2.urlopen(url).read()
    f = open('buffer/temp.jpg', 'wb')
    f.write(data)
    f.close()
    return getVerify('buffer/temp.jpg')

def openerUrlToString(opener, url):
    # download a captcha through the shared opener (same cookies) and recognize it
    data = opener.open(url).read()
    f = open('buffer/temp.jpg', 'wb')
    f.write(data)
    f.close()
    return getVerify('buffer/temp.jpg')

def getOpener(head):
    # deal with the Cookies
    cj = cookielib.CookieJar()
    pro = urllib2.HTTPCookieProcessor(cj)
    opener = urllib2.build_opener(pro)
    header = []
    for key, value in head.items():
        elem = (key, value)
        header.append(elem)
    opener.addheaders = header
    return opener

def decodeAnyType(data):
    # try each encoding in turn and return the first one that decodes cleanly
    for encoding in ('utf-8', 'gbk', 'gb2312'):
        try:
            return data.decode(encoding)
        except UnicodeError:
            pass
    return data

header = {
    'Connection': 'Keep-Alive',
    'Accept': 'text/html, application/xhtml+xml, */*',
    'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Accept-Encoding': 'gzip, deflate',
    'Host': 'wsyc.dfss.com.cn',
    'DNT': '1'
}

## the data below are set by the user to select the class needed
start = 13
end = 17
numid = '3'
year = 2014
month = 12
day = 22
username = 'myname'
password = 'mypasswd'

opener = getOpener(header)
url1 = 'http://wsyc.dfss.com.cn/'
url2 = 'http://wsyc.dfss.com.cn/DfssAjax.aspx'
url3 = 'http://wsyc.dfss.com.cn/validpng.aspx?aa=3&page=lg'
url4 = 'http://wsyc.dfss.com.cn/pc-client/jbxx.aspx'
url5 = 'http://wsyc.dfss.com.cn/validpng.aspx'

## try to login until the validcode is right
count = 0
while True:
    print '------------------------'
    print 'have tried to login %d times, now try again!' % (count)
    count = count + 1
    validcode = openerUrlToString(opener, url3)
    print 'the validcode is ' + validcode
    postDict = {
        'AjaxMethod': 'LOGIN',
        'Account': username,
        'ValidCode': validcode,
        'Pwd': password
    }

    postData = urllib.urlencode(postDict).encode()
    op = opener.open(url2, postData)
    result = op.read().decode('utf-8')
    print 'the result of login is ' + result
    #if result.find('true') >= 0:
    if result == 'true':
        print 'login success!'
        break
    else:
        continue


yuechedate = date(year, month, day)
today = date.today()
intervaldays = (yuechedate - today).days
print intervaldays
# stop if the target date is less than 2 days away
if intervaldays < 2:
    exit()
validcode = ''
count = 0
## try to select a class until success
while True:
    print '--------------------------'
    print 'have tried to select %d times, now try again!' % (count)
    count = count + 1
    try:
        validcode = openerUrlToString(opener, url5)
    except:
        continue
    # query the schedule data; the remaining slots are in an embedded JSON array
    url7 = 'http://wsyc.dfss.com.cn/Ajax/StuHdl.ashx?loginType=2&method=stu'\
        + '&stuid=%s&sfznum=&carid=&ValidCode=%s' % (username, validcode)
    data = opener.open(url7).read().decode('utf-8')
    strs = re.search(r'\[\{"fchrdate.*?\}\]', data)
    #print data
    print strs
    if strs is None:
        continue
    jsontext = json.loads(strs.group())
    num = jsontext[intervaldays][numid].split('/')[1]
    print 'remain num is ' + num
    if num == '0':
        # nothing free: wait 10 minutes before polling again
        print 'no class available!'
        time.sleep(600)
        continue
    try:
        validcode = openerUrlToString(opener, url5)
    except:
        continue
    # a slot is free: send the booking request, retrying immediately on failure
    url6 = 'http://wsyc.dfss.com.cn/Ajax/StuHdl.ashx?loginType=2&method=yueche'\
        + '&stuid=%s&bmnum=BD14101500687&start=%d&end=%d' % (username, start, end)\
        + '&lessionid=001&trainpriceid=BD13040300001&lesstypeid=02'\
        + '&date=%d-%d-%d' % (year, month, day)\
        + '&id=1&carid=&ycmethod=03&cartypeid=01&trainsessionid=0' + numid\
        + '&ReleaseCarID=&ValidCode=' + validcode
    result = opener.open(url6).read().decode('utf-8')
    print 'result of select is ' + result
    if result == 'success':
        print 'select success!'
        break
    else:
        continue