<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">The whole crawler is very simple.</span>
<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">While writing it, though, I ran into garbled text. Probably because I had skimmed the basics too quickly, I passed a plain string to raw_input() instead of a unicode literal of the form u'string'.</span>
<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">After spending an afternoon reading up on Python, I finally started writing a crawler.</span>
<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">This one does a single job: it downloads the HTML of a given range of pages from Baidu Tieba.</span>
<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">Without further ado, let's get started. The only module that really matters is urllib2.</span>
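<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">urllib2 exists only in Python 2. For readers on Python 3, the same idea could be sketched roughly as follows, assuming urllib.request.urlopen as the replacement. The names page_filename, page_url, and download_pages, and the optional fetch argument, are hypothetical helpers made up for this illustration; they are not part of the script in this post.</span>

```python
from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen


def page_filename(page):
    """Zero-pad the page number to five digits, e.g. 7 becomes 00007.html."""
    return str(page).zfill(5) + '.html'


def page_url(base_url, page):
    """Append the page number to a URL that stops just before it."""
    return base_url + str(page)


def download_pages(base_url, first_page, last_page, fetch=None):
    """Save pages first_page..last_page (inclusive) as HTML files.

    fetch is a hypothetical hook: any callable that takes a URL and
    returns bytes. It defaults to a real HTTP request via urlopen.
    """
    if fetch is None:
        fetch = lambda url: urlopen(url).read()
    saved = []
    for page in range(first_page, last_page + 1):
        name = page_filename(page)
        print('downloading page ' + str(page))
        data = fetch(page_url(base_url, page))
        with open(name, 'wb') as f:  # binary mode: write the bytes as received
            f.write(data)
        saved.append(name)
    return saved
```

<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);">Keeping the file name and URL construction in their own small functions makes them easy to check without touching the network, and the with block closes each file even if a write fails.</span>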
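<span style="font-family: Arial, Helvetica, sans-serif; background-color: rgb(255, 255, 255);"></span><pre name="code" class="python">import string
import urllib2


def a(url, bgp, ep):
    """Save pages bgp..ep (inclusive) of a Baidu Tieba URL as HTML files."""
    for i in range(bgp, ep + 1):  # + 1 so the end page itself is downloaded
        sName = string.zfill(i, 5) + '.html'  # zero-pad to five digits, e.g. 00001.html
        print('downloading page ' + str(i))
        m = urllib2.urlopen(url + str(i)).read()
        f = open(sName, 'wb')  # binary mode: write the bytes exactly as received
        f.write(m)
        f.close()  # the parentheses are required; a bare f.close does nothing


# The u'' prefix makes each prompt a unicode string (see the note on raw_input above).
burl = str(raw_input(u'Enter the Baidu Tieba URL without the page number\n'))
bgp1 = int(raw_input(u'Enter the first page number: '))
ep1 = int(raw_input(u'Enter the last page number: '))
a(burl, bgp1, ep1)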