python3 + BeautifulSoup 4.6: Scraping Novels from a Website (Part 4): Multithreaded Scraping

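In the previous post, the output used a two-level directory layout: a root directory named "小说" (novels), a second-level directory named after the work, and then the novel files inside it.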

This post reworks part of the code so that the layout becomes root directory -> author directory -> work directory -> chapter .txt files.

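That is not the main point of this post, though. The main point is that while scraping with this crawler, the program was often interrupted by network packet loss and similar problems.

My first idea was to check the site's status in a loop and re-issue the request on failure, but that did not seem to help much. Then I came across multithreading in 虫师's book on selenium and decided to try it out. It turned out to be fast. Cool!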
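For reference, the retry idea mentioned above could be sketched roughly as follows. This is not the crawler's actual code: `fetch_with_retry` and the `fetch` callable are hypothetical names, and the retry count and delay are arbitrary assumptions.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=1.0):
    """Call fetch(url); on an exception, retry up to `retries` attempts in total.

    `fetch` is any callable that takes a URL and returns the page
    (a hypothetical stand-in for the real download function).
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real crawler should catch narrower errors
            last_error = exc
            if attempt < retries:
                time.sleep(delay * attempt)  # wait a little longer each time
    # Every attempt failed: re-raise the last error so the caller sees it
    raise last_error
```

A helper like this only retries a single request. It does nothing for overall speed, which is where threads come in.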

The code below is taken mostly from 虫师's selenium2 book.

Importing the threading module

import threading

Usage: threading.Thread(target=music, args=('first argument of music', second argument of music))
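One detail worth spelling out: `args` must be a tuple, so a function with a single parameter needs a trailing comma. The small sketch below illustrates this. The `greet` function and the `results` list are made up for the illustration, and `music` here is a simplified stand-in, not the version used in the listing that follows.

```python
import threading

results = []


def music(name, loop):
    # Record the name `loop` times instead of printing it
    for _ in range(loop):
        results.append(name)


def greet(name):
    results.append('hello ' + name)


# Two positional arguments are passed as a 2-tuple
t1 = threading.Thread(target=music, args=('song A', 2))

# A single argument still needs a tuple, hence the trailing comma
t2 = threading.Thread(target=greet, args=('reader',))

# Run them one after the other so the order in `results` is fixed
t1.start()
t1.join()
t2.start()
t2.join()
```

Writing `args=('reader')` without the comma passes a plain string, which Thread would unpack character by character and call `greet` with six arguments.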

from time import sleep,ctime
import threading

def music(func,loop):
    for i in range(loop):
        print('music',func,ctime())
        sleep(2)

def movie(func,loop):
    for i in range(loop):
        print('movie',func,ctime())
        sleep(4)

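def testOne():
    # Single-threaded: music runs to completion before movie starts
    music('A Simple Song', 2)
    movie('Lock, Stock and Two Smoking Barrels', 2)
    print('all end', ctime())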

def testTwo():
    # Multithreaded: build the threads first, then start them together
    threads = []
    t1 = threading.Thread(target=music, args=('The One I Like', 2))
    threads.append(t1)

    t2 = threading.Thread(target=movie, args=('Fight Club', 2))
    threads.append(t2)

    t3 = threading.Thread(target=music, args=('The One I Like 2', 2))
    threads.append(t3)

    for t in threads:
        t.start()

    # Wait for every thread to finish before printing the final line
    for t in threads:
        t.join()

    print('all end', ctime())

if __name__ == '__main__':
    testOne()
    #testTwo()
    #testThree()
    #threadsRun()

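t.join() blocks the calling thread until thread t has finished. Joining every thread before the final print guarantees that the "all end" line is printed last.

Creating a thread management class

Inherit from Thread when declaring the class: class MyThread(threading.Thread)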
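To see the effect of join in isolation, here is a minimal sketch. The names `worker` and `run_all` are made up for this illustration, and the sleep times are arbitrary.

```python
import threading
from time import sleep


def run_all():
    log = []

    def worker(label, seconds):
        sleep(seconds)
        log.append(label)

    threads = [
        threading.Thread(target=worker, args=('slow', 0.2)),
        threading.Thread(target=worker, args=('fast', 0.05)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block here until this thread has finished

    # Reached only after both workers are done
    log.append('all end')
    return log
```

If the join loop is removed, 'all end' would most likely be appended first, because the main thread does not wait for the sleeping workers.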

class MyThread(threading.Thread):

    def __init__(self, func, args, name):
        threading.Thread.__init__(self)
        self.func = func
        self.args = args
        self.name = name

    def run(self):
        self.func(*self.args)

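self: the class instance, the implicit first parameter

func: the function to run in the thread

args: the arguments for func, passed as a tuple

name: the thread name, here the function's ".__name__"

Complete code: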

from time import sleep, ctime
import threading


class MyThread(threading.Thread):

    def __init__(self, func, args, name):
        threading.Thread.__init__(self)
        self.func = func
        self.args = args
        self.name = name

    def run(self):
        # Runs in the new thread once start() is called
        self.func(*self.args)


def super_play(file_, time_):
    # "Play" the file three times, pausing time_ seconds after each play
    for i in range(3):
        print('play', file_, ctime())
        sleep(time_)


def testThree():
    threads = []
    # File name -> pause in seconds
    lists = {'balloon.mp3': 3, 'movie.rmvb': 4, 'last.avg': 2}
    for file_, time_ in lists.items():
        t = MyThread(super_play, (file_, time_), super_play.__name__)
        threads.append(t)

    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print('all end', ctime())
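
The same pattern carries over to scraping chapters: one thread per chapter, all started together and then joined. The sketch below only shows the shape of it. `fetch_chapter` is a placeholder that fakes the download instead of calling the real request and BeautifulSoup code, and `collect_chapter`, `scrape_chapters`, and the example URLs are hypothetical names chosen for the illustration.

```python
import threading
from time import sleep


def fetch_chapter(url):
    # Placeholder for the real download and parsing step (hypothetical)
    sleep(0.05)
    return 'text of ' + url


def collect_chapter(title, url, results, lock):
    text = fetch_chapter(url)
    with lock:  # guard the shared dict while writing to it
        results[title] = text


def scrape_chapters(chapters):
    """Fetch every chapter in its own thread; return {title: text}."""
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=collect_chapter,
                         args=(title, url, results, lock),
                         name=title)
        for title, url in chapters.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == '__main__':
    demo = {
        'chapter 1': 'http://example.invalid/1',
        'chapter 2': 'http://example.invalid/2',
    }
    print(scrape_chapters(demo))
```

With many chapters, starting one thread per chapter can put a lot of load on the site, so a real crawler would probably want to cap the number of threads running at once.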
