Learning Web Scraping (Part 1)

Getting started with web scraping:

1. Install Python (Anaconda)

2. Install the libraries (step 4 also uses BeautifulSoup and the lxml parser)

pip install requests beautifulsoup4 lxml

3. Fetch a web page

import requests  # import the requests library
r = requests.get('http://www.lining0806.com')  # send a GET request to the target URL; returns a Response object
print(r.text)  # r.text is the body of the HTTP response (the page's HTML) decoded as text
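The one-line request above works, but real pages can hang, return error codes, or arrive without a declared charset (which garbles Chinese text). The following is a minimal sketch of a more defensive fetch, not the author's code: the `fetch_html` helper name and the 10-second timeout are hypothetical choices, and the demo serves a fixed page from a local `http.server` so it runs without internet access.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


def fetch_html(url, headers=None, timeout=10):
    """Hypothetical helper: GET url and return the decoded HTML text."""
    r = requests.get(url, headers=headers, timeout=timeout)
    r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
    # Without a charset in Content-Type, requests assumes ISO-8859-1 for text/*,
    # which garbles Chinese pages; fall back to the encoding guessed from the body
    if 'charset' not in r.headers.get('Content-Type', '').lower():
        r.encoding = r.apparent_encoding
    return r.text


# --- Demo only: a tiny local server standing in for the real site ---
PAGE = '<html><head><title>示例</title></head><body>hello</body></html>'.encode('utf-8')


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = HTTPServer(('127.0.0.1', 0), DemoHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
html = fetch_html(f'http://127.0.0.1:{server.server_port}/')
server.shutdown()
server.server_close()
print(html)
```

To use it on the real site, drop the demo part and call `fetch_html('http://www.lining0806.com')` instead.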

4. Extract the article titles

# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

url = 'http://www.lining0806.com'
# Send a browser-like User-Agent so the site serves its normal page
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
r = requests.get(url, headers=headers)
# Parse with lxml, find the first <div class="content">, then collect every <a target="_blank"> inside it
all_title = BeautifulSoup(r.text, 'lxml').find('div', class_='content').find_all('a', attrs={'target': '_blank'})
Alltitle = []
for title in all_title:
    title_temp = title.get('title')  # value of the title attribute, or None if the tag has none
    if title_temp is None:
        continue  # skip links without a title attribute
    print(title_temp)
    Alltitle.append(title_temp)
print(Alltitle)

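So far, the only approach I have found is to loop over the <a> tags and read each one's title attribute. I will update this post if I find a better method.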
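One possibly tidier alternative (not the method above, and not tested against the live site) is a CSS selector with a list comprehension: the `[title]` attribute selector drops links without a title, so the None check disappears. This sketch runs on hypothetical sample markup and uses Python's built-in `html.parser` instead of lxml; the `extract_titles` name is also hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page
SAMPLE_HTML = """
<div class="content">
  <a href="/p/1" target="_blank" title="First post">First post</a>
  <a href="/p/2" target="_blank">No title attribute</a>
  <a href="/p/3" title="Opens in same tab">Same tab</a>
  <a href="/p/4" target="_blank" title="Second post">Second post</a>
</div>
<a href="/p/5" target="_blank" title="Outside content">Outside</a>
"""


def extract_titles(html):
    soup = BeautifulSoup(html, 'html.parser')
    content = soup.select_one('div.content')  # first match, like find('div', class_='content')
    if content is None:
        return []  # the container is missing (e.g. the page layout changed)
    # <a> tags with target="_blank" AND a title attribute
    return [a['title'] for a in content.select('a[target="_blank"][title]')]


print(extract_titles(SAMPLE_HTML))
```

Only links inside the first `div.content` that open in a new tab and carry a title are returned, which mirrors the loop's filtering.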