cedrelaliu

Purpose

I've been house hunting lately and have waded through all kinds of listings. The data on lianjia seems reasonably trustworthy (at the very least, the listing photos match reality), so on a whim I decided to build a few charts of the housing data and get some practice along the way.

Requirements

1. Fetch the key housing data from the lianjia site and display it in charts according to my own needs.

2. Fetch the data from the lianjia site once a day.

3. Focus on the Shanghai area (I live in Shanghai).

4. Final charts: total housing transactions, average second-hand price, homes on sale, deals closed in the last 90 days, and yesterday's viewing count.

Analyzing and fetching the site data

1 Data sources

The data comes mainly from two places:

  http://sh.lianjia.com/chengjiao/   // transaction-volume statistics

  The numbers shown on the page (these are the pre-login figures; after logging in the counts seem to be slightly higher):

   http://sh.lianjia.com/ershoufang/  // second-hand housing data

  The numbers on this page:

2 Fetching method

For scraping pages, scrapy comes to mind first, but the data here is neither large nor complex, so urllib.request is enough. Later on, to take advantage of tornado's async I/O, it gets replaced with httpclient.AsyncHTTPClient().fetch().

 

3 Fetching the data with urllib.request

First, a basic obtain_page_data function scrapes a page:

def obtain_page_data(target_url):
    with urllib.request.urlopen(target_url) as f:
        data = f.read().decode('utf8')
    return data
obtain_page_data() simply fetches the given page and returns its contents.

With the page data in hand, the extraction splits into two main parts:

1) Total housing transactions (http://sh.lianjia.com/chengjiao/)

Define get_total_dealed_house() to return the total transaction count shown on the page. After calling obtain_page_data() to get the page data, figure out where that number sits in the HTML.

The number lives inside a div, so after parsing the HTML with BeautifulSoup, the text can be pulled out with:

dealed_house = soup_obj.html.body.find('div', {'class': 'list-head'}).text

Once the text is found, strip out the non-digit characters with a regular expression to get the number:

def get_total_dealed_house(target_url):
    # fetch the total number of housing transactions
    page_data = obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    dealed_house = soup_obj.html.body.find('div', {'class': 'list-head'}).text
    dealed_house_num = re.findall(r'\d+', dealed_house)[0]

    return int(dealed_house_num)
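As a quick illustration of the regex step (the div text below is a made-up sample; the real list-head text differs):

```python
import re

# hypothetical text pulled out of the list-head div
dealed_house = "成交房源共计51691套"

# \d+ grabs every run of digits; the first run is the total
dealed_house_num = int(re.findall(r'\d+', dealed_house)[0])
print(dealed_house_num)  # 51691
```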

2)获取其他在线数据(http://sh.lianjia.com/ershoufang/)

类似的,要先分析自己要的数据在网页中的哪个位置,然后去获取,过滤,具体如下:

def get_online_data(target_url):
    # fetch: city average listing price, homes on sale, 90-day deals, yesterday's viewings
    page_data = obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    online_data_str = soup_obj.html.body.find('div', {'class': 'secondcon'}).text
    online_data = online_data_str.replace('\n', '')
    avg_price, on_sale, _, sold_in_90, yesterday_check_num = re.findall(r'\d+', online_data)

    return {'avg_price': avg_price, 'on_sale': on_sale, 'sold_in_90': sold_in_90, 'yesterday_check_num': yesterday_check_num}

3) Aggregating the data / per-district breakdown

shanghai_data_process() ties together the data from 1) and 2). The lianjia pages for Shanghai can also be queried per district, so handle that here as well:

def shanghai_data_process():
    '''
    Fetch the data for each district of Shanghai
    :return:
    '''
    chenjiao_page = "http://sh.lianjia.com/chengjiao/"
    ershoufang_page = "http://sh.lianjia.com/ershoufang/"
    sh_area_dict = {
        "all": "",
        "pudongxinqu": "pudongxinqu/",
        "minhang": "minhang/",
        "baoshan": "baoshan/",
        "xuhui": "xuhui/",
        "putuo": "putuo/",
        "yangpu": "yangpu/",
        "changning": "changning/",
        "songjiang": "songjiang/",
        "jiading": "jiading/",
        "huangpu": "huangpu/",
        "jingan": "jingan/",
        "zhabei": "zhabei/",
        "hongkou": "hongkou/",
        "qingpu": "qingpu/",
        "fengxian": "fengxian/",
        "jinshan": "jinshan/",
        "chongming": "chongming/",
        "shanghaizhoubian": "shanghaizhoubian/",
    }
    dealed_house_num = get_total_dealed_house(chenjiao_page)
    sh_online_data = {}
    for key, value in sh_area_dict.items():
        sh_online_data[key] = get_online_data(ershoufang_page + sh_area_dict[key])
    print("dealed_house_num %s" % dealed_house_num)
    for key, value in sh_online_data.items():
        print(key, value)

4) Full code and output

import urllib.request
import re
from bs4 import BeautifulSoup
import time

def obtain_page_data(target_url):
    with urllib.request.urlopen(target_url) as f:
        data = f.read().decode('utf8')
    return data

def get_total_dealed_house(target_url):
    # fetch the total number of housing transactions
    page_data = obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    dealed_house = soup_obj.html.body.find('div', {'class': 'list-head'}).text
    dealed_house_num = re.findall(r'\d+', dealed_house)[0]

    return int(dealed_house_num)

def get_online_data(target_url):
    # fetch: city average listing price, homes on sale, 90-day deals, yesterday's viewings
    page_data = obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    online_data_str = soup_obj.html.body.find('div', {'class': 'secondcon'}).text
    online_data = online_data_str.replace('\n', '')
    avg_price, on_sale, _, sold_in_90, yesterday_check_num = re.findall(r'\d+', online_data)

    return {'avg_price': avg_price, 'on_sale': on_sale, 'sold_in_90': sold_in_90, 'yesterday_check_num': yesterday_check_num}

def shanghai_data_process():
    '''
    Fetch the data for each district of Shanghai
    :return:
    '''
    chenjiao_page = "http://sh.lianjia.com/chengjiao/"
    ershoufang_page = "http://sh.lianjia.com/ershoufang/"
    sh_area_dict = {
        "all": "",
        "pudongxinqu": "pudongxinqu/",
        "minhang": "minhang/",
        "baoshan": "baoshan/",
        "xuhui": "xuhui/",
        "putuo": "putuo/",
        "yangpu": "yangpu/",
        "changning": "changning/",
        "songjiang": "songjiang/",
        "jiading": "jiading/",
        "huangpu": "huangpu/",
        "jingan": "jingan/",
        "zhabei": "zhabei/",
        "hongkou": "hongkou/",
        "qingpu": "qingpu/",
        "fengxian": "fengxian/",
        "jinshan": "jinshan/",
        "chongming": "chongming/",
        "shanghaizhoubian": "shanghaizhoubian/",
    }
    dealed_house_num = get_total_dealed_house(chenjiao_page)
    sh_online_data = {}
    for key, value in sh_area_dict.items():
        sh_online_data[key] = get_online_data(ershoufang_page + sh_area_dict[key])
    print("dealed_house_num %s" % dealed_house_num)
    for key, value in sh_online_data.items():
        print(key, value)

def main():
    start_time = time.time()
    shanghai_data_process()
    print("time cost: %s" % (time.time() - start_time))


if __name__ == '__main__':
    main()
First version: collect_data.py

Result:

dealed_house_num 51691
zhabei {'yesterday_check_num': '1050', 'sold_in_90': '533', 'avg_price': '67179', 'on_sale': '1674'}
changning {'yesterday_check_num': '1861', 'sold_in_90': '768', 'avg_price': '77977', 'on_sale': '2473'}
baoshan {'yesterday_check_num': '2232', 'sold_in_90': '1410', 'avg_price': '48622', 'on_sale': '4655'}
putuo {'yesterday_check_num': '1695', 'sold_in_90': '910', 'avg_price': '64942', 'on_sale': '3051'}
qingpu {'yesterday_check_num': '463', 'sold_in_90': '253', 'avg_price': '40801', 'on_sale': '1382'}
jinshan {'yesterday_check_num': '0', 'sold_in_90': '8', 'avg_price': '20370', 'on_sale': '11'}
chongming {'yesterday_check_num': '0', 'sold_in_90': '3', 'avg_price': '26755', 'on_sale': '9'}
all {'yesterday_check_num': '28682', 'sold_in_90': '14550', 'avg_price': '59987', 'on_sale': '49396'}
jingan {'yesterday_check_num': '643', 'sold_in_90': '277', 'avg_price': '91689', 'on_sale': '896'}
xuhui {'yesterday_check_num': '2526', 'sold_in_90': '878', 'avg_price': '80623', 'on_sale': '3254'}
songjiang {'yesterday_check_num': '1571', 'sold_in_90': '930', 'avg_price': '44367', 'on_sale': '3294'}
yangpu {'yesterday_check_num': '2774', 'sold_in_90': '981', 'avg_price': '67976', 'on_sale': '2886'}
pudongxinqu {'yesterday_check_num': '7293', 'sold_in_90': '3417', 'avg_price': '62101', 'on_sale': '12767'}
shanghaizhoubian {'yesterday_check_num': '0', 'sold_in_90': '2', 'avg_price': '24909', 'on_sale': '15'}
minhang {'yesterday_check_num': '3271', 'sold_in_90': '1989', 'avg_price': '54968', 'on_sale': '5862'}
hongkou {'yesterday_check_num': '936', 'sold_in_90': '444', 'avg_price': '71654', 'on_sale': '1605'}
fengxian {'yesterday_check_num': '346', 'sold_in_90': '557', 'avg_price': '30423', 'on_sale': '1279'}
jiading {'yesterday_check_num': '875', 'sold_in_90': '767', 'avg_price': '41609', 'on_sale': '2846'}
huangpu {'yesterday_check_num': '1146', 'sold_in_90': '423', 'avg_price': '93880', 'on_sale': '1437'}
time cost: 12.94211196899414

 

Porting to tornado

1 Why use tornado

tornado is a small, asynchronous Python framework. It is used here because sending requests for page data is I/O-bound, so asynchrony can improve throughput, especially later when traffic grows.

2 Porting the data collection above to tornado

The key points are:

1) Fetch the page data asynchronously

    Use httpclient.AsyncHTTPClient().fetch() to get the page data, combined with gen.coroutine + yield for the asynchrony.

2) Return values with raise gen.Return(data) (this form is required for Python 2 generators; on Python 3.3+ a plain return data inside the coroutine works as well)

3) The first tornado version and its output:

import re
from bs4 import BeautifulSoup
import time
from tornado import httpclient, gen, ioloop

@gen.coroutine
def obtain_page_data(target_url):
    response = yield httpclient.AsyncHTTPClient().fetch(target_url)
    data = response.body.decode('utf8')
    print("start %s %s" % (target_url, time.time()))

    raise gen.Return(data)

@gen.coroutine
def get_total_dealed_house(target_url):
    # fetch the total number of housing transactions
    page_data = yield obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    dealed_house = soup_obj.html.body.find('div', {'class': 'list-head'}).text
    dealed_house_num = re.findall(r'\d+', dealed_house)[0]

    raise gen.Return(int(dealed_house_num))

@gen.coroutine
def get_online_data(target_url):
    # fetch: city average listing price, homes on sale, 90-day deals, yesterday's viewings
    page_data = yield obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    online_data_str = soup_obj.html.body.find('div', {'class': 'secondcon'}).text
    online_data = online_data_str.replace('\n', '')
    avg_price, on_sale, _, sold_in_90, yesterday_check_num = re.findall(r'\d+', online_data)

    raise gen.Return({'avg_price': avg_price, 'on_sale': on_sale, 'sold_in_90': sold_in_90, 'yesterday_check_num': yesterday_check_num})

@gen.coroutine
def shanghai_data_process():
    '''
    Fetch the data for each district of Shanghai
    :return:
    '''
    start_time = time.time()
    chenjiao_page = "http://sh.lianjia.com/chengjiao/"
    ershoufang_page = "http://sh.lianjia.com/ershoufang/"
    dealed_house_num = yield get_total_dealed_house(chenjiao_page)
    sh_area_dict = {
        "all": "",
        "pudongxinqu": "pudongxinqu/",
        "minhang": "minhang/",
        "baoshan": "baoshan/",
        "xuhui": "xuhui/",
        "putuo": "putuo/",
        "yangpu": "yangpu/",
        "changning": "changning/",
        "songjiang": "songjiang/",
        "jiading": "jiading/",
        "huangpu": "huangpu/",
        "jingan": "jingan/",
        "zhabei": "zhabei/",
        "hongkou": "hongkou/",
        "qingpu": "qingpu/",
        "fengxian": "fengxian/",
        "jinshan": "jinshan/",
        "chongming": "chongming/",
        "shanghaizhoubian": "shanghaizhoubian/",
    }
    sh_online_data = {}
    for key, value in sh_area_dict.items():
        sh_online_data[key] = yield get_online_data(ershoufang_page + sh_area_dict[key])
    print("dealed_house_num %s" % dealed_house_num)
    for key, value in sh_online_data.items():
        print(key, value)

    print("tornado time cost: %s" % (time.time() - start_time))


if __name__ == '__main__':
    io_loop = ioloop.IOLoop.current()
    io_loop.run_sync(shanghai_data_process)
tornado, first version
start http://sh.lianjia.com/chengjiao/ 1480320585.879013
start http://sh.lianjia.com/ershoufang/jinshan/ 1480320586.575354
start http://sh.lianjia.com/ershoufang/chongming/ 1480320587.017322
start http://sh.lianjia.com/ershoufang/yangpu/ 1480320587.515317
start http://sh.lianjia.com/ershoufang/hongkou/ 1480320588.051793
start http://sh.lianjia.com/ershoufang/fengxian/ 1480320588.593865
start http://sh.lianjia.com/ershoufang/jiading/ 1480320589.134367
start http://sh.lianjia.com/ershoufang/qingpu/ 1480320589.6134
start http://sh.lianjia.com/ershoufang/pudongxinqu/ 1480320590.215136
start http://sh.lianjia.com/ershoufang/putuo/ 1480320590.696576
start http://sh.lianjia.com/ershoufang/zhabei/ 1480320591.34218
start http://sh.lianjia.com/ershoufang/changning/ 1480320591.935762
start http://sh.lianjia.com/ershoufang/xuhui/ 1480320592.5159
start http://sh.lianjia.com/ershoufang/minhang/ 1480320593.096085
start http://sh.lianjia.com/ershoufang/songjiang/ 1480320593.749226
start http://sh.lianjia.com/ershoufang/ 1480320594.306287
start http://sh.lianjia.com/ershoufang/shanghaizhoubian/ 1480320594.807418
start http://sh.lianjia.com/ershoufang/huangpu/ 1480320595.2744
start http://sh.lianjia.com/ershoufang/jingan/ 1480320595.850909
start http://sh.lianjia.com/ershoufang/baoshan/ 1480320596.368479
dealed_house_num 51691
jinshan {'yesterday_check_num': '0', 'on_sale': '11', 'avg_price': '20370', 'sold_in_90': '8'}
yangpu {'yesterday_check_num': '2774', 'on_sale': '2886', 'avg_price': '67976', 'sold_in_90': '981'}
hongkou {'yesterday_check_num': '936', 'on_sale': '1605', 'avg_price': '71654', 'sold_in_90': '444'}
fengxian {'yesterday_check_num': '346', 'on_sale': '1279', 'avg_price': '30423', 'sold_in_90': '557'}
chongming {'yesterday_check_num': '0', 'on_sale': '9', 'avg_price': '26755', 'sold_in_90': '3'}
pudongxinqu {'yesterday_check_num': '7293', 'on_sale': '12767', 'avg_price': '62101', 'sold_in_90': '3417'}
putuo {'yesterday_check_num': '1695', 'on_sale': '3051', 'avg_price': '64942', 'sold_in_90': '910'}
zhabei {'yesterday_check_num': '1050', 'on_sale': '1674', 'avg_price': '67179', 'sold_in_90': '533'}
changning {'yesterday_check_num': '1861', 'on_sale': '2473', 'avg_price': '77977', 'sold_in_90': '768'}
baoshan {'yesterday_check_num': '2232', 'on_sale': '4655', 'avg_price': '48622', 'sold_in_90': '1410'}
xuhui {'yesterday_check_num': '2526', 'on_sale': '3254', 'avg_price': '80623', 'sold_in_90': '878'}
minhang {'yesterday_check_num': '3271', 'on_sale': '5862', 'avg_price': '54968', 'sold_in_90': '1989'}
songjiang {'yesterday_check_num': '1571', 'on_sale': '3294', 'avg_price': '44367', 'sold_in_90': '930'}
all {'yesterday_check_num': '28682', 'on_sale': '49396', 'avg_price': '59987', 'sold_in_90': '14550'}
shanghaizhoubian {'yesterday_check_num': '0', 'on_sale': '15', 'avg_price': '24909', 'sold_in_90': '2'}
jingan {'yesterday_check_num': '643', 'on_sale': '896', 'avg_price': '91689', 'sold_in_90': '277'}
jiading {'yesterday_check_num': '875', 'on_sale': '2846', 'avg_price': '41609', 'sold_in_90': '767'}
qingpu {'yesterday_check_num': '463', 'on_sale': '1382', 'avg_price': '40801', 'sold_in_90': '253'}
huangpu {'yesterday_check_num': '1146', 'on_sale': '1437', 'avg_price': '93880', 'sold_in_90': '423'}
tornado time cost: 10.953541040420532
First-version run output
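Note that the loop in shanghai_data_process yields each get_online_data call one at a time, so the requests still run back to back (the "start ..." timestamps above climb roughly half a second apart). tornado can issue them concurrently by yielding a whole list of futures at once (gen.multi). The idea, sketched here with stdlib asyncio since the analogy carries over directly (the fetch below is a stand-in, not a real HTTP call):

```python
import asyncio
import time

async def fetch(area):
    # stand-in for one async HTTP request taking ~0.1 s
    await asyncio.sleep(0.1)
    return {"area": area}

async def main():
    areas = ["all", "pudongxinqu", "minhang", "baoshan", "xuhui"]
    # schedule every request at once instead of awaiting them one by one
    results = await asyncio.gather(*(fetch(a) for a in areas))
    return dict(zip(areas, results))

start = time.time()
data = asyncio.run(main())
elapsed = time.time() - start

# five 0.1 s "requests" complete in roughly 0.1 s total, not 0.5 s
print(len(data), round(elapsed, 2))
```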

 

Storing the data in a database

I use MySQL here: in tornado, pymysql connects to the database, and sqlalchemy handles the DML in the program.

The sqlalchemy part is covered in detail here.

1) Table structure

Not many tables are needed:

sh_area   // the Shanghai districts

sh_total_city_dealed  // total second-hand transactions for Shanghai

online_data  // per-district second-hand data for Shanghai

2) Initializing the tables with sqlalchemy

settings holds the database connection configuration.

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

DB = {
    'connector': 'mysql+pymysql://root:xxxxx@127.0.0.1:3306/devdb1',
    'max_session': 5
}

engine = create_engine(DB['connector'], max_overflow=DB['max_session'], echo=False)
SessionCls = sessionmaker(bind=engine)
session = SessionCls()
settings.py

The initialization script:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, ForeignKey, DateTime

import os, sys
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(BASE_DIR)

from conf import settings

Base = declarative_base()

class SH_Area(Base):
    __tablename__ = 'sh_area'  # table name
    id = Column(Integer, primary_key=True)
    name = Column(String(64))

class Online_Data(Base):
    __tablename__ = 'online_data'  # table name
    id = Column(Integer, primary_key=True)
    sold_in_90 = Column(Integer)
    avg_price = Column(Integer)
    yesterday_check_num = Column(Integer)
    on_sale = Column(Integer)
    date = Column(DateTime)
    belong_area = Column(Integer, ForeignKey('sh_area.id'))

class SH_Total_city_dealed(Base):
    __tablename__ = 'sh_total_city_dealed'  # table name
    id = Column(Integer, primary_key=True)
    dealed_house_num = Column(Integer)
    date = Column(DateTime)
    memo = Column(String(64), nullable=True)

def db_init():
    Base.metadata.create_all(settings.engine)  # create the tables
    for district in settings.sh_area_dict.keys():
        item_obj = SH_Area(name=district)
        settings.session.add(item_obj)
    settings.session.commit()


if __name__ == '__main__':
    db_init()
database_init

 

Drawing the charts

1 Front-end rendering

For the charts I use Highcharts. The output looks good, and all it needs is the data.

I use the basic line chart, which needs a few js files on the page: jquery.min.js, highcharts.js, exporting.js. Then add a div identified by an id; the sample uses id="container".

The official js code:

$(function () {
    $('#container').highcharts({
        title: {
            text: 'Monthly Average Temperature',
            x: -20 //center
        },
        subtitle: {
            text: 'Source: WorldClimate.com',
            x: -20
        },
        xAxis: {
            categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
        },
        yAxis: {
            title: {
                text: 'Temperature (°C)'
            },
            plotLines: [{
                value: 0,
                width: 1,
                color: '#808080'
            }]
        },
        tooltip: {
            valueSuffix: '°C'
        },
        legend: {
            layout: 'vertical',
            align: 'right',
            verticalAlign: 'middle',
            borderWidth: 0
        },
        series: [{
            name: 'Tokyo',
            data: [7.0, 6.9, 9.5, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3, 18.3, 13.9, 9.6]
        }, {
            name: 'New York',
            data: [-0.2, 0.8, 5.7, 11.3, 17.0, 22.0, 24.8, 24.1, 20.1, 14.1, 8.6, 2.5]
        }, {
            name: 'Berlin',
            data: [-0.9, 0.6, 3.5, 8.4, 13.5, 17.0, 18.6, 17.9, 14.3, 9.0, 3.9, 1.0]
        }, {
            name: 'London',
            data: [3.9, 4.2, 5.7, 8.5, 11.9, 15.2, 17.0, 16.6, 14.2, 10.3, 6.6, 4.8]
        }]
    });
});
Official js

My work builds on this base, tweaking the js to draw the charts I need.

See the changes in the code on github; the resulting chart looks like this.

 

2 Back end: fetch the data and hand it to the front end

The front-end charts basically want one- or two-dimensional arrays, e.g. an x-axis time array [time1, time2, time3] and a y-axis data array [data1, data2, data3].

A few things to note:

1) The tornado back end returns data by rendering it into the target page with render().

2) In the js, pick up the rendered data with {{ data_rendered }}.

3) The times handed to the front end are unix timestamps, which need formatting for display:

function formatDate(timestamp_v) {
    var now = new Date(parseFloat(timestamp_v) * 1000);
    var year = now.getFullYear();
    var month = now.getMonth() + 1;
    var date = now.getDate();
    var hour = now.getHours();
    var minute = now.getMinutes();
    var second = now.getSeconds();
    return year + "-" + month + "-" + date + " " + hour + ":" + minute + ":" + second;
};
formatDate
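On the Python side those timestamps come from time.mktime(item.date.timetuple()), which yields seconds since the epoch; that is why the js above multiplies by 1000 before building a Date. A quick round-trip sanity check (the sample datetime is made up):

```python
import time
from datetime import datetime

# a sample value of the `date` column
record_date = datetime(2016, 11, 28, 15, 30, 0)

# what the back end hands to the front end: epoch seconds
timestamp_v = time.mktime(record_date.timetuple())

# the inverse of mktime for local times is fromtimestamp,
# which recovers the original datetime
assert datetime.fromtimestamp(timestamp_v) == record_date
```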

4) Mind how the two-dimensional arrays are defined and handled in the js.

 

3 Passing parameters from the front end to the back end

Since the requirements include per-district charts for Shanghai, the view URL can be designed as r'/view/(\w+)/(\w+)': the first group is the city (sh, bj, ...), the second the specific district (area). The back end receives the two parameters, queries the database, and returns the data.
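The two capture groups map straight onto the handler's get(city, area) arguments; the same pattern can be checked with plain re (the path below is just an example):

```python
import re

route = r'/view/(\w+)/(\w+)'

# the groups become the city/area arguments of the handler
city, area = re.match(route, '/view/sh/pudongxinqu').groups()
print(city, area)  # sh pudongxinqu
```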

Putting it all together

Once the database has data, what remains is the exchange between front and back end: decide where the charts go and what data they need, and have the back end return it. The main code:

import re
from bs4 import BeautifulSoup
import datetime
import time
from tornado import httpclient, gen, ioloop, httpserver
from tornado import web
import tornado.options
import json

import os, sys
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(BASE_DIR)

from conf import settings
from database_init import Online_Data, SH_Total_city_dealed, SH_Area
from tornado.options import define, options

define("port", default=8888, type=int)


@gen.coroutine
def obtain_page_data(target_url):
    response = yield httpclient.AsyncHTTPClient().fetch(target_url)
    data = response.body.decode('utf8')
    print("start %s %s" % (target_url, time.time()))

    raise gen.Return(data)

@gen.coroutine
def get_total_dealed_house(target_url):
    # fetch the total number of housing transactions
    page_data = yield obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    dealed_house = soup_obj.html.body.find('div', {'class': 'list-head'}).text
    dealed_house_num = re.findall(r'\d+', dealed_house)[0]

    raise gen.Return(int(dealed_house_num))

@gen.coroutine
def get_online_data(target_url):
    # fetch: city average listing price, homes on sale, 90-day deals, yesterday's viewings
    page_data = yield obtain_page_data(target_url)
    soup_obj = BeautifulSoup(page_data, "html.parser")
    online_data_str = soup_obj.html.body.find('div', {'class': 'secondcon'}).text
    online_data = online_data_str.replace('\n', '')
    avg_price, on_sale, _, sold_in_90, yesterday_check_num = re.findall(r'\d+', online_data)

    raise gen.Return({'avg_price': avg_price, 'on_sale': on_sale, 'sold_in_90': sold_in_90, 'yesterday_check_num': yesterday_check_num})

@gen.coroutine
def shanghai_data_process():
    '''
    Fetch the data for each district of Shanghai
    :return:
    '''
    start_time = time.time()
    chenjiao_page = "http://sh.lianjia.com/chengjiao/"
    ershoufang_page = "http://sh.lianjia.com/ershoufang/"
    dealed_house_num = yield get_total_dealed_house(chenjiao_page)
    sh_online_data = {}
    for key, value in settings.sh_area_dict.items():
        sh_online_data[key] = yield get_online_data(ershoufang_page + settings.sh_area_dict[key])
    print("dealed_house_num %s" % dealed_house_num)
    for key, value in sh_online_data.items():
        print(key, value)

    print("tornado time cost: %s" % (time.time() - start_time))

    # persist the results through settings.session
    update_date = datetime.datetime.now()
    dealed_house_num_obj = SH_Total_city_dealed(dealed_house_num=dealed_house_num,
                                                date=update_date)
    settings.session.add(dealed_house_num_obj)

    for key, value in sh_online_data.items():
        area_obj = settings.session.query(SH_Area).filter_by(name=key).first()
        online_data_obj = Online_Data(sold_in_90=value['sold_in_90'],
                                      avg_price=value['avg_price'],
                                      yesterday_check_num=value['yesterday_check_num'],
                                      on_sale=value['on_sale'],
                                      date=update_date,
                                      belong_area=area_obj.id)
        settings.session.add(online_data_obj)
    settings.session.commit()

class IndexHandler(web.RequestHandler):
    def get(self, *args, **kwargs):
        total_dealed_house_num = settings.session.query(SH_Total_city_dealed).all()
        cata_list = []
        data_list = []
        for item in total_dealed_house_num:
            cata_list.append(time.mktime(item.date.timetuple()))
            data_list.append(item.dealed_house_num)

        area_id = settings.session.query(SH_Area).filter_by(name='all').first()
        area_avg_price = settings.session.query(Online_Data).filter_by(belong_area=area_id.id).all()
        area_date_list = []
        area_data_list = []
        area_on_sale_list = []
        area_sold_in_90_list = []
        area_yesterday_check_num = []
        for item in area_avg_price:
            area_date_list.append(time.mktime(item.date.timetuple()))
            area_data_list.append(item.avg_price)
            area_on_sale_list.append([time.mktime(item.date.timetuple()), item.on_sale])
            area_sold_in_90_list.append(item.sold_in_90)
            area_yesterday_check_num.append(item.yesterday_check_num)
        self.render("index.html", cata_list=cata_list,
                    data_list=data_list, area_date_list=area_date_list, area_data_list=area_data_list,
                    area_on_sale_list=area_on_sale_list, area_sold_in_90_list=area_sold_in_90_list,
                    area_yesterday_check_num=area_yesterday_check_num, city="sh", area="all")

class QueryHandler(web.RequestHandler):
    def get(self, city, area):

        if city == "sh":
            total_dealed_house_num = settings.session.query(SH_Total_city_dealed).all()

            cata_list = []
            data_list = []
            for item in total_dealed_house_num:
                cata_list.append(time.mktime(item.date.timetuple()))
                data_list.append(item.dealed_house_num)

            area_id = settings.session.query(SH_Area).filter_by(name=area).first()
            area_avg_price = settings.session.query(Online_Data).filter_by(belong_area=area_id.id).all()
            area_date_list = []
            area_data_list = []
            area_on_sale_list = []
            area_sold_in_90_list = []
            area_yesterday_check_num = []
            for item in area_avg_price:
                area_date_list.append(time.mktime(item.date.timetuple()))
                area_data_list.append(item.avg_price)
                area_on_sale_list.append([time.mktime(item.date.timetuple()), item.on_sale])
                area_sold_in_90_list.append(item.sold_in_90)
                area_yesterday_check_num.append(item.yesterday_check_num)

            self.render("index.html", cata_list=cata_list,
                        data_list=data_list, area_date_list=area_date_list, area_data_list=area_data_list,
                        area_on_sale_list=area_on_sale_list, area_sold_in_90_list=area_sold_in_90_list,
                        area_yesterday_check_num=area_yesterday_check_num, city=city, area=area)
        else:
            self.redirect("/")


class MyApplication(web.Application):
    def __init__(self):
        handlers = [
            (r'/', IndexHandler),
            (r'/view/(\w+)/(\w+)', QueryHandler),
        ]

        settings = {
            'static_path': os.path.join(os.path.dirname(os.path.dirname(__file__)), "static"),
            'template_path': os.path.join(os.path.dirname(os.path.dirname(__file__)), "templates"),
        }

        super(MyApplication, self).__init__(handlers, **settings)


if __name__ == '__main__':
    http_server = httpserver.HTTPServer(MyApplication())
    http_server.listen(options.port)
    ioloop.PeriodicCallback(shanghai_data_process, 86400000).start()  # milliseconds: 86400000 ms = 24 h
    ioloop.IOLoop.instance().start()
data_collect

A few notes:

1 Because the data must be fetched from the site periodically, ioloop.PeriodicCallback() is used as the timer.
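PeriodicCallback(shanghai_data_process, 86400000) re-runs the collector every 86,400,000 ms, i.e. once every 24 hours. The fire-every-interval idea can be sketched with stdlib asyncio (interval shortened so the demo finishes instantly; this is an illustration of the pattern, not the tornado API):

```python
import asyncio

async def collect():
    # stand-in for shanghai_data_process()
    collect.runs += 1
collect.runs = 0

async def periodic(interval_s, coro_func, times):
    # fire coro_func every interval_s seconds, `times` times in total
    for _ in range(times):
        await asyncio.sleep(interval_s)
        await coro_func()

asyncio.run(periodic(0.01, collect, 3))
print(collect.runs)  # 3
```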

 

Deploying behind nginx

I have an AWS EC2 VM running CentOS 7, and the program is meant to end up running there.

1 Installing nginx

  For lack of time I didn't dig deep; the basic steps found online are:

1 Download the nginx tarball with wget (nginx-1.11.6.tar.gz) and unpack it
2 cd into nginx-1.11.6
3 ./configure
4 make
5 make install

Edit the config file /usr/local/nginx/conf/nginx.conf

Reload nginx with /usr/local/nginx/sbin/nginx -s reload
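To have nginx forward requests to tornado, the server block in nginx.conf needs a proxy section roughly like the following (an untested sketch, not the exact config used here; port 8888 matches the define("port") default above):

```
server {
    listen 80;

    location / {
        # hand the request to the tornado process
        proxy_pass http://127.0.0.1:8888;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```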

2 Adjust the VM's inbound firewall rules; I opened port 80 (nginx listens on 80 in its config)

1. Log in to the AWS console
2. Left side: INSTANCES - Instances
3. Right side: security group
4. Inbound tab below
5. Edit
6. Add the rule on the edit inbound rules page

3 Test access to nginx

If everything is fine, the Welcome to nginx page shows up.

4 Run the tornado code, then reload nginx

 

Screenshots and code

1 A few screenshots:

 

 

 

 

 

2 The code is on github

 

Fixing the sqlalchemy session problem

A few days after the code went live, I noticed that roughly every half day the program would not crash, but browser requests would start returning 500 errors, with matching errors in the back-end log.

Studying the traceback, the requests were going through a session whose state had already expired inside the program. A careful code review confirmed it: a single session is created once in the settings file, and every DB operation afterwards reuses that same session. That is clearly a problem.

 

The fix is actually simple: tie the DB session's lifetime to the lifetime of each HTTP request. That is, open a db session when each request starts and close it when the request finishes. The flask documentation has a good introduction to this pattern.

 

1 The sqlalchemy side

To implement the above, sqlalchemy's scoped_session object is needed. The official example:

>>> from sqlalchemy.orm import scoped_session
>>> from sqlalchemy.orm import sessionmaker

# create the session
>>> session_factory = sessionmaker(bind=some_engine)
>>> Session = scoped_session(session_factory)

# close the session
>>> Session.remove()

More details are documented here.

2 The tornado side

Override the two RequestHandler hooks initialize() and on_finish(): initialize() opens the db session, and on_finish() closes it when the request ends. BaseHandler is the base handler; the other request handlers just inherit from BaseHandler.

class BaseHandler(web.RequestHandler):
    def initialize(self):
        self.db_session = scoped_session(sessionmaker(bind=settings.engine))
        self.db_query = self.db_session().query

    def on_finish(self):
        self.db_session.remove()

 
