【Posted】:2021-08-14 06:39:41
【Question】:
import requests
import time
from bs4 import BeautifulSoup
from DB.model import CrawlingBook
from datetime import datetime

url = "https://www.aladin.co.kr/shop/common/wbest.aspx?BestType=Bestseller&BranchType=1&CID=0&cnt=1000&SortOrder=1&page="

for i in range(1, 20):
    pageUrl = url + str(i)
    response = requests.get(pageUrl)
    html = response.text
    parsedHtml = BeautifulSoup(html, 'html.parser')
    tableList = parsedHtml.select('#Myform .ss_book_box')
    for book in tableList:
        imgUrl = book.select('table')[0].select('img')[0].get('src')
        title = book.select('.ss_book_list')[0].select('ul .bo3')[0].text
        # The author sits in the second <li>; an extra .ss_ht1 row pushes it down by one.
        authorIndex = 1
        if book.select('.ss_book_list')[0].select('ul .ss_ht1'):
            authorIndex = 2
        author = book.select('.ss_book_list')[0].select('ul li')[authorIndex].select('a')[0].text
        now = datetime.now()
        # The object is populated here but never written to the database.
        crawlingBook = CrawlingBook()
        crawlingBook.title = title
        crawlingBook.author_name = author
        crawlingBook.img_url = imgUrl
        crawlingBook.create_at = str(now)
    print(i, '페이지 크롤링 완료...')  # "page crawl complete"
    time.sleep(1)
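I want to crawl book information (title, author_name, img_url) and record when each row was created. But I'm stuck on getting the data into my database (MySQL). Any help would be greatly appreciated.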
【Discussion】:
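The question does not show what `DB.model.CrawlingBook` is, so it is hard to say how that class is meant to be saved. One possible way around it is to skip the model and write the rows through a plain DB-API connection. The following is only a minimal sketch under several assumptions: the table is called `crawling_book`, its columns match the attribute names in the question (`title`, `author_name`, `img_url`, `create_at`), and the driver uses `%s` placeholders as PyMySQL does. The helper names `book_to_row` and `save_books` are made up for illustration.

```python
from datetime import datetime

# Assumed table and column names, taken from the attributes in the question.
INSERT_SQL = (
    "INSERT INTO crawling_book (title, author_name, img_url, create_at) "
    "VALUES (%s, %s, %s, %s)"
)


def book_to_row(title, author_name, img_url, created=None):
    """Build one parameter tuple for INSERT_SQL (hypothetical helper)."""
    created = created or datetime.now()
    return (
        title.strip(),
        author_name.strip(),
        img_url,
        created.strftime("%Y-%m-%d %H:%M:%S"),
    )


def save_books(conn, rows):
    """Insert all rows in one transaction through a DB-API 2.0 connection."""
    cursor = conn.cursor()
    try:
        # Parameterized query: the driver escapes the values itself.
        cursor.executemany(INSERT_SQL, rows)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()
    return len(rows)


# Possible usage inside the crawler, assuming PyMySQL is installed and the
# connection details below are replaced with real ones:
#
#   import pymysql
#   conn = pymysql.connect(host="localhost", user="user", password="secret",
#                          database="books", charset="utf8mb4")
#   rows = []                                  # once per page
#   rows.append(book_to_row(title, author, imgUrl))   # once per book
#   save_books(conn, rows)                     # after the inner loop
#   conn.close()                               # after the last page
```

Collecting one page of rows and committing once per page keeps the number of round trips low, and a failure on one page rolls back only that page. If `CrawlingBook` turns out to be an ORM model (for example SQLAlchemy or peewee), the usual route would instead be that ORM's own save or session-commit call.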
Tags: python database web-crawler