[Posted]: 2020-03-22 01:13:37
[Question]:
I want to build a list of Korean music festivals, so I tried scraping a site that sells festival tickets:
import requests
from bs4 import BeautifulSoup

INTERPARK_BASE_URL = 'http://ticket.interpark.com'

# Festival List Page
req = requests.get('http://ticket.interpark.com/TPGoodsList.asp?Ca=Liv&SubCa=Fes')
html = req.text
soup = BeautifulSoup(html, 'lxml')

for title_raw in soup.find_all('span', class_='fw_bold'):
    title = str(title_raw.find('a').text)
    url_raw = str(title_raw.find('a').get('href'))
    url = INTERPARK_BASE_URL + url_raw

    # Detail Page
    req_detail = requests.get(url)
    html_detail = req_detail.text
    soup_detail = BeautifulSoup(html_detail, 'lxml')
    details_1 = soup_detail.find('table', class_='table_goods_info')
    details_2 = soup_detail.find('ul', class_='info_Lst')
    image = soup_detail.find('div', class_='poster')
    singers = str(details_1.find_all('td')[4].text)
    place = str(details_1.find_all('td')[5].text)
    date_text = str(details_2.find('span').text)
    image_url = str(image.find('img').get('src'))

    print(title)
    print(url)
    print(singers)
    print(place)
    print(date_text)
    print(image_url)
I use a for loop to visit every detail page linked from the list page, but loading each detail page is too slow.
How can I speed up my code?
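
One possible direction, offered as a rough sketch rather than a confirmed fix: the loop spends most of its time waiting on one HTTP response after another, so downloading the detail pages concurrently with a thread pool from the standard library may help. The names `fetch_all`, `fetch_html`, and `main`, the 10-second timeout, and the default of 8 workers are assumptions made for illustration; they do not come from the original code, and the right worker count depends on what the site tolerates.

```python
from concurrent.futures import ThreadPoolExecutor

INTERPARK_BASE_URL = 'http://ticket.interpark.com'
LIST_URL = INTERPARK_BASE_URL + '/TPGoodsList.asp?Ca=Liv&SubCa=Fes'


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL in a thread pool.

    Results come back in the same order as `urls`, because
    Executor.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def main():
    # Third-party imports live here so fetch_all itself only needs
    # the standard library.
    import requests
    from bs4 import BeautifulSoup

    def fetch_html(url):
        # Hypothetical helper: download one page and return its HTML text.
        return requests.get(url, timeout=10).text

    # The list page is fetched once, exactly as in the question.
    soup = BeautifulSoup(fetch_html(LIST_URL), 'lxml')
    links = [span.find('a') for span in soup.find_all('span', class_='fw_bold')]
    titles = [str(a.text) for a in links]
    urls = [INTERPARK_BASE_URL + str(a.get('href')) for a in links]

    # All detail pages are downloaded concurrently instead of one by one.
    pages = fetch_all(urls, fetch_html)

    for title, url, html_detail in zip(titles, urls, pages):
        soup_detail = BeautifulSoup(html_detail, 'lxml')
        # The field extraction from the question (singers, place,
        # date_text, image_url) would go here unchanged.
        print(title)
        print(url)


if __name__ == '__main__':
    main()
```

Parsing still happens sequentially here, on the assumption that network latency rather than parsing dominates the run time. Keeping `max_workers` small is probably wise, so the site is not hit with many simultaneous requests.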
[Discussion]:
Tags: python beautifulsoup