【Posted】:2019-08-03 11:04:19
【Question】:
import requests
from bs4 import BeautifulSoup

url = 'https://www.tripadvisor.ie/Attraction_Review-g295424-d2038312-Reviews-Global_Village-Dubai_Emirate_of_Dubai.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

def get_links():
    # Collect the href of every review-title anchor on the page.
    review_links = []
    for review_link in soup.find_all('a', {'class': 'title'}, href=True):
        review_link = review_link['href']
        review_links.append(review_link)
    return review_links

link = 'https://www.tripadvisor.ie'
review_urls = []
for i in get_links():
    # The hrefs are relative, so prepend the site root.
    review_url = link + i
    print(review_url)
    review_urls.append(review_url)
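This code saves all the hyperlinks found on this one page, but I want to scrape the hyperlinks from every page, up to 319. I can't work out how to do this when pagination is disabled.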
【Discussion】:
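One possible approach is to build the URL of each results page yourself and run the same link extraction on every page. The sketch below is only a starting point and rests on assumptions that are not confirmed in the question: that 319 is the number of the last page, that each page lists 10 reviews, and that later pages are reached by inserting an `or<offset>-` segment after `Reviews-` in the URL (check this against the address bar when clicking "next" in a browser). The names `page_urls`, `scrape_review_urls`, `per_page` and `delay` are made up for this example.

```python
BASE_URL = ('https://www.tripadvisor.ie/Attraction_Review-g295424-d2038312-'
            'Reviews-Global_Village-Dubai_Emirate_of_Dubai.html')
SITE = 'https://www.tripadvisor.ie'


def page_urls(base_url, last_page, per_page=10):
    """Yield one URL per results page, from page 1 to last_page.

    ASSUMPTION: page n is reached by inserting 'or<offset>-' after
    'Reviews-', where offset = (n - 1) * per_page.
    """
    for page in range(1, last_page + 1):
        offset = (page - 1) * per_page
        if offset == 0:
            yield base_url  # the first page has no offset segment
        else:
            yield base_url.replace('-Reviews-', '-Reviews-or%d-' % offset, 1)


def scrape_review_urls(last_page=319, delay=1.0):
    """Fetch every page and return the absolute review URLs found."""
    # Third-party imports live here so page_urls() can be tried offline.
    import time
    import requests
    from bs4 import BeautifulSoup

    review_urls = []
    for url in page_urls(BASE_URL, last_page):
        response = requests.get(url, timeout=30)
        if response.status_code != 200:
            break  # stop at the first page that cannot be fetched
        soup = BeautifulSoup(response.text, 'html.parser')
        # Same selector as in the question: anchors with class "title".
        for a in soup.find_all('a', {'class': 'title'}, href=True):
            review_urls.append(SITE + a['href'])
        time.sleep(delay)  # pause between requests to go easy on the site
    return review_urls

# Usage (performs real network requests):
#     review_urls = scrape_review_urls()
```

It is worth printing the first few values of `list(page_urls(BASE_URL, 3))` and opening them in a browser before running the full loop. If the review links are loaded by JavaScript rather than present in the HTML, `requests` alone will not see them and a browser-driven tool would be needed instead.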
Tags: python web-scraping beautifulsoup pagination