【Posted】: 2018-10-11 02:10:08
【Question】:
I am scraping this website: http://housing.ucdavis.edu/dining/menus/dining-commons/tercero/. Here is my code:
import requests  # For the request to the website
from bs4 import BeautifulSoup  # For parsing
from warnings import warn  # For non-200 status codes

url = 'http://housing.ucdavis.edu/dining/menus/dining-commons/tercero/'
page = requests.get(url)
if page.status_code != 200:
    warn('URL: {}; Status code: {}. Status of the request is not normal.'.format(url, page.status_code))
else:
    soup = BeautifulSoup(page.content, 'html.parser')
    main_content = soup.find('div', attrs={'id': 'tab4content'})
    meal_tag = main_content.find_all('h4')
    meal_list = []
    for meal in meal_tag:
        meal_name = meal.text
        meal_list.append(meal_name)
    print('The meals we have today are: ' + ", ".join(meal_list))
    print(meal_list)
    for meal_pick in meal_list:
        print(meal_pick)
        locations_per_meal = main_content.find('h4', text=str(meal_pick)).find_next_siblings('h5')
        for location in locations_per_meal:
            print(location.text)
            dish_list = main_content.find('h5', text=location.text).find_next_sibling('ul')
            real_dish_list = []
            for dish in dish_list:
                real_dish_list = dish_list.findChildren('span')
            real_item_list = []
            for item in real_dish_list:
                item = item.text
                real_item_list.append(item)
            print(real_item_list)
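Basically, I want to reproduce all the dish names, where they are served, and which meal they belong to. However, my code only works for breakfast: the dishes listed for the other meals are exactly the same as for breakfast, except at locations that do not appear under breakfast. I guess that somehow the new dishes are not overwriting the old ones. Could anyone comment and help me solve this? Thanks~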
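A likely cause, judging only from the code above: `main_content.find('h5', text=location.text)` starts searching from the top of the container every time, so it returns the first `h5` with that text, which is the one under breakfast whenever the location also serves breakfast. In addition, `find_next_siblings('h5')` returns every later `h5`, not only those before the next `h4`. One possible fix is to walk the headings and lists once in document order and remember the current meal and location. The sketch below runs against a small hypothetical HTML sample (`SAMPLE_HTML`, made up for illustration and assumed to mirror the `h4` / `h5` / `ul` / `span` layout implied by the question); the helper name `collect_menu` is also hypothetical, and the code assumes the dish lists are not nested inside each other.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: the layout (h4 = meal,
# h5 = location, ul with span = dishes) is an assumption taken from the
# selectors used in the question, not from the live site.
SAMPLE_HTML = """
<div id="tab4content">
  <h4>Breakfast</h4>
  <h5>Grill</h5>
  <ul><li><span>Pancakes</span></li><li><span>Eggs</span></li></ul>
  <h4>Lunch</h4>
  <h5>Grill</h5>
  <ul><li><span>Burger</span></li></ul>
  <h5>Salad Bar</h5>
  <ul><li><span>Caesar Salad</span></li></ul>
</div>
"""


def collect_menu(main_content):
    """Group dishes as {meal: {location: [dish, ...]}} in one pass."""
    menu = {}
    meal = None
    location = None
    # find_all with a list of names yields matching tags in document order,
    # so each ul is attributed to the most recent h4 and h5 seen before it.
    for tag in main_content.find_all(['h4', 'h5', 'ul']):
        if tag.name == 'h4':
            meal = tag.get_text(strip=True)
            menu[meal] = {}
            location = None  # a new meal starts with no location yet
        elif tag.name == 'h5' and meal is not None:
            location = tag.get_text(strip=True)
            menu[meal][location] = []
        elif tag.name == 'ul' and meal is not None and location is not None:
            dishes = [span.get_text(strip=True) for span in tag.find_all('span')]
            menu[meal][location].extend(dishes)
    return menu


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
menu = collect_menu(soup.find('div', id='tab4content'))

for meal_name, locations in menu.items():
    print(meal_name)
    for location_name, dishes in locations.items():
        print('  {}: {}'.format(location_name, ', '.join(dishes)))
```

With the real page, `soup` would be built from `page.content` as in the question and passed through the same helper. Because nothing is looked up by text, two locations with the same name under different meals no longer collide.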
【Comments】:
Tags: python loops web-scraping review