【Posted】: 2018-12-10 21:26:22
【Question】:
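I am currently building a program that parses Wikipedia in order to display a country's mountains on a map.
I have been able to find the URL of interest, but I cannot follow it to the new URL (where all the required data lives).
Any and all suggestions are greatly appreciated, including using other libraries!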
import requests
from bs4 import BeautifulSoup
from csv import writer
import urllib3

#Requests country name from user
user_input=input('Enter Country:')
fist_letter=user_input[0:1].upper()
country=fist_letter+user_input[1:] #takes the country name and capitalizes the first letter

#Request response for wikipedia parse
response=requests.get('https://en.wikipedia.org/wiki/Category:Lists_of_mountains_by_country')
bs=BeautifulSoup(response.text,'html.parser')

#country query
for content in bs.find_all(class_='mw-category')[1]:
    category_letter=content.find('h3')
    #Locates target category to find the country of interest
    if fist_letter in category_letter:
        country_lists=category_letter.find_next_sibling('ul')
        #Locates the country of interest from the lists of countries in target
        #category
        target=country_lists.find('li',text="List of mountains in "+str(country))
        #Grabs the link which will redirect to the page containing the list of
        #mountains for the country of interest.
        target_link=target.find('a')
        link=target_link.get('href')
        new_link='https://enwikipedia.org'+link
        #Attempts to redirect to the target link
        new_response=requests.get(new_link)
        BS=BeautifulSoup(new_response.text,'html.parser')
        mountain_list=content.find('tbody')
        print(mountain_list)
    else:
        pass
【Comments】:
-
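Shouldn't https://enwikipedia.org be https://en.wikipedia.org? In any case, it would be easier to just append the country name: https://en.wikipedia.org/wiki/Category:Lists_of_mountains_of_COUNTRYNAME
-
Wow, yes, that could be it. I'll give it a try and see how it goes! Thanks!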
-
You're welcome, @jamil. Would you accept my comment as an answer?
-
Yes, of course! P.S. I don't have enough points to vote yet...
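-
For anyone landing here later, the suggestion in the first comment can be sketched roughly as below, using only the standard library to build the URLs. The helper names `absolute_wiki_url` and `category_url` are hypothetical, and the category title pattern is copied from that comment as-is; whether such a page exists for a given country is an assumption, not something checked here.

```python
from urllib.parse import quote, urljoin

# Note the dot after "en"; the question's code used "enwikipedia.org".
WIKI_BASE = "https://en.wikipedia.org"


def absolute_wiki_url(href):
    """Resolve a relative href such as '/wiki/...' against the Wikipedia host."""
    return urljoin(WIKI_BASE, href)


def category_url(country):
    """Build the category URL directly from a country name.

    The title pattern comes from the comment in this thread and is
    an assumption; the page may not exist for every country.
    """
    # Wikipedia titles use underscores in place of spaces.
    slug = quote(country.strip().replace(" ", "_"))
    return urljoin(WIKI_BASE, "/wiki/Category:Lists_of_mountains_of_" + slug)


if __name__ == "__main__":
    print(absolute_wiki_url("/wiki/List_of_mountains_in_Nepal"))
    print(category_url("New Zealand"))
```

Either URL can then be passed to `requests.get` as in the question. When parsing the second page, the lookup should probably run on the new soup object (`BS.find(...)`) rather than on `content`, which still belongs to the first page.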
Tags: python url beautifulsoup python-requests html-parsing