【发布时间】:2017-07-21 23:40:49
【问题描述】:
我尝试了其他类型的 css 选择器和 xpath,所以我假设我可能错误地使用了该库,但没有文档没有告诉我其他情况。我还尝试了其他 bs4 函数,例如 find_all,但很多都没有返回任何其他结果。任何类型的帮助都将不胜感激,干杯!
代码:
import bs4 as bs
from requests import get
query = input('Please Enter Your Topic of intrest: ')
first_part = query.replace(" ", "%20")
second_part = query.replace(" ", "+")
results= "0"
num_of_pages = int(input('How many pages do you want scraped? '))
for i in range(num_of_pages):
results= int(results)
results += 10
gsearch_url = "https://www.google.com/search?q={}#q={}%3F&start={}&*".format(first_part, second_part, results)
sauce = get(gsearch_url)
soup = bs.BeautifulSoup(sauce.text, 'lxml')
for url in soup.select('.r a'):
print(url.get('href'))
返回:
/url?q=http://www.codingdojo.com/blog/9-most-in-demand-programming-languages-of-2016/&sa=U&ved=0ahUKEwja3a21w7fSAhWSZiYKHdLGA9gQFggdMAI&usg=AFQjCNFmDl_1epVQRmDfc4y5MWFeNvrPQg
/url?q=https://fossbytes.com/best-popular-programming-languages-2017/&sa=U&ved=0ahUKEwja3a21w7fSAhWSZiYKHdLGA9gQFgghMAM&usg=AFQjCNEKhYqx1FbKl_Wu-9EoMYd3e9i_Dw
/url?q=http://www.bestprogramminglanguagefor.me/&sa=U&ved=0ahUKEwja3a21w7fSAhWSZiYKHdLGA9gQFggnMAQ&usg=AFQjCNHmbzuLwFo_egaWnbXSOW4p-Fva3g
/url?q=http://www.codingdojo.com/blog/9-most-in-demand-programming-languages-of-2016/&sa=U&ved=0ahUKEwja3a21w7fSAhWSZiYKHdLGA9gQFggyMAU&usg=AFQjCNFmDl_1epVQRmDfc4y5MWFeNvrPQg
etc....
【问题讨论】:
-
我不明白你的问题,请说明你想要的回报(结果)并正确呈现你的代码。
标签: web-scraping beautifulsoup python-3.5 google-search