【Posted】: 2020-10-21 04:28:20
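【Question】:
I wrote a Python script with BS4 to try to get results from a Google search.
Problem: I can only get data from page 1 of Google.
What I tried: I extracted the href of each page link (1, 2, 3 ... 10) from the page list at the bottom of the Google results page, intending to repeat for each page the same process I used for page 1.
Problem with that attempt: when I extract the URLs for pages 1..10, the links are different from the ones shown in Inspect Element on google.com.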
import requests
from bs4 import BeautifulSoup
# import functions  # module not shown in the question and not used in this snippet
#-----------------------------------------------------------------------
url = 'https://google.com/search?q=manga'  # main link to get data
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}  # headers
source = requests.get(url, headers=headers).text  # page source as text
# making tasty soup
soup = BeautifulSoup(source, 'lxml')
#-----------------------------------------------------------------------
pages = []
search_div = soup.find_all(class_='rc')  # find all divs that contain a search result

def get_result(search):
    titles = []  # separate name so the loop variable does not overwrite the list
    for item in search:  # loop over the result divs
        print('Title: %s' % item.h3.string)  # getting h3
        titles.append(item.h3.string)
        print('Url: %s' % item.a.get('href'))  # getting a.href
        print('Description: %s' % item.find(class_='st').text)  # description
        print('\n###############\n')
    return titles

result = get_result(search_div)
a = soup.find('table')
b = soup.find("tr", {'valign': 'top'})  # row holding the page-number links
for i in b:
    print(str(i))
Edit: the code above produces:
Title: Manga - Wikipedia
Url: https://en.wikipedia.org/wiki/Manga
Description: Manga are comics or graphic novels originating from Japan. Most manga conform to a style developed in Japan in the late 19th century, though the art form has ...
###############
Title: Read the Best Manga - VIZ
Url: https://www.viz.com/read
Description: Action, adventure, fantasy, mystery, romance and more—thousands of manga volumes for every fan!
###############
Title: Manga Toon - Free manga, comic and novel reader online
Url: https://mangatoon.mobi/
Description: MangaToon is a Global APP for Reading Comic Manga and Novel. Different comics in Action, Romance, Boys' love, Comedy, Horror and more are updated ...
###############
Title: Read Popular Manga Online - Crunchyroll
Url: https://www.crunchyroll.com/comics/manga
Description: Read your favorite Japanese manga online on Crunchyroll including Attack on Titan, Fairy Tail, The Seven Deadly Sins, Fuuka, Knight's & Magic, and more.
###############
Title: Manga Books - Goodreads
Url: https://www.goodreads.com/genres/manga
Description: Manga. Japanese or Japanese-influenced comics and graphic novels. Usually printed in black-and-white. There are many genres inside manga, the most distinct being shojo (for girls) and shonen (for boys).
###############
Title: Manga and Anime Books | Barnes & Noble®
Url: https://www.barnesandnoble.com/b/books/graphic-novels-comics/manga/_/N-29Z8q8Zucc
Description: Discover an extensive collection of manga and anime books at Barnes & Noble. Shop a wide variety of Manga series, boxed sets, bestsellers, and more.
###############
Title: 50 Best Manga You Must Read Right Now: Classics And New ...
Url: https://bookriot.com/2020/05/26/best-manga/
Description: May 26, 2020 - New to reading manga and don't know where to start? Want to find a new series to dive into? Here's a list of the 50 best manga to add to your ...
###############
Title: MANGA Plus
Url: https://mangaplus.shueisha.co.jp/updates
Description: "MANGA Plus by SHUEISHA" is the official manga reader from Shueisha Inc., and is available globally. We publish the greatest manga in the world such as ...
###############
<td class="b d6cvqb"><span class="SJajHc" style="background:url(/images/nav_logo299.png) no-repeat;background-position:-24px 0;width:28px"></span></td>
<td class="YyVfkd"><span class="SJajHc" style="background:url(/images/nav_logo299.png) no-repeat;background-position:-53px 0;width:20px"></span>1</td>
<td><a aria-label="Page 2" class="fl" href="/search?q=manga&ei=E5r7XviHIMbn-QbH4b0Y&start=10&sa=N&ved=2ahUKEwi43ZGeqqrqAhXGc94KHcdwDwMQ8tMDegQIExAs"><span class="SJajHc NVbCr" style="background:url(/images/nav_logo299.png) no-repeat;background-position:-74px 0;width:20px"></span>2</a></td>
<td><a aria-label="Page 3" class="fl" href="/search?q=manga&ei=E5r7XviHIMbn-QbH4b0Y&start=20&sa=N&ved=2ahUKEwi43ZGeqqrqAhXGc94KHcdwDwMQ8tMDegQIExAu"><span class="SJajHc NVbCr" style="background:url(/images/nav_logo299.png) no-repeat;background-position:-74px 0;width:20px"></span>3</a></td>
<td><a aria-label="Page 4" class="fl" href="/search?q=manga&ei=E5r7XviHIMbn-QbH4b0Y&start=30&sa=N&ved=2ahUKEwi43ZGeqqrqAhXGc94KHcdwDwMQ8tMDegQIExAw"><span class="SJajHc NVbCr" style="background:url(/images/nav_logo299.png) no-repeat;background-position:-74px 0;width:20px"></span>4</a></td>
<td><a aria-label="Page 5" class="fl" href="/search?q=manga&ei=E5r7XviHIMbn-QbH4b0Y&start=40&sa=N&ved=2ahUKEwi43ZGeqqrqAhXGc94KHcdwDwMQ8tMDegQIExAy"><span class="SJajHc NVbCr" style="background:url(/images/nav_logo299.png) no-repeat;background-position:-74px 0;width:20px"></span>5</a></td>
<td><a aria-label="Page 6" class="fl" href="/search?q=manga&ei=E5r7XviHIMbn-QbH4b0Y&start=50&sa=N&ved=2ahUKEwi43ZGeqqrqAhXGc94KHcdwDwMQ8tMDegQIExA0"><span class="SJajHc NVbCr" style="background:url(/images/nav_logo299.png) no-repeat;background-position:-74px 0;width:20px"></span>6</a></td>
<td><a aria-label="Page 7" class="fl" href="/search?q=manga&ei=E5r7XviHIMbn-QbH4b0Y&start=60&sa=N&ved=2ahUKEwi43ZGeqqrqAhXGc94KHcdwDwMQ8tMDegQIExA2"><span class="SJajHc NVbCr" style="background:url(/images/nav_logo299.png) no-repeat;background-position:-74px 0;width:20px"></span>7</a></td>
<td><a aria-label="Page 8" class="fl" href="/search?q=manga&ei=E5r7XviHIMbn-QbH4b0Y&start=70&sa=N&ved=2ahUKEwi43ZGeqqrqAhXGc94KHcdwDwMQ8tMDegQIExA4"><span class="SJajHc NVbCr" style="background:url(/images/nav_logo299.png) no-repeat;background-position:-74px 0;width:20px"></span>8</a></td>
<td><a aria-label="Page 9" class="fl" href="/search?q=manga&ei=E5r7XviHIMbn-QbH4b0Y&start=80&sa=N&ved=2ahUKEwi43ZGeqqrqAhXGc94KHcdwDwMQ8tMDegQIExA6"><span class="SJajHc NVbCr" style="background:url(/images/nav_logo299.png) no-repeat;background-position:-74px 0;width:20px"></span>9</a></td>
<td><a aria-label="Page 10" class="fl" href="/search?q=manga&ei=E5r7XviHIMbn-QbH4b0Y&start=90&sa=N&ved=2ahUKEwi43ZGeqqrqAhXGc94KHcdwDwMQ8tMDegQIExA8"><span class="SJajHc NVbCr" style="background:url(/images/nav_logo299.png) no-repeat;background-position:-74px 0;width:20px"></span>10</a></td>
<td aria-level="3" class="b d6cvqb" role="heading"><a class="G0iuSb" href="/search?q=manga&ei=E5r7XviHIMbn-QbH4b0Y&start=10&sa=N&ved=2ahUKEwi43ZGeqqrqAhXGc94KHcdwDwMQ8NMDegQIExA-" id="pnnext" style="text-align:left"><span class="SJajHc NVbCr" style="background:url(/images/nav_logo299.png) no-repeat;background-position:-96px 0;width:71px"></span><span style="display:block;margin-left:53px">Next</span></a></td>
Process finished with exit code 0
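How I tested whether the links are correct: I went to the address bar in Chrome and typed "google.com" followed by the extracted link to see whether the page number changed. I tried every link and always ended up on the first page.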
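The same check can be done without a browser. The following is only a minimal sketch, not part of the original script: it assumes `https://www.google.com` as the base URL (the output above shows relative hrefs only), and the helper names `absolute_page_url` and `start_offset` are hypothetical. It resolves a relative pagination href to an absolute URL and reads the `start` query parameter, which in the output above increases by 10 per page.

```python
from urllib.parse import urljoin, urlparse, parse_qs

# Assumed base URL; the scraped hrefs in the question are relative ("/search?...").
BASE = "https://www.google.com"


def absolute_page_url(href, base=BASE):
    """Resolve a relative pagination href against the base URL."""
    return urljoin(base, href)


def start_offset(url):
    """Return the integer `start` query parameter of a URL (0 if it is absent)."""
    query = parse_qs(urlparse(url).query)
    return int(query.get("start", ["0"])[0])


if __name__ == "__main__":
    # Shortened form of the "Page 2" href above (the ei and ved parameters are omitted).
    href = "/search?q=manga&start=10&sa=N"
    url = absolute_page_url(href)
    print(url)
    print(start_offset(url))
```

Comparing `start_offset` across the extracted links shows whether each one really points at a different result offset, independent of what the browser displays.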
【Comments】:
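- Can you create a minimal reproducible example? That is, the smallest piece of code that should work but does not, with an explanation of what you expect it to do and what it actually does. I suspect the contents of source are not what you think they are.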
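One way to follow up on that suspicion is to summarize the HTML that was actually downloaded before parsing it further. The sketch below is an illustration under assumptions, not the commenter's method: `PageProbe` and `probe` are hypothetical names, it uses only the standard library's `html.parser`, and the sample HTML is made up for the demo. It reports the document length, the `<title>` text, and every link whose href carries a `start=` parameter.

```python
from html.parser import HTMLParser


class PageProbe(HTMLParser):
    """Collects the <title> text and every <a href> that contains 'start='."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.page_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            if "start=" in href:
                self.page_links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.title += data


def probe(html):
    """Return a small summary dict describing an HTML string."""
    parser = PageProbe()
    parser.feed(html)
    parser.close()
    return {
        "length": len(html),
        "title": parser.title.strip(),
        "page_links": parser.page_links,
    }


if __name__ == "__main__":
    # Made-up sample; in the script above one would pass `source` instead.
    sample = (
        "<html><head><title>sample results</title></head><body>"
        '<a href="/search?q=manga&amp;start=10">2</a>'
        '<a href="https://en.wikipedia.org/wiki/Manga">Manga</a>'
        "</body></html>"
    )
    print(probe(sample))
```

If the title or the list of pagination links differs from what the browser shows, the response fetched by `requests` is probably not the same page as the one being inspected.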
Tags: python html google-chrome web-scraping beautifulsoup