Trying to loop through multiple URLs and import some data from each

【Question title】: Trying to loop through multiple URLs and import some data from each
【Posted】: 2022-01-05 23:21:04
【Question】:

I'm trying to put together code that loops through several URLs and grabs a few data points from each one. Here is my super hacky code.

import requests
from bs4 import BeautifulSoup

base_url = "https://www.amazon.com/s?k=mountain+bikes&ref=nb_sb_noss_"
current_page = 1

while current_page < 5:
    print(current_page)
    url = base_url + str(current_page)
    #current_page += 1
    r = requests.get(url)
    zute_soup = BeautifulSoup(r.text, 'html.parser')
    firme = zute_soup.findAll('div', {'class': 'brand-follow-tooltip-root'})
    
    title = []
    desc = []
    page = []
    for title in firme:
        title1 = title.findAll('h1')[0].text
        print(title1)
        adresa = title.findAll('div', {'class': 'brand-follow-tooltip-root'})[0].text
        print(adresa)
        print('\n')
        page_line = "{title1}\n{adresa}".format(
            title1=title1,
            adresa=adresa
        )
        
        title.append(title1)
        desc.append(adresa)
        page.append(page_line)
    current_page += 1

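The code finishes in a few seconds and I get no errors, but nothing is appended to any of the lists. I think this is close, but I can't tell what is wrong here.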

【Comments】:

Tags: python python-3.x beautifulsoup python-requests

【Solution 1】:

You reset the lists to empty on every iteration of the loop. Is that intended?

while current_page < 5:
    
  .
  .
  .
    title = []
    desc = []
    page = []
.
.
.
        title.append(title1)
        desc.append(adresa)
        page.append(page_line)
    current_page += 1

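Move

    title = []
    desc = []
    page = []

out of the while loop, to a point before it starts. That way the items you append are no longer thrown away at the start of each iteration.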
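
The fix above can be sketched without any network access. This is only an illustration under assumptions: `fetch_items` is a hypothetical stand-in for the request and parsing step, and the names `titles`, `descs`, and `pages` are not from the original post. It also gives the loop variable its own name, because in the question `for title in firme` rebinds `title`, so `title.append(title1)` is called on the element being looped over rather than on the list.

```python
def fetch_items(page_number):
    """Hypothetical stand-in for requesting and parsing one results page."""
    return [
        (f"title {page_number}-{i}", f"description {page_number}-{i}")
        for i in range(2)
    ]


# Create the accumulators once, before the loop, so they survive every iteration.
titles = []
descs = []
pages = []

for current_page in range(1, 5):  # pages 1..4, like `while current_page < 5`
    # The loop variables use names that differ from the list names.
    for item_title, item_desc in fetch_items(current_page):
        titles.append(item_title)
        descs.append(item_desc)
        pages.append(f"{item_title}\n{item_desc}")

print(len(titles), len(descs), len(pages))  # 8 8 8
```

With four pages of two fake items each, all three lists end up with eight entries, which is the accumulation the question was expecting.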

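【Discussion】:

Good catch! Not sure why I didn't see that. I made the change and the result is still the same: no results. I think the problem is here: title1 = title.findAll('h1')[0].text. Something also seems wrong here: adresa = title.findAll('div', {'class': 'brand-follow-tooltip-root'})[0].text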
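
One possible way to check the suspicion in this comment is to look at how many elements the outer search returns before worrying about the inner lines. If `findAll` returns an empty list, the body of the `for` loop never runs, which would explain why the `[0]` lookups raise no error and nothing is printed. The sketch below assumes a made-up HTML snippet (`sample_html`) in place of `r.text`, and the `result` class and `h2` tags are hypothetical; the markup of the real page may be entirely different.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for r.text; a real response may look very different.
sample_html = """
<html><body>
  <div class="result"><h2>Bike A</h2></div>
  <div class="result"><h2>Bike B</h2></div>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# The class from the question does not occur in this sample, so nothing matches.
matches = soup.find_all("div", {"class": "brand-follow-tooltip-root"})
print(len(matches))  # 0, so a for loop over `matches` never runs its body

# A selector that does match the sample markup.
results = soup.find_all("div", {"class": "result"})
names = []
for result in results:
    heading = result.find("h2")  # find() returns None when nothing matches
    if heading is not None:
        names.append(heading.get_text(strip=True))

print(names)  # ['Bike A', 'Bike B']
```

In the real script, printing `r.status_code` and `len(firme)` for each page would show whether the request succeeded and whether the selector matched anything at all.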