Next page with python beautifulsoup: answers

【Question title】: Next page with python beautifulsoup
【Posted】: 2020-12-16 04:21:19
【Question description】:

I am new to Python and stuck on the "next page" logic.

I tried a while loop and selenium with Chrome, and neither had any effect.

Please shed some light on this:

import requests
from bs4 import BeautifulSoup
import csv 

pages = [ 0 , 25 , 50 , 75]
for page in pages:
    source = requests.get('https://finance.yahoo.com/screener/predefined/day_gainers?count=25&offset={}'.format(page)).text

soup = BeautifulSoup(source , 'lxml') 

for link in soup.find_all("a"):
    table = soup.find("table",{"class":"W(100%)"})
    thead = table.find("thead").find_all("th")
    table_head = [th.text for th in thead]
    #print(table_head)

    table_body = table.find ("tbody").find_all("tr")
        
with open("report.csv" , "a" , newline="") as csv_file:
        csv_write = csv.writer(csv_file)
        csv_write.writerow(table_head)
        
        for tr in table_body:
            table_data = [td.text.strip() for td in tr.find_all('td') ]
            csv_write.writerow(table_data)

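【Comments】:

Welcome to StackOverflow. It helps to describe what is not working in more detail (even if you think it is implied): what happened, what should have happened, any error messages, and where the error occurred. You can find more tips in How to Ask, the tour, and the help center.

Tags: python selenium web-scraping beautifulsoup next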

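【Solution 1】:

I think you only need to fix the indentation of your code, so that the parsing and writing happen inside the page loop, and then it works. Here is the code: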

import requests
from bs4 import BeautifulSoup
import csv

pages = [ 0 , 25 , 50 , 75]
for page in pages:
    
    source = requests.get('https://finance.yahoo.com/screener/predefined/day_gainers?count=25&offset={}'.format(page)).text
    

    soup = BeautifulSoup(source , 'lxml')


    for link in soup.find_all("a"):
        table = soup.find("table",{"class":"W(100%)"})
        thead = table.find("thead").find_all("th")
        table_head = [th.text for th in thead]
        #print(table_head)

        table_body = table.find ("tbody").find_all("tr")

        with open("report.csv" , "a" , newline="") as csv_file:
                csv_write = csv.writer(csv_file)
                csv_write.writerow(table_head)

                for tr in table_body:
                    table_data = [td.text.strip() for td in tr.find_all('td') ]
                    csv_write.writerow(table_data)

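Edit: the second for loop (over every "a" tag) writes the same table once per link, which produces duplicate values, so remove it. Here is the edited code.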

import requests
from bs4 import BeautifulSoup
import csv

pages = [0, 25, 50, 75]
for page in pages:
    # Fetch and parse one page of results; offset selects the page.
    source = requests.get('https://finance.yahoo.com/screener/predefined/day_gainers?count=25&offset={}'.format(page)).text
    soup = BeautifulSoup(source, 'lxml')

    # Locate the results table and split it into header cells and body rows.
    table = soup.find("table", {"class": "W(100%)"})
    thead = table.find("thead").find_all("th")
    table_head = [th.text for th in thead]
    table_body = table.find("tbody").find_all("tr")

    # Append this page's rows; write the header for the first page only.
    with open("report.csv", "a", newline="") as csv_file:
        csv_write = csv.writer(csv_file)
        if page == pages[0]:
            csv_write.writerow(table_head)
        for tr in table_body:
            table_data = [td.text.strip() for td in tr.find_all('td')]
            csv_write.writerow(table_data)
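
The hard-coded offsets above only cover the first 100 rows. As a minimal sketch, assuming the screener keeps accepting the `count` and `offset` query parameters shown in the URL above, the loop can instead advance the offset until a page comes back empty. `fetch_rows` is a hypothetical callable (not part of requests or BeautifulSoup) that returns the parsed rows for one offset; in real use it would wrap the `requests.get` and `soup.find` calls. A fake version is used here so the sketch runs offline.

```python
from typing import Callable, Iterator, List

Row = List[str]


def iter_all_rows(
    fetch_rows: Callable[[int], List[Row]],
    page_size: int = 25,
    max_pages: int = 100,
) -> Iterator[Row]:
    """Yield rows page by page until an empty page or max_pages is reached.

    fetch_rows is a hypothetical helper: it takes an offset and returns
    the list of table rows found on that page.
    """
    for page_number in range(max_pages):
        offset = page_number * page_size
        rows = fetch_rows(offset)
        if not rows:
            # An empty page is taken to mean there is no next page.
            break
        yield from rows


# Stand-in for the real network call: 60 made-up rows served 25 at a time.
FAKE_TABLE = [[f"SYM{i}", str(i)] for i in range(60)]


def fake_fetch_rows(offset: int) -> List[Row]:
    return FAKE_TABLE[offset:offset + 25]


all_rows = list(iter_all_rows(fake_fetch_rows))
print(len(all_rows))
```

The `max_pages` cap is a safety net, so the loop still ends if the site never returns an empty page.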

【Comments】:

Thanks, but I am getting a lot of duplicates.
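
If duplicates remain after the inner loop is removed, one option is to filter the rows before writing them. The following is a minimal sketch, assuming the first cell of each row (the ticker symbol in this table) identifies it uniquely. `write_unique_rows` is a hypothetical helper, not something from the answer above, and the rows in the demo are made up. It also writes the header a single time.

```python
import csv
import io
from typing import Iterable, List, TextIO


def write_unique_rows(out: TextIO, header: List[str], rows: Iterable[List[str]]) -> int:
    """Write the header once, then each row whose first cell is new.

    Returns the number of data rows written.
    """
    writer = csv.writer(out)
    writer.writerow(header)
    seen = set()
    written = 0
    for row in rows:
        if not row or row[0] in seen:
            continue  # skip empty rows and keys that were already written
        seen.add(row[0])
        writer.writerow(row)
        written += 1
    return written


# Offline demo: an in-memory buffer stands in for report.csv.
buffer = io.StringIO()
rows = [["AAA", "1.0"], ["BBB", "2.0"], ["AAA", "1.0"]]
count = write_unique_rows(buffer, ["Symbol", "Price"], rows)
print(count)
print(buffer.getvalue())
```

With a real file, open it with `newline=""` as in the code above. Opening in `"w"` mode rather than `"a"` is worth considering, because append mode adds to whatever an earlier run left in `report.csv`, which is another way to end up with repeated rows.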