Want to scrape from a web page and its next pages

【Question title】: Want to scrape from a web page and its next pages
【Posted】: 2020-07-23 04:32:14
【Question description】:

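I want to extract the company name, person, country, phone, and email into an Excel file. I tried the following code, but it writes only one value to the file. How can I loop over the first page and the following pages as well?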

import csv
import re
import requests
import urllib.request
from bs4 import BeautifulSoup
for page in range(10):
        url = "http://www.aepcindia.com/buyersdirectory"
        soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'lxml')
        tbody = soup('div', {'class':'view-content'})#[0].find_all('')
        f = open('filename.csv', 'w', newline = '')
        Headers = "Name,Person,Country,Email,Phone\n"
        csv_writer = csv.writer(f)
        f.write(Headers)
        for i in tbody:
                try:
                    name = i.find("div", {"class":"company_name"}).get_text()
                    person = i.find("div", {"class":"title"}).get_text()
                    country = i.find("div", {"class":"views-field views-field-field-country"}).get_text()
                    email = i.find("div", {"class":"email"}).get_text()
                    phone = i.find("div", {"class":"telephone_no"}).get_text()
                    print(name, person, country, email, phone)
                    f.write("{}".format(name).replace(","," ")+ ",{}".format(person)+ ",{}".format(country)+ ",{}".format(email) + ",{}".format(phone) + "\n")
                except: AttributeError
        f.close()

This is the link to the web page: http://www.aepcindia.com/buyersdirectory

【Comments】:

Tags: python beautifulsoup screen-scraping

【Solution 1】:

import requests
from bs4 import BeautifulSoup
import csv

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0'}


def main(url):
    with requests.Session() as req:
        with open("data.csv", 'w', newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["Name", "Title", "Country", "Email", "Phone"])
            # Request ?page=0 through ?page=9, reusing one session and one open file.
            for item in range(0, 10):
                print(f"Extracting Page# {item + 1}")
                r = req.get(url.format(item), headers=headers)
                soup = BeautifulSoup(r.content, 'html.parser')

                # Collect each column across the whole page, then zip them into rows.
                name = [name.text for name in soup.select("div.company_name")]
                title = [title.text for title in soup.select("div.title")]
                country = [country.text for country in soup.find_all(
                    "div", class_="field-content", string=True)]
                email = [email.a.text for email in soup.select(
                    "div.email")]
                phone = [phone.text
                         for phone in soup.select("div.telephone_no")]
                data = zip(name, title, country, email, phone)
                writer.writerows(data)


main("http://www.aepcindia.com/buyersdirectory?page={}")
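Two differences from the code in the question appear to account for the single row. The question requests the same URL on every iteration and reopens `filename.csv` in `'w'` mode inside the loop, which truncates the file each time. The answer instead fills the page index into the `?page={}` query string and opens the file once. The question also iterates over the `view-content` container itself, so each `find` returns only the first match inside it. Below is a minimal standard-library sketch of just the loop structure; `page_urls` and `write_pages` are hypothetical helper names, not part of the answer.

```python
import csv


def page_urls(template, pages):
    """Build one URL per page by filling the page index into the template."""
    return [template.format(n) for n in range(pages)]


def write_pages(path, header, pages_of_rows):
    """Open the CSV once, write the header once, then write every page's rows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for rows in pages_of_rows:
            # Each element of pages_of_rows is the list of rows scraped from one page.
            writer.writerows(rows)
```

For example, `page_urls("http://www.aepcindia.com/buyersdirectory?page={}", 10)` yields the ten URLs the answer requests, and `write_pages` would receive one list of rows per fetched page.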

Output: view-online
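One caveat with collecting each column separately and zipping them: if any record on a page lacks a field, the lists have different lengths, so `zip` silently drops rows and can pair values from different companies. `email.a.text` also raises `AttributeError` when a `div.email` contains no link. A possible alternative is to parse one record at a time and leave missing fields blank. The sketch below assumes each record sits in a `div.views-row` container; that class name and the sample markup are assumptions for illustration and have not been checked against the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the row container class is an assumption.
SAMPLE_HTML = """
<div class="view-content">
  <div class="views-row">
    <div class="company_name">Acme Ltd</div>
    <div class="title">Jane Doe</div>
    <div class="views-field views-field-field-country">
      <div class="field-content">India</div>
    </div>
    <div class="email"><a href="mailto:jane@example.com">jane@example.com</a></div>
    <div class="telephone_no">12345</div>
  </div>
  <div class="views-row">
    <div class="company_name">Beta Inc</div>
    <div class="title">John Roe</div>
    <div class="views-field views-field-field-country">
      <div class="field-content">Nepal</div>
    </div>
    <div class="telephone_no">67890</div>
  </div>
</div>
"""

# CSS selectors in column order: Name, Title, Country, Email, Phone.
FIELD_SELECTORS = [
    "div.company_name",
    "div.title",
    "div.views-field-field-country",
    "div.email",
    "div.telephone_no",
]


def text_or_blank(record, selector):
    """Return the stripped text of the first match, or '' if the field is absent."""
    node = record.select_one(selector)
    return node.get_text(strip=True) if node else ""


def parse_records(html, row_selector="div.views-row"):
    """Return one list of field values per record, keeping the fields aligned."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        [text_or_blank(record, selector) for selector in FIELD_SELECTORS]
        for record in soup.select(row_selector)
    ]
```

Each list returned by `parse_records(r.content)` could be passed straight to `writer.writerows` in place of the zipped columns.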

【Discussion】:

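Bro, it works like magic... thanks a lot. Could you teach me one more thing? I have a list of search queries in Excel that need to be searched on Google, with the results saved to another Excel file. How can I do that?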