【Posted】: 2021-12-08 23:50:23
【Question】:
The scraper prints all 75 records in the terminal, but they are not all saved to the CSV file. URL: https://sehat.com.pk/categories/Over-The-Counter-Drugs/Diarrhea-and-Vomiting-/?sort=alphaasc&page=2
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

for page_number in range(1, 6):
    url = f'https://sehat.com.pk/categories/Over-The-Counter-Drugs/Diarrhea-and-Vomiting-/?sort=featured&page='+str(page_number)
    r = requests.get(url)
    #time.sleep(6)
    soup = BeautifulSoup(r.content, 'html.parser')
    content = soup.find_all('div', class_ = 'col-md-12 pr-0 pl-0')
    suit =[]
    for property in content:
        names = property.find('div',class_='col-md-12 d-table-cell align-middle')
        name= names.find('img', class_ = 'img-fluid')['alt']
        links=property.find('a')['href']
        try:
            price= property.find('div', class_ = 'ProductPriceRating d-table-cell text-center pl-1 pr-1 align-middle').text.strip()
        except AttributeError:
            price=''
        try:
            product_brand =property.find('div',class_ ='ProductBoxProductBrand-div d-table-row text-center pl-1 pr-1 align-middle').text.strip()
        except AttributeError:
            product_brand=''
        print(name,product_brand,links,price)
        fabric = {
            'productname':name,
            'product_Brand':product_brand,
            'Product_price': price,
            'links': links,
        }
        suit.append(fabric)

print ("Importing to Data into CSV File...!!")
df = pd.DataFrame(suit)
print("Saved Sucessfully....")
df.to_csv('Diarrhea_and_Vomiting_pagination.csv', index=False)
【Comments】:
-
You are only saving the items from the last web page. Your list `suit` should be initialized before the `for` loop over the pages.
-
It works, thanks.
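Applying the comment's fix, a corrected version might look like the sketch below. It creates `suit = []` once, above the page loop, so rows from all five pages accumulate, and builds the DataFrame and writes the CSV once after the loop. The helper names `text_or_empty`, `parse_page`, `fetch`, `scrape`, and `main`, the injectable `fetch_html` parameter, and the 30-second `timeout` are hypothetical additions for illustration and offline testing, not part of the original script. The CSS class strings are copied from the question and assume the site's markup has not changed since.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

BASE_URL = ('https://sehat.com.pk/categories/Over-The-Counter-Drugs/'
            'Diarrhea-and-Vomiting-/?sort=featured&page={}')

# Class strings copied from the question; bs4 matches a multi-class
# string like these against the tag's full class attribute value.
PRODUCT_CLASS = 'col-md-12 pr-0 pl-0'
NAME_CLASS = 'col-md-12 d-table-cell align-middle'
PRICE_CLASS = 'ProductPriceRating d-table-cell text-center pl-1 pr-1 align-middle'
BRAND_CLASS = 'ProductBoxProductBrand-div d-table-row text-center pl-1 pr-1 align-middle'


def text_or_empty(tag):
    """Hypothetical helper: stripped text of a tag, or '' if find() returned None."""
    return tag.text.strip() if tag is not None else ''


def parse_page(html):
    """Return one dict per product found on a single listing page."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for product in soup.find_all('div', class_=PRODUCT_CLASS):
        name_div = product.find('div', class_=NAME_CLASS)
        rows.append({
            'productname': name_div.find('img', class_='img-fluid')['alt'],
            'product_Brand': text_or_empty(product.find('div', class_=BRAND_CLASS)),
            'Product_price': text_or_empty(product.find('div', class_=PRICE_CLASS)),
            'links': product.find('a')['href'],
        })
    return rows


def fetch(page_number):
    """Download one listing page (timeout value is an assumption)."""
    return requests.get(BASE_URL.format(page_number), timeout=30).content


def scrape(page_numbers, fetch_html=fetch):
    suit = []  # created ONCE, before the page loop, so every page's rows are kept
    for page_number in page_numbers:
        suit.extend(parse_page(fetch_html(page_number)))
    return suit


def main():
    df = pd.DataFrame(scrape(range(1, 6)))
    df.to_csv('Diarrhea_and_Vomiting_pagination.csv', index=False)
    print(f'Saved {len(df)} rows')
```

Calling `main()` (for example under an `if __name__ == '__main__':` guard) fetches pages 1 to 5 and writes `Diarrhea_and_Vomiting_pagination.csv` once. Passing a stub as `fetch_html` lets `scrape` be exercised offline without contacting the site.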
Tags: python pandas beautifulsoup request webdriver