【Posted】: 2019-07-31 13:40:20
【Question】:
I am trying to capture only the "Other" text, which essentially means stripping out the strong tag element:
<ul class="listing-row__meta">
<li>
<strong>Ext. Color:</strong>
Other
</li>
</ul>
My code so far:
import requests
from bs4 import BeautifulSoup
from csv import writer

response = requests.get('https://www.cars.com/for-sale/searchresults.action/?mdId=21811&mkId=20024&page=1&perPage=100&rd=99999&searchSource=PAGINATION&showMore=false&sort=relevance&stkTypId=28880&zc=11209')
soup = BeautifulSoup(response.text, 'html.parser')
posts = soup.find_all(class_='shop-srp-listings__inner')

with open('posts.csv', 'w') as csv_file:
    csv_writer = writer(csv_file)
    headers = ['title', 'color', 'price']
    csv_writer.writerow(headers)
    for post in posts:
        title = post.find(class_="listing-row__title").get_text().replace('\n', '').strip()
        color = post.find("li").get_text().replace('\n', '').strip()
        colorremove = color.strong.extract()
        price = post.find("span", attrs={"class": "listing-row__price"}).get_text().replace('\n', '').strip()
        csv_writer.writerow([title, colorremove, price])
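This version of the script does not run. Before I added the colorremove line, I had only the color line and it worked fine, but the result still included the "Ext. Color:" label.
【Discussion】: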
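The failure most likely comes from calling .strong on a string: get_text() returns a plain str, which has no strong attribute, so the colorremove line raises AttributeError. One possible approach is to remove the strong tag from the li element first and only then read the text. The following is a minimal sketch that runs against the markup quoted in the question rather than the live page; the helper name text_without_label is hypothetical, and it assumes each li holds at most one strong label.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; a real results page has many such rows.
HTML = """
<ul class="listing-row__meta">
<li>
<strong>Ext. Color:</strong>
Other
</li>
</ul>
"""


def text_without_label(li):
    """Return the text of an <li> after dropping its <strong> label, if any."""
    label = li.find("strong")
    if label is not None:
        # extract() removes the tag from the tree, so get_text() no longer sees it.
        label.extract()
    return li.get_text(strip=True)


soup = BeautifulSoup(HTML, "html.parser")
li = soup.find("ul", class_="listing-row__meta").find("li")
color = text_without_label(li)
print(color)  # prints "Other"
```

In the loop from the question, the same idea would mean keeping the result of post.find("li") as a tag, passing it to the helper, and writing the returned string to the CSV instead of colorremove.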
Tags: python python-3.x web-scraping beautifulsoup screen-scraping