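【Posted】: 2016-01-03 01:42:01
【Question】:
I have been trying to figure out why this code produces multiple prices for each product when the data is saved to the CSV. It appears that every price in a row of products on the page gets saved under each product in that row. What I want is to save a single price per product, not 3 or 4.
I can't work this out on my own. What needs to change so that only the correct price is stored for each product?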
import mechanize
from lxml import html
import csv
import io
from time import sleep
def save_products (products, writer):
    for product in products:
        writer.writerow([ product["title"][0].encode('utf-8') ])
        for price in product['prices']:
            writer.writerow([ price["value"][0].encode('utf-8') ])
f_out = open('ssdResult.csv', 'wb')
writer = csv.writer(f_out)
links = [
    "http://sciencesuppliesdirect.com/research-chemicals",
    "http://sciencesuppliesdirect.com/research-chemicals?p=2",
    "http://sciencesuppliesdirect.com/research-chemicals?p=3",
    "http://sciencesuppliesdirect.com/research-chemicals?p=4",
    "http://sciencesuppliesdirect.com/research-chemicals?p=5",
    "http://sciencesuppliesdirect.com/research-chemicals?p=6",
    "http://sciencesuppliesdirect.com/research-chemicals?p=7",
    "http://sciencesuppliesdirect.com/research-chemicals?p=8",
    "http://sciencesuppliesdirect.com/research-chemicals?p=9",
    "http://sciencesuppliesdirect.com/research-chemicals?p=10",
    "http://sciencesuppliesdirect.com/research-chemicals?p=11",
    "http://sciencesuppliesdirect.com/research-chemicals?p=12",
    "http://sciencesuppliesdirect.com/research-chemicals?p=13",
    "http://sciencesuppliesdirect.com/research-chemicals?p=14",
    "http://sciencesuppliesdirect.com/research-chemicals?p=15",
    "http://sciencesuppliesdirect.com/research-chemicals?p=16",
    "http://sciencesuppliesdirect.com/research-chemicals?p=17",
    "http://sciencesuppliesdirect.com/research-chemicals?p=18",
    "http://sciencesuppliesdirect.com/research-chemicals?p=19",
    "http://sciencesuppliesdirect.com/research-chemicals?p=20",
    "http://sciencesuppliesdirect.com/research-chemicals?p=21",
    "http://sciencesuppliesdirect.com/research-chemicals?p=22",
    "http://sciencesuppliesdirect.com/research-chemicals?p=23",
    "http://sciencesuppliesdirect.com/research-chemicals?p=24",
    "http://sciencesuppliesdirect.com/cannabinoids",
    "http://sciencesuppliesdirect.com/cannabinoids?p=2",
    "http://sciencesuppliesdirect.com/cannabinoids?p=3",
    "http://sciencesuppliesdirect.com/cannabinoids?p=4",
    "http://sciencesuppliesdirect.com/cannabinoids?p=5",
    "http://sciencesuppliesdirect.com/cannabinoids?p=6",
    "http://sciencesuppliesdirect.com/cannabinoids?p=7",
    "http://sciencesuppliesdirect.com/pellets",
    "http://sciencesuppliesdirect.com/pellets?p=2",
    "http://sciencesuppliesdirect.com/pellets?p=3",
    "http://sciencesuppliesdirect.com/herbal-blends",
    "http://sciencesuppliesdirect.com/herbal-blends?p=2",
    "http://sciencesuppliesdirect.com/branded-products",
    "http://sciencesuppliesdirect.com/branded-products?p=2",
]
br = mechanize.Browser()
for link in links:
    print(link)
    r = br.open(link)
    content = r.read()
    products = []
    tree = html.fromstring(content)
    product_nodes = tree.xpath('//div[@class="category-products"]/ul')
    for product_node in product_nodes:
        product = {}
        try:
            product['title'] = product_node.xpath('.//li/div[2]/h2/a/text()')
        except:
            product['title'] = ""
        price_nodes = product_node.xpath('.//li/div[2]/div[1]/span')
        product['prices'] = []
        for price_node in price_nodes:
            price = {}
            try:
                price['value'] = price_node.xpath('.//span/text()')
            except:
                price['value'] = ""
            product['prices'].append(price)
        products.append(product)
    save_products(products, writer)
f_out.close()
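【Comments】:
-
What do you mean? Please include the input (HTML?), the output you get, and the output you expect.
-
The input is the links in the code. If you run it, you will see that the CSV results contain multiple prices for each item, whereas on the page each item has only one price.
-
It looks like you are storing multiple prices.
-
Yes, exactly. I am trying to figure out why I am storing multiple prices instead of just the one price that belongs to each item.
-
Does nobody have any suggestions for this problem?
Tags: python csv xpath web-scraping mechanize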
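The symptom described in the question is consistent with the outer loop walking over `ul` rows rather than over individual `li` items: every relative XPath under a row then matches all products in that row. The sketch below illustrates that effect. It is not a verified fix for the site, whose markup cannot be checked here. `SAMPLE` is made-up markup inferred from the XPath expressions in the question, `rows_as_products` and `items_as_products` are hypothetical helper names, and the standard library's `xml.etree.ElementTree` is swapped in for lxml so the example runs on its own under Python 3 (the question's code is Python 2).

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup: one <ul> row containing three products.
SAMPLE = """
<div class="category-products">
  <ul>
    <li><div>img</div><div><h2><a>Product A</a></h2><div><span><span>1.00</span></span></div></div></li>
    <li><div>img</div><div><h2><a>Product B</a></h2><div><span><span>2.00</span></span></div></div></li>
    <li><div>img</div><div><h2><a>Product C</a></h2><div><span><span>3.00</span></span></div></div></li>
  </ul>
</div>
"""


def rows_as_products(root):
    """Mimic the question: treat each <ul> row as if it were one product."""
    result = []
    for row in root.findall(".//ul"):
        # These paths match every <li> in the row, not just one item.
        titles = [a.text for a in row.findall("./li/div[2]/h2/a")]
        prices = [s.text for s in row.findall("./li/div[2]/div[1]/span/span")]
        result.append((titles[0], prices))
    return result


def items_as_products(root):
    """Treat each <li> as one product, so paths are relative to that item."""
    result = []
    for item in root.findall(".//ul/li"):
        title = item.findtext("./div[2]/h2/a", default="")
        price = item.findtext("./div[2]/div[1]/span/span", default="")
        result.append((title.strip(), price.strip()))
    return result


root = ET.fromstring(SAMPLE)
print(rows_as_products(root))   # one title paired with every price in the row
print(items_as_products(root))  # one (title, price) pair per product

# Write one CSV row per product to an in-memory buffer.
buf = io.StringIO()
csv.writer(buf).writerows(items_as_products(root))
print(buf.getvalue())
```

If the real pages are laid out this way, the equivalent change in the question's lxml code would be to select the `li` elements in the outer XPath and drop the leading `.//li` from the inner expressions. That is an assumption to check against the actual HTML.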