【Question title】: Scrape Mobile Specification from a webpage
【Posted】: 2017-11-30 09:53:53
【Question】:

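I am trying to get the details of every phone listed on a webpage: name, price, and specifications. I managed to get the names and prices, but the specifications are a mess. The page lists 24 phones, and when I fetch the specifications they all end up in a single list. I can't find a proper way to split them up by the phone they belong to. Any help would be appreciated. Here is the function definition: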

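import time

import requests
from lxml import html

def get_link(self, link):
    page = requests.get(link)
    tree = html.fromstring(page.content)
    # Absolute XPath: returns the names of all phones on the page as one list
    name = tree.xpath("//div[@class='_3wU53n']/text()")
    print(name)
    time.sleep(5)
    # Keep every second text node of the price elements
    price = tree.xpath("//div[@class='_1vC4OE _2rQ-NK']/text()")[1::2]
    print(price)
    time.sleep(5)
    # Problem: the <li> texts of all phones end up in one flat list
    highlights = tree.xpath("//ul[@class='vFw0gD']/li/text()")
    print(highlights)

    '''
    dictionary={}
    for i in range(len(name)):
        dictionary[name[i]]=price[i]
    print dictionary


    return
    '''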

The link passed in is: https://www.flipkart.com/mobiles-accessories/mobiles/pr?count=40&otracker=categorytree&p%5B%5D=sort%3Dpopularity&sid=tyy%2F4io

The output so far is:

['Mi A1 (Black, 64 GB)', 'Redmi Note 4 (Gold, 32 GB)', 'Mi A1 (Rose Gold, 64 GB)', 'Redmi Note 4 (Gold, 64 GB)', 'Redmi Note 4 (Black, 32 GB)', 'Honor 9i (Graphite Black, 64 GB)', 'Redmi Note 4 (Black, 64 GB)', 'Moto E4 Plus (Fine Gold, 32 GB)', 'Moto E4 Plus (Iron Gray, 32 GB)', 'Intex Aqua 5.5 VR (Champagne, White, 8 GB)', 'Lenovo K8 Plus (Venom Black, 32 GB)', 'Redmi Note 4 (Dark Grey, 64 GB)', 'Panasonic Eluga Ray (Gold, 16 GB)', 'Moto C Plus (Pearl White, 16 GB)', 'Moto C Plus (Starry Black, 16 GB)', 'Moto C Plus (Fine Gold, 16 GB)', 'Lenovo K8 Plus (Fine Gold, 32 GB)', 'Panasonic Eluga Ray 700 (Champagne Gold, 32 GB)', 'Panasonic Eluga I5 (Gold, 16 GB)', 'OPPO F5 (Black, 64 GB)', 'Lenovo K8 Plus (Fine Gold, 32 GB)', 'Moto X4 (Super Black, 64 GB)', 'Swipe ELITE Sense- 4G with VoLTE', 'Swipe ELITE Sense- 4G with VoLTE']


['14,999', '9,999', '14,999', '11,999', '9,999', '17,999', '11,999', '9,999', '9,999', '4,499', '9,999', '11,999', '6,999', '6,999', '6,999', '6,999', '9,999', '9,999', '6,499', '24,990', '10,999', '22,999', '5,555', '5,555']


['4 GB RAM | 64 GB ROM | Expandable Upto 128 GB', '5.5 inch Full HD Display', '12MP + 12MP Dual Rear Camera | 5MP Front Camera', '3080 mAh Li-polymer Battery', 'Qualcomm Snapdragon 625 64 bit Octa Core 2GHz Processor', 'Android Nougat 7.1.2 | Stock Android Version', 'Android One Smartphone - with confirmed upgrades to Android Oreo and Android P', 'Brand Warranty of 1 Year Available for Mobile and 6 Months for Accessories', .....]

【Comments】:

  • @RomanPerekhrest Sir, is it readable now? Any help toward a solution would be appreciated.

Tags: python xpath web-scraping


【Solution 1】:

Give this a try. I think it produces the output you are after:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.flipkart.com/mobiles/pr?count=40&otracker=categorytree&p=sort%3Dpopularity&sid=tyy%2C4io')
soup = BeautifulSoup(res.text, "lxml")
# Loop over the product containers so that each lookup below is scoped to one phone
for items in soup.select("._1UoZlX"):
    name = items.select("._3wU53n")[0].text
    price = items.select("._1vC4OE._2rQ-NK")[0].text
    # Join the specification entries found inside this container into one string
    specifics = ' '.join([item.text for item in items.select(".tVe95H")])
    print("Name: {}\nPrice: {}\nSpecification: {}\n".format(name,price,specifics))

Output for a single item:

Name: Mi A1 (Black, 64 GB)
Price: ₹14,999
Specification: 4 GB RAM | 64 GB ROM | Expandable Upto 128 GB 5.5 inch Full HD Display 12MP + 12MP Dual Rear Camera | 5MP Front Camera 3080 mAh Li-polymer Battery Qualcomm Snapdragon 625 64 bit Octa Core 2GHz Processor Android Nougat 7.1.2 | Stock Android Version Android One Smartphone - with confirmed upgrades to Android Oreo and Android P Brand Warranty of 1 Year Available for Mobile and 6 Months for Accessories
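
The same per-phone grouping can probably be done with the lxml/XPath approach from the question as well: select each product container first, then run relative XPath queries (note the leading ".") inside it. The sketch below is only an illustration under assumptions. The sample HTML is hand-written and merely reuses the class names that appear in this thread, the phone names in it are made up, and parse_phones is a hypothetical helper name. The real page markup may differ or may have changed, and the [1::2] slicing from the question is left out because the sample prices are single text nodes.

```python
from lxml import html

# Hypothetical, hand-written sample that reuses the class names from this
# thread; the real page markup may differ or may have changed since.
SAMPLE = """
<div>
  <div class="_1UoZlX">
    <div class="_3wU53n">Phone A (Black, 64 GB)</div>
    <div class="_1vC4OE _2rQ-NK">14,999</div>
    <ul class="vFw0gD">
      <li class="tVe95H">4 GB RAM | 64 GB ROM</li>
      <li class="tVe95H">5.5 inch Full HD Display</li>
    </ul>
  </div>
  <div class="_1UoZlX">
    <div class="_3wU53n">Phone B (Gold, 32 GB)</div>
    <div class="_1vC4OE _2rQ-NK">9,999</div>
    <ul class="vFw0gD">
      <li class="tVe95H">3 GB RAM | 32 GB ROM</li>
    </ul>
  </div>
</div>
"""


def parse_phones(page_source):
    """Return one dict per product container, holding only that phone's data."""
    tree = html.fromstring(page_source)
    phones = []
    for card in tree.xpath("//div[@class='_1UoZlX']"):
        # The leading "." makes each query relative to this container only,
        # so the results of different phones are never mixed together.
        names = card.xpath(".//div[@class='_3wU53n']/text()")
        prices = card.xpath(".//div[@class='_1vC4OE _2rQ-NK']/text()")
        specs = card.xpath(".//ul[@class='vFw0gD']/li/text()")
        phones.append({
            "name": names[0].strip() if names else None,
            "price": prices[0].strip() if prices else None,
            "specs": [s.strip() for s in specs],
        })
    return phones


phones = parse_phones(SAMPLE)
for phone in phones:
    print(phone)
```

Keeping the specifications as a list per phone, rather than joining them into one string, leaves it open to store or filter individual entries later.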

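【Comments】:

  • Did you run the code? Any feedback? When someone tries to solve your problem, at least try to respond. Thanks.
  • Sorry for the late reply. My project has dropped web scraping, so I'm no longer using this. Thanks anyway; it worked.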